Jonathan Goodall
University of Virginia, Associate Professor
Subject Areas: Hydrology, Hydroinformatics, Water Resources
ABSTRACT:
This HydroShare resource provides raw spatial input data for executing RHESSys workflows at (1) Coweeta Subbasin 18, North Carolina; (2) Scotts Level Branch, Maryland; and (3) Spout Run, Virginia. To evaluate the conventional data distribution approach, these spatial datasets were collected manually and shared at the file level as small files.
Additionally, the GeoServer and THREDDS Data Server (TDS) approaches use only the observation data from this resource.
ABSTRACT:
This HydroShare resource offers Jupyter Notebooks for the RHESSys modeling workflow, employing the conventional, GeoServer and THREDDS approaches across Coweeta Subbasin 18, NC; Spout Run, VA; and Scotts Level Branch, MD. For instructions on running the Jupyter Notebooks, please refer to the provided README file within this resource.
ABSTRACT:
This resource holds the data and models used by Ercan et al. (2020). The goal of their study was to quantify possible changes in the water balance of a 1,373 km² watershed in North Carolina, the Upper Neuse watershed, due to climate change. To accomplish this, they used the SWAT model. They first performed a sensitivity analysis to identify the most sensitive model parameters for their study area. Next, they calibrated and validated the SWAT model using daily streamflow records within the watershed. Finally, they forced the SWAT model with climate scenarios from five downscaled General Circulation Models for baseline, mid-century, and end-century periods.
Ercan et al. (2020) did not formally publish the data or Model Instances (MI) used in their study, which is not uncommon. In this resource, we published their data and MIs as an example to demonstrate the design capabilities of Maghami et al. (2023)'s extensible schema for capturing environmental model metadata and show its implementation in HydroShare.
This resource includes the raw input data and the preprocessing codes used to prepare them as MIs for the SWAT model, four MIs, one Model Program (MP), and the postprocessing codes Ercan et al. (2020) used to summarize the model results as figures and tables. The contents are organized into the following seven folders:
1- InputDataAndPreprocessing
2- MI_1_SensitivityAnalysis
3- MI_2_CalibrationAndValidation
4- MI_4_ClimateModels_Historical_AfterCalibration
5- MI_5_ClimateModels_Future_AfterCalibration
6- MP
7- Postprocessing
A detailed explanation of the MIs and the MP is available in Maghami et al. (2023). It is important to note that our model metadata design treats the entire raw input data, custom preprocessing, and postprocessing tools (e.g., codes to process raw input data), along with the processed input data, as a single MI. However, since most of the raw input data, preprocessing, and postprocessing tools are common among the four MIs, to avoid repetition, we have organized them into dedicated folders. Each MI now specifically includes only the processed input data for the SWAT model.
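The metadata design described above can be pictured as a simple data structure. In it, each logical MI points to the shared folders plus its own processed-input folder. The sketch below is a hypothetical illustration, not code from this resource or from Maghami et al. (2023). The class and function names are invented, and the folder names come from the list above with the list numbers dropped.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch: folder names from the list above, without the "1-" ... "7-" list numbers.
# Raw input data, preprocessing, and postprocessing are shared by all MIs.
SHARED_FOLDERS = ["InputDataAndPreprocessing", "Postprocessing"]

# Folders that hold only the processed SWAT input data of each MI.
MI_FOLDERS = [
    "MI_1_SensitivityAnalysis",
    "MI_2_CalibrationAndValidation",
    "MI_4_ClimateModels_Historical_AfterCalibration",
    "MI_5_ClimateModels_Future_AfterCalibration",
]


@dataclass
class LogicalModelInstance:
    """One MI as the metadata design treats it: shared raw data and tools plus its own processed inputs."""
    name: str
    folders: List[str]


def assemble_model_instances(mi_folders=MI_FOLDERS, shared=SHARED_FOLDERS):
    # Each logical MI references the shared folders instead of storing its own copy.
    # The Model Program (MP) folder is deliberately not part of any MI.
    return [LogicalModelInstance(mi, list(shared) + [mi]) for mi in mi_folders]


for mi in assemble_model_instances():
    print(mi.name, "->", mi.folders)
```

Storing shared content once and referencing it from each MI keeps the logical definition of an MI intact while avoiding four copies of the same raw data.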
ABSTRACT:
ATTENTION: All 9 workflows for RHESSys modeling are now condensed into one. This resource is kept only for archiving purposes.
This HydroShare resource offers Jupyter Notebooks for the RHESSys modeling workflow, employing the GeoServer approach at Spout Run, VA. For instructions on running the Jupyter Notebooks, please refer to the provided README file within this resource.
ABSTRACT:
This resource is configured for execution in connected JupyterHub compute platforms using the High-Performance Computing (HPC) resources (Expanse or Virtual ROGER) supported by the CyberGIS-Jupyter for Water (CJW) environment through the CyberGIS-Compute Service. It helps modelers reproduce and build on the results from the VB study (Van Beusekom et al., 2022), as explained by Maghami et al. (2023).
For this purpose, four Jupyter notebooks are included in this resource. They explore the paper's goal for four example CAMELS sites and a preselected 60-month simulation period to demonstrate the capabilities of the notebooks. The first notebook processes the raw input data from the CAMELS dataset for use as input to the SUMMA model. The second notebook uses the CJW environment's supported HPC resource (Expanse or Virtual ROGER) through the CyberGIS-Compute Service to execute the SUMMA model with the input data from the first notebook, using original and altered forcing as described further in the notebook. The third notebook uses the outputs from the second notebook and visualizes the sensitivity of SUMMA model outputs using the Kling-Gupta Efficiency (KGE). The fourth notebook was developed only for the HPC environment and currently works only with Expanse. It uses the Globus service integrated by CyberGIS-Compute to transfer large data from HPC to the scientific cloud service (i.e., CJW) reliably and quickly. More information about each Jupyter notebook, with step-by-step instructions on how to run the notebooks, can be found in the Readme.md file included in this resource. Using these four notebooks, modelers can apply the methodology mentioned above to any (one to all) of the 671 CAMELS basins and to simulation periods of their choice. Because this resource uses HPC, simulations run at high speed. This makes larger simulations practical and much faster than without HPC, even simulations as large as all 671 CAMELS sites over the whole 60-month simulation period used in the paper.
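The sketch below computes KGE in its original formulation (Gupta et al., 2009):

KGE = 1 - sqrt((r - 1)² + (α - 1)² + (β - 1)²)

Here r is the linear correlation between the simulated and reference series, α is the ratio of their standard deviations, and β is the ratio of their means. This is an illustrative stand-alone implementation, not the code in the third notebook, which may use a different KGE variant. The function name and the example series are assumptions.

```python
import math
from statistics import mean, pstdev


def kge(sim, obs):
    """Kling-Gupta Efficiency (original 2009 form) of `sim` against the reference series `obs`.

    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where r is the Pearson
    correlation, alpha = std(sim) / std(obs), and beta = mean(sim) / mean(obs).
    A perfect match gives 1; the series must not be constant and mean(obs) must be nonzero.
    """
    if len(sim) != len(obs) or len(sim) < 2:
        raise ValueError("sim and obs must have the same length (at least 2)")
    mu_s, mu_o = mean(sim), mean(obs)
    sd_s, sd_o = pstdev(sim), pstdev(obs)
    # Population covariance, consistent with the population standard deviations above
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / len(sim)
    r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o  # variability ratio
    beta = mu_s / mu_o   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical usage: compare output simulated with altered forcing against the
# run with original forcing (values are made up).
original = [1.2, 3.4, 2.2, 5.1, 4.0]
altered = [1.0, 3.9, 2.0, 4.6, 4.4]
print(round(kge(altered, original), 3))
```

A value near 1 means that altering a forcing barely changes the output. Lower values indicate that the output is more sensitive to that forcing.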
ABSTRACT:
These are raw environmental time series data stored in an SQLite database with a data schema loosely based on ODM 1.1. The schema is shown in the data model figure included in the resource. The data come from the Hampton Roads region in southeastern Virginia. The time series variables are rainfall, tide, wind, and water table elevation. These data were processed and used as input for data-driven modeling to predict street flood severity. The processing and modeling are described in this Journal of Hydrology paper: https://doi.org/10.1016/j.jhydrol.2018.01.044.
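The sketch below shows one way to query such a database with Python's built-in sqlite3 module, using an in-memory database. The table and column names (Sites, Variables, DataValues, SiteCode, VariableCode, LocalDateTime, DataValue) follow ODM 1.1 conventions and are assumptions. Because this schema is only loosely based on ODM 1.1, check the data model figure for the actual names. The site code, variable codes, and values are made up.

```python
import sqlite3

# Hypothetical ODM 1.1-style tables; the actual schema is shown in the resource's data model figure.
SCHEMA = """
CREATE TABLE Sites (SiteID INTEGER PRIMARY KEY, SiteCode TEXT);
CREATE TABLE Variables (VariableID INTEGER PRIMARY KEY, VariableCode TEXT);
CREATE TABLE DataValues (
    ValueID INTEGER PRIMARY KEY,
    DataValue REAL,
    LocalDateTime TEXT,
    SiteID INTEGER REFERENCES Sites(SiteID),
    VariableID INTEGER REFERENCES Variables(VariableID)
);
"""


def get_series(conn, site_code, variable_code):
    """Return (LocalDateTime, DataValue) rows for one site and variable, in time order."""
    sql = """
        SELECT dv.LocalDateTime, dv.DataValue
        FROM DataValues AS dv
        JOIN Sites AS s ON s.SiteID = dv.SiteID
        JOIN Variables AS v ON v.VariableID = dv.VariableID
        WHERE s.SiteCode = ? AND v.VariableCode = ?
        ORDER BY dv.LocalDateTime
    """
    return conn.execute(sql, (site_code, variable_code)).fetchall()


# Demo with an in-memory database and made-up values
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO Sites VALUES (1, 'SITE_A')")
conn.executemany("INSERT INTO Variables VALUES (?, ?)", [(1, "rain"), (2, "tide")])
conn.executemany(
    "INSERT INTO DataValues (DataValue, LocalDateTime, SiteID, VariableID) VALUES (?, ?, ?, ?)",
    [(0.2, "2016-01-01 01:00", 1, 1), (0.0, "2016-01-01 00:00", 1, 1), (1.1, "2016-01-01 00:00", 1, 2)],
)
print(get_series(conn, "SITE_A", "rain"))  # [('2016-01-01 00:00', 0.0), ('2016-01-01 01:00', 0.2)]
```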
Created: July 24, 2017, 4:03 p.m.
Authors: Jeff Sadler
ABSTRACT:
This resource aggregates several resources related to street flood severity modeling in Norfolk, Virginia USA. The resources include raw and pre-processed data, scripts used to perform the pre-processing, scripts used to train data-driven algorithms, and results from the models. The models used crowd-sourced street flood reports as target values and environmental data as input values. The resources in this aggregate resource are used to generate the results for this Journal of Hydrology paper: https://doi.org/10.1016/j.jhydrol.2018.01.044.
A diagram showing how these resources relate is shown in the "Resource workflow diagram for street flood severity modeling in Norfolk, VA 2010-2016" resource.
ABSTRACT:
SUMMA (Clark et al., 2015a;b;c) is a hydrologic modeling framework that can be used for the systematic analysis of alternative model conceptualizations with respect to flux parameterizations, spatial configurations, and numerical solution techniques. It can be used to configure a wide range of hydrological model alternatives, and we anticipate that systematic model analysis will help researchers and practitioners understand reasons for inter-model differences in model behavior. When applied across a large sample of catchments, SUMMA may provide insights into the dominance of different physical processes and regional variability in the suitability of different modeling approaches. An important application of SUMMA is selecting specific physics options to reproduce the behavior of existing models. These applications of "model mimicry" can be used to define reference (benchmark) cases in structured model comparison experiments, and can help diagnose weaknesses of individual models in different hydroclimatic regimes.
SUMMA is built on a common set of conservation equations and a common numerical solver, which together constitute the “structural core” of the model. Different modeling approaches can then be implemented within the structural core, enabling a controlled and systematic analysis of alternative modeling options, and providing insight for future model development.
The important modeling features are:
The formulation of the conservation model equations is cleanly separated from their numerical solution;
Different model representations of physical processes (in particular, different flux parameterizations) can be used within a common set of conservation equations; and
The physical processes can be organized in different spatial configurations, including model elements of different shape and connectivity (e.g., nested multi-scale grids and HRUs).
ABSTRACT:
Simulations from Celia, 1990 (Water Resources Research)
Created: Dec. 21, 2017, 5:12 p.m.
Authors: Jeff Sadler
ABSTRACT:
This is tabular output data from two data-driven models used to predict flood severity, Poisson regression and Random Forest regression. Both outputs from the training and testing phases of the modeling are included in the resource. Additionally, results indicating the relative importance of each predictor variable in the Random Forest model are provided in the "rf_impo_out.csv" file. This work is described in the following paper published in the Journal of Hydrology: https://doi.org/10.1016/j.jhydrol.2018.01.044.
Created: Dec. 21, 2017, 5:14 p.m.
Authors: Jeff Sadler
ABSTRACT:
This is tabular input data originally used in two data-driven models (Poisson regression and Random Forest) for predicting flood severity. The inputs to the model (or predictor variables) are environmental conditions such as cumulative rainfall, high and low tides, etc. The output (or target variable) of the model is the number of flood reports per storm event. These data were used in the work described in the following paper published in the Journal of Hydrology: https://doi.org/10.1016/j.jhydrol.2018.01.044.
Created: Dec. 21, 2017, 5:16 p.m.
Authors: Jeff Sadler
ABSTRACT:
This is a script written in the R programming language. The script is used to train and apply two data-driven models, Random Forest and Poisson regression. The target variable is the number of flood reports per storm event in Norfolk, VA USA. The input variables for the models are environmental conditions on an event time scale (or daily if no flood reports were made for an event). This script was used to produce results published in a paper in the Journal of Hydrology: https://doi.org/10.1016/j.jhydrol.2018.01.044.
---
Original run configurations:
R version = 3.3.3
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Packages used:
'randomForest' (version 4.6-12)
'caret' (version 6.0-73)
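The original analysis is written in R, and the sketch below is not a translation of that script. It is a minimal, hypothetical pure-Python illustration of what Poisson regression does. It models the expected count (for example, flood reports per event) as exp(b0 + b1*x) and fits the coefficients by Newton-Raphson on the Poisson log-likelihood. It handles a single predictor only, and the data are synthetic.

```python
import math


def fit_poisson_1d(x, y, max_iter=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood.

    Single-predictor illustration only; y must be non-negative with a positive mean.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only model
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: gradient of the log-likelihood with respect to (b0, b1)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix X^T diag(mu) X (negative Hessian for the log link)
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H d = g
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Synthetic check: expected counts that follow exp(0.5 + 0.3 * x) exactly are recovered
x = list(range(10))
y = [math.exp(0.5 + 0.3 * xi) for xi in x]
print(fit_poisson_1d(x, y))
```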
Created: Jan. 2, 2018, 9:20 p.m.
Authors: Jeff Sadler
ABSTRACT:
Street flooding reports made mostly by City of Norfolk staff from 2010 to 2016. The coordinate system used for the X and Y coordinates is "Virginia state plane, south zone, feet (NAD83)." These data were processed and used as target values for data-driven modeling of street flood severity. This modeling is described in this Journal of Hydrology paper: https://doi.org/10.1016/j.jhydrol.2018.01.044.
ABSTRACT:
Script and accompanying notebook written in Python 2.7 for processing street flood reports made by City of Norfolk staff. The output data from this script were used as target values for data-driven modeling of street flood severity. This modeling is described in this Journal of Hydrology paper: https://doi.org/10.1016/j.jhydrol.2018.01.044.
Created: Jan. 2, 2018, 9:24 p.m.
Authors: Jeff Sadler
ABSTRACT:
Processed street flooding data from street flood reports made by City of Norfolk, VA staff from 2010 to 2016. These data were used as target values for data-driven modeling of street flood severity. This modeling is described in this Journal of Hydrology paper: https://doi.org/10.1016/j.jhydrol.2018.01.044.
Created: Jan. 2, 2018, 9:33 p.m.
Authors: Jeff Sadler
ABSTRACT:
Script and accompanying notebook written in Python 2.7 for combining flood report data (output) and environmental data (input) into a format suitable for a data-driven model. The combined data were used for data-driven modeling of street flood severity in Norfolk, VA, 2010-2016. This modeling is described in this Journal of Hydrology paper: https://doi.org/10.1016/j.jhydrol.2018.01.044.
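The combining step can be sketched as a join on date. The sketch counts the flood reports on each day and attaches that count to the day's environmental predictors, using zero when no report was made. It is a simplified, hypothetical illustration in Python 3, whereas the original scripts use Python 2.7. It works at a daily rather than an event time scale, and the function and field names are invented.

```python
from collections import Counter


def combine(report_dates, daily_env):
    """Join per-day flood report counts (target) onto daily environmental predictors (inputs).

    report_dates: iterable of 'YYYY-MM-DD' strings, one entry per flood report
    daily_env: dict mapping 'YYYY-MM-DD' -> dict of predictor values
    Days without reports get a count of 0; reports on days missing from daily_env are dropped.
    """
    counts = Counter(report_dates)
    rows = []
    for day in sorted(daily_env):
        row = dict(daily_env[day])  # copy so the input dict is not modified
        row["date"] = day
        row["num_flood_reports"] = counts.get(day, 0)
        rows.append(row)
    return rows


# Hypothetical usage with made-up predictors
env = {"2016-01-01": {"rain_sum": 10.0}, "2016-01-02": {"rain_sum": 0.0}}
for row in combine(["2016-01-01", "2016-01-01", "2016-01-03"], env):
    print(row)
```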
Created: Jan. 2, 2018, 9:37 p.m.
Authors: Jeff Sadler
ABSTRACT:
Daily observation data for rainfall, wind, tide, and water table level. These variables are more fully defined in the raw source data. These data are used as input for data-driven prediction of street flood severity in Norfolk, VA, 2010-2016. This modeling is described in this Journal of Hydrology paper: https://doi.org/10.1016/j.jhydrol.2018.01.044.
Created: Jan. 2, 2018, 9:47 p.m.
Authors: Jeff Sadler
ABSTRACT:
Script and accompanying IPython notebook written in Python 2.7 for aggregating sub-daily environmental data (rainfall, tide, wind, groundwater) to a daily time scale. The input data are from Norfolk, Virginia. Several aggregation methods are used, including averages and maximums. The processed/aggregated data are combined with street flood report data to be used in data-driven, predictive modeling. The script in this resource was used in the analysis described in this Journal of Hydrology paper: https://doi.org/10.1016/j.jhydrol.2018.01.044.
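Below is a minimal sketch of this kind of aggregation using only the Python 3 standard library. Sub-daily (timestamp, value) pairs are grouped by calendar day and summarized by mean and maximum. A daily sum is also included as one way to obtain cumulative rainfall. The function name and the choice of summaries are assumptions for illustration, and the original script's methods and variable handling may differ.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def aggregate_daily(records):
    """Aggregate (datetime, value) pairs to daily mean, max, and sum, keyed by date."""
    by_day = defaultdict(list)
    for timestamp, value in records:
        by_day[timestamp.date()].append(value)
    return {
        day: {"mean": mean(values), "max": max(values), "sum": sum(values)}
        for day, values in sorted(by_day.items())
    }


# Hypothetical sub-daily tide readings (made-up values)
records = [
    (datetime(2016, 1, 1, 0), 1.0),
    (datetime(2016, 1, 1, 12), 3.0),
    (datetime(2016, 1, 2, 6), 5.0),
]
for day, stats in aggregate_daily(records).items():
    print(day, stats)
```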
Created: Feb. 7, 2018, 7:40 p.m.
Authors: Jeff Sadler
ABSTRACT:
Diagram depicting the relationship between 10 different HydroShare resources used to produce results for data-driven street flood severity modeling done for Norfolk, VA for 2010-2016. The analysis is described in this Journal of Hydrology paper: https://doi.org/10.1016/j.jhydrol.2018.01.044.
ABSTRACT:
This resource contains the sciunit package for reproducing the total ET for the Ball-Berry stomatal resistance method from Clark et al. (2015).
Created: Aug. 8, 2018, 2:52 p.m.
Authors: YOUNGDON CHOI · Jonathan Goodall · Jeff Sadler · Andrew Bennett · Bart Nijssen · Anthony Michael Castronova · Martyn Clark · David Tarboton
ABSTRACT:
This presentation was given at the iEMSs conference in Fort Collins, CO in June 2018. http://iemss2018.engr.colostate.edu/
The Structure for Unifying Multiple Modeling Alternatives (SUMMA) is a hydrologic modeling framework that allows hydrologists to systematically test alternative model conceptualizations. The objective of this project is to create a Python library for wrapping the SUMMA modeling framework called pySUMMA. Using this library, hydrologists can create Python scripts that document the alternative model conceptualizations tested within different experiments. To this end, pySUMMA provides an object-oriented means for updating SUMMA model configurations, executing SUMMA model runs, and visualizing SUMMA model outputs. This work is part of the HydroShare web-based hydrologic information system operated by the Consortium of Universities for the Advancement of Hydrologic Science Inc. (CUAHSI) that seeks to make hydrologic data and models discoverable and shareable online. Creating pySUMMA is a first step toward the longer-term goal of creating an interactive SUMMA-based modeling system by combining HydroShare.org, JupyterHub, and High Performance Computing (HPC) resources. In the current version of HydroShare, different data and model resources can be uploaded, shared, and published. This current development will result in a tighter integration between the SUMMA modeling process and HydroShare.org with the goal of making hydrologic models more open, reusable, and reproducible. Ultimately, SUMMA serves as a use case for modeling in HydroShare that advances a general approach for leveraging JupyterHub and HPC that can be repeated for other modeling systems.
Created: Aug. 8, 2018, 6:20 p.m.
Authors: Bakinam Tarik Essawy · Jonathan Goodall · Daniel Voce · W Zell · Mohamed Morsy · Jeff Sadler · Zhihao Yuan · Tanu Malik
ABSTRACT:
This presentation was given at the iEMSs conference held in Fort Collins, CO in June 2018. http://iemss2018.engr.colostate.edu/
Reproducibility of computational workflows is an important challenge that calls for open and reusable code and data, well-documented workflows, and controlled environments that allow others to verify published findings. HydroShare (http://www.hydroshare.org) and GeoTrust (http://geotrusthub.org/), two new cyberinfrastructure tools under active development, can be used to improve reproducibility in computational hydrology. HydroShare is a web-based system for sharing hydrologic data and model resources. HydroShare allows hydrologists to upload model input data resources, add detailed hydrologic-specific metadata to these resources, and use the data directly within HydroShare for collaborative modeling using tools like JupyterHub. GeoTrust provides tools for scientists to efficiently reproduce, track and share geoscience applications by building ‘sciunits,’ which are efficient, lightweight, self-contained packages of computational experiments that can be guaranteed to repeat or reproduce regardless of deployment challenges. We will present a use case example focusing on a workflow that uses the MODFLOW model to demonstrate how HydroShare and GeoTrust can be integrated to easily and efficiently reproduce computational workflows. This use case example automates pre-processing of model inputs, model execution, and post-processing of model output. This work demonstrates how the integration of HydroShare and GeoTrust ensures the logical and physical preservation of computational workflows. It also shows that reproducibility can be achieved by replicating the original sciunit, modifying it to produce a new sciunit, and, finally, preserving and sharing the newly created sciunit by using HydroShare's JupyterHub.
ABSTRACT:
This resource demonstrates the steps to package the workflow analysis using the Sciunit tool.
These steps are:
1) create a new sciunit “MyAnalysis.” This will create a virtual directory, which will include the captured execution of the computational workflow with all the dependencies and provenance metadata associated with it;
2) open the “MyAnalysis” sciunit to begin working in the desired sciunit;
3) execute the code required to be packaged as a virtual environment in order to repeat the analysis;
4) place the packaged sciunit on HydroShare as a digital resource; and
5) test the runnability of the package by executing the sciunit on the CUAHSI HydroShare JupyterHub app linked to HydroShare and configured to open and execute scripts acting on content from Resources in HydroShare (Note: Running a sciunit again requires the Sciunit tool, which is installed on CUAHSI HydroShare JupyterHub).
This resource contains the sciunit package for reproducing the total ET for the Ball-Berry and Jarvis stomatal resistance methods from Clark et al. (2015).
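The packaging steps listed above can be sketched as sciunit command-line calls driven from Python. The commands shown (`sciunit create`, `open`, `exec`, `repeat`) reflect the sciunit CLI as we understand it. `analysis.py` is a hypothetical script name, and `e1` assumes that the first captured execution receives that ID. Step 4 (uploading the package to HydroShare) is not shown. By default the sketch only prints the commands. It runs them only when `dry_run=False` and `sciunit` is on the PATH.

```python
import shutil
import subprocess


def sciunit_commands(project="MyAnalysis", script="analysis.py"):
    """Return sciunit CLI calls for steps 1-3 and the repeat check of step 5 (hypothetical names)."""
    return [
        ["sciunit", "create", project],         # step 1: create the sciunit
        ["sciunit", "open", project],           # step 2: make it the active sciunit
        ["sciunit", "exec", "python", script],  # step 3: capture the execution and its dependencies
        ["sciunit", "repeat", "e1"],            # step 5: re-run the first captured execution
    ]


def run(commands, dry_run=True):
    """Print the commands; execute them only if dry_run is False and sciunit is installed."""
    if dry_run or shutil.which("sciunit") is None:
        for cmd in commands:
            print(" ".join(cmd))
        return
    for cmd in commands:
        subprocess.run(cmd, check=True)  # stop at the first failing step


run(sciunit_commands())
```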
Created: Nov. 4, 2020, 8:57 p.m.
Authors: Naoki Mizukami · Wood, Andrew
ABSTRACT:
This resource was created using CAMELS (https://ral.ucar.edu/solutions/products/camels) `TIME SERIES NLDAS forced model output` from 1980 to 2018.
The original NLDAS (North American Land Data Assimilation System) hourly forcing data were created by NOAA on a 0.125° × 0.125° grid.
In creating the CAMELS dataset, the hourly forcing data were re-aggregated to 671 basins in the USA.
In this study, we merged all CAMELS forcing data into one NetCDF file to take advantage of OPeNDAP (http://hyrax.hydroshare.org/opendap/hyrax/) in HydroShare.
Currently, using the SUMMA CAMELS notebooks (https://www.hydroshare.org/resource/ac54c804641b40e2b33c746336a7517e/), users can extract forcing data to run SUMMA for any of the 671 basins in the CAMELS dataset.
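An OPeNDAP request can pull one basin's forcing without downloading the whole merged file. To do this, it carries a DAP2 constraint expression of the form `var[start:stride:stop]`, where the stop index is inclusive. The helper below only builds such a URL. It is a hypothetical sketch that assumes each variable is dimensioned (time, hru). The base URL, the variable names (`pptrate`, `airtemp`), and the HRU index in the usage example are placeholders rather than values confirmed for this resource.

```python
def opendap_subset_url(base_url, variables, hru_index, t_start, t_stop, t_stride=1):
    """Build a DAP2 constraint selecting one HRU over a time range for each variable.

    Assumes every variable is dimensioned (time, hru); DAP2 hyperslabs use
    [start:stride:stop] with an inclusive stop index.
    """
    slab = f"[{t_start}:{t_stride}:{t_stop}][{hru_index}:1:{hru_index}]"
    constraint = ",".join(f"{name}{slab}" for name in variables)
    return f"{base_url}?{constraint}"


# Placeholder path: the merged file's actual location on the Hyrax server is not given here
url = opendap_subset_url(
    "http://hyrax.hydroshare.org/opendap/hyrax/<path-to-merged-file>.nc",
    ["pptrate", "airtemp"],
    hru_index=0,
    t_start=0,
    t_stop=23,
)
print(url)
```

Mapping a CAMELS basin ID to its position along the `hru` dimension would require reading the file's basin identifier variable first. That step is not shown.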
Created: Dec. 21, 2020, 8:45 a.m.
Authors: Tarboton, David
ABSTRACT:
This is an example of Geoscience Use Case 4: Height Above the Nearest Drainage (HAND) of "Improving Reproducibility of Geoscience Models with Sciunit" in the Geological Society of America publication. In this resource, there are two notebooks: 1) HANDWorkFlow.ipynb and 2) HAND_Sciunit.ipynb.
Using these two notebooks, we demonstrate the capabilities of Sciunit to encapsulate the HAND TauDEM workflow and create a Sciunit Container. We also evaluate differences in HAND due to changing the contributing area threshold used to map the drainage network. During computation of the drainage network, a minimum contributing area threshold is used to identify where channels begin. With a lower threshold value, the density of the resulting drainage network increases. Scientists running this experiment might be interested in how the threshold affects the execution and results of the HAND model.
The first notebook demonstrates the general procedure to calculate HAND (Height above the Nearest Drainage) using TauDEM (https://hydrology.usu.edu/taudem/taudem5/).
Then, using the second notebook, we demonstrate how to create a Sciunit container for the HAND workflow and compare two Sciunit containers (thresholds of 5000 vs. 50000) using the `diff` command.
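The effect of the threshold can be seen on a toy grid. A cell is treated as part of the drainage network when its contributing area (counted here in cells) is at least the threshold. Lowering the threshold from 50000 to 5000 can therefore only add channel cells. This is a hypothetical pure-Python illustration of stream definition by threshold, not TauDEM code, and the small grid of contributing-area values is made up.

```python
def channel_mask(contributing_area, threshold):
    """Mark cells whose contributing area (in cells) is at least the threshold as channel cells."""
    return [[area >= threshold for area in row] for row in contributing_area]


def channel_fraction(contributing_area, threshold):
    """Fraction of grid cells classified as channel, a crude proxy for drainage density."""
    mask = channel_mask(contributing_area, threshold)
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


# Made-up contributing-area grid (cells draining into each cell)
area = [
    [1, 2, 6000],
    [3, 60000, 8000],
    [1, 4, 70000],
]
for threshold in (5000, 50000):
    print(threshold, channel_fraction(area, threshold))
```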
Created: Jan. 2, 2021, 2:59 p.m.
Authors: Choi, Young-Don
ABSTRACT:
These are example application notebooks to simulate SUMMA using CAMELS datasets.
There are three steps: (STEP-1) Create SUMMA input, (STEP-2) Execute SUMMA, (STEP-3) Visualize SUMMA output
Based on this example, users can change the HRU ID and simulation period to analyze any of the 671 basins in the CAMELS dataset.
(STEP-1) A_1_camels_make_input.ipynb
- The first notebook creates SUMMA input from the CAMELS dataset using `summa_camels_hydroshare.zip` in this resource and OPeNDAP (https://www.hydroshare.org/resource/a28685d2dd584fe5885fc368cb76ff2a/).
(STEP-2) B_1_camels_pysumma_default_prob.ipynb, B_2_camels_pysumma_lhs_prob.ipynb, B_3_camels_pysumma_config_prob.ipynb, and B_4_camels_pysumma_lhs_config_prob.ipynb
- These four notebooks execute SUMMA with four different combinations of parameters and parameterizations.
(STEP-3) C_1_camels_analyze_output_default_prob.ipynb, C_2_camels_analyze_output_lhs_prob.ipynb, C_3_camels_analyze_output_config_prob.ipynb, and C_4_camels_analyze_output_lhs_config_prob.ipynb
- The final four notebooks visualize the SUMMA output of the B_1, B_2, B_3, and B_4 notebooks.
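The `lhs` in some notebook names presumably refers to Latin hypercube sampling of parameters. Assuming so, the sketch below shows the basic technique. Each parameter range is split into `n_samples` equal-width strata, one value is drawn from each stratum, and the strata are shuffled independently for each parameter. The parameter names and ranges are hypothetical, and the notebooks may sample differently (for example, through another library).

```python
import random


def latin_hypercube(param_ranges, n_samples, seed=None):
    """Draw a Latin hypercube sample as a list of {parameter: value} dicts.

    Each range is split into n_samples equal-width strata; exactly one value is
    drawn uniformly within each stratum, and the strata order is shuffled
    independently for every parameter.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in param_ranges.items():
        width = (high - low) / n_samples
        values = [low + (i + rng.random()) * width for i in range(n_samples)]
        rng.shuffle(values)
        columns[name] = values
    return [{name: columns[name][i] for name in param_ranges} for i in range(n_samples)]


# Hypothetical parameter names and ranges
ranges = {"param_a": (0.0, 1.0), "param_b": (10.0, 20.0)}
for sample in latin_hypercube(ranges, 5, seed=42):
    print(sample)
```

Unlike independent random draws, this design guarantees that every part of each parameter's range is sampled once, even with few model runs.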
Created: Feb. 22, 2021, 7:34 p.m.
Authors: Choi, Young-Don
ABSTRACT:
This notebook supports general SUMMA application workflows using CAMELS forcing, watershed attributes, and streamflow observations.
Because the CAMELS dataset covers 671 basins across the USA, users can apply SUMMA to any of these 671 basins.
Created: March 4, 2021, 5:49 p.m.
Authors: Choi, Young-Don · Van Beusekom, Ashley · Li, Zhiyu (Drew) · Nijssen, Bart · Hay, Lauren · Bennett, Andrew · Tarboton, David · Maghami, Iman · Goodall, Jonathan · Clark, Martyn P.
ABSTRACT:
This resource, configured for execution in connected JupyterHub compute platforms, helps modelers reproduce and build on the results from the paper (Van Beusekom et al., 2021). For this purpose, three Jupyter notebooks are included in this resource. They explore the paper's goal for one example CAMELS site and a preselected 18-month simulation period to demonstrate the capabilities of the notebooks. The first notebook processes the raw input data from the CAMELS dataset for use as input to the SUMMA model. The second notebook executes the SUMMA model with the input data from the first notebook, using original and altered forcing as described further in the notebook. Finally, the third notebook uses the outputs from the second notebook and visualizes the sensitivity of SUMMA model outputs using the Kling-Gupta Efficiency (KGE). More information about each Jupyter notebook, with step-by-step instructions on how to run the notebooks, can be found in the Readme.md file included in this resource. Using these three notebooks, modelers can apply the methodology mentioned above to any (one to all) of the 671 CAMELS basins and to simulation periods of their choice.
Created: March 21, 2021, 4:11 a.m.
Authors: Maghami, Iman · Goodall, Jonathan · Victor A. L. Sobral · Morsy, Mohamed · John C. Lach
ABSTRACT:
The goal of this resource is to estimate the fraction of stream length in the contiguous United States covered by dense tree canopy, as described in greater detail in the research paper Maghami et al. (2021). For more information about this resource and the steps to reproduce this geospatial analysis, please refer to the readme file.
Created: April 6, 2021, 3:10 a.m.
Authors: Choi, Young-Don · Maghami, Iman · Van Beusekom, Ashley · Li, Zhiyu/Drew · Nijssen, Bart · Hay, Lauren · Bennett, Andrew · Tarboton, David · Goodall, Jonathan · Clark, Martyn P. · Wang, Shaowen
ABSTRACT:
The overall goal of this collection is to use the basic strategy and architecture presented by Choi et al. (2021) to make components of a modern and complex hydrologic modeling study (VB study; Van Beusekom et al., 2022) easier to reproduce. The design and implementation of the cyberinfrastructure developed to achieve this goal are fully explained by Maghami et al. (2023).
The VB study investigates hydrological outputs from the SUMMA model for the 671 CAMELS catchments across the contiguous United States (CONUS) over a 60-month actual simulation period. The aim is to understand how these outputs depend on input forcing behavior across CONUS. The VB study lays out a simple methodology that can be applied to understand the relative importance of seven model forcings (precipitation rate, air temperature, longwave radiation, specific humidity, shortwave radiation, wind speed, and air pressure).
Choi et al. (2021) integrated three components through seamless data transfers for reproducible research: (1) online data and model repositories; (2) computational environments leveraging containerization and self-documented computational notebooks; and (3) Application Programming Interfaces (APIs) that provide programmatic control of complex computational models.
Following this approach, Maghami et al. (2023) integrated the following three components through seamless data transfers to make components of a modern and complex hydrologic study (VB study) easier to reproduce:
(1) HydroShare as the online data and model repository;
(2) CyberGIS-Jupyter for Water as the computational environment for self-documented computational notebooks (with and without HPC);
(3) pySUMMA as the Application Programming Interface (API) that provides programmatic control of complex computational models.
This collection includes three resources:
1- The first resource provides the entire NLDAS forcing dataset used in the VB study.
2- The second resource provides an end-to-end workflow of CAMELS basin modeling with SUMMA for the paper simulations, configured for execution in connected JupyterHub compute platforms. This resource is well suited for smaller-scale exploration. It is preconfigured to explore one example CAMELS site and a 60-month actual simulation period to demonstrate the capabilities of the notebooks. Users can still change the CAMELS site, the number of sites explored, or the simulation period. To quickly assess the capabilities of the notebooks in this resource, we recommend running an actual simulation period as short as 12 months.
3- The third resource, by contrast, uses HPC (High-Performance Computing) through the CyberGIS-Compute Service. HPC runs simulations at high speed. This makes larger simulations practical and much faster than with the second resource, even simulations as large as all 671 CAMELS sites over the whole 60-month actual simulation period used in the VB study. This resource is preconfigured to explore four example CAMELS sites and a 60-month actual simulation period, only to demonstrate the capabilities of the notebooks. Users can still change the CAMELS sites, the number of sites explored, or the simulation period.
Greater details can be found in each resource.
Created: April 7, 2021, 4:54 a.m.
Authors: Choi, Young-Don
ABSTRACT:
This HydroShare resource is an example to demonstrate the vPICO presentations in EGU General Assembly 2021 (https://meetingorganizer.copernicus.org/EGU21/session/40092#vPICO_presentations).
- Session: EOS5.3 session - The evolving open-science landscape in geosciences: open data, software, publications, and community initiatives
- Title: An Approach for Open and Reproducible Hydrological Modeling using Sciunit and HydroShare
Using this notebook, you can test how to create an immutable and interoperable Sciunit Container for open and reproducible hydrological modeling.
You can start with the "NB_01_An_Approach_for_Open_and_Reproducible_Hydrological_Modeling_using_Sciunit_and_HydroShare.ipynb" notebook in "CyberGIS-Jupyter for Water" by clicking "Open with..." in the upper-right corner.
Created: April 10, 2021, 1:01 a.m.
Authors: Choi, Young-Don
ABSTRACT:
This HydroShare resource was created to share large extent spatial (LES) datasets in Maryland on GeoServer (https://geoserver.hydroshare.org/geoserver/web/wicket/bookmarkable/org.geoserver.web.demo.MapPreviewPage) and THREDDS (https://thredds.hydroshare.org/thredds/catalog/hydroshare/resources/catalog.html).
Users can access the uploaded LES datasets on HydroShare-GeoServer and THREDDS using this HS resource id. This resource was created using HS 2.
Then, through the RHESSys workflows, users can subset LES datasets using OWSLib and xarray.
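Subsetting a large-extent raster to a study watershed usually reduces to finding the index ranges of the grid coordinates that fall inside a bounding box. With xarray, this is typically written as `ds.sel(lat=slice(south, north), lon=slice(west, east))` when the dimensions are named `lat` and `lon` and sorted ascending. The dependency-free sketch below shows that index logic for a regular grid with ascending coordinates. It is an illustration under those assumptions, not the RHESSys workflow code, and the grid values are made up.

```python
from bisect import bisect_left, bisect_right


def bbox_index_slices(lats, lons, south, north, west, east):
    """Return (lat_slice, lon_slice) covering a bounding box, bounds inclusive.

    Assumes both coordinate lists are sorted ascending; rasters that store
    latitude from north to south must be reversed first.
    """
    lat_slice = slice(bisect_left(lats, south), bisect_right(lats, north))
    lon_slice = slice(bisect_left(lons, west), bisect_right(lons, east))
    return lat_slice, lon_slice


def subset(grid, lats, lons, bbox):
    """Cut a 2-D grid (rows = latitude, columns = longitude) to bbox = (south, north, west, east)."""
    lat_slice, lon_slice = bbox_index_slices(lats, lons, *bbox)
    return [row[lon_slice] for row in grid[lat_slice]]


# Made-up 4 x 3 grid whose cell values encode their (row, column) position
lats = [38.0, 38.5, 39.0, 39.5]
lons = [-77.5, -77.0, -76.5]
grid = [[i * 10 + j for j in range(len(lons))] for i in range(len(lats))]
print(subset(grid, lats, lons, (38.4, 39.0, -77.1, -76.5)))  # [[11, 12], [21, 22]]
```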
Created: April 25, 2021, 12:26 a.m.
Authors: Choi, Young-Don
ABSTRACT:
This HydroShare resource was created to share large extent spatial (LES) datasets in Virginia on GeoServer (https://geoserver.hydroshare.org/geoserver/web/wicket/bookmarkable/org.geoserver.web.demo.MapPreviewPage) and THREDDS (https://thredds.hydroshare.org/thredds/catalog/hydroshare/resources/catalog.html).
Users can access the uploaded LES datasets on HydroShare-GeoServer and THREDDS using this HS resource id. This resource was created using HS 2.
Then, through the RHESSys workflows, users can subset LES datasets using OWSLib and xarray.
Created: April 25, 2021, 12:27 a.m.
Authors: Choi, Young-Don
ABSTRACT:
This HydroShare resource was created to share large extent spatial (LES) datasets in North Carolina on GeoServer (https://geoserver.hydroshare.org/geoserver/web/wicket/bookmarkable/org.geoserver.web.demo.MapPreviewPage) and THREDDS (https://thredds.hydroshare.org/thredds/catalog/hydroshare/resources/catalog.html).
Users can access the uploaded LES datasets on HydroShare-GeoServer and THREDDS using this HS resource id. This resource was created using HS 2.
Then, through the RHESSys workflows, users can subset LES datasets using OWSLib and xarray.
Created: April 29, 2021, 5:10 p.m.
Authors: Choi, Young-Don · Goodall, Jonathan · Maghami, Iman · Ahmad, Raza · Malik, Tanu · Band, Lawrence · Li, Zhiyu/Drew · Wang, Shaowen · Tarboton, David
ABSTRACT:
This HydroShare resource provides the Jupyter Notebooks created for the study "An Approach for Creating Immutable and Interoperable End-to-End Hydrological Modeling Computational Workflows." The study was led by Young-Don Choi and submitted to the Notebook Sessions of the 2021 EarthCube Annual Meeting.
For instructions on how to run the Jupyter Notebooks, please refer to the README file provided in this resource.
For completeness, the abstract of the study submitted to the EarthCube session is reproduced below:
"Reproducibility is a fundamental requirement to advance science. Creating reproducible hydrological models that include all required data, software, and workflows, however, is often burdensome and requires significant work. Computational hydrology is a rapidly advancing field with fast-evolving technologies to support increasingly complex computational hydrologic modeling. The growing model complexity in terms of variety of software and cyberinfrastructure capabilities makes achieving computational reproducibility extremely challenging. Through recent reproducibility research, there have been efforts to integrate three components: 1) (meta)data, 2) computational environments, and 3) workflows. However, each component is still separate, and researchers must interoperate between these three components. These separations make verifying end-to-end reproducibility challenging. Sciunit was developed to assist scientists, who are not programming experts, with encapsulating these three components into a container to enable reproducibility in an immutable form. However, there were still limitations to support interoperable computational environments and apply end-to-end solutions, which are an ultimate goal of reproducible hydrological modeling. Therefore, the objective of this research is to advance the existing Sciunit capabilities to not only support immutable, but also interoperable computational environments and apply an end-to-end modeling workflow using the Regional Hydro-Ecologic Simulation System (RHESSys) hydrologic model as an example. First, we create an end-to-end workflow for RHESSys using pyRHESSys on the CyberGIS-Jupyter for Water platform. Second, we encapsulate the aforementioned three components and create configurations that include lists of encapsulated dependencies using Sciunit. 
Third, we create two HydroShare resources, one for immutable reproducibility evaluation using Sciunit and the other for interoperable reproducibility evaluation using library configurations created by Sciunit. Finally, we evaluate the reproducibility of Sciunit in MyBinder, which is a different computational environment, using these two resources. This research presents a detailed example of a user-centric case study demonstrating the application of an open and interoperable containerization approach from a hydrologic modeler’s perspective."
Created: May 7, 2021, 11:04 p.m.
Authors: Choi, Young-Don
ABSTRACT:
ATTENTION: All 3 model instances are now presented in one resource as 3 model instance aggregations. This resource is kept only for archiving purposes.
This HydroShare resource provides raw spatial input data for executing RHESSys workflows at Coweeta Subbasin 18, North Carolina. To support assessment of the conventional data distribution approach, these spatial datasets were manually collected and shared at the file level through small files.
Created: May 7, 2021, 11:06 p.m.
Authors: Choi, Young-Don
ABSTRACT:
ATTENTION: All 3 model instances are now presented in one resource as 3 model instance aggregations. This resource is kept only for archiving purposes.
This HydroShare resource provides raw spatial input data for executing RHESSys workflows at Scotts Level Branch, Maryland. To support assessment of the conventional data distribution approach, these spatial datasets were manually collected and shared at the file level through small files.
Created: May 7, 2021, 11:07 p.m.
Authors: Choi, Young-Don
ABSTRACT:
ATTENTION: All 3 model instances are now presented in one resource as 3 model instance aggregations. This resource is kept only for archiving purposes.
This HydroShare resource provides raw spatial input data for executing RHESSys workflows at Spout Run, Virginia. To support assessment of the conventional data distribution approach, these spatial datasets were manually collected and shared at the file level through small files.
Created: May 13, 2021, 10:38 p.m.
Authors: Choi, Young-Don
ABSTRACT:
We implemented automated workflows using Jupyter notebooks for each state. The GIS processing, crucial for merging, extracting, and projecting GeoTIFF data, was performed using ArcPy, a Python package for geographic data analysis, conversion, and management within ArcGIS (Toms, 2015). After generating state-scale large extent spatial (LES) datasets in GeoTIFF format, we used the xarray and rioxarray Python packages to convert GeoTIFF to NetCDF. Xarray is a Python package for working with labeled multi-dimensional arrays, and rioxarray is the rasterio extension for xarray; rasterio is a Python library for reading and writing GeoTIFF and other raster formats. Xarray facilitated data manipulation and the addition of metadata to the NetCDF file, while rioxarray was used to save GeoTIFF as NetCDF. These procedures resulted in three composite HydroShare resources (HS 2, HS 3, and HS 4) for sharing state-scale LES datasets. Notably, due to licensing constraints with ArcGIS Pro, a commercial GIS software package, the Jupyter notebooks were developed on Windows OS.
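When a GeoTIFF is written to NetCDF, the raster's affine geotransform has to become explicit 1-D x and y coordinate variables; rioxarray handles this automatically. The minimal sketch below is not the authors' notebook code; it only illustrates the underlying arithmetic, assuming a north-up raster (no rotation terms) with a geotransform in GDAL order `(x_origin, pixel_width, 0, y_origin, 0, pixel_height)` and a negative `pixel_height`. The function name and sample numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def cell_center_coords(
    geotransform: Sequence[float], width: int, height: int
) -> Tuple[List[float], List[float]]:
    """Return 1-D x and y cell-center coordinates for a north-up raster.

    Assumes GDAL geotransform order:
    (x_origin, pixel_width, x_rotation, y_origin, y_rotation, pixel_height),
    where the origin is the upper-left corner of the upper-left cell and
    pixel_height is negative for north-up rasters.
    """
    x0, dx, rot_x, y0, rot_y, dy = geotransform
    if rot_x != 0 or rot_y != 0:
        raise ValueError("rotated rasters are not handled in this sketch")
    # Shift by half a cell so each coordinate marks the cell center,
    # the convention rioxarray uses for the x/y coordinates it creates.
    xs = [x0 + (col + 0.5) * dx for col in range(width)]
    ys = [y0 + (row + 0.5) * dy for row in range(height)]
    return xs, ys


if __name__ == "__main__":
    # Hypothetical 30 m DEM tile with its upper-left corner at (500000, 4000000).
    xs, ys = cell_center_coords((500000.0, 30.0, 0.0, 4000000.0, 0.0, -30.0), 3, 2)
    print(xs)
    print(ys)
```

Storing cell centers rather than corners keeps each coordinate value tied to the pixel it labels, which is what NetCDF consumers typically expect.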
Created: May 13, 2021, 10:40 p.m.
Authors: Choi, Young-Don
ABSTRACT:
ATTENTION: All 9 workflows for RHESSys modeling are now condensed into one resource. This resource is kept only for archiving purposes.
This HydroShare resource offers Jupyter Notebooks for the RHESSys modeling workflow, employing the conventional approach at Coweeta Subbasin 18, NC. For instructions on running the Jupyter Notebooks, please refer to the provided README file within this resource.
Created: May 13, 2021, 10:41 p.m.
Authors: Choi, Young-Don
ABSTRACT:
ATTENTION: All 9 workflows for RHESSys modeling are now condensed into one resource. This resource is kept only for archiving purposes.
This HydroShare resource offers Jupyter Notebooks for the RHESSys modeling workflow, employing the GeoServer approach at Coweeta Subbasin 18, NC. For instructions on running the Jupyter Notebooks, please refer to the provided README file within this resource.
Created: May 13, 2021, 10:41 p.m.
Authors: Choi, Young-Don
ABSTRACT:
ATTENTION: All 9 workflows for RHESSys modeling are now condensed into one resource. This resource is kept only for archiving purposes.
This HydroShare resource offers Jupyter Notebooks for the RHESSys modeling workflow, employing the THREDDS approach at Coweeta Subbasin 18, NC. For instructions on running the Jupyter Notebooks, please refer to the provided README file within this resource.
Created: May 13, 2021, 10:42 p.m.
Authors: Choi, Young-Don
ABSTRACT:
ATTENTION: All 9 workflows for RHESSys modeling are now condensed into one resource. This resource is kept only for archiving purposes.
This HydroShare resource offers Jupyter Notebooks for the RHESSys modeling workflow, employing the conventional approach at Scotts Level Branch, MD. For instructions on running the Jupyter Notebooks, please refer to the provided README file within this resource.
Created: May 13, 2021, 10:43 p.m.
Authors: Choi, Young-Don
ABSTRACT:
ATTENTION: All 9 workflows for RHESSys modeling are now condensed into one resource. This resource is kept only for archiving purposes.
This HydroShare resource offers Jupyter Notebooks for the RHESSys modeling workflow, employing the GeoServer approach at Scotts Level Branch, MD. For instructions on running the Jupyter Notebooks, please refer to the provided README file within this resource.
Created: May 13, 2021, 10:43 p.m.
Authors: Choi, Young-Don
ABSTRACT:
ATTENTION: All 9 workflows for RHESSys modeling are now condensed into one resource. This resource is kept only for archiving purposes.
This HydroShare resource offers Jupyter Notebooks for the RHESSys modeling workflow, employing the THREDDS approach at Scotts Level Branch, MD. For instructions on running the Jupyter Notebooks, please refer to the provided README file within this resource.
Created: May 13, 2021, 10:47 p.m.
Authors: Choi, Young-Don
ABSTRACT:
ATTENTION: All 9 workflows for RHESSys modeling are now condensed into one resource. This resource is kept only for archiving purposes.
This HydroShare resource offers Jupyter Notebooks for the RHESSys modeling workflow, employing the conventional approach at Spout Run, VA. For instructions on running the Jupyter Notebooks, please refer to the provided README file within this resource.
Created: May 13, 2021, 10:51 p.m.
Authors: Choi, Young-Don
ABSTRACT:
ATTENTION: All 9 workflows for RHESSys modeling are now condensed into one resource. This resource is kept only for archiving purposes.
This HydroShare resource offers Jupyter Notebooks for the RHESSys modeling workflow, employing the THREDDS approach at Spout Run, VA. For instructions on running the Jupyter Notebooks, please refer to the provided README file within this resource.
Created: May 13, 2021, 10:52 p.m.
Authors: Choi, Young-Don
ABSTRACT:
This HydroShare resource assesses data consistency among two server-side methods (GeoServer and THREDDS Data Server) and the conventional data distribution approach (manually collecting and sharing data at the file level). The evaluation spans three watersheds of different sizes: Coweeta Subbasin 18, Scotts Level Branch, and Spout Run, with DEM resolutions of 10, 30, and 60 m, respectively. The workflows for the resulting nine case studies, derived from combining the three methods with the three watersheds, are presented in one HydroShare resource (HS 5) and yield a total of nine RHESSys daily streamflow output files.
Within this resource, we include these nine output files and provide three Jupyter notebooks for conducting the evaluations. Each notebook is dedicated to one watershed and compares the three methods, facilitating a comprehensive analysis of data consistency.
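One plausible way to quantify consistency between two daily streamflow outputs for the same watershed (for example, GeoServer or THREDDS output versus the conventional output) is sketched below. This is an assumption, not the evaluation notebooks' code: it presumes the two series are already aligned day by day in the same units, and the metric choice (maximum absolute difference, RMSE, percent bias) and sample values are illustrative only.

```python
import math
from typing import Dict, Sequence


def consistency_metrics(
    reference: Sequence[float], candidate: Sequence[float]
) -> Dict[str, float]:
    """Compare a candidate streamflow series against a reference series.

    Both series are assumed to be aligned day by day (same dates, same units).
    Positive percent bias means the candidate total exceeds the reference total.
    """
    if not reference or len(reference) != len(candidate):
        raise ValueError("series must be non-empty and of equal length")
    total_ref = sum(reference)
    if total_ref == 0:
        raise ValueError("reference series sums to zero; percent bias is undefined")
    diffs = [c - r for r, c in zip(reference, candidate)]
    return {
        "max_abs_diff": max(abs(d) for d in diffs),
        "rmse": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        "pbias_percent": 100.0 * sum(diffs) / total_ref,
    }


if __name__ == "__main__":
    # Hypothetical daily streamflow values, not results from this study.
    conventional = [1.0, 2.0, 3.0, 4.0]
    thredds = [1.0, 2.0, 3.0, 5.0]
    print(consistency_metrics(conventional, thredds))
```

If the three approaches deliver identical inputs, every metric should be exactly zero; any nonzero value points to a difference in the data served by one of the approaches.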
Created: May 14, 2021, 2:59 a.m.
Authors: Choi, Young-Don · Goodall, Jonathan · Band, Lawrence · Maghami, Iman · Lin, Laurence · Saby, Linnea · Li, Zhiyu/Drew · Wang, Shaowen · Calloway, Chris · Seul, Martin · Ames, Dan · Tarboton, David · Yi, Hong
ABSTRACT:
Ensuring the reproducibility of scientific studies is crucial for advancing research, with effective data management serving as a cornerstone for achieving this goal. In hydrologic and environmental modeling, spatial data are used as model input, and sharing these data is a key step in the data management process. However, by focusing only on sharing data at the file level through small files, rather than providing the ability to Find, Access, Interoperate with, and directly Reuse subsets of larger datasets, online data repositories are missing an opportunity to foster more reproducible science. This focus creates challenges in accommodating large files, which benefit from consistent data quality and a seamless geographic extent. To take advantage of large datasets, the objective of this study is therefore to create and test an approach for exposing large extent spatial (LES) datasets to support catchment-scale hydrologic modeling needs. GeoServer and THREDDS Data Server connected to HydroShare were used to provide seamless access to LES datasets. The approach is demonstrated using the Regional Hydro-Ecologic Simulation System (RHESSys) for three different-sized watersheds in the US. We assessed data consistency across three data acquisition approaches: the ‘conventional’ approach, which involves sharing data at the file level through small files, GeoServer, and THREDDS Data Server. This assessment was conducted using RHESSys to evaluate differences in modeled streamflow output. This approach provides an opportunity to serve the datasets needed to create catchment models in a consistent way that can be accessed and processed to meet individual modeling needs.
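Accessing a catchment-sized subset of an LES dataset through these servers comes down to a bounding-box request. The sketch below builds such requests under stated assumptions: a WCS 2.0.1 GetCoverage (KVP) request as GeoServer accepts it, and a THREDDS NetCDF Subset Service (NCSS) grid request. The endpoints, coverage ID, and variable name are hypothetical, and the `Long`/`Lat` axis labels are assumed defaults for a geographic CRS; the labels a given coverage expects depend on its CRS and server configuration.

```python
from typing import Iterable, List, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit


def wcs_getcoverage_url(
    endpoint: str,
    coverage_id: str,
    west: float,
    south: float,
    east: float,
    north: float,
    lon_axis: str = "Long",
    lat_axis: str = "Lat",
    output_format: str = "image/tiff",
) -> str:
    """Build a WCS 2.0.1 GetCoverage (KVP) request trimmed to a bounding box."""
    params: List[Tuple[str, str]] = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
        # One subset parameter per axis; axis labels depend on the coverage CRS.
        ("subset", f"{lon_axis}({west},{east})"),
        ("subset", f"{lat_axis}({south},{north})"),
        ("format", output_format),
    ]
    return f"{endpoint}?{urlencode(params)}"


def ncss_grid_url(
    dataset_url: str,
    variables: Iterable[str],
    west: float,
    south: float,
    east: float,
    north: float,
    accept: str = "netcdf",
) -> str:
    """Build a THREDDS NetCDF Subset Service request for a lat/lon bounding box."""
    params: List[Tuple[str, str]] = [
        ("var", ",".join(variables)),  # comma-separated variable names
        ("north", str(north)),
        ("south", str(south)),
        ("east", str(east)),
        ("west", str(west)),
        ("accept", accept),
    ]
    return f"{dataset_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoints, coverage ID, and variable name for illustration only.
    wcs = wcs_getcoverage_url(
        "https://example.org/geoserver/ows", "workspace__state_dem",
        -83.5, 35.0, -83.4, 35.1,
    )
    ncss = ncss_grid_url(
        "https://example.org/thredds/ncss/state_dem.nc", ["elevation"],
        -83.5, 35.0, -83.4, 35.1,
    )
    print(wcs)
    print(parse_qsl(urlsplit(ncss).query))
```

Either request returns only the cells inside the watershed's bounding box, so a modeler downloads a small subset of a seamless dataset rather than a separately prepared small file.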
This collection resource (HS 1) comprises 7 individual HydroShare resources (HS 2-8), each containing different datasets or workflows. These 7 HydroShare resources consist of the following: three resources for three state-scale LES datasets (HS 2-4), one resource with Jupyter notebooks for three different approaches and three different watersheds (HS 5), one resource for RHESSys model instances (i.e., input) of the conventional approach and observation data for all data access approaches in three different watersheds (HS 6), one resource with Jupyter notebooks for automated workflows to create LES datasets (HS 7), and finally one resource with Jupyter notebooks for the evaluation of data consistency (HS 8). More information on each resource is provided within it.
Created: May 20, 2021, 12:35 a.m.
Authors: Choi, Young-Don · Maghami, Iman · Van Beusekom, Ashley · Li, Zhiyu/Drew · Nijssen, Bart · Hay, Lauren · Bennett, Andrew · Tarboton, David · Goodall, Jonathan · Clark, Martyn P. · Wang, Shaowen
ABSTRACT:
This resource, configured for execution in connected JupyterHub compute platforms, helps modelers reproduce and build on the results from the VB study (Van Beusekom et al., 2022) as explained by Maghami et al. (2023). For this purpose, three Jupyter notebooks are included in this resource; they explore the paper's goal for one example CAMELS site over a pre-selected 60-month period of actual simulation to demonstrate the capabilities of the notebooks. For an even faster assessment of the notebooks' capabilities, users are encouraged to choose a shorter simulation period (e.g., 12 months of actual simulation and six months of initialization) and one example CAMELS site. The first notebook processes the raw input data from the CAMELS dataset for use as input to the SUMMA model. The second notebook executes the SUMMA model with the input data from the first notebook, using both original and altered forcing, as described further in the notebook. Finally, the third notebook uses the outputs from the second notebook to visualize the sensitivity of SUMMA model outputs using the Kling-Gupta Efficiency (KGE). More information about each Jupyter notebook and step-by-step instructions on how to run the notebooks can be found in the Readme.md file included in this resource. Using these three notebooks, modelers can apply this methodology to any (one to all) of the 671 CAMELS basins and to simulation periods of their choice.
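For readers unfamiliar with the metric used in the third notebook, the sketch below computes the Kling-Gupta Efficiency in its original form, KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2). It is a plain-Python illustration, not the notebook's implementation; the notebook may use a library routine or a variant of KGE, and the function name is hypothetical.

```python
import math
from typing import Sequence


def kge(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Kling-Gupta Efficiency in its original form.

    KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2), where
    r is the Pearson correlation, alpha = std(sim) / std(obs), and
    beta = mean(sim) / mean(obs).
    """
    n = len(observed)
    if n < 2 or len(simulated) != n:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_sim = sum(simulated) / n
    mean_obs = sum(observed) / n
    # Population standard deviations; alpha is the same ratio with sample std.
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in simulated) / n)
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in observed) / n)
    if mean_obs == 0 or std_obs == 0 or std_sim == 0:
        raise ValueError("KGE is undefined for zero mean or zero variance")
    cov = sum((s - mean_sim) * (o - mean_obs) for s, o in zip(simulated, observed)) / n
    r = cov / (std_sim * std_obs)
    alpha = std_sim / std_obs
    beta = mean_sim / mean_obs
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


if __name__ == "__main__":
    # Hypothetical observed and simulated daily streamflow values.
    obs = [1.0, 3.0, 2.0, 5.0, 4.0]
    sim = [1.2, 2.8, 2.1, 4.6, 4.3]
    print(round(kge(sim, obs), 3))
```

A perfect simulation gives KGE = 1; comparing KGE values from runs with original and altered forcing is one way to express how sensitive the model output is to the forcing change.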
Created: May 20, 2021, 12:35 a.m.
Authors: Choi, Young-Don · Maghami, Iman · Van Beusekom, Ashley · Li, Zhiyu/Drew · Nijssen, Bart · Hay, Lauren · Bennett, Andrew · Tarboton, David · Goodall, Jonathan · Clark, Martyn P. · Wang, Shaowen
ABSTRACT:
This resource, configured for execution in connected JupyterHub compute platforms using the High-Performance Computing (HPC) resources (Expanse or Virtual ROGER) supported by the CyberGIS-Jupyter for Water (CJW) environment through the CyberGIS-Compute Service, helps modelers reproduce and build on the results from the VB study (Van Beusekom et al., 2022) as explained by Maghami et al. (2023).
For this purpose, four Jupyter notebooks are included in this resource; they explore the paper's goal for four example CAMELS sites over a pre-selected 60-month simulation period to demonstrate the capabilities of the notebooks. The first notebook processes the raw input data from the CAMELS dataset for use as input to the SUMMA model. The second notebook uses the CJW environment's supported HPC resource (Expanse or Virtual ROGER) through the CyberGIS-Compute Service to execute the SUMMA model with the input data from the first notebook, using both original and altered forcing, as described further in the notebook. The third notebook uses the outputs from the second notebook to visualize the sensitivity of SUMMA model outputs using the Kling-Gupta Efficiency (KGE). The fourth notebook, developed only for the HPC environment (and currently working only with Expanse), enables reliable, high-performance transfer of large data from HPC to the scientific cloud service (i.e., CJW) using the Globus service integrated into CyberGIS-Compute. More information about each Jupyter notebook and step-by-step instructions on how to run the notebooks can be found in the Readme.md file included in this resource. Using these four notebooks, modelers can apply this methodology to any (one to all) of the 671 CAMELS basins and to simulation periods of their choice. Because this resource uses HPC, simulations run much faster than without HPC, which makes larger simulations practical (even as large as all 671 CAMELS sites over the entire 60-month simulation period used in the paper).
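Running many basins on HPC usually means dividing them among parallel jobs. The sketch below shows one generic way to do that, splitting a basin list into near-equal contiguous chunks. It is an assumed pattern for illustration, not how CyberGIS-Compute or the second notebook actually distributes work; the job count and the placeholder basin IDs are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_into_jobs(items: Sequence[T], n_jobs: int) -> List[List[T]]:
    """Split items into at most n_jobs contiguous chunks whose sizes differ by at most one."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    if not items:
        return []
    n_jobs = min(n_jobs, len(items))  # never create empty jobs
    base, extra = divmod(len(items), n_jobs)
    chunks: List[List[T]] = []
    start = 0
    for job in range(n_jobs):
        # The first `extra` jobs take one additional item each.
        size = base + (1 if job < extra else 0)
        chunks.append(list(items[start:start + size]))
        start += size
    return chunks


if __name__ == "__main__":
    # Placeholder IDs standing in for the 671 CAMELS basins (hypothetical labels).
    basins = [f"basin_{i:03d}" for i in range(671)]
    jobs = split_into_jobs(basins, 8)
    print([len(j) for j in jobs])
```

Balanced chunk sizes keep the slowest job, which sets the total wall-clock time, as short as possible when every basin takes roughly the same time to simulate.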
Created: May 20, 2021, 5:54 a.m.
Authors: Choi, Young-Don
ABSTRACT:
ATTENTION: All 9 workflows for RHESSys modeling are now condensed into one resource. This resource is kept only for archiving purposes.
This HydroShare resource offers Jupyter Notebooks for the RHESSys modeling workflow, employing the GeoServer approach at Spout Run, VA. For instructions on running the Jupyter Notebooks, please refer to the provided README file within this resource.
Created: Oct. 19, 2021, 11:27 p.m.
Authors: Ercan, Mehmet · Maghami, Iman · Bowes, Benjamin · Goodall, Jonathan · Morsy, Mohamed
ABSTRACT:
This resource holds the data and models used by Ercan et al. (2020). The goal of their study was to quantify possible changes, due to climate change, in the water balance of the Upper Neuse watershed, a 1,373 km² watershed in North Carolina. To accomplish this, they used a SWAT model. They first performed a sensitivity analysis to identify the most sensitive model parameters for their study area. Next, they calibrated and validated the SWAT model using daily streamflow records within the watershed. Finally, they forced the SWAT model with climate scenarios from five downscaled General Circulation Models for baseline, mid-century, and end-century periods.
Ercan et al. (2020) did not formally publish the data or Model Instances (MI) used in their study, which is not uncommon. In this resource, we published their data and MIs as an example to demonstrate the design capabilities of Maghami et al. (2023)'s extensible schema for capturing environmental model metadata and show its implementation in HydroShare.
This resource includes the raw input data and the preprocessing codes used to prepare them as MIs for the SWAT model, four MIs, one Model Program (MP), and the postprocessing codes Ercan et al. (2020) used to summarize the model results as figures and tables. The contents are organized into the following seven folders:
1- InputDataAndPreprocessing
2- MI_1_SensitivityAnalysis
3- MI_2_CalibrationAndValidation
4- MI_4_ClimateModels_Historical_AfterCalibration
5- MI_5_ClimateModels_Future_AfterCalibration
6- MP
7- Postprocessing
A detailed explanation of the MIs and the MP is available in Maghami et al. (2023). It is important to note that our model metadata design treats the entire raw input data, custom preprocessing, and postprocessing tools (e.g., codes to process raw input data), along with the processed input data, as a single MI. However, since most of the raw input data, preprocessing, and postprocessing tools are common among the four MIs, to avoid repetition, we have organized them into dedicated folders. Each MI now specifically includes only the processed input data for the SWAT model.
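To make the folder organization concrete, the sketch below shows how one complete MI, in the sense of the metadata design, could be reassembled from the shared folders plus the MI-specific folder. It is illustrative only: the folder names are taken from the list above with the leading numbers dropped (assumed to be list numbering, not part of the names), and the root path and function name are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List

# Folder names as listed in this resource (leading list numbers dropped).
SHARED_INPUT = "InputDataAndPreprocessing"
SHARED_POSTPROCESSING = "Postprocessing"
MI_FOLDERS = (
    "MI_1_SensitivityAnalysis",
    "MI_2_CalibrationAndValidation",
    "MI_4_ClimateModels_Historical_AfterCalibration",
    "MI_5_ClimateModels_Future_AfterCalibration",
)


def full_model_instance(root: str, mi_folder: str) -> List[PurePosixPath]:
    """Return the folders that together make up one complete MI.

    Under the metadata design, an MI covers raw input data, preprocessing,
    processed input, and postprocessing; here the shared folders are combined
    with the MI-specific folder that holds only the processed SWAT input.
    """
    if mi_folder not in MI_FOLDERS:
        raise ValueError(f"unknown model instance folder: {mi_folder}")
    base = PurePosixPath(root)
    return [base / SHARED_INPUT, base / mi_folder, base / SHARED_POSTPROCESSING]


if __name__ == "__main__":
    # "contents" is a hypothetical root path for illustration.
    for path in full_model_instance("contents", "MI_2_CalibrationAndValidation"):
        print(path)
```

The MP folder is deliberately left out, since the design treats the Model Program as separate from the Model Instances.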
Created: March 25, 2024, 9:07 a.m.
Authors: Maghami, Iman
ABSTRACT:
This HydroShare resource offers Jupyter Notebooks for the RHESSys modeling workflow, employing the conventional, GeoServer and THREDDS approaches across Coweeta Subbasin 18, NC; Spout Run, VA; and Scotts Level Branch, MD. For instructions on running the Jupyter Notebooks, please refer to the provided README file within this resource.
Created: April 2, 2024, 9:53 a.m.
Authors: Maghami, Iman
ABSTRACT:
This HydroShare resource provides raw spatial input data for executing RHESSys workflows at 1- Coweeta Subbasin 18, North Carolina, 2- Scotts Level Branch, Maryland, and 3- Spout Run, Virginia. To support assessment of the conventional data distribution approach, these spatial datasets were manually collected and shared at the file level through small files.
Additionally, the GeoServer and THREDDS Data Server (TDS) approaches use only the observation data from this resource.