Donghui Li
Princeton University
Subject Areas: Water Resources Management, Hydrology
Contributions: 2 (Resources: 2, Collections: 0, App Connectors: 0)

Created: Sept. 1, 2022, 3:50 a.m.
Authors: Li, Donghui · Yanan Chen · Ximing Cai · Qiankun Zhao
ABSTRACT:
The extensive construction of dams imposes significant human perturbations on river systems and substantially alters surface water hydrology. However, reservoir operation has long been simplified or ignored in large-scale hydrological and water resources simulation, partly because operation manuals are inaccessible for most reservoirs. This dataset provides empirical operation rules, documented and discussed in Li et al. (https://doi.org/10.1029/2023WR036686), for 450+ large reservoirs in the Conterminous United States (CONUS). The rules are derived from daily inflow and storage records using the machine learning-based generic data-driven operation model (GDROM, Chen et al. 2022, https://doi.org/10.1016/j.advwatres.2022.104274). Reservoirs operated mainly for flood control make up the largest share (43%) and are primarily located in the Eastern and Central United States, followed by irrigation reservoirs (23%), mostly in the Western United States. The remaining reservoirs serve hydropower (17%, primarily in the Southeastern United States and the Pacific Northwest), water supply (9%), recreation (5%), and navigation (3%) across various CONUS regions. Most records span 15 or more years, which is generally long enough to capture inter-annual operation patterns and long-term changes.
The dataset contains 1) the daily operation records from multiple data sources used for model training and validation, and 2) the derived operation rules, expressed as "if-then" statements, for each of the 450+ reservoirs. The raw data were processed for GDROM training by a) computing "net inflow" to replace the observed inflow, accounting for storage changes due to precipitation, evaporation, seepage, and groundwater interaction (discharge and recharge); b) detecting and removing dates with missing data to obtain continuous time series; and c) correcting outliers (e.g., abnormal sudden storage changes). In addition, the inflow, storage, and release of each reservoir are normalized by its maximum historical storage during the observation period, which allows the extracted operation modules to be compared across reservoirs of different sizes. Normalization also reduces the time required for hyperparameter tuning, especially for the minimum impurity decrease, because the range of its candidate values is considerably narrowed. The operation rules for each reservoir consist of one or more representative operation modules and the hydroclimatic conditions under which each module is applied. Both the modules and their application conditions are derived from decision trees, and the resulting data-driven model is provided as "if-then" statements.
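The net-inflow and normalization steps described above can be sketched as follows. This is a minimal illustration, not the GDROM preprocessing code: it assumes net inflow is back-calculated from the daily water balance as storage change plus release (so unmeasured gains and losses such as precipitation, evaporation, and seepage are folded into the inflow term), that storage and release share the same volume units per daily step, and that the series are already gap-free. The function names `net_inflow` and `normalize_by_max_storage` are hypothetical.

```python
from typing import List, Sequence


def net_inflow(storage: Sequence[float], release: Sequence[float]) -> List[float]:
    """Back-calculate daily net inflow from the reservoir water balance.

    Assumed balance (hypothetical, for illustration):
        I_net[t] = (S[t] - S[t-1]) + R[t]
    The first day has no previous storage, so the output is one element
    shorter than the inputs.
    """
    if len(storage) != len(release):
        raise ValueError("storage and release must have the same length")
    return [storage[t] - storage[t - 1] + release[t] for t in range(1, len(storage))]


def normalize_by_max_storage(series: Sequence[float], max_storage: float) -> List[float]:
    """Scale a series by the reservoir's maximum historical storage so that
    values are comparable across reservoirs of different sizes."""
    if max_storage <= 0:
        raise ValueError("max_storage must be positive")
    return [value / max_storage for value in series]


if __name__ == "__main__":
    # Toy daily records in the same volume units (not real reservoir data).
    storage = [100.0, 110.0, 105.0, 120.0]
    release = [5.0, 8.0, 12.0, 6.0]

    inflow = net_inflow(storage, release)  # [18.0, 7.0, 21.0]
    s_max = max(storage)                   # maximum over the observation period
    print(normalize_by_max_storage(inflow, s_max))
    print(normalize_by_max_storage(storage, s_max))
```

Because every normalized value is a fraction of the same reservoir's maximum storage, a threshold such as "storage above 0.8" in an if-then rule carries the same meaning for a small and a large reservoir.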
(Update - January 2025) The processed daily operation records for 256 selected reservoirs, each with a minimum of 25 years of data (spanning from 1990 to 2014 or later), are available in another HydroShare repository (Chen and Cai, 2025: http://www.hydroshare.org/resource/092720588e2e4524bf2674235ff69d81).

Created: Jan. 15, 2025, 1:25 a.m.
Authors: Li, Donghui
ABSTRACT:
This HydroShare resource provides the scripts for data retrieval and processing, model runs and post-analysis, and figure creation for a manuscript under review at the Journal of Hydrology (JOH). The abstract of the manuscript is as follows: Seasonal soil freezing and thawing processes significantly influence runoff generation dynamics during cold periods, affecting various hydrological and agricultural systems, including flood generation, soil erosion, and plant health. Representing frozen soil conditions in land surface or hydrological models is therefore crucial. While fully distributed models implement the process by solving energy-mass balance equations to obtain soil temperature profiles, parsimonious models using “snow tanks” or frozen ground states can provide suitable modeling solutions with reduced computational demands. However, even these parsimonious approaches typically add complexity through extra inputs or surface energy balance calculations. This study evaluates the applicability of a simplified soil temperature prediction model that determines frozen/unfrozen ground states using only air temperature and snow cover data, thereby reducing model complexity. We first validate the model using in-situ measurements from the AmeriFlux network across the United States and Canada. We then provide a comprehensive global assessment with ERA5-Land reanalysis data (1980-2020). The model performs robustly at the global scale, achieving an average true frozen rate of 0.90 and a false frozen rate of 0.06. We also evaluate model performance by month. While scores drop in certain months, these lower scores are primarily due to the limited number of freeze-thaw events in those periods, which makes the model appear less accurate than it actually is.
In terms of spatial performance, the model shows reduced accuracy in mountainous regions, including the Tibetan Plateau, Rocky Mountains, and Andes, suggesting the need for region-specific parameter calibration in orographic settings. Nevertheless, this parsimonious soil temperature model demonstrates significant potential as a computationally efficient solution for incorporating frozen ground effects in distributed hydrological models with simple conceptual runoff generation schemes.
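The true frozen rate and false frozen rate can be read as contingency-table scores on paired daily frozen/unfrozen states. The sketch below assumes the true frozen rate is the fraction of observed frozen days that the model also classifies as frozen (a hit rate), and the false frozen rate is the fraction of observed unfrozen days that the model classifies as frozen (a false alarm rate). The manuscript's exact definitions may differ, and `frozen_rates` is a hypothetical helper, not part of this resource's scripts.

```python
import math
from typing import Sequence, Tuple


def frozen_rates(observed: Sequence[bool], simulated: Sequence[bool]) -> Tuple[float, float]:
    """Return (true frozen rate, false frozen rate) for paired daily states.

    True = frozen, False = unfrozen. Assumed definitions:
        true frozen rate  = hits / observed frozen days
        false frozen rate = false alarms / observed unfrozen days
    A rate is NaN when its denominator is zero (e.g., no observed frozen days).
    """
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    hits = sum(1 for o, s in zip(observed, simulated) if o and s)
    misses = sum(1 for o, s in zip(observed, simulated) if o and not s)
    false_alarms = sum(1 for o, s in zip(observed, simulated) if not o and s)
    correct_unfrozen = sum(1 for o, s in zip(observed, simulated) if not o and not s)

    n_frozen = hits + misses
    n_unfrozen = false_alarms + correct_unfrozen
    tfr = hits / n_frozen if n_frozen else math.nan
    ffr = false_alarms / n_unfrozen if n_unfrozen else math.nan
    return tfr, ffr


if __name__ == "__main__":
    # Toy ten-day series (not real data): 4 observed frozen days, 6 unfrozen.
    obs = [True] * 4 + [False] * 6
    sim = [True, True, True, False, False, False, False, False, False, True]
    print(frozen_rates(obs, sim))  # (0.75, 0.16666666666666666)
```

The small denominators also illustrate the monthly caveat in the abstract: in a month with only one or two observed frozen days, a single miss moves the true frozen rate from 1.0 to 0.5 or 0.0, so a low monthly score can reflect few freeze-thaw events rather than a poor model.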