Code and data for "Machine learning surrogates for efficient hydrologic modeling: Insights from stochastic simulations of managed aquifer recharge"
| | |
|---|---|
| Authors | Timothy Dai, Kate Maher, Zach Perzan |
| Owners | This resource does not have an owner who is an active HydroShare user. Contact CUAHSI (help@cuahsi.org) for information on this resource. |
| Type | Resource |
| Storage | 3.9 GB |
| Created | Dec 16, 2024 at 2:24 a.m. |
| Last updated | Jan 09, 2025 at 2:09 p.m. |
| Published date | Jan 09, 2025 at 2:09 p.m. |
| DOI | 10.4211/hs.f0a31fbc3de148a98deb36795b4fac53 |
| Sharing status | Published |
| Views | 108 |
| Downloads | 76 |
Abstract
This repository contains the data and code associated with the paper "Machine Learning Surrogates for Efficient Hydrologic Modeling: Insights from Stochastic Simulations of Managed Aquifer Recharge" by Dai et al. (2025) in the Journal of Hydrology (https://doi.org/10.1016/j.jhydrol.2024.132606). The study evaluates a hybrid modeling framework that combines process-based hydrologic simulations (with the integrated hydrologic code ParFlow-CLM) and machine learning (ML) surrogates to efficiently simulate managed aquifer recharge. This repository includes:
1) Sample ParFlow-CLM output for all three simulation stages
2) PyTorch dataset modules and utility functions that construct PyTorch tensors from raw ParFlow-CLM outputs
3) PyTorch modules to implement each of the eight ML architectures described in the paper (CNN3d, CNN4d, U-FNO3d, U-FNO4d, ViT3d, ViT4d, PredRNN++, and a CNN autoencoder)
4) PyTorch modules for custom layers implemented in each architecture
5) A PyTorch module that implements a normalized L2 loss function
6) Scripts to train and evaluate each surrogate architecture, including the autoencoder
Though this repository contains only sample ParFlow-CLM simulation output, complete ParFlow output files for all simulations used in the paper are publicly available in a separate repository (https://doi.org/10.25740/hj302gv2126).
Content
README.md
Code and data for "Machine learning surrogates for efficient hydrologic modeling: Insights from stochastic simulations of managed aquifer recharge"
Overview
This repository contains the code for the paper "Machine Learning Surrogates for Efficient Hydrologic Modeling: Insights from Stochastic Simulations of Managed Aquifer Recharge" by Dai et al. (2025) in the Journal of Hydrology. The study evaluates a hybrid modeling framework that combines process-based hydrologic simulations (with the integrated hydrologic code ParFlow-CLM) and machine learning (ML) surrogates to efficiently simulate managed aquifer recharge.
This repository is organized as follows:

- `data/sample_data` contains sample output for all three simulation stages and sample data for autoencoder training. Instructions for unzipping the data are provided in the Installation section below.
- `data` also contains PyTorch dataset modules and utility functions that construct PyTorch tensors from raw ParFlow-CLM outputs.
- `models` contains PyTorch implementations of the 8 surrogate architectures used in the study (CNN3d, CNN4d, U-FNO3d, U-FNO4d, ViT3d, ViT4d, PredRNN++, and a CNN autoencoder).
- `layers` contains custom PyTorch layers used in some of the surrogate architectures above.
- `losses` contains a PyTorch implementation of the normalized $L^p$-norm used as a loss function in this study.
- Finally, the base directory contains scripts to train and evaluate each surrogate architecture.
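The abstract describes the `losses` module as a normalized L2 loss. A minimal NumPy sketch of such a relative $L^p$ error is shown below; this is an illustration of the general idea, and the repository's `losses` module is the authoritative implementation:

```python
import numpy as np

# Hedged sketch of a normalized (relative) L^p loss: for each sample,
# ||pred - target||_p / ||target||_p, averaged over the batch.
# p=2 recovers the normalized L2 loss described in the abstract.
def normalized_lp_loss(pred, target, p=2, eps=1e-8):
    pred = np.asarray(pred, dtype=float).reshape(len(pred), -1)
    target = np.asarray(target, dtype=float).reshape(len(target), -1)
    num = np.linalg.norm(pred - target, ord=p, axis=1)
    den = np.linalg.norm(target, ord=p, axis=1) + eps
    return float(np.mean(num / den))
```

Normalizing by the target norm makes the loss scale-invariant, which helps when output fields (e.g., pressure head) span very different magnitudes across samples.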
Installation
Install all required modules with `pip install -r requirements.txt`.
For complete compatibility, create your virtual environment with Python 3.8.20.
Other versions of Python have not been tested but may also work.
Get sample data
Unzip the sample data and set up the data directory hierarchy with `sh data/sample_data/unzip_all.sh`.
For users who wish to train on the complete dataset used in the paper, complete ParFlow output files are available to the public in a separate repository.
Any external data, provided through the `--data_dir` option, must have its directory hierarchy structured similarly to the sample data.
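As a rough illustration, a custom `--data_dir` can be sanity-checked against an expected layout before training. The subdirectory names below are assumptions for illustration only; compare against the unzipped sample data for the real hierarchy:

```python
from pathlib import Path

# Hypothetical sketch: report which expected subdirectories are missing
# from a custom data directory. The default names are placeholders, not
# the repository's actual layout.
def check_data_dir(data_dir, required=("train", "val", "test")):
    root = Path(data_dir)
    missing = [name for name in required if not (root / name).is_dir()]
    return missing  # an empty list means the layout matches
```

Running a check like this before a multi-hour training job is a cheap way to fail fast on a mis-structured `--data_dir`.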
Training
All surrogate architectures described in the paper can be trained using the `train.py` script. The script uses `argparse` to take several command-line arguments that specify the model, dataset, and hyperparameters. To view all command-line options, run `python train.py --help`.
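To illustrate the kind of interface `train.py` exposes, here is a minimal `argparse` sketch. The option names are taken from this README, but the defaults, `choices`, and help strings are assumptions; `python train.py --help` shows the real interface:

```python
import argparse

# Illustrative sketch only: a parser mirroring the options this README
# mentions. The actual train.py defines many more options.
parser = argparse.ArgumentParser(description="Train an ML surrogate")
parser.add_argument("--name", required=True, help="experiment name")
parser.add_argument("--mode",
                    choices=["autoencoder", "stage1", "stage2", "stage3"],
                    help="which training stage to run")
parser.add_argument("--model", help="surrogate architecture, e.g. CNN3d")
parser.add_argument("--data_dir", help="path to the training data")

# Parse a sample command line instead of sys.argv for demonstration.
args = parser.parse_args(["--name", "test_run", "--mode", "stage1",
                          "--model", "CNN3d",
                          "--data_dir", "data/sample_data/stage1"])
```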
As a warning, training models on Stages 1, 2, or 3 can be computationally expensive: each architecture requires anywhere from 2 to 32 GB of memory to train (see Table 3 in the paper) and can take several hours on a single GPU.
To train an autoencoder
```shell
python train.py --name <name> \
    --mode autoencoder \
    --model CNNAutoencoder \
    --data_dir data/sample_data/autoencoder \
    [--OPTIONS]
```
where `<name>` is the name of the experiment (e.g., `my_first_autoencoder`) and `CNNAutoencoder` is the architecture to be used.
To train a Stage 1 surrogate
```shell
python train.py --name <name> \
    --mode stage1 \
    --model <model> \
    --data_dir data/sample_data/stage1 \
    [--OPTIONS]
```
where `<name>` is the name of the experiment and `<model>` is the name of the architecture to be used. Note that the `--model` option must be one of the following: `CNN3d`, `CNN4d`, `PredRNN`, `UFNO3d`, `UFNO4d`, `ViT3d`, or `ViT4d`. All other options can be viewed with `python train.py --help`.
To train a Stage 2 surrogate
```shell
python train.py --name <name> \
    --mode stage2 \
    --model <model> \
    --data_dir data/sample_data/stage2 \
    [--OPTIONS]
```
To train a Stage 3 surrogate
```shell
python train.py --name <name> \
    --mode stage3 \
    --model <model> \
    --data_dir data/sample_data/stage3 \
    --autoencoder_ckpt_path <autoencoder_ckpt_path> \
    [--OPTIONS]
```
Instead of providing an autoencoder checkpoint in Stage 3 training, users can also use a randomly initialized autoencoder by omitting the `--autoencoder_ckpt_path` option.
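One common way to implement such an optional checkpoint is to build the model with random weights and only overwrite them when a path is supplied. The sketch below is purely hypothetical; the function and argument names are illustrative, not taken from the repository:

```python
# Hypothetical sketch of optional checkpoint loading: build_model returns
# a randomly initialized model, and load_fn restores pretrained weights
# into it from a checkpoint path.
def init_autoencoder(build_model, ckpt_path=None, load_fn=None):
    model = build_model()           # randomly initialized weights
    if ckpt_path is not None:
        load_fn(model, ckpt_path)   # restore pretrained weights
    return model
```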
Notable options:

- Start `--name` with "test" to run without saving checkpoints or tensorboard data.
- Use the `--use_dummy_dataset` flag to quickly load correctly sized but randomly initialized tensors.
Evaluation
Testing occurs automatically at the end of training when the `--train_only` flag is not set.
However, testing can also be initiated separately with the commands below.
To test an autoencoder
```shell
python test.py \
    --mode autoencoder \
    --ckpt <ckpt> \
    --data_dir data/sample_data/autoencoder \
    [--OPTIONS]
```
To test a Stage 1 surrogate
```shell
python test.py \
    --mode stage1 \
    --ckpt <ckpt> \
    --data_dir data/sample_data/stage1 \
    [--OPTIONS]
```
To test a Stage 2 surrogate
```shell
python test.py \
    --mode stage2 \
    --ckpt <ckpt> \
    --data_dir data/sample_data/stage2 \
    [--OPTIONS]
```
To test a Stage 3 surrogate
```shell
python test.py \
    --mode stage3 \
    --ckpt <ckpt> \
    --data_dir data/sample_data/stage3 \
    [--OPTIONS]
```
End-to-end (E2E) evaluation
Three checkpoints can be tested together in an end-to-end fashion using the following command:
```shell
python e2e.py \
    --name <name> \
    --stage1_ckpt <stage1_ckpt> \
    --stage2_ckpt <stage2_ckpt> \
    --stage3_ckpt <stage3_ckpt> \
    --stage1_data_dir data/sample_data/stage1 \
    --stage2_data_dir data/sample_data/stage2 \
    --stage3_data_dir data/sample_data/stage3 \
    --autoencoder_ckpt_path <autoencoder_ckpt_path> \
    [--OPTIONS]
```
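Conceptually, end-to-end evaluation chains the three stage surrogates so that each stage's prediction feeds the next. The schematic below uses placeholder callables to show the data flow only; it is not the actual `e2e.py` interface:

```python
# Schematic only: the real e2e.py loads checkpoints and datasets. Here
# each stage is a placeholder callable standing in for a trained surrogate.
def run_e2e(x, stage1, stage2, stage3):
    h1 = stage1(x)        # Stage 1 surrogate prediction
    h2 = stage2(h1)       # Stage 2 consumes Stage 1 output
    return stage3(h2)     # Stage 3 produces the final prediction

# Usage with trivial stand-in callables:
result = run_e2e(1.0, lambda v: v + 1, lambda v: v * 2, lambda v: v - 3)
```

Because errors compound across the chain, E2E metrics can differ from the per-stage test metrics reported by `test.py`.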
Related Resources
This resource is referenced by: Timothy Dai, Kate Maher, Zach Perzan, "Machine learning surrogates for efficient hydrologic modeling: Insights from stochastic simulations of managed aquifer recharge," Journal of Hydrology, Volume 652, 2025, 132606, ISSN 0022-1694, https://doi.org/10.1016/j.jhydrol.2024.132606.
How to Cite
This resource is shared under the Creative Commons Attribution (CC BY 4.0) license: http://creativecommons.org/licenses/by/4.0/