Amber Spackman Jones
Utah State University | Research Engineer
Subject Areas: Hydrology, Water Quality Monitoring, Water Quality Modeling, Hydroinformatics, Cyberinfrastructure, Data Management
Recent Activity
Created: Aug. 25, 2016, 9:30 p.m.
Authors: Jeffery S. Horsburgh · Amber Jones
ABSTRACT:
iUTAH (innovative Urban Transitions and Aridregion Hydrosustainability) is a collaborative research and training program in Utah. As part of project requirements, iUTAH developed a data policy that seeks to maximize the impact and broad use of datasets collected within iUTAH facilities and by iUTAH research teams. This policy document focuses on assisting iUTAH investigators in creating and sharing high-quality data. The policy defines the data types generated as part of iUTAH and clarifies timelines for associated data publication. It specifies the requirements for submittal of a data collection plan, the creation of metadata, and the publication of datasets. It clarifies requirements for cases involving human subjects as well as raw data and analytical products. The Policy includes guidelines for data and metadata standards, storage and archival, curation, and data use and citation. Agreements for data publishers and data use are also included as appendices.
Created: Jan. 26, 2017, midnight
Authors: Jeffery S. Horsburgh · Amber Jones
ABSTRACT:
iUTAH (innovative Urban Transitions and Aridregion Hydrosustainability) is a collaborative research and training program in Utah. As part of project requirements, iUTAH developed a data policy that seeks to maximize the impact and broad use of datasets collected within iUTAH facilities and by iUTAH research teams. This policy document focuses on assisting iUTAH investigators in creating and sharing high-quality data. The policy defines the data types generated as part of iUTAH and clarifies timelines for associated data publication. It specifies the requirements for submittal of a data collection plan, the creation of metadata, and the publication of datasets. It clarifies requirements for cases involving human subjects as well as raw data and analytical products. The Policy includes guidelines for data and metadata standards, storage and archival, curation, and data use and citation. Agreements for data publishers and data use are also included as appendices.
Created: Feb. 22, 2018, 11:37 p.m.
Authors: Amber Jones · Dave Eiriksson · Jeffery S. Horsburgh
ABSTRACT:
These are data resulting from and related to an effort to examine subjectivity in the process of performing quality control on water quality data measured by in situ sensors. Participants (n=27) included novices unfamiliar with and technicians experienced in quality control. Each participant performed quality control post processing on the same datasets: one calendar year (2014) of water temperature, pH, and specific conductance. Participants were provided with a consistent set of guidelines, field notes, and tools. Participants used ODMTools (https://github.com/ODM2/ODMToolsPython/) to perform the quality control exercise. This resource consists of:
1. Processed Results: Each file in this folder corresponds to one of the variables for which quality control was performed. Each row corresponds to a single time stamp and each column corresponds to the processed results generated by each participant. The first column corresponds to the original, raw data.
2. Survey Data: The files in this folder are related to an exit survey administered to participants upon completion of the exercise. It includes the survey questions (pdf), the full Qualtrics output (QualityControlSurvey.pdf), data and metadata files organized and encoded for display in the Survey Data Viewer (http://data.iutahepscor.org/surveys/survey/QCEXP) (QCExperimentSurveyDataFile.csv, QCExperimentSurveyMetadata.csv), and a file used to organize data for plots for the associated paper.
3. Field Record: Participants were provided this document, which gives information about the field maintenance activities relevant to performing QC.
4. Scripts: Each file in this folder corresponds to a script automatically generated by ODMTools while performing quality control. The files are organized by user ID and by variable.
5. Code and Analysis: Script used to generate the figures in the associated paper. Note that novice users correspond to IDs 1-22 and experienced users correspond to IDs 25-38. This folder also includes subsets of the data organized in supporting files used to generate Figure 6 (ExpGapVals.xlsx) and Table 5 (NoDataCount.xlsx).
Created: Nov. 20, 2018, 8:45 p.m.
Authors: Amber Jones
ABSTRACT:
This resource contains a Jupyter Notebook that uses Python to access and visualize data for the USGS flow gage on the Colorado River at Lee's Ferry, AZ (09380000). This site monitors water quantity and quality for water released from Glen Canyon Dam that then flows through the Grand Canyon. Data were retrieved using WaterOneFlow web services, which were called in Python with the suds-py3 package. Using this package, a "GetValuesObject" request, as defined by WaterOneFlow, was passed to the server using inputs for the web service URL, site code, variable code, and dates of interest. For this case, 15-minute discharge from August 1, 2018 to the current date was used. The web service returned an object from which the dates, the data values, and the site name were obtained. The Python libraries Pandas and Matplotlib were used to manipulate and view the results. The time series data were converted to lists and then to a Pandas series object. Using the "resample" function of Pandas, daily mean, minimum, and maximum values were determined from the 15-minute data. Using Matplotlib, a figure object was created to which Pandas series objects were added using the Pandas plot method. The daily mean, minimum, and maximum values were plotted alongside the 15-minute flow values to illustrate the daily ranges of the data.
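The resampling and plotting steps described above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the WaterOneFlow call via suds-py3 is shown only as a comment (the service URL and site/variable codes are assumptions), and synthetic 15-minute values stand in for the web service response so that the Pandas and Matplotlib steps are runnable on their own.

```python
# Minimal sketch of the resampling and plotting steps described above.
# The WaterOneFlow retrieval via suds-py3 would look roughly like the
# commented lines below; the service URL and site/variable codes are
# assumptions, so synthetic 15-minute values stand in for the response.
#
#   from suds.client import Client
#   client = Client("http://hydroportal.cuahsi.org/nwisuv/cuahsi_1_1.asmx?WSDL")
#   response = client.service.GetValuesObject(
#       "NWISUV:09380000", "NWISUV:00060", "2018-08-01", "2018-08-31", "")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic 15-minute discharge record standing in for the web service response
index = pd.date_range("2018-08-01", periods=4 * 24 * 30, freq="15min")
values = (12000 + 2000 * np.sin(np.linspace(0, 20, len(index)))
          + np.random.normal(0, 200, len(index)))
flow = pd.Series(values, index=index, name="Discharge (cfs)")

# Daily mean, minimum, and maximum from the 15-minute data using resample
daily_mean = flow.resample("D").mean()
daily_min = flow.resample("D").min()
daily_max = flow.resample("D").max()

# Plot the daily statistics over the 15-minute record
fig, ax = plt.subplots(figsize=(10, 4))
flow.plot(ax=ax, color="lightgray", label="15-minute")
daily_mean.plot(ax=ax, label="daily mean")
daily_min.plot(ax=ax, label="daily min")
daily_max.plot(ax=ax, label="daily max")
ax.set_ylabel("Discharge (cfs)")
ax.legend()
plt.show()
```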
Created: Dec. 2, 2018, 3:27 a.m.
Authors: Hyrum Tennant · Amber Spackman Jones
ABSTRACT:
For environmental data measured by a variety of sensors and compiled from various sources, practitioners need tools that facilitate data access and data analysis. Data are often organized in formats that are incompatible with each other and that prevent full data integration. Furthermore, analyses of these data are hampered by inadequate mechanisms for storage and organization. Ideally, data should be centrally housed and organized in an intuitive structure with established patterns for analyses. In reality, however, the data are often scattered across multiple files without a uniform structure, which must be transferred between users and accessed individually and manually for each analysis. This effort describes a process for compiling environmental data into a single, central database that can be accessed for analyses. We use the Logan River watershed and observed water level, discharge, specific conductance, and temperature as a test case, with a focus on analysis of flow partitioning. We formatted data files and organized them into a hierarchy, and we developed scripts that import the data to a database with a structure designed for hydrologic time series data. Scripts access the populated database to determine baseflow separation, flow balance, and mass balance and to visualize the results. The analyses were compiled into a package of Python scripts, which can be modified and run by scientists and researchers to determine gains and losses in reaches of interest. To facilitate reproducibility, the database and associated scripts were shared to HydroShare as Jupyter Notebooks so that any user can access the data and perform the analyses, which facilitates standardization of these operations.
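As one concrete example of the kind of analysis mentioned above, the sketch below separates baseflow from a streamflow series with a one-parameter recursive digital filter (Lyne-Hollick). It is illustrative only: it is not necessarily the method or parameterization implemented in the shared scripts, and the flow values are made up.

```python
# One-parameter recursive digital filter (Lyne-Hollick) for baseflow
# separation, shown as a single forward pass. Illustrative only; the
# shared scripts may use a different method or parameterization.
import numpy as np

def lyne_hollick_baseflow(q, alpha=0.925):
    """Return the baseflow component of a streamflow series (array-like)."""
    q = np.asarray(q, dtype=float)
    quick = np.zeros_like(q)  # quickflow component
    for i in range(1, len(q)):
        quick[i] = alpha * quick[i - 1] + 0.5 * (1 + alpha) * (q[i] - q[i - 1])
        quick[i] = min(max(quick[i], 0.0), q[i])  # constrain 0 <= quickflow <= total
    return q - quick  # baseflow = total flow - quickflow

# Example with illustrative daily discharge values (m^3/s)
flow = [2.0, 2.1, 5.0, 8.0, 6.0, 4.0, 3.0, 2.5, 2.2, 2.1]
print(lyne_hollick_baseflow(flow))
```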
Created: Feb. 17, 2019, 3:22 a.m.
Authors: Amber Jones · William Rhoads · Jeffery S. Horsburgh
ABSTRACT:
Hurricane Maria is an example of a natural disaster that caused disruptions to infrastructure resulting in concerns with water treatment failures and potential contamination of drinking water supplies. This dataset is focused on the water quality data collected in Puerto Rico after Hurricane Maria and is part of the larger collaborative RAPID Hurricane Maria project.
This resource consists of Excel workbooks and a SQLite database. Both were populated with data and metadata corresponding to discrete water quality analyses of drinking water systems in Puerto Rico impacted by Hurricane Maria, collected as part of the RAPID Maria project. Sampling and analysis were performed by a team from Virginia Tech in February-April 2018. Discrete samples were collected and returned to the lab for ICPMS analysis. Field measurements were also made for temperature, pH, free and total chlorine, turbidity, and dissolved oxygen. Complete method and variable descriptions are contained in the workbooks and database. There are two separate workbooks: one for ICPMS data and one for field data. All results are contained in the single database. Sites were sampled corresponding to several water distribution systems and source streams in southwestern Puerto Rico. Coordinates are included for the stream sites, but to preserve the security of the water distribution sites, the locations are only identified as within Puerto Rico.
The workbooks follow the specifications for the YAML Observations Data Archive (YODA) exchange format (https://github.com/ODM2/YODA-File). The workbooks are templates with sheets containing tables that are mapped to entities in the Observations Data Model 2 (ODM2 - https://github.com/ODM2). Each sheet in the workbook contains directions for its completion and brief descriptions of the attributes. The data in the sheets were converted to a SQLite database following the ODM2 schema, which is also contained in this resource. Conversion was performed using prototype Python translation software (https://github.com/ODM2/YODA-Tools).
Created: April 30, 2019, 8:23 p.m.
Authors: Amber Spackman Jones · Sara Madison Alger · Homa Salehabadi
ABSTRACT:
Since the closing of Glen Canyon Dam, the clear waters of the Colorado River have stripped sediment from beaches and sandbars in the Grand Canyon. In an attempt to distribute sand to rebuild beaches, high flow experiments (HFE) have been conducted wherein large releases from Glen Canyon Dam are made over several days. The HFE events are timed to follow the summer/fall monsoon season when sand delivery from the Paria River is typically high, given that the Paria is the primary source of sand to the Colorado River in Marble Canyon. Unrelated reservoir operating rules coordinate annual releases from Lake Powell so that the storage contents of Lakes Powell and Mead are equalized. If these "equalization flows" are released when there is relatively little sand supplied from the Paria River, they are likely to erode downstream sandbars, including those created by HFEs. Currently, there is no connection between the operations for reservoir equalization and for implementation of HFEs. Our analysis examines potential changes to the equalization protocols to explore whether equalization flows can be delayed to avoid releases that cause sandbar depletion. Results indicate that delaying equalization in favor of sediment supply results in some inequity between Lakes Powell and Mead, but the imbalance is less than anticipated and less than with no equalization. Jointly considering sediment supply and equalization could help retain sediment within the Grand Canyon; however, even in years when the sand load meets the threshold for HFEs, the sediment supply may not be sufficient to balance out the volumes of equalization flows.
This data resource consists of the files used to support this work. The Word document and the PowerPoint presentation present the results of this work. The folder CRSS contains two other folders. The 'model' folder contains a saved version of the Colorado River Simulation System, a model that may be implemented in RiverWare. This saved model includes slots corresponding to estimated sediment and slots generated by the implemented ruleset to govern equalization (Sediment Equalization Trigger, Years Without Sediment, 1-yr, 2-yr, 3-yr Equalization Delay). The 'ruleset' folder contains the rulesets used in this analysis; there are four rulesets, each corresponding to a scenario run. The folder Data contains R code for running statistical analysis on input sediment data and flow data. The raw input files needed to run the code are included and correspond to natural flow inputs (obtained from the Bureau of Reclamation) and sand load from the Paria River (obtained from the Grand Canyon Monitoring and Research Center). The Results folder includes (1) a table of estimated summer sand load and (2) a spreadsheet of CRSS results for the various scenarios run, along with plots for comparing between them.
Created: May 1, 2019, 6:16 p.m.
Authors: Amber Spackman Jones · Sara Madison Alger · Homa Salehabadi · Abigail Repko
ABSTRACT:
This project used Budyko-based methods to determine the elasticity and sensitivity of 29 subbasins in the Colorado River Basin. Elasticity and sensitivity are metrics used to quantify the relative expected changes in runoff given changes in precipitation and temperature, respectively. We used publicly available data to determine long term averages for temperature, precipitation, and runoff for principal Colorado River subbasins. Given those data, we used Budyko-based methods to estimate the elasticity and sensitivity of each subbasin to changes in temperature and precipitation. We determined the aridity index and the Budyko parameter (w), which aggregates watershed storage characteristics, for each subbasin. Subcatchments located in the Upper Basin, driven mostly by snowmelt, have a lower aridity index and higher w value than those in the Lower Basin, driven by monsoonal storm events. The Paria and the Little Colorado River subbasins are particularly sensitive to changes in precipitation and temperature. To identify the onset of direct human impacts, we used a double mass curve breakpoint analysis on a single subcatchment. Two breakpoints were identified, 1963 and 1988, corresponding to human impact and climate change, respectively.
This data resource includes a document and power point reporting the key findings of this work. We include the code, input, and output files used to perform analyses, all of which are described in the readme.
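For orientation, the sketch below shows how runoff and precipitation elasticity can be computed from long-term averages under one common Budyko formulation (Fu's equation with parameter w). This is a generic illustration under that assumption; the exact formulation, parameter values, and elasticity definition used in the shared code may differ, and the input numbers are illustrative only.

```python
# Budyko-based runoff and precipitation elasticity using Fu's form of the
# Budyko curve with parameter w. Generic illustration; the formulation and
# elasticity definition in the shared code may differ, and the inputs below
# are illustrative only.

def fu_runoff(p, pet, w):
    """Long-term mean annual runoff (same units as p) from Fu's equation."""
    phi = pet / p  # aridity index
    evaporative_fraction = 1 + phi - (1 + phi ** w) ** (1 / w)  # E/P
    return p * (1 - evaporative_fraction)  # Q = P - E

def precip_elasticity(p, pet, w, dp=1e-3):
    """Numerical precipitation elasticity of runoff: (dQ/dP) * (P/Q)."""
    q = fu_runoff(p, pet, w)
    dq = fu_runoff(p * (1 + dp), pet, w) - q
    return (dq / (p * dp)) * (p / q)

# Example: a snowmelt-dominated subbasin (illustrative values, mm/yr)
p, pet, w = 600.0, 900.0, 2.0
print(round(fu_runoff(p, pet, w), 1), round(precip_elasticity(p, pet, w), 2))
```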
Created: June 11, 2019, 5:35 p.m.
Authors: Christina Bandaragoda · Amber Spackman Jones · Jeffery S. Horsburgh · Liza Brazil
ABSTRACT:
CUAHSI's Water Data Services are community developed, open access, and available to everyone. Workshops are used to share and learn how these services can help researchers and teams with a variety of research tasks. We include an overview of how to develop data management plans, which are increasingly required by most funders. Materials describe how to discover and find a broad array of water data: time series, samples, spatial coverages, published datasets, and case study workflows. CUAHSI apps and tools are introduced for expediting and documenting workflows. We have provided interactive curriculum and tutorials with examples of how to share your data within a group and publish your data with a DOI. Future training opportunities and funding opportunities for graduate students are listed.
This workshop was a featured event at the 2019 UCOWR Annual Water Resources Conference, Tuesday, June 11, 1:00-3:50 p.m., White Pine Meeting Room, Cliff Lodge, Snowbird, Utah.
Created: Dec. 9, 2020, 4:21 a.m.
Authors: Jones, Amber Spackman
ABSTRACT:
This resource contains an example script for using the software package pyhydroqc. pyhydroqc was developed to identify and correct anomalous values in time series data collected by in situ aquatic sensors. For more information, see the code repository: https://github.com/AmberSJones/pyhydroqc and the documentation: https://ambersjones.github.io/pyhydroqc/. The package may be installed from the Python Package Index.
This script applies the functions to data from a single site in the Logan River Observatory, which is included in the repository. The data collected in the Logan River Observatory are sourced at http://lrodata.usu.edu/tsa/ or on HydroShare: https://www.hydroshare.org/search/?q=logan%20river%20observatory.
Anomaly detection methods include ARIMA (AutoRegressive Integrated Moving Average) and LSTM (Long Short Term Memory). These are time series regression methods that detect anomalies by comparing model estimates to sensor observations and labeling points as anomalous when the difference exceeds a threshold. There are multiple possible approaches for applying LSTM for anomaly detection/correction:
- Vanilla LSTM: uses past values of a single variable to estimate the next value of that variable.
- Multivariate Vanilla LSTM: uses past values of multiple variables to estimate the next value for all variables.
- Bidirectional LSTM: uses past and future values of a single variable to estimate a value for that variable at the time step of interest.
- Multivariate Bidirectional LSTM: uses past and future values of multiple variables to estimate a value for all variables at the time step of interest.
The correction approach uses piecewise ARIMA models. Each group of consecutive anomalous points is considered as a unit to be corrected. Separate ARIMA models are developed for valid points preceding and following the anomalous group. Model estimates are blended to achieve a correction.
The anomaly detection and correction workflow involves the following steps:
1. Retrieving data
2. Applying rules-based detection to screen data and apply initial corrections
3. Identifying and correcting sensor drift and calibration (if applicable)
4. Developing a model (i.e., ARIMA or LSTM)
5. Applying model to make time series predictions
6. Determining a threshold and detecting anomalies by comparing sensor observations to modeled results
7. Widening the window over which an anomaly is identified
8. Aggregating detections resulting from multiple models
9. Making corrections for anomalous events
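A stripped-down illustration of the model-based steps (4-6) above is sketched below using statsmodels ARIMA and a fixed residual threshold. This is not the pyhydroqc API: the package wraps these steps in its own functions and uses more sophisticated, data-adaptive thresholds; the synthetic series and injected anomalies here are for illustration only.

```python
# Simplified illustration of steps 4-6 (develop a model, make predictions,
# and flag anomalies against a threshold) using statsmodels ARIMA and a
# fixed residual threshold. This is NOT the pyhydroqc API: the package wraps
# these steps in its own functions and uses data-adaptive thresholds.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic series with a few injected anomalies
rng = np.random.default_rng(0)
series = pd.Series(np.sin(np.linspace(0, 20, 500)) + rng.normal(0, 0.1, 500))
series.iloc[[100, 101, 300]] = [5.0, 5.2, -4.0]

model = ARIMA(series, order=(1, 0, 1)).fit()   # step 4: develop a model
predictions = model.predict()                  # step 5: in-sample predictions
residuals = series - predictions

threshold = 4 * residuals.std()                # step 6: simple fixed threshold
anomalies = residuals.abs() > threshold
print(series[anomalies])
```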
Instructions to run the notebook through the CUAHSI JupyterHub:
1. Click "Open with..." at the top of the resource and select the CUAHSI JupyterHub. You may need to sign into CUAHSI JupyterHub using your HydroShare credentials.
2. Select 'Python 3.8 - Scientific' as the server and click Start.
3. From your JupyterHub directory, click on the ExampleNotebook.ipynb file.
4. Execute each cell in the code by clicking the Run button.
Created: June 3, 2021, 7:36 p.m.
Authors: Jones, Amber Spackman · Jones, Tanner · Horsburgh, Jeffery S.
ABSTRACT:
This resource contains the supporting data and code files for the analyses presented in "Toward automating post processing of aquatic sensor data," an article published in the journal Environmental Modelling and Software. This paper describes pyhydroqc, a Python package developed to identify and correct anomalous values in time series data collected by in situ aquatic sensors. For more information on pyhydroqc, see the code repository (https://github.com/AmberSJones/pyhydroqc) and the documentation (https://ambersjones.github.io/pyhydroqc/). The package may be installed from the Python Package Index (more info: https://packaging.python.org/tutorials/installing-packages/).
Included in this resource are input data, Python scripts to run the package on the input data (anomaly detection and correction), results from running the algorithm, and Python scripts for generating the figures in the manuscript. The organization and structure of the files are described in detail in the readme file. The input data were collected as part of the Logan River Observatory (LRO). The data in this resource represent a subset of data available for the LRO and were compiled by querying the LRO’s operational database. All available data for the LRO can be sourced at http://lrodata.usu.edu/tsa/ or on HydroShare: https://www.hydroshare.org/search/?q=logan%20river%20observatory.
There are two sets of scripts in this resource: (1) scripts that reproduce plots for the paper using saved results, and (2) code used to generate the complete results for the series in the case study. While all figures can be reproduced, there are challenges to running the code for the complete results (it is computationally intensive, different results will be generated due to the stochastic nature of the models, and the code was developed with an early version of the package), which is why the saved results are included in this resource. For a simple example of running pyhydroqc functions for anomaly detection and correction on a subset of data, see this resource: https://www.hydroshare.org/resource/92f393cbd06b47c398bdd2bbb86887ac/.
Created: Sept. 7, 2021, 10:46 p.m.
Authors: Jones, Amber Spackman · Horsburgh, Jeffery S. · Jones, Tanner
ABSTRACT:
This resource contains a video recording for a presentation given as part of the National Water Quality Monitoring Council conference in April 2021. The presentation covers the motivation for performing quality control for sensor data, the development of PyHydroQC, a Python package with functions for automating sensor quality control including anomaly detection and correction, and the performance of the algorithms applied to data from multiple sites in the Logan River Observatory.
The initial abstract for the presentation:
Water quality sensors deployed in aquatic environments make measurements at high frequency, and the resulting data commonly include artifacts that do not represent the environmental phenomena targeted by the sensor. Sensors are subject to fouling from environmental conditions, often exhibit drift and calibration shifts, and report anomalies and erroneous readings due to issues with datalogging, transmission, and other unknown causes. The suitability of data for analyses and decision making often depends on subjective and time-consuming quality control processes consisting of manual review and adjustment of data. Data-driven and machine learning techniques have the potential to automate identification and correction of anomalous data, streamlining the quality control process. We explored documented approaches and selected several for implementation in a reusable, extensible Python package designed for anomaly detection for aquatic sensor data. Implemented techniques include regression approaches that estimate values in a time series, flag a point as anomalous if the difference between the sensor measurement and the model estimate exceeds a threshold, and offer replacement values for correcting anomalies. Additional algorithms that scaffold the central regression approaches include rules-based preprocessing, thresholds for determining anomalies that adjust with data variability, and the ability to detect and correct anomalies using forecasted and backcasted estimation. The techniques were developed and tested based on several years of data from aquatic sensors deployed at multiple sites in the Logan River Observatory in northern Utah, USA. Performance was assessed based on labels and corrections applied previously by trained technicians. In this presentation, we describe the techniques for detection and correction, report their performance, illustrate the workflow for applying them to high frequency aquatic sensor data, and demonstrate the possibility for additional approaches to help increase automation of aquatic sensor data post processing.
Created: Jan. 28, 2022, 5:41 p.m.
Authors: Jones, Amber Spackman · Horsburgh, Jeffery S.
ABSTRACT:
This resource contains Jupyter Notebooks with examples for accessing USGS NWIS data via web services and performing subsequent analyses related to drought, with particular focus on sites in Utah and the southwestern United States (the code could be modified for any USGS sites). The code uses the Python DataRetrieval package. The resource is part of a set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.
This resource consists of 6 example notebooks:
1. Example 1: Import and plot daily flow data
2. Example 2: Import and plot instantaneous flow data for multiple sites
3. Example 3: Perform analyses with USGS annual statistics data
4. Example 4: Retrieve data and find daily flow percentiles
5. Example 5: Further examination of drought year flows
6. Coding challenge: Assess drought severity
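A minimal retrieval sketch in the spirit of these notebooks is shown below, following the get_record usage pattern from the dataretrieval package's documentation. The site number (09380000), parameter code (00060, discharge), date range, and returned column name are assumptions for illustration; the notebooks themselves cover additional services and analyses.

```python
# Minimal daily-values retrieval with the dataretrieval package, following
# its documented get_record usage. The site (09380000), parameter code
# (00060 = discharge), dates, and the '00060_Mean' column name are
# assumptions; inspect df.columns for the names actually returned.
import matplotlib.pyplot as plt
from dataretrieval import nwis

df = nwis.get_record(sites="09380000", service="dv",
                     start="2018-01-01", end="2018-12-31",
                     parameterCd="00060")
print(df.columns)

df["00060_Mean"].plot(title="USGS 09380000 daily mean discharge")
plt.ylabel("Discharge (cfs)")
plt.show()
```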
Created: Jan. 28, 2022, 8:38 p.m.
Authors: Jones, Amber Spackman
ABSTRACT:
This resource contains Jupyter Notebooks with examples for conducting quality control post processing for in situ aquatic sensor data. The code uses the Python pyhydroqc package. The resource is part of a set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.
This resource consists of 3 example notebooks and associated data files.
Notebooks:
1. Example 1: Import and plot data
2. Example 2: Perform rules-based quality control
3. Example 3: Perform model-based quality control (ARIMA)
Data files:
Data files are available for 6 aquatic sites in the Logan River Observatory. Each file contains data for one site for a single year. The files are named according to monitoring site (FranklinBasin, TonyGrove, WaterLab, MainStreet, Mendon, BlackSmithFork) and year. The files were sourced by querying the Logan River Observatory relational database, and equivalent data could be obtained from the LRO website or on HydroShare. Additional information on sites, variables, and methods can be found on the LRO website (http://lrodata.usu.edu/tsa/) or HydroShare (https://www.hydroshare.org/search/?q=logan%20river%20observatory). Each file has the same structure: a datetime index column (Mountain Standard Time) and three columns for each variable. Variable abbreviations and units are:
- temp: water temperature, degrees C
- cond: specific conductance, μS/cm
- ph: pH, standard units
- do: dissolved oxygen, mg/L
- turb: turbidity, NTU
- stage: stage height, cm
For each variable, there are 3 columns:
- Raw data value measured by the sensor (column header is the variable abbreviation).
- Technician quality controlled (corrected) value (column header is the variable abbreviation appended with '_cor').
- Technician labels/qualifiers (column header is the variable abbreviation appended with '_qual').
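As a quick orientation to the file layout described above, the sketch below loads one site/year file with pandas and applies a simple standalone range check. The file name and the plausible-range bounds are assumptions for illustration; the notebooks themselves use pyhydroqc's own rules-based functions rather than this ad hoc check.

```python
# Load one of the site/year files described above and apply a simple
# standalone range check. The file name and range bounds are assumptions;
# the notebooks use pyhydroqc's own rules-based functions instead.
import pandas as pd

df = pd.read_csv("MainStreet2017.csv", index_col=0, parse_dates=True)

# Flag raw water temperature values outside an assumed plausible range
low, high = -2.0, 30.0
out_of_range = (df["temp"] < low) | (df["temp"] > high)
print(f"{out_of_range.sum()} of {len(df)} temperature values outside [{low}, {high}] degrees C")

# Count points where the technician-corrected series differs from the raw series
changed = (df["temp"] != df["temp_cor"]).sum()
print(f"{changed} points differ between 'temp' and 'temp_cor'")
```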
Created: Jan. 28, 2022, 9:27 p.m.
Authors: Bastidas Pacheco, Camilo J. · Jones, Amber Spackman
ABSTRACT:
This resource contains Jupyter Notebooks with examples that provide an introduction to machine learning classification based on residential water use data. The resource is part of a set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.
This resource consists of 4 example notebooks and a data file.
Notebooks:
1. Example 1: Data import and exploration
2. Example 2: Implementing a first machine learning model
3. Example 3: Comparing multiple machine learning models
4. Example 4: Model optimization by hyperparameter tuning
Data files:
The data are contained in a flat file and represent a record of water use events from a single residential property, with manually applied labels classifying the water uses. Columns are:
- StartTime: Start date and time of each individual event. Format: 'YYYY-MM-DD HH:MM:SS'
- EndTime: End date and time of each individual event. Format: 'YYYY-MM-DD HH:MM:SS'
- Duration: Duration of each individual event (end time - start time). Units: Minutes
- Volume: Volume of water used in each individual event. Units: Gallons
- FlowRate: Average flow rate of each individual event. Units: Gallons per minute
- Peak: Maximum value observed in each 4-second period within each event. Units: Gallons
- Mode: Most frequent value observed in an event. Units: Gallons
- Label: Event classification. Values: faucet, toilet, shower, clotheswasher, bathtub
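The classification workflow the notebooks walk through can be sketched with scikit-learn using the columns described above. The file name "water_use_events.csv" and the choice of model and features are assumptions for illustration, not the notebooks' exact code.

```python
# Minimal sketch of a classification workflow on the columns described
# above, using scikit-learn. The file name and feature/model choices are
# illustrative assumptions, not the notebooks' exact code.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

events = pd.read_csv("water_use_events.csv")

features = events[["Duration", "Volume", "FlowRate", "Peak", "Mode"]]
labels = events["Label"]  # faucet, toilet, shower, clotheswasher, bathtub

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=42, stratify=labels)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```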
Created: Jan. 28, 2022, 11:25 p.m.
Authors: Jones, Amber Spackman · Horsburgh, Jeffery S. · Bastidas Pacheco, Camilo J.
ABSTRACT:
This resource contains Jupyter Notebooks with examples that illustrate how to work with SQLite databases in Python, including database creation, viewing, and querying with SQL. The resource is part of a set of materials for hydroinformatics and water data science instruction. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.
This resource consists of 3 example notebooks and a SQLite database.
Notebooks:
1. Example 1: Querying databases using SQL in Python
2. Example 2: Python functions to query SQLite databases
3. Example 3: SQL join, aggregate, and subquery functions
Data files:
These examples use a SQLite database that follows the Observations Data Model structure and is pre-populated with Logan River temperature data.
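The core pattern the notebooks demonstrate, querying a SQLite database from Python with the standard library sqlite3 module, can be sketched as follows. The database file name and the table/column names in the second query are assumptions written in the style of Observations Data Model tables; the actual names are documented in the notebooks.

```python
# Query a SQLite database from Python with the standard sqlite3 module.
# The database file name and the table/column names in the second query are
# assumptions in the style of Observations Data Model tables; the actual
# names are documented in the notebooks.
import sqlite3

conn = sqlite3.connect("observations.db")
cursor = conn.cursor()

# List the tables present in the database
cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table';")
print(cursor.fetchall())

# Parameterized query (assumed table and column names)
cursor.execute(
    "SELECT ValueDateTime, DataValue FROM TimeSeriesResultValues "
    "WHERE DataValue > ? ORDER BY ValueDateTime LIMIT 10;",
    (10.0,),
)
for row in cursor.fetchall():
    print(row)

conn.close()
```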
Created: Feb. 17, 2022, 2:48 p.m.
Authors: Jones, Amber Spackman · Horsburgh, Jeffery S. · Bastidas Pacheco, Camilo J.
ABSTRACT:
This collection is composed of resources with code examples that support educational materials for hydroinformatics and water data science. Each resource contains Jupyter notebooks and associated datasets. Complete learning module materials are found in HydroLearn: Jones, A.S., Horsburgh, J.S., Bastidas Pacheco, C.J. (2022). Hydroinformatics and Water Data Science. HydroLearn. https://edx.hydrolearn.org/courses/course-v1:USU+CEE6110+2022/about.
The resources and code examples are:
1. Programmatic Data Access with USGS Data Retrieval
2. Sensor Data Quality Control with pyhydroqc
3. Databases and SQL in Python
4. Introduction to Machine Learning with Residential Water Use Data
Created: Feb. 17, 2022, 3:23 p.m.
Authors: Jones, Amber Spackman · Horsburgh, Jeffery S. · Flint, Courtney G.
ABSTRACT:
This resource contains the results of interviews and surveys of instructors of hydroinformatics and water data science courses in the United States conducted in Fall 2021. Potential participants were initially identified via investigator connections, review of relevant literature, and information on institutional and personal websites discovered by Internet searches. Target participants were selected based on their experience teaching hydroinformatics, water data science, or related subject matter at an institution of higher education. We used email to invite contacts to participate, and participants elected to respond to questions either via online survey or recorded interview. During each interview or survey, participants were asked to identify any additional instructors who might be a good fit for the project. The survey was composed using Qualtrics software and administered with links personalized for each participant. Interviews were conducted over Zoom, recorded, and subsequently transcribed. Each interview lasted approximately 45-60 minutes. Procedures were approved by the Utah State University Institutional Review Board for Human Subjects Research with participation limited to instructors within the United States.
This resource contains the list of questions asked of each participant, interview transcripts, and survey responses. Participant names and institutions have been removed from the files.
This resource contains supporting data for the paper Jones AS, Horsburgh JS, Bastidas Pacheco CJ, Flint CG and Lane BA (2022) Advancing Hydroinformatics and Water Data Science Instruction: Community Perspectives and Online Learning Resources. Front. Water 4:901393. doi: 10.3389/frwa.2022.901393.
Created: April 8, 2024, 10:59 p.m.
Authors: Jones, Amber Spackman
ABSTRACT:
This resource was created for the 2024 New Zealand Hydrological Society Data Workshop in Queenstown, NZ. This resource contains Jupyter Notebooks with examples for conducting quality control post processing for in situ aquatic sensor data. The code uses the Python pyhydroqc package to detect anomalies. This resource consists of 3 example notebooks and associated data files. For more information, see the original resource from which this was derived: http://www.hydroshare.org/resource/451c4f9697654b1682d87ee619cd7924.
Notebooks:
1. Example 1: Import and plot data
2. Example 2: Perform rules-based quality control
3. Example 3: Perform model-based quality control (ARIMA)
4. Example 4: Model-based quality control (ARIMA) with user data
Data files:
Data files are available for 6 aquatic sites in the Logan River Observatory. Each file contains data for one site for a single year. The files are named according to monitoring site (FranklinBasin, TonyGrove, WaterLab, MainStreet, Mendon, BlackSmithFork) and year. The files were sourced by querying the Logan River Observatory relational database, and equivalent data could be obtained from the LRO website or on HydroShare. Additional information on sites, variables, and methods can be found on the LRO website (http://lrodata.usu.edu/tsa/) or HydroShare (https://www.hydroshare.org/search/?q=logan%20river%20observatory). Each file has the same structure: a datetime index column (Mountain Standard Time) and three columns for each variable. Variable abbreviations and units are:
- temp: water temperature, degrees C
- cond: specific conductance, μS/cm
- ph: pH, standard units
- do: dissolved oxygen, mg/L
- turb: turbidity, NTU
- stage: stage height, cm
For each variable, there are 3 columns:
- Raw data value measured by the sensor (column header is the variable abbreviation).
- Technician quality controlled (corrected) value (column header is the variable abbreviation appended with '_cor').
- Technician labels/qualifiers (column header is the variable abbreviation appended with '_qual').
There is also a file "data.csv" for use with Example 4. Users who want to bring their own data file should structure it like this file, with a single column of datetime values and a single column of numeric observations labeled "raw", as sketched below.
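A minimal sketch of reshaping a user's own export into that expected structure follows. The input file name and its column names ("timestamp", "value") are hypothetical; only the output layout (a datetime column plus a numeric column named "raw") follows the description above.

```python
# Reshape a user's own export into the structure expected by Example 4:
# one datetime column and one numeric column named "raw". The input file
# name and its column names ("timestamp", "value") are hypothetical.
import pandas as pd

raw = pd.read_csv("my_sensor_export.csv")

df = pd.DataFrame({
    "datetime": pd.to_datetime(raw["timestamp"]),
    "raw": pd.to_numeric(raw["value"], errors="coerce"),
})
df.to_csv("data.csv", index=False)  # matches the provided data.csv layout
```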