Using Python Packages and HydroShare to Advance Open Data Science and Analytics for Water


Authors: J. S. Horsburgh, A. S. Jones, A. M. Castronova, S. Black
Owners: Jeffery S. Horsburgh
Type: Resource
Storage: The size of this resource is 31.0 MB
Created: May 26, 2023 at 8:56 p.m.
Last updated: Sep 28, 2023 at 5:38 p.m.
Sharing Status: Public
Views: 1312
Downloads: 415

Abstract

Scientific and management challenges in the water domain require the synthesis of diverse data. Many data analysis tasks are difficult because datasets are large and complex; standard data formats are not always agreed upon or mapped to efficient structures for analysis; scientists may lack training for tackling large and complex datasets; and it can be difficult to share, collaborate around, and reproduce scientific work. Overcoming barriers to accessing, organizing, and preparing datasets for analysis can transform the way water scientists work. Building on the HydroShare repository’s cyberinfrastructure, we have advanced two Python packages that make data loading, organization, and curation for analysis easier, reducing the time spent choosing appropriate data structures and writing code to ingest data. These packages enable automated retrieval of data from HydroShare and from the USGS National Water Information System (NWIS), the latter through a Python equivalent of the USGS R dataRetrieval package; loading of data into performant structures that integrate with existing visualization, analysis, and data science capabilities available in Python; and writing of analysis results back to HydroShare for sharing and publication. While these Python packages can be installed in any Python environment, we will demonstrate how using them within CUAHSI’s HydroShare-linked JupyterHub server reduces the technical burden of creating a computational environment for executing analyses and enhances the sharing and reproducibility of those analyses.

This HydroShare resource includes all of the materials presented in a workshop at the 2023 CUAHSI Biennial Colloquium.
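
The retrieve, analyze, and write-back workflow described in the abstract might look roughly like the following. This is a minimal sketch and is not taken from the workshop materials. It assumes the two packages are the ones named under Related Resources (dataretrieval and hsclient) and that both are installed. The helper names, the default parameter code 00060 (discharge), and the output filename are illustrative choices. The third-party imports are deferred into the functions that need them, so the date-checking helper can be used on its own.

```python
from datetime import date


def nwis_daily_query(site, start, end, parameter_cd="00060"):
    """Build keyword arguments for a daily-values request to NWIS.

    Hypothetical helper: it only validates the dates and assembles the
    arguments later passed to dataretrieval's nwis.get_record.
    """
    # Parse ISO dates early so a typo fails before any network call.
    start_d = date.fromisoformat(start)
    end_d = date.fromisoformat(end)
    if end_d < start_d:
        raise ValueError("end date precedes start date")
    return {
        "sites": site,
        "service": "dv",  # daily values
        "start": start_d.isoformat(),
        "end": end_d.isoformat(),
        "parameterCd": parameter_cd,
    }


def fetch_daily_values(site, start, end):
    """Retrieve daily values from NWIS as a pandas DataFrame."""
    from dataretrieval import nwis  # deferred: requires the dataretrieval package

    return nwis.get_record(**nwis_daily_query(site, start, end))


def upload_results(df, resource_id, filename="daily_values.csv"):
    """Write a DataFrame to CSV and add the file to an existing HydroShare resource."""
    from hsclient import HydroShare  # deferred: requires the hsclient package

    df.to_csv(filename)
    hs = HydroShare()
    hs.sign_in()  # prompts interactively for HydroShare credentials
    resource = hs.resource(resource_id)
    resource.file_upload(filename)
    return filename
```

Under these assumptions, calling fetch_daily_values with a USGS site number and a date range, then passing the result to upload_results along with the identifier of a resource you can edit, would cover the round trip. Both calls need network access, and the second needs a HydroShare account.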


Related Resources

The content of this resource references Horsburgh, J. S., S. S. Black (2021). HydroShare Python Client Library (hsclient) Usage Examples, HydroShare, http://www.hydroshare.org/resource/7561aa12fd824ebb8edbee05af19b910
The content of this resource references Horsburgh, J. S., A. S. Jones, S. S. Black, T. O. Hodson (2022). USGS dataretrieval Python Package Usage Examples, HydroShare, http://www.hydroshare.org/resource/c97c32ecf59b4dff90ef013030c54264

Credits

Funding Agencies

This resource was created using funding from the following sources:
Agency Name: National Science Foundation
Award Title: Collaborative Research: Elements: Advancing Data Science and Analytics for Water (DSAW)
Award Number: 1931297

How to Cite

Horsburgh, J. S., A. S. Jones, A. M. Castronova, S. Black (2023). Using Python Packages and HydroShare to Advance Open Data Science and Analytics for Water, HydroShare, http://www.hydroshare.org/resource/4f4acbab5a8c4c55aa06c52a62a1d1fb

This resource is shared under the Creative Commons Attribution (CC BY) license.

http://creativecommons.org/licenses/by/4.0/
