Leslie Hsu
U.S. Geological Survey | Coordinator, Community for Data Integration
Subject Areas: Geomorphology, Experimental Data, Geoinformatics
Created: Dec. 6, 2018, 4:50 p.m.
Authors: Leslie Hsu
ABSTRACT:
Pardee Symposium held at the GSA 2018 Annual Meeting in Indianapolis, IN, November 4, 2018.
https://gsa.confex.com/gsa/2018AM/webprogram/Session45446.html
Created: Dec. 6, 2018, 4:59 p.m.
Authors: Richard W. Allmendinger
ABSTRACT:
FROM PUNCH CARDS TO MOBILE APPS: A GEOLOGIST'S 40-YEAR ADVENTURE IN COMPUTING
ALLMENDINGER, Richard W., Department of Earth and Atmospheric Sciences, Cornell University, Snee Hall, Ithaca, NY 14853-1504
Few things have changed more over the last 40 years than computing: from slide rules and expensive calculators (early 1970s), to punch cards (late 1970s and early 1980s), to desktop computers with graphical user interfaces (mid-1980s to 1990s), to laptop computers (1990s to mid-2000s), to the current explosion of mobile devices/apps along with the Internet/Cloud. I started developing apps in the mid-1980s, and today my desktop and mobile apps reach about 50,000 people per year. I will highlight two of my 12 major apps: Stereonet and GMDE (Geologic Map Data Extractor).
Stereonet was first written and distributed in the 1980s for the Mac. Today it is available for the Mac, Windows, and Linux and, although it remains single-user focused, it has been expanded to include visualization of observations in a Google satellite view, export of 3D symbols for plotting in Google Earth, and upload of data directly to the StraboSpot website/database, tagged with StraboSpot-specific nomenclature. Stereonet has also made the jump to iOS, where users can not only view and plot their data on an iPhone or iPad but also use device orientation to make basic measurements in the field. GMDE is also available for all three desktop platforms but not (yet) for mobile devices. In short, GMDE facilitates the task of extracting quantitative data from geologic maps and satellite imagery. A georeferenced basemap with real-time access to elevation at any point from internet elevation services makes it easy to leverage all of the information hidden in a century of high-quality geologic mapping. GMDE specializes in structural calculations: 3-point and piercing-point problems, rapid digitization of existing orientation symbols, topographic sections, and down-plunge projections, as well as an integrated Google satellite view. The digitized data from a static, raster map can be analyzed quantitatively and shared over the Internet to enable new scientific studies. In the future, the algorithms in GMDE can be adapted to enable better geologic mapping itself by allowing the geologist to make real-time calculations in the field that can be interrogated immediately for their significance. After all, technology should not just make our lives easier but enable genuinely new science to be done. http://www.geo.cornell.edu/geology/faculty/RWA/programs/.
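As an illustration of the kind of structural calculation GMDE automates, the following is a minimal sketch (our illustration, not GMDE's actual code) of the classic 3-point problem: recovering the strike and dip of a planar contact from three points with known coordinates and elevations. Coordinates are (east, north, up), and the strike follows the right-hand rule.

```python
import numpy as np

def three_point_problem(p1, p2, p3):
    """Strike and dip (right-hand rule) of the plane through three
    (east, north, up) points, e.g. a contact at three known elevations."""
    p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p1, p2, p3))
    n = np.cross(p2 - p1, p3 - p1)  # plane normal
    if n[2] < 0:                    # force the normal to point upward
        n = -n
    n /= np.linalg.norm(n)
    dip = np.degrees(np.arccos(n[2]))  # angle of plane from horizontal
    # The upward normal leans in the dip direction, so its horizontal
    # component gives the downhill azimuth.
    dip_dir = np.degrees(np.arctan2(n[0], n[1])) % 360.0
    strike = (dip_dir - 90.0) % 360.0  # right-hand rule
    return strike, dip

# A contact dropping 10 m eastward over 1 km: strike ~000, dip ~0.6 E.
print(three_point_problem((0, 0, 100), (1000, 0, 90), (0, 1000, 100)))
```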
Created: Dec. 6, 2018, 5:43 p.m.
Authors: Tikoff, Basil
ABSTRACT:
TIKOFF, Basil1, WALKER, J. Douglas2, NEWMAN, Julie3, WILLIAMS, Randolph T.1, ASH, Jason4, GOOD, Jessica2, NOVAK, Nathan2, BUNSE, Emily Grace4, CHATZARAS, Vasileios1, DUNCAN, Casey J.5, CUNNINGHAM, Hannah3, KAHN, Maureen1, ROBERTS, Nicolas M.6, SNELL, Alexandra K.3, CHAN, Marjorie A.7, KAMOLA, Diane L.4, GLAZNER, Allen8, SCHOENE, Blair9, SPEAR, Frank S.10 and SKEMER, Philip11, (1)Department of Geoscience, University of Wisconsin-Madison, 1215 W Dayton St, Madison, WI 53706, (2)Department of Geology, The University of Kansas, 1475 Jayhawk Blvd., Lindley Hall, Lawrence, KS 66045, (3)Department of Geology and Geophysics, Texas A&M University, College Station, TX 77843, (4)Department of Geology, University of Kansas, Lawrence, KS 66045, (5)University of Utah, Salt Lake City, UT 84108, (6)Department of Geology, Carleton College, Northfield, MN 55057, (7)Dept. of Geology and Geophysics, Univ of Utah, 135 South 1460 East, Room 719, Salt Lake City, UT 84112, (8)Department of Geological Sciences, University of North Carolina at Chapel Hill, 107 Mitchell Hall CB 3315, Chapel Hill, NC 27599-3315, (9)Department of Geosciences, Princeton University, Guyot Hall, Princeton, NJ 08544, (10)Earth and Environmental Sciences, Rensselaer Polytechnic Institute, 110 8th St., Troy, NY 12180, (11)Dept. of Earth and Planetary Sciences, Washington University in St Louis, Saint Louis, MO 63130
The StraboSpot data system was initially designed for the structural geology and tectonics community to standardize structural geology data collection, facilitate sharing of primary data, and promote interactions with other geoscience communities. In order to address the "long tail" of community adoption of digital data storage, we are expanding the data system to include other aspects of spatially based geological field data that are relevant to tectonics.
StraboSpot uses two central concepts to organize the data: Spots and Tags. A Spot is any observation that characterizes a specific area. Spots are related in a purely spatial manner (one Spot encloses another Spot, which encloses another, etc.). Tags, which have no inherent spatial meaning, link conceptually related Spots. The data system is based on a graph database, rather than a relational database approach, to increase flexibility and allow geologically realistic relationships between observations and measurements. StraboSpot can be used either as: 1) a field-based application that runs on iOS and Android mobile devices in both internet-connected and disconnected environments; or 2) a desktop system that runs only in connected settings and directly addresses the back-end database.
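To make the Spot/Tag distinction concrete, here is a toy sketch (our illustration, not the actual StraboSpot schema): Spots nest spatially, while Tags group Spots conceptually regardless of location.

```python
from dataclasses import dataclass, field

@dataclass
class Spot:
    name: str
    data: dict = field(default_factory=dict)      # observations/measurements
    children: list = field(default_factory=list)  # spatially enclosed Spots
    tags: set = field(default_factory=set)        # conceptual groupings

outcrop = Spot("outcrop-1")
shear_zone = Spot("shear-zone-A", data={"foliation": {"strike": 40, "dip": 65}})
thin_section = Spot("ts-17", data={"grain_size_mm": 0.2})

outcrop.children.append(shear_zone)       # spatial enclosure: outcrop > shear zone
shear_zone.children.append(thin_section)  # ... > thin section

for spot in (shear_zone, thin_section):
    spot.tags.add("D2 deformation")       # a Tag links related Spots non-spatially
```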
We leverage the same framework to expand usage to microstructures, experimental deformation, sedimentology, and petrology. The Spot-based system maintains spatial coverage from outcrops to thin sections, with the same approach usable within a thin section (including a gridding system). For experimental deformation, data about experiments can be directly linked to thin sections. For sedimentological data, we have introduced a Strat (stratigraphic) section view, in addition to the map view, and practitioners can toggle between these views. Additions for petrology include functionality for mineralogy and igneous and metamorphic features. We are also developing interoperability between StraboSpot, MetPetDB, and EarthChem databases.
Created: Dec. 6, 2018, 5:47 p.m.
Authors: Czaplewski, John
ABSTRACT:
CZAPLEWSKI, John, Department of Geoscience, University of Wisconsin-Madison, 1215 W Dayton St, Madison, WI 53706 and PETERS, Shanan E., Department of Geoscience, University of Wisconsin–Madison, 1215 W. Dayton St, Madison, WI 53706
Rockd, a mobile application for iOS and Android, is designed to leverage location-aware internet devices for the purpose of discovering and documenting geological data and information while in the field. By combining a number of open access geoscientific resources (i.e., Macrostrat, GeoDeepDive, GPlates, Paleobiology Database, and Mindat), Rockd pushes basic geological information to users anywhere on Earth, with local precision and accuracy varying regionally based on data availability. By summarizing location-specific data and information and distributing it to users in an interactive application that functions on- and offline, new field observations made by the nearly 10,000 registered users of Rockd can be contextualized and explicitly related to existing databases. All user-contributed photos and descriptions can be shared with any other internet-connected user via URLs that are created for all user-contributed data. By succinctly compiling and summarizing information from many different sources, Rockd allows users to focus on making new observations that can improve accuracy and completeness of local information, while unregistered App users gain insights into their geological surroundings without needing to be immersed in the intricacies of multiple databases. Participation in Rockd is enhanced by adhering to design principles that focus on user experience and by including gaming-like elements of interactivity. From an informatics point of view, Rockd provides a vehicle to generate large numbers of geological field-contextualized images and descriptions, opening up the possibility of applying data science approaches to outcrop location and characterization.
Created: Dec. 6, 2018, 5:50 p.m.
Authors: Loeffler, Shane
ABSTRACT:
LOEFFLER, Shane1, MYRBO, Amy2, MCEWAN, Reed3, AI, Sijia4 and MORRISON, Alexander4, (1)Flyover Country, Department of Earth Sciences, University of Minnesota, Minneapolis, MN 55455, (2)LacCore/CSDCO, Department of Earth Sciences, University of Minnesota, 500 Pillsbury Dr. SE, Minneapolis, MN 55455, (3)Institute for Health Informatics & Academic Health Center, University of Minnesota, Minneapolis, MN 55455, (4)LacCore/CSDCO, Department of Earth Sciences, University of Minnesota, Minneapolis, MN 55455
The Flyover Country mobile app builds on the recent successes the geoinformatics community has made in improving the interoperability of geoscience databases and tools. The app visualizes data from many different geoscience disciplines in a map-based view that works worldwide. All data can be saved offline, which is useful in outreach (airplane window seat, road trip, or hike), research (bringing data into remote field areas), and education (self-guided field trips). The ability to easily visualize data from several geoscience disciplines simultaneously can prompt new hypotheses and lead to new discoveries. Flyover Country continues to add new data sources and expand the reach of the app, with the aim of providing geoscience information to the in-flight entertainment systems of airlines in the future.
Created: Dec. 6, 2018, 5:58 p.m.
Authors: Peters, Shanan E.
ABSTRACT:
PETERS, Shanan E., Department of Geoscience, University of Wisconsin–Madison, 1215 W. Dayton St, Madison, WI 53706, CZAPLEWSKI, John, Department of Geoscience, University of Wisconsin-Madison, 1215 W Dayton St, Madison, WI 53706 and HUSSON, Jon M., School of Earth and Ocean Sciences, Bob Wright Centre, 3800 Finnerty Rd, Victoria, BC V8P 5C2, Canada
Characterizing the lithology, age, and physical-chemical properties of rocks and sediments in the Earth's upper crust is necessary to fully assess energy, water, and mineral resources and to address many fundamental research questions in the geo- and paleobiosciences. Although a large number of geological maps, regional geological syntheses, and sample-based measurements have been produced, there is no openly available system that integrates all such rock record-derived data, while also facilitating large-scale, quantitative characterization of the volume, age, and material properties of the upper crust. Here, we describe Macrostrat, a relational geospatial database and supporting cyberinfrastructure that is designed to power geo-applications and to enable quantitative spatial and geochronological analyses of the entire assemblage of surface and subsurface sedimentary, igneous, and metamorphic rocks. Macrostrat contains general, comprehensive summaries of the age and properties of over 34,000 lithologically and chronologically defined geological units distributed across nearly 1,500 regions in the Americas, the Caribbean, New Zealand, and the deep sea. Sample-derived data, including fossil occurrences in the Paleobiology Database, more than 3 million geochemical and outcrop-derived measurements, and more than 2.3 million bedrock geologic map units from over 200 map sources, are linked to Macrostrat units and/or lithologies. The database has generated numerous quantitative results and is used as a data platform in several independently developed applications, but it is necessary to expand geographic coverage and to continuously refine age models and material properties to arrive at a more precise and accurate characterization of the upper crust.
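Macrostrat is described above as powering geo-applications; programmatic access goes through its public web API. A minimal sketch follows (route, parameter, and response-envelope names are assumptions to verify against the live API documentation):

```python
import requests

# Hypothetical-but-typical query against Macrostrat's public API
# (check https://macrostrat.org/api for the current routes and params).
resp = requests.get(
    "https://macrostrat.org/api/units",
    params={"interval_name": "Permian"},  # units overlapping the Permian
    timeout=30,
)
resp.raise_for_status()
# Responses are assumed to be wrapped as {"success": {"data": [...]}}.
units = resp.json().get("success", {}).get("data", [])
print(len(units), "Permian-aged units returned")
```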
Created: Dec. 6, 2018, 6:11 p.m.
Authors: Goring, Simon
ABSTRACT:
GORING, Simon, Department of Geography, University of Wisconsin, 550 N Park St, Madison, WI 53706
The Neotoma Paleoecology Database serves global change science by providing a community-curated data resource (CCDR) for paleoecological and associated paleoenvironmental data. Neotoma currently holds over 4 million individual observations in over 31,000 datasets and 15,000 sites. Major dataset types stored include fossil pollen, vertebrate records, diatoms, ostracodes, testate amoebae, insects, macroinvertebrates, and charcoal, and the data model can be readily extended to other data types. The database also stores 5,000 geochronological age controls, mostly radiocarbon dates, along with associated age-depth model metadata and age inferences. Neotoma includes surface sample datasets with associated environmental variables for data calibration. Data upload, cleaning, and curation are performed by Data Stewards using the Tilia software system, with validation steps including checks of variable names, geographic coordinates, and site name duplication. Neotoma data can be explored and visualized using the map-based Neotoma Explorer and obtained using RESTful Application Programmatic Interfaces (APIs) and the neotoma R package. Third-party websites and apps drawing on Neotoma include the NOAA WDC-Paleoclimatology data portal, the Earth Life Consortium APIs for paleobiological data, the Global Pollen Project, and Flyover Country. Neotoma is governed by an elected Neotoma Leadership Council and welcomes community data contributions, new members, and new stewards.
Created: Dec. 6, 2018, 6:18 p.m.
Authors: Noren, Anders
ABSTRACT:
NOREN, Anders, CSDCO / LacCore, University of Minnesota, 116 Church St SE, Minneapolis, MN 55455
Open Core Data, a collaboration between the Consortium for Ocean Leadership, the Continental Scientific Drilling Coordination Office (CSDCO), and the Interdisciplinary Earth Data Alliance (IEDA), aims to be a key linked open data resource for scientific drilling and coring data from both continents and oceans. Guided by FAIR principles, Open Core Data will hold data generated at facilities (CSDCO/LacCore and JANUS data from the JOIDES Resolution at the start), provide semantic enhancement of ingested datasets, and offer standards-based, human- and machine-readable formats for discovery and access through multiple means: a simple web-browser user interface, programmatic data access, and web services for data systems requiring drilling and coring information (e.g., the Neotoma Paleoecology Database, Paleobiology Database, GPlates, MagIC Magnetics Database, National Geothermal Data System, and archives such as NOAA and PANGAEA). This approach is motivated by the recognition that different scientific communities have varying reasons and requirements for access to drilling data. By leveraging the similar structures of drilling and coring data across institutions, Open Core Data can serve multiple communities and institutions for data discovery, access, and distribution, utilizing the best technological resources available, and it can provide a common platform for development of tools for data visualization and other purposes. It is designed to enable future extension and support of additional scientific communities, including polar coring and drilling, and the large marine coring community.
Created: Dec. 6, 2018, 6:20 p.m.
Authors: Markey, Kelsey
ABSTRACT:
CARTER, Megan and LEHNERT, Kerstin A., Lamont-Doherty Earth Observatory, Columbia University, 61 Route 9W, Palisades, NY 10964
(Presented by Kelsey Markey)
Physical samples and the data generated by their study are fundamental to progress across many Earth science disciplines and, thus, should be FAIR (findable, accessible, interoperable, and reusable) to enable future scientific utility and transparency in research. The ability to find and re-use existing sample-based analyses is dependent both on the use of unique sample identifiers and on the quality and accessibility of sample documentation. The System for Earth Sample Registration (SESAR; www.geosamples.org) aids researchers and sample curators in making sample metadata FAIR, and in obtaining IGSNs (International Geo Sample Numbers) as globally unique and persistent identifiers that resolve to sample metadata. IGSNs are used to unambiguously refer to samples in the literature and in databases, allowing disparate analyses of samples to be discovered more readily and linked to each other. Data syntheses like EarthChem's PetDB database, which has served as a pivotal source of geochemical data for more than 20 years, would simply not be possible without the use of unique identifiers and without access to key sample metadata. Further encouraging researchers to obtain and use IGSNs will increasingly enhance discovery of distinct, yet complementary, data produced on the same or similar samples across an even broader range of resources. This presentation and demonstration will show how SESAR services currently support diverse workflows for a broad range of researchers and sample types, and what plans exist for the future. It will also demonstrate how SESAR supports sample-based data syntheses and repositories, especially those operated by EarthChem (www.earthchem.org).
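Because an IGSN resolves to its sample metadata, disparate records can be linked programmatically. Below is a minimal sketch of resolving an IGSN (the endpoint, content negotiation, and response format are assumptions based on SESAR's web services, and the IGSN itself is hypothetical):

```python
import requests

# Resolve an IGSN to its SESAR sample metadata (endpoint and response
# format assumed; check SESAR's web-service documentation before use).
igsn = "IEABC0001"  # hypothetical IGSN for illustration only
resp = requests.get(
    f"https://app.geosamples.org/sample/igsn/{igsn}",
    headers={"Accept": "application/json"},  # JSON negotiation assumed
    timeout=30,
)
if resp.ok:
    print(resp.json())  # e.g. sample name, type, location, collector
```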
Created: Dec. 6, 2018, 6:24 p.m.
Authors: Bowring, James F.
ABSTRACT:
BOWRING, James F., WALTON, Julius, MAROTTA, Jake, BARRETT, Ryan and BARRETT, Bryce, Computer Science, College of Charleston, 66 George Street, Charleston, SC 29424
The Cyber Infrastructure Research and Development Lab for the Earth Sciences (CIRDLES.org) is an undergraduate software development facility at the College of Charleston. Faculty and students in CIRDLES collaborate with those from the earth sciences at other institutions to research and develop an ecosystem of software products that support geochronology and interact with products from the System for Earth Sample Registration (SESAR), such as Geochron.org, via the use of the International Geo Sample Number (IGSN), a unique identifier for a sample. A primary goal of this work is to provide facilities in the software products that ease the burden on scientists of 1) collecting and collating metadata about their data and analyses, and 2) registering their samples and results with IGSNs in a way that supports their own and the community’s management, both electronic and physical, of these artifacts.
Our suite of products is under continuous improvement in an open source context, located at github.com/CIRDLES. The suite includes:
1) MARS - middleware for assisting with the registration of samples: MARS is designed to serve the needs of repositories with thousands of legacy samples by providing automation that maps diverse local metadata schemas to SESAR's schema and then supports the bulk upload and registration of sample metadata and the return of newly assigned IGSNs to the repository;
2) Topsoil - an anagram of and replacement for Isoplot: Topsoil is evolving into a visualization library for geochronological data that can be used stand-alone or in other applications and eventually as a web service in support of visualizations of IGSN-registered samples archived in repositories such as Geochron.org;
3) CHRONI - a mobile application that interacts with Geochron.org using IGSNs to provide customizable views of archived data and visual artifacts for scientists in the field;
4) ET_Redux - the workflow automation engine that many geochronologists use for uranium-lead-thorium dating with thermal ionization and laser ablation inductively coupled plasma mass spectrometry (TIMS, LA-ICP-MS) and that supports IGSN-based sample registration;
5) Squid - a new workflow automation tool for sensitive high-resolution ion microprobe (SHRIMP)-based uranium-lead geochronology that will also support IGSN-based sample registration.
Created: Dec. 6, 2018, 6:27 p.m.
Authors: Arrowsmith, Ramón
ABSTRACT:
ARROWSMITH, Ramón, School of Earth and Space Exploration, Arizona State University, Tempe, AZ 85287, CROSBY, Christopher J., UNAVCO, 6350 Nautilus Drive, Boulder, CO 80301 and NANDIGAM, Viswanath, San Diego Supercomputer Center, University of California, San Diego, MC 0505, 9500 Gilman Drive, La Jolla, CA 92093-0505
OpenTopography (OT) democratizes access to topographic data, services, and knowledge, enabling fundamental discoveries and innovative applications. OT focuses on improved data access using best practices in cyberinfrastructure and geoinformatics. We deliver topographic data (laser, radar, and photogrammetry) at a range of scales. We enable efficient access to global raster data (30-100 m/pix), but our emphasis has always been high-resolution topography (HRT; <1 m/pix or >1 sample/sq. meter). OT currently holds 274 lidar point cloud datasets covering ~217,000 km2. More than a trillion points are available for on-demand processing and download. This is a considerable investment in HRT, valued at greater than $30 million, and represents the efforts of the NSF research community, public agencies, and international partners. OT distributes these data at various product levels at no cost to users, saving time and driving scientific innovation and broader impacts. OT has over 22,000 unique visitors per month, and almost 75,000 unique users have accessed data and processing services via OT. In 2017, 66,061 browser-based jobs were run, with another 33,344 jobs via API calls. These computations and analyses support substantial academic, educational, and applied use and reuse of the valuable OT data holdings. OT exemplifies domain cyberinfrastructure evolving to become a production data facility upon which researchers, educators, and many others rely. Our partners depend on OT for data management because of our efficient distribution of data to a wide and diverse community of users, thus increasing the impact and return on investment of the data. OT supports tiered data access, from lightweight network-linked KMZ hillshades all the way to custom derived topographic products such as drainage network properties and solar insolation distributions (for global datasets). Newly implemented browser-based visualization of point cloud datasets enables rich 3D interactions without the need to download data or additional software tools. OT is built on open source software and actively contributes to such efforts.
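As an example of the API access mentioned above, a global-raster request might look like the following sketch (the service path and parameter names follow OT's documented global DEM API as we understand it; the API key is a placeholder):

```python
import requests

# Request a small SRTM subset from OpenTopography's global DEM API.
params = {
    "demtype": "SRTMGL3",           # 90 m SRTM global raster
    "south": 36.7, "north": 36.9,   # small bounding box (decimal degrees)
    "west": -117.3, "east": -117.1,
    "outputFormat": "GTiff",
    "API_Key": "YOUR_KEY_HERE",     # placeholder; OT issues keys to users
}
resp = requests.get("https://portal.opentopography.org/API/globaldem",
                    params=params, timeout=120)
resp.raise_for_status()
with open("dem_subset.tif", "wb") as f:
    f.write(resp.content)           # GeoTIFF raster of the requested area
```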
Created: Dec. 6, 2018, 6:29 p.m.
Authors: Greg Tucker
ABSTRACT:
TUCKER, Gregory E., CIRES & Department of Geological Sciences, University of Colorado, 2200 Colorado Ave, Boulder, CO 80309-0399; Community Surface Dynamics Modeling System (CSDMS), University of Colorado, Campus Box 399, Boulder, CO 80309, HUTTON, Eric, Community Surface Dynamics Modeling System (CSDMS), University of Colorado, Campus Box 399, Boulder, CO 80309 and PIPER, Mark, Community Surface Dynamics Modeling System (CSDMS), University of Colorado, Campus Box 399, Boulder, CO 80309; INSTAAR, University of Colorado, Campus Box 450, 1560 30th St, Boulder, CO 80303
Our planet’s surface is a restless place. Understanding the processes of weathering, erosion, and deposition that shape it is critical for applications ranging from short-term hazard analysis to long-term sedimentary stratigraphy and landscape/seascape evolution. Improved understanding requires computational models, which link process mechanics and chemistry to the observable geologic and geomorphic record. Historically, earth-surface process models have often been complex and difficult to work with. To help improve this situation and make the discovery process more efficient, the CSDMS Python Modeling Tool (PyMT) provides an environment in which community-built numerical models and tools can be initialized and run directly from a Python command line or Jupyter notebook. By equipping each model with a standardized set of command functions, known collectively as the Basic Model Interface (BMI), the task of learning and applying models becomes much easier. Using BMI functions, models can also be coupled together to explore dynamic feedbacks among different earth systems. To illustrate how PyMT works and the advantages it provides, we present an example that couples a terrestrial landscape evolution model (CHILD) with a marine sediment transport and stratigraphy model (SedFlux3D). Experiments with the resulting coupled model provide insights into how terrestrial “signals,” such as variations in mean precipitation, are recorded in deltaic stratigraphy. The example also illustrates the utility of PyMT’s tools, such as the ability to map variables between a regular rectilinear grid and an irregular triangulated grid. By simplifying the process of learning, operating, and coupling models, PyMT frees researchers to focus on exploring ideas, testing hypotheses, and comparing models with data.
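The core of the BMI contract is small enough to sketch. The toy component below (our illustration, not CSDMS code) implements the basic control and data functions that a host like PyMT drives; the full specification adds grid and variable introspection functions.

```python
import numpy as np

class DiffusionBMI:
    """Toy 1-D hillslope-diffusion model exposing core Basic Model
    Interface calls (illustrative only; the full BMI spec adds
    grid/variable metadata functions)."""

    def initialize(self, config=None):
        self._z = np.zeros(100)
        self._z[50] = 1.0                  # initial spike of sediment
        self._kappa, self._dt, self._time = 0.25, 1.0, 0.0

    def update(self):
        # Explicit finite-difference diffusion step on interior nodes.
        self._z[1:-1] += self._kappa * self._dt * np.diff(self._z, 2)
        self._time += self._dt

    def get_value(self, name):
        assert name == "land_surface__elevation"
        return self._z.copy()

    def set_value(self, name, values):
        self._z[:] = values                # hosts couple models by exchanging fields

    def finalize(self):
        pass

model = DiffusionBMI()
model.initialize()
for _ in range(10):
    model.update()                         # a host like PyMT drives this same loop
print(model.get_value("land_surface__elevation").max())
```

Because every component answers the same initialize/update/get_value/set_value calls, a host can march two models forward in lockstep and pass fields between them, which is exactly how the CHILD-SedFlux3D coupling described above is orchestrated.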
Created: Dec. 6, 2018, 6:32 p.m.
Authors: Nathan Lyons
ABSTRACT:
LYONS, Nathan J.1, BANDARAGODA, Christina2, BARNHART, Katherine R.3, GASPARINI, Nicole M.1, HOBLEY, Daniel E.J.4, HUTTON, Eric5, ISTANBULLUOGLU, Erkan2, MOUCHENE, Margaux3, SIDDHARTHA NUDURUPATI, Sai2 and TUCKER, Gregory E.3, (1)Department of Earth and Environmental Sciences, Tulane University, New Orleans, LA 70118, (2)Civil & Environmental Engineering, University of Washington, Seattle, WA 98195, (3)CIRES, University of Colorado, Boulder, CO 80309, (4)School of Earth and Ocean Sciences, Cardiff University, Cardiff, United Kingdom, (5)INSTAAR, University of Colorado, Boulder, CO 80303
Landlab is designed for scientists and students to build numerical landscape models of earth surface dynamics. This open-source software is written in the popular and user-friendly Python programming language. The toolkit includes an engine to construct regular and irregular model grids; a library of components that simulate earth surface processes; support functions for tasks such as reading in a DEM and input variables, setting boundary conditions, and plotting and outputting data; and data structures for storing and operating on datasets with spatial and temporal dimensions. With these tools, a Landlab user can build a unique model to explore earth surface dynamics by coupling process components that act on the data associated with a model grid. This approach eliminates the need to recode fundamental model elements each time a new problem is explored and provides the flexibility to include the processes relevant to the problem. The software, tutorials, and documentation are freely available for download (http://landlab.github.io). Landlab models can also be built and run on HydroShare (www.hydroshare.org), an online collaborative environment for sharing data, models, and code. The software can also be used in geoscience education: Landlab teaching tools illustrate examples of physical processes using numerical models intended for undergraduate and graduate students, including those without computer programming experience (https://github.com/landlab/landlab_teaching_tools). Here we present Landlab and its data capabilities, and demonstrate recent and forthcoming additions to the software.
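A minimal Landlab model illustrates the grid-plus-components pattern described above (a sketch assuming the Landlab 2.x API; component names and signatures should be checked against the current documentation):

```python
from landlab import RasterModelGrid
from landlab.components import LinearDiffuser

# Build a regular grid, attach an elevation field, and diffuse it.
grid = RasterModelGrid((25, 40), xy_spacing=10.0)   # 25 x 40 nodes, 10 m spacing
z = grid.add_zeros("topographic__elevation", at="node")
z += grid.x_of_node * 0.01                          # gentle initial eastward slope

diffuser = LinearDiffuser(grid, linear_diffusivity=0.01)  # m^2/yr
for _ in range(1000):
    diffuser.run_one_step(dt=10.0)                  # advance 10 yr per step

print(z.reshape(grid.shape).mean(axis=1)[:5])       # hillslope relaxing over time
```

Swapping or adding components (e.g., flow routing plus stream-power erosion) changes the physics without touching the grid or field bookkeeping, which is the reuse the abstract emphasizes.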
Created: Dec. 6, 2018, 6:35 p.m.
Authors: Jerad Bales
ABSTRACT:
BALES, Jerad and CASTRONOVA, Anthony, Consortium of Universities for the Advancement of Hydrologic Science, Inc., 150 Cambridgepark Drive, Suite 203, Cambridge, MA 02140
The array of water challenges requires an integrated, multi-disciplinary research-community approach to understand and identify scientifically sound solutions, and it requires information and data infrastructure that enables discovery, dissemination, and transparency. The Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI) was founded on the premise that big hydrologic science demands collaboration and the integration of information from multiple sources to support advancements in hydrologic research, as well as infrastructure for community activities in the hydrologic sciences. CUAHSI has successfully built and maintained innovative and robust tools for data discovery, publication, storage, analysis, modeling, and collaboration, harnessing the data revolution. CUAHSI's Water Data Services support discovery, publication, storage, and re-use of water data and models, as well as tools for collaboration, thereby promoting reproducible discovery in the water sciences through interdisciplinary collaboration. We currently employ two primary technologies developed in collaboration with our community members: the CUAHSI Hydrologic Information System (HIS) and HydroShare. These domain-specific, open-source, interoperable technologies were developed based on input from the hydrologic science research community. All CUAHSI software is open source and developed through an open process, with community code repositories on GitHub that facilitate community use of and contributions to the code. This presentation will focus on HydroShare, a platform for sharing hydrologic resources (data, models, model instances, geographic coverages, etc.) that enables the scientific community to more easily and freely share products resulting from their research, including the data, models, and workflow scripts used to create scientific publications. HydroShare also includes a variety of social functions, such as resource sharing within a specified group, the ability to comment on and rate resources, and support for integrating external applications to view and use resources without downloading them. HydroShare also has applications beyond the hydrologic sciences, as will be demonstrated.
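HydroShare resources can also be discovered programmatically through its REST API; the sketch below assumes the "hsapi" routes and response fields from the HydroShare documentation (verify names before use; CUAHSI also publishes a dedicated Python client):

```python
import requests

# List public HydroShare resources matching a subject keyword
# (route and filter names assumed from the hsapi documentation).
resp = requests.get(
    "https://www.hydroshare.org/hsapi/resource/",
    params={"subject": "groundwater"},
    timeout=30,
)
resp.raise_for_status()
# Paginated responses are assumed to look like {"count": N, "results": [...]}.
for res in resp.json().get("results", [])[:5]:
    print(res.get("resource_title"), "->", res.get("resource_id"))
```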
Created: Dec. 6, 2018, 6:37 p.m.
Authors: Song, Carol X.
ABSTRACT:
SONG, Carol X., Rosen Center for Advanced Computing, Purdue University, 155 South Grant Street, Young Hall, West Lafayette, IN 47907
Science gateways are becoming an integral component of modern collaborative research. They find widespread adoption by research groups to share data, code, and tools both within a project and with the broader community. Sustainability beyond initial funding is a significant challenge for a science gateway to continue to operate, update, and support the communities it serves. MyGeoHub.org is a geospatial science gateway powered by HUBzero. MyGeoHub employs a business model of hosting multiple research projects on a single HUBzero instance to manage gateway operations more efficiently and sustainably while lowering the cost to individual projects. This model allows projects to share the gateway's common capabilities, the underlying hardware and other connected computing resources, and continued maintenance of their sites even after the original funding has run out, allowing time to acquire new funding. MyGeoHub has hosted a number of projects, including hydrologic modeling and data sharing, plant phenotyping, global and local sustainable development, climate variability impacts on crops, and, most recently, modeling of industrial processes to improve reuse and recycling of materials. The shared need to manage, visualize, and process geospatial data across the projects has motivated the Geospatial Data Building Blocks (GABBs) development funded by NSF DIBBs. GABBs provides a "File Explorer"-type user interface for managing geospatial data (no coding needed), a builder for visualizing and exploring geo-referenced data without coding, a Python map library, and other toolkits for building geospatial analysis and computational tools without requiring GIS programming expertise. GABBs can be added to an existing or new HUBzero site, as is the case on MyGeoHub. Teams use MyGeoHub to coordinate project activities, share files and information, and publish tools and datasets (with DOIs), providing not only easy access but also improved reuse and reproducibility of data and code, as the interactive online tools and workflows can be used without downloading or installing software. Tools on MyGeoHub have also been used in courses, training workshops, and summer camps. MyGeoHub supports more than 8,000 users annually.
Created: Dec. 6, 2018, 6:39 p.m.
Authors: Barnes, Jason
ABSTRACT:
BARNES, Jason1, BORDEN, Robert C.2, YUNCU, Bilgen3 and HURLEY, Jim3, (1)Exponent, Inc., Environmental & Earth Sciences, 15375 SE 30th Place, Suite 250, Bellevue, WA 98007, (2)B2E, Inc., 1101 Nowell Road, Raleigh, NC 27607, (3)Draper Aden Associates, 1101 Nowell Road, Raleigh, NC 27607
Decades of research have greatly improved our understanding of environmental remediation. While the results of this work are readily accessible to industry experts and academics, much of this information has not percolated down to the people who actually manage, regulate, and implement projects. This reduces the benefits of the research and increases the costs of managing environmental liabilities, especially at sites where contamination persists after an initial remedy is selected. New approaches are needed to communicate this information to users in a timely and accessible manner.
We are expanding the Environmental Restoration (ER) Wiki, developed by the Department of Defense's ESTCP program, into the Environmental (Enviro) Wiki (www.ENVIRO.wiki) to provide accessible, current information on environmental restoration and other topics, including contaminated sediments, natural resources, water and wastewater, air, and climate change impacts. The overall format is similar to Wikipedia, with short, encyclopedia-style summaries of current information and technical challenges and extensive links to reports and project summaries of research funded by SERDP, ESTCP, and other programs. Each page is prepared by recognized experts and subject to review for accuracy and completeness. Existing and upcoming topics include:
Environmental Restoration
Contaminants (Hydrocarbons, CVOCs, Metals, Energetics, PFASs, NDMA, 1,4-Dioxane, NAPL)
Subsurface Transport and Attenuation (Physical, Chemical and Biological Processes)
Characterization and Monitoring (DPT, Geophysics, LTM, MBTs, CSIA)
Remediation (SVE, Sparging, P&T, Thermal, ISCO, ISCR, Bio, Phyto, ZVI)
Monitored Natural Attenuation (NSZD, Solvents, Abiotic processes)
Energetics (Deposition, Toxicology, Sampling, Treatment)
Sediments (Capping, Dredging, Risk, Sustainability, Life Cycle Analysis)
Regulatory Issues (Alternative Endpoints, Mass Flux, Risk, Modeling, Sustainability)
Energy, Water & Infrastructure Management
Regulatory Issues
Natural Resources
Climate Change Resilience
The wiki is online at www.ENVIRO.wiki. Please check it out and provide us with your input. Suggestions for new topics, website improvements and general impressions are all welcome at http://www.feedback.enviro.wiki.
Created: Dec. 6, 2018, 6:40 p.m.
Authors: Peters, Shanan E.
ABSTRACT:
PETERS, Shanan E.1, ROSS, Ian2, CZAPLEWSKI, John3 and LIVNY, Miron2, (1)Department of Geoscience, University of Wisconsin–Madison, 1215 W. Dayton St, Madison, WI 53706, (2)Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, (3)Department of Geoscience, University of Wisconsin-Madison, 1215 W Dayton St, Madison, WI 53706
Modern scientific databases simplify access to data and information, but a large body of knowledge remains within the published literature and is therefore difficult to access and leverage at scale in scientific workflows. Recent advances in machine reading and learning approaches to converting unstructured text, tables, and figures into structured knowledge bases are promising, but these software tools cannot be deployed for scientific research purposes without access to new and old publications and computing resources. Automation of such approaches is also necessary in order to keep pace with the ever-growing scientific literature. GeoDeepDive bridges the gap between scientists needing to locate and extract information from large numbers of publications and the millions of documents that are distributed by multiple different publishers every year. As of August 2018, GeoDeepDive (GDD) had ingested over 7.4 million full-text documents from multiple commercial, professional society, and open-access publishers. In accordance with GDD-negotiated publisher agreements, original documents and citation metadata are stored locally and prepared for common data mining activities by running software tools that parse and annotate their contents linguistically (natural language processing) and visually (optical character recognition). Vocabularies of terms in domain-specific databases can be labeled throughout the full-text of documents, with results exposed to users via an API. New vocabularies and versions of parsing and annotation tools can be deployed rapidly across all original documents using the distributed computing capacities provided by HTCondor. Downloading, storing, and pre-processing original PDF content from distributed publishers and making these data products available to user applications provides new mechanisms for discovering and using information in publications, augmenting existing databases with new information, and reducing time-to-science.
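The API mentioned above can be exercised in a few lines; the sketch below assumes the snippets route and response envelope documented for GeoDeepDive at the time (treat route, parameter, and field names as assumptions):

```python
import requests

# Find text snippets mentioning a term across GDD's indexed literature
# (see geodeepdive.org/api for the authoritative route documentation).
resp = requests.get(
    "https://geodeepdive.org/api/snippets",
    params={"term": "stromatolite", "full_results": "true"},
    timeout=60,
)
resp.raise_for_status()
# Responses are assumed to be wrapped as {"success": {"data": [...]}}.
snippets = resp.json().get("success", {}).get("data", [])
print(len(snippets), "documents mention 'stromatolite'")
```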
Created: Dec. 6, 2018, 6:42 p.m.
Authors: Fils, Douglas
ABSTRACT:
FILS, Douglas, Ocean Leadership, 1201 New York Ave, NW, 4th Floor, Washington, DC 20005, SHEPHERD, Adam, Woods Hole Oceanographic Institution, 266 Woods Hole Road, Woods Hole, MA 02543-1050 and LINGERFELT, Eric, Earth Science Support Office, Boulder, CO 80304
The growth in the amount of geoscience data on the internet is paralleled by the need to address issues of data citation, access, and reuse. Additionally, new research tools are driving demand for machine-accessible data as part of researcher workflows.
In the commercial sector, elements of this need have been addressed by use of the Schema.org vocabulary, encoded via JSON-LD and coupled with web publishing patterns. Adaptable publishing approaches are already in use by many data facilities as they work to address publishing and FAIR practices. While these often lack structured data elements, the existing workflows could be leveraged to additionally implement schema.org-style publishing patterns.
This presentation will report on work that grew out of the EarthCube Council of Data Facilities, known as Project 418. Project 418 was a proof of concept, funded by the EarthCube Science Support Office, that explored the publishing of JSON-LD with schema.org and extensions by a set of NSF data facilities. The goal was to use this approach to describe dataset resources and to evaluate the use of this structured metadata for discovery. Additionally, we will discuss growing interest by Google and others in leveraging this approach to dataset discovery.
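The publishing pattern itself is compact: a JSON-LD block using the schema.org Dataset type, embedded in a landing page. A minimal sketch built in Python follows (the field values, URLs, and DOI are illustrative placeholders, not Project 418 records):

```python
import json

# Minimal schema.org Dataset description in JSON-LD, of the kind a data
# facility embeds in a landing page's <script type="application/ld+json">.
dataset = {
    "@context": "http://schema.org",
    "@type": "Dataset",
    "name": "Example sediment-core geochemistry",        # illustrative values
    "description": "Major-element geochemistry of core XYZ.",
    "url": "https://example.org/dataset/xyz",            # hypothetical landing page
    "identifier": "https://doi.org/10.xxxx/example",     # placeholder DOI
    "variableMeasured": ["SiO2", "Al2O3", "total organic carbon"],
    "distribution": {
        "@type": "DataDownload",
        "contentUrl": "https://example.org/data/xyz.csv",
        "encodingFormat": "text/csv",
    },
}
print(json.dumps(dataset, indent=2))
```

A harvester like the one described below crawls landing pages, extracts these blocks, and indexes the names, variables, identifiers, and download URLs they declare.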
The work scoped 47,650 datasets from 10 NSF-funded data facilities. Across these datasets, the harvester found 54,665 data download URLs, approximately 560,000 dataset variables, and 35,000 unique identifiers (DOIs, IGSNs, or ORCIDs).
The various publishing workflows used by the participating data facilities will be presented, along with the harvesting and interface developments. We will also describe how resources were indexed into text, spatial, and graph systems and used in search interfaces, as well as future directions now underway that build on this foundation.
Created: Dec. 6, 2018, 6:44 p.m.
Authors: Rubin, Kenneth H.
ABSTRACT:
RUBIN, Kenneth H., Department of Geology and Geophysics, University of Hawaii, Honolulu, HI 96822
EarthCube is an NSF program started in 2011 to better enable geoscience research through cyberinfrastructure for data availability and access. The goal is to improve science workflows, especially for data discovery, access, analysis, and visualization, for individual domain scientists and multidisciplinary teams, and thereby to transform how data-intensive geoscience research is conducted. The long-term vision is to develop interoperable, geoscience-wide capabilities to tackle important research questions in complex, dynamic Earth System processes by building out from existing infrastructure, developing and promoting standards, and educating geoscientists on their adoption. As a community-driven and community-governed effort, with support from the NSF GEO Directorate and the Office of Advanced Cyberinfrastructure, the program spent much of its initial years building a community, exploring ways to address these goals, building demonstration components, and refining our understanding of science workflows across geoscience domains. More than 60 projects have been supported in its first 5 years. During this time, parallel developments in other NSF directorates, data repositories, and elsewhere (e.g., the ESIP community) have raised general awareness of geoscience data needs and best practices. A good example is the FAIR initiative, under which data are made Findable, Accessible, Interoperable, and Reusable. The EarthCube Leadership Council, in consultation with stakeholders, has outlined three priority activities for 2018 and beyond: (a) Scientist Engagement and Science Advancement; (b) Registries for Resource Integration and Reuse; and (c) Scientific Workflow and Data Support. In partnership with upcoming NSF Geo domain data science workshops, and with hopes of partnering with the new NSF-wide Harnessing the Data Revolution initiative, EarthCube is emerging as a central hub to support geoscience and geoinformatics community data needs, to work with other similar entities to engage scientists and learn about and support their data needs, to drive development and implementation of standards through registries and aligned data facilities, and to lower the barrier for scientists to participate in data-intensive projects in all forms. EarthCube's future plans and examples of current and completed efforts will be discussed.