The ENTRAIN project is a feasibility study within the NERC Constructing a Digital Environment Strategic Priorities Fund Programme, led by the Centre for Ecology & Hydrology in partnership with the University of Lincoln School of Computer Science.
“The Constructing a Digital Environment Strategic Priorities Fund programme aims to develop the digitally enabled environment which benefits policymakers, businesses, communities and individuals. This will happen by creating an integrated network of sensors (in situ and remote sensing based), methodologies and tools for assessing, analysing, monitoring and forecasting the state of the natural environment. This will be done at higher spatial resolutions and at higher frequency than previously possible. This would support responses to acute events but also inform our understanding of long-term environmental change. Multi-disciplinary and inter-disciplinary research and innovation will aid in the successful construction of a 'digital environment'.”
ENTRAIN aims to assess and test the implementation of new approaches to managing data flows within sensor networks and to integrating data between research and regulatory sensor networks. In particular, it focuses on automating the quality control and analysis of sensor network data using AI and machine learning techniques.
It is focused on three of CEH’s key sensor networks:
- COSMOS-UK: a flagship NERC terrestrial National Capability long-term monitoring network, delivering near real-time measurements of soil moisture and meteorology from ~50 stations across the UK using cutting-edge techniques.
- UK Greenhouse Gas (GHG) Flux Network: a network of 12 Eddy Covariance (EC) flux towers focused on observing land-atmosphere fluxes of carbon dioxide (CO2) and water vapour (evapotranspiration), with some towers also measuring fluxes of other trace greenhouse gases such as methane. The EC technique combines measurements of wind turbulence with fast-response infrared gas analysers, usually collecting data at 20 samples per second and streaming them to CEH servers.
- CEH Thames Initiative Research Platform: a platform providing high-quality weekly water quality data from 23 sites along the River Thames and its major tributaries; hourly nutrient and water quality data from a range of auto-analysers and sondes at two automated monitoring stations in the lower Thames; and novel biological data, such as weekly cell abundances of diatoms, algae and cyanobacteria measured by flow cytometry at all 23 sites. The platform works in partnership with the Environment Agency’s National Water Quality Instrumentation Service (EA-NWQIS).
ENTRAIN will enhance these networks and use the data from the networks to assess opportunities and approaches to sensor data storage, integration, automated quality control, and analysis using machine learning.
It is structured around five work packages:
- Data Integration Structures: assessing sensor data storage approaches, and metadata to provide detailed sensor measurement descriptions appropriate for network integration and machine-actionable analyses, linking sensor measurements across the freshwater environment using digital representations of rivers.
- AI for QC and gap-filling: assessing new approaches for automated quality control and infilling of sensor network data streams (in particular meteorology and water quality) to reduce manual checking and intervention and produce better and more complete data streams.
- Network requirements and upgrades: assessing and describing best-practice approaches to managing and operating research sensor networks, including in-field processing, data transmission and network security.
- Automation of analyses: moving from semi-manual approaches to sensor data stream analysis to fully automated, end-to-end analyses delivering information products directly to decision-makers, and developing new sensor data streams of snow water equivalent from the COSMOS network, indicators of vegetation greenness from phenocams, and nutrient load apportionment within the Thames.
- Stakeholder engagement: engaging with environmental sensor network operators, assessing the potential for citizen science to contribute to the management of complex sensor networks, and establishing conversations around future opportunities for AI-based solutions using sensor network data.
ENTRAIN is a one-year project running to April 2020.
Outputs to date:
- New real-time sensor data streams implemented within the COSMOS-UK network: Snow Water Equivalent / snow days, and RGB signal from fields using phenocams (for crop growth / land cover change detection).
- Sensor data and metadata standards review.
- GitHub repository of sensor metadata, including examples.
- Review of time series database technologies.
- Workshop on machine learning for water quality analysis (November 2019) with researchers, regulators, and data and machine learning experts from the Environment Agency, UKCEH, the University of Lincoln, the University of Durham and the Turing Institute.
- New toolset for river network analysis, using graph structures and the Python NetworkX package, for identifying upstream and downstream river stretches and linking features such as sewage treatment works effluent discharge locations and water quality monitoring sites.
- CEH river network converted to graph format and linked to monitoring sites within the EA Water Quality Data Archive.
- EGU presentation on river network analysis. Fry, M.J., Rozecky, J. 2020. Graph-based river network analysis for rapid discovery and analysis of linked hydrological data. EGU General Assembly 2020. https://doi.org/10.5194/egusphere-egu2020-17318.
- Chivers, B.D., Wallbank, J., Cole, S.J., Sebek, O., Stanley, S., Fry, M., Leontidis, G. 2020. Imputation of missing sub-hourly precipitation data in a large sensor network: a machine learning approach. Journal of Hydrology, ISSN 0022-1694. https://doi.org/10.1016/j.jhydrol.2020.125126.
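
As an illustration of the graph-based river network analysis listed among the outputs above, the following is a minimal sketch of how upstream and downstream stretches might be queried with NetworkX. It is not the ENTRAIN toolset itself: the stretch identifiers, feature names, the "features" node attribute and the helper functions build_river_graph and linked_features are all hypothetical, and the toy network is invented for the example. The sketch assumes the river is held as a directed graph whose edges point downstream.

```python
import networkx as nx

# Hypothetical toy network: each node is a river stretch and each edge
# points from a stretch to the stretch immediately downstream of it.
REACHES = [("r1", "r3"), ("r2", "r3"), ("r3", "r4"), ("r4", "r5")]

# Hypothetical features attached to stretches: (stretch, kind, name).
FEATURES = [
    ("r1", "stw_discharge", "STW-A"),
    ("r2", "wq_site", "WQ-1"),
    ("r4", "wq_site", "WQ-2"),
    ("r5", "stw_discharge", "STW-B"),
]


def build_river_graph(reaches, features):
    """Build a directed graph of stretches and attach features as node data."""
    graph = nx.DiGraph()
    graph.add_edges_from(reaches)
    for stretch, kind, name in features:
        graph.nodes[stretch].setdefault("features", []).append((kind, name))
    return graph


def linked_features(graph, stretch, kind, direction="upstream"):
    """Return sorted names of features of one kind upstream or downstream.

    The starting stretch itself is excluded, because nx.ancestors and
    nx.descendants do not include the source node.
    """
    if direction == "upstream":
        stretches = nx.ancestors(graph, stretch)
    elif direction == "downstream":
        stretches = nx.descendants(graph, stretch)
    else:
        raise ValueError("direction must be 'upstream' or 'downstream'")
    names = []
    for s in stretches:
        for feature_kind, name in graph.nodes[s].get("features", []):
            if feature_kind == kind:
                names.append(name)
    return sorted(names)


river = build_river_graph(REACHES, FEATURES)

# Sewage treatment works discharging upstream of the stretch holding WQ-2.
stw_upstream_of_wq2 = linked_features(river, "r4", "stw_discharge", "upstream")

# Water quality sites downstream of the stretch holding STW-A.
wq_downstream_of_stw_a = linked_features(river, "r1", "wq_site", "downstream")
```

Storing the network with edges pointing downstream means that "everything upstream of a site" reduces to a single ancestors query, which is one plausible reason a graph structure suits this kind of linking between discharge locations and monitoring sites.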