Papers by Boris Faybishenko

Concurrency and Computation: Practice and Experience
Scientific datasets are growing rapidly and becoming critical to next‐generation scientific discoveries. The validity of scientific results relies on the quality of the data used, and data are often subject to change, for example, due to observation additions, quality assessments, or processing software updates. The effects of data change are not well understood and are difficult to predict. Datasets are often repeatedly updated, and recomputing derived data products quickly becomes time-consuming and resource-intensive and may in some cases not even be necessary, thus delaying scientific advances. Despite its importance, there is a lack of systematic approaches for comparing data versions to quantify the changes, and ad hoc or manual processes are commonly used. In this article, we propose a novel hierarchical approach for analyzing data changes, including real‐time (online) and offline analyses. We employ a variety of fast‐to‐compute numerical analyses, graphical data change repr...
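The abstract mentions fast-to-compute numerical comparisons between data versions. As a minimal sketch of that general idea (not the article's hierarchical method), the Python below compares two versions of a keyed numeric dataset and counts added, removed, and changed records together with the largest absolute change. The names `compare_versions` and `VersionDiff`, the dict-of-timestamps layout, and the `tol` parameter are illustrative assumptions.

```python
from typing import Dict, NamedTuple


class VersionDiff(NamedTuple):
    added: int             # keys present only in the new version
    removed: int           # keys present only in the old version
    changed: int           # shared keys whose values differ by more than tol
    max_abs_change: float  # largest absolute difference among shared keys


def compare_versions(old: Dict[str, float], new: Dict[str, float],
                     tol: float = 0.0) -> VersionDiff:
    """Summarize how a keyed numeric dataset changed between two versions.

    Hypothetical helper for illustration; keys are e.g. timestamps.
    """
    shared = old.keys() & new.keys()
    diffs = [abs(new[k] - old[k]) for k in shared]
    return VersionDiff(
        added=len(new.keys() - old.keys()),
        removed=len(old.keys() - new.keys()),
        changed=sum(1 for d in diffs if d > tol),
        max_abs_change=max(diffs, default=0.0),
    )
```

A summary like this is cheap enough to run on every new version, so it could serve as a first screening step before any costly recomputation of derived products.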

Hydrogeology, Chemical Weathering, and Soil Formation
A review of the status of fundamental research into soil genesis and development is given, together with a discussion of the outstanding problems from various perspectives, such as the geological, hydrological, and soil ecological points of view. The urgency of understanding what soil is, how it forms and evolves, relates fundamentally to its connection with the cycling of water and those elements of deep significance to biology, e.g. carbon, nitrogen, and phosphorus, as well as to the uptake and fate of the (mostly) solar energy input. The coupling is inherent in the close relationships between soil genesis and the formative process of chemical weathering, together with its abiotic drawdown of atmospheric carbon, as well as the relationship between soil evolution and the biological processes that change the soil. Each of these phases links soils to the atmospheric carbon composition and the Earth's climate system. More recently, the link between soil formation and Earth's water cycle has become clearer with the recognition that field weathering rates are more likely flux-limited (water, organic acids, and reaction products) than kinetics-limited. While a link between water and CO2 drawdown appears explicit in the photosynthetic reaction, the relationship between plant productivity and transpiration fundamentally links water and cycling of elements such as carbon, nitrogen, and phosphorus. The summary is intended to place the works of the present book into the context of present research efforts and future goals.

Neural Computing and Applications
We present an approach that uses a deep learning model, in particular, a multilayer perceptron (MLP), for estimating the missing values of a variable in multivariate time series data. We focus on filling a long continuous gap (e.g., multiple months of missing daily observations) rather than individual randomly missing observations. Our proposed gap-filling algorithm uses an automated method for determining the optimal MLP model architecture, thus allowing for optimal prediction performance for the given time series. We tested our approach by filling gaps of various lengths (three months to three years) in three environmental datasets with different time series characteristics, namely daily groundwater levels, daily soil moisture, and hourly Net Ecosystem Exchange. We compared the accuracy of the gap-filled values obtained with our approach to the widely used R-based time series gap filling methods and . The results indicate that using an MLP for filling a large gap leads to better resu...

In some cases there may be a milestone where an item is being fabricated, maintenance is being performed on a facility, or a document is being issued through a formal document control process where it specifically calls out a formal review of the document. In these cases, documentation (e.g., inspection report, maintenance request, work planning package documentation, or the documented review of the issued document through the document control process) of the completion of the activity, along with the Document Cover Sheet, is sufficient to demonstrate achievement of the milestone. NOTE 2: If QRL 1, 2, or 3 is not assigned, then the QRL 4 box must be checked, and the work is understood to be performed using laboratory-specific QA requirements. This includes any deliverable developed in conformance with the respective National Laboratory / Participant, DOE or NNSA-approved QA Program. NOTE 3: If the lab has an NQA-1 program and the work to be conducted requires an NQA-1 program, then the QRL-1 box must be checked in the work package and on the Appendix E cover sheet, and the work must be performed in accordance with the Lab's NQA-1 program. The QRL-4 box should not be checked.
LIST OF CONTRIBUTING CO-AUTHORS OF FEEDER REPORTS

Frontiers in Microbiology, 2021
Snowmelt dynamics are a significant determinant of microbial metabolism in soil and regulate global biogeochemical cycles of carbon and nutrients by creating seasonal variations in soil redox and nutrient pools. With increasing concern that climate change accelerates both snowmelt timing and rate, obtaining an accurate characterization of microbial response to snowmelt is important for understanding biogeochemical cycles intertwined with soil. However, observing microbial metabolism and its dynamics non-destructively remains a major challenge for systems such as soil. Microbial volatile compounds (mVCs) emitted from soil represent information-dense signatures and, when assayed non-destructively using state-of-the-art instrumentation such as Proton Transfer Reaction-Time of Flight-Mass Spectrometry (PTR-TOF-MS), provide time-resolved insights into the metabolism of active microbiomes. In this study, we used PTR-TOF-MS to investigate the metabolic trajectory of microbiomes from a sub...

Changes to the Earth’s climate are expected to negatively impact water resources in the future. It is important to have accurate modelling of river flow and water quality to make optimal decisions for water management. Machine learning and deep learning models have become promising methods for making such hydrological predictions. Using these models, however, requires careful consideration of both data constraints and model complexity for a given problem. Here, we use machine learning (ML) models to predict monthly stream water temperature records at three monitoring locations in the Northwestern United States with long-term datasets, using meteorological data as predictors. We fit three ML models: a Multiple Linear Regression, a Random Forest Regression, and a Support Vector Regression, and compare them against two baseline models: a persistence model and a historical model. We show that all three ML models are reasonably able to predict mean monthly stream temperatures with root ...
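For readers unfamiliar with the two baselines, here is a sketch of how they are commonly defined. The abstract does not spell out the study's exact definitions, so treat these as assumptions: the persistence model predicts each month as the previous month's observation, and the historical model predicts each month as the training-period mean for that calendar month. A root-mean-square error helper is included for comparing forecasts (the abstract's own metric is cut off in this excerpt). All function names are hypothetical.

```python
import math
from collections import defaultdict
from typing import List, Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length sequences."""
    if len(obs) != len(pred) or not obs:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def persistence_forecast(series: Sequence[float]) -> List[float]:
    """Predict months 1..n-1 as the previous month's observation."""
    return list(series[:-1])


def historical_forecast(train: Sequence[float], test_months: Sequence[int]) -> List[float]:
    """Predict each test month as the training mean for that calendar month.

    Assumes `train` starts in January (index 0) and `test_months` holds
    calendar-month indices 0..11.
    """
    by_month = defaultdict(list)
    for i, value in enumerate(train):
        by_month[i % 12].append(value)
    climatology = {m: sum(v) / len(v) for m, v in by_month.items()}
    return [climatology[m] for m in test_months]
```

Baselines like these set a floor: an ML model is only useful if its error is clearly lower than simply repeating last month or the seasonal average.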

Communication with stakeholders, regulatory agencies, and the public is an essential part of implementing different remediation and monitoring activities, and of developing site closure strategies at contaminated sites. Modeling of contaminant plume evolution plays a critical role in estimating the benefit, cost, and risk of particular options. At the same time, effective visualization of monitoring data and modeling results is particularly important for conveying the significance of the results and observations. In this paper, we present the results of the Advanced Simulation Capability for Environmental Management (ASCEM) project, including a discussion of the capabilities of the newly developed ASCEM software package, along with its application to the F-Area Seepage Basins located at the U.S. Department of Energy Savannah River Site (SRS). ASCEM software includes state-of-the-art numerical methods for simulating complex flow and reactive transport, as well as various toolsets such as a graphical user interface (GUI), visualization, data management, uncertainty quantification, and parameter estimation. Using this software, we have developed an advanced visualization of tritium plume migration coupled with a data management system, and simulated a three-dimensional model of flow and plume evolution on a high-performance computing platform. We evaluated the effect of engineered flow barriers on a nonreactive tritium plume through advanced plume visualization and modeling of tritium plume migration. In addition, we developed a geochemical reaction network to describe complex geochemical processes at the site, and evaluated the impact of coupled hydrological and geochemical heterogeneity. These results are expected to support SRS's monitoring activities and operational decisions.

Goldschmidt2021 abstracts, 2021

Journal of Machine Learning for Modeling and Computing, 2021
Machine learning can provide sustainable solutions to gap-fill groundwater (GW) data needed to adequately constrain watershed models. However, imputing missing extremes is more challenging than other parts of a hydrograph. To impute missing subhourly data, including extremes, within GW time-series data collected at multiple wells in the East River watershed, located in southwestern Colorado, we consider a single-well imputation (SWI) and a multiple-well imputation (MWI) approach. SWI gap-fills missing GW entries in a well using the same well's time-series data; MWI gap-fills a specific well's missing GW entry using the time series of neighboring wells. SWI takes advantage of linear interpolation and random forest (RF) approaches, whereas MWI exploits only the RF approach. We also use an information entropy framework to develop insights into how missing data patterns impact imputation. We discovered that if gaps were at random intervals, SWI could accurately impute up to 90% of missing data over an approximately two-year period. Contiguous gaps constituted more complex scenarios for imputation and required the use of MWI. Information entropy suggested that if gaps were contiguous, up to 50% of missing GW data could be estimated accurately over an approximately two-year period. The RF-feature importance suggested that a time feature (months) and a space feature (neighboring wells) were the most important predictors in the SWI and MWI. We also noted that neither SWI nor MWI methods could capture the missing extremes of a hydrograph. To counter this, we developed a new sequential approach and demonstrated the imputation of missing extremes in a GW time series with high accuracy.
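The single-well imputation described above combines linear interpolation with random forests. The sketch below covers only the linear-interpolation half, assuming missing entries are stored as None in an evenly spaced series; the function name is hypothetical and the random forest component is omitted.

```python
from typing import List, Optional, Sequence


def linear_interpolate(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill interior None gaps by straight-line interpolation between the
    nearest observed neighbors. Leading and trailing gaps stay None because
    there is no observation on one side to interpolate from."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        if gap <= 1:
            continue  # adjacent observations, nothing to fill
        y0, y1 = values[left], values[right]
        for i in range(left + 1, right):
            filled[i] = y0 + (y1 - y0) * (i - left) / gap
    return filled
```

Because every filled value lies on the line between its two neighbors, an interpolated gap can never rise above or fall below the observations at its edges, which is one intuition for why interpolation alone cannot recover missing extremes.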

New Phytologist, 2021
Deep‐water access is arguably the most effective, but under‐studied, mechanism that plants employ to survive during drought. Vulnerability to embolism and hydraulic safety margins can predict mortality risk at given levels of dehydration, but deep‐water access may delay plant dehydration. Here, we tested the role of deep‐water access in enabling survival within a diverse tropical forest community in Panama using a novel data‐model approach. We inversely estimated the effective rooting depth (ERD, defined as the average depth of water extraction) for 29 canopy species by linking diameter growth dynamics (1990–2015) to vapor pressure deficit, water potentials in the whole‐soil column, and leaf hydraulic vulnerability curves. We validated ERD estimates against existing isotopic data of potential water‐access depths. Across species, deeper ERD was associated with higher maximum stem hydraulic conductivity, greater vulnerability to xylem embolism, narrower safety margins, and lower mort...

Representativeness and quality of collected meteorological data impact accuracy and precision of climate, hydrological, and biogeochemical analyses and predictions. We developed a comprehensive Quality Assurance (QA) and Quality Control (QC) statistical framework, consisting of three major steps. Step 1: preliminary data exploration, i.e., processing of raw datasets, with the challenging problems of time formatting and combining datasets of different lengths and different time intervals. Step 2: QA of the datasets, including detecting and flagging of duplicates, outliers, and extreme data. Step 3: the development of time series of a desired frequency, imputation of missing values, visualization, and a final statistical summary. The paper includes two use cases based on the time series data collected at the Billy Barr meteorological station (East River Watershed, Colorado), and the Barro Colorado Island (BCI, Panama) meteorological station. The developed statistical methods are suita...
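Step 2 of the framework flags duplicates and outliers. The excerpt does not say which outlier test the authors used, so the sketch below assumes Tukey's interquartile-range fences (k = 1.5) and treats a repeated timestamp as a duplicate. The `(timestamp, value)` record layout and the function name `qa_flags` are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def qa_flags(records: Sequence[Tuple[str, float]], k: float = 1.5) -> Dict[str, List[int]]:
    """Return indices of records flagged as duplicates or outliers.

    A record is a duplicate if its timestamp already appeared earlier in the
    sequence (the first occurrence stays unflagged). A value is an outlier if
    it lies outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].
    """
    seen = set()
    duplicates = []
    for i, (ts, _) in enumerate(records):
        if ts in seen:
            duplicates.append(i)
        seen.add(ts)

    values = [v for _, v in records]
    q1, _, q3 = statistics.quantiles(values, n=4)  # needs >= 2 values
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [i for i, v in enumerate(values) if v < low or v > high]
    return {"duplicate": duplicates, "outlier": outliers}
```

In practice one might drop duplicates before computing the quartiles so that repeated records do not shift the fences; flagging rather than deleting keeps the decision reviewable.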

Frontiers in Water, 2020
Meteorological records, including precipitation, commonly have missing values. Accurate imputation of missing precipitation values is challenging, however, because precipitation exhibits a high degree of spatial and temporal variability. Data-driven spatial interpolation of meteorological records is an increasingly popular approach in which missing values at a target station are imputed using synchronous data from reference stations. The success of spatial interpolation depends on whether precipitation records at the target station are strongly correlated with precipitation records at reference stations. However, the need for reference stations to have complete datasets implies that stations with incomplete records, even though strongly correlated with the target station, are excluded. To address this limitation, we develop a new sequential imputation algorithm for imputing missing values in spatio-temporal daily precipitation records. We demonstrate the benefits of sequential imputation by incorporating it within a spatial interpolation based on a Random Forest technique. Results show that for reliable imputation, having a few strongly correlated references is more effective than having a larger number of weakly correlated references. Further, we observe that sequential imputation becomes more beneficial as the number of stations with incomplete records increases. Overall, we present a new approach for imputing missing precipitation data which may also apply to other meteorological variables.

International Journal of Environmental Research and Public Health, 2020
The fate of water and water-soluble toxic wastes in the subsurface is of high importance for many scientific and practical applications. Although solute transport is proportional to water flow rates, theoretical and experimental studies show that heavy-tailed (power-law) solute transport distribution can cause chemical transport retardation, prolonging clean-up time-scales greatly. However, no consensus exists as to the physical basis of such transport laws. In percolation theory, the scaling behavior of such transport rarely relates to specific medium characteristics, but strongly to the dimensionality of the connectivity of the flow paths (for example, two- or three-dimensional, as in fractured-porous media or heterogeneous sediments), as well as to the saturation characteristics (i.e., wetting, drying, and entrapped air). In accordance with the proposed relevance of percolation models of solute transport to environmental clean-up, these predictions also prove relevant to transpor...

Forest disturbance and regrowth are key processes in forest dynamics, but detailed information on these processes is difficult to obtain in remote forests such as the Amazon. We used chronosequences of Landsat satellite imagery to determine the sensitivity of surface reflectance in all spectral bands to windthrow, clearcutting, and burning, and to their successional pathways of forest regrowth in the Central Amazon. We also assessed whether the forest demography model Functionally Assembled Terrestrial Ecosystem Simulator (FATES), implemented in the Energy Exascale Earth System Model (E3SM) Land Model (ELM), ELM-FATES, accurately represents the changes for windthrow and clearcut. The results show that all Landsat spectral bands were sensitive to the disturbances, but after 3 to 6 years only the Near Infrared (NIR) band showed significant changes associated with the successional pathways of forest regrowth for all the disturbances considered. In general, NIR decreased immediately after disturbance, increased to maximum values with the establishment of pioneer and early-successional tree species, and then decreased slowly and almost linearly to pre-disturbance conditions with the dynamics of forest succession. Statistical methods predict that NIR will return to pre-disturbance values in about 39 years after windthrow (consistent with observational data of biomass regrowth following windthrows), and in about 36 and 56 years after clearcut and burning, respectively. The NIR captured the observed successional pathways of forest regrowth after clearcut and burning, which diverge through time. ELM-FATES predicted higher peaks of initial forest responses (e.g., biomass, stem density) after clearcuts than after windthrows, similar to the changes in NIR. However, ELM-FATES predicted a faster recovery of forest structure and canopy coverage to pre-disturbance conditions for windthrows than for clearcuts. The similarity of the ELM-FATES regrowth patterns after windthrow and clearcut to those of the NIR suggests that the dynamics of forest regrowth for these disturbances are represented with appropriate fidelity within ELM-FATES and useful as a benchmarking tool.

Plant functional traits determine vegetation responses to environmental variation, but variation in trait values is large, even within a single site. Likewise, uncertainty in how these traits map to Earth system feedbacks is large. We use a vegetation demographic model (VDM), the Functionally Assembled Terrestrial Ecosystem Simulator (FATES), to explore the parameter sensitivity of model predictions, and to compare predictions with observations, at a tropical forest site: Barro Colorado Island in Panama. We define a single 12-dimensional distribution of plant trait variation, derived primarily from observations in Panama, and define plant functional types (PFTs) as random draws from this distribution. We compare several model ensembles, where individual ensemble members vary only in the plant traits that define PFTs, and separate ensembles differ from each other based on either model structural assumptions or non-trait, ecosystem-level parameters, which include (a) the number of competing PFTs present in any simulation and (b) parameters that govern dis...
Published by Copernicus Publications on behalf of the European Geosciences Union.

Oecologia, 2019
Transpiration in humid tropical forests modulates the global water cycle and is a key driver of climate regulation. Yet, our understanding of how tropical trees regulate sap flux in response to climate variability remains elusive. With a progressively warming climate, atmospheric evaporative demand [i.e., vapor pressure deficit (VPD)] will be increasingly important for plant functioning, becoming the major control of plant water use in the twenty-first century. Using measurements in 34 tree species at seven sites across a precipitation gradient in the neotropics, we determined how the maximum sap flux velocity (v_max) and the VPD threshold at which v_max is reached (VPD_max) vary with precipitation regime [mean annual precipitation (MAP); seasonal drought intensity (P_DRY)] and two functional traits related to the foliar and wood economics spectra [leaf mass per area (LMA); wood specific gravity (WSG)]. We show that, even though v_max is highly variable within sites, it follows a negative trend in response to increasing MAP and P_DRY across sites. LMA and WSG exerted little effect on v_max and VPD_max, suggesting that these widely used functional traits provide limited explanatory power for dynamic plant responses to environmental variation within hyper-diverse forests. This study demonstrates that long-term precipitation plays an important role in the sap flux response of humid tropical forests to VPD. Our findings suggest that under higher evaporative demand, trees growing in wetter environments in humid tropical regions may be subjected to reduced water exchange with the atmosphere relative to trees growing in drier climates.
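Because VPD is central to this study, the sketch below shows one common way to compute it from air temperature and relative humidity, using the FAO-56 saturation vapor pressure equation. This is a standard convention, not necessarily the formula used by the authors; the function names are illustrative.

```python
import math


def saturation_vapor_pressure_kpa(temp_c: float) -> float:
    """Saturation vapor pressure (kPa) at air temperature temp_c (deg C), FAO-56 Eq. 11."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def vpd_kpa(temp_c: float, rel_humidity_pct: float) -> float:
    """Vapor pressure deficit (kPa): saturation minus actual vapor pressure,
    with actual vapor pressure = saturation * RH / 100."""
    if not 0.0 <= rel_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be in [0, 100] percent")
    es = saturation_vapor_pressure_kpa(temp_c)
    return es * (1.0 - rel_humidity_pct / 100.0)
```

At 20 °C the saturation vapor pressure is about 2.34 kPa, so air at 50% relative humidity has a VPD of roughly 1.17 kPa.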

Journal of Hydrology and Hydromechanics, 2018
Accurate estimates of infiltration and groundwater recharge are critical for many hydrologic, agricultural and environmental applications. Anticipated climate change in many regions of the world, especially in tropical areas, is expected to increase the frequency of high-intensity, short-duration precipitation events, which in turn will affect the groundwater recharge rate. Estimates of recharge are often obtained using monthly or even annually averaged meteorological time series data. In this study we employed the HYDRUS-1D software package to assess the sensitivity of groundwater recharge calculations to using meteorological time series of different temporal resolutions (i.e., hourly, daily, weekly, monthly and yearly averaged precipitation and potential evaporation rates). Calculations were applied to three sites in Brazil having different climatological conditions: a tropical savanna (the Cerrado), a humid subtropical area (the temperate southern part of Brazil), and a very wet ...
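To illustrate why temporal resolution matters, the toy sketch below averages an hourly precipitation series into daily means and compares how much water exceeds a fixed infiltration capacity at each resolution. The constant capacity, the synthetic storm, and the function names are assumptions for illustration; HYDRUS-1D solves the Richards equation and is far more detailed than this.

```python
from typing import List, Sequence


def block_average(rates: Sequence[float], block: int) -> List[float]:
    """Average consecutive, non-overlapping blocks of `block` values (e.g. 24
    hourly rates -> one daily mean rate). A trailing partial block is averaged
    over the values it actually contains."""
    if block < 1:
        raise ValueError("block must be >= 1")
    return [sum(rates[i:i + block]) / len(rates[i:i + block])
            for i in range(0, len(rates), block)]


def excess_over_capacity(rates: Sequence[float], capacity: float, dt: float) -> float:
    """Depth of water arriving faster than a fixed infiltration capacity.

    `rates` and `capacity` share units (e.g. mm/h) and `dt` is the step length
    in the same time unit, so the result is a depth (e.g. mm)."""
    return sum(max(r - capacity, 0.0) * dt for r in rates)
```

In a synthetic day with a 2-hour storm at 30 mm/h and a capacity of 10 mm/h, the hourly series yields 40 mm of excess, while the daily mean of 2.5 mm/h yields none, even though the daily total (60 mm) is the same.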

The New phytologist, 2018
The fate of tropical forests under climate change is unclear as a result, in part, of the uncertainty in projected changes in precipitation and in the ability of vegetation models to capture the effects of drought-induced mortality on aboveground biomass (AGB). We evaluated the ability of a terrestrial biosphere model with demography and hydrodynamics (Ecosystem Demography, ED2-hydro) to simulate AGB and mortality of four tropical tree plant functional types (PFTs) that operate along light- and water-use axes. Model predictions were compared with observations of canopy trees at Barro Colorado Island (BCI), Panama. We then assessed the implications of eight hypothetical precipitation scenarios, including increased annual precipitation, reduced inter-annual variation, El Niño-related droughts and drier wet or dry seasons, on AGB and functional diversity of the model forest. When forced with observed meteorology, ED2-hydro predictions capture multiple BCI benchmarks. ED2-hydro predicts...