Hybrid physics-AI Outperforms Numerical Weather Prediction For Extreme Precipitation Nowcasting
https://doi.org/10.1038/s41612-024-00834-8
Precipitation nowcasting, which is critical for flood emergency and river management, has remained
challenging for decades, although recent developments in deep generative modeling (DGM) suggest
the possibility of improvements. River management centers, such as the Tennessee Valley Authority,
have been using Numerical Weather Prediction (NWP) models for nowcasting, but they have been
struggling with missed detections even from best-in-class NWP models. While decades of prior
research achieved limited improvements beyond advection and localized evolution, recent attempts
have shown progress from so-called physics-free machine learning (ML) methods, and even greater
improvements from physics-embedded ML approaches. Developers of DGM for nowcasting have
compared their approaches with optical flow (a variant of advection) and meteorologists’ judgment,
but not with NWP models. Further, they have not conducted independent co-evaluations with water
resources and river managers. Here we show that the state-of-the-art physics-embedded deep
generative model, specifically NowcastNet, outperforms the High Resolution Rapid Refresh (HRRR)
model, which is the latest generation of NWP, along with advection and persistence, especially for
heavy precipitation events. Thus, for grid-cell extremes over 16 mm/h, NowcastNet demonstrated a
median critical success index (CSI) of 0.30, compared with median CSI of 0.04 for HRRR. However,
despite hydrologically-relevant improvements in point-by-point forecasts from NowcastNet, caveats
include overestimation of spatially aggregated precipitation over longer lead times. Our co-evaluation
with ML developers, hydrologists and river managers suggests the possibility of improved flood
emergency response and hydropower management.
Flooding, a prevalent weather hazard, impacts numerous regions globally, causing economic damage and disruption each year. In the United States alone, between 1980 and 2019, flooding resulted in losses totaling $146.5 billion and claimed the lives of 555 individuals, as reported by NOAA’s National Centers for Environmental Information1. The impacts of floods and especially flash floods extend beyond immediate human and infrastructural losses to encompass critical infrastructure such as hydropower operations and dam management2. In the Southeastern United States (SEUS), flash floods are a major concern due to their sudden and severe nature3,4. The Tennessee Valley Authority (TVA), which manages the Tennessee River system in Tennessee and six surrounding states in the SEUS, often has to deal with flash floods, primarily triggered by mesoscale convective systems (MCSs). A noteworthy example is the devastating flood in middle Tennessee, in August 2021, which resulted in the loss of 20 lives and more than $100 million in property damage5. Similarly, the Damodar Valley Corporation, modeled after the TVA6, manages the Damodar River in West Bengal, India, and has been struggling with unpredictable floods7 despite infrastructural advancements. Other examples of river management centers dealing with deadly flash floods include the Società Adriatica di Elettricità in Italy (1963 Vajont Dam failure, 2000 people killed)8, the Nile River Basin Authority in Egypt (2015 Alexandria and Nile Delta floods, 17 deaths)9, and the Kerala Water Resources Department in India (2018 flood
1Sustainability and Data Sciences Laboratory, Northeastern University, Boston, MA, USA. 2The Institute for Experiential AI and Roux Institute, Northeastern
University, Boston, MA, USA. 3Tennessee Valley Authority, Knoxville, TN, USA. 4Zeus AI, Cambridge, MA, USA. 5Environmental Sciences Division, Oak Ridge
National Laboratory, Oak Ridge, TN, USA. 6Research Triangle Institute, Raleigh, NC, USA. 7Pacific Northwest National Laboratory, Richland, WA, USA.
e-mail: [email protected]
in Kerala, 400 deaths)10. To address the challenges posed by emergency flash flood management, short-term Quantitative Precipitation Forecasts (QPF) serve as vital tools by driving the hydrologic and hydraulic models which predict runoff and flooding downstream11,12. Traditional forecasting methods have employed persistence, advection of radar echoes13, Numerical Weather Prediction (NWP) models14, and data-driven extrapolation-based methods15, either individually or in combination16. Although short-term QPFs offer well-documented advantages, this field has long been acknowledged as one of the most challenging in hydrometeorology. Even leading NWP models, such as High Resolution Rapid Refresh (HRRR), often struggle to accurately predict extreme precipitation events17,18, prompting organizations like the TVA to opt for alternative forecasts with coarser spatial and temporal resolution. However, in recent years, with advancements in machine learning, studies have demonstrated that deep learning methods can surpass traditional approaches like persistence, advection, and optical flow16,19–21.
Current machine learning methods treat forecasting as an image-to-image translation problem, employing computer vision tools to generate nowcasts22. The latest development in such physics-free nowcasting approaches comes from Google DeepMind23. Their physics-free AI model, known as Deep Generative Model of Rainfall (DGMR), is trained on historical weather data and can rapidly analyze patterns and make predictions without explicit knowledge of atmospheric physics. However, while DGMR offered accurate forecasts in comparison to previous methods, it struggled to accurately predict extreme precipitation events24. A more recent study improved nowcasting of extreme precipitation by combining physical-evolution schemes, such as the conservation of mass for precipitation fields over time and space, and conditional-learning methods into a neural-network framework called NowcastNet24. It addresses both advective and convective processes, which were previously deemed challenging for DGMR.
In this study, we assess the performance of the state-of-the-art physics-conditioned deep generative model in predicting precipitation patterns during record-breaking flood events as well as heavy precipitation events in the Tennessee Valley. Due to its exposure to extreme storms and extensively dammed rivers, the Tennessee Valley is a critical focus area for evaluating NowcastNet’s effectiveness in flood prediction and disaster management. In this study, we evaluate the following methods:
• NowcastNet24, a state-of-the-art physics-embedded DGM, provides forecasts at 10-min intervals for 3 h at 1 km resolution. NowcastNet merges convective-scale details observed through radar data with mesoscale patterns dictated by physical laws into a neural-network framework.
• High Resolution Rapid Refresh (HRRR)25,26, a state-of-the-art NWP model developed by NOAA, provides hourly forecasts at 3 km resolution utilizing complex physics-based equations and data assimilation.
• Baseline approaches:
— Advection or Optical Flow, represented by the PySTEPS27 algorithm, which uses an advection scheme influenced by the continuity equation. It predicts future motion fields and intensity residuals by iteratively advecting past radar data.
— Persistence, which assumes precipitation intensity and location will remain the same over increasing lead time.
While developers of physics-free or physics-conditioned deep generative models of nowcasting have compared their approaches with optical flow in terms of skill scores as well as the judgment of meteorologists, they have not compared them with NWP models and did not conduct independent evaluations for hydrologic use. This study compares NowcastNet with HRRR, which is widely used in many river basins28,29 but has not previously been evaluated against NowcastNet. Although there are studies comparing deep learning models with NWP models for precipitation forecasting17, these comparisons often overlook the differences in phenomenon space and time scales. Earth system prediction problems vary depending on these scales, and challenges in weather and hydrological forecasting cannot be directly compared across different space-time scales30.
Accurate nowcasting impacts integral areas in hydrology such as river management, dam operations, and flash flood prediction, which directly affect human lives and property. We evaluated NowcastNet using extreme storms relevant to the stakeholders, employing both standard skill metrics and hydrologically relevant metrics co-developed with river managers. Skill scores measure how well the model’s predictions outperform a baseline or reference model and are essential for early detection and issuing emergency alerts like flood warnings. However, for predicting exact flood levels or calculating the specific volume of water to release from a dam, error-based metrics are crucial. We also employ the Contiguous Rainfall Area (CRA) method to break down errors into pattern, volume, and displacement components, providing a detailed understanding of where predictions diverge from observed events. Additionally, the use of median, quartiles, and outliers of scores quantifies uncertainty across extreme storms in the TVA region, giving river managers insights into the reliability of forecasts and helping them respond effectively to future events. These approaches ensure that the model’s predictions are not only timely but also accurate, thereby enhancing reliability and minimizing the risk of false alarms or missed events, ultimately contributing to the development of trustworthy AI in hydrologic management.
Results
Tennessee Valley Authority case study
The TVA plays a pivotal role in flood control, navigation, power generation, water supply, water quality maintenance, and recreation across the Tennessee River system in the Southeastern US and the Appalachian region. They manage a vast river network spanning approximately 640 miles and encompassing around 40,000 square miles of watershed. TVA manages 49 dams, 29 of which produce hydroelectricity, and they provide electricity to 153 local power companies, serving more than 10 million people. Moreover, with strategically constructed dams and reservoirs along major river systems like the Tennessee River, TVA regulates water flow to mitigate flood risks during heavy rainfall and storm events. In Fig. 1, the operating area of TVA is shown with the locations of key electricity generating facilities.
In the Tennessee Valley, floods are primarily triggered by mesoscale convective systems (MCSs), mid-latitude cyclones (MLC), and tropical storm remnants (TSR), either individually or in combination. Despite advancements in weather prediction models, several flood instances revealed limitations in forecasting intensity and location accurately. A pertinent case study is the devastating August 2021 flood in Waverly, Tennessee. The flood event was triggered by unprecedented rainfall and the deluge, attributed to a complex interplay of meteorological phenomena, highlighted vulnerabilities in flood preparedness and response mechanisms. Despite prior warnings issued by the National Weather Service, the rapid onset of the flood prevented timely evacuation efforts, exacerbating the impact on residents. Meteorological observations indicated an abundance of atmospheric moisture, along with the interaction between a mid-level warm front and a stationary front over West Tennessee, creating conditions conducive to intense precipitation and subsequent flooding. The mesoscale convective system responsible for the event showcased the heightened vulnerability to extreme weather events, emphasizing the imperative for robust flood management strategies and precise forecasting methods to mitigate future risks. While this event occurred in an unregulated part of the basin, it underscores the potential for similar catastrophic events across the TVA region. TVA holds its dam reservoirs at a high water level in the summer, as part of its multi-objective optimization, including recreation and seasonal electricity demand. These elevated water levels would have constrained the time available for emergency response if the event had happened in a regulated section of the system. Thus, accurate forecasts would have been crucial in managing or mitigating the flood impact, emphasizing the significance of timely predictions. Therefore, the Waverly flood event emphasizes the intricate relationship between meteorological
Fig. 1 | Map depicting the Tennessee Valley Authority (TVA) service area and the locations of key electricity generating assets within the region. The figure provides an overview of the geographical coverage of TVA's operations and highlights the distribution of major power generation facilities as reported by the Government Accountability Office (GAO) in their 2023 Report to Congressional Requesters62.
dynamics and human vulnerability, prompting TVA to prioritize high-quality hourly forecasts and consistent predictions for extreme events.
Following the Waverly event, questions arose regarding the effectiveness of the HRRR model, used by TVA and other agencies to predict weather patterns and assess flood risks. While the HRRR model provided valuable insights into typical weather conditions, its performance during the Waverly event cast doubt on its reliability during extreme events. Here, Fig. 2 shows the performance of the HRRR model during the Waverly event on August 21, 2021. The figure reveals the disparity between the predicted accumulated precipitation and the observed values at the McEwen gauge, shedding light on the forecast bias. The McEwen gauge station was chosen for this analysis due to its reliable ground-based observations near the Waverly storm, where it set a Tennessee record with 17 inches of rainfall in 24 h31. Moreover, the data from this gauge was specifically recommended by TVA managers, further validating its relevance to our analysis. The McEwen precipitation accumulation data utilized in this comparison is derived from TVA-collected data. Around 11:00 UTC, when the actual rainfall accumulation reached a cumulative 13 inches, the HRRR forecasted only 2 inches. Similarly, despite a total rainfall of 17 inches throughout the day in McEwen, the HRRR model predicted only 4 inches.
More information about other heavy precipitation events and HRRR’s failure to accurately forecast the events is given in Supplementary Information Section A. Beyond assessing the Waverly event, we considered 30 additional extreme precipitation events (with grid cells exceeding 30 mm/h) occurring between January 2021 and April 2024, specifically within the TVA area. The list of the events is given in Supplementary Information Section (Table S1). These events are selected based on the catastrophic impacts they had in the TVA region.
Performance of physics conditioned deep generative model: NowcastNet
The performance of the NowcastNet model during the Waverly event (August 21, 2021) is evaluated within the TVA area. Multi Radar Multi Sensor (MRMS) data, developed by NOAA’s National Severe Storms Laboratory (NSSL), are used as reference observations. MRMS incorporates data from approximately 180 operational US WSR-88D weather radars and model analyses to produce gridded precipitation32. Detailed information on the dataset and NowcastNet model is given in the “Methods” section and Supplementary Information Section B, including how the model has incorporated physics and its training and evaluation datasets. The steps to apply the NowcastNet model in the TVA region are described in Supplementary Information Section C. Figure 3 presents precipitation predictions starting from 9:00 UTC (T + 1 h) until 11:00 UTC (T + 3 h) from both the NowcastNet model and HRRR forecasts, along with the Power Spectral Density (PSD) performance metric.
The Waverly event is characterized by its extreme precipitation, which stemmed from a mesoscale convective system, a collection of thunderstorms. Capturing extreme precipitation at convective scales is challenging due to the rapid development, intensity, and localized nature of convective storms. Despite the challenges, NowcastNet predicted the hotspots of extreme precipitation locations of more than 30 mm/h more accurately than HRRR. For the 3-h forecasts, NowcastNet is capable of forecasting the trajectory of the thin line of convective precipitation, whereas HRRR could
Fig. 2 | Comparison of accumulated precipitation forecasts from High-Resolution Rapid Refresh (HRRR) model at the McEwen precipitation gauge during the Waverly event provides insights into the forecast bias. Precipitation forecasts (in inches) on August 21, 2021, displayed in Coordinated Universal Time (UTC) from the High-Resolution Rapid Refresh (HRRR) model, are shown with red, yellow, and green lines, while observations are shown with a blue line. The McEwen precipitation accumulation data utilized in this comparison is derived from TVA-collected data. The plot illustrates the discrepancy between the accumulated precipitation forecasts and the actual observations at the McEwen gauge.
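The gauge comparison in Fig. 2 comes down to simple arithmetic on accumulated totals. The sketch below uses only the four accumulation values quoted in the text for McEwen (13 and 17 inches observed, 2 and 4 inches forecast by HRRR). The function name and the sign convention (observed minus forecast, so that underestimation is positive, as described for Supplementary Fig. S1) are illustrative assumptions, not the authors' code.

```python
def accumulation_bias(observed_in, forecast_in):
    """Return (bias, captured_fraction) for accumulated precipitation in inches.

    Assumed convention: bias = observed - forecast, so a positive value
    means the forecast underestimated the gauge total.
    """
    bias = observed_in - forecast_in
    captured_fraction = forecast_in / observed_in
    return bias, captured_fraction


# Values quoted in the text for the McEwen gauge on August 21, 2021.
bias_11utc, frac_11utc = accumulation_bias(observed_in=13.0, forecast_in=2.0)
bias_day, frac_day = accumulation_bias(observed_in=17.0, forecast_in=4.0)

print(f"~11:00 UTC: bias = {bias_11utc:.0f} in, forecast captured {frac_11utc:.0%}")
print(f"24 h total: bias = {bias_day:.0f} in, forecast captured {frac_day:.0%}")
```

On these numbers the HRRR forecast accounted for roughly 15% of the observed accumulation by 11:00 UTC and under a quarter of the daily total.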
not predict the heavy precipitation at all. The PSD reveals the strength of a signal as a function of spatial scale, and for this case study the PSD curve of the forecast matches the PSD curve of the MRMS for wavelengths 4 km to 16 km, and the nowcast only slightly overestimates PSD for the rest of the wavelengths from 2 km to 256 km. Even at 3 h lead time, although the two PSD curves are slightly off at wavelengths greater than 16 km or less than 4 km, it is a near-exact match for wavelengths between 4 km and 16 km, indicating that the forecast contains the same amount of detail as the MRMS at these spatial scales. On the other hand, the information content in the HRRR does not match the observed QPE at any wavelength. Although NowcastNet displays the right amount of detail at most spatial scales, with increasing lead time, the model exhibits broader areas of light precipitation, leading to challenges in precisely identifying the exact location of precipitation.
To provide a more comprehensive evaluation of the model’s performance, we expanded the analysis to consider multiple initialization times, similar to the approach used in Fig. 2. Supplementary Fig. S1 shows accumulated precipitation forecasts (hourly) from both the NowcastNet and HRRR models at various initialization times (08-20-2021 13:00 UTC to 08-22-2021 07:00 UTC) with underestimation of forecasts shown with positive bias and overestimation shown with negative bias during the Waverly event. The NowcastNet forecast closely follows the observed precipitation trend, capturing the timing and intensity of the rainfall with minimal bias. However, there is a slight underestimation towards the end of the storm event. The HRRR forecast significantly underestimates the precipitation throughout the event, as shown in Fig. 2.
The performance of NowcastNet was compared against HRRR as well as persistence and advection. Advection is represented by the PySTEPS27 algorithm (Supplementary Information Section D). For comprehensive evaluation, 30 heavy precipitation events from January 2021 to April 2024 are examined in this study. Among them, the performance of NowcastNet against MRMS, HRRR, persistence, and advection is highlighted for 4 events and shown in Supplementary Figs. S2–S5. In all the events, NowcastNet exhibits a higher degree of similarity to MRMS. In contrast, HRRR encounters challenges in capturing finer details when compared to MRMS. Persistence assumes precipitation intensity and location will remain the same over time, and likewise we see its performance deteriorate over time. On the other hand, the advection model illustrates the movement but fails to capture the intensities of extreme precipitation and produces blurry nowcasts.
Various metrics were employed to evaluate NowcastNet’s predictions against HRRR in these events. These metrics help determine the model’s ability to classify and predict the occurrence and intensity of precipitation events, as well as how well the models predict continuous variables related to hydrological processes, such as rainfall amounts and their spatial distribution. The spatial resolution of MRMS and NowcastNet is finer (1 km) than HRRR (3 km). Here, for fair comparison, the MRMS QPEs and NowcastNet forecasts were upscaled to 3 km. Persistence and advection forecasts were also analyzed at 3 km spatial resolution. In this study, we have used 3 thresholds (t) for extreme precipitation events, t > 0.1 mm/h, t > 16 mm/h, and t > 32 mm/h for all categorical skill scores. We have chosen 16 mm/h and 32 mm/h because they are standard benchmarks used to define extreme events in the literature24. The skill score metrics presented here include pixel-based metrics such as probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI), along with a neighborhood-based metric, fractions skill score (FSS) (Fig. 4). POD and FAR are particularly important for conveying the performance of these models to river managers; CSI is chosen because it is a standard metric used in the evaluation of state-of-the-art models23,24; and FSS with a 9 km × 9 km neighborhood is included to provide insight into neighborhood-based skill, as using only grid-point verification can yield misleading results due to double-penalty errors, where
Fig. 3 | Comparison of precipitation forecasts from NowcastNet and HRRR for the Waverly flood event (August 21, 2021): spatial accuracy and power spectral density analysis. Precipitation forecasts (in mm/h) from NowcastNet (1 km spatial resolution) and HRRR (3 km spatial resolution) at different lead times (T + 1 h, T + 2 h, and T + 3 h) with MRMS QPE32 for the Waverly flood event on August 21, 2021 (T = 8:00 UTC) within the TVA area. The precipitation images cover a spatial extent of 384 km × 384 km. The base map shows US state boundaries. NowcastNet predicts the MRMS precipitation patterns more closely than HRRR does, in terms of the spatial distribution and intensity of the precipitation. The last row depicts the PSD at different wavelengths, at different lead times (T + 1 h, T + 2 h, and T + 3 h).
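The PSD curves in the last row of Fig. 3 express how much variance a precipitation field carries at each spatial scale. The text does not state how the PSD was computed, so the following is only a minimal sketch of one common choice, a radially averaged 2-D FFT power spectrum. The function name `radial_psd`, the `pixel_km` parameter, and the synthetic striped test field are all illustrative assumptions.

```python
import numpy as np


def radial_psd(field, pixel_km=1.0):
    """Radially averaged power spectrum of a square 2-D field.

    Returns (wavelength_km, psd), one value per integer wavenumber
    from 1 up to the Nyquist limit n // 2.
    """
    field = np.asarray(field, dtype=float)
    if field.ndim != 2 or field.shape[0] != field.shape[1]:
        raise ValueError("expected a square 2-D array")
    n = field.shape[0]
    # 2-D FFT with the zero frequency shifted to the array centre.
    spectrum = np.fft.fftshift(np.fft.fft2(field))
    power = np.abs(spectrum) ** 2 / (n * n)
    # Integer radial wavenumber of every spectral bin.
    ky, kx = np.indices((n, n)) - n // 2
    radius = np.rint(np.hypot(kx, ky)).astype(int)
    # Mean power inside each ring of constant radial wavenumber.
    ring_sum = np.bincount(radius.ravel(), weights=power.ravel())
    ring_count = np.bincount(radius.ravel())
    k = np.arange(1, n // 2 + 1)
    psd = ring_sum[k] / ring_count[k]
    wavelength_km = n * pixel_km / k
    return wavelength_km, psd


# Synthetic check: stripes that repeat every 16 pixels on a 64 x 64 grid.
n = 64
x = np.arange(n)
stripes = np.tile(np.sin(2 * np.pi * x / 16.0), (n, 1))
wavelength_km, psd = radial_psd(stripes, pixel_km=1.0)
peak_wavelength = wavelength_km[np.argmax(psd)]
print(f"spectral peak at {peak_wavelength:.0f} km")
```

Applying the same function to a forecast and to the matching observation, then plotting both curves against wavelength, gives the kind of comparison described in the text: where the curves coincide, the forecast carries a similar amount of detail at that scale.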
Fig. 4 | Metrics comparison across 30 heavy precipitation events. Comparison of precipitation forecast accuracy between NowcastNet, HRRR, Persistence, and Advection against MRMS QPE (at 3 km spatial resolution) for 30 heavy precipitation events across the geography of interest. Metrics include Critical Success Index (CSI), Probability of Detection (POD), False Alarm Ratio (FAR), and Fractions Skill Score (FSS); all at thresholds (t) of t > 0.1 mm/h, t > 16 mm/h and t > 32 mm/h for different lead times (T + 1 h, T + 2 h, and T + 3 h). Upward arrow indicates higher score is better and downward arrow indicates lower score is better.
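The four scores named in Fig. 4 can be illustrated with a small worked example. The sketch below uses the standard contingency-table definitions of POD, FAR, and CSI and the usual fractions-based form of FSS. It is not the authors' evaluation code: the function names, the zero padding at the domain edge, and the toy 6 × 6 field are assumptions made for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def categorical_scores(forecast, observed, threshold):
    """Pixel-based POD, FAR and CSI for exceedance of `threshold` (mm/h)."""
    f = np.asarray(forecast) > threshold
    o = np.asarray(observed) > threshold
    hits = int(np.sum(f & o))
    misses = int(np.sum(~f & o))
    false_alarms = int(np.sum(f & ~o))
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    return {"POD": pod, "FAR": far, "CSI": csi}


def neighborhood_fraction(binary, size):
    """Fraction of exceeding pixels in a size x size window around each pixel."""
    pad = size // 2
    # Assumption: cells outside the domain count as non-exceeding (zero padding).
    padded = np.pad(binary.astype(float), pad, mode="constant")
    return sliding_window_view(padded, (size, size)).mean(axis=(-2, -1))


def fss(forecast, observed, threshold, size=3):
    """Fractions skill score for an odd window `size` (3 pixels = 9 km at 3 km)."""
    pf = neighborhood_fraction(np.asarray(forecast) > threshold, size)
    po = neighborhood_fraction(np.asarray(observed) > threshold, size)
    mse = np.mean((pf - po) ** 2)
    reference = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / reference if reference > 0 else float("nan")


# Toy case: a two-pixel heavy-rain cell forecast one pixel too far east.
obs = np.zeros((6, 6))
obs[2, 2] = obs[2, 3] = 20.0
fcst = np.zeros((6, 6))
fcst[2, 3] = fcst[2, 4] = 20.0

scores = categorical_scores(fcst, obs, threshold=16.0)
fss_value = fss(fcst, obs, threshold=16.0, size=3)
print(scores, round(fss_value, 3))
```

In this toy case the one-pixel displacement produces one hit, one miss, and one false alarm, so POD and FAR are both 0.5 and CSI is 1/3, while the 3 × 3 FSS is 0.8. The gap between CSI and FSS is the double-penalty effect that motivates including a neighborhood-based score.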
forecasts are penalized twice for deviations caused by displacement errors33. River managers and dam operators need to focus on the smallest neighborhood that still provides meaningful area-based evaluations. Thus, we have selected a 3 × 3 pixel neighborhood (i.e., a 9 km by 9 km area), which corresponds to the smallest possible neighborhood. However, to understand how skill changes with larger neighborhoods, we have compared the NowcastNet model with MRMS (observations) across multiple neighborhood sizes, as shown in Supplementary Figs. S9 and S10. Figure 4 shows box
plots of all these metrics for different lead times and for all 3 thresholds. The median, quartiles, and outliers of scores provide uncertainty quantification across 30 extreme storms from the TVA region, giving river managers vital information on how to trust and react to each model’s forecast during a future extreme storm. Here, for all lead times, all thresholds, and all metrics, NowcastNet outperforms HRRR, with better median scores across the 30 events. Although NowcastNet’s performance declines at longer lead times for more extreme thresholds, it still outperforms HRRR and persistence, which have the worst scores. At the T + 1 h lead time and for the t > 0.1 mm/h threshold at longer lead times, NowcastNet’s superiority over HRRR is clear. Across nearly all metrics and thresholds, the quartiles, minimum, and maximum of NowcastNet scores are better than those of HRRR, indicating an advantage not just for the median but for most storms. Aside from superiority over HRRR, we see that NowcastNet performs better than the baseline methods in terms of CSI, FAR and FSS, but the advection model shows better performance in terms of POD, highlighting its strength in capturing the movement of precipitation, albeit with less accuracy in predicting its intensity. HRRR might perform better at longer lead times with sufficient spin-up time for data assimilation, but this scenario was not tested in our study. This quantitative evaluation emphasizes NowcastNet’s effectiveness relative to HRRR and other baseline methods in predicting extreme precipitation events’ intensity and location across the forecast intervals. We also evaluated additional commonly used skill-score-based metrics, including the F1 Score, Equitable Threat Score (ETS), and Heidke Skill Score (HSS) (Supplementary Fig. S6), all of which corroborated the findings of our primary analysis.
Apart from skill scores, we also employed error- and correlation-based metrics for comprehensive assessment in Supplementary Fig. S7. Metrics included here are RMSE, Inverse NMSE, Numerical Bias, Normalized Error, and Pearson’s Correlation. Metrics are calculated at pixel level, and they provide insights into how well each forecasting method predicts precipitation amounts compared to observed values. Findings from these metrics show that NowcastNet outperforms HRRR and baseline methods at the 1-h lead time for all metrics, demonstrating its strength in short-term precipitation forecasting accuracy. NowcastNet more accurately captures the timing and magnitude of precipitation events, resulting in lower residuals compared to other models. Notably, NowcastNet maintains a better correlation with MRMS at all lead times, while the median HRRR forecasts exhibit zero correlation. However, NowcastNet tends to overestimate precipitation, particularly at longer lead times, which affects error metrics such as RMSE and numerical bias. Despite this overestimation, the model effectively aligns with overall trends in the data, highlighting its reliability in short-term precipitation forecasts. In contrast, HRRR shows better performance than NowcastNet at longer lead times (2–3 h) in terms of RMSE, Inverse NMSE, and Normalized Error, likely due to NowcastNet’s tendency to overestimate precipitation more frequently than HRRR.
To assess NowcastNet’s predictive capabilities at 1 km spatial resolution against MRMS QPE, the set of skill-score-based metrics is employed at 10-min intervals for three thresholds. In Supplementary Fig. S8, the metrics are plotted against different lead times. The shaded range in the figure shows the maximum and minimum of 30 heavy precipitation events at any given lead time. The results at 10-min intervals show similar trends to those observed at the hourly intervals in Fig. 4, reaffirming the model’s strengths and limitations. However, the 10-min interval forecasts are particularly important for river managers, as they provide more granular and timely information, which is crucial for emergency management and rapid response to changing conditions during extreme weather events. We have also estimated how the neighborhood-based metric, FSS, changes with increasing neighborhood sizes and lead times, demonstrating the NowcastNet model’s performance at a spatial resolution of 1 km (Supplementary Figs. S9 and S10). It should be noted that the earlier FSS analysis used 3 km resolution, but because NowcastNet will be used operationally at 1 km resolution, we used 1 km for this supplementary analysis. From this analysis it is evident that the FSS is higher for larger neighborhoods, and for each neighborhood size, a decreasing trend is observed with increasing lead times. These results show that river managers can expect similar forecast dynamics regardless of whether the storm at hand merely requires skill within a large neighborhood, or whether skill in a small neighborhood is strictly required.
For further investigation of the NowcastNet model’s performance in comparison to MRMS QPE, we assessed four extreme precipitation events from 2021 to 2024. The event forecasts are done for August 21, 2021, at 8:00 UTC, February 17, 2022, at 18:00 UTC, February 16, 2023, at 12:00 UTC and March 15, 2024, at 3:00 UTC in the TVA area. Figure 5 illustrates a comparison of precipitation prediction discrepancies from the NowcastNet model at different lead times (T + 10 min, T + 1 h, T + 2 h, and T + 3 h) with MRMS QPE, for the extreme events. The plot shows that, as the lead time increases, discrepancies between MRMS and predictions become more pronounced, indicating the challenges associated with accurately forecasting extreme precipitation events over extended time horizons. We observed areas of underestimation, where NowcastNet either forecast a less-intense event or missed the precipitation event completely, as well as overestimation, where the model predicted high rainfall despite lesser precipitation or lack of it altogether. This suggests that the model’s predictive capability diminishes with longer lead times, leading to larger areas of overestimation of precipitation.
To understand the source of these errors, we conducted a detailed analysis using the Contiguous Rainfall Area (CRA) method, which quantifies errors in the predicted location of rain systems by breaking down the total error into components related to location inaccuracies, amplitude discrepancies, and differences in fine-scale patterns34,35. Figure 6 presents the comparison between observed and forecasted precipitation patterns and the associated error decompositions using the CRA method for the Waverly event (August 21, 2021, 8:00 UTC).
Panel A shows the overlapping contours of observed and forecast precipitation patterns, highlighting the spatial mismatch between them. The displacement required for optimal alignment of the forecast with the observed precipitation is represented by the arrow, demonstrating how the CRA method separates errors due to incorrect location. Panel B displays the spatial distributions of observed and forecasted precipitation at three different lead times: 1 h, 2 h, and 3 h. Panel C presents the error decomposition for different lead times (10 min, 1 h, 2 h, and 3 h) as Root Mean Square Error (RMSE) in mm/h, with the errors broken down into three components: volume error, displacement error and pattern error. Panel D provides a summary of CRA verification metrics for different lead times (1 h, 2 h, and 3 h), using a verification grid of 0.01° and a CRA threshold of 16 mm/h. The results highlight that the most significant error in NowcastNet’s predictions arises from inaccuracies in the spatial distribution of precipitation, particularly as the forecast lead time increases. Even when the total volume of rainfall is accurately captured, the model frequently misaligns the forecasted precipitation objects with their observed counterparts, resulting in substantial pattern errors. The error decomposition is further analyzed for the other events shown in Fig. 5, in Supplementary Fig. S11. Findings from this analysis show that pattern errors consistently dominate across all four events analyzed (65%–90% of the total error), with displacement errors being less prominent (10%–30% of the total error) and volume errors minimal (0–3%). However, the model’s difficulty in capturing the precise spatial structure of rainfall suggests a need for improvement in representing complex precipitation patterns.
Discussion
Precipitation nowcasting stands as a paramount objective in meteorological science, crucial for informing weather-dependent policymaking. Despite advancements, current numerical weather-prediction systems struggle to provide accurate nowcasts, particularly for extreme precipitation events17,18. In this study, we assessed the efficacy of cutting-edge precipitation nowcasting methodologies, focusing on NowcastNet (a physics conditioned deep generative model) within the TVA service area during extreme precipitation events.
NowcastNet’s performance was compared against MRMS QPE and HRRR as well as against baseline approaches such as persistence and
Fig. 5 | Precipitation prediction discrepancies from NowcastNet at varying lead times for four extreme rainfall events. Comparison of precipitation prediction discrepancies (in mm/h) from NowcastNet model at different lead times (T + 10 min, T + 1 h, T + 2 h, and T + 3 h) with MRMS for four extreme rainfall events on August 21, 2021 (T = 8:00 UTC), February 17, 2022 (T = 18:00 UTC), February 16, 2023 (T = 12:00 UTC) and March 15, 2024 (T = 3:00 UTC), within the TVA area. The basemap shows US state boundaries. Blue shades represent underestimation, while red shades represent overestimation of precipitation. With increasing lead time, discrepancies between MRMS and NowcastNet predictions become more pronounced.
advection, using various skill score-based metrics such as POD, FAR, CSI, FSS, F1 Score, ETS, and HSS, as well as error- and correlation-based metrics such as RMSE, Numerical Bias, Inverse NMSE, Normalized Error, and Pearson’s Correlation. The suite of metrics, co-developed with river managers, goes beyond standard skill metrics typically used to evaluate weather forecasts. It incorporates hydrologically relevant metrics that account for extreme precipitation events, the time series of precipitation, and multiple resolutions in both time and space. Moreover, these metrics were not only essential for evaluating the model’s predictive accuracy, but also critical in assessing its operational utility in life-saving applications like river management. We focused on the Waverly event, which highlighted the challenges of predicting extreme precipitation from mesoscale convective systems. In this event, NowcastNet outperformed HRRR by accurately forecasting hotspots of extreme precipitation over 30 mm/h and predicting the trajectory of convective storms over 3-h lead times. Also, NowcastNet maintained detailed predictions across most spatial scales, with its spatial power spectral density (PSD) closely matching observed data. Furthermore, when evaluated across multiple initialization times, NowcastNet
Fig. 6 | Comparison of observed and forecast precipitation patterns and error decomposition using the contiguous rain area (CRA) method. A Illustration of the CRA formation by aligning the isohyets between observed (MRMS) and forecast (NowcastNet) fields, highlighting the displacement required for optimal alignment34,35. B Spatial distributions of observed and forecasted precipitation at various lead times (T + 1 h, T + 2 h, and T + 3 h) with identified pattern and displacement errors. C Error decomposition into volume, displacement, and pattern errors for different lead times, quantified as RMSE in mm/h. D Summary of CRA verification metrics (with threshold of 16 mm/h), including Pearson correlation coefficients (CC), RMSE values, and error decomposition percentages, across different lead times, demonstrating the dominance of pattern errors in forecast accuracy.
consistently followed the observed precipitation trend with minimal bias, while the HRRR model underestimated precipitation throughout the event, reinforcing NowcastNet’s robustness in handling complex weather patterns and maintaining accuracy over extended periods. In a comprehensive evaluation of 30 heavy precipitation events from 2021 to 2024, NowcastNet consistently showed higher similarity to observed MRMS data compared to HRRR and other benchmarks like persistence and advection, which struggled with fine details and intensity predictions.

In terms of skill score and correlation based metrics, NowcastNet’s better performance was noticeable against HRRR and other baseline approaches at all lead times and for all thresholds, especially for predicting extreme precipitation at the >32 mm/h threshold. In terms of error-based metrics, NowcastNet is also highly effective for 1-h predictions, but it may sacrifice some accuracy in 3-h forecasts compared to HRRR, as was evident from the results for RMSE, Inverse NMSE, and Normalized Error. The reason behind this is the overestimation of precipitation at longer lead times. The comparison of pixel-based precipitation predictions from the model showed areas of both underestimation and overestimation, the consequences of which are noteworthy. Underestimation can lead to inadequate preparedness and response measures, increasing the risk of property damage, flooding, and even loss of life during extreme events. Conversely, overestimation can result in unnecessary disruptions and resource allocation, leading to economic losses and public inconvenience. Therefore, minimizing both underestimation and overestimation is crucial for improving forecast accuracy and enhancing the effectiveness of early warning systems. Notably, NowcastNet’s underestimation and overestimation tendencies intensified with longer lead times. The overestimation and underestimation of precipitation in NowcastNet primarily stem from errors in capturing the spatial patterns of rainfall, rather than inaccuracies in the total volume or displacement of precipitation. The model generally maintains an accurate total rainfall volume, demonstrating the effectiveness of the mass balance component incorporated through the continuity equation. Although displacement errors are somewhat variable, they account for only about 30% of the total error, suggesting that the model can adequately capture the spatio-temporal movement of precipitation. However, as displacement occurs, the rainfall area is expected to evolve (either grow or decay), and the model struggles to accurately represent these dynamic changes in precipitation patterns. Future efforts could improve performance by incorporating features that better capture spatial variability, or by refining the model architecture to enhance its ability to learn spatial dependencies. In summary, NowcastNet exhibited shortcomings, such as inaccuracies in estimating total rainfall and spatial imprecision at higher resolutions, underscoring the need for continued model refinement. However, it consistently outperformed HRRR and other models in predicting heavy precipitation events, and enhanced trust in DGM at 1 h lead time.

A salient feature of this study has been the co-evaluation of our nowcasting approach within our team of coauthors consisting of ML developers, hydrologists, water resources engineers and scientists, as well as river managers and hydrometeorologists working at the TVA. The TVA originally discontinued the operational use of HRRR at the request of the river forecast center’s (RFC’s) lead engineers because it was adding noise in the early lead times and was inconsistent from run to run. However, they continued examining HRRR predictions as a reference. Although HRRR is the state-of-the-art NWP model, its inability to predict extreme rainfall amounts during disastrous flooding events in the TVA region, such as the Waverly event36, further reinforced their decision to discontinue the operational use of HRRR. A false sense of complacency based on missed predictions of extreme precipitation events, as seemed apparent with HRRR, could lead to inadequate guidance to flooding emergency managers and
Fig. 7 | Multisource integration and predictive analytics in precipitation forecasting. A demonstrates the reduction of prediction skill or information content of precipitation forecasts as lead time (shown in logarithmic scale) increases, comparing (a) persistence, (b) nowcasting, (c) mesoscale and (d) synoptic scale numerical weather prediction (NWP), (e) merged approach within the boundary of (f) limit of predictability44,63–65. Merged forecasts can be a combination of nowcasting, NWP models, satellite information, etc. B demonstrates generation of precipitation forecasts using a deep generative model (DGM). The proposed DGM combines observed remotely sensed data from radar and geostationary satellites, ground sensors, ancillary information from terrain properties, physics of precipitation22,66 and NWP state variables to enhance forecast accuracy.
RFC operators. However, the TVA has remained interested in exploring alternatives for improved nowcasting. Our approach directly addresses this need by building trustworthiness in precipitation forecasts using physics-embedded DGM for river managers, which is a critical component for effective hazard management37. Based on the results reported here, the physics-embedded ML system, specifically our implementation of NowcastNet, will be evaluated within the operational system of the TVA.

Our research highlights the critical need for further investigations to advance the accuracy of precipitation forecasting. For the longest time, it has been argued that no method has consistently outperformed Lagrangian persistence (i.e., advection or its variant, optical flow) in improving QPF at scales useful for hydrologic applications, especially for very short lead times (e.g., 1–2 h)23,38,39. But recently, deep learning methods have shown promise, compared to baseline methods, in nowcasting at shorter lead times. However, a common challenge across deep learning based nowcasting is the loss of information content as forecast lead time increases from 1 to 3 h. On the other hand, while our understanding of the physics behind precipitation, including stratiform and convective rains, continues to advance, translating this knowledge into improved prediction skills, especially at the nowcasting scale, remains challenging. So we hypothesize that by integrating additional physical principles, like momentum conservation, and incorporating diverse ancillary data sources (such as satellite observations, numerical weather predictions, surface observations, land use details, terrain characteristics, and elevation), forecast reliability can be improved. Incorporating satellite data enhances the model’s understanding of large-scale weather patterns and atmospheric dynamics, improving its ability to capture the spatial and temporal variability of precipitation. Land use information helps account for urban effects, vegetation with high transpiration, and bodies of water that influence precipitation. Terrain properties, such as slope, aspect, and roughness, are crucial for modulating precipitation due to orographic effects and wind patterns, while elevation data refine forecasts by considering changes in atmospheric stability and moisture with altitude. Finally, combining forecasts of state variables from numerical weather predictions with deep generative nowcasts could further improve accuracy. Figure 7 illustrates the evolution of precipitation forecasting methodologies, showcasing the reduction of information content in forecasts with increasing lead time and highlights the potential of physics-conditioned deep generative models to enhance forecast accuracy through multisource integration and predictive analytics.

Most of this study’s analysis has been done at the grid cell level, but basin-level analysis is important for river management and flood management. Because this study reported most metrics at the grid cell level, it has not quantified how far away hotspots are when they are wrongly placed: an observed hotspot 2 km away from the forecasted hotspot is much better than one 10 km away, so more evaluation is required to understand this. A precipitation hotspot misplaced across basin lines may demand emergency preparations in a completely different river, whereas a precipitation hotspot misplaced within the same basin requires much the same preparations. Geographically incorrect hotspots were a major problem with HRRR, prompting its discontinuation in TVA’s decision-making, so basin-wise or dam-wise evaluation of NowcastNet would quantify the confidence that NowcastNet could serve a similar role in hourly-level flood management without the geographic errors.

In conclusion, advancing precipitation nowcasting is crucial for informed decision-making in meteorology, especially for extreme events. While methodologies like NowcastNet show promise in capturing convective events, they exhibit limitations such as false alarms and spatial imprecision. Further model refinement and integration of diverse data sources offer avenues for improvement.

Methods
Nowcasting methods
In this section, we outline the mathematical formulations of various nowcasting techniques, starting with the foundational method of Persistence and progressing through Optical Flow analysis, Numerical Weather Prediction (NWP) models, Machine Learning (ML) techniques, physics-free approaches, and finally physics-conditioned Deep Generative Models.

Persistence-based nowcasting in atmospheric science involves incorporating knowledge of precipitation physics into simple models. Traditional approaches include climatological precipitation history, Eulerian persistence, Lagrangian persistence, and persistence of convective cells39. Eulerian persistence (Eq. (1)) predicts future observations based on the most recent
observation, while Lagrangian persistence (Eq. (2)) accounts for the displacement of air parcels. The Lagrangian persistence assumption is particularly relevant for short-term rainfall prediction and forms the basis of current radar extrapolation models40.

The Eulerian persistence model represents the forecasted precipitation field ($\hat{\psi}$) at a future time ($t_0 + \tau$) as equal to the observed precipitation field ($\psi$) at the initial time ($t_0$), without considering any displacement. In contrast, the Lagrangian persistence model incorporates a displacement vector ($\lambda$) into the equation, representing the movement of air parcels. It forecasts the precipitation field ($\hat{\psi}$) at a future time ($t_0 + \tau$) by shifting the observed precipitation field ($\psi$) at the initial time ($t_0$) by the displacement vector ($\lambda$):

$$\hat{\psi}(t_0 + \tau, x) = \psi(t_0, x) \tag{1}$$

$$\hat{\psi}(t_0 + \tau, x) = \psi(t_0, x - \lambda) \tag{2}$$

Optical flow techniques, essential in precipitation nowcasting, infer motion patterns from consecutive image frames21,41. These methods operate at both local and global scales, utilizing optical flow constraints (OFCs) to delineate motion in specific areas or across entire images41–43. Equation (3) describes the Optical Flow Constraint (OFC) equation, which assumes that features within an image sequence maintain their size and intensity while changing shape, serving as the foundation for subsequent models such as STEPS21:

$$\frac{\partial R}{\partial t} + u \frac{\partial R}{\partial x} + v \frac{\partial R}{\partial y} = 0 \tag{3}$$

In Eq. (3), the terms $(u, v)$ represent the velocity field, while $R(x, y)$ denotes the rain rate at the coordinate $(x, y)$. The rain rate $R$ is known at each point, and a sequence of images helps estimate the partial derivatives required in Eq. (3).

NWP models have improved precipitation forecasting through statistical interpretation, which involves analyzing historical weather data to identify patterns and relationships between various atmospheric variables. However, NWPs only explicitly capture broader weather patterns. So, they are most effective for generating general forecasts 12 h ahead and beyond44. HRRR is an NWP model that played a pivotal role in providing convective storm guidance over the past decade25,26,45. However, with advancements in technology and modeling techniques, the HRRR is transitioning to the Finite Volume Cubed (FV3)-based Rapid Refresh Forecast System (RRFS)46. The RRFS represents an evolution from the HRRR, incorporating improvements in resolution, physics parameterizations, and data assimilation techniques45,47.

In recent years, machine learning has emerged as a promising tool for precipitation nowcasting, offering solutions to limitations in traditional methods like optical flow and numerical weather prediction models (NWPs)40. Optical flow methods face challenges due to assumptions of Lagrangian persistence and smooth motion fields, while NWPs struggle to capture fine-scale spatio-temporal patterns associated with convective storms. Machine learning offers potential solutions by capturing complex spatio-temporal patterns, integrating diverse data sources, and introducing approaches like spatiotemporal convolution16,20,22, adversarial training23,24,48, and latent random variables49 to enhance nowcasting capabilities. Among these, the state-of-the-art physics-free deep generative model is DGMR by Google DeepMind23.

Equation (4) describes the nowcasting methodology of the DGMR model, which relies on a conditional generative approach to predict $N$ future radar fields based on the past $M$ observations23. This model incorporates latent random vectors $Z$ and parameters $\theta$, ensuring spatially dependent predictions by integrating over latent variables23. The learning process adopts a conditional generative adversarial network (GAN) framework, tailored specifically for precipitation prediction. In particular, the model uses four consecutive radar observations spanning the previous 20 min as contextual input for a generator, which enables the generation of multiple future precipitation scenarios over the next 90 min23:

$$P(X_{M+1:M+N} \mid X_{1:M}) = \int P(X_{M+1:M+N} \mid Z, X_{1:M}, \theta)\, P(Z \mid X_{1:M})\, dZ \tag{4}$$

Although DGMR generates predictions which are spatio-temporally consistent with ground truth for light to medium precipitation events, it produces nowcasts with unnatural motion and intensity, high location error, and large cloud dissipation at increasing lead times24. So, in this study, we focus on the state-of-the-art physics conditioned deep generative model NowcastNet24. This model employs a physics-conditional deep generative architecture to forecast future radar fields based on past observations, as described in Eq. (5)24. It consists of a stochastic generative network parameterized by $\theta$ and a deterministic evolution network parameterized by $\phi$, allowing for physics-driven generation from latent vectors $Z$ (ref. 24):

$$P(\hat{X}_{1:T} \mid X_{-T_0:0}, \phi, \theta) = \int P(\hat{X}_{1:T} \mid X_{-T_0:0}, \phi(X_{-T_0:0}), Z, \theta)\, P(Z)\, dZ \tag{5}$$

This integration enables ensemble forecasting, capturing chaotic dynamics effectively and ensuring physically plausible predictions at both mesoscale and convective scales. The modified 2D continuity equation for precipitation evolution24 can be represented as:

$$\frac{\partial x}{\partial t} + (\vartheta \cdot \nabla) x = s \tag{6}$$

In this equation, $x$, $\vartheta$, and $s$ represent radar data pertaining to composite reflectivity, motion fields, and intensity residual fields, respectively. The symbol $\nabla$ denotes the gradient operator. This equation represents the conservation of mass for precipitation fields over time and space. In simpler terms, it describes how precipitation changes and moves within a given area, considering factors like radar reflectivity, motion fields (velocity of precipitation movement), and intensity residual fields (changes in precipitation intensity). NowcastNet adaptively combines mesoscale patterns governed by physical laws with convective-scale details from radar observations, resulting in skillful multiscale predictions with up to a 3-h lead time24. More information is provided in Supplementary Information Section B.

Evaluation metrics
Evaluation metrics serve as crucial tools for assessing NowcastNet’s performance in generating precipitation nowcasts. Murphy described three pillars of forecast evaluation50. The first is consistency, which refers to the harmony between forecasters’ judgments and the forecasts they generate. The second is quality, which assesses the concordance between the forecasts and the corresponding observations. The last is goodness, which can be thought of as value and evaluates the incremental economic or other benefits realized by decision-makers through the application of the forecasts50. We employed a set of metrics to evaluate the performance of NowcastNet and HRRR with respect to MRMS. These metrics include categorical skill scores: Probability of Detection (POD), False Alarm Ratio (FAR), Critical Success Index (CSI), F1 Score, Equitable Threat Score (ETS), Heidke Skill Score (HSS) and the Fractions Skill Score (FSS), which is neighborhood-based. We estimated PSD for frequency analysis. We also employed error and correlation based metrics, which include Root Mean Squared Error (RMSE), Numerical Bias, Inverse NMSE, Normalized Spatially Averaged Error, and Pearson Correlation. Lastly, we employed the Contiguous Rainfall Area (CRA) method for error decomposition into volume, displacement and pattern error.

The categorical scores are derived from the 2 × 2 contingency table (Table 1), also known as a confusion matrix, clarifying which pixels were observed as events in MRMS and which pixels were forecast as events by the model. Common nomenclature refers to a as Hits, b as False Alarms, c as Misses, and d as Correct Negatives, Correct Nonevents, or Correct Rejections.
This way, HSS ranges from negative infinity to 1. Negative values indicate that the chance forecast outperforms the actual forecast, while 0 indicates no skill, just as good as chance. A perfect forecast achieves an HSS of 1.

The F1 score combines precision and recall, providing a balance between them. Here, precision measures the fraction of predicted events for which the prediction was correct, indicating how correct the model was when it predicted positive cases. Precision is equivalent to 1 − FAR, so a higher precision means a lower false alarm ratio. On the other hand, recall, equivalent to the POD, assesses the fraction of positive cases that were correctly identified by the classifier, indicating how correct the model was when an event (positive) was observed. F1 score (Eq. (12)) is calculated as the harmonic mean of precision and recall (with thresholds of 0.1 mm/h, 16 mm/h and 32 mm/h for differentiating precipitation events and non-events), indicating the model’s accuracy both relative to its own predictions

The normalized spatially averaged error, or normalized root mean square error, measures the average prediction error as a proportion of the spatial mean precipitation. Higher errors indicate lower forecast skill:

$$\text{Normalized Error} = \frac{\text{RMSE}}{\text{Mean Observed Value}} \tag{16}$$

The inverse of the normalized root mean square errors (Inverse NMSE) measures how well the forecast captures the spread of observed pixel values. It is calculated for pixels with nonzero rainfall. A perfect forecast results in Inverse NMSE approaching infinity. For a stationary process, Inverse NMSE equal to 1 indicates that the RMSE equals the standard deviation of the observed pixels, so the forecast is only as good as predicting the spatial mean in every location. Inverse NMSE less than 1 means the forecast captures the spread
worse than a spatial mean prediction:

$$\text{Inverse NMSE} = \frac{\text{Standard Deviation of Observed Data}}{\text{RMSE}} \tag{17}$$

Pearson’s correlation (Eq. (18)) assesses the similarity between spatial patterns of observed and forecasted precipitation fields:

$$\text{Correlation} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{18}$$

where $n$ = number of grid cells, $x_i$ = observed precipitation, $y_i$ = predicted precipitation, $\bar{x}$ = mean observed precipitation and $\bar{y}$ = mean predicted precipitation.

The spatial power spectral density (PSD)58,59 characterizes the distribution of precipitation intensities using Fourier transform techniques (Eq. (19)). This captures the information content (here, the variance of rain rate) at different spatial scales. Forecasts whose information content matches that of the observations at all spatial scales are more desirable. Power spectral density is a function of wavelength. To compute PSD across the geography of interest, first the Fourier transform is computed in each dimension. The Fourier transform has information about different wavelengths, so bins of wavelengths are created and in each bin, the variance of amplitude of the Fourier signal is taken. Below is the formula59 for the Fourier transform in one dimension. $F(x_j)$ is the Fourier approximation of the signal $y_j$ at each of the $n$ grid cells $x_j$. $L$ is the length (e.g., in kilometers) of the dataset in this dimension. The values of $k$ from 1 through $m$ index the different wavelengths considered (wavelength $L/k$). Then, $a_0$, $a_k$, $b_k$ are the Euler-Fourier coefficients that define the signal:

$$F(x_j) = \frac{a_0}{2} + \sum_{k=1}^{m} \left[ a_k \cos\left(\frac{2\pi k x_j}{L}\right) + b_k \sin\left(\frac{2\pi k x_j}{L}\right) \right] \tag{19}$$

The Contiguous Rainfall Area (CRA) method is the first feature-based approach developed to evaluate systematic errors in rain system predictions by decomposing total error into components related to location, amplitude, and fine-scale pattern differences34,35,60. In the CRA method, a rain entity is defined using an isohyet (rain rate contour), and the forecast entity is translated and rotated over the observed entity until the best fit is achieved based on criteria like minimum squared error, maximum correlation, or maximum overlap60,61. The displacement vector provides the location error. The forecast’s mean squared error (MSE) is then decomposed into displacement, volume, and pattern errors:

$$\text{MSE}_{\text{total}} = \text{MSE}_{\text{displacement}} + \text{MSE}_{\text{volume}} + \text{MSE}_{\text{pattern}}. \tag{20}$$

The error decomposition based on correlation optimization is:

$$\text{MSE}_{\text{total}} = (\bar{F} - \bar{X})^2 + (\sigma_X - r \sigma_F)^2 + (1 - r^2) \sigma_F^2, \tag{21}$$

where $\bar{F}$ and $\bar{X}$ are the mean forecast and observed values before the shift; $\sigma_F$ and $\sigma_X$ are the standard deviations of the forecast and observed values, respectively; and $r$ is the original spatial correlation between the forecast and observed features. Correcting the forecast location improves its correlation with the observations, $r_{\text{opt}}$. Adding and subtracting $r_{\text{opt}}$ and rearranging:

$$\text{MSE}_{\text{displacement}} = 2 \sigma_F \sigma_X (r_{\text{opt}} - r), \tag{22}$$

$$\text{MSE}_{\text{volume}} = (\bar{F}' - \bar{X}')^2, \tag{23}$$

$$\text{MSE}_{\text{pattern}} = 2 \sigma_F \sigma_X (1 - r_{\text{opt}}) + (\sigma_F - \sigma_X)^2. \tag{24}$$

In this study, we estimated the best fit based on maximum correlation, and used a rain threshold of 16 mm/h to define the CRA in order to focus on extreme rainfall. We also take the square root of each component to report an RMSE, which has more easily interpretable units.

Data availability
The MRMS datasets for this study can be obtained from NOAA at https://www.nssl.noaa.gov/projects/mrms or by contacting the MRMS data team at [email protected]. HRRR operational and experimental output are available on the NOAA High Performance Storage System in their standard folder locations for real-time runs.

Received: 17 July 2024; Accepted: 6 November 2024;

References
1. NOAA National Centers for Environmental Information (NCEI). U.S. billion-dollar weather and climate disasters. https://www.ncei.noaa.gov/access/billions/ (2024).
2. Al-Fugara, A., Mabdeh, A. N., Alayyash, S. & Khasawneh, A. Hydrological and hydrodynamic modeling for flash flood and embankment dam break scenario: hazard mapping of extreme storm events. Sustainability 15, 1758 (2023).
3. Alipour, A., Ahmadalipour, A. & Moradkhani, H. Assessing flash flood hazard and damages in the Southeast United States. J. Flood Risk Manag. 13, e12605 (2020).
4. Hicks, N. S., Smith, J. A., Miller, A. J. & Nelson, P. A. Catastrophic flooding from an orographic thunderstorm in the Central Appalachians. Water Resour. Res. 41, W12428 (2005).
5. National Centers for Environmental Information (NCEI). State Climate Extremes Committee Memorandum. NOAA. https://www.ncei.noaa.gov/monitoring-content/extremes/scec/reports/20211220-Tennessee-24-Hour-Precipitation.pdf (accessed 3 Sep 2024).
6. Chaudhuri, D. Forum article. J. Hydraul. Eng. 126, 395–397 (2000).
7. Sheet, S., Banerjee, M., Mandal, D. & Ghosh, D. Time traveling through the floodscape: assessing the spatial and temporal probability of floods and susceptibility zones in the lower Damodar basin. Environ. Monit. Assess. 196, 482 (2024).
8. Genevois, R. & Tecca, P. R. The vajont landslide: state-of-the-art. Ital. J. Eng. Geol. Environ. 6, 15–39 (2013).
9. The Watchers. Floods in Egypt, October 2016. https://watchers.news/2016/10/29/flood-egypt-october-2016/ (2016).
10. Mishra, V. et al. The Kerala flood of 2018: combined impact of extreme rainfall and reservoir storage. Hydrol. Earth Syst. Sci. Discuss. 2018, 1–13 (2018).
11. Li, X. et al. Evaluating precipitation, streamflow, and inundation forecasting skills during extreme weather events: a case study for an urban watershed. J. Hydrol. 603, 127126 (2021).
12. Schubert, J. E., Luke, A., AghaKouchak, A. & Sanders, B. F. A framework for mechanistic flood inundation forecasting at the metropolitan scale. Water Resour. Res. 58, e2021WR031279 (2022).
13. Lin, C., Vasić, S., Kilambi, A., Turner, B. & Zawadzki, I. Precipitation forecast skill of numerical weather prediction models and radar nowcasts. Geophys. Res. Lett. 32, L14801 (2005).
14. Marchuk, G. Numerical Methods in Weather Prediction (Elsevier, 2012).
15. Jensen, D. G., Petersen, C. & Rasmussen, M. R. Assimilation of radar-based nowcast into a HIRLAM NWP model. Meteorol. Appl. 22, 485–494 (2015).
16. Yadav, N. & Ganguly, A. R. A deep learning approach to short-term quantitative precipitation forecasting. In Proceedings of the 10th International Conference on Climate Informatics, 8–14 (ACM, 2020).
17. Espeholt, L. et al. Deep learning for twelve hour precipitation forecasts. Nat. Commun. 13, 1–10 (2022).
18. Yue, H. & Gebremichael, M. Evaluation of high-resolution rapid refresh (HRRR) forecasts for extreme precipitation. Environ. Res. Commun. 2, 065004 (2020).
19. Ayzel, G., Scheffer, T. & Heistermann, M. Rainnet v1. Geosci. Model Dev. 13, 2631–2644 (2020).
20. Shi, X. et al. Deep learning for precipitation nowcasting: a benchmark and a new model. In Advances in Neural Information Processing Systems, 30, NIPS (2017).
21. Bowler, N., Pierce, C. E. & Seed, A. Development of a precipitation nowcasting algorithm based upon optical flow techniques. J. Hydrol. 288, 74–91 (2004).
22. Agrawal, S. et al. Machine learning for precipitation nowcasting from radar images. Preprint at arXiv https://doi.org/10.48550/arXiv.1912.12132 (2019).
23. Ravuri, S. et al. Skilful precipitation nowcasting using deep generative models of radar. Nature 597, 672–677 (2021).
24. Zhang, Y. et al. Skilful nowcasting of extreme precipitation with NowcastNet. Nature 619, 526–532 (2023).
25. Benjamin, S. G. et al. A North American hourly assimilation and model forecast cycle: the rapid refresh. Mon. Weather Rev. 144, 1669–1694 (2016).
26. Dowell, D. C. et al. The high-resolution rapid refresh (HRRR): an hourly updating convection-allowing forecast model. Weather Forecast. 37, 1371–1395 (2022).
27. Pulkkinen, S. et al. Pysteps: an open-source Python library for probabilistic precipitation nowcasting (v1. 0). Geosci. Model Dev. 12, 4185–4219 (2019).
28. Pichugina, Y. L. et al. Evaluating the wfip2 updates to the HRRR model using scanning doppler lidar measurements in the complex terrain of the Columbia river basin. J. Renew. Sustain. Energy 12, (2020).
29. Krajewski, W. et al. Real-time Flood Forecasting for River Crossings. Technical report (University of Nebraska-Lincoln, Mid-America Transportation Center, 2018).
30. Gettelman, A. et al. The future of earth system prediction: advances in model-data fusion. Sci. Adv. 8, eabn3488 (2022).
31. National Centers for Environmental Information (NCEI). State Climate Extremes Committee Memorandum (2021).
32. Zhang, J. et al. Multi-radar multi-sensor (MRMS) quantitative
42. Ayzel, G., Heistermann, M. & Winterrath, T. Optical flow models as an open benchmark for radar-based precipitation nowcasting (rainymotion v0. 1). Geosci. Model Dev. 12, 1387–1402 (2019).
43. Woo, W.-C. & Wong, W.-K. Operational application of optical flow techniques to radar-based rainfall nowcasting. Atmosphere 8, 48 (2017).
44. Browning, K. A. & Collier, C. G. Nowcasting of precipitation systems. Rev. Geophys. 27, 345–370 (1989).
45. Grim, J. A., Pinto, J. O. & Dowell, D. C. Assessing RRFS versus HRRR in predicting widespread convective systems over the eastern conus. Weather Forecast. 39, 121–140 (2024).
46. Alexander, C., Carley, J. & Pyle, M. The rapid refresh forecast system: looking beyond the first operational version. In 28th Conference on Numerical Weather Prediction (2023).
47. Carley, J. et al. Mitigation efforts to address rapid refresh forecast system (RRFS) v1 dynamical core performance issues and recommendations for RRFS v2. Office Note (National Centers for Environmental Prediction), 516 (2023). https://doi.org/10.25923/ccgj-7140.
48. Goodfellow, I et al. Generative Adversarial Nets (Advances in Neural Information Processing Systems) 2672–2680 (Curran, 2014).
49. Xue, T., Wu, J., Bouman, K. L. & Freeman, W. T. Visual dynamics: stochastic future generation via layered cross convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 41, 2236–2250 (2018).
50. Murphy, A. H. What is a good forecast? An essay on the nature of goodness in weather forecasting. Weather Forecast. 8, 281–293 (1993).
51. Schaefer, J. T. The critical success index as an indicator of warning skill. Weather Forecast. 5, 570–575 (1990).
52. Doswell, C. H. A. R. L., Davies-Jones, R. & Keller, D. L. On summary measures of skill in rare event forecasting based on contingency tables. Weather Forecast. 5, 576–585 (1990).
53. Larner, A. J. Assessing cognitive screeners with the critical success index. Prog. Neurol. Psychiatry 25, 33–37 (2021).
54. Jolliffe, I. T. & Stephenson, D. B. Forecast Verification: A Practitioner’s Guide in Atmospheric Science (John Wiley & Sons, 2012).
precipitation estimation: initial operating capabilities. Bull. Am. 55. Heidke, P. Calculation of the success and quality of wind force
Meteorol. Soc. 97, 621–638 (2016). forecasts in the storm warning service. Geogr. Ann. 8, 301–349 (1926).
33. Necker, T. et al. The fractions skill score for ensemble forecast 56. Hyvärinen, O. A probabilistic derivation of Heidke skill score. Weather
verification. Q. J. R. Meteorol. Soc. No. EGU24-8807 (2024). Forecast. 29, 177–181 (2014).
34. Ebert, E. E. & Gallus Jr, W. A. Toward better understanding of the 57. Roberts, N. M. & Lean, H. W. Scale-selective verification of rainfall
contiguous rain area (CRA) method for spatial forecast verification. accumulations from high-resolution forecasts of convective events.
Weather Forecast. 24, 1401–1415 (2009). Mon. Weather Rev. 136, 78–97 (2008).
35. Ebert, E. E. & McBride, J. L. Verification of precipitation in weather 58. Harris, D., Foufoula-Georgiou, E., Droegemeier, K. K. & Levit, J. J.
systems: determination of systematic errors. J. Hydrol. 239, 179–202 Multiscale statistical properties of a high-resolution precipitation
(2000). forecast. J. Hydrometeorol. 2, 406–418 (2001).
36. Gangrade, S. et al. Unraveling the 2021 central Tennessee flood event 59. Sinclair, S. & Pegram, G. G. S. Empirical mode decomposition in 2-d
using a hierarchical multi-model inundation modeling framework. J. space and time: a tool for space-time rainfall analysis and nowcasting.
Hydrol. 625, 130157 (2023). Hydrol. Earth Syst. Sci. 9, 127–137 (2005).
37. McGovern, A. et al. The value of convergence research for developing 60. Chen, Y., Ebert, E. E., Davidson, N. E. & Walsh, K. J. E.
trustworthy ai for weather, climate, and ocean hazards. npj Nat. Application of contiguous rain area (CRA) methods to tropical
Hazards 1, 13 (2024). cyclone rainfall forecast verification. Earth Space Sci. 5, 736–752
38. Ganguly, A. R. & Bras, R. L. Distributed quantitative precipitation (2018).
forecasting using information from radar and numerical weather 61. Moise, A. F. & Delage, F. P. New climate model metrics based on
prediction models. J. Hydrometeorol. 4, 1168–1180 (2003). object-orientated pattern matching of rainfall. J. Geophys. Res.
39. Germann, U. & Zawadzki, I. Scale-dependence of the predictability of Atmos. 116, D12108 (2011).
precipitation from continental radar images. Mon. Weather Rev. 130, 62. Government Accountability Office. Tennessee valley authority:
2859–2873 (2002). additional steps are needed to better manage climate related risks.
40. Prudden, R. et al. A review of radar-based nowcasting of precipitation (2023).
and applicable machine learning techniques. Preprint at arXiv https:// 63. Zipser, E. Rainfall predictability: when will extrapolation-based
doi.org/10.48550/arXiv.2005.04988 (2020). algorithms fail. In 8th Conference on Hydrometeorology, American
41. Liu, Y., Xi, D.-G., Li, Z.-L. & Hong, Y. A new methodology for pixel- Meteorological Society (1990).
quantitative precipitation nowcasting using a pyramid Lucas Kanade 64. Golding, B. W. Nimrod: a system for generating automated very short
optical flow approach. J. Hydrol. 529, 354–364 (2015). range forecasts. Meteorol. Appl. 5, 1–16 (1998).
65. Pierce, C., Seed, A., Ballard, S., Simonin, D. & Li, Z. Nowcasting. In Doppler Radar Observations-Weather Radar, Wind Profiler, Ionospheric Radar, and Other Advanced Applications (IntechOpen, 2012).
66. Houze Jr, R. A. Stratiform precipitation in regions of convection: a meteorological paradox? Bull. Am. Meteorol. Soc. 78, 2179–2196 (1997).

Acknowledgements
This work was supported by National Aeronautics and Space Administration (NASA) funded project titled “Remote Sensing Data Driven Artificial Intelligence for Precipitation Nowcasting (RAIN)” under Grant 21-WATER21-2-0052 (Federal Project ID: 80NSSC22K1138) from the NASA Water Resources Program within their Earth Science Applications under their Applied Sciences Program. The authors also acknowledge the support from the Northeastern University (NU) focus area Artificial Intelligence for Climate and Sustainability (AI4CaS), which is a part of The Institute for Experiential AI (EAI) at NU and supported by both the NU Roux Institute and the NU Office of the Provost.

Author contributions
P.D. and A.R.G. conceptualized and formulated the problem. P.D., A.P., and N.B. performed the experiments and analyzed the results. N.B. and M.H. worked closely as stakeholders to co-develop case studies and insights, besides pointing to relevant data. T.V. and K.D. helped with machine learning model evaluation and interpretation. D.S. and K.v.W. helped develop hydrologic insights. P.D. and A.R.G. interpreted the results with help from all authors. P.D. prepared the manuscript primarily with A.P. and A.R.G., while all authors helped in revising and editing.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41612-024-00834-8.

Correspondence and requests for materials should be addressed to Auroop R. Ganguly.

Reprints and permissions information is available at http://www.nature.com/reprints

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

© The Author(s) 2024