Sparse Data Initialization for Machine Learning Weather Prediction Models

Navy SBIR 25.1 - Topic N251-064
Office of Naval Research (ONR)
Pre-release 12/4/24 | Opens to accept proposals 1/8/25 | Closes 2/5/25 12:00pm ET

N251-064 TITLE: Sparse Data Initialization for Machine Learning Weather Prediction Models

OUSD (R&E) CRITICAL TECHNOLOGY AREA(S): Advanced Computing and Software; Sustainment; Trusted AI and Autonomy

OBJECTIVE: Develop a machine learning weather prediction (MLWP) system that generates a complete model analysis and initialization, and produces skillful forecasts, from an incomplete, non-gridded set of observations that may include sparse and irregular spatial and temporal sampling of in-situ, remote sensing, satellite, and qualitative forecaster-based information.

DESCRIPTION: In the last few years, the number of skillful MLWP models that achieve results competitive with state-of-the-art traditional physics-based numerical weather prediction (NWP) systems has increased substantially. However, these efforts require not only multiple decades of high-quality, full-physics reanalysis data to train the model, but also high-resolution gridded fields of similar quality from which to initialize it. The latter requirement limits the operational utility of MLWP systems, because full-scale traditional data assimilation and reanalysis methods are still needed to produce the MLWP initial conditions.

While MLWP model development requires sufficiently large and balanced datasets to learn appropriate physical relationships, it is not clear that a full data assimilation system is needed, as it is with traditional NWP. MLWP time integration is not achieved by discretizing partial differential equations, so it does not share some NWP limitations, such as the Courant-Friedrichs-Lewy (CFL) stability condition or the requirement for dynamically balanced states. Mapping real-world data onto a regular numerical grid requires smoothing, observation thinning, and methods to spread observational influence in space and time, all of which may lead to a loss of information. This topic solicits innovative machine learning development that builds on recent work (see the references below) toward methodologies capable of initializing an MLWP model without a full end-to-end NWP-type data assimilation capability.
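As a concrete illustration of how mapping observations onto a grid spreads and smooths their influence, the sketch below implements one pass of a Cressman-style successive-correction analysis, a classical objective analysis method. It is not a method required by this topic; the function name `cressman_analysis`, the nearest-grid-point lookup of the background at each observation, and the single-pass design are illustrative assumptions.

```python
import numpy as np


def cressman_analysis(background, grid_x, grid_y, obs_x, obs_y, obs_val, radius):
    """One Cressman successive-correction pass of a gridded background toward point obs.

    background: (ny, nx) array on the grid defined by 1-D coordinates grid_x, grid_y.
    obs_x, obs_y, obs_val: 1-D arrays of equal length (irregular observation locations).
    """
    gx, gy = np.meshgrid(grid_x, grid_y)  # each has shape (ny, nx)

    # Assumption: background at each observation is taken from the nearest grid point
    ix = np.abs(grid_x[None, :] - obs_x[:, None]).argmin(axis=1)
    iy = np.abs(grid_y[None, :] - obs_y[:, None]).argmin(axis=1)
    increments = obs_val - background[iy, ix]

    # Squared distance from every grid point to every observation: (ny, nx, n_obs)
    d2 = (gx[..., None] - obs_x) ** 2 + (gy[..., None] - obs_y) ** 2
    r2 = radius ** 2
    # Cressman weight (R^2 - d^2) / (R^2 + d^2) inside the radius, zero outside
    w = np.where(d2 < r2, (r2 - d2) / (r2 + d2), 0.0)
    wsum = w.sum(axis=-1)

    analysis = background.astype(float)  # astype returns a copy
    covered = wsum > 0  # grid points with at least one observation in range
    analysis[covered] += (w * increments).sum(axis=-1)[covered] / wsum[covered]
    return analysis
```

Within the influence radius, a single observation shifts every covered grid point by its full increment, and two nearby observations with different values are blended at the grid points they share, so neither is reproduced exactly. This is the kind of information loss the paragraph above describes.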

Given operational constraints on real-time data quality and quantity, this SBIR topic seeks to scope, prototype, and demonstrate a technique for an MLWP analysis/initialization capability that: 1) is informed by, and transcends, state-of-the-science data assimilation methods and practices; 2) accepts a variety of observations and data sources, types, qualities, and characteristics; and 3) can process varying and irregular amounts of data over consecutive model cycles.
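Item 3 implies that the number of observations changes from cycle to cycle. One common way to give a neural network a fixed-size input from a variable-size, unordered set is a permutation-invariant "DeepSets"-style encoder: embed each observation independently, pool with a symmetric operation such as the mean, and transform the pooled vector. The sketch below is a minimal, untrained NumPy illustration of that idea, not an approach prescribed by the topic; the five-feature observation layout, the latent size, the function name `encode_obs_set`, and the random weights are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-observation features: [lat, lon, time_offset, value, type_id]
N_FEATURES = 5
LATENT = 16

# Untrained, randomly initialized weights standing in for learned parameters
W_PHI = rng.normal(scale=0.5, size=(N_FEATURES, LATENT))
W_RHO = rng.normal(scale=0.5, size=(LATENT, LATENT))


def encode_obs_set(obs: np.ndarray) -> np.ndarray:
    """Map an (n_obs, N_FEATURES) array to a LATENT-length vector for any n_obs >= 1."""
    if obs.ndim != 2 or obs.shape[1] != N_FEATURES or obs.shape[0] == 0:
        raise ValueError("expected a non-empty (n_obs, N_FEATURES) array")
    per_obs = np.tanh(obs @ W_PHI)   # phi: embed each observation independently
    pooled = per_obs.mean(axis=0)    # symmetric pooling: observation order does not matter
    return np.tanh(pooled @ W_RHO)   # rho: transform the pooled summary
```

A sparse 3-observation cycle and a data-rich 250-observation cycle both yield a 16-element vector, and shuffling the rows leaves the output unchanged up to floating-point rounding.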

PHASE I: Focus on understanding and documenting the technical limitations of initializing MLWP forecast models with sparse observations, and on formulating innovative concepts to overcome those challenges. Perform a background study of both data assimilation and state-of-the-art forecast analysis methods to motivate and inform how the proposed effort will address gaps in current processes. Develop a theory and/or a simple method demonstrating that AI/ML-based initialization methods can feasibly produce robust analyses and stable, skillful forecast fields. Properly scope the breadth of analyzed and modeled environmental variables, as well as the appropriate downstream applications for their use.
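Reference 1 uses the Lorenz 96 model as a low-cost testbed for learning dynamics from sparse and noisy observations, which makes it a natural candidate for the simple-method feasibility work described above. The sketch below generates a Lorenz 96 truth run and, at each step, observes a random subset of variables with additive Gaussian noise. The function names and all parameter values (40 variables, forcing F = 8, time step 0.05, 50% coverage, unit noise) are illustrative assumptions, not settings prescribed by the topic.

```python
import numpy as np


def lorenz96_tendency(x, forcing=8.0):
    """dx_i/dt = (x_{i+1} - x_{i-2}) * x_{i-1} - x_i + F with cyclic indices."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing


def rk4_step(x, dt, forcing=8.0):
    """Advance the state one step with classical fourth-order Runge-Kutta."""
    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, forcing)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, forcing)
    k4 = lorenz96_tendency(x + dt * k3, forcing)
    return x + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def make_sparse_observations(n_vars=40, n_steps=200, dt=0.05,
                             obs_fraction=0.5, obs_std=1.0, seed=0):
    """Return (truth, obs, mask); obs is NaN wherever a variable is unobserved."""
    rng = np.random.default_rng(seed)
    x = np.full(n_vars, 8.0)
    x[0] += 0.01  # perturb the fixed point x_i = F (F = 8) so the trajectory leaves it
    truth = np.empty((n_steps, n_vars))
    for t in range(n_steps):
        x = rk4_step(x, dt)
        truth[t] = x
    # A different random subset of variables is observed at every step
    mask = rng.random((n_steps, n_vars)) < obs_fraction
    noisy = truth + obs_std * rng.normal(size=truth.shape)
    obs = np.where(mask, noisy, np.nan)
    return truth, obs, mask
```

Unobserved entries are NaN, so a candidate initialization scheme receives a different, irregular set of observations at every step.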

PHASE II: Using the Phase I results, develop, demonstrate, and validate an end-to-end prototype MLWP software suite focused on the novel data ingest, initialization, and analysis scheme. Ensure that the toolset can accept widely varying modalities, qualities, and types of observational data, including in-situ state variables, remotely sensed raw and retrieved quantities, gridded background fields of varying age and accuracy, and qualitative assessments of the environment, including forecaster notations of important features and locations of environmental phenomena. Ensure that the prototype software also tolerates discontinuities in data streams (temporal or spatial) and can run or restart with old or degraded information. Intermediate processing outputs that quantify data impacts on the analysis and/or forecast fields, and the sensitivity of those fields to the data, are highly desired. The developed workflow should also include robust validation and verification methods and should identify the strengths and weaknesses of the product compared with traditional NWP modeling. Perform multiple demonstrations, coordinated with field testing if required. Submit the required Phase II deliverables, including regular reporting, participation in program reviews, technical documentation, and the end-to-end prototype software at the conclusion of the effort.
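The requirement to blend gridded background fields of varying age with new observations, and to report data impacts, can be illustrated with the simplest form of optimal interpolation: a pointwise (diagonal) update in which the weight on each observation is the gain K = σ_b² / (σ_b² + σ_o²). The sketch below assumes, hypothetically, that background error variance grows linearly with background age and that observations are co-located with grid points; practical systems use spatially correlated error covariances. The function name and default error parameters are illustrative assumptions, and the returned gain doubles as a simple per-point data-impact diagnostic.

```python
import numpy as np


def blend_background_and_obs(background, bg_age_hours, obs, obs_var,
                             bg_var0=1.0, growth_per_hour=0.1):
    """Pointwise optimal-interpolation update; NaN in `obs` marks a missing value.

    Hypothetical error model: background error variance grows linearly with age.
    Returns (analysis, gain); gain is 0 where no observation is available.
    """
    bg_var = bg_var0 + growth_per_hour * bg_age_hours
    missing = np.isnan(obs)
    gain = np.where(missing, 0.0, bg_var / (bg_var + obs_var))
    innovation = np.where(missing, 0.0, obs - background)  # observation minus background
    analysis = background + gain * innovation
    return analysis, gain
```

With a fresh background whose error variance equals the observation error variance, the analysis moves halfway toward each observation. With a 90-hour-old background under these assumed error-growth parameters, the gain rises to 10/11, so the analysis leans heavily on the new data, and points without observations keep the background value.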

PHASE III DUAL USE APPLICATIONS: Operational hardening, together with establishing utility and trust for real-time applications, forms the main effort for transition and commercialization. Create and demonstrate dynamic analysis software tools that quickly and accurately convey system health, error logging and debugging information, and processing metadata. Develop additional metrics and diagnostics to support guidance for expert forecasters on using the product and comparing it with current state-of-the-art weather forecast information. Ensure that the system has a formalized methodology, with defined data and compute needs, for model training, and a separate, leaner set of requirements for operational runs. Techniques should be generalizable to a variety of environmental modeling use cases so that follow-on work and commercial applications can be addressed.
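Metrics for comparing the product with current forecast information typically start from standard verification scores: root-mean-square error (RMSE), the anomaly correlation coefficient (ACC), and an MSE-based skill score relative to a reference forecast, for example an operational NWP run. The sketch below computes these for flattened fields. It uses the centered form of the ACC and omits the latitude (area) weighting normally applied in global verification; the function names are hypothetical.

```python
import numpy as np


def rmse(forecast, verifying_analysis):
    """Root-mean-square error over all points (no area weighting)."""
    return float(np.sqrt(np.mean((forecast - verifying_analysis) ** 2)))


def anomaly_correlation(forecast, verifying_analysis, climatology):
    """Centered anomaly correlation coefficient with respect to a climatology."""
    f = forecast - climatology
    a = verifying_analysis - climatology
    f = f - f.mean()
    a = a - a.mean()
    return float(np.sum(f * a) / np.sqrt(np.sum(f ** 2) * np.sum(a ** 2)))


def mse_skill_score(forecast, reference_forecast, verifying_analysis):
    """1 is perfect, 0 matches the reference, negative is worse than the reference."""
    mse_f = np.mean((forecast - verifying_analysis) ** 2)
    mse_r = np.mean((reference_forecast - verifying_analysis) ** 2)
    return float(1.0 - mse_f / mse_r)
```

A positive `mse_skill_score` against an NWP reference would indicate lower squared error than that reference on the verified fields; in practice these scores are aggregated by variable, level, region, and lead time.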

Dual-use applications will include partnering with other government agencies with meteorological missions, such as the USAF, NOAA, and NASA, as well as commercialization in multiple potential markets whose decision making depends on forecast skill.

REFERENCES:

1. Brajard, Julien et al. "Combining data assimilation and machine learning to emulate a dynamical model from sparse and noisy observations: A case study with the Lorenz 96 model." Journal of Computational Science 44 (2020): 101171. https://www.sciencedirect.com/science/article/abs/pii/S1877750320304725

2. Irrgang, Christopher et al. "Towards neural Earth system modelling by integrating artificial intelligence in Earth system science." Nature Machine Intelligence 3.8 (2021): 667-674. https://www.nature.com/articles/s42256-021-00374-3

3. Geer, Alan J. "Learning earth system models from observations: machine learning or data assimilation?" Philosophical Transactions of the Royal Society A 379.2194 (2021): 20200089. https://royalsocietypublishing.org/doi/epdf/10.1098/rsta.2020.0089

4. Lam, Remi et al. "GraphCast: Learning skillful medium-range global weather forecasting." arXiv preprint arXiv:2212.12794 (2022). https://arxiv.org/abs/2212.12794

5. Buizza, Caterina et al. "Data learning: Integrating data assimilation and machine learning." Journal of Computational Science 58 (2022): 101525. https://www.sciencedirect.com/science/article/abs/pii/S1877750321001861

6. Cheng, Sibo et al. "Generalised latent assimilation in heterogeneous reduced spaces with machine learning surrogate models." Journal of Scientific Computing 94.1 (2023): 11. https://arxiv.org/abs/2204.03497

KEYWORDS: Data assimilation; initialization; machine learning; artificial intelligence; AI/ML; meteorology; oceanography; METOC; weather; forecast; machine learning weather prediction; MLWP


** TOPIC NOTICE **

The Navy Topic above is an "unofficial" copy from the Navy Topics in the DoD 25.1 SBIR BAA. Please see the official DoD Topic website at www.dodsbirsttr.mil/submissions/solicitation-documents/active-solicitations for any updates.

The DoD issued its Navy 25.1 SBIR Topics pre-release on December 4, 2024. The BAA opens to receive proposals on January 8, 2025, and closes February 5, 2025 (12:00pm ET).

Direct Contact with Topic Authors: During the pre-release period (December 4, 2024, through January 7, 2025), proposing firms have an opportunity to directly contact the Technical Point of Contact (TPOC) to ask technical questions about the specific BAA topic. Once DoD begins accepting proposals on January 8, 2025, no further direct contact between proposers and topic authors is allowed unless the Topic Author is responding to a question submitted during the pre-release period.

DoD On-line Q&A System: After the pre-release period, until January 22, 2025, at 12:00 PM ET, proposers may submit written questions through the DoD On-line Topic Q&A at https://www.dodsbirsttr.mil/submissions/login/ by logging in and following the instructions. In the Topic Q&A system, the questioner and respondent remain anonymous, but all questions and answers are posted for general viewing.

DoD Topics Search Tool: Visit the DoD Topic Search Tool at www.dodsbirsttr.mil/topics-app/ to find topics by keyword across all DoD Components participating in this BAA.

Help: If you have general questions about the DoD SBIR program, please contact the DoD SBIR Help Desk via email.

