Physics-based Data Augmentation for Machine Learning (ML) Models

Navy SBIR 25.1 - Topic N251-039
Naval Sea Systems Command (NAVSEA)
Pre-release 12/4/24   Opens to accept proposals 1/8/25   Closes 2/5/25 12:00pm ET

N251-039 TITLE: Physics-based Data Augmentation for Machine Learning (ML) Models

OUSD (R&E) CRITICAL TECHNOLOGY AREA(S): Advanced Computing and Software; Trusted AI and Autonomy

The technology within this topic is restricted under the International Traffic in Arms Regulation (ITAR), 22 CFR Parts 120-130, which controls the export and import of defense-related material and services, including export of sensitive technical data, or the Export Administration Regulation (EAR), 15 CFR Parts 730-774, which controls dual use items. Offerors must disclose any proposed use of foreign nationals (FNs), their country(ies) of origin, the type of visa or work permit possessed, and the statement of work (SOW) tasks intended for accomplishment by the FN(s) in accordance with the Announcement. Offerors are advised that foreign nationals proposed to perform on this topic may be restricted due to the technical data under U.S. export control laws.

OBJECTIVE: Develop a tool to synthesize realistic physics-based sonar data for use in training Artificial Intelligence/Machine Learning (AI/ML) algorithms to enable rapid approaches to fielding sonar-oriented AI/ML capabilities.

DESCRIPTION: For imagery and vocal audio, tools exist that allow individuals to generate realistic audio and video clips of speeches, commonly known as deepfakes. These tools combine a variety of AI/ML techniques with limited exemplars of training data.

For sonar, there are tools that compute representative acoustics on sonar arrays to support sailor training objectives. Training data for sonar signal processing is currently obtained by recording data at sea, and it is cost-prohibitive to collect the quantity of data required to train AI/ML algorithms this way. The complex, physics-based models used in current simulations require a fundamental understanding of the entire phenomenon in question, as well as extreme computational power. Data-generation tools exist in industry; however, they are not oriented toward sonar and are not sufficient to develop dynamic scene content covering 360 degrees at extended ranges to support mid-frequency sonar (1 kHz to 10 kHz) across the worldwide range of bathymetric, weather, volume scattering, and contact density conditions. Innovation is required to support the generation of phenomenologically representative data sets. The Navy seeks a tool to synthesize realistic physics-based sonar data for use in training AI/ML algorithms to enable rapid approaches to fielding sonar-oriented AI/ML capabilities. Currently, no commercial tools can do this.

Success with video and vocal audio generation using AI/ML tools suggests that it may be possible to combine recorded exemplars obtained during exercises such as Rim of the Pacific (RIMPAC) with physics-based contact attributes to generate high-quality sonar data. The primary use for this generated data would be to train emerging AI/ML algorithms.

AI/ML synthesis tools can enable the development of realistic synthetic sonar data for use in training AI/ML algorithms. Limiting factors include the availability of recorded training data and the absence of recorded data from real-world conflict situations involving realistic numbers of enemy contacts. High-quality synthesis approaches that utilize AI/ML would provide an alternative means of creating the large volumes of data needed to train detection and classification algorithms.

The solution must use generative adversarial models and deep predictive coding models. It must be capable of producing large volumes of diverse, high-fidelity data to train ML algorithms that will improve target detection, classification, and tracking systems. Metrics for the solution include computational performance, "image" similarity metrics, and user assessments.
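The topic lists "image" similarity metrics without naming them. As a minimal sketch, assuming processed sonar frames are available as equally sized 2D grids of intensity normalized to [0, 1], two widely used image-quality measures, peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM), could compare a synthetic frame against a recorded exemplar. The function names and frame representation are illustrative assumptions, and `global_ssim` is a simplified single-window form; standard SSIM averages the same expression over local windows.

```python
import math


def _flatten(frame):
    """Flatten a 2D frame (list of rows) into one list of floats."""
    return [float(v) for row in frame for v in row]


def psnr(reference, synthetic, data_range=1.0):
    """Peak signal-to-noise ratio (dB) between two equally sized 2D frames."""
    ref, syn = _flatten(reference), _flatten(synthetic)
    if len(ref) != len(syn):
        raise ValueError("frames must contain the same number of cells")
    mse = sum((r - s) ** 2 for r, s in zip(ref, syn)) / len(ref)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(data_range ** 2 / mse)


def global_ssim(reference, synthetic, data_range=1.0):
    """Single-window SSIM over the whole frame (simplified, not windowed)."""
    ref, syn = _flatten(reference), _flatten(synthetic)
    if len(ref) != len(syn):
        raise ValueError("frames must contain the same number of cells")
    n = len(ref)
    mu_x = sum(ref) / n
    mu_y = sum(syn) / n
    var_x = sum((r - mu_x) ** 2 for r in ref) / n
    var_y = sum((s - mu_y) ** 2 for s in syn) / n
    cov_xy = sum((r - mu_x) * (s - mu_y) for r, s in zip(ref, syn)) / n
    # Stabilizing constants from the usual SSIM definition.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

For example, two 2x2 frames that differ by 0.1 in a single cell have a mean squared error of 0.0025, so `psnr` gives about 26.0 dB. Higher PSNR and SSIM closer to 1 indicate closer agreement; neither captures sonar-specific realism on its own, so such scores would likely be read alongside the user assessments the topic also lists.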

Work produced in Phase II may become classified. Note: The prospective contractor(s) must be U.S. owned and operated with no foreign influence as defined by 32 CFR § 2004.20 et seq., National Industrial Security Program Executive Agent and Operating Manual, unless acceptable mitigating procedures have been implemented and approved by the Defense Counterintelligence and Security Agency (DCSA), formerly the Defense Security Service (DSS). The selected contractor must be able to acquire and maintain a SECRET-level facility clearance and Personnel Security Clearances. This will allow contractor personnel to perform on advanced phases of this project, as set forth by DCSA and NAVSEA, in order to gain access to classified information pertaining to the national defense of the United States and its allies; this will be an inherent requirement. The selected company will be required to safeguard classified material during the advanced phases of this contract IAW the National Industrial Security Program Operating Manual (NISPOM), which can be found at Title 32, Section 2004.20 of the Code of Federal Regulations.

PHASE I: Develop a concept for a tool that produces realistic synthetic sonar sequences suitable for training signal processing algorithms and that meets the parameters in the Description. Feasibility will be established through modeling and analysis of the design.

The Phase I Option, if exercised, will include the initial design specifications and capabilities description to build a prototype solution in Phase II.

PHASE II: Develop and deliver a prototype tool that generates realistic synthetic sonar sequences. Demonstrate through testing the tool’s ability to meet the parameters in the Description. Testing will include benchmarking computational performance, "image" similarity metrics compared to recorded sonar exemplars (which will be provided by the government), and user assessments. Validate the prototype by applying the approach in a simulation environment. Provide a detailed test plan to demonstrate that the simulation achieves the metrics defined in the Description.

Due to the nature of recorded sonar data, work under Phase II of this effort will probably be classified (see the Description section for details).

PHASE III DUAL USE APPLICATIONS: Support the Navy in transitioning the tool to Navy use for training current Navy sonar signal processing algorithms and for use with training systems or simulators. Work with the IWS 5.0 Undersea Systems program working groups for ML and training to increase the fidelity of the sonar sensor data used to train AI/ML algorithms and used within high-fidelity sonar trainers.

The technology developed under this SBIR topic could provide an improved approach to creating dynamic scene content for other DoD programs. If AI/ML-generated sonar data can be produced with less computational power than current physics-based models require, this technology may also be useful in trainers for sailors.

Complex, physics-based models are often used in current simulations. These models require a fundamental understanding of the entire phenomenon in question and extreme computational power.

The innovation sought would reduce reliance on processing capacity while retaining traceability to the physical attributes of sonar returns. This new approach could be used for sensor data prediction and interpolation in scenarios where it is not possible to record data (e.g., wartime conflict situations), or to produce sonar data for training in salvage operations, oil and gas exploration, and border protection.

REFERENCES:

1. Tiu, E.; Talius, E.; Patel, P. et al. "Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning." Nat. Biomed. Eng., 6, 2022, pp. 1399-1406. https://doi.org/10.1038/s41551-022-00936-9

2. Wang, Yaqing; Yao, Quanming; Kwok, James T. and Ni, Lionel M. "Generalizing from a Few Examples: A Survey on Few-shot Learning." ACM Comput. Surveys, 53(3), 2020. https://arxiv.org/abs/1904.05046

3. "AN/SQQ-89(V) Undersea Warfare / Anti-Submarine Warfare Combat System, updated 20 Sep 2021." https://www.navy.mil/Resources/Fact-Files/Display-FactFiles/Article/2166784/ansqq-89v-undersea-warfare-anti-submarine-warfare-combat-system/

4. "The Essential Guide to Quality Training Data for Machine Learning: What You Need to Know About Data Quality and Training the Machine." Cloudfactory. https://www.cloudfactory.com/training-data-guide

5. Rim of the Pacific (RIMPAC) international maritime exercise website. https://www.cpf.navy.mil/RIMPAC/

6. "National Industrial Security Program Executive Agent and Operating Manual (NISP), 32 CFR § 2004.20 et seq. (1993)." https://www.ecfr.gov/current/title-32/subtitle-B/chapter-XX/part-2004

 

KEYWORDS: Train emerging AI/ML algorithms; AI/ML synthesis tools; High-quality synthesis approaches that utilize AI/ML; mid-frequency sonar; deep predictive coding models; physics-based contact attributes


** TOPIC NOTICE **

The Navy Topic above is an "unofficial" copy from the Navy Topics in the DoD 25.1 SBIR BAA. Please see the official DoD Topic website at www.dodsbirsttr.mil/submissions/solicitation-documents/active-solicitations for any updates.

The DoD issued its Navy 25.1 SBIR Topics pre-release on December 4, 2024. The topics open to receive proposals on January 8, 2025, and close on February 5, 2025 (12:00pm ET).

Direct Contact with Topic Authors: During the pre-release period (December 4, 2024, through January 7, 2025), proposing firms have an opportunity to directly contact the Technical Point of Contact (TPOC) to ask technical questions about the specific BAA topic. Once DoD begins accepting proposals on January 8, 2025, no further direct contact between proposers and topic authors is allowed unless the Topic Author is responding to a question submitted during the pre-release period.

DoD On-line Q&A System: After the pre-release period, and until January 22, 2025, at 12:00 PM ET, proposers may submit written questions through the DoD On-line Topic Q&A at https://www.dodsbirsttr.mil/submissions/login/ by logging in and following the instructions. In the Topic Q&A system, the questioner and respondent remain anonymous, but all questions and answers are posted for general viewing.

DoD Topics Search Tool: Visit the DoD Topic Search Tool at www.dodsbirsttr.mil/topics-app/ to find topics by keyword across all DoD Components participating in this BAA.

Help: If you have general questions about the DoD SBIR program, please contact the DoD SBIR Help Desk via email at [email protected]

Topic Q & A

1/9/25  Q. The topic states: “The solution must include using generative adversarial models and deep predictive coding models.” Can you explain why deep predictive coding models must be included?
   A. The authors believed that the state of the art at the time the topic was written indicated that deep predictive coding models would be highly beneficial. However, there may be other ML or physics-based approaches that could be better. We do recommend that the proposal explain why your proposed technology is superior to any particular technology the authors wrote into the topic a year or so ago.
1/8/25  Q. The solicitation states that: The solution must include using generative adversarial models and deep predictive coding models. However, there may be alternative AI/ML models that can achieve the topic’s objectives. Is it acceptable to use an AI/ML solution that DOES NOT use generative adversarial models and deep predictive coding models, or will this solution be considered non-responsive to the topic’s requirements?
   A. There may be other ML or physics-driven approaches that work as well as or better than generative adversarial models and deep predictive coding models. We do recommend that the proposal explain why the proposed approach will be superior to generative adversarial models and deep predictive coding models, rather than risk being found unresponsive to the topic.
1/8/25  Q. 1. Is the synthetic data being generated in the form of a return signal from the sonar, the resulting sonar image, or both?
2. Is accurate metadata expected to also be in the header of the data?
   A. 1. The machine learning operates on processed data, which includes the feature decomposition (e.g., signal-to-noise ratio (SNR), extent) of each cluster that represents a sonar reflection.
2. As the truth and labeling of generated data are known, it would be appropriate for generated data to contain the associated truth and labels, similar to how recorded data includes the (painfully reconstructed) truth and labels.
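To make the answer above concrete, the sketch below computes two of the cited cluster features, SNR and extent, and attaches known truth labels to a generated sample. Everything here is a hypothetical illustration: the `Cluster` and `LabeledSample` types, the cell layout of (range, bearing, power) tuples, the use of peak power over an estimated noise floor as the SNR definition, and the truth fields are assumptions, not the Navy's feature definitions or data format.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical cluster of detection cells from a single ping."""
    cells: list          # (range_m, bearing_deg, power) tuples; power in linear units
    noise_power: float   # estimated background noise power, same linear units


@dataclass
class LabeledSample:
    """Hypothetical generated sample: computed features plus known truth."""
    features: dict
    truth: dict = field(default_factory=dict)


def cluster_features(cluster):
    """Illustrative features: peak SNR (dB) and range/bearing extent."""
    if not cluster.cells or cluster.noise_power <= 0:
        raise ValueError("cluster needs at least one cell and a positive noise estimate")
    ranges = [r for r, _, _ in cluster.cells]
    bearings = [b for _, b, _ in cluster.cells]  # ignores 0/360 wraparound for brevity
    peak_power = max(p for _, _, p in cluster.cells)
    return {
        "snr_db": 10.0 * math.log10(peak_power / cluster.noise_power),
        "range_extent_m": max(ranges) - min(ranges),
        "bearing_extent_deg": max(bearings) - min(bearings),
    }


def make_labeled_sample(cluster, contact_class, true_range_m, true_bearing_deg):
    """Pair computed features with the truth the generator already knows."""
    return LabeledSample(
        features=cluster_features(cluster),
        truth={
            "class": contact_class,  # e.g., "surface ship", "submarine", "bathymetry"
            "range_m": true_range_m,
            "bearing_deg": true_bearing_deg,
        },
    )
```

Because a generator knows the ground truth of every synthetic contact, the truth dictionary can be filled in directly at generation time rather than reconstructed after the fact, as it must be for recorded data.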
1/5/25  Q.
  1. What specific attributes or parameters should the synthetic sonar data include (e.g., frequency ranges, environmental factors, or specific contact densities)?
  2. Are there any preferred physics-based contact models or simulation tools that the solution should integrate or draw inspiration from?
  3. Beyond computational performance and “image” similarity metrics, are there other criteria or benchmarks for evaluating the quality and realism of the synthetic sonar data?
  4. Will the government provide exemplar data (e.g., from exercises like RIMPAC) during Phase II, and if so, what formats or volumes can be expected?
  5. Should the tool prioritize generating data for specific operational scenarios, such as conflict situations, or cover a wide range of general use cases?
  6. What role will user assessments play in refining the tool? Should the solution include interfaces for operators to provide feedback or adjust parameters in real time?
  7. For non-military contexts (e.g., oil exploration, border protection), are there specific functionalities or enhancements that should be considered during development?
   A.
  1. If companies are not familiar with generating acoustic data, we recommend the 2008 report regarding the Sonar Simulation Toolset (https://apl.uw.edu/research/downloads/publications/tr_0702.pdf) to get an idea of the physical factors that are considered important for correctly modeling undersea acoustic data. Our current recorded data, used for training the machine learning within our Pulsed Active Sonar capability, includes hundreds of thousands of clusters, each representing an echo, such as from a surface ship, a submarine, or bathymetry. The machine learning uses features computed from the clusters (e.g., SNR, extent). As active sonar may require more than a single ping to detect and classify a contact, the kinematics of successive clusters is also important.
  2. Because Phase I must be unclassified, we recommend that the company consider what innovation they will propose to address the topic, then identify what data will best support determining the feasibility of their approach. The government declines to mandate an approach that might interfere with the company’s probability of a successful feasibility demonstration.
  3. It is left to the company to determine what innovation will address the topic and to propose data and efforts to demonstrate feasibility. Phase I need not be restricted to acoustic data, though it should be clear that the proposed innovation is extensible to acoustic data and associated features.
  4. For Phase I, the effort will be unclassified, and the government expects the company to propose data they can obtain, generate, or simulate. This allows the company to control the success of their feasibility demonstration, to start work on day one of the Phase I award, and to share their research with other potential customers without a constant need to request government permission. Classified data will be provided to the company selected for Phase II after award of the Phase II contract. The nature of this data will be discussed with the Phase II selectee to inform their final Phase II proposal.
  5. Initial validation that the tool is reasonable (during Phase II) will likely involve generating data for cases for which recorded data exist. This will likely involve recorded instances with a diversity of attributes in the supporting databases (e.g., grid size for available bathymetry). Data augmentation will likely be desired for locations, seasons, targets, and conflict situations that are not currently sufficiently represented in the recorded data sets.
  6. The technology sought under this topic will not likely be used by operators but would be used by subject matter experts (SMEs) to generate data to augment recorded data that has been truthed, labeled, and processed.
  7. The government would be delighted to understand the company’s vision for how the technology they plan to develop can be used for purposes that will improve US Government or US commercial capabilities beyond the Navy’s intended transition. As the government does not know the full range of innovations that companies will propose, it is not possible to specify which non-military functionalities or enhancements would be possible.
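Answer 1 to the 1/5/25 question notes that active sonar may need more than one ping to detect and classify a contact, so the kinematics of successive clusters matters. A minimal sketch, assuming each cluster has already been reduced to a centroid with a timestamp, range, and bearing, is to derive range rate and bearing rate by finite differences between successive pings. The `PingDetection` type and the feature names are hypothetical, and a fielded system would more likely take these quantities from a tracker than from raw differences.

```python
from dataclasses import dataclass


@dataclass
class PingDetection:
    """Hypothetical centroid of one cluster on one ping."""
    time_s: float
    range_m: float
    bearing_deg: float


def wrap_deg(delta):
    """Wrap an angle difference into the interval [-180, 180)."""
    return (delta + 180.0) % 360.0 - 180.0


def kinematic_features(track):
    """Finite-difference range rate (m/s) and bearing rate (deg/s) between successive pings."""
    feats = []
    for prev, cur in zip(track, track[1:]):
        dt = cur.time_s - prev.time_s
        if dt <= 0:
            raise ValueError("pings must be in strictly increasing time order")
        feats.append({
            "range_rate_mps": (cur.range_m - prev.range_m) / dt,
            "bearing_rate_dps": wrap_deg(cur.bearing_deg - prev.bearing_deg) / dt,
        })
    return feats
```

Wrapping the bearing difference keeps a contact that crosses 000 degrees from appearing to jump by nearly a full circle: a change from 359 to 1 degree is treated as +2 degrees rather than -358.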

