MITRE and FAA Introduce Novel Aerospace Large Language Model Evaluation Benchmark

MCLEAN, Va.--(BUSINESS WIRE)--Sep 17, 2025--

The Federal Aviation Administration (FAA) and MITRE are introducing a new benchmark to enable the evaluation and assessment of large language models (LLMs) for aerospace tasks. Given the safety-critical nature of aerospace, it is imperative that LLMs undergo thorough evaluation prior to their integration into systems.

The Aerospace Language Understanding Evaluation (ALUE) benchmark provides a crucial tool for guiding the assurance of LLMs tailored to the unique demands of the aerospace domain. It incorporates diverse datasets and tasks and introduces several metrics for evaluating the correctness of LLM-generated responses.

ALUE is designed to streamline and improve the evaluation and inference of LLMs using aerospace domain-specific information. The versatile benchmark supports custom datasets, open-source and domain-specific LLMs, user-defined prompts, and various quantitative performance metrics. Such evaluations are essential not only for assessing a model’s performance but also for understanding its inherent limitations and potential risks, including issues such as hallucinations, biases, and privacy concerns.

“MITRE has deep expertise in both aviation safety and AI adoption, and is aligned with the FAA’s mission to provide the safest and most efficient aerospace in the world,” said Kerry Buckley, Ph.D., MITRE vice president and director, Center for Advanced Aviation System Development (CAASD). “ALUE allows the FAA and the aerospace community to create a definitive library of diverse and specific aviation nomenclature and terms that will enable the agency to harness the power of AI for tools and tasks that will continuously improve safety and efficiency today and into the future.”

Ongoing work will continue to expand the benchmark’s complexity and scope to address more intricate real-world aerospace challenges. This includes developing tasks for extracting complex information from charts, such as airspace boundaries or navigational aids, which require sophisticated spatial and symbolic reasoning.

Future work will also incorporate tasks that require LLMs to consult external data sources, such as aircraft operational manuals, to determine precise parameters such as flap and thrust settings under specific conditions, moving beyond simple information extraction to knowledge application.

CAASD’s engineers, scientists, and analysts pair cross-disciplinary capabilities with deep mission-centric expertise to deliver impactful solutions to advance aviation and aerospace safety.

ALUE is available via GitHub to airlines, academia, and aerospace stakeholders who are using or considering using LLMs on aerospace data. Active community collaboration is important for enhancing the benchmark with additional curated datasets and tasks, and organizations can run the benchmark on their own machines. ALUE is a starting point for assuring sophisticated and reliable AI tools that enhance the safety and efficiency of the National Airspace System.

Reference: Aerospace Language Understanding Evaluation (ALUE): Large Language Benchmark with Aerospace Datasets, AIAA

About MITRE

MITRE’s mission-driven teams are dedicated to driving solutions to our nation’s most pressing challenges. As a not-for-profit research and development organization, MITRE’s staff leverage our unique multi-sponsor vantage point, systems expertise, and innovative solutions to ensure the health, prosperity, and security of our nation. www.mitre.org

View source version on businesswire.com: https://www.businesswire.com/news/home/20250917980616/en/

Media Contact: Jordan Graham at [email protected] 

KEYWORD: UNITED STATES NORTH AMERICA VIRGINIA

INDUSTRY KEYWORD: PROFESSIONAL SERVICES TECHNOLOGY DATA ANALYTICS AEROSPACE MANUFACTURING ARTIFICIAL INTELLIGENCE

SOURCE: MITRE

Copyright Business Wire 2025.

PUB: 09/17/2025 08:00 AM/DISC: 09/17/2025 07:59 AM
