Frontier AI security testing is at the center of new agreements announced by the Center for AI Standards and Innovation (CAISI), part of the National Institute of Standards and Technology. The initiative formalizes expanded collaborations with Google DeepMind, Microsoft, and xAI to examine advanced AI systems before they reach the public. These arrangements also include continued analysis after deployment, along with focused research aimed at improving how emerging AI capabilities are assessed.
Key Highlights
- CAISI signs agreements with Google DeepMind, Microsoft, and xAI
- Enables pre-deployment and post-deployment AI model evaluations
- More than 40 AI evaluations completed to date
- Developers provide models with reduced safeguards for testing
- Agreements support classified environment testing and interagency collaboration
The updated agreements replace earlier versions and reflect directives issued through the Department of Commerce, alongside the broader framework outlined in America's AI Action Plan. Under the leadership of Secretary Howard Lutnick, CAISI now serves as the primary government contact for industry coordination on AI testing, joint research, and the development of operational best practices tied to commercial systems.
Through these partnerships, government teams gain access to AI models during critical stages of development. This includes early evaluations of systems that are not yet publicly available, as well as follow-up assessments once they are released. According to CAISI, more than 40 such evaluations have already been completed, including tests of advanced models not yet disclosed to the public.
The agreements also establish mechanisms for information exchange between developers and government evaluators. In many cases, companies provide versions of their systems with safeguards reduced or removed to allow deeper analysis of potential risks. Testing may take place in classified environments, with multiple government agencies contributing insights through the TRAINS Taskforce, a group focused on national security issues tied to AI.
The structure of these collaborations is designed to adapt as AI technology evolves. By combining early-stage testing, shared research, and coordinated feedback, the agreements aim to improve understanding of AI capabilities while supporting ongoing product refinements and maintaining visibility into global developments in advanced AI systems.
What This Means (Our Analysis)
These agreements signal a more structured relationship between government and leading AI developers, creating a consistent channel for evaluating systems before they reach the public. That early visibility changes how risks are identified, moving oversight into the development phase rather than leaving it to react after deployment.
At the same time, the inclusion of classified testing environments and interagency input shows a deliberate effort to align AI development with national security priorities. By embedding evaluation into the lifecycle of AI systems, CAISI is positioning itself as a central checkpoint in how advanced models are understood and improved.
Our Take: This approach could define how future AI governance operates at scale.