NEWSLETTER | CYBERSECURITY RESEARCH | APRIL 2026

When TTE Shrinks to Minutes: Can AI Red Teaming Become an Indispensable Layer of Cyber Defense?

Known vulnerabilities are moving from disclosure to exploitation faster than many organizations can patch, validate, or triage. In that environment, periodic testing is no longer enough; defenders need continuous purple teaming powered by autonomous red and blue agents under human authority.

Executive summary

The central question is not whether AI will replace human defenders. The real question is whether human-only security processes can keep up once time-to-exploit and breakout times compress from days into hours and minutes. The evidence increasingly says no. A more defensible operating model is not fully autonomous security, but a layered model in which AI red agents continuously emulate attackers, AI blue agents continuously triage and harden, and humans retain authority over high-impact actions.

Key takeaways

The exploitation window is shrinking fast: Mandiant estimated a mean 2023 time-to-exploit of five days, while VulnCheck found that 32.1% of newly observed exploited CVEs in 1H 2025 already had exploitation evidence on or before CVE issue day (Mandiant, 2024; VulnCheck, 2025).
AI does not need to create magical new offensive capabilities to change the game. Speed, scale, and lower operating cost are already strategically significant advantages for attackers and defenders alike (Google Threat Intelligence Group [GTIG], 2025; National Cyber Security Centre [NCSC], 2025).
AI red teaming is becoming a security speed layer. The National Institute of Standards and Technology (NIST) defines AI red teaming as a structured testing exercise and explicitly notes that human/AI red teaming can be more cost-effective than human-only teams (NIST, 2024).
Prompt-only guardrails are not a hard security boundary. Official OpenClaw documentation says system-prompt guardrails are soft guidance; hard enforcement must come from tool policy, approvals, sandboxing, and allowlists (OpenClaw, n.d.-a).

The sections below translate these signals into an operating model for security teams.

Table 1. Signals that the defender's time budget is collapsing

Signal	Latest evidence	Why it matters
Mean TTE	Mandiant's analysis of 2023 in-the-wild exploited vulnerabilities estimated an average time-to-exploit of five days. It also found that 12% of n-day vulnerabilities were exploited within one day and 29% within one week (Mandiant, 2024).	Even when a patch exists, defenders may have only days—or less—to validate exposure and respond.
Same-day exploitation	VulnCheck reported that 32.1% of CVEs newly added to its exploited-vulnerability dataset in 1H 2025 had exploitation evidence on or before CVE issue day, up from 23.6% in 2024 (VulnCheck, 2025).	Disclosure and weaponization are increasingly overlapping events.
Zero-day prevalence	The NSA reported that 11 of the 15 most exploited vulnerabilities in 2023 were initially used as zero-days; in 2022, the comparable figure was two (National Security Agency, 2024).	Defenders cannot assume awareness begins with public disclosure.
Operational tempo	CrowdStrike reported that the average eCrime breakout time in 2025 fell to 29 minutes, with the fastest observed breakout at 27 seconds (CrowdStrike, 2026).	Once an intrusion starts, human-only investigation loops can be too slow.

1. The exploitation window is collapsing

Time-to-exploit (TTE) measures the gap between a vulnerability's disclosure or patch release and its first observed exploitation. It is not identical to breakout time, which measures how quickly an intruder moves laterally after initial access. Together, however, these two clocks describe the same strategic problem: defenders are running out of time. Mandiant's 2024 analysis of 2023 trends found a mean TTE of five days, with 12% of n-day vulnerabilities exploited within one day of patch availability and 29% within one week (Mandiant, 2024). VulnCheck's 1H 2025 dataset then showed that nearly one-third of newly observed exploited CVEs already had exploitation evidence on or before the day the CVE was issued (VulnCheck, 2025).
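The sketch below shows roughly how figures like these are derived. It computes a mean TTE and the share of n-day vulnerabilities exploited within one day and within one week of the patch. This is a minimal sketch under stated assumptions and does not reproduce Mandiant's or VulnCheck's methodology. The `ExploitRecord` layout, the use of patch release as the reference date, and the placeholder records are all hypothetical. A negative TTE here means only that exploitation was observed before the patch.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class ExploitRecord:
    """Hypothetical record: one vulnerability, its patch date, and first observed exploitation."""

    vuln_id: str
    patch_released: date
    first_exploited: date

    @property
    def tte_days(self) -> int:
        # Days from patch release to first observed exploitation.
        # A negative value means exploitation was seen before the patch (zero-day).
        return (self.first_exploited - self.patch_released).days


def _share_within(n_day_ttes: list[int], limit: int) -> float:
    """Fraction of n-day TTE values at or below `limit` days (0.0 if there are none)."""
    if not n_day_ttes:
        return 0.0
    return sum(t <= limit for t in n_day_ttes) / len(n_day_ttes)


def summarize_tte(records: list[ExploitRecord]) -> dict[str, float]:
    """Mean TTE over all records, plus the n-day shares exploited within 1 and 7 days."""
    if not records:
        raise ValueError("need at least one record")
    ttes = [r.tte_days for r in records]
    n_day_ttes = [t for t in ttes if t >= 0]  # exploited on or after patch release
    return {
        "mean_tte_days": float(mean(ttes)),
        "n_day_within_1_day": _share_within(n_day_ttes, 1),
        "n_day_within_7_days": _share_within(n_day_ttes, 7),
    }


if __name__ == "__main__":
    # Placeholder IDs and dates, not real CVE data.
    sample = [
        ExploitRecord("EXAMPLE-1", date(2026, 1, 10), date(2026, 1, 10)),
        ExploitRecord("EXAMPLE-2", date(2026, 1, 10), date(2026, 1, 15)),
        ExploitRecord("EXAMPLE-3", date(2026, 1, 10), date(2026, 2, 9)),
        ExploitRecord("EXAMPLE-4", date(2026, 1, 10), date(2026, 1, 7)),
    ]
    print(summarize_tte(sample))
```

Anchoring on the patch date is a design choice. Measuring from CVE publication instead, as VulnCheck's same-day figure does, would shift the numbers.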

The UK National Cyber Security Centre (NCSC) reaches a similar conclusion and assesses AI-assisted vulnerability research and exploit development as likely the most significant near-term AI-cyber development. In its 2025 assessment, the NCSC noted that the gap between disclosure and exploitation had already shrunk to days. It judged it almost certain that AI would reduce that gap further (NCSC, 2025). The annual CISA, NSA, and partner review of the most exploited vulnerabilities also shows how often attackers act before defenders gain the usual benefit of disclosure-driven awareness: 11 of the 15 most exploited vulnerabilities in 2023 were initially exploited as zero-days (National Security Agency, 2024).

The tempo problem becomes even more severe after initial access. CrowdStrike reported an average eCrime breakout time of 29 minutes in 2025, with the fastest observed breakout taking only 27 seconds (CrowdStrike, 2026). Not every organization will experience the same timing, but the direction is clear: patching, scanning, and analyst-led validation can no longer run as slow, batch-oriented workflows.

2. Why AI red teaming is moving from optional to essential

NIST defines AI red teaming as a structured testing exercise used to probe an AI system and uncover flaws and vulnerabilities, often in a controlled environment and in collaboration with developers (NIST, 2024). That definition is useful far beyond model safety. It captures a broader security need: organizations now require repeatable, adversarial, system-level testing that runs faster and more often than traditional quarterly or annual exercises.

NIST's generative AI profile goes further and explicitly notes that human/AI red teaming can be more cost-effective than human-only teams (NIST, 2024). Microsoft's AI Red Teaming Agent documentation makes the same operational point from another angle: manual red teaming is time- and resource-intensive, while automated scans can help teams evaluate risk at scale and shift left from costly reactive incidents toward proactive testing before deployment (Microsoft, 2026a). In other words, AI red teaming is becoming less of a luxury assessment and more of a throughput technology.

The defensive potential is no longer hypothetical. Project Zero's Naptime and Big Sleep work showed that tool-using AI agents can perform meaningful vulnerability research. Big Sleep reportedly identified a previously unknown, exploitable SQLite issue before it appeared in an official release, so it was fixed before users were exposed (Project Zero, 2024a, 2024b). That does not mean current models solve security on their own. It does mean defenders now have credible evidence that autonomous or semi-autonomous adversarial testing can produce real value when it runs inside the right controls and workflows.

3. Attackers and defenders are both becoming agentic

A useful corrective to the hype is Google Threat Intelligence Group's 2025 assessment. GTIG found that state-backed actors were indeed experimenting with generative AI for research, troubleshooting, content creation, scripting help, vulnerability research, payload development, and evasion assistance. At the same time, GTIG concluded that AI was not yet a "game-changer" in the sense of giving threat actors genuinely novel capabilities; rather, it was allowing them to move faster and at higher volume (GTIG, 2025). That nuance matters. The strategic shift is tempo, not magic.

NCSC's 2026 guidance sharpens the implication for defenders. Frontier models are already showing results on specific cyber tasks, and activities that once required specialist skills—such as writing exploit code, understanding system architecture, or using attack tools—can increasingly be automated in some circumstances (NCSC, 2026). NCSC's conclusion is blunt: defenders should assume that at least some attackers already have access to capable AI tools, and defenders must use comparable capabilities to drive defensive advantage (NCSC, 2026).

Tool-using systems are what make this operationally important. Anthropic describes Claude Code as an agentic coding system that reads a codebase, makes changes across files, runs tests, and delivers committed code (Anthropic, n.d.). OpenClaw's official documentation explains the same principle generically: agents become operational through tools such as exec, browser, web search, and messaging rather than through text generation alone (OpenClaw, n.d.-b). Once an AI system can inspect code, query the web, call tools, and act across steps, it becomes relevant to both offensive and defensive security workflows.

4. The Claude Code leak teaches a deeper architectural lesson

In March 2026, public mirrors of leaked Claude Code source exposed a file named cyberRiskInstruction.ts, which encoded a policy boundary for authorized security testing, defensive use, CTF contexts, and educational scenarios while refusing destructive misuse such as denial-of-service, mass targeting, supply-chain compromise, or malicious evasion (Kuberwastaken, 2026; The Register, 2026). That detail is interesting, but the more important lesson is architectural rather than sensational.

The defensible conclusion is not that changing one leaked prompt automatically bypasses every provider-side safeguard. The stronger conclusion is that prompt-level policy is inspectable, forkable, and editable—and therefore insufficient as the primary security boundary for a capable agent. OpenClaw's own security documentation states this explicitly: system-prompt guardrails are only soft guidance, while hard enforcement must come from tool policy, execution approvals, sandboxing, and allowlists (OpenClaw, n.d.-a). NCSC makes the same point at the ecosystem level, warning that safeguards can often be bypassed and, in open-weight systems, removed entirely (NCSC, 2026).
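The difference between soft guidance and hard enforcement fits in a few lines of code. The example below is a minimal, hypothetical gate that lives in the agent runtime rather than in the prompt. It denies any tool call that is not allowlisted, and it holds sensitive tools until a named human approves them. The `ToolPolicy` class, the `dispatch` function, and the tool names are illustrative assumptions. They are not OpenClaw's or any vendor's actual API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_APPROVAL = "require_approval"
    DENY = "deny"


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical policy enforced by the runtime, outside anything the model can rewrite."""

    allowed_tools: frozenset[str]
    approval_required: frozenset[str]

    def evaluate(self, tool_name: str) -> Decision:
        # Default-deny: a tool that is not allowlisted is refused regardless of
        # what the system prompt says or what the model asks for.
        if tool_name not in self.allowed_tools:
            return Decision.DENY
        if tool_name in self.approval_required:
            return Decision.REQUIRE_APPROVAL
        return Decision.ALLOW


def dispatch(policy: ToolPolicy, tool_name: str, approved_by: Optional[str] = None) -> str:
    """Gate a model-requested tool call before anything executes."""
    decision = policy.evaluate(tool_name)
    if decision is Decision.DENY:
        return f"denied: {tool_name} is not allowlisted"
    if decision is Decision.REQUIRE_APPROVAL and approved_by is None:
        return f"pending: {tool_name} needs human approval"
    # A real runtime would invoke the sandboxed tool here and write an audit record.
    return f"executing: {tool_name}"


if __name__ == "__main__":
    policy = ToolPolicy(
        allowed_tools=frozenset({"read_file", "web_search", "exec"}),
        approval_required=frozenset({"exec"}),
    )
    calls = [("read_file", None), ("exec", None), ("exec", "analyst-1"), ("send_email", None)]
    for tool, approver in calls:
        print(dispatch(policy, tool, approver))
```

The allowlist lives in code and configuration, not in the model's instructions. As a result, leaking or editing the prompt does not change what the agent is permitted to execute.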

Editorial note

Critical distinction: a leaked or modified prompt does not, by itself, prove that every provider-side safeguard has been removed. The stronger and better-supported lesson is that prompt-only safety cannot be the primary control for a capable security agent.

This is the uncomfortable part for defenders. If a security team's tooling assumes that hidden instructions alone will keep an autonomous agent safe, the organization is relying on the weakest layer in the stack. The right response to the leak is not panic about one product. It is a broader design rule: never make the model's secret prompt your core control plane.

5. The target operating model: continuous purple teaming

The answer is not simply "run more pentests." The answer is to convert offensive pressure and defensive hardening into a more continuous operating model. If adversaries are already using AI to compress reconnaissance, vulnerability research, exploit adaptation, and post-compromise speed, then defenders need a continuous purple-team loop in which autonomous or semi-autonomous red agents pressure the environment while blue agents triage, correlate, and recommend hardening in parallel.

Table 2. A practical continuous purple-team operating model

Layer	Primary mission	Typical actions	Human control
Red agent	Continuously apply adversarial pressure to critical paths.	Attack-surface mapping, misconfiguration discovery, exploitability validation, attack-path chaining, pre-production adversary emulation, repeated control testing.	Scopes, targets, and risky actions must be constrained in policy.
Blue / SOC agent	Compress investigation and hardening time.	Alert triage, telemetry correlation, incident summaries, recommended containment, remediation prioritization, ticket drafting, runbook execution for low-risk actions.	Humans approve destructive or business-impacting containment steps.
Human team	Retain authority where context, accountability, and trade-offs matter.	Authorize high-impact actions, assess business risk, adjudicate false positives, redesign controls, approve exceptions, own governance.	Final decision rights remain human.

This model matters because attack simulation is no longer only about proving that a breach is possible. It is about learning what must be defended first. An autonomous red agent can repeatedly map exposed paths, validate exploitability, test privilege escalation hypotheses, and identify the shortest route to business impact. A blue or SOC agent can then classify findings, correlate them with telemetry, summarize likely attack narratives, prioritize remediation, and prepare containment recommendations before a human operator intervenes.
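The red-to-blue hand-off in Table 2 can be written as a small routing step. The sketch assumes a hypothetical `Finding` record emitted by a red agent. A blue agent ranks findings by validated exploitability first and asset criticality second. Actions on an assumed low-risk list run automatically, and every other action waits for human approval. The field names, the 1 to 5 criticality scale, and the action names are illustrative and do not come from any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical red-agent finding handed to a blue/SOC agent."""

    path_id: str
    exploit_validated: bool
    asset_criticality: int  # assumed scale: 1 (low) to 5 (business-critical)
    proposed_action: str    # e.g. "open_ticket" or "isolate_host"


# Assumed set of low-risk actions a blue agent may take without sign-off;
# anything else (containment, blocking, configuration change) goes to a human.
LOW_RISK_ACTIONS = frozenset({"open_ticket", "draft_summary", "add_to_watchlist"})


def triage(findings: list[Finding]) -> list[tuple[str, str]]:
    """Rank findings (validated first, then by criticality) and route each proposed action."""
    ranked = sorted(
        findings,
        key=lambda f: (f.exploit_validated, f.asset_criticality),
        reverse=True,
    )
    return [
        (f.path_id, "auto" if f.proposed_action in LOW_RISK_ACTIONS else "human_approval")
        for f in ranked
    ]


if __name__ == "__main__":
    queue = [
        Finding("path-A", False, 5, "open_ticket"),
        Finding("path-B", True, 3, "isolate_host"),
        Finding("path-C", True, 5, "open_ticket"),
    ]
    for path_id, route in triage(queue):
        print(path_id, route)
```

Ranking validated paths ahead of theoretical ones matches the goal of learning what to defend first. Any action not on the low-risk list routes to human approval, so an unfamiliar action type fails safe.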

Vendor tooling already points in this direction. Microsoft's Phishing Triage Agent is designed to reduce repetitive investigation work and accelerate response for user-reported phishing incidents (Microsoft, 2026b). Microsoft's Vulnerability Remediation Agent performs automated evaluations to identify and prioritize vulnerabilities on managed devices, but its behavior is explicitly limited by the permissions of the account under which it runs (Microsoft, 2026c). Anthropic's security offering similarly emphasizes multi-stage verification and human review before patches are applied (Anthropic, 2026). The pattern is clear: machine-speed analysis plus explicit human authority.

6. A practical adoption blueprint

First, start where compressed timing hurts most: internet-facing assets, identity infrastructure, VPN and edge systems, CI/CD paths, production secrets, and agentic applications with tool access. Second, deploy red agents in read-only or lab environments first, then expand into tightly scoped production validation as tooling, logging, and approval gates mature. Third, place hard boundaries below the model: least-privilege identities, secret isolation, sandboxed execution, tool allowlists, approval flows for sensitive actions, and durable audit logs (OpenClaw, n.d.-a; Microsoft, 2026c).
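The third step, placing hard boundaries below the model, can begin with a scope check and an append-only log. The sketch below is hypothetical. `IN_SCOPE`, `READ_ONLY_STAGE`, and `authorize` are illustrative names. The networks are RFC 5737 documentation ranges standing in for a real engagement scope. A local JSON Lines file stands in for durable, tamper-resistant audit storage.

```python
import ipaddress
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical engagement scope (RFC 5737 documentation ranges as placeholders).
IN_SCOPE = [ipaddress.ip_network("192.0.2.0/24"), ipaddress.ip_network("198.51.100.0/25")]

# Assumed rollout stage: while True, the red agent may only perform "read" actions.
READ_ONLY_STAGE = True


def in_scope(target: str) -> bool:
    """True only if the target IP address falls inside an approved network."""
    # ip_address raises ValueError on malformed input, so bad targets never pass silently.
    addr = ipaddress.ip_address(target)
    return any(addr in net for net in IN_SCOPE)


def authorize(target: str, action: str, audit_path: Path) -> bool:
    """Decide whether a red-agent action may proceed, and log every decision."""
    stage_ok = action == "read" or not READ_ONLY_STAGE
    allowed = in_scope(target) and stage_ok
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "target": target,
        "action": action,
        "allowed": allowed,
    }
    # Append-only JSON Lines; denied requests are logged as well as allowed ones.
    with open(audit_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")
    return allowed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_path = Path(tmp) / "red-agent-audit.jsonl"
        print(authorize("192.0.2.10", "read", log_path))     # in scope, read-only action
        print(authorize("192.0.2.10", "exploit", log_path))  # blocked by rollout stage
        print(authorize("203.0.113.5", "read", log_path))    # outside the approved networks
        print(log_path.read_text(encoding="utf-8"))
```

Expanding into scoped production validation then becomes an explicit, reviewable change, such as flipping `READ_ONLY_STAGE` or widening `IN_SCOPE`. The model cannot make that change on its own.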

Fourth, put blue agents where human bottlenecks already exist: alert triage, case summarization, event correlation, initial scoping, and remediation drafting. Fifth, measure exposure reduction rather than novelty. Useful metrics include time to validate exploitable paths, recurrence of the same attack path across exercises, mean time to detect, mean time to contain, patch lead time on validated findings, and the percentage of critical attack paths that are continuously re-tested. The goal is not to build theatrical autonomy. The goal is to buy back defender time and focus it where human judgment matters most.
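Several of these metrics reduce to simple arithmetic over exercise records. The sketch below computes mean time to detect, mean time to contain, the share of critical attack paths re-tested within a window, and attack-path recurrence across exercises. Everything here is an illustrative assumption: the `Incident` record, measuring containment from detection rather than from intrusion start, the seven-day window, and the sample values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Incident:
    """Hypothetical exercise or incident record."""

    started: datetime
    detected: datetime
    contained: datetime


def mttd_mttc_minutes(incidents: list[Incident]) -> tuple[float, float]:
    """Mean time to detect (start to detection) and to contain (detection to containment)."""
    mttd = mean((i.detected - i.started).total_seconds() / 60 for i in incidents)
    mttc = mean((i.contained - i.detected).total_seconds() / 60 for i in incidents)
    return mttd, mttc


def retest_coverage(last_tested: dict[str, datetime], now: datetime, window: timedelta) -> float:
    """Share of critical attack paths whose last test falls within `window` of `now`."""
    fresh = sum(now - tested <= window for tested in last_tested.values())
    return fresh / len(last_tested)


def recurrence_rate(exercises: list[set[str]]) -> float:
    """Share of attack paths in the latest exercise that already appeared in an earlier one."""
    *earlier, latest = exercises
    seen = set().union(*earlier)
    return len(latest & seen) / len(latest)


if __name__ == "__main__":
    t0 = datetime(2026, 4, 1, 9, 0)
    incidents = [
        Incident(t0, t0 + timedelta(minutes=10), t0 + timedelta(minutes=40)),
        Incident(t0, t0 + timedelta(minutes=20), t0 + timedelta(minutes=70)),
    ]
    print(mttd_mttc_minutes(incidents))
    print(retest_coverage(
        {"path-A": datetime(2026, 4, 6), "path-B": datetime(2026, 3, 1)},
        now=datetime(2026, 4, 7),
        window=timedelta(days=7),
    ))
    print(recurrence_rate([{"A", "B"}, {"B", "C"}, {"C", "D"}]))
```

A single reading of these numbers says little. Tracking them across successive exercises shows whether continuous purple teaming is actually buying back defender time.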

Conclusion

AI red teaming is becoming indispensable not because human defenders are obsolete, but because human-only security cycles are too slow for the pace of exploitation. If attackers use AI to research, weaponize, and adapt faster, defenders need AI to simulate, detect, prioritize, and harden faster as well. The winning pattern is not AI instead of the SOC. It is AI inside a continuously exercised, human-governed purple-team model.

References

Anthropic. (2026, February 20). Making frontier cybersecurity capabilities available to defenders. https://www.anthropic.com/news/claude-code-security

Anthropic. (n.d.). Claude Code. Retrieved April 7, 2026, from https://www.anthropic.com/product/claude-code

CrowdStrike. (2026, February 24). 2026 CrowdStrike global threat report: AI accelerates adversaries and reshapes the attack surface. https://www.crowdstrike.com/en-us/press-releases/2026-crowdstrike-global-threat-report/

Google Threat Intelligence Group. (2025, January). Adversarial misuse of generative AI. https://services.google.com/fh/files/misc/adversarial-misuse-generative-ai.pdf

Kuberwastaken. (2026). cyberRiskInstruction.ts [Source code]. GitHub. Retrieved April 7, 2026, from https://github.com/kuberwastaken/claude-code/blob/main/constants/cyberRiskInstruction.ts

Mandiant. (2024, October 15). How low can you go? An analysis of 2023 time-to-exploit trends. https://cloud.google.com/blog/topics/threat-intelligence/time-to-exploit-trends-2023

Microsoft. (2026a, February 27). AI Red Teaming Agent. https://learn.microsoft.com/en-us/azure/foundry/concepts/ai-red-teaming-agent

Microsoft. (2026b, February 22). Security Copilot Phishing Triage Agent in Microsoft Defender. https://learn.microsoft.com/en-us/defender-xdr/phishing-triage-agent

Microsoft. (2026c, April 1). Vulnerability Remediation Agent overview and set up. https://learn.microsoft.com/en-us/intune/copilot/agents/vulnerability-remediation-agent

National Cyber Security Centre. (2025, May 7). Impact of AI on cyber threat from now to 2027. https://www.ncsc.gov.uk/report/impact-ai-cyber-threat-now-2027

National Cyber Security Centre. (2026, March 30). Why cyber defenders need to be ready for frontier AI. https://www.ncsc.gov.uk/blogs/why-cyber-defenders-need-to-be-ready-for-frontier-ai

National Institute of Standards and Technology. (2024). Artificial intelligence risk management framework: Generative artificial intelligence profile (NIST AI 600-1). https://doi.org/10.6028/NIST.AI.600-1

National Security Agency. (2024, November 12). CISA, NSA, and partners issue annual report on top exploited vulnerabilities. https://www.nsa.gov/Press-Room/Press-Releases-Statements/Press-Release-View/Article/3961769/cisa-nsa-and-partners-issue-annual-report-on-top-exploited-vulnerabilities/

OpenClaw. (n.d.-a). Security. Retrieved April 7, 2026, from https://docs.openclaw.ai/gateway/security

OpenClaw. (n.d.-b). Tools and plugins. Retrieved April 7, 2026, from https://docs.openclaw.ai/tools

Project Zero. (2024a, November 1). From Naptime to Big Sleep: Using large language models to catch vulnerabilities in real-world code. https://projectzero.google/2024/10/from-naptime-to-big-sleep.html

Project Zero. (2024b, June 20). Project Naptime: Evaluating offensive security capabilities of large language models. https://projectzero.google/2024/06/project-naptime.html

The Register. (2026, March 31). Anthropic accidentally exposes Claude Code source code. https://www.theregister.com/2026/03/31/anthropic_claude_code_source_code/

VulnCheck. (2025, July 30). State of exploitation: A look into the 1H-2025 vulnerability exploitation landscape. https://www.vulncheck.com/blog/state-of-exploitation-1h-2025