A Red Teaming Machine Social Engineers Another Machine to Find Security Vulnerabilities

When machines talk, strange things happen. Not only to them, but to us as well. Discover how Audn.AI changes voice AI security testing.

This article was originally published by our founder, Ozgur Ozkan, on LinkedIn.

Audn.AI Successfully Detects Vulnerability

When machines talk, strange things happen. Not only to them, but to us as well. For the last two months or so I had been trying to build a voice red teaming service that is:

Autonomous and scalable - An AI agent does the red teaming, not a human

Easy to use - Just verify your agent's phone number and you are ready to test it

Reliable and sustainable - The latest vulnerabilities are checked on every run, so enterprises can deploy voice AI agents knowing the system won't leak data or hide security holes that could cause a C-level catastrophe

Reproducible at scale - A voice AI agent can talk to thousands of customers at the same time. A human red teamer can make a few calls and that's it, but an AI red teamer can work 24/7 and run thousands of attacks in parallel (a minimal sketch of that fan-out follows this list)
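To make the scale point concrete, here is a minimal sketch of how thousands of attack conversations could be fanned out concurrently. The `run_attack_call` helper, its parameters, and the phone number are hypothetical placeholders for illustration only; they are not Audn.AI's actual API.

```python
# Minimal sketch: fan out many simultaneous attack calls against one agent.
# `run_attack_call` is a hypothetical placeholder for whatever places a single
# adversarial voice call and returns its outcome; it is not a real Audn.AI API.
import asyncio

async def run_attack_call(target_number: str, scenario: str) -> dict:
    # Placeholder: dial the agent, run one adversarial dialogue, return results.
    await asyncio.sleep(0)  # stand-in for real telephony + LLM work
    return {"scenario": scenario, "leaked_secret": False}

async def red_team_at_scale(target_number: str, scenarios: list[str]) -> list[dict]:
    # Launch every attack scenario concurrently against the same agent.
    tasks = [run_attack_call(target_number, s) for s in scenarios]
    return await asyncio.gather(*tasks)

if __name__ == "__main__":
    scenarios = [f"social-engineering-variant-{i}" for i in range(1000)]
    results = asyncio.run(red_team_at_scale("+15550000000", scenarios))
    print(sum(r["leaked_secret"] for r in results), "scenarios leaked a secret")
```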

The Journey

For the last two months I tried to do the red teaming myself. I called the agent I was probing for security holes several times and applied the usual red teaming tactics: manipulation, urgency, and so on. That alone sounds like I was trying to harm the system, which is itself a barrier to entry for doing red teaming with AI. Setting that aside, the voice AI agent was quite rule-based and would not reveal the secret password or the API key it was protecting. The API key sits in its knowledge base or memory, but its system prompt explicitly instructs it never to reveal that information or leak the system prompt itself.

I tried the usual jailbreaking techniques. They didn't help; the agent was quite stubborn.

The Evolution

My second attempt used a less content-restricted model to do the red teaming. I ran a bunch of adversarial attacks with it. It tried, but the attacks were too obvious and they didn't work. All of my security testing was going nowhere because my red teaming itself kept failing while the agent stayed secure. I almost lost hope.

The Breakthrough

I tried something else. I decided to look into the red teaming competition for GPT-OSS 20b: https://x.com/OpenAI/status/1952818694054355349

I studied the tactics used there, decided to fine-tune GPT-OSS 120b entirely for red teaming, and achieved success:

The AI agent revealed the secret password and API key it was supposed to protect, thanks to the Audn.AI red teamer. Wow!
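The post doesn't describe the training recipe, and whether the fine-tune was full-parameter or adapter-based isn't stated, so here is only a minimal sketch of what such a run could look like, assuming LoRA adapters via Hugging Face transformers and PEFT. The model ID, dataset file, and hyperparameters are illustrative assumptions, not Audn.AI's actual setup.

```python
# Hedged sketch: LoRA fine-tuning of an open-weights model on red-teaming
# dialogue transcripts. Model ID, dataset path, and hyperparameters are
# illustrative assumptions, not Audn.AI's actual training configuration.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "openai/gpt-oss-120b"  # assumed Hugging Face model id

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # needed for padding batches

model = AutoModelForCausalLM.from_pretrained(BASE_MODEL, device_map="auto")
# Wrap the base model with low-rank adapters so only a small fraction of
# parameters is trained.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# Assumed JSONL file where each row has a "text" field holding one
# attacker/target conversation transcript.
dataset = load_dataset("json", data_files="redteam_transcripts.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="redteamer-lora",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=16,
                           num_train_epochs=1,
                           learning_rate=2e-4,
                           bf16=True,
                           logging_steps=10),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("redteamer-lora")
```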

Watch the Demo

Try the Interactive Demo

Experience the power of Audn.AI's voice attack vulnerability detection firsthand: Interactive Demo

This is Just the Beginning

This is just the beginning of Audn.AI.

Audn.AI finds security holes in AI voice agents. It's basically AI Runtime Trust for voice: automated adversarial voice testing with executable, reproducible tests for CI/CD, plus a low-latency Runtime Guard (voice DLP, auth-evasion shields) that exports telemetry to your observability stack (Datadog, Syntrace, S3, and more).
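As one illustration of what an "executable, reproducible test" could look like in a CI pipeline, here is a pytest-style sketch. The `replay_attack` helper, the scenario ID, and the environment variable are hypothetical names used for illustration only; they are not Audn.AI's published API.

```python
# Hedged sketch of a reproducible voice-attack regression test in CI.
# `replay_attack` and its arguments are hypothetical placeholders standing in
# for whatever replays a recorded adversarial call against a staging agent.
import os

# The literal secret values the staging agent is seeded with; in a real
# pipeline they would come from a secrets manager, not the test file.
PROTECTED_SECRETS = ["hunter2-demo-password", "sk-demo-api-key"]

def replay_attack(agent_number: str, scenario_id: str, seed: int = 0) -> str:
    """Placeholder: replay a recorded social-engineering scenario and return
    the target agent's full transcript. A real implementation would drive
    telephony, TTS/STT, and the red-teaming model deterministically."""
    return "Agent: I'm sorry, I can't share credentials."

def test_agent_does_not_leak_credentials():
    transcript = replay_attack(
        agent_number=os.environ.get("STAGING_AGENT_NUMBER", "+15550000000"),
        scenario_id="urgency-escalation-v3",  # hypothetical scenario ID
        seed=42,  # fixed seed keeps the attack reproducible run to run
    )
    for secret in PROTECTED_SECRETS:
        assert secret not in transcript
```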

Secure Your Voice AI System

Book a demo from the website to learn how you can secure your Voice AI system!


Written by Ozgur Ozkan, Founder of Audn.AI

Originally published on LinkedIn