When the EU AI Act’s high-risk system obligations take full effect on August 2, 2026, adversarial testing stops being a best practice and becomes a documentation requirement. Organizations operating high-risk AI systems will need to demonstrate that they’ve systematically probed their models for prompt injection, data leakage, jailbreaks, and policy violations — and produce evidence of it. Penalties run to 35 million euros or 7% of global annual turnover, whichever is higher. That regulatory pressure, combined with worldwide AI security spending hitting $25.53 billion in 2026 according to MarketsandMarkets, has turned what was once a research curiosity into a tooling category.
The open-source AI red-teaming stack has consolidated around a handful of tools that actually get deployed: NVIDIA’s Garak, Promptfoo (now part of OpenAI), Microsoft’s PyRIT, DeepTeam, CyberArk’s FuzzyAI, and Giskard. Enterprise platforms — Mindgard, HiddenLayer, HackerOne — wrap similar capabilities in managed services with compliance reporting. This guide covers the tools that matter, what each is genuinely good at, where they overlap, and how to pick the right combination for the AI systems you’re actually trying to secure.
What AI Red-Teaming Tools Actually Do
AI red-teaming tools simulate adversarial attacks against large language models and the applications built around them. The category exists because traditional fuzzers and SAST scanners assume deterministic software — same input, same output — while LLMs produce probabilistic responses that drift across runs. Adversa AI’s 2025 security report found that 35% of real-world AI security incidents traced back to simple prompts, with some leading to losses exceeding $100,000 per incident. The threat surface includes direct and indirect prompt injection, training-data extraction, jailbreaks that bypass safety alignment, PII leakage from RAG context, tool-misuse in agentic systems, and content-policy violations that create regulatory exposure.
Most tools in the category share four primitives: an attack generator that produces adversarial prompts, a target connector that ships those prompts to the model under test, a detector or scorer that judges whether the response constitutes a failure, and a reporter that maps findings to compliance frameworks. Where the tools diverge is in attack philosophy — static probe libraries drawn from published research versus dynamic, application-aware generation — and in how much of the application stack they cover. A model-layer scanner will tell you whether GPT-5 leaks training data; an application-layer scanner will tell you whether your customer-service chatbot, with its specific system prompt and RAG corpus, can be manipulated into discounting orders by 100%.
Garak: The LLM Vulnerability Scanner
Garak — the Generative AI Red-teaming and Assessment Kit — is the most mature open-source LLM scanner. Originally built by Leon Derczynski, it now lives at github.com/NVIDIA/garak under Apache 2.0, with v0.14.0 released on February 4, 2026. The tool’s positioning is straightforward: it’s nmap or Metasploit, but pointed at language models. You give it a target, it runs a battery of probes, it tells you which ones produced unsafe output.
The probe library is Garak’s defining feature. It ships with more than 120 probe modules grouped by attack technique: promptinject for direct prompt injection, dan for the DAN family of jailbreaks, encoding for encoding-based filter bypasses (Base64, ROT-13, quoted-printable, MIME), leakreplay for training-data extraction, packagehallucination for fabricated software-package suggestions that can be weaponized into supply-chain attacks, malwaregen for malware-generation attempts, xss for cross-site-scripting payloads in LLM output, and tap for Tree-of-Attack-with-Pruning. These are static, research-backed attacks, which is both the strength and the constraint — Garak finds known classes of vulnerabilities consistently, but it doesn’t generate novel attacks tuned to your specific application.
Garak supports 23 model backends, including OpenAI, Hugging Face, Anthropic, AWS Bedrock, NVIDIA NIM, and Watsonx. Reports come out as JSONL, with optional integration into the AI Vulnerability Database (AVID) for shared threat intelligence and into NeMo Guardrails for evaluating guardrail effectiveness. The v0.14.0 release added tier-biased security aggregation in reports, a refactored report generator, and updated calibration data. Recent issues on the repository, including PR #1577 for bootstrap confidence intervals on attack-success rates, point at where the project is heading: more statistical rigor in how findings are quantified.
bash
# Install and run a basic encoding-attack scan against GPT-5 nano
pip install garak
export OPENAI_API_KEY="sk-..."
python -m garak --target_type openai --target_name gpt-5-nano --probes encodingPromptfoo: The Application-Aware Test Framework
Where Garak treats the model as the unit of analysis, Promptfoo treats the application as the unit of analysis. The tool generates adversarial inputs tailored to your specific system prompt, RAG corpus, and tool integrations — effectively performing intelligent fuzzing of the prompt space rather than running a fixed probe library. That distinction matters when your vulnerability isn’t “GPT-5 can be jailbroken” but “our travel-booking agent will discount fares by 90% if you ask it nicely in French.”
Promptfoo runs declarative YAML configs through a CLI, integrates with GitHub Actions and other CI/CD systems, and supports the standard provider list — OpenAI, Anthropic, Gemini, Llama, Bedrock, Azure, and custom HTTP endpoints. The platform covers 50+ vulnerability types including direct and indirect prompt injection, PII leaks, jailbreaks, content-policy violations, and API misuse. Built-in presets map findings to OWASP LLM Top 10, OWASP API Top 10, NIST AI Risk Management Framework, MITRE ATLAS, and the EU AI Act, producing audit-ready reports without separate compliance tooling.
The project shifted ground in 2026. On March 9, OpenAI acquired Promptfoo. The tool remains open source under MIT and continues to be developed publicly, with more than 350,000 developers in its install base and adoption at companies including Shopify, Discord, and Microsoft. Recent releases shipped the Hydra multi-turn red-team strategy, which adapts dynamically based on target responses; an OWASP Agentic AI Top 10 preset covering the December 2025 ASI-prefixed risks; multi-modal attack strategies for video and voice models; and a VS Code extension for running scans from the editor. The combination of context-aware attack generation, mature CI/CD integration, and comprehensive compliance mapping has made Promptfoo the default choice for application security teams shipping LLM features.
PyRIT: Microsoft’s Framework, Now an Agent
PyRIT — Python Risk Identification Tool for generative AI — was Microsoft’s internal red-teaming framework, used to test Copilot before public release, then open-sourced under MIT. The repository has had a turbulent year: the original Azure/PyRIT was archived on March 27, 2026, and active development moved to microsoft/PyRIT. Underneath the relocation, the framework has been quietly absorbed into Microsoft’s product surface. The AI Red Teaming Agent in Microsoft Foundry is PyRIT, wrapped in a managed service with project integration, reporting, and Entra-based auth.
PyRIT’s design separates concerns more than its competitors. Targets define endpoints (OpenAI, Azure OpenAI, Hugging Face, custom REST). Converters transform prompts between formats — Base64, Unicode tricks, word-level mutation, character-level perturbation. Orchestrators (recently refactored into Attacks) implement multi-turn or single-turn attack strategies. Scorers evaluate responses using true/false logic, Likert scales, classification, or Azure AI Content Safety. The composability is the point: you can chain a converter that Base64-encodes the prompt, an orchestrator that runs a cross-prompt injection attack (XPIA), and a scorer that checks for PII regex matches. This makes PyRIT particularly strong for security researchers building novel attack pipelines, less so for application teams who want a turnkey scanner.
The framework requires Python 3.10–3.12. Recent breaking changes aligned the OpenAI target with the official openai SDK and unified Azure auth around get_azure_openai_auth, a shortcut wrapper around AsyncDefaultAzureCredential. If you’re already on Microsoft Foundry, the AI Red Teaming Agent is the right entry point and abstracts most of this away.
The Specialists: FuzzyAI, DeepTeam, Giskard
Three open-source tools cover specialized niches the headliners don’t fully address.
FuzzyAI, from CyberArk, focuses on discovering novel jailbreaks through systematic fuzzing rather than running known attack patterns. It implements genetic-algorithm prompt modification, ArtPrompt (ASCII-art-based bypasses that exploit visual reasoning), many-shot jailbreaking, crescendo attacks that gradually escalate harmful requests across turns, and Unicode smuggling. The tool ships with both a CLI and a web UI, supports OpenAI, Anthropic, Gemini, Azure, Ollama, and custom REST APIs, and includes Jupyter notebook examples for research workflows. If your threat model includes adversaries who will spend time crafting novel attacks rather than running off-the-shelf payloads, FuzzyAI complements Garak’s static-probe approach.
DeepTeam by Confident AI is the closest direct competitor to Promptfoo on the application-testing axis. It covers 40+ vulnerability types and 10+ adversarial attack methods, with explicit OWASP Top 10 and NIST AI RMF mapping. Its niche is teams that already use DeepEval for LLM evaluation and want red-teaming in the same framework rather than as a separate tool.
Giskard approaches AI security from the ML-engineering side rather than the application side. It probes traditional ML models and generative systems for evasion attacks, model extraction, data poisoning, and unintended bias, with a test orchestration engine that can deploy thousands of attack variations on demand. For data-science teams running tabular models, computer-vision systems, or recommendation engines alongside LLMs, Giskard handles attack surface that the LLM-specific tools don’t.
Reference: Common Probe Categories Across Tools
The probe taxonomy is reasonably stable across tools. If you’re scoping a red-team engagement or a compliance scan, these are the categories you want covered.
Enterprise Platforms: Mindgard, HiddenLayer, HackerOne
Open-source tools have a steep total-cost-of-ownership curve at scale. You maintain infrastructure, run attack generation yourself, format reports, and chase compliance evidence manually. For organizations under regulatory pressure or running dozens of AI deployments in parallel, enterprise platforms trade source-code transparency for operational coverage.
Mindgard offers automated AI red teaming with continuous monitoring and artifact scanning, plus managed services. HiddenLayer combines model scanning with runtime protection and threat intelligence. HackerOne layers human red-team expertise on top of automated tooling. Promptfoo’s enterprise tier — separate from the OpenAI-acquired open-source project — adds team collaboration, shared dashboards, SOC2, ISO 27001, and HIPAA compliance certifications. Redbolt AI is built on top of Garak, extending the open-source scanner with runtime protection for AI agents and additional reporting features.
Microsoft’s AI Red Teaming Agent in Microsoft Foundry deserves separate mention because it changes the integration story for Azure-native deployments. The agent runs PyRIT under the hood, scans Foundry projects directly, and produces reports inside the same surface where evaluation, tracing, and deployment already live. For teams already on Foundry, it removes the friction of bolting an external tool onto a Microsoft-managed stack.
How to Pick: Tool Selection by Use Case
The honest answer is that most serious AI security teams use more than one tool. Garak and Promptfoo are complementary rather than competing — Garak gives you a model-layer scanner with a deep probe library, Promptfoo gives you an application-layer test framework with CI/CD integration and compliance mapping. Running Garak on the model and Promptfoo on the application catches different classes of failure.
For a working developer building LLM features in a CI/CD pipeline, Promptfoo is the natural starting point: drop in a promptfooconfig.yaml, wire it into GitHub Actions, run a baseline scan against the OWASP LLM Top 10 preset, then expand coverage. For an ML team owning the model itself — fine-tuning, quantizing, evaluating new base models — Garak is the right scanner, with FuzzyAI added when you need to probe for novel jailbreaks. For a security researcher building custom attack pipelines, PyRIT’s compositional design is hard to beat. For an enterprise security organization that needs compliance reporting and managed services, Mindgard, HiddenLayer, or Promptfoo Enterprise reduce the operational load.
Pitfalls and Limitations
Three things to know before running any of these tools in anger.
First, automated scanning has a ceiling. Every published guide on AI red teaming — including Microsoft’s — recommends doing manual expert testing first, then layering automation on top. Automated tools excel at regression and systematic coverage; they cannot match human creativity in finding new attack patterns. Results from Hack The Box’s January 2026 AI security benchmarks make the gap explicit: most frontier models nail Easy-tier challenges but fail Hard-tier scenarios almost completely. Gemini 3 Pro solved 2 Hard challenges, Claude Sonnet 4.5 solved 1, everything else scored zero. Automated red-teaming inherits the same ceiling — the attack quality is bounded by the attack-generator model.
Second, probe fatigue is real. Many vulnerabilities surface as statistical patterns — a prompt-injection attack might succeed 12% of the time rather than 100%. Without confidence intervals on attack-success rates, individual probe results can be misleading. Garak’s recent work on bootstrap confidence intervals (PR #1577) is in the right direction; until that lands, treat single-run results as noisy signals rather than ground truth.
Third, compliance mapping is not compliance. Promptfoo’s OWASP and NIST presets give you organized findings against framework controls; they don’t give you a passed audit. Real compliance for the EU AI Act’s August 2026 deadline requires documented risk-management processes, demonstrated mitigations, and human review — none of which a tool produces by itself. The tools generate the evidence; the program produces the compliance.
Frequently Asked Questions
Is Promptfoo still open source after the OpenAI acquisition? Yes. The core Promptfoo project remains open source under MIT. OpenAI’s acquisition announcement explicitly committed to keeping the open-source project, which continues to receive public releases. The enterprise tier (SOC2, team dashboards, managed deployment) is separate.
Should I use Garak or Promptfoo for OWASP LLM Top 10 coverage? Promptfoo has a built-in OWASP LLM Top 10 preset that maps findings directly to the framework, and OWASP itself lists Promptfoo as a recommended security solution. Garak covers many of the same vulnerability categories but doesn’t ship native OWASP report mapping. For compliance-driven engagements, Promptfoo is faster to results.
Does PyRIT still receive updates after the Azure-to-Microsoft repository move? Yes. The Azure/PyRIT repository was archived in March 2026 and active development continues at microsoft/PyRIT. The framework also powers the AI Red Teaming Agent in Microsoft Foundry, so it has both an open-source path and a managed-service path.
Are these tools enough for EU AI Act compliance? No tool is enough on its own. The EU AI Act requires adversarial testing as part of a risk management system for high-risk AI systems, with full compliance required by August 2, 2026. The tools in this article generate the evidence regulators expect to see, but compliance also requires documentation, governance processes, and human oversight that no scanner produces.
Where the Category Goes Next
Two trends are reshaping the tooling landscape over the next year. The first is the move from single-turn probing to multi-turn, agentic red teaming. Promptfoo’s Hydra strategy, FuzzyAI’s crescendo attacks, and PyRIT’s XPIA orchestrator all point at the same thing: real adversaries don’t fire one prompt and walk away, and neither should the tools that simulate them. The OWASP Agentic Top 10 (December 2025, ASI-prefixed) codifies the threat model. Tools that test agentic systems end-to-end — including tool-use, memory, and planning — will be the differentiator in 2027.
The second trend is consolidation and integration. Promptfoo inside OpenAI, PyRIT inside Microsoft Foundry, Garak inside NVIDIA NeMo Evaluator — open-source AI red-teaming tools are increasingly the engines inside platform-vendor offerings. That makes the tools more accessible to enterprise teams but creates a new dependency: the AI security tooling you use is increasingly built by the same companies whose models you’re testing. Healthy adversarial testing requires preserving at least one tool in your stack that doesn’t share that conflict — which, for most teams, means keeping a vendor-independent open-source scanner like Garak in the rotation alongside whatever managed service does the bulk of the work.






