Detection & Response

How to Build an Internal Purple Team in 90 Days

April 26, 2026

Most internal red teams drift toward purple work within their first eighteen months — not because leadership planned it, but because the blue team can’t keep up with the attack surface and the red team’s findings rot in PDFs nobody operationalizes. The fix is to start purple deliberately. A 90-day buildout gets a small team running structured exercises, feeding production detections, and producing executive-readable metrics before the budget cycle closes.

This guide assumes a single security organization with an existing SOC, some form of EDR and SIEM, and one to three people who can be partially or fully reallocated to a purple function. No Cobalt Strike license, no managed service, no consultants. The goal at day 90 is not maturity — it’s a repeatable monthly exercise loop with defensible scoring and a detection backlog that survives the first staff change.

What “Purple Team” Actually Means Here

A purple team is not a third headcount group standing between red and blue. It’s a methodology where offensive and defensive operators run an exercise together, in real time, against a pre-agreed set of attacker techniques, with both sides watching the telemetry as it happens. Adversary emulation — replicating the specific tactics, techniques, and procedures (TTPs) of a known threat actor — is the engine. MITRE ATT&CK technique IDs (T1059.001, T1003.001, etc.) are the shared vocabulary that makes “did you see it?” a precise question instead of a vibe check.

The collaboration model matters more than the tooling. SCYTHE’s Purple Team Exercise Framework v4 (PTEF v4) targets a common failure pattern: exercises that produce reports nobody reads. It replaces binary pass/fail scoring with a 0–5 detection scale and feeds findings directly into a detection engineering lifecycle. Alfie Champion’s Practical Purple Teaming, published in September 2025, makes the same argument from a practitioner angle: the value is in repeatability and shared goals, not in a one-off engagement.

Team Models Compared

Red, Blue, and Purple Don’t Solve the Same Problem

Red (stealth simulation). Tests whether the SOC catches a determined attacker. Output: an after-action report. Cadence: quarterly to annual.

Blue (defense operations). Detects, investigates, contains. Output: incidents handled, detections written. Cadence: continuous.

Purple (collaborative validation). Runs known TTPs openly with both sides watching. Output: validated production detections, scored gaps. Cadence: monthly.

Days 1–30: Foundation and First Exercise

The first month is about deciding what you’ll test, what you’ll measure, and proving the loop works once.

Pick a sponsor and a charter. Without a named executive sponsor — typically the CISO or SOC director — purple work loses to incident response every time the SOC gets busy. The charter is one page: scope, cadence, decision rights for production changes, and a rule that exercise findings get a backlog ticket within 48 hours. Document the rules of engagement (RoE) covering production access limits, data handling, and abort conditions.

Inventory your telemetry. Before any exercise, confirm what you can actually see. For Windows endpoints this means Sysmon (a Microsoft Sysinternals tool that produces detailed process, network, and file event logs) deployed with a curated config like sysmon-modular or SwiftOnSecurity’s config, plus PowerShell script-block logging and command-line auditing enabled. For Linux, it means auditd with a baseline rule set. Cloud workloads need provider-native logs (AWS CloudTrail, the Azure Activity Log, GCP Cloud Audit Logs) shipping to your SIEM. Gaps you find here are findings even before you run an exercise.
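The inventory can be kept as a simple expected-versus-observed comparison. The sketch below is illustrative only: the source names are hypothetical labels for the log sources listed above, and the `observed` dictionary stands in for whatever your SIEM reports as actually arriving. Trim the cloud entry to the providers you use.

```python
# Expected log sources per platform, taken from the inventory described above.
# The string labels are hypothetical; use whatever naming your SIEM uses.
EXPECTED_SOURCES = {
    "windows": {"sysmon", "powershell_script_block", "command_line_auditing"},
    "linux": {"auditd"},
    "cloud": {"cloudtrail", "azure_activity", "gcp_audit"},
}


def telemetry_gaps(observed: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return the expected sources that are missing for each platform."""
    gaps = {}
    for platform, expected in EXPECTED_SOURCES.items():
        missing = expected - observed.get(platform, set())
        if missing:
            gaps[platform] = sorted(missing)
    return gaps


# Hypothetical inventory: sources the SIEM actually received events from.
observed = {
    "windows": {"sysmon", "command_line_auditing"},
    "linux": {"auditd"},
    "cloud": {"cloudtrail"},
}
print(telemetry_gaps(observed))
```

Each entry in the returned dictionary is a candidate finding for the backlog before the first exercise runs.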

Build the lab. A purple team needs a non-production environment that mirrors a slice of the real estate — typically a domain controller, two or three workstations, one Linux server, and the same EDR and SIEM agents production runs. Snapshots are non-negotiable; you’ll be running destructive tests and need to reset in minutes. Splunk’s Attack Range and similar projects automate this for teams that don’t want to hand-build.

Run a baseline exercise. Pick five techniques mapped to a recent threat actor your CTI function flags as relevant. For most enterprises in 2026 that means credential access (T1003.001 LSASS dumping), defense evasion (T1112 registry modification), and persistence (T1547.001 registry run keys) — techniques you should already detect, run to confirm you actually do. Use Atomic Red Team for the first round; its tests are small, scoped to a single technique, and include cleanup commands. Score each technique 0–5: 0 means no telemetry exists, 5 means automated containment fired.

Days 31–60: Cadence and Detection Engineering

Month two converts the one-off into a process.

Schedule the loop. Run a monthly exercise covering 8–15 techniques, plus weekly 30-minute “atomic” runs against single techniques to validate new detections. Lock the dates a quarter ahead so the SOC plans capacity around them. Each exercise has four phases: CTI handoff (red picks the actor and TTPs), preparation (blue confirms telemetry coverage), execution (live, both teams in the same channel), and lessons learned (within five business days, or it doesn’t happen).
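The five-business-day rule for lessons learned is easy to put on the calendar automatically. This is a minimal sketch that skips weekends only; it assumes no public holidays, and the exercise date is hypothetical.

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Advance `start` by `days` weekdays, skipping Saturday and Sunday."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def lessons_learned_deadline(exercise_day: date) -> date:
    """Latest date for the lessons-learned session (five business days out)."""
    return add_business_days(exercise_day, 5)


exercise_day = date(2026, 5, 1)  # hypothetical exercise date (a Friday)
print(lessons_learned_deadline(exercise_day))  # 2026-05-08
```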

Stand up the detection engineering pipeline. Every exercise produces three outputs: detections that worked, detections that need tuning, and gaps requiring net-new rules. The gap and tuning items go into a tracked backlog with named owners and a rolling SLA, typically 30 days for a first draft detection and 60 days to production. Write detections as Sigma rules for portability across SIEMs, and maintain vendor-specific translations (KQL for Microsoft Sentinel, SPL for Splunk, ES|QL for Elastic) alongside them.
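The backlog SLA can be modeled in a few lines. The sketch below assumes both the 30-day and 60-day clocks start when the ticket is opened, which the text does not specify; the `BacklogItem` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

DRAFT_SLA = timedelta(days=30)       # first draft detection
PRODUCTION_SLA = timedelta(days=60)  # detection live in production


@dataclass
class BacklogItem:
    technique_id: str
    owner: str
    opened: date
    drafted: Optional[date] = None
    shipped: Optional[date] = None

    def draft_due(self) -> date:
        return self.opened + DRAFT_SLA

    def production_due(self) -> date:
        # Assumption: measured from the open date, not from the draft date.
        return self.opened + PRODUCTION_SLA

    def is_overdue(self, today: date) -> bool:
        """True if either SLA has lapsed without the matching milestone."""
        if self.drafted is None and today > self.draft_due():
            return True
        return self.shipped is None and today > self.production_due()


item = BacklogItem("T1003.001", "alice", opened=date(2026, 5, 1))
print(item.draft_due(), item.production_due())  # 2026-05-31 2026-06-30
```

Reviewing `is_overdue` for every open item at the start of each lessons-learned meeting keeps the previous backlog in view before new findings are added.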

Add adversary emulation depth. Atomic Red Team gets you 1,225 tests across 261 ATT&CK techniques, but the tests are isolated — they don’t chain into a campaign. MITRE Caldera adds an agent-based command-and-control model that lets you run multi-step operations representing a full intrusion. Note that Caldera had a serious unauthenticated RCE — CVE-2025-27364, patched in v5.1.0 in February 2025 — so confirm you’re on a current build and never expose the server publicly. For cloud-heavy environments, Datadog’s Stratus Red Team covers AWS, Azure, GCP, and Kubernetes techniques the on-prem tools miss.
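A version gate is one way to make the Caldera patch check routine. This sketch does not call any Caldera API: it assumes you supply the version string yourself (for example from your deployment records), and it handles plain dotted versions only, not pre-release suffixes.

```python
MIN_PATCHED = (5, 1, 0)  # first release with the CVE-2025-27364 fix, per the text above


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a string such as 'v5.1.0' into (5, 1, 0), padding short versions with zeros."""
    parts = [int(part) for part in text.strip().lstrip("vV").split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def is_patched(version: str) -> bool:
    """True if the given Caldera version is at or above the patched release."""
    return parse_version(version) >= MIN_PATCHED


print(is_patched("5.0.0"))   # False
print(is_patched("v5.1.0"))  # True
```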

Track the metrics that actually matter. Volume metrics (“alerts handled”) tell leadership nothing. The metrics that prove a purple program is working are detection coverage by ATT&CK tactic, mean time to detect (MTTD) per technique, the percentage of tested techniques that produce actionable alerts, and the count of new production detections shipped per quarter. NIST CSF 2.0 and recent SEC disclosure rules have pushed these from nice-to-have toward audit-relevant.
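As a rough illustration, three of these metrics can be computed from a flat list of exercise results. The data below is made up, and the two thresholds are assumptions layered on the 0–5 scale: a score of 3 or more is treated as "an alert fired" and 4 or more as "actionable". Adjust both to your own definitions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical results: (tactic, technique ID, score 0-5, seconds to first alert or None)
RESULTS = [
    ("credential-access", "T1003.001", 4, 95),
    ("defense-evasion", "T1112", 2, None),
    ("persistence", "T1547.001", 3, 240),
    ("persistence", "T1547.001", 4, 120),
]

ALERT_SCORE = 3       # assumption: score >= 3 means an alert fired
ACTIONABLE_SCORE = 4  # assumption: score >= 4 means the alert is actionable


def coverage_by_tactic(results):
    """Fraction of tested techniques per tactic whose best score reached ALERT_SCORE."""
    best = defaultdict(dict)
    for tactic, technique, score, _ in results:
        best[tactic][technique] = max(score, best[tactic].get(technique, 0))
    return {
        tactic: sum(s >= ALERT_SCORE for s in scores.values()) / len(scores)
        for tactic, scores in best.items()
    }


def mttd_by_technique(results):
    """Mean seconds to detect per technique, ignoring runs with no detection."""
    times = defaultdict(list)
    for _, technique, _, seconds in results:
        if seconds is not None:
            times[technique].append(seconds)
    return {technique: mean(values) for technique, values in times.items()}


def actionable_rate(results):
    """Share of tested techniques whose best score reached ACTIONABLE_SCORE."""
    best = {}
    for _, technique, score, _ in results:
        best[technique] = max(score, best.get(technique, 0))
    return sum(s >= ACTIONABLE_SCORE for s in best.values()) / len(best)
```

Using the best score per technique is a simplification; a program that wants to catch regressions may prefer the most recent score instead.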

90-Day Build Plan

What to Ship Each Month

Days 1–30: Foundation. Sponsor secured. Charter and RoE signed. Telemetry inventory complete. Lab built with snapshots. First baseline exercise: 5 techniques, scored 0–5, lessons-learned doc filed.

Days 31–60: Cadence. Monthly exercise schedule locked. Detection backlog with SLAs running. Caldera deployed for multi-step emulation. Sigma rules in version control. MTTD and coverage metrics dashboard live.

Days 61–90: Operationalize. Three exercises completed and trended. Executive metrics packet shipped to sponsor. CTI feeds driving TTP selection. Cloud techniques added via Stratus. Q2 plan with named adversaries on the calendar.

Days 61–90: Operationalize and Defend the Program

Month three is where most internal programs die. The exercises run fine, but the findings stop converting to detections, the metrics stop reaching leadership, and within two quarters the team is back to ad-hoc red work.

Trend the data. With three exercises in the books, you can show movement: detection coverage as a percentage of tested techniques, MTTD trending down on previously tested TTPs, and the cumulative count of production detections shipped. SCYTHE’s PTEF v4 maturity model gives you a five-level self-assessment to benchmark against — re-running it after every three exercises shows whether the program is actually maturing or just running in place.
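One simple way to show MTTD movement is to compare the first and last exercise in which each technique was detected. The figures below are invented for illustration, and the function names are hypothetical.

```python
# Hypothetical MTTD (seconds) per technique across three consecutive exercises.
# None means the technique was not detected in that exercise.
HISTORY = {
    "T1003.001": [300, 180, 95],
    "T1112": [None, 600, 420],
    "T1547.001": [240, 240, 310],
}


def mttd_change(history):
    """Change in MTTD between the first and last exercise where detection occurred."""
    changes = {}
    for technique, series in history.items():
        observed = [value for value in series if value is not None]
        if len(observed) >= 2:
            changes[technique] = observed[-1] - observed[0]
    return changes


def regressions(history):
    """Techniques whose MTTD got worse across the observed exercises."""
    return sorted(t for t, delta in mttd_change(history).items() if delta > 0)


print(mttd_change(HISTORY))
print(regressions(HISTORY))
```

Negative values mean detection got faster; anything returned by `regressions` is worth a look before the next exercise.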

Tie TTPs to threat intelligence. By day 90, exercise selection should be driven by what your CTI function says is targeting your sector, not by what’s convenient. If your CTI team flags an ALPHV/BlackCat-style affiliate or a Scattered Spider-style social engineering chain, that’s the next exercise. The MITRE Center for Threat-Informed Defense maintains adversary emulation plans you can use as starting points; its adversary_emulation_library repo includes ready-to-run plans, and Caldera’s emu plugin can load them.

Build the executive narrative. The metrics packet that goes to the CISO and ultimately the board is short: techniques tested this quarter, percentage with high-fidelity detection, detection backlog burndown, and one or two specific examples where a purple finding became a production detection that later fired on a real event. That last one is the program’s life insurance.

Tooling Reference

A 90-day buildout doesn’t need every tool — picking two or three from this list and using them well beats deploying five and maintaining none.

Open-Source Stack

Core Tooling for an Internal Purple Program

Atomic Red Team. Role: Emulation. 1,225 tests across 261 ATT&CK techniques. PowerShell-driven, cleanup commands included. Best starting point for atomic technique validation.

MITRE Caldera. Role: Emulation. Agent/server platform for chained, multi-step operations. Use v5.1.0+ for the CVE-2025-27364 patch. Never expose the server publicly.

Stratus Red Team. Role: Emulation. Cloud-native equivalent from Datadog. Covers AWS, Azure, GCP, Kubernetes. Roughly 40 atomic tests against the ATT&CK Cloud matrix.

VECTR. Role: Tracking. Free purple team management platform from Security Risk Advisors. Tracks TTPs, scores detections, trends results across exercises.

Sysmon. Role: Telemetry. Microsoft endpoint sensor. Pair with sysmon-modular or SwiftOnSecurity config. Mandatory if Defender for Endpoint isn’t deployed.

Splunk Attack Range. Role: Lab. Automated lab build. Spins up Windows domain, Linux hosts, attacker box, and Splunk SIEM in AWS or local. Resets cleanly.

PTEF v4. Role: Methodology. SCYTHE’s framework. Free and tool-agnostic. Provides 0–5 scoring, gap taxonomy, maturity model, and detection engineering lifecycle.

Pitfalls That Sink 90-Day Programs

Treating purple as a red team side-hustle. If your one offensive engineer also runs the purple program in their spare time, you’ll get exercises that aren’t prepared properly and findings that aren’t tracked. Even with a one-person team, fence off time for purple work on the calendar.

No detection ownership. Findings without a named engineer and a date go nowhere. Every gap becomes a ticket, every ticket has an owner, and the lessons-learned meeting reviews the previous backlog before generating new findings.

Scoring with pass/fail. Binary scoring tells leadership nothing. A technique can produce no telemetry at all (0), telemetry but no alert (1–2), an alert with no context (3), a high-fidelity alert with response runbook (4), or fully automated containment (5). The 0–5 scale is what makes “we’re improving” provable.
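The scale can be written down as a small lookup so that every operator scores the same way. This is a sketch of the rubric as described in this guide, with 0 for no telemetry as in the baseline exercise, not SCYTHE’s official definition; the function and parameter names are hypothetical. Because the text gives 1–2 as a band without splitting it, the function returns a (low, high) range and leaves the final pick to the operator.

```python
def score_band(telemetry: bool, alert: bool, high_fidelity_runbook: bool,
               automated_containment: bool) -> tuple[int, int]:
    """Map what was observed during a technique run to a (low, high) score band."""
    if automated_containment:
        return (5, 5)
    if alert and high_fidelity_runbook:
        return (4, 4)
    if alert:
        return (3, 3)
    if telemetry:
        return (1, 2)  # the text does not split 1 from 2; the operator decides
    return (0, 0)


# Telemetry landed in the SIEM but nothing alerted.
print(score_band(True, False, False, False))  # (1, 2)
```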

Skipping CTI. A purple program that emulates whatever’s interesting becomes a hobby. The exercises that earn budget are the ones tied to threats your sector actually faces, with named adversaries and current campaigns.

Letting the lab rot. A lab with stale snapshots, drifting agent versions, and no parity with production produces results that don’t transfer. Refresh the baseline image quarterly.
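A periodic parity check can flag lab rot before it skews an exercise. The sketch below reads "quarterly" as 90 days, which is an assumption, and the agent names and version strings are hypothetical inputs you would collect yourself.

```python
from datetime import date, timedelta

MAX_SNAPSHOT_AGE = timedelta(days=90)  # assumption: "quarterly" read as 90 days


def lab_findings(snapshot_taken: date, today: date,
                 lab_agents: dict[str, str], prod_agents: dict[str, str]) -> list[str]:
    """List reasons the lab may no longer match production."""
    findings = []
    age = today - snapshot_taken
    if age > MAX_SNAPSHOT_AGE:
        findings.append(f"snapshot is {age.days} days old")
    for agent, prod_version in sorted(prod_agents.items()):
        lab_version = lab_agents.get(agent)
        if lab_version != prod_version:
            findings.append(f"{agent}: lab={lab_version} prod={prod_version}")
    return findings


print(lab_findings(
    date(2026, 1, 1), date(2026, 4, 26),
    lab_agents={"edr": "7.1", "siem_forwarder": "9.2"},
    prod_agents={"edr": "7.3", "siem_forwarder": "9.2"},
))
```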

Frequently Asked Questions

Can a one-person team run a purple program? Yes, with reduced cadence. A single operator running monthly exercises against five to eight techniques, with the SOC participating as the blue side, is functional. The bottleneck shifts from running the exercise to landing the detection backlog.

Do we need a license for Cobalt Strike or a commercial BAS platform? No. The full open-source stack — Atomic Red Team plus Caldera plus Stratus — covers the technique surface most enterprises need for the first year. Commercial BAS platforms add value at the continuous-validation tier (PTEF maturity level 4–5), not at buildout.

How does this interact with red team engagements? A standing internal purple program complements rather than replaces periodic stealth red team exercises. The purple loop validates known TTPs continuously; the red engagement tests whether detection holds against an attacker who’s actively trying to evade it. Most teams run one or two stealth engagements per year on top of the monthly purple cadence.

What’s the budget floor? Beyond existing staff time, the meaningful cash costs are lab compute (typically a few hundred dollars per month in AWS or Azure), training for the operators (SANS SEC598 or equivalent), and optional licensing for VECTR’s commercial tier. A first-year program runs comfortably under $50k in net-new spend if SIEM and EDR are already deployed.

What 90 Days Actually Buys You

At day 90 the program isn’t mature — it’s defensible. You have three exercises in the books, a scored detection map against ATT&CK, a backlog that’s actively shipping detections, and a metrics packet your CISO can put in front of the board. That’s the floor. The work that compounds — a CTI-driven exercise calendar, automated regression testing of detections, expansion into cloud and identity, and PTEF maturity advancement — happens over the next twelve months, but only because the first 90 days established the loop.

The teams that fail at this don’t fail at the technical work. They fail at the operating model: no sponsor, no cadence, no backlog ownership, no metrics that survive contact with leadership. Fix those four things in the first month and the rest of the program follows.