Data Classification Policy: A Realistic Four-Tier Model

Most data classification policies fail the same way: they exist as a PDF, get referenced once during onboarding, and then quietly diverge from how anyone actually handles information. Auditors see a four-tier scheme on paper. Employees see a shared drive where last quarter’s board deck sits next to the office snack list, both unlabeled. The gap between the two is where breaches happen — not because attackers cracked encryption, but because nobody knew the file mattered.

A workable classification policy starts from a different assumption: people will not carefully label every document, and they will not memorize a five-level taxonomy. The policy has to survive that reality. The four-tier model below (Public, Internal, Confidential, Restricted) is the version most organizations end up with after they trim the aspirational version down to what actually gets used. Its upper three tiers line up with the low, moderate, and high impact levels that FIPS 199 defines and NIST SP 800-60 applies to information types. It also fits the handling expectations of most regulatory frameworks and gives security teams enough granularity to apply real controls without drowning users in decisions.

Why Four Tiers and Not Three or Five

Three-tier models (Public, Internal, Confidential) collapse two genuinely different risk profiles into one bucket. A customer list and a database of unencrypted payment credentials both end up labeled Confidential, but the controls each demands are wildly different. Security teams compensate by writing exception rules, which is the same problem the classification was supposed to solve.

Five-tier models (often Public, Internal, Sensitive, Confidential, Restricted) suffer the opposite issue. The middle tiers are indistinguishable to the average employee. Ask ten people whether a draft contract is Sensitive or Confidential and you’ll get a roughly even split. Decision fatigue drives everything toward the safe middle, which means the top tier — the one that’s supposed to trigger your strictest controls — gets used for almost nothing.

Four tiers is the smallest scheme that still separates “won’t hurt anyone” from “would end the company” while keeping most individual decisions effectively binary. Most documents are obviously Internal. The interesting decisions sit at the Confidential/Restricted boundary, and that boundary is where you concentrate training, examples, and tooling.

The Four Tiers Defined

The labels matter less than the consistency. Some organizations adapt government-style names (Top Secret, Secret, Sensitive, Public); others use Tier 1 through 4. Pick names your employees won’t have to translate. What matters is that each tier has a clear definition, a defined impact level, and a non-overlapping set of controls.

Four-Tier Model: Classification Tiers, Impact, and Examples

Public (impact: none)
Approved for unrestricted external release. Disclosure causes no harm.
Examples: Marketing collateral, published whitepapers, job postings, press releases, SEC filings already submitted.

Internal (impact: low)
Default classification. For employees and contractors with a business need. Disclosure is embarrassing but not damaging.
Examples: Org charts, internal wikis, project plans, non-sensitive meeting notes, training materials, internal newsletters.

Confidential (impact: moderate)
Need-to-know basis. Disclosure causes financial loss, regulatory action, competitive harm, or breach notification obligations.
Examples: Customer PII, employee HR records, contracts, source code, security architecture, pre-release financials, M&A discussions.

Restricted (impact: high)
Explicitly named individuals only. Disclosure causes severe financial, legal, regulatory, or reputational damage. Existential risk to the business or its customers.
Examples: Cardholder data (PCI), unencrypted PHI, authentication secrets, signing keys, board materials, incident response details, trade secrets.

Public is the only tier where erring on the side of caution costs almost nothing: a press release labeled Internal does no harm, while an Internal document labeled Public can be released to anyone. When you’re unsure, label it Internal rather than Public.

Internal is the workhorse and should be the default for unlabeled material. Most organizations get into trouble by setting Confidential as the default “to be safe,” which trains employees to ignore labels entirely because everything carries the same warning.

Confidential is where regulatory definitions start to bind. If a document contains data covered by GDPR, HIPAA, CCPA, or PCI DSS, it lands here at minimum. This tier is also what adversaries tend to reach first. MITRE ATT&CK techniques T1530 (Data from Cloud Storage) and T1213 (Data from Information Repositories) describe collection from exactly the kind of misconfigured SharePoint sites and Confluence wikis where Confidential material accumulates, and those are easier targets than the crown jewels.

Restricted is where you concentrate your strongest controls because the blast radius justifies the friction. PCI DSS v4.0.1, current as of 2024, requires explicit handling controls for cardholder data that map naturally to this tier. The same goes for data whose compromise would likely be a material incident under the SEC rules adopted in 2023, which require public companies to disclose a cybersecurity incident within four business days of determining that it is material.

Mapping Tiers to Controls

The classification only matters if it drives concrete control differences. If Confidential and Restricted documents end up in the same SharePoint site with the same access list, you have one tier with two labels. The control matrix below is the minimum differentiation that justifies the four-tier scheme.

Control Matrix: Required Controls by Tier

| Control | Public | Internal | Confidential | Restricted |
|---|---|---|---|---|
| Encryption at rest | Optional | Required | Required | Required + customer-managed keys |
| Access model | Open | All employees | Role-based | Named individuals + approval |
| MFA | N/A | Required | Required (phishing-resistant preferred) | Phishing-resistant only (FIDO2) |
| External sharing | Permitted | NDA required | NDA + DPA + approval | Prohibited without legal sign-off |
| DLP enforcement | None | Monitor | Block + alert | Block + alert + IR ticket |
| Audit logging | Optional | Standard | Read + write events | All access, retained 1y+ |
| Retention review | N/A | Annual | Annual + scheduled deletion | Quarterly + minimum retention |
| Endpoint storage | Permitted | Managed devices | Managed + encrypted | Prohibited (cloud-only) |
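
One way to keep the matrix honest is to hold it as data and test it. The sketch below transcribes three rows of the matrix and checks that no two adjacent tiers share identical controls; the names `Tier`, `CONTROLS`, `required_controls`, and `undifferentiated_tiers` are illustrative assumptions, not part of any product or standard.

```python
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Three rows transcribed from the control matrix; the remaining rows
# would follow the same shape. Key names are hypothetical.
CONTROLS = {
    "access_model": {
        Tier.PUBLIC: "Open",
        Tier.INTERNAL: "All employees",
        Tier.CONFIDENTIAL: "Role-based",
        Tier.RESTRICTED: "Named individuals + approval",
    },
    "dlp_enforcement": {
        Tier.PUBLIC: "None",
        Tier.INTERNAL: "Monitor",
        Tier.CONFIDENTIAL: "Block + alert",
        Tier.RESTRICTED: "Block + alert + IR ticket",
    },
    "endpoint_storage": {
        Tier.PUBLIC: "Permitted",
        Tier.INTERNAL: "Managed devices",
        Tier.CONFIDENTIAL: "Managed + encrypted",
        Tier.RESTRICTED: "Prohibited (cloud-only)",
    },
}


def required_controls(tier):
    """Return the control value for every row of the matrix at one tier."""
    return {control: per_tier[tier] for control, per_tier in CONTROLS.items()}


def undifferentiated_tiers(controls):
    """Return adjacent tier pairs whose controls are identical in every row."""
    tiers = sorted(Tier)
    return [
        (lower, upper)
        for lower, upper in zip(tiers, tiers[1:])
        if all(row[lower] == row[upper] for row in controls.values())
    ]
```

An empty result from `undifferentiated_tiers(CONTROLS)` means every tier boundary carries at least one real control difference. A non-empty result names the pair of tiers that are one tier with two labels.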

The matrix matters more than the tier definitions. If you can’t articulate a meaningful control difference between Confidential and Restricted, collapse them. A three-tier policy with real teeth beats a four-tier policy where the top two tiers are labels without consequences.

Where Most Policies Fail

The policy document itself is rarely the problem. The problem is the gap between policy and operational reality, and that gap appears in predictable places.

Default classification is wrong. If unlabeled documents are treated as Internal but your shared drives default to “anyone in the organization can edit,” you’ve made Internal functionally equivalent to Public for anyone willing to copy a link. The default has to match the label, which usually means tightening shared-drive permissions before anyone writes a classification policy at all.

Labels exist but aren’t applied. Microsoft Purview, Google Drive labels, and similar tooling let you require classification at file creation, but most rollouts make labeling optional to avoid user friction. The result is that most files stay unlabeled. If 80% of files carry no label, the DLP and conditional access policies that trigger on labels apply to only 20% of your data. Mandatory labeling is unpopular but necessary for the model to function.
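
The coverage arithmetic is simple enough to track as a metric. A minimal sketch, assuming you can export one label value per file (with `None` for unlabeled files); `label_coverage` is a hypothetical helper, not a Purview or Google Drive API.

```python
def label_coverage(labels):
    """Fraction of files that carry any label.

    Label-triggered DLP and conditional access policies can only act on
    this fraction of the data.
    """
    labels = list(labels)
    if not labels:
        return 0.0
    labeled = sum(1 for label in labels if label)
    return labeled / len(labels)
```

Tracking this number per drive or per department shows where optional labeling has left policy enforcement blind.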

Aggregation isn’t accounted for. Ten Internal documents combined can produce a Confidential dataset. A spreadsheet of employee names is Internal. The same spreadsheet with employee names, hire dates, salaries, and performance ratings is Confidential. Policies need an aggregation clause that lets data owners reclassify combined datasets, and reviewers need to actually use it.
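
An aggregation clause can be expressed as a set of combination rules layered on top of per-field tiers. The sketch below is one possible encoding; the field names, the per-field tiers, and the rules in `AGGREGATION_RULES` are assumptions chosen to mirror the employee spreadsheet example, not a recommended rule set.

```python
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical per-field tiers: each field on its own is Internal.
FIELD_TIER = {
    "name": Tier.INTERNAL,
    "hire_date": Tier.INTERNAL,
    "salary": Tier.INTERNAL,
    "performance_rating": Tier.INTERNAL,
}

# Hypothetical aggregation rules: when every field in the set is present,
# the combined dataset is at least the given tier.
AGGREGATION_RULES = [
    (frozenset({"name", "salary"}), Tier.CONFIDENTIAL),
    (frozenset({"name", "performance_rating"}), Tier.CONFIDENTIAL),
]


def dataset_tier(fields):
    """Classify a dataset from its fields, applying aggregation rules."""
    present = set(fields)
    # Start from the strictest individual field; unknown fields default to Internal.
    tier = max(
        (FIELD_TIER.get(field, Tier.INTERNAL) for field in present),
        default=Tier.INTERNAL,
    )
    # Raise the tier when a sensitive combination of fields is present.
    for combination, floor in AGGREGATION_RULES:
        if combination <= present:
            tier = max(tier, floor)
    return tier
```

With these rules, a list of names stays Internal, while names combined with salaries or performance ratings become Confidential even though no single column is.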

The Restricted tier is empty. A common audit finding: organizations have a Restricted tier in policy but no documents classified at that level, because the controls are onerous and people route around them. If nothing in your environment qualifies as Restricted, either the tier is vestigial and should be removed, or — more likely — Restricted data exists but is mislabeled. Run periodic discovery scans against the actual data, not against the labels.
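
Scanning the data rather than the labels amounts to comparing two columns: what the label says and what a discovery scan found. A minimal sketch, assuming the scan has already produced a detected tier per file; the inventory format and the function name `restricted_gap` are hypothetical.

```python
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def restricted_gap(inventory):
    """Return paths where the scan detected Restricted content but the
    label says something else (or the file has no label at all).

    inventory: iterable of (path, labeled_tier_or_None, detected_tier).
    """
    return [
        path
        for path, labeled, detected in inventory
        if detected == Tier.RESTRICTED and labeled != Tier.RESTRICTED
    ]
```

If the label column shows zero Restricted files and this function returns a non-empty list, the tier is not vestigial. It is being bypassed.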

No declassification process. Documents accumulate at higher tiers because there’s no mechanism to lower them. Yesterday’s confidential earnings preview becomes today’s published 10-Q. The policy needs a path for data owners to declassify, and an automatic review trigger tied to retention dates.
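
The automatic review trigger can be as small as a date comparison run on a schedule. The sketch below assumes each record carries a review date derived from its retention date; the `Record` type and its field names are illustrative, and the decision to lower a tier still belongs to the data owner.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    name: str
    tier: str
    review_on: date  # assumed to be derived from the retention date


def due_for_review(records, today):
    """Return names of Confidential or Restricted records whose review
    date has arrived, so the data owner can confirm or lower the tier."""
    return [
        record.name
        for record in records
        if record.tier in {"Confidential", "Restricted"}
        and record.review_on <= today
    ]
```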

Frequently Asked Questions

Should we use the same tiers across all data types — documents, databases, code repositories?

Yes, with type-specific control mappings. The tier names should be consistent so employees only learn one taxonomy, but the controls applied to a Confidential database differ from those applied to a Confidential document. Maintain one classification scheme and multiple control matrices indexed by data type.

How does this map to GDPR’s special category data?

Special category data under Article 9 (health, biometric, genetic, sexual orientation, religious belief, etc.) lands in Restricted by default. Standard personal data lands in Confidential. The distinction matters because Article 9 prohibits processing unless a specific condition such as explicit consent applies, and large-scale processing of such data typically requires a DPIA under Article 35. Those obligations map to your highest-tier controls.

Who owns classification — security, legal, or the data owner?

The data owner classifies; security and legal define the framework and adjudicate disputes. If security classifies everything, owners disengage and accuracy collapses. If owners classify without guidance, the tiers drift apart across departments. The model that works: data owners classify with mandatory training, security audits a sample quarterly, legal arbitrates edge cases.

Do we need a separate policy for AI training data and model outputs?

Not a separate policy, but explicit guidance within the existing one. Training data inherits the classification of its sources (a model trained on Confidential customer data is itself Confidential). Outputs require a same-tier-or-stricter rule: the model’s responses are classified at the highest tier of any data it had access to. This avoids the loophole where Restricted source data gets laundered through a model into Internal-tier outputs.
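
Both rules reduce to taking the maximum over an ordered set of tiers. A minimal sketch, assuming tiers are ordered from Public to Restricted; falling back to Internal when no sources are listed is an assumption borrowed from the default-classification rule, not something the rule above specifies.

```python
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def inherited_tier(source_tiers, default=Tier.INTERNAL):
    """Classify a model, or one of its outputs, at the highest tier of
    any data it was trained on or had access to."""
    return max(source_tiers, default=default)
```

For example, `inherited_tier([Tier.INTERNAL, Tier.RESTRICTED])` returns `Tier.RESTRICTED`, so a single Restricted source is enough to keep every output at that tier.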

What to Do Next Week

Pick the most-used shared drive in your organization. Run a discovery scan for regulated data — payment data, health data, credentials, customer PII. Compare what you find to how the drive is currently shared. The delta between “what’s there” and “who can see it” tells you whether your existing classification policy is operational or aspirational. Everything else follows from closing that gap.
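
For a first pass on payment data, a scan can be sketched in a few lines. The example below looks for digit runs that pass the Luhn checksum and reports them next to each file's sharing scope. It is a rough illustration under stated assumptions: the `files` mapping, the scope strings, and the function names are hypothetical, the pattern is deliberately crude, and a real discovery tool would also cover health data, credentials, and PII.

```python
import re

# Crude candidate pattern: 13 to 19 digits, optionally separated by
# single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return digit strings in the text that look like card numbers."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def scan_drive(files: dict[str, tuple[str, str]]) -> list[tuple[str, str, int]]:
    """files maps path -> (text, sharing scope).

    Returns (path, sharing scope, hit count) for every file with hits,
    which is the comparison of what is there against who can see it.
    """
    report = []
    for path, (text, scope) in files.items():
        hits = find_card_numbers(text)
        if hits:
            report.append((path, scope, len(hits)))
    return report
```

The Luhn check filters out most random digit runs, but it will still produce false positives and will miss numbers split across lines, so treat the output as a list of files to inspect rather than a verdict.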