Lab 17: DLP Architecture

Time: 50 minutes | Level: Architect | Docker: docker run -it --rm zchencow/innozverse-cybersec:latest bash

Objectives

  • Design a Data Loss Prevention (DLP) architecture

  • Define data classification tiers

  • Implement endpoint, network, and cloud DLP policies

  • Build a Python PII/PAN/SSN/PHI content classifier


Step 1: Data Classification Framework

Four-tier classification model:

| Tier | Label | Description | Examples |
|------|-------|-------------|----------|
| 1 | Public | Approved for public release | Press releases, public website |
| 2 | Internal | Internal use only | Policies, org charts, project docs |
| 3 | Confidential | Restricted; business sensitive | Financial data, strategy docs, contracts |
| 4 | Restricted | Highly sensitive; regulatory | PII, PAN, PHI, source code, credentials |

Data sensitivity indicators:

PII (Personally Identifiable Information):
  - Name + address combination
  - SSN, passport, national ID
  - Email address (alone or combined)
  - IP address (in some jurisdictions)

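PAN (Primary Account Number) and related financial identifiers:
  - Credit/debit card numbers
  - Bank account numbers (not PAN in the PCI DSS sense, but usually handled at the same sensitivity)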

PHI (Protected Health Information):
  - Medical record numbers
  - Health diagnosis + patient identity
  - Insurance member IDs

IP (Intellectual Property):
  - Source code
  - Trade secrets
  - Patents, research data
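
The four tiers and the indicators above can be expressed as a small lookup. The following is a minimal sketch, not a definitive implementation: the `Tier` enum, the `DATA_TYPE_TIER` keys, and the choice to default unknown or unlabelled content to Internal are all assumptions made for illustration.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The four classification tiers, ordered by sensitivity."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


# Hypothetical data-type keys, mapped to the tier given in the table above.
DATA_TYPE_TIER = {
    "press_release": Tier.PUBLIC,
    "org_chart": Tier.INTERNAL,
    "financial_data": Tier.CONFIDENTIAL,
    "contract": Tier.CONFIDENTIAL,
    "pii": Tier.RESTRICTED,
    "pan": Tier.RESTRICTED,
    "phi": Tier.RESTRICTED,
    "source_code": Tier.RESTRICTED,
    "credentials": Tier.RESTRICTED,
}


def classify(data_types):
    """Return the highest tier among the detected data types.

    Unknown types, and documents with no detected types, fall back to
    INTERNAL (an assumption: a conservative default for unlabelled data).
    """
    tiers = [DATA_TYPE_TIER.get(t, Tier.INTERNAL) for t in data_types]
    return max(tiers, default=Tier.INTERNAL)
```

A document takes the tier of its most sensitive element, so `classify(["org_chart", "pan"])` yields `Tier.RESTRICTED`.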

Step 2: DLP Content Classifier
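
The classifier code for this step is not present in this copy of the lab, so the following is a minimal regex-based sketch of what such a classifier might look like, not the lab's original code. The pattern set, the `PHI_MRN` convention (an "MRN" prefix followed by 6 to 10 digits), and the function name `classify_text` are assumptions. Card numbers are reported only as candidates, because a regex alone cannot tell a PAN from any other long digit string.

```python
import re

# Illustrative patterns only; production DLP rules are considerably stricter.
PATTERNS = {
    # US Social Security number in the common 3-2-4 dashed form.
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    # 13 to 19 digits, optionally separated by single spaces or dashes.
    "PAN_CANDIDATE": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
    # Hypothetical medical record number format: "MRN" then 6-10 digits.
    "PHI_MRN": re.compile(r"\bMRN[:#\s]*\d{6,10}\b", re.IGNORECASE),
}


def classify_text(text: str) -> dict:
    """Return a {label: match_count} dict for every pattern found in text."""
    findings = {}
    for label, pattern in PATTERNS.items():
        count = len(pattern.findall(text))
        if count:
            findings[label] = count
    return findings


if __name__ == "__main__":
    sample = ("Contact jane.doe@example.com, SSN 123-45-6789, "
              "card 4111 1111 1111 1111, MRN: 00123456")
    print(classify_text(sample))
```

Any non-empty result would place the text in Tier 4 (Restricted) under the classification model from Step 1.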

📸 Verified Output:


Step 3: DLP Deployment Types

Three deployment modes:

1. Endpoint DLP:

  • Agent on laptop/workstation

  • Controls: copy to USB, print, email attachment, screenshot

  • Keeps enforcing policy while the device is offline (no network connection required)

  • Examples: Forcepoint DLP, Symantec DLP, Microsoft Purview DLP

2. Network DLP:

  • Inline (blocking) or out-of-band (monitoring)

  • Inspects: email, web upload, FTP, cloud sync

  • Requires SSL/TLS inspection for encrypted traffic

  • Examples: Zscaler, McAfee Network DLP, Palo Alto Networks Enterprise DLP

3. Cloud DLP (CASB):

  • Scans data at rest in SaaS (SharePoint, Google Drive, Box)

  • Controls sharing permissions, external sharing, public access

  • API-based inspection of stored documents

  • Examples: Microsoft Purview, Netskope, Skyhigh Security
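
One way to reason about the three deployment modes is to map each egress channel to the DLP type that can inspect it, then look for channels nothing covers. The sketch below does that. The channel identifiers and function names are hypothetical, and the mapping is taken only from the bullet lists above.

```python
# Channel-to-control mapping derived from the three deployment modes above.
# The channel keys are hypothetical identifiers chosen for this sketch.
ENFORCEMENT_POINTS = {
    "usb_copy": ["endpoint"],
    "print": ["endpoint"],
    "screenshot": ["endpoint"],
    "email": ["endpoint", "network"],
    "web_upload": ["network"],
    "ftp": ["network"],
    "cloud_sync": ["network"],
    "saas_at_rest": ["cloud"],
    "external_sharing": ["cloud"],
}


def enforcement_points(channel: str) -> list:
    """Return the DLP deployment types able to inspect a channel."""
    return ENFORCEMENT_POINTS.get(channel, [])


def coverage_gaps(channels, deployed):
    """Return the channels that none of the deployed DLP types can inspect."""
    deployed = set(deployed)
    return [c for c in channels if not set(enforcement_points(c)) & deployed]
```

For example, an organisation running only network DLP would see `coverage_gaps(["usb_copy", "web_upload", "saas_at_rest"], ["network"])` report `usb_copy` and `saas_at_rest` as uncovered.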


Step 4: DLP Policy Design

Policy structure:

Policy exceptions:

  • Payment team → can send PAN to approved payment processors

  • Finance → can share financial data with auditors (approved domain)

  • Legal → can export data to legal hold systems

💡 Start with monitoring mode — never deploy DLP in block mode from day one. Run monitor-only for 30-60 days; tune false positives; then progressively enable blocking on highest-risk channels.
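
A policy, its exceptions, and the monitor-first rollout can be modelled together. The following is a minimal sketch under stated assumptions: the `Policy` and `PolicyException` classes, the `"monitor"`/`"block"` mode strings, the returned action names, and the example team and destination identifiers are all hypothetical, not taken from any DLP product.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyException:
    """A team allowed to send the policy's data type to one destination."""
    team: str
    destination: str


@dataclass
class Policy:
    name: str
    data_type: str                # e.g. "PAN"
    channels: set                 # channels the policy applies to
    mode: str = "monitor"         # "monitor" (log only) or "block"
    exceptions: list = field(default_factory=list)

    def evaluate(self, team, data_type, channel, destination):
        """Return "allow", "log", or "block" for a single transfer."""
        # Out of scope: a different data type or an unmonitored channel.
        if data_type != self.data_type or channel not in self.channels:
            return "allow"
        # An approved team/destination pair bypasses the policy.
        for exc in self.exceptions:
            if exc.team == team and exc.destination == destination:
                return "allow"
        # In monitor mode violations are only logged, never blocked.
        return "block" if self.mode == "block" else "log"


pan_policy = Policy(
    name="PAN egress",
    data_type="PAN",
    channels={"email", "web_upload"},
    exceptions=[PolicyException("payments", "processor.example")],
)
```

Because `mode` defaults to `"monitor"`, a new policy only logs. Switching a tuned policy to `mode = "block"` is then a deliberate, per-policy step.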


Step 5: DLP for Regulated Data

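GDPR Article 25 (Data Protection by Design and by Default):

  • Privacy considerations are built into systems from the start

  • Data minimisation: collect only what is necessary (an Article 5 principle that Article 25 requires controllers to implement)

  • Purpose limitation: use data only for the stated purpose (also an Article 5 principle)

  • DLP is one technical measure that supports Article 25 compliance; it does not achieve compliance on its own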

GDPR DLP requirements:

PCI DSS DLP:

  • Prevent PAN transmission outside CDE (Cardholder Data Environment)

  • Monitor for PAN in unexpected locations (DLP scan of file shares)

  • Keep full PAN out of logs: truncate or mask it (for example, retain only the last 4 digits)
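
Keeping PAN out of logs can be sketched as a scrubbing function applied to each log line before it is written. This is an illustrative sketch: the names `PAN_RE`, `mask_pan`, and `scrub_log_line` are assumptions, and the pattern deliberately masks any 13 to 19 digit sequence without further validation, which over-masks rather than risks leaking a card number.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or dashes.
PAN_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def mask_pan(match: re.Match) -> str:
    """Replace all but the last four digits of a matched number with '*'."""
    digits = re.sub(r"\D", "", match.group(0))
    return "*" * (len(digits) - 4) + digits[-4:]


def scrub_log_line(line: str) -> str:
    """Return the log line with every PAN-like number masked."""
    return PAN_RE.sub(mask_pan, line)
```

For instance, `scrub_log_line("charge card=4111 1111 1111 1111 ok")` returns `"charge card=************1111 ok"`.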


Step 6: Content Inspection Techniques

| Technique | Description | Use Case |
|-----------|-------------|----------|
| Regex patterns | Match text patterns | SSN, PAN, email, passport |
| Exact data match (EDM) | Hash-based exact match of records | Employee HR database |
| Document fingerprinting | Hash-based match of document structure | Confidential templates |
| ML classification | Model-based content classification | Unstructured text |
| OCR | Extract text from images/PDFs | Scanned documents |
| Luhn algorithm | Validate credit card numbers mathematically | Reduce PAN false positives |
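
Exact data match can be illustrated with keyed hashes: the sensitive values are hashed once into an index, and inspected content is hashed token by token and looked up, so the DLP engine never stores the raw records. The sketch below assumes a hypothetical HMAC key, a simple lowercase-and-strip normalisation, and whitespace tokenisation. Real EDM engines match multi-field records and use more careful tokenisation.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a secrets manager.
SECRET_KEY = b"example-edm-key"


def fingerprint(value: str) -> str:
    """Return a keyed SHA-256 hash of the normalised value."""
    normalized = "".join(value.lower().split())
    return hmac.new(SECRET_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(records):
    """Hash every sensitive record value into a lookup set."""
    return {fingerprint(r) for r in records}


def find_exact_matches(text: str, index: set) -> list:
    """Return the tokens in text whose fingerprint is in the index."""
    tokens = text.replace(",", " ").split()
    return [t for t in tokens if fingerprint(t) in index]


if __name__ == "__main__":
    # Hypothetical employee IDs standing in for an HR database export.
    hr_index = build_index(["EMP-10042", "EMP-10077"])
    print(find_exact_matches("Please update emp-10042, thanks", hr_index))
```

Unlike a regex, this only fires on values that actually exist in the source dataset, which is why EDM tends to produce fewer false positives for structured records.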

Luhn algorithm (credit card validation):
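
The code for this step is not present in this copy of the lab; the following is a standard Luhn checksum sketch. Starting from the rightmost digit, every second digit is doubled (subtracting 9 when the result exceeds 9), and the number is valid when the total is a multiple of 10. The function name `luhn_valid` and the 13 to 19 digit length bound are choices made for this example.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in candidate pass the Luhn checksum."""
    # Keep only decimal digits so spaces and dashes are ignored.
    digits = [int(ch) for ch in candidate if ch.isdecimal()]
    # Payment card numbers are 13 to 19 digits long.
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for position, digit in enumerate(reversed(digits)):
        # Double every second digit, counting from the right.
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0
```

Running this check on every regex hit discards most random digit strings, since only about one in ten arbitrary numbers passes. The widely used test number `4111 1111 1111 1111` passes; changing its last digit to `2` makes it fail.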


Step 7: Insider Threat and DLP

Insider threat indicators in DLP logs:

  • Bulk download before resignation (UEBA + DLP correlation)

  • After-hours uploads to personal cloud storage

  • Repeated DLP policy violations

  • Access to data outside normal job function

UEBA + DLP integration:
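
The integration details are not present in this copy of the lab. As an illustration only, the sketch below combines DLP events with one UEBA-style context signal (a pending resignation) into a per-user risk score. The `DlpEvent` fields, the weights, the after-hours window (before 07:00 or from 19:00), and the bulk threshold are all hypothetical values chosen for the example. It covers three of the four indicators listed above; access outside normal job function would need role data that is not modelled here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DlpEvent:
    user: str
    timestamp: datetime
    channel: str            # e.g. "personal_cloud", "usb", "email"
    bytes_moved: int
    policy_violation: bool


def risk_score(events, resignation_pending, bulk_threshold=500_000_000):
    """Return a 0-100 risk score for one user's DLP events.

    All weights are illustrative assumptions, not calibrated values.
    """
    score = 0
    for event in events:
        after_hours = event.timestamp.hour < 7 or event.timestamp.hour >= 19
        # Bulk transfers weigh more when the user is about to leave.
        if event.bytes_moved >= bulk_threshold:
            score += 40 if resignation_pending else 20
        # After-hours uploads to personal cloud storage.
        if after_hours and event.channel == "personal_cloud":
            score += 25
        # Each violation adds to the score, so repeats accumulate.
        if event.policy_violation:
            score += 10
    return min(score, 100)
```

A single 600 MB after-hours upload to personal cloud storage that also violates a policy scores 75 for a user with a pending resignation and 55 otherwise, which shows how the UEBA context changes the priority of the same DLP event.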


Step 8: Capstone — Enterprise DLP Programme

Scenario: Financial services firm; GDPR + PCI DSS; 5,000 employees


Summary

| Component | Key Points |
|-----------|------------|
| Data classification | Public → Internal → Confidential → Restricted |
| Content inspection | Regex, EDM, fingerprinting, ML, OCR |
| Endpoint DLP | Agent-based; controls USB, print, screenshot |
| Network DLP | Inline or out-of-band; inspects email, web, FTP |
| Cloud DLP | CASB; scans SaaS at rest; controls sharing |
| GDPR Article 25 | Privacy by design; DLP as technical safeguard |
| Luhn check | Validate PAN to reduce false positives |
