Lab 15: Data Governance & Compliance

Time: 50 minutes | Level: Architect | DB: PostgreSQL 15

Data governance ensures databases handle sensitive information in accordance with GDPR, HIPAA, and SOC2 requirements. This lab covers data classification, PII management, audit logging, data retention, and PostgreSQL Row-Level Security for compliance.


Step 1: Data Classification Framework

Data classification is the foundation of all compliance work. You must know what data you have before you can protect it.

Classification tiers:

┌─────────────────────────────────────────────────────────────┐
│                  DATA CLASSIFICATION TIERS                   │
├─────────────────────────────────────────────────────────────┤
│ TIER 1 - PUBLIC     │ Marketing content, public docs        │
│ TIER 2 - INTERNAL   │ Business data, non-sensitive records  │
│ TIER 3 - CONFIDENTIAL│ Financial data, employee records     │
│ TIER 4 - RESTRICTED │ PII, PHI, payment card data (PCI)     │
└─────────────────────────────────────────────────────────────┘

PostgreSQL column classification with comments:

-- Tag columns with classification metadata
COMMENT ON COLUMN users.email        IS 'PII:TIER4:GDPR:encrypted';
COMMENT ON COLUMN users.full_name    IS 'PII:TIER4:GDPR:pseudonymizable';
COMMENT ON COLUMN users.birth_date   IS 'PII:TIER4:GDPR:encrypted';
COMMENT ON COLUMN users.ssn          IS 'PII:TIER4:HIPAA:encrypted:tokenized';
COMMENT ON COLUMN users.created_at   IS 'OPERATIONAL:TIER2';
COMMENT ON COLUMN users.country_code IS 'OPERATIONAL:TIER2';

-- Query all PII columns across the database
SELECT 
    table_name,
    column_name,
    obj_description(
        (table_schema || '.' || table_name)::regclass, 'pg_class'
    ) AS table_comment,
    col_description(
        (table_schema || '.' || table_name)::regclass, 
        ordinal_position
    ) AS classification
FROM information_schema.columns
WHERE table_schema = 'public'
  AND col_description(
    (table_schema || '.' || table_name)::regclass, 
    ordinal_position
  ) LIKE '%PII%';

💡 Store your data inventory in a dedicated data_catalog table for automated compliance reporting. Many teams use tools like Apache Atlas or DataHub for this at scale.


Step 2: PII Column Detection & Schema Design

Design schemas that separate PII from operational data to minimize exposure.


Step 3: Row-Level Security (RLS) for GDPR

PostgreSQL RLS restricts which rows each user/role can see — critical for multi-tenant GDPR compliance.

💡 Always use FORCE ROW LEVEL SECURITY so even the table owner is subject to policies (except superusers). This prevents accidental data exposure in admin tools.


Step 4: Audit Logging

Every access to sensitive data must be logged for SOC2 and HIPAA compliance.


Step 5: GDPR Right to Erasure

GDPR Article 17 requires the ability to erase personal data. True deletion is often replaced with anonymization.

💡 Never truly "delete" a row if foreign keys reference it. Use the anonymization strategy — replace PII with cryptographic hashes. Keep the record skeleton for referential integrity.


Step 6: Data Retention Policies

Different regulations require different retention periods. Automate this with partitioned tables.


Step 7: Encryption Requirements & Data Masking

💡 Use AWS KMS or HashiCorp Vault to manage encryption keys — never store the key in the database config file in production.


Step 8: Capstone — Compliance Automation Framework

Build a complete compliance reporting tool that verifies your database's posture.

Run verification:

📸 Verified Output:


Summary

Concept
Implementation
Regulation

Data Classification

Column comments + catalog table

All

PII Separation

Vault pattern with token references

GDPR, HIPAA

Row-Level Security

PostgreSQL RLS policies per role

GDPR

Audit Logging

Trigger-based, partitioned audit_log

SOC2, HIPAA

Right to Erasure

Anonymization procedure (not hard delete)

GDPR Art. 17

Data Retention

Policy registry + partition drop

SOX, HIPAA, PCI

Column Encryption

pgcrypto pgp_sym_encrypt/decrypt

PCI-DSS, HIPAA

Data Masking

Views with regex masking

GDPR, internal

TLS Enforcement

pg_hba.conf hostssl + ssl_min_protocol

All

Key Management

AWS KMS / HashiCorp Vault integration

PCI-DSS

Last updated