Lab 08: InnoDB Internals

Time: 45 minutes | Level: Advanced | DB: MySQL 8.0

Overview

Understanding InnoDB's internal structure — B-tree indexes, clustered vs secondary indexes, page organization, and locking mechanisms — enables you to design schemas and queries that work with the storage engine rather than against it.


Step 1: B-Tree and Clustered Index Structure

docker run -d --name mysql-lab -e MYSQL_ROOT_PASSWORD=rootpass mysql:8.0
for i in $(seq 1 30); do docker exec mysql-lab mysql -uroot -prootpass -e "SELECT 1" 2>/dev/null && break || sleep 2; done

docker exec mysql-lab mysql -uroot -prootpass <<'EOF'
CREATE DATABASE innolab;
USE innolab;

-- Clustered index: the PRIMARY KEY IS the table (leaf pages store full rows)
CREATE TABLE employees (
  emp_id     INT NOT NULL,
  dept_id    INT NOT NULL,
  name       VARCHAR(100),
  salary     DECIMAL(10,2),
  hire_date  DATE,
  PRIMARY KEY (emp_id),                    -- Clustered index on emp_id
  INDEX idx_dept    (dept_id),             -- Secondary index
  INDEX idx_salary  (salary),             -- Secondary index
  INDEX idx_dept_salary (dept_id, salary)  -- Composite secondary index
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;

-- Insert data
INSERT INTO employees (emp_id, dept_id, name, salary, hire_date)
SELECT 
  seq,
  1 + (seq % 10),
  CONCAT('Employee_', seq),
  30000 + (seq * 7 % 70000),
  DATE_ADD('2020-01-01', INTERVAL (seq % 1460) DAY)
FROM (
  SELECT a.N + b.N * 10 + c.N * 100 + d.N * 1000 + 1 AS seq
  FROM 
    (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 
     UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) a,
    (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 
     UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) b,
    (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 
     UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) c,
    (SELECT 0 AS N UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 
     UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) d
  LIMIT 10000
) nums;

-- Check table stats
SHOW TABLE STATUS LIKE 'employees'\G
EOF

📸 Verified Output:

💡 Data_length = clustered index size (the actual rows). Index_length = all secondary indexes. The clustered index contains BOTH the B-tree structure AND the full row data at leaf nodes.


Step 2: Visualize Clustered vs Secondary Index Lookup

📸 Verified Output:

💡 Covering index = the secondary index contains ALL columns needed by the query. The engine never touches the clustered index. This is often a 2-5x speedup!


Step 3: InnoDB Page Structure

📸 Verified Output:


Step 4: Tablespace — File Per Table

📸 Verified Output:


Step 5: Record Locks vs Gap Locks

📸 Verified Output:

💡 Gap locks prevent INSERT of rows with keys between existing rows in the range. This prevents "phantom reads" in REPEATABLE READ but can cause deadlocks. READ COMMITTED isolation eliminates gap locks.


Step 6: Secondary Index Stores Primary Key

📸 Verified Output:


Step 7: Observe Page Splits with Random vs Sequential Inserts

📸 Verified Output:

💡 Random PK inserts (like UUIDs) cause page splits, leading to fragmentation and ~20-40% larger storage. Use AUTO_INCREMENT for sequential inserts, or use OPTIMIZE TABLE periodically.


Step 8: Capstone — InnoDB Internals Audit

📸 Verified Output:


Summary

Concept
Details
Impact

Clustered index

PK = table (rows stored at leaf nodes)

PK lookup = single tree traversal

Secondary index

Stores (indexed_cols, PK) at leaf nodes

Lookup = 2 tree traversals

Covering index

All needed columns in one index

Eliminates second tree traversal

Page size

16KB (default)

Interior nodes hold ~450 INT keys

ROW_FORMAT=DYNAMIC

Variable-length off-page storage

Better for VARCHAR/TEXT/BLOB

file_per_table

Each table = separate .ibd file

Enables individual table management

Gap lock

Locks range between records

Prevents phantoms in REPEATABLE READ

Record lock

Locks specific row

Concurrent DML coordination

Page splits

Random inserts fragment pages

Use sequential PKs for write efficiency

Key Takeaways

  • The PK IS the data in InnoDB — choose it carefully (prefer INT AUTO_INCREMENT)

  • UUID primary keys cause page splits and ~40% larger indexes — use binary UUID or surrogate INT

  • Covering indexes eliminate "double lookups" — include all SELECT columns when possible

  • Gap locks prevent phantom reads but cause more deadlocks — use READ COMMITTED if that's a problem

  • SHOW TABLE STATUS shows Data_length (clustered index), Index_length, Data_free (fragmentation)

Last updated