Lab 18: Docker Internals

Time: 40 minutes | Level: Advanced | Docker: docker run -it --rm --privileged ubuntu:22.04 bash

Docker is not magic — it's a well-engineered userspace tool that orchestrates Linux kernel primitives you already know: namespaces (Lab 16), cgroups (Lab 17), and union filesystems. In this lab you'll install Docker inside a container, examine the overlay2 filesystem, trace the runc/containerd/dockerd call chain, inspect container networking internals, and understand exactly what happens when you run docker run.


Step 1: The Docker Architecture — dockerd → containerd → runc

apt-get update -qq && apt-get install -y -qq docker.io
docker --version
runc --version
containerd --version

📸 Verified Output:

Docker version 28.2.2, build 28.2.2-0ubuntu1~22.04.1
runc version 1.3.3-0ubuntu1~22.04.3
spec: 1.2.1
go: go1.23.1
libseccomp: 2.5.3
containerd github.com/containerd/containerd 1.7.28

The call chain when you run docker run ubuntu:22.04 bash:

User (docker CLI)
  │  speaks: REST API over /var/run/docker.sock
  ▼
dockerd (Docker daemon — API server, image management, networking)
  │  speaks: OCI Image Spec, OCI Runtime Spec
  ▼
containerd (high-level container runtime — lifecycle, snapshots, content store)
  │  uses: containerd-shim-runc-v2 (keeps container alive after containerd restart)
  ▼
runc (low-level OCI runtime — actual syscalls: clone, unshare, pivot_root)
  │  calls: clone(CLONE_NEWPID|CLONE_NEWNET|CLONE_NEWNS|...)
  ▼
kernel namespaces + cgroups


💡 OCI = Open Container Initiative. Both the image format and runtime spec are OCI standards. This means you can use runc directly with any OCI-compliant image, bypassing Docker entirely.
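On a live host with a container running, the delegation chain is visible in the process tree. A sketch, assuming Docker is installed and at least one container has been started (process names are the real binaries; PIDs will differ):

```shell
# Long-lived daemons (dockerd, containerd) show up; runc itself exits
# right after container start, leaving containerd-shim-runc-v2 behind
ps -eo pid,ppid,comm | grep -E 'dockerd|containerd|runc'

# The shim — not dockerd — is the parent of the container's init process
pstree -p "$(pgrep -f containerd-shim-runc-v2 | head -1)"
```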


Step 2: The /var/lib/docker Layout

📸 Verified Output:


Directory         Contents
image/overlay2/   Image metadata, layer chain (imagedb, layerdb, content)
overlay2/         Actual layer data — each layer is a directory
containers/       Per-container config, logs, state (one dir per container ID)
network/          Network configuration (bridge, macvlan definitions)
volumes/          Named volume data
buildkit/         BuildKit cache for docker build
💡 You can safely delete all Docker state with systemctl stop docker && rm -rf /var/lib/docker — but you'll lose all images, containers, and volumes!
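A quick way to poke around this layout yourself (assumes the Docker install from Step 1, after at least one image has been pulled):

```shell
# Size of each top-level directory under Docker's state root
du -sh /var/lib/docker/*/ 2>/dev/null | sort -rh

# One directory per layer; the l/ dir holds shortened symlinks so that
# overlayfs mount options stay under the kernel's length limit
ls /var/lib/docker/overlay2/ | head -5

# Per-container directories are named by the full 64-char container ID
ls /var/lib/docker/containers/
```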


Step 3: overlay2 — Union Filesystem Layers

overlay2 stacks read-only layers (from the image) with a read-write layer (the container):

📸 Verified Output:


💡 This is copy-on-write (CoW). When a container modifies a file from the image layer, the kernel copies the original to UpperDir first, then modifies the copy. The original image layer is never touched.
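You can watch copy-up happen. A sketch assuming the Docker install from Step 1 (layer paths and container IDs will differ on your machine):

```shell
# Start a long-running container and locate its writable layer on disk
cid=$(docker run -d ubuntu:22.04 sleep 300)
upper=$(docker inspect --format '{{.GraphDriver.Data.UpperDir}}' "$cid")
ls "$upper"            # usually empty: nothing written yet

# Modify a file that lives in a read-only image layer...
docker exec "$cid" sh -c 'echo test >> /etc/debian_version'

# ...and the kernel's copy-up makes the modified copy appear in UpperDir
ls "$upper/etc"
docker rm -f "$cid" >/dev/null
```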


Step 4: Inspect Container Internals with docker inspect

📸 Verified Output (key fields):


💡 NanoCpus: 500000000 = 0.5 CPUs. Docker uses nanosecond CPU units internally, which map to cpu.max = "50000 100000" in the cgroup.
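The unit conversion can be checked with plain arithmetic, no Docker needed:

```shell
# NanoCpus is CPU time in nanoseconds per second of wall clock:
awk 'BEGIN { printf "%.1f CPUs\n", 500000000 / 1000000000 }'
# → 0.5 CPUs

# cpu.max "50000 100000" grants 50000us of runtime per 100000us period:
awk 'BEGIN { printf "%.0f%% of one CPU\n", 50000 / 100000 * 100 }'
# → 50% of one CPU

# On a live daemon you can confirm the stored value, e.g.:
#   docker run -d --cpus 0.5 ubuntu:22.04 sleep 300
#   docker inspect --format '{{.HostConfig.NanoCpus}}' <id>   # 500000000
```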


Step 5: Container Networking — veth Pairs and the docker0 Bridge

📸 Verified Output:


The @if14 and @if13 suffixes in ip link output show each interface's peer index — the two ends form a virtual Ethernet cable between the host bridge and the container's network namespace.

💡 Interface index binding: In the output above, interface 13 (container's eth0) is the peer of interface 14 (host's veth3a4b5c). When a packet leaves the container, it travels through this virtual cable to the bridge, then to the host's network.
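One way to pair up the two ends yourself — a sketch requiring iproute2 and a running container (the interface names and indexes like veth3a4b5c, 13, 14 are illustrative; yours will differ):

```shell
docker run -d --name net-demo ubuntu:22.04 sleep 300

# Host side: every veth enslaved to docker0, each showing its peer (@ifN)
ip -o link show type veth
bridge link show

# Container side: enter its network namespace and read eth0's peer index
pid=$(docker inspect --format '{{.State.Pid}}' net-demo)
nsenter -t "$pid" -n ip -o link show eth0

docker rm -f net-demo >/dev/null
```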


Step 6: Image Layers and Multi-Stage Build Internals

📸 Verified Output:


💡 Cache invalidation: Any change to a layer invalidates all subsequent layers. This is why you should: (1) put COPY after RUN apt-get install, and (2) use --mount=type=cache in BuildKit for package manager caches.
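Both rules can be sketched in a cache-friendly Dockerfile — stable dependency layer first, volatile source copy last, with a BuildKit cache mount for apt (the image, paths, and package here are illustrative):

```shell
mkdir -p /tmp/cache-demo
cat > /tmp/cache-demo/Dockerfile <<'EOF'
# syntax=docker/dockerfile:1
FROM ubuntu:22.04
# Changes rarely → stays cached across source edits; the cache mount
# persists apt's package cache between builds without bloating the layer
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    apt-get update && apt-get install -y --no-install-recommends curl
# Changes often → kept last so only this layer is rebuilt
COPY . /app
EOF
# Build with BuildKit (requires Docker):
#   DOCKER_BUILDKIT=1 docker build -t cache-demo /tmp/cache-demo
```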


Step 7: What runc Actually Does — OCI Bundle

📸 Verified Output:

💡 containerd-shim: After runc starts the container and exits, the containerd-shim-runc-v2 process stays alive to: (1) keep stdin/stdout pipes open, (2) report exit status, and (3) allow containerd to restart without killing containers.
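An OCI bundle is nothing but a rootfs/ directory plus a config.json beside it. A hand-written minimal sketch (the fields follow the OCI Runtime Spec; paths here are illustrative, and in practice runc spec generates a fuller default config.json):

```shell
mkdir -p /tmp/bundle/rootfs
cat > /tmp/bundle/config.json <<'EOF'
{
  "ociVersion": "1.0.2",
  "process": { "terminal": false, "user": { "uid": 0, "gid": 0 },
               "args": ["sh"], "cwd": "/" },
  "root": { "path": "rootfs", "readonly": true },
  "linux": { "namespaces": [
    { "type": "pid" }, { "type": "network" }, { "type": "ipc" },
    { "type": "uts" }, { "type": "mount" }
  ] }
}
EOF
# With runc installed and rootfs populated (e.g. from docker export),
# this starts with no Docker involved:
#   cd /tmp/bundle && runc run demo
grep -o '"type"' /tmp/bundle/config.json | wc -l   # 5 namespace entries
```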


Step 8: Capstone — Trace a docker run from syscall to process

Scenario: A junior engineer asks: "What EXACTLY happens when I run docker run nginx?" Trace the complete path.

📸 Verified Output:
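The trace can be verified stage by stage on a live host — a hedged walkthrough assuming the stock Ubuntu packages from Step 1 (dockerd's containerd namespace is "moby"):

```shell
# 1. docker CLI → dockerd: a REST call over the unix socket
curl -s --unix-socket /var/run/docker.sock http://localhost/version

# 2. dockerd → containerd: the container appears as a task
docker run -d --name trace-demo nginx
ctr --namespace moby tasks ls

# 3. containerd → shim → runc: runc has already exited; the shim is now
#    the parent (PPID) of the container's init process
init=$(docker inspect --format '{{.State.Pid}}' trace-demo)
ps -o pid,ppid,comm -p "$init"

docker rm -f trace-demo >/dev/null
```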


Summary

Component          Technology                        Purpose
Process isolation  PID + UTS + IPC namespaces        Container has own PID tree, hostname, IPC
Filesystem         overlay2 (OverlayFS)              Image layers + CoW writable layer
Networking         veth pair + bridge (docker0)      Virtual network cable to bridge
Resource limits    cgroup v2 (memory.max, cpu.max)   Enforce --memory, --cpus
Low-level runtime  runc (OCI)                        clone() syscalls, pivot_root, exec
Mid-level runtime  containerd + shim                 Lifecycle, images, snapshots
High-level API     dockerd                           REST API, image registry, UX
Image format       OCI Image Spec                    Manifest + config + layers (tar.gz)
Runtime spec       OCI Runtime Spec (config.json)    Defines namespaces, mounts, caps

Key insight: docker run = pull image → unpack layers → generate config.json → call containerd → containerd calls runc → runc calls clone() + execve(). Everything else is plumbing.
