Lab 14: Logstash & Filebeat — Log Pipeline

Time: 45 minutes | Level: Architect | Docker: docker run -it --rm ubuntu:22.04 bash

Overview

Filebeat is a lightweight log shipper that tails files and forwards events to Logstash or Elasticsearch. Logstash is a data processing pipeline: it ingests from inputs, transforms via filters (grok, mutate, date, dissect), and ships to outputs. Together they form the Beats → Logstash → Elasticsearch pipeline: Logstash is the "L" in ELK, and Filebeat belongs to the Beats family (the "B" in the broader Elastic Stack).

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                  Log Shipping Pipeline                            │
│                                                                  │
│  Application Servers:                                            │
│  ┌─────────────────────────────────────────────┐                │
│  │  /var/log/nginx/access.log ──► Filebeat      │                │
│  │  /var/log/app/app.log      ──► (port 5044)   │                │
│  │  /var/log/syslog           ──►               │                │
│  └─────────────────────────┬───────────────────┘                │
│                             │  Beats protocol (TLS)              │
│                             ▼                                    │
│  ┌──────────────────────────────────────────────┐               │
│  │           Logstash :5044 (Beats input)        │               │
│  │  ┌──────────────────────────────────────┐    │               │
│  │  │  FILTER pipeline:                    │    │               │
│  │  │  1. grok  → parse raw log line       │    │               │
│  │  │  2. date  → parse timestamp field    │    │               │
│  │  │  3. mutate → rename/add/remove fields│    │               │
│  │  │  4. geoip → enrich IP with location  │    │               │
│  │  └──────────────────────────────────────┘    │               │
│  │  Output: Elasticsearch :9200                  │               │
│  └──────────────────────────────────────────────┘               │
│                             │                                    │
│                             ▼                                    │
│  ┌──────────────────────────────────────────────┐               │
│  │     Elasticsearch :9200 (index: logs-*)       │               │
│  └──────────────────────────────────────────────┘               │
└──────────────────────────────────────────────────────────────────┘

Step 1: Verify Filebeat and Logstash Package Availability

📸 Verified Output:

💡 Logstash runs on the JVM and requires a recent JDK (17+ for current releases). The Elastic apt package bundles a JDK, so no separate Java installation is needed. Filebeat is written in Go: a single static binary of roughly 50 MB with no runtime dependencies.
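For reference, package availability comes from Elastic's apt repository. A minimal sketch of the repository definition (repo URL per Elastic's packaging conventions; the keyring path is a common choice, not mandated):

```
# /etc/apt/sources.list.d/elastic-8.x.list
deb [signed-by=/usr/share/keyrings/elastic.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main
```

With this file in place (and the Elastic GPG key dearmored into the keyring path), `apt-get update` followed by `apt-cache policy filebeat logstash` should list candidate versions for both packages.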


Step 2: Configure filebeat.yml

📸 Verified Output:

💡 fields_under_root: true places custom fields at the top level of the event (e.g., log_type: nginx_access) instead of under a fields: namespace. Use json.keys_under_root: true for apps that emit structured JSON logs — this merges JSON fields directly into the event.
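A minimal filebeat.yml sketch illustrating these options; paths and the Logstash host are placeholders, and the classic `log` input type is shown because `json.keys_under_root` applies to it (newer Filebeat releases use the `filestream` input with `parsers` instead):

```yaml
filebeat.inputs:
  - type: log                      # "filestream" in newer Filebeat releases
    paths:
      - /var/log/nginx/access.log
    fields:
      log_type: nginx_access       # top-level field thanks to the next line
    fields_under_root: true

  - type: log
    paths:
      - /var/log/app/app.log
    json.keys_under_root: true     # merge JSON fields directly into the event
    json.add_error_key: true       # tag events whose JSON fails to parse

output.logstash:
  hosts: ["logstash.example.internal:5044"]   # placeholder hostname
```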


Step 3: Logstash Pipeline — Input and Filter

📸 Verified Output:

💡 Use tag_on_failure => ['_grokparsefailure_nginx'] to mark events that fail grok parsing instead of dropping them. Route events with _grokparsefailure tag to a parse-failures index for investigation. Always test grok patterns with logstash -t (config test) before deploying.
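A sketch of the corresponding pipeline config (hostnames and index names are illustrative):

```
# /etc/logstash/conf.d/nginx.conf (sketch)
input {
  beats { port => 5044 }
}

filter {
  grok {
    match          => { "message" => "%{COMBINEDAPACHELOG}" }
    tag_on_failure => ["_grokparsefailure_nginx"]
  }
  date {
    # copy the parsed nginx timestamp into @timestamp
    match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
  }
}

output {
  if "_grokparsefailure_nginx" in [tags] {
    elasticsearch { hosts => ["http://localhost:9200"] index => "parse-failures-%{+yyyy.MM.dd}" }
  } else {
    elasticsearch { hosts => ["http://localhost:9200"] index => "logs-%{[log_type]}-%{+yyyy.MM.dd}" }
  }
}
```

Run `logstash -f /etc/logstash/conf.d/nginx.conf -t` to syntax-check this file before deploying.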


Step 4: Real Log Parsing Verification (Python grok simulation)

📸 Verified Output:

💡 This Python regex simulates %{COMBINEDAPACHELOG} grok parsing. The auth field shows - (anonymous) for Log 2; note that grok captures this as the literal string "-", so filter with if [auth] != "-" to check for authenticated requests. Real Logstash grok runs on Joni, a Java port of the Oniguruma regex engine, so pattern behavior can differ slightly from Python's re module.
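A self-contained version of the simulation. The regex is a simplified Python analogue of %{COMBINEDAPACHELOG} (the real grok definition is more permissive about quoting and optional fields), and the sample line is illustrative:

```python
import re

# Simplified Python analogue of the %{COMBINEDAPACHELOG} grok pattern.
COMBINED = re.compile(
    r'(?P<clientip>\S+) \S+ (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('10.0.1.2 - - [10/Oct/2024:13:55:36 +0000] '
        '"GET /index.html HTTP/1.1" 200 2326 "-" "curl/8.4.0"')

fields = COMBINED.match(line).groupdict()
print(fields["clientip"], fields["auth"], fields["response"])
# → 10.0.1.2 - 200
# auth is the literal string "-" for anonymous requests, just as grok captures it
assert fields["auth"] == "-"
```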


Step 5: Grok Pattern Reference

📸 Verified Output:

💡 Dissect vs Grok: Use dissect when the log format has fixed delimiters (string splitting, no regex — 10-100x faster). Use grok for variable-format logs requiring regex. For high-throughput pipelines (>10K EPS), use dissect on fixed fields + grok only on variable parts.
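The performance difference can be sketched in Python: dissect-style parsing is plain string splitting on fixed delimiters, while grok-style parsing is a regex match with named captures (the log line and field layout below are illustrative):

```python
import re

line = "2024-10-10T13:55:36Z INFO payment-svc request completed in 42ms"

# Dissect-style: fixed delimiters, no regex, just positional splitting.
ts, level, service, msg = line.split(" ", 3)

# Grok-style: regex with named captures; handles variable formats, but
# pays the cost of backtracking-capable matching on every event.
m = re.match(r"(?P<ts>\S+) (?P<level>\w+) (?P<service>\S+) (?P<msg>.+)", line)

# Both approaches extract the same fields from this fixed-format line.
assert (ts, level, service, msg) == (m["ts"], m["level"], m["service"], m["msg"])
print(level, service)
# → INFO payment-svc
```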


Step 6: Logstash Configuration Files

📸 Verified Output:

💡 Enable the Dead Letter Queue (DLQ) to capture events that Logstash cannot process (serialization errors, mapping conflicts). Use logstash -e "input { dead_letter_queue { path => '/var/lib/logstash/dlq' } } output { stdout { codec => rubydebug } }" to inspect failed events.
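A sketch of the settings involved (paths follow the Debian-package layout; worker counts are illustrative):

```yaml
# /etc/logstash/logstash.yml
queue.type: persisted                    # disk-backed queue survives restarts
queue.max_bytes: 1gb
dead_letter_queue.enable: true
path.dead_letter_queue: /var/lib/logstash/dlq
```

```yaml
# /etc/logstash/pipelines.yml
- pipeline.id: nginx
  path.config: "/etc/logstash/conf.d/nginx.conf"
  pipeline.workers: 4
- pipeline.id: app
  path.config: "/etc/logstash/conf.d/app.conf"
  pipeline.workers: 2
```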


Step 7: Filebeat Modules

📸 Verified Output:

💡 filebeat setup creates Elasticsearch index templates, ILM policies, and Kibana dashboards in one command. When Filebeat ships directly to Elasticsearch, modules parse logs with Elasticsearch Ingest Node pipelines instead of Logstash: simpler for basic log parsing, but less flexible than a full Logstash filter chain.
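A sketch of the nginx module config that `filebeat modules enable nginx` activates; the explicit paths are optional overrides (by default the module auto-detects per-OS log locations):

```yaml
# /etc/filebeat/modules.d/nginx.yml
- module: nginx
  access:
    enabled: true
    var.paths: ["/var/log/nginx/access.log*"]
  error:
    enabled: true
    var.paths: ["/var/log/nginx/error.log*"]
```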


Step 8: Capstone — Production Pipeline Validation

Scenario: Validate the full Filebeat → Logstash → Elasticsearch pipeline configuration for a production nginx fleet. Verify all config files are syntactically correct and the parsing logic handles real log lines.

📸 Verified Output:

Lines parsed: 3/3
Log 1: clientip=10.0.1.1 auth=user1
Log 2: clientip=10.0.1.2 auth=user2
Log 3: clientip=10.0.1.3 auth=user3

Production readiness checklist:
[x] filebeat.yml: inputs with log_type field, processors, output.logstash
[x] logstash pipeline: input(beats:5044), filter(grok+date+mutate), output(elasticsearch)
[x] logstash.yml: queue.type=persisted, dead_letter_queue enabled
[x] pipelines.yml: multiple pipelines with dedicated workers
[x] grok pattern: %{COMBINEDAPACHELOG} for nginx, custom for app logs
[x] date filter: parse timestamp to @timestamp (required for time-based indices)
[x] mutate rename: normalize field names to ECS (Elastic Common Schema)
[x] index pattern: logs-{log_type}-{+yyyy.MM.dd} for ILM compatibility
[x] filebeat modules: nginx and system modules enabled
[x] filebeat setup: templates and dashboards provisioned
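As one more sanity check on the index naming, a small Python sketch of how Logstash's sprintf reference `logs-%{[log_type]}-%{+yyyy.MM.dd}` expands (the event value and timestamp are illustrative; real Logstash reads them from the event and @timestamp):

```python
from datetime import datetime, timezone

def expand_index(log_type: str, ts: datetime) -> str:
    # Mimics Logstash sprintf: %{[log_type]} comes from the event,
    # %{+yyyy.MM.dd} is Joda-style date math on @timestamp (UTC);
    # Python's equivalent format is %Y.%m.%d.
    return f"logs-{log_type}-{ts:%Y.%m.%d}"

ts = datetime(2024, 10, 10, 13, 55, 36, tzinfo=timezone.utc)
print(expand_index("nginx_access", ts))
# → logs-nginx_access-2024.10.10
```

Daily date-based suffixes like this keep indices small enough for ILM to roll over and delete on schedule.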
