Lab 13: Elasticsearch & Kibana — Log Indexing

Time: 45 minutes | Level: Architect | Docker: docker run -it --rm ubuntu:22.04 bash

Overview

Elasticsearch is a distributed, RESTful search and analytics engine built on Apache Lucene. It stores JSON documents in indices, distributes them across shards, and enables full-text search via analyzers. Kibana is the visualization UI for the Elastic Stack. This lab covers installation, configuration, index mappings, Query DSL, and Index Lifecycle Management (ILM).

Architecture

┌─────────────────────────────────────────────────────────────┐
│                   Elasticsearch Cluster                      │
│                                                             │
│  ┌─────────────────────────────────────────────────────┐   │
│  │  Index: logs-2024.01.01                             │   │
│  │  ┌──────────────┐  ┌──────────────┐  ┌──────────┐  │   │
│  │  │  Primary     │  │  Primary     │  │ Primary  │  │   │
│  │  │  Shard 0     │  │  Shard 1     │  │ Shard 2  │  │   │
│  │  └──────┬───────┘  └──────┬───────┘  └────┬─────┘  │   │
│  │         │                 │               │         │   │
│  │  ┌──────▼───────┐  ┌──────▼───────┐  ┌────▼─────┐  │   │
│  │  │  Replica     │  │  Replica     │  │ Replica  │  │   │
│  │  │  Shard 0     │  │  Shard 1     │  │ Shard 2  │  │   │
│  │  └──────────────┘  └──────────────┘  └──────────┘  │   │
│  └─────────────────────────────────────────────────────┘   │
│                                                             │
│  Node 1 (master+data)  Node 2 (data)  Node 3 (data)        │
│                                                             │
│  Client → Elasticsearch REST API :9200                      │
│  Kibana → Elasticsearch :9200 → Browser :5601              │
└─────────────────────────────────────────────────────────────┘
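
The diagram spreads one index over three primary shards. As a rough illustration of how a document lands on a shard, the sketch below hashes a routing key and takes it modulo the primary shard count. It is a simplification: `zlib.crc32` is a stand-in for the hash Elasticsearch uses internally, and `pick_shard` and the document IDs are hypothetical names chosen for this example.

```python
import zlib


def pick_shard(routing_key: str, num_primary_shards: int) -> int:
    """Toy routing: hash the routing key, then reduce it modulo the shard count."""
    # Assumption: crc32 is only a stand-in; Elasticsearch uses its own hash function.
    return zlib.crc32(routing_key.encode("utf-8")) % num_primary_shards


# Hypothetical document IDs, routed across the 3 primaries from the diagram.
for doc_id in ["order-1001", "order-1002", "order-1003", "order-1004"]:
    print(doc_id, "-> shard", pick_shard(doc_id, 3))
```

Because the result depends on the shard count, changing the number of primaries would send existing IDs to different shards. That is one way to see why the primary shard count is fixed when the index is created.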

Step 1: Add Elasticsearch Repository and Check Package

📸 Verified Output:

💡 Elasticsearch 8.x enables security by default (TLS + basic auth). For development/testing, add xpack.security.enabled: false to elasticsearch.yml. Never disable security in production — use the auto-generated elastic user password from first startup.
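
With security left on, every request needs credentials and a trusted CA. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the CA path in the usage note is an assumption based on a package install; adjust it to wherever your installation wrote the certificate.

```python
import base64
import ssl
import urllib.request


def build_es_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying HTTP basic auth for the given user."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Basic {token}")
    return req


def build_tls_context(ca_cert_path: str) -> ssl.SSLContext:
    """Trust the cluster's CA certificate instead of disabling verification."""
    return ssl.create_default_context(cafile=ca_cert_path)
```

To send it, something like `urllib.request.urlopen(build_es_request("https://localhost:9200/", "elastic", password), context=build_tls_context("/etc/elasticsearch/certs/http_ca.crt"))` should work, where `password` is the value printed at first startup.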


Step 2: Configure elasticsearch.yml

📸 Verified Output:

💡 JVM heap is the most critical tuning parameter. Set -Xms and -Xmx to the same value to prevent heap resizing. Elasticsearch is memory-hungry: leave 50% of RAM for the OS page cache (Lucene uses OS page cache heavily for segment reads).
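
The heap rule can be written as simple arithmetic. This sketch assumes the guideline of at most half of RAM combined with the roughly 31 GB compressed-OOPs ceiling listed in the Summary; the exact ceiling varies by JVM, and the function names are hypothetical.

```python
from typing import List


def recommended_heap_mb(total_ram_mb: int, cap_mb: int = 31 * 1024) -> int:
    """Half of RAM, but never above the (assumed) compressed-OOPs ceiling."""
    return min(total_ram_mb // 2, cap_mb)


def jvm_options_lines(total_ram_mb: int) -> List[str]:
    """Emit matching -Xms/-Xmx flags so the heap is never resized at runtime."""
    heap = recommended_heap_mb(total_ram_mb)
    return [f"-Xms{heap}m", f"-Xmx{heap}m"]


for ram_gb in (8, 16, 128):
    print(ram_gb, "GB RAM ->", " ".join(jvm_options_lines(ram_gb * 1024)))
```

The two flags would typically go in a file under `jvm.options.d/` in the Elasticsearch config directory rather than in `elasticsearch.yml`.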


Step 3: _cat APIs for Cluster Inspection

📸 Verified Output:

💡 Cluster status: green = all primary + replica shards assigned; yellow = primaries assigned, some replicas missing (single-node is always yellow unless number_of_replicas: 0); red = unassigned primaries — data may be unavailable. Monitor with GET /_cluster/health.
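
The colour rules above reduce to two checks. The sketch below encodes them and shows why a single node reports yellow as soon as any index asks for replicas. Function names are hypothetical, and the logic is a simplified model of the rule in the text, not Elasticsearch's own code.

```python
def classify_status(unassigned_primaries: int, unassigned_replicas: int) -> str:
    """Map unassigned shard counts to the cluster health colour."""
    if unassigned_primaries > 0:
        return "red"      # some data cannot be served at all
    if unassigned_replicas > 0:
        return "yellow"   # data is available but not fully redundant
    return "green"


def single_node_status(primaries: int, number_of_replicas: int) -> str:
    """On one node, no replica can be assigned: it may not share a node with its primary."""
    return classify_status(0, primaries * number_of_replicas)


print(single_node_status(primaries=3, number_of_replicas=1))
print(single_node_status(primaries=3, number_of_replicas=0))
```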


Step 4: Index Mappings

📸 Verified Output:

💡 Use "dynamic": "strict" in production to reject documents with unmapped fields — this prevents mapping explosions (ES has a default limit of 1000 fields per index). Use "dynamic": "true" only for exploration. Add "fields": {"keyword": {...}} to text fields you need to both search and aggregate.
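
As an illustration of the tip, here is one possible request body for creating a log index with a strict mapping and a text field that also carries a keyword sub-field. The field names (`level`, `service`, `message`) and the helper functions are assumptions made for this sketch, not part of the lab's original mapping.

```python
import json
from typing import Dict, List


def build_log_mapping() -> Dict:
    """Body for PUT /<index>: strict mapping with a text + keyword multi-field."""
    return {
        "mappings": {
            "dynamic": "strict",  # reject documents containing unmapped fields
            "properties": {
                "@timestamp": {"type": "date"},
                "level": {"type": "keyword"},
                "service": {"type": "keyword"},
                "message": {
                    "type": "text",  # analyzed for full-text search
                    "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
                },
            },
        }
    }


def unmapped_fields(doc: Dict, body: Dict) -> List[str]:
    """Top-level fields that a strict mapping would reject (local pre-check only)."""
    props = body["mappings"]["properties"]
    return sorted(name for name in doc if name not in props)


print(json.dumps(build_log_mapping(), indent=2))
```

Searching uses `message`, while aggregations and sorting would use `message.keyword`. The `unmapped_fields` helper only inspects top-level keys; Elasticsearch itself performs the real check at index time.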


Step 5: Query DSL

📸 Verified Output:

💡 Use filter context (inside bool.filter) instead of must for conditions that don't affect scoring (date ranges, exact matches). Filter clauses skip score computation, and Elasticsearch can cache frequently used filters, so repeated queries tend to run faster. must contributes to the relevance _score; use it for full-text search where ranking matters.
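
A small sketch of the must/filter split: the full-text clause goes in `must` so it is scored, while the exact-match and date-range clauses go in `filter`. The field names and the `build_error_search` helper are assumptions for illustration.

```python
import json
from typing import Dict


def build_error_search(text: str, service: str, since: str) -> Dict:
    """Body for POST /<index>/_search combining scored and unscored clauses."""
    return {
        "query": {
            "bool": {
                # Scored: ranking by relevance matters for free-text matching.
                "must": [{"match": {"message": text}}],
                # Unscored: yes/no conditions that only narrow the result set.
                "filter": [
                    {"term": {"service": service}},
                    {"range": {"@timestamp": {"gte": since}}},
                ],
            }
        }
    }


print(json.dumps(build_error_search("payment timeout", "checkout", "now-1h"), indent=2))
```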


Step 6: Index Lifecycle Management (ILM)

📸 Verified Output:

💡 The ILM rollover action requires a data stream or an alias with is_write_index: true. With an alias, rollover creates a new index (for example logs-000002) when its conditions are met and moves the write alias to it. Use GET /<index>/_ilm/explain to check the current ILM phase and any errors.
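
To make the rollover setup concrete, the sketch below builds two request bodies: an ILM policy with a hot-phase rollover and a delete phase, and the bootstrap index that carries the write alias. The thresholds, the alias name `logs`, and the function names are illustrative assumptions, not recommended values.

```python
import json
from typing import Dict


def build_ilm_policy(max_age: str = "7d",
                     max_primary_shard_size: str = "50gb",
                     delete_after: str = "30d") -> Dict:
    """Body for PUT /_ilm/policy/<name>: roll over in hot, then delete."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_age": max_age,
                            "max_primary_shard_size": max_primary_shard_size,
                        }
                    }
                },
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def build_bootstrap_index(alias: str = "logs") -> Dict:
    """Body for PUT /logs-000001: first index, marked as the alias's write index."""
    return {"aliases": {alias: {"is_write_index": True}}}


print(json.dumps(build_ilm_policy(), indent=2))
print(json.dumps(build_bootstrap_index(), indent=2))
```

For the policy to apply, the index also needs the `index.lifecycle.name` and `index.lifecycle.rollover_alias` settings, which are usually supplied through an index template.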


Step 7: Kibana Configuration

📸 Verified Output:

💡 In Kibana 8.x, Data Views replaced Index Patterns. A Data View maps to one or more indices via a wildcard pattern and can designate a time field (typically @timestamp). Create Data Views via Kibana UI → Stack Management → Data Views, or programmatically with the Data Views API or the Saved Objects API.
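
A data view can also be created over HTTP. This sketch targets Kibana's Data Views API (`POST /api/data_views/data_view`) rather than the Saved Objects API named above, and only builds the request without sending it. The Kibana URL, the `logs-*` pattern, and the helper name are assumptions; authentication headers are omitted and would be needed on a secured stack.

```python
import json
import urllib.request


def build_data_view_request(kibana_url: str, title: str,
                            time_field: str = "@timestamp") -> urllib.request.Request:
    """Build (but do not send) a request that creates a data view."""
    body = {"data_view": {"title": title, "timeFieldName": time_field}}
    req = urllib.request.Request(
        kibana_url.rstrip("/") + "/api/data_views/data_view",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
    )
    req.add_header("Content-Type", "application/json")
    req.add_header("kbn-xsrf", "true")  # Kibana rejects API writes without this header
    return req


request = build_data_view_request("http://localhost:5601", "logs-*")
print(request.get_method(), request.full_url)
```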


Step 8: Capstone — Production Log Indexing Setup

Scenario: Configure a 3-node Elasticsearch cluster to index application logs with a proper mapping, ILM policy, and Kibana data view for a production e-commerce platform.

📸 Verified Output:

💡 The most common production mistake: not setting vm.max_map_count=262144 on the host. Elasticsearch requires this for memory-mapped files (Lucene segments). Add to /etc/sysctl.conf: vm.max_map_count=262144 and run sysctl -p. Docker users: set on the host, not inside the container.
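
A pre-flight check for this setting could look like the sketch below. It reads the value from `/proc/sys/vm/max_map_count`, which is where Linux exposes it, and compares it with the 262144 threshold from the tip. The function names are hypothetical.

```python
from pathlib import Path

REQUIRED_MAX_MAP_COUNT = 262144  # threshold quoted in the tip above


def parse_max_map_count(raw: str) -> int:
    """Parse the contents of /proc/sys/vm/max_map_count."""
    return int(raw.strip())


def is_sufficient(value: int, required: int = REQUIRED_MAX_MAP_COUNT) -> bool:
    """True when the kernel limit meets the required minimum."""
    return value >= required


def read_host_max_map_count(path: str = "/proc/sys/vm/max_map_count") -> int:
    """Read the live value; run this on the Docker host, not inside the container."""
    return parse_max_map_count(Path(path).read_text())
```

Calling `is_sufficient(read_host_max_map_count())` on each node before starting Elasticsearch gives an early warning instead of a failed bootstrap check.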


Summary

| Concept | Key Details |
|---|---|
| Indices | Logical namespace for documents; wildcards (logs-*) span multiple indices |
| Shards | Primaries and replicas both serve reads; replicas add HA; primary count is fixed at index creation, replica count can be changed later |
| Mappings | Field type definitions; dynamic: strict rejects documents with unmapped fields |
| Analyzers | standard: tokenize + lowercase (stop-word filter off by default); applied to text fields only |
| _cat APIs | /_cat/health, /_cat/nodes, /_cat/indices, /_cat/shards |
| Query DSL | match (full-text), term (exact), range, bool (compound) |
| Aggregations | terms, date_histogram, avg, percentiles: analytics on top of search |
| ILM | hot→warm→cold→delete; rollover at size/age thresholds |
| elasticsearch.yml | cluster.name, network.host, discovery.type, path.data/logs |
| kibana.yml | elasticsearch.hosts (list for HA), server.publicBaseUrl |
| ES 8.x latest | Version 8.19.12 (from elastic apt repo as of 2025) |
| JVM tuning | -Xms = -Xmx, at most 50% of RAM; keep below about 31GB for compressed OOPs |
