Lab 19: Elasticsearch Search

Time: 45 minutes | Level: Advanced | DB: Elasticsearch 8.11

Overview

Elasticsearch is a distributed search and analytics engine built on Apache Lucene. It excels at full-text search, aggregations, and log analytics. This lab covers index creation, document indexing, the Query DSL, aggregations, and analyzers.


Step 1: Launch Elasticsearch

docker run -d --name es-lab \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" \
  -p 9200:9200 \
  elasticsearch:8.11.0

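echo "Waiting for Elasticsearch (30-60 seconds)..."
ready=no
for i in $(seq 1 30); do
  # The health endpoint returns a JSON "status" field once the node is up
  if curl -s http://localhost:9200/_cluster/health 2>/dev/null | grep -q '"status"'; then
    ready=yes
    break
  fi
  sleep 3
done

# Report a timeout instead of claiming success unconditionally
if [ "$ready" = yes ]; then echo "Elasticsearch ready!"; else echo "Elasticsearch did not respond in time" >&2; fi
curl -s http://localhost:9200/_cluster/health | python3 -m json.tool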



Step 2: Create Index with Explicit Mapping


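💡 Explicit mapping is preferred over dynamic mapping. Without it, Elasticsearch infers each field's type from the first value it sees: a string value such as "price": "9.99" is mapped as text (with a keyword sub-field) rather than float, because numeric detection is disabled by default.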
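
A minimal sketch of what an explicit mapping for this step could look like. Only the index name `products` and the `price` field appear in this lab; the other field names (`name`, `category`, `in_stock`, `created_at`), the shard settings, and the `create_index` helper are assumptions made for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # port published by the Step 1 container

# Hypothetical schema: every field is typed up front, so nothing is guessed.
PRODUCTS_INDEX = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "properties": {
            "name": {"type": "text"},         # analyzed for full-text search
            "category": {"type": "keyword"},  # exact match and aggregations
            "price": {"type": "float"},
            "in_stock": {"type": "boolean"},
            "created_at": {"type": "date"},
        }
    },
}


def create_index(name, body):
    """PUT the index definition and return Elasticsearch's decoded reply."""
    req = urllib.request.Request(
        f"{ES_URL}/{name}",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Nothing is sent on import. Against the lab container, call:
#   create_index("products", PRODUCTS_INDEX)
```

If the index does not exist yet, the reply should include "acknowledged": true. Calling it a second time raises an HTTP error because the index already exists.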


Step 3: Index Documents
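
One common way to load several documents at once is the `_bulk` endpoint, which expects newline-delimited JSON: an action line followed by a source line for each document. The sketch below assumes the hypothetical `products` fields `name`, `category`, `price`, `in_stock`, and `created_at`; the sample documents and helper names are invented for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # port published by the Step 1 container

# Hypothetical sample data; only the "price" field is named in this lab.
DOCS = [
    {"name": "Wireless Mouse", "category": "electronics", "price": 24.99,
     "in_stock": True, "created_at": "2024-01-15"},
    {"name": "Mechanical Keyboard", "category": "electronics", "price": 89.0,
     "in_stock": True, "created_at": "2024-02-03"},
    {"name": "Standing Desk", "category": "furniture", "price": 349.5,
     "in_stock": False, "created_at": "2024-02-20"},
]


def to_bulk_ndjson(docs):
    """Build a _bulk payload: one action line plus one source line per doc."""
    lines = []
    for doc_id, doc in enumerate(docs, start=1):
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def bulk_index(index, docs):
    """POST the payload; refresh=true makes the docs searchable right away."""
    req = urllib.request.Request(
        f"{ES_URL}/{index}/_bulk?refresh=true",
        data=to_bulk_ndjson(docs).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/x-ndjson"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Nothing is sent on import. Against the lab container, call:
#   bulk_index("products", DOCS)
```

The reply carries an "errors" flag and one entry per action under "items", so it is worth checking that "errors" is false before moving on.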



Step 4: Query DSL — match, term, range
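
The three query types named in this step differ mainly in whether the query string is analyzed. A hedged sketch follows, assuming a `products` index with a text field `name`, a keyword field `category`, and a numeric `price`; these field names, the search values, and the helpers are illustrative assumptions.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # port published by the Step 1 container

# match: the query string is analyzed; by default any one term may match (OR).
MATCH_QUERY = {"query": {"match": {"name": "wireless mouse"}}}

# term: no analysis, so it suits keyword fields and exact values.
TERM_QUERY = {"query": {"term": {"category": "electronics"}}}

# range: inclusive bounds with gte / lte (gt / lt are the exclusive forms).
RANGE_QUERY = {"query": {"range": {"price": {"gte": 10, "lte": 50}}}}


def search(index, body):
    """POST a body to /<index>/_search and return the decoded reply."""
    req = urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def hit_names(response):
    """Return (score, name) pairs from a _search response."""
    return [(h["_score"], h["_source"]["name"])
            for h in response["hits"]["hits"]]


# Nothing is sent on import. Against the lab container, call for example:
#   hit_names(search("products", MATCH_QUERY))
```

Running a term query against a text field is a frequent source of empty results, because the indexed tokens are lowercased while the term value is compared as-is.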



Step 5: Boolean Queries — must, should, filter


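💡 must = AND (affects score), filter = AND (no score, cacheable), should = OR (optional, boosts score when it matches), must_not = NOT (no score). When a bool query has no must or filter clause, at least one should clause has to match. Use filter for exact conditions (dates, booleans, numbers): it skips scoring, and Elasticsearch can cache the result.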
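
The four clause types can be combined in a single bool query. The sketch below is one possible arrangement, assuming the hypothetical `products` fields `name`, `category`, `price`, and `in_stock`; the search values and the `search` helper are invented for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # port published by the Step 1 container

BOOL_QUERY = {
    "query": {
        "bool": {
            # Scored, required: full-text relevance belongs here.
            "must": [{"match": {"name": "keyboard"}}],
            # Not scored, required: exact yes/no conditions.
            "filter": [
                {"term": {"in_stock": True}},
                {"range": {"price": {"lte": 100}}},
            ],
            # Scored, optional: matching documents rank higher.
            "should": [{"term": {"category": "electronics"}}],
            # Not scored: matching documents are excluded.
            "must_not": [{"term": {"category": "clearance"}}],
        }
    }
}


def search(index, body):
    """POST a body to /<index>/_search and return the decoded reply."""
    req = urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Nothing is sent on import. Against the lab container, call:
#   search("products", BOOL_QUERY)
```

Moving the price and stock conditions from filter to must would return the same documents but spend effort scoring clauses that cannot change the ranking in a meaningful way.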


Step 6: Aggregations
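
As an illustration of the terms, avg, and histogram aggregations, the sketch below groups documents by category, nests an average price inside each group, and buckets prices in steps of 25. The field names `category` and `price`, the aggregation names, and both helpers are assumptions rather than part of the lab.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # port published by the Step 1 container

AGGS_BODY = {
    "size": 0,  # return aggregation results only, no search hits
    "aggs": {
        "by_category": {
            "terms": {"field": "category"},  # needs a keyword field
            "aggs": {"avg_price": {"avg": {"field": "price"}}},
        },
        "price_histogram": {"histogram": {"field": "price", "interval": 25}},
    },
}


def search(index, body):
    """POST a body to /<index>/_search and return the decoded reply."""
    req = urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def category_summary(response):
    """Return {category: (doc_count, avg_price)} from the terms buckets."""
    buckets = response["aggregations"]["by_category"]["buckets"]
    return {b["key"]: (b["doc_count"], b["avg_price"]["value"])
            for b in buckets}


# Nothing is sent on import. Against the lab container, call:
#   category_summary(search("products", AGGS_BODY))
```

Sub-aggregations such as `avg_price` are computed per parent bucket, which is why the average appears inside each category entry rather than once for the whole index.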



Step 7: _explain and Analyzers
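
Both endpoints in this step take a small JSON body: `_analyze` needs an analyzer and some text, and `_explain` needs a query plus a document ID in the path. The sketch below assumes a `products` document with ID 1 and a text field `name`; those names, the sample text, and the helper functions are hypothetical.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # port published by the Step 1 container

# POST /_analyze: shows the tokens an analyzer produces for a string.
ANALYZE_BODY = {"analyzer": "standard", "text": "Wireless Bluetooth Mouse"}

# POST /products/_explain/1: explains how document 1 scores for this query.
EXPLAIN_BODY = {"query": {"match": {"name": "mouse"}}}


def es_post(path, body):
    """POST a JSON body to the given path and return the decoded reply."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def format_explanation(node, depth=0):
    """Flatten an _explain tree into indented 'value  description' lines."""
    lines = [f"{'  ' * depth}{node['value']:.3f}  {node['description']}"]
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1))
    return lines


# Nothing is sent on import. Against the lab container, call for example:
#   [t["token"] for t in es_post("/_analyze", ANALYZE_BODY)["tokens"]]
#   reply = es_post("/products/_explain/1", EXPLAIN_BODY)
#   print("\n".join(format_explanation(reply["explanation"])))
```

With the standard analyzer the sample text should come back as lowercase tokens, which is the form a match query on `name` is compared against.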



Step 8: Capstone — Date Histogram Aggregation
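
A date histogram groups documents into calendar buckets, and nesting a terms aggregation inside it gives a per-period breakdown. The sketch below assumes a date field `created_at` plus the hypothetical `category` and `price` fields; the aggregation names and helpers are illustrative. Elasticsearch 8 expects `calendar_interval` or `fixed_interval` here, since the older `interval` parameter was removed.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # port published by the Step 1 container

MONTHLY_BODY = {
    "size": 0,  # aggregation results only
    "aggs": {
        "per_month": {
            "date_histogram": {
                "field": "created_at",
                "calendar_interval": "month",
                "format": "yyyy-MM",  # controls key_as_string in the reply
            },
            "aggs": {
                "by_category": {"terms": {"field": "category"}},
                "avg_price": {"avg": {"field": "price"}},
            },
        }
    },
}


def search(index, body):
    """POST a body to /<index>/_search and return the decoded reply."""
    req = urllib.request.Request(
        f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def monthly_rows(response):
    """Return (month, doc_count, avg_price, {category: count}) per bucket."""
    rows = []
    for bucket in response["aggregations"]["per_month"]["buckets"]:
        categories = {c["key"]: c["doc_count"]
                      for c in bucket["by_category"]["buckets"]}
        rows.append((bucket["key_as_string"], bucket["doc_count"],
                     bucket["avg_price"]["value"], categories))
    return rows


# Nothing is sent on import. Against the lab container, call:
#   monthly_rows(search("products", MONTHLY_BODY))
```

Months with no documents between the first and last bucket are still returned with a count of zero, and their average is null, so downstream code should tolerate a missing value there.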



Summary

| Feature | ES Concept | Example |
|---|---|---|
| Index | Collection of documents | PUT /products |
| Mapping | Schema definition | mappings.properties |
| text | Full-text searchable | Analyzed, tokenized |
| keyword | Exact match | term, terms query |
| match | Full-text search | Analyzes query string |
| term | Exact value match | No analysis |
| range | Numeric/date ranges | gte, lte |
| bool | Combine queries | must, filter, should, must_not |
| aggs | Aggregation framework | terms, avg, histogram |
| _explain | Why document matched | Score breakdown |
| _analyze | Test analyzers | Token inspection |

Key Takeaways

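  • filter context is faster than must for exact conditions: it skips scoring, and frequently used filters can be cached

  • text vs keyword: text is analyzed for full-text search; keyword is stored as-is for exact matches, sorting, and aggregations

  • Analyzers determine how text is tokenized at index and search time; a custom analyzer is often worth defining for product or content search

  • _explain shows the scoring breakdown for a single document, which makes it essential for debugging relevance issues

  • date_histogram combined with terms aggregations gives time-bucketed analytics directly in the search engine, without a separate ETL step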
