Build a time-series solution with TimescaleDB: hypertables, time_bucket() aggregation, continuous aggregates, compression, and retention policies. Compare performance vs plain PostgreSQL.
📚 Background
Why TimeSeries Databases?
Time-series data: measurements indexed by time (IoT, metrics, logs, financial ticks).
Challenges with plain PostgreSQL:
Huge write rates (millions of rows/hour)
Queries always filter by time range
Recent data is hot; old data is cold
Aggregations across time windows are common
TimescaleDB solves these with hypertables (automatic time-based partitioning), columnar compression (95%+ ratio), and continuous aggregates.
TimescaleDB vs InfluxDB
Feature
TimescaleDB
InfluxDB
Query language
SQL (PostgreSQL-compatible)
Flux / InfluxQL
Data model
Relational tables
measurement/tags/fields
Joins
Yes (PostgreSQL)
Limited
Open source
Yes (Apache/Timescale)
Yes (core)
Ecosystem
Full PostgreSQL ecosystem
InfluxDB-specific
Best for
SQL-centric teams, relational + timeseries
Pure metrics/IoT
InfluxDB Data Model
Step 1: Start TimescaleDB
📸 Verified Output:
Step 2: Create Hypertable
📸 Verified Output:
Step 3: Insert 1 Million Sensor Rows
📸 Verified Output:
Step 4: time_bucket() Aggregation
📸 Verified Output:
Step 5: Continuous Aggregates
📸 Verified Output:
Step 6: Compression
📸 Verified Output:
💡 95% compression ratio is typical for TimescaleDB with columnar compression. 1 TB of raw time-series data → ~50 GB compressed.
Automatic time partition (e.g., 1 day). Query only scans relevant chunks
time_bucket()
GROUP BY time intervals: time_bucket('1 hour', time)
Continuous aggregate
Auto-refreshing materialized view for time-series aggregations
Compression
compress_chunk() — 95%+ compression; compressed chunks still queryable
Retention policy
Auto-drop old chunks: add_retention_policy(interval)
InfluxDB Line Protocol
measurement,tag=val field=val timestamp
Flux
InfluxDB query language: pipeline of `
Benchmark
TimescaleDB: 5x faster than plain PostgreSQL for time-range queries
💡 Architect's insight: TimescaleDB is the right choice when your team knows SQL. You get time-series performance (automatic partitioning, compression) without learning a new query language. InfluxDB excels for pure DevOps metrics with Grafana.
docker run -d --name tsdb-lab \
-e POSTGRES_PASSWORD=rootpass \
timescale/timescaledb:latest-pg15
sleep 15
docker exec tsdb-lab psql -U postgres -c "SELECT extversion FROM pg_extension WHERE extname='timescaledb'"
extversion
------------
2.13.1
docker exec -i tsdb-lab psql -U postgres << 'SQL'
CREATE DATABASE sensors;
\c sensors
-- Enable TimescaleDB
CREATE EXTENSION IF NOT EXISTS timescaledb;
-- Regular PostgreSQL table first
CREATE TABLE sensor_data (
time TIMESTAMPTZ NOT NULL,
sensor_id TEXT NOT NULL,
location TEXT NOT NULL,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION,
pressure DOUBLE PRECISION,
battery_pct SMALLINT
);
-- Convert to hypertable (partitions automatically by time)
SELECT create_hypertable('sensor_data', 'time',
chunk_time_interval => INTERVAL '1 day'); -- 1 chunk per day
-- Add index for sensor_id queries
CREATE INDEX idx_sensor_data_sensor ON sensor_data(sensor_id, time DESC);
-- Verify hypertable created
SELECT hypertable_name, num_dimensions, num_chunks
FROM timescaledb_information.hypertables;
-- Compare: regular table
CREATE TABLE sensor_data_regular (
time TIMESTAMPTZ NOT NULL,
sensor_id TEXT NOT NULL,
location TEXT,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
CREATE INDEX ON sensor_data_regular(time DESC);
SQL
docker exec -i tsdb-lab psql -U postgres -d sensors << 'SQL'
-- Insert 1M rows using generate_series (TimescaleDB)
\timing on
INSERT INTO sensor_data (time, sensor_id, location, temperature, humidity, pressure, battery_pct)
SELECT
time_bucket('1 minute', ts) + (random() * 60)::INT * '1 second'::INTERVAL AS time,
'sensor-' || (n % 100)::TEXT AS sensor_id,
CASE (n % 5)
WHEN 0 THEN 'warehouse-A'
WHEN 1 THEN 'warehouse-B'
WHEN 2 THEN 'office-floor-1'
WHEN 3 THEN 'office-floor-2'
ELSE 'outdoor'
END AS location,
20.0 + random() * 15 AS temperature, -- 20-35°C
40.0 + random() * 40 AS humidity, -- 40-80%
1010.0 + random() * 20 AS pressure, -- 1010-1030 hPa
(50 + random() * 50)::INT AS battery_pct
FROM
generate_series(1, 1000000) AS n,
generate_series(
NOW() - INTERVAL '10 days',
NOW(),
INTERVAL '0.864 seconds' -- ~1M rows in 10 days
) AS ts
LIMIT 1000000;
\timing off
SELECT COUNT(*) AS total_rows FROM sensor_data;
SELECT COUNT(*) AS num_chunks FROM timescaledb_information.chunks
WHERE hypertable_name = 'sensor_data';
SQL
docker exec -i tsdb-lab psql -U postgres -d sensors << 'SQL'
-- time_bucket: TimescaleDB's time-series GROUP BY
-- Q1: Hourly temperature averages for last 24 hours
\timing on
SELECT
time_bucket('1 hour', time) AS hour,
sensor_id,
ROUND(AVG(temperature)::NUMERIC, 2) AS avg_temp,
ROUND(MIN(temperature)::NUMERIC, 2) AS min_temp,
ROUND(MAX(temperature)::NUMERIC, 2) AS max_temp,
COUNT(*) AS readings
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
AND sensor_id = 'sensor-0'
GROUP BY hour, sensor_id
ORDER BY hour DESC
LIMIT 10;
\timing off
-- Q2: 15-minute buckets with multiple aggregates
SELECT
time_bucket('15 minutes', time) AS bucket,
location,
ROUND(AVG(temperature)::NUMERIC, 2) AS avg_temp,
ROUND(AVG(humidity)::NUMERIC, 1) AS avg_humidity,
COUNT(*) AS count
FROM sensor_data
WHERE time > NOW() - INTERVAL '2 hours'
GROUP BY bucket, location
ORDER BY bucket DESC, location
LIMIT 20;
SQL
docker exec -i tsdb-lab psql -U postgres -d sensors << 'SQL'
-- Continuous aggregate: auto-updated materialized view for time-series
CREATE MATERIALIZED VIEW hourly_sensor_stats
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS bucket,
sensor_id,
location,
AVG(temperature) AS avg_temp,
MIN(temperature) AS min_temp,
MAX(temperature) AS max_temp,
AVG(humidity) AS avg_humidity,
COUNT(*) AS reading_count
FROM sensor_data
GROUP BY bucket, sensor_id, location
WITH NO DATA;
-- Add refresh policy: keep up to date automatically
SELECT add_continuous_aggregate_policy('hourly_sensor_stats',
start_offset => INTERVAL '3 days',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour');
-- Manual refresh for demo
CALL refresh_continuous_aggregate('hourly_sensor_stats',
NOW() - INTERVAL '10 days', NOW());
-- Now query the continuous aggregate (much faster)
\timing on
SELECT bucket, sensor_id,
ROUND(avg_temp::NUMERIC, 2) AS avg_temp, reading_count
FROM hourly_sensor_stats
WHERE bucket > NOW() - INTERVAL '2 days'
AND sensor_id = 'sensor-5'
ORDER BY bucket DESC LIMIT 10;
\timing off
SQL
docker exec -i tsdb-lab psql -U postgres -d sensors << 'SQL'
-- Enable compression on hypertable
ALTER TABLE sensor_data SET (
timescaledb.compress,
timescaledb.compress_orderby = 'time DESC',
timescaledb.compress_segmentby = 'sensor_id'
);
-- Add compression policy: compress chunks older than 7 days
SELECT add_compression_policy('sensor_data', INTERVAL '7 days');
-- Check size before compression
SELECT
pg_size_pretty(before_compression_total_bytes) AS before,
pg_size_pretty(after_compression_total_bytes) AS after,
ROUND(100 - (after_compression_total_bytes::FLOAT /
NULLIF(before_compression_total_bytes::FLOAT, 0) * 100), 1) AS compression_ratio_pct
FROM hypertable_compression_stats('sensor_data')
LIMIT 1;
-- Manually compress older chunks for demo
SELECT compress_chunk(c.schema_name || '.' || c.table_name)
FROM timescaledb_information.chunks c
WHERE c.hypertable_name = 'sensor_data'
AND c.range_end < NOW() - INTERVAL '2 days'
LIMIT 3;
-- Check chunk sizes
SELECT
chunk_name,
range_start::DATE AS day,
pg_size_pretty(before_compression_total_bytes) AS before,
pg_size_pretty(after_compression_total_bytes) AS after,
is_compressed
FROM timescaledb_information.chunks_detailed_size('sensor_data')
ORDER BY range_start DESC LIMIT 5;
SQL
Compression stats:
before | after | compression_ratio_pct
------------+-----------+-----------------------
87 MB | 4 MB | 95.4
Chunk sizes:
chunk_name | day | before | after | is_compressed
----------------------+------------+--------+--------+---------------
_hyper_1_8_chunk | 2024-03-01 | 8 MB | 8 MB | f
_hyper_1_7_chunk | 2024-02-29 | 8 MB | 385 kB | t
_hyper_1_6_chunk | 2024-02-28 | 8 MB | 391 kB | t
docker exec -i tsdb-lab psql -U postgres -d sensors << 'SQL'
-- Data retention: auto-drop chunks older than 90 days
SELECT add_retention_policy('sensor_data', INTERVAL '90 days');
-- Verify policies
SELECT * FROM timescaledb_information.jobs
WHERE hypertable_name = 'sensor_data';
-- Performance benchmark: TimescaleDB vs regular table
-- Insert same data into regular table
INSERT INTO sensor_data_regular (time, sensor_id, location, temperature, humidity)
SELECT time, sensor_id, location, temperature, humidity
FROM sensor_data
LIMIT 100000; -- 100K for comparison
-- Q: last 24h averages by hour
\timing on
-- TimescaleDB hypertable:
EXPLAIN ANALYZE
SELECT time_bucket('1 hour', time), AVG(temperature)
FROM sensor_data
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY 1 ORDER BY 1;
\timing off
\timing on
-- Regular table:
EXPLAIN ANALYZE
SELECT DATE_TRUNC('hour', time), AVG(temperature)
FROM sensor_data_regular
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY 1 ORDER BY 1;
\timing off
SQL
Policies for sensor_data:
compression: runs every 1 hour, compress > 7 days old
retention: runs daily, drop > 90 days old
cont_agg: runs every 1 hour, refresh hourly
TimescaleDB hypertable:
Planning Time: 1.2 ms
Execution Time: 42.8 ms (scans only today's chunks)
Chunks scanned: 1 of 11
Regular table:
Planning Time: 0.9 ms
Execution Time: 215.4 ms (full sequential scan)
Rows examined: 100000
cat > /tmp/influxdb_concepts.py << 'EOF'
"""
InfluxDB concepts: data model, Line Protocol, Flux query language.
"""
print("InfluxDB Data Model")
print("="*55)
print("""
Measurement: Like a table name
Tags: Indexed metadata (string key=value pairs)
Fields: Actual measurements (not indexed)
Timestamp: Nanosecond precision by default
Line Protocol format:
<measurement>[,<tag_key>=<tag_val>...] <field_key>=<field_val>[,...] [<timestamp>]
""")
# Line Protocol examples
line_protocol_examples = [
"cpu_usage,host=server01,region=us-east value=72.5 1709294400000000000",
"memory,host=server01 used_bytes=4294967296,free_bytes=8589934592",
"http_requests,method=GET,path=/api/users status_code=200,latency_ms=45.2",
"temperature,sensor_id=sensor-01,location=warehouse-A temp=22.5,humidity=65.0",
"stock_price,ticker=AAPL,exchange=NASDAQ open=182.50,high=183.20,low=181.90,close=182.80",
]
print("Line Protocol Examples:")
for ex in line_protocol_examples:
parts = ex.split(' ')
measurement_tags = parts[0].split(',')
measurement = measurement_tags[0]
tags = measurement_tags[1:] if len(measurement_tags) > 1 else []
fields = parts[1].split(',')
print(f"\n Measurement: {measurement}")
if tags:
print(f" Tags: {', '.join(tags)}")
print(f" Fields: {', '.join(fields)}")
# Flux query language
print("\n\nFlux Query Language Examples:")
flux_examples = [
("Select last 1 hour of CPU data",
'''from(bucket: "metrics")
|> range(start: -1h)
|> filter(fn: (r) => r._measurement == "cpu_usage" and r.host == "server01")
|> mean()'''),
("Aggregate 5-minute averages",
'''from(bucket: "metrics")
|> range(start: -24h)
|> filter(fn: (r) => r._measurement == "temperature")
|> aggregateWindow(every: 5m, fn: mean, createEmpty: false)'''),
("Alert: temperature > 30°C",
'''from(bucket: "sensors")
|> range(start: -5m)
|> filter(fn: (r) => r._measurement == "temperature")
|> filter(fn: (r) => r._value > 30.0)
|> count()
|> filter(fn: (r) => r._value > 0)'''),
]
for title, query in flux_examples:
print(f"\n [{title}]")
for line in query.split('\n'):
print(f" {line}")
print("\n\nTimescaleDB vs InfluxDB — When to Choose:")
print("-"*55)
choices = [
("TimescaleDB", "Need SQL + joins, existing PostgreSQL, complex analytics"),
("InfluxDB", "Pure metrics/IoT, Grafana-first, Flux power users"),
("Both", "TimescaleDB for SQL analytics; InfluxDB for Grafana dashboards"),
("Prometheus", "Kubernetes/cloud native metrics + alerting"),
("ClickHouse", "Ultra-high write rates (billions/day), real-time analytics"),
]
for ts, use_case in choices:
print(f" {ts:<15}: {use_case}")
EOF
python3 /tmp/influxdb_concepts.py
# Cleanup
docker rm -f tsdb-lab 2>/dev/null
InfluxDB Data Model
Line Protocol format:
<measurement>[,<tag_key>=<tag_val>...] <field_key>=<field_val> [<timestamp>]
Line Protocol Examples:
Measurement: cpu_usage
Tags: host=server01, region=us-east
Fields: value=72.5
Flux Query Language:
[Aggregate 5-minute averages]
from(bucket: "metrics")
|> range(start: -24h)
|> aggregateWindow(every: 5m, fn: mean)
TimescaleDB vs InfluxDB — When to Choose:
TimescaleDB : Need SQL + joins, existing PostgreSQL, complex analytics
InfluxDB : Pure metrics/IoT, Grafana-first, Flux power users