Lab 13: Streams & Lambdas

Objective

Use the Java Streams API for data processing — filter, map, reduce, collect, flatMap, and parallel streams — and write concise lambda expressions and method references.

Background

The Streams API (Java 8+) brings functional-style data processing to Java. A Stream is a lazy sequence of elements supporting aggregate operations. Streams enable declarative, pipeline-based data transformation that replaces verbose for-loop boilerplate. Combined with lambdas and method references, they are arguably the most impactful Java 8 feature for day-to-day code.

Time

45 minutes

Prerequisites

  • Lab 08 (Interfaces — Functional Interfaces)

  • Lab 09 (Collections)

  • Lab 12 (Generics)

Tools

  • Java 21 (Eclipse Temurin)

  • Docker image: innozverse-java:latest


Lab Instructions

Step 1: Stream Pipeline Basics

💡 Streams are lazy — intermediate operations (filter, map) don't execute until a terminal operation (collect, forEach, count) is called. This enables short-circuit optimization: stream().filter(...).findFirst() stops at the first match without processing the rest.
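A minimal sketch of the laziness/short-circuit behavior described above (the class name and sample data are illustrative, not the lab's actual code):

```java
import java.util.List;
import java.util.Optional;

public class StreamBasics {
    // findFirst() short-circuits: once a match is found,
    // no further elements pass through the filter.
    static Optional<String> firstLongName(List<String> names) {
        return names.stream()
                .filter(n -> n.length() > 5)   // intermediate: lazy, nothing runs yet
                .findFirst();                  // terminal: triggers evaluation
    }

    public static void main(String[] args) {
        List<String> names = List.of("Ada", "Grace", "Barbara", "Margaret");
        System.out.println(firstLongName(names).orElse("none")); // Barbara
    }
}
```

Because the filter is only evaluated on demand, "Margaret" is never examined once "Barbara" matches.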

📸 Verified Output:


Step 2: Lambdas & Method References

💡 Method references are shorthand for lambdas that just call a method: String::toUpperCase is equivalent to s -> s.toUpperCase(), and System.out::println to x -> System.out.println(x). They're more readable for simple cases and automatically adapt to any functional interface with a matching signature.
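A short sketch showing both equivalences side by side (class name and data are illustrative):

```java
import java.util.List;

public class MethodRefs {
    static List<String> upperCase(List<String> words) {
        // String::toUpperCase is the same as s -> s.toUpperCase()
        return words.stream().map(String::toUpperCase).toList();
    }

    public static void main(String[] args) {
        List<String> result = upperCase(List.of("java", "streams"));
        // System.out::println is the same as x -> System.out.println(x)
        result.forEach(System.out::println);
    }
}
```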

📸 Verified Output:


Step 3: Collectors — Grouping & Partitioning

💡 Collectors.groupingBy(classifier, downstream) is incredibly powerful. The downstream collector can be counting(), summingDouble(), joining(), another groupingBy() (nested grouping), or mapping() + toList(). This replaces hundreds of lines of imperative grouping code with a single expression.
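A sketch of grouping with a downstream collector, plus partitioning (the word/number data is illustrative):

```java
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.*;

public class GroupingDemo {
    // Group words by length; the downstream counting() collector
    // turns each group into its element count.
    static Map<Integer, Long> countByLength(List<String> words) {
        return words.stream().collect(groupingBy(String::length, counting()));
    }

    // Partition into exactly two buckets keyed by true/false.
    static Map<Boolean, List<Integer>> partitionEven(List<Integer> nums) {
        return nums.stream().collect(partitioningBy(n -> n % 2 == 0));
    }

    public static void main(String[] args) {
        System.out.println(countByLength(List.of("a", "bb", "cc", "ddd")));
        System.out.println(partitionEven(List.of(1, 2, 3, 4)));
    }
}
```

Swapping counting() for summingDouble(), joining(), or another groupingBy() changes the aggregation without touching the rest of the pipeline.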

📸 Verified Output:


Step 4: flatMap & Optional

💡 flatMap on streams flattens Stream<Stream<T>> into Stream<T>. It's how you process nested collections in one pipeline. Optional is not a collection — it holds zero or one value. Never call .get() without checking .isPresent() first; prefer .orElse(), .orElseGet(), or .orElseThrow() instead.
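A minimal sketch of both ideas, assuming illustrative nested-list data:

```java
import java.util.List;

public class FlatMapDemo {
    // flatMap(List::stream) turns Stream<List<Integer>> into Stream<Integer>
    static List<Integer> flatten(List<List<Integer>> nested) {
        return nested.stream()
                .flatMap(List::stream)
                .toList();
    }

    // orElse gives a safe default instead of an unguarded .get()
    static int firstOrDefault(List<Integer> nums) {
        return nums.stream().findFirst().orElse(-1);
    }

    public static void main(String[] args) {
        System.out.println(flatten(List.of(List.of(1, 2), List.of(3))));
        System.out.println(firstOrDefault(List.of())); // -1, no exception
    }
}
```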

📸 Verified Output:


Step 5: reduce & Custom Collectors

💡 Collector.of(supplier, accumulator, combiner, finisher) lets you build any collection strategy. The combiner is only used in parallel streams to merge partial results. Use custom collectors when built-in ones don't cover your aggregation needs (running stats, multi-pass aggregations, etc.).
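A sketch of the four-part Collector.of signature using a simple string joiner (a standard illustration, not the lab's actual collector), alongside a basic reduce:

```java
import java.util.List;
import java.util.stream.Collector;

public class CustomCollectorDemo {
    // Collector.of(supplier, accumulator, combiner, finisher)
    static final Collector<String, StringBuilder, String> JOINER =
        Collector.of(
            StringBuilder::new,      // supplier: fresh mutable container
            StringBuilder::append,   // accumulator: fold one element in
            StringBuilder::append,   // combiner: merge partials (parallel only)
            StringBuilder::toString  // finisher: final A -> R transformation
        );

    public static void main(String[] args) {
        String joined = List.of("a", "b", "c").stream().collect(JOINER);
        System.out.println(joined); // abc

        // reduce: identity value + associative binary operator
        int sum = List.of(1, 2, 3).stream().reduce(0, Integer::sum);
        System.out.println(sum); // 6
    }
}
```

Note the combiner must be associative, since a parallel stream may merge partial StringBuilders in any grouping.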

📸 Verified Output:


Step 6: Parallel Streams

💡 Parallel streams split work across ForkJoinPool threads — they help when: (1) the dataset is large (100K+), (2) operations are CPU-intensive and independent, (3) order doesn't matter. They hurt for: small datasets (thread overhead dominates), I/O-bound operations, or when elements depend on each other. Never use parallel for forEach with shared mutable state.
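A sketch of a CPU-bound workload that fits the criteria above: counting primes below 1,000,000 by trial division (timings are illustrative and vary by machine; the count itself is deterministic):

```java
import java.util.stream.IntStream;

public class ParallelPrimes {
    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int i = 2; (long) i * i <= n; i++)
            if (n % i == 0) return false;
        return true;
    }

    // CPU-intensive, independent per element, order-insensitive:
    // a good candidate for .parallel()
    static long countPrimesBelow(int limit, boolean parallel) {
        IntStream range = IntStream.range(2, limit);
        if (parallel) range = range.parallel(); // work split across ForkJoinPool
        return range.filter(ParallelPrimes::isPrime).count();
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long serial = countPrimesBelow(1_000_000, false);
        long t1 = System.nanoTime();
        long parallel = countPrimesBelow(1_000_000, true);
        long t2 = System.nanoTime();
        System.out.printf("serial:   %d primes in %d ms%n", serial, (t1 - t0) / 1_000_000);
        System.out.printf("parallel: %d primes in %d ms%n", parallel, (t2 - t1) / 1_000_000);
    }
}
```

Both variants report 78498 primes; only the elapsed time differs.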

📸 Verified Output:

(times vary by CPU; primes count is always 78498)


Step 7: Primitive Streams

💡 Primitive streams (IntStream, LongStream, DoubleStream) are faster than Stream<Integer> because they avoid boxing/unboxing. For numeric processing, always use mapToInt(), mapToDouble(), and their specialized collectors. The speedup is significant for large datasets.
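A sketch of mapToInt and one-pass numeric statistics (the word data is illustrative):

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.stream.IntStream;

public class PrimitiveStreams {
    // mapToInt yields an IntStream: no Integer boxing per element
    static IntSummaryStatistics lengthStats(List<String> words) {
        return words.stream()
                .mapToInt(String::length)
                .summaryStatistics(); // count, sum, min, max, average in one pass
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = lengthStats(List.of("a", "bb", "ccc"));
        System.out.println("sum=" + stats.getSum() + " max=" + stats.getMax());

        // ranges are primitive streams too
        int sumOfSquares = IntStream.rangeClosed(1, 3).map(n -> n * n).sum();
        System.out.println(sumOfSquares); // 1 + 4 + 9 = 14
    }
}
```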

📸 Verified Output:


Step 8: Complete Example — Sales Data Pipeline

💡 Nested groupingBy is idiomatic Java for pivot tables — group by region, then product, getting a Map<Region, Map<Product, Total>>. The entire pipeline is lazy, composable, and parallelizable with .parallel(). This replaces SQL GROUP BY in application-layer processing.
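A sketch of the region-then-product pivot described above (the Sale record and sample rows are illustrative stand-ins for the lab's generated data):

```java
import java.util.List;
import java.util.Map;
import static java.util.stream.Collectors.*;

public class SalesPipeline {
    record Sale(String region, String product, double amount) {}

    // Nested groupingBy produces the pivot: Map<Region, Map<Product, Total>>
    static Map<String, Map<String, Double>> totals(List<Sale> sales) {
        return sales.stream().collect(
            groupingBy(Sale::region,
                groupingBy(Sale::product,
                    summingDouble(Sale::amount))));
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
            new Sale("EU", "Widget", 100.0),
            new Sale("EU", "Widget", 50.0),
            new Sale("EU", "Gadget", 75.0),
            new Sale("US", "Widget", 200.0));
        // e.g. {EU={Widget=150.0, Gadget=75.0}, US={Widget=200.0}} (map order may vary)
        System.out.println(totals(sales));
    }
}
```

Switching to sales.parallelStream() parallelizes the whole pivot without changing the collector, since groupingBy handles the merge.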

📸 Verified Output:

(values vary by random seed; structure is always consistent)


Verification

Summary

You've mastered Java Streams: pipeline construction, lambdas, method references, all major collectors, flatMap, Optional, reduce, parallel streams, primitive streams, and a complete sales data pipeline. Streams are the heart of modern Java — they make code shorter, more readable, and often faster.

Further Reading

Last updated