Skip to content

Streaming Data with Kinesis and MSK — SA Quick Reference

What It Is

A digital "shock absorber" that processes data as it happens instead of waiting for scheduled batches. It sits between your data sources and your databases to prevent traffic spikes from crashing your systems.

Why Customers Care

  • Real-time visibility: Power instant dashboards and fraud detection rather than waiting hours for updates.
  • System Resilience: Protects downstream databases from crashing during sudden surges in IoT or application traffic.
  • Reduced Operational Burden: Move away from managing complex, high-maintenance data pipelines.

Key Differentiators vs Alternatives

  • AWS-Native Serverless (Kinesis): Extreme ease of use with near-zero infrastructure management for standard workloads.
  • Kafka Compatibility (MSK): Seamlessly migrate existing Kafka ecosystems to a managed service without rewriting code.
  • Zero-ETL Delivery (Firehose): Automatically transforms and delivers data directly to S3 or Redash with minimal configuration.

When to Recommend It

Recommend when a customer is moving from "Batch" to "Real-Time." Look for signals like high-frequency IoT sensors, microservices needing decoupling, or "the database crashes during peak hours." It’s ideal for companies maturing from simple data storage to event-driven architectures.

Top 3 Objections & Responses

"Kinesis is too expensive for our high-volume traffic." → We can use "Provisioned Mode" to lock in costs for predictable workloads, ensuring you only pay for the capacity you actually need.

"We are already heavily invested in the Apache Kafka ecosystem." → Amazon MSK is designed specifically for you; you keep your existing plugins and code, but AWS handles the patching and hardware management.

"Won't adding another layer increase our latency?" → Kinesis and MSK are built for millisecond-scale processing; the "buffer" actually prevents downstream latency caused by overloaded databases.

5 Things to Know Before the Call

  1. Watch the Partition Key: Using a low-cardinality key (like RegionID) creates "Hot Shards" that overwhelm single units of capacity.
  2. Kinesis Streams vs. Firehose: Streams are for processing data; Firehose is for delivering data to destinations like S3.
  3. The "Shock Absorber" Concept: Always frame these services as a way to decouple producers from consumers to ensure stability.
  4. Automation via Lambda: Kinesis Firehose can trigger Lambda to transform raw JSON into efficient formats like Parquet on the fly.
  5. CDC is a Use Case: Mention AWS DMS if the customer wants to stream real-time changes from their existing databases into their analytics engine.

Competitive Snapshot

vs AWS Advantage
Self-Managed Kafka (On-Prem) MSK removes the massive operational overhead of patching, scaling, and managing hardware.
Traditional Batch ETL Kinesis enables "Zero-ETL" patterns, moving data from source to analytics in seconds, not hours.

Source: Streaming Data with Kinesis and MSK course section