Yasser Alattas | MLOps, Kubernetes & SRE Blog

ETL Strategies and Where Kafka Fits

ETL is not one architecture. It is a set of trade-offs around latency, ownership, operational cost, and how fresh the data needs to be.

The mistake I see often is treating every data movement problem the same way. Some workloads need a nightly batch. Some need near real-time events. Some only need clean, reliable extracts with clear ownership.

Quick comparison

| Strategy | Best for | Trade-off |
| --- | --- | --- |
| Batch ETL | Reports, finance, historical analysis | Simple and cheap, but delayed |
| ELT | Analytics warehouses and lakehouses | Flexible, but pushes complexity into SQL/dbt layers |
| CDC | Database change replication | Low impact on services, but schema changes need discipline |
| Streaming ETL | Real-time workflows, fraud, notifications, operations | Fast and scalable, but operationally more complex |
| Reverse ETL | Sending modeled data back to business tools | Useful for activation, but easy to create hidden coupling |

There is no universal winner. The right choice depends on the SLA: how stale the data is allowed to be by the time it reaches its consumer.

If the business can wait until tomorrow, batch is usually enough. If downstream systems need to react within seconds, streaming becomes part of the core architecture.

Where Kafka contributes

Kafka is useful when data movement becomes a platform concern, not just a pipeline concern.

Instead of connecting every producer directly to every consumer, Kafka provides a durable event backbone: producers write events to topics, and each consumer reads them at its own pace.

This changes the architecture from tightly coupled pipelines to event-driven data flows.

For example, an order service can publish order.created. Analytics, notifications, fraud checks, inventory, and ML feature pipelines can all consume the same event without the order service knowing about each consumer.

That separation is the real value.

Kafka is not only about speed. It is about decoupling, replayability, and operational control.
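
To make the decoupling and replay ideas concrete, here is a minimal in-memory sketch. It is not Kafka and uses no Kafka client; `EventLog`, `poll`, `seek`, and the group names are hypothetical stand-ins for a topic, consumer polling, offset resets, and consumer groups.

```python
from collections import defaultdict


class EventLog:
    """Toy stand-in for a single topic partition: an append-only list of events."""

    def __init__(self):
        self._events = []
        # Each consumer group keeps its own read position (offset).
        self._offsets = defaultdict(int)

    def publish(self, event_type, payload):
        """Append an event. The producer never learns who will read it."""
        self._events.append({"type": event_type, "payload": payload})
        return len(self._events) - 1  # offset of the event just written

    def poll(self, group):
        """Return events this group has not seen yet and advance its offset."""
        start = self._offsets[group]
        batch = self._events[start:]
        self._offsets[group] = len(self._events)
        return batch

    def seek(self, group, offset):
        """Move a group's offset, for example back to 0 to replay history."""
        self._offsets[group] = offset


log = EventLog()
log.publish("order.created", {"order_id": 1, "total": 40})
log.publish("order.created", {"order_id": 2, "total": 15})

# Two independent consumers read the same events without affecting each other.
analytics_batch = log.poll("analytics")
fraud_batch = log.poll("fraud-checks")

# Replay: analytics rewinds and reprocesses from the start of the log.
log.seek("analytics", 0)
replayed = log.poll("analytics")
```

The order service in this sketch only ever calls `publish`. Adding a sixth consumer means adding another group name, not changing the producer, and reprocessing after a bug fix is a matter of moving an offset rather than asking the producer to resend.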

Where Strimzi fits

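If Kafka runs on Kubernetes, Strimzi is a strong choice of operator for managing it.

Strimzi gives Kafka a Kubernetes-native operating model: clusters, topics, and users are declared as custom resources, and operators reconcile the running system toward that declared state.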

This matters because Kafka is not a simple stateless workload. Broker storage, networking, certificates, users, topic configuration, and rolling upgrades all need careful handling.

Strimzi does not remove the need to understand Kafka. It gives teams a better control plane for running Kafka consistently on Kubernetes.
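
As a rough illustration of what that control plane looks like, Strimzi models a topic as a `KafkaTopic` custom resource. The sketch below builds one as a plain Python dictionary and prints it as JSON, which can be committed to Git like any other manifest. The cluster name `my-cluster`, the resource name, and the partition, replica, and retention values are assumptions for illustration, and the `apiVersion` should be checked against the Strimzi release in use.

```python
import json


def kafka_topic_manifest(name, topic_name, cluster,
                         partitions=3, replicas=3, retention_ms=604800000):
    """Build a Strimzi KafkaTopic resource as a dict (values are illustrative)."""
    return {
        # Assumption: v1beta2 is served by the installed Strimzi CRDs.
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "KafkaTopic",
        "metadata": {
            "name": name,
            # This label tells the Topic Operator which Kafka cluster owns the topic.
            "labels": {"strimzi.io/cluster": cluster},
        },
        "spec": {
            # topicName lets the Kafka topic name differ from the resource name.
            "topicName": topic_name,
            "partitions": partitions,
            "replicas": replicas,
            "config": {"retention.ms": retention_ms},  # 7 days
        },
    }


manifest = kafka_topic_manifest(
    name="order-created",
    topic_name="order.created",
    cluster="my-cluster",  # hypothetical cluster name
)
print(json.dumps(manifest, indent=2))
```

Because the topic is declared rather than created by hand, a change to partitions or retention becomes a reviewed commit instead of an ad hoc CLI call against a broker.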

Practical architecture direction

A practical data architecture usually combines multiple strategies:

  1. Use batch or ELT for heavy analytical workloads.
  2. Use CDC when database changes need to be replicated reliably.
  3. Use Kafka for events that multiple systems need to consume independently.
  4. Use streaming ETL only where latency justifies the operational cost.
  5. Use Strimzi when Kafka is part of the Kubernetes platform and needs GitOps-friendly operations.

The goal is not to make everything real-time.

The goal is to put each data flow on the right path: simple where possible, event-driven where valuable, and operationally controlled where critical.
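
The guidelines above can be sketched as a small decision helper. This is only one way to encode them: the `DataFlow` fields, the 60-second threshold, and the example flows are assumptions made for illustration, not fixed rules.

```python
from dataclasses import dataclass

# Assumption: flows that must be fresher than this justify streaming ETL.
STREAMING_THRESHOLD_SECONDS = 60


@dataclass
class DataFlow:
    name: str
    max_staleness_seconds: int          # how old data may be when it is consumed
    replicates_database_changes: bool = False
    independent_consumers: int = 1      # systems that consume the data separately


def recommend(flow):
    """Return the strategies that fit a flow; default to the simplest option."""
    paths = []
    if flow.replicates_database_changes:
        paths.append("CDC")
    if flow.independent_consumers >= 2:
        paths.append("Kafka events")
    if flow.max_staleness_seconds <= STREAMING_THRESHOLD_SECONDS:
        paths.append("streaming ETL")
    if not paths:
        # Nothing demands more, so stay simple.
        paths.append("batch or ELT")
    return paths


finance_report = DataFlow("finance-report", max_staleness_seconds=86400)
order_events = DataFlow("order.created", max_staleness_seconds=5,
                        independent_consumers=5)
customer_replica = DataFlow("customer-replica", max_staleness_seconds=300,
                            replicates_database_changes=True)

for flow in (finance_report, order_events, customer_replica):
    print(flow.name, "->", recommend(flow))
```

The default branch is the point of the exercise: a flow ends up on batch or ELT unless something specific about it argues for more machinery.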

