Apache Kafka in Five Minutes

What is Apache Kafka?

  • Distributed event streaming platform for building data pipelines and stream-based applications
  • Fault tolerant, resilient
  • Very high throughput
  • Horizontally scalable
  • Integrates well with Big Data frameworks like Apache Flink or Apache Spark
  • Apache project ⇒ Apache License 2.0 (i.e. open-source software)

Common use cases

  • Messaging systems (e.g. communication between loosely coupled microservices)
  • Gathering metrics from different locations (e.g. IoT)
  • Collecting application logs
  • Stream processing / transformation

Components

Inside the cluster

Logs

  • Each partition / replica = append-only commit log
  • Data in log is immutable
  • Each message in log gets unique id (offset)
  • Offsets are per partition
  • Message order is guaranteed only within a partition, not across partitions
  • Data is kept for a configurable retention period (thus messages are replayable within that window)
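
The log properties above can be illustrated with a toy in-memory model. This is only a sketch for intuition, not Kafka's client API or its storage implementation: the names Record, PartitionLog, append and read_from are hypothetical, and retention (deleting old records) is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """An immutable message stored in a partition log (hypothetical type)."""
    offset: int
    value: str


@dataclass
class PartitionLog:
    """Toy append-only log for a single partition (not Kafka's API)."""
    _records: list = field(default_factory=list)

    def append(self, value: str) -> int:
        # The next offset is the current log length, so offsets are
        # sequential and unique within this partition only.
        offset = len(self._records)
        self._records.append(Record(offset, value))
        return offset

    def read_from(self, offset: int) -> list:
        # Reading never removes data, so a consumer can replay from any
        # earlier offset for as long as the records are retained.
        return self._records[offset:]


# Two partitions number their messages independently.
p0, p1 = PartitionLog(), PartitionLog()
p0.append("a")
p0.append("b")
p1.append("x")

# Replaying partition 0 from the start returns records in append order.
replayed = [r.value for r in p0.read_from(0)]
```

Because each partition counts from its own offset 0, an offset is only meaningful together with the partition it belongs to, which is also why ordering holds per partition rather than across a whole topic.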