Apache Kafka CheatSheet

Topics

List existing topics 

On HDP (Hortonworks)

On HDP, change into the Kafka bin directory before running any of the Kafka client scripts:

cd /usr/hdp/current/kafka-broker/bin/
./kafka-topics.sh --zookeeper $(hostname -f):2181 --list

On CDH (Cloudera)

kafka-topics --zookeeper $(hostname -f):2181 --list
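The HDP and CDH commands in this cheat sheet differ only in where the tool lives and whether it carries a .sh suffix, so a small helper can pick the right one. This is a minimal sketch, assuming the CDH wrappers are on PATH and the HDP scripts sit in the directory shown above; the function name kafka_cli is hypothetical.

```shell
# Print the command to run for a given Kafka tool name (e.g. kafka-topics).
# Assumption: CDH puts wrappers such as kafka-topics on PATH, while HDP keeps
# the .sh scripts under /usr/hdp/current/kafka-broker/bin.
kafka_cli() {
  tool="$1"
  if command -v "$tool" >/dev/null 2>&1; then
    printf '%s\n' "$tool"
  else
    printf '%s\n' "/usr/hdp/current/kafka-broker/bin/$tool.sh"
  fi
}

echo "kafka-topics resolves to: $(kafka_cli kafka-topics)"
```

With the helper defined, listing topics on either distribution could be written as: $(kafka_cli kafka-topics) --zookeeper $(hostname -f):2181 --list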

Create a new topic 

On HDP (Hortonworks)

./kafka-topics.sh --create --zookeeper $(hostname -f):2181 --replication-factor 2 --partitions 4 --topic benchmark

On CDH (Cloudera)

kafka-topics --create --zookeeper $(hostname -f):2181 --replication-factor 2 --partitions 4 --topic benchmark

Describe a topic 

On HDP (Hortonworks)

./kafka-topics.sh --describe --zookeeper $(hostname -f):2181 --topic benchmark

On CDH (Cloudera)

kafka-topics --describe --zookeeper $(hostname -f):2181 --topic benchmark

Describe all topics

On HDP (Hortonworks)

./kafka-topics.sh --describe --zookeeper $(hostname -f):2181

On CDH (Cloudera)

kafka-topics --describe --zookeeper $(hostname -f):2181

Delete a topic 

On HDP (Hortonworks)

./kafka-topics.sh --zookeeper $(hostname -f):2181 --delete --topic benchmark

On CDH (Cloudera)

kafka-topics --zookeeper $(hostname -f):2181 --delete --topic benchmark

Add a new partition 

On HDP (Hortonworks)

./kafka-topics.sh --alter --zookeeper $(hostname -f):2181 --topic benchmark --partitions 5

On CDH (Cloudera)

kafka-topics --alter --zookeeper $(hostname -f):2181 --topic benchmark --partitions 5

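WARNING: Increasing the partition count of a topic whose messages are keyed changes the key-to-partition mapping, so new messages for a key may land on a different partition than older ones and per-key ordering is no longer guaranteed across the change.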
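The effect described in the warning can be illustrated with a hash-modulo calculation. This is only a sketch: Kafka's default partitioner hashes the key with murmur2, while the stand-in below uses cksum (CRC32), so the partition numbers will not match a real cluster. The function name partition_for and the sample keys are hypothetical.

```shell
# Illustrative stand-in for the default partitioner: hash(key) mod partitions.
# Assumption: cksum (CRC32) replaces Kafka's murmur2 purely for demonstration.
partition_for() {
  key="$1"
  partitions="$2"
  hash=$(printf '%s' "$key" | cksum | cut -d' ' -f1)
  echo $(( hash % partitions ))
}

# Compare where each key lands before and after growing from 4 to 5 partitions.
for key in user-1 user-2 user-3; do
  echo "$key: 4 partitions -> $(partition_for "$key" 4), 5 partitions -> $(partition_for "$key" 5)"
done
```

Any key whose two numbers differ would have its older and newer messages split across two partitions.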

Show under-replicated partitions

On HDP (Hortonworks)

./kafka-topics.sh --zookeeper $(hostname -f):2181 --describe --under-replicated-partitions

On CDH (Cloudera)

kafka-topics --zookeeper $(hostname -f):2181 --describe --under-replicated-partitions

Delete content of a topic

On HDP (Hortonworks)

./kafka-topics.sh --zookeeper $(hostname -f):2181 --alter --topic benchmark --config retention.ms=1000

On CDH (Cloudera)

kafka-topics --zookeeper $(hostname -f):2181 --alter --topic benchmark --config retention.ms=1000

Deletion is not immediate: the broker removes expired data on its next retention check (log.retention.check.interval.ms, 5 minutes by default). Once the topic is empty, set retention.ms back to its previous value.

Producers

Produce messages from standard input

On HDP (Hortonworks)

./kafka-console-producer.sh --broker-list $(hostname -f):6667 --topic benchmark

On CDH (Cloudera)

kafka-console-producer --broker-list $(hostname -f):9092 --topic benchmark

Produce messages from a file

On HDP (Hortonworks)

./kafka-console-producer.sh --broker-list $(hostname -f):6667 --topic benchmark < message.txt

On CDH (Cloudera)

kafka-console-producer --broker-list $(hostname -f):9092 --topic benchmark < message.txt

Kerberos

export KAFKA_CLIENT_KERBEROS_PARAMS="-Djava.security.auth.login.config=/Path/to/Jaas/File" 
./kafka-console-producer.sh --broker-list $(hostname -f):6667 --topic benchmark --producer.config /tmp/producer.config

The /tmp/producer.config file contains:

security.protocol=SASL_PLAINTEXT
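
The JAAS file referenced by KAFKA_CLIENT_KERBEROS_PARAMS is not shown above. As a sketch, a client JAAS file that reuses an existing Kerberos ticket (obtained beforehand with kinit) commonly looks like the one written below. The file name kafka_client_jaas.conf, the temporary directory, and the service name "kafka" are assumptions; use the values that match your cluster.

```shell
# Write a client JAAS file into a scratch directory (hypothetical location).
conf_dir=$(mktemp -d)

cat > "$conf_dir/kafka_client_jaas.conf" <<'EOF'
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useTicketCache=true
  serviceName="kafka";
};
EOF

# Point the Kafka client scripts at the file, as in the export shown above.
export KAFKA_CLIENT_KERBEROS_PARAMS="-Djava.security.auth.login.config=$conf_dir/kafka_client_jaas.conf"
echo "JAAS file written to $conf_dir/kafka_client_jaas.conf"
```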

Consumers

Start a consumer from beginning of the log

On HDP (Hortonworks)

./kafka-console-consumer.sh --bootstrap-server $(hostname -f):6667 --topic benchmark --from-beginning

On CDH (Cloudera)

kafka-console-consumer --bootstrap-server $(hostname -f):9092 --topic benchmark --from-beginning

Consume 1 message 

./kafka-console-consumer.sh --bootstrap-server $(hostname -f):6667 --topic benchmark  --max-messages 1

Consume from a specific offset of a partition

./kafka-console-consumer.sh --bootstrap-server <BROKER_HOST:PORT> --topic <TOPIC-NAME> --partition <partition_number> --offset <offset> 

In Kafka 1.0 or later, the same command can also be given a consumer group:

./kafka-console-consumer.sh --bootstrap-server <BROKER_HOST:PORT> --topic <TOPIC-NAME> --partition <partition_number> --offset <offset> --group <group-name> 

Offsets

Get the latest offset of a topic

bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list $(hostname -f):6667 --topic <topic-name> --time -1

Get the oldest offset of a topic

bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list $(hostname -f):6667 --topic <topic-name> --time -2
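
GetOffsetShell prints one topic:partition:offset line per partition, so subtracting the oldest offsets from the latest ones gives a rough count of the messages currently retained. The sketch below runs on made-up sample output rather than a live cluster; the helper name sum_offsets is hypothetical, and the result is only approximate for compacted or transactional topics.

```shell
# Sum the offset column (third field) of GetOffsetShell output read from stdin.
sum_offsets() {
  awk -F: '{ total += $3 } END { print total + 0 }'
}

# Made-up sample output for a 3-partition topic; on a real cluster, pipe the
# --time -1 and --time -2 commands shown above into sum_offsets instead.
latest=$(printf 'benchmark:0:120\nbenchmark:1:95\nbenchmark:2:110\n' | sum_offsets)
earliest=$(printf 'benchmark:0:20\nbenchmark:1:0\nbenchmark:2:10\n' | sum_offsets)

echo "messages currently retained: $(( latest - earliest ))"
```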

Properties

In older versions of Kafka (for example HDP 2.6.4) this property was set to true, but in newer versions it is set to false:

unclean.leader.election.enable=true

Authentication and Authorization

security.protocol   User Authentication    Authorization   Encryption
PLAINTEXT           NO                     NO              NO
SSL                 NO                     Host Based      YES
SASL_PLAINTEXT      PLAIN | KRB5 | SCRAM   ACL / Ranger    NO
SASL_SSL            PLAIN | KRB5 | SCRAM   ACL / Ranger    YES

Kafka Upgrade

inter.broker.protocol.version = current_kafka_version
log.message.format.version = current_kafka_version

Kafka Performance Tuning

Tuning Producers

batch.size 
linger.ms
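
batch.size is the maximum number of bytes the producer collects per partition before sending a batch, and linger.ms is how long it waits for a batch to fill. As a sketch, both can be tried with the console producer through a properties file passed via --producer.config. The values below are illustrative assumptions, not recommendations.

```shell
# Write a scratch producer properties file with illustrative tuning values.
tuned_config=$(mktemp)

cat > "$tuned_config" <<'EOF'
# Illustrative values only; measure throughput and latency before adopting them.
batch.size=65536
linger.ms=20
EOF

echo "wrote $tuned_config"
# Example use on HDP (requires a running broker):
# ./kafka-console-producer.sh --broker-list $(hostname -f):6667 --topic benchmark --producer.config "$tuned_config"
```

Larger batches and a longer linger generally trade a little latency for higher throughput.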

Tuning Consumers

A consumer group should have as many consumers as the topic has partitions
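
The reason for this guideline is that, within a group, each partition is assigned to exactly one consumer, so consumers beyond the partition count receive nothing. A minimal sketch of that arithmetic follows; the function name idle_consumers is hypothetical.

```shell
# Number of consumers in a group that would be left without a partition.
idle_consumers() {
  partitions="$1"
  consumers="$2"
  if [ "$consumers" -gt "$partitions" ]; then
    echo $(( consumers - partitions ))
  else
    echo 0
  fi
}

echo "4 partitions, 6 consumers: $(idle_consumers 4 6) idle"
echo "4 partitions, 4 consumers: $(idle_consumers 4 4) idle"
```

With fewer consumers than partitions nobody is idle, but some consumers read several partitions.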

Handling large messages

message.max.bytes
log.segment.bytes
replica.fetch.max.bytes
max.partition.fetch.bytes
fetch.max.bytes
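
A common rule of thumb is that the fetch limits should be at least as large as the biggest message the broker accepts, so that replicas and consumers can always fetch it. Newer Kafka versions relax this, since a fetch still returns the first batch even when it exceeds the limit, so treat the check below as a sketch of the rule rather than a hard requirement. The function name and the sample values are hypothetical.

```shell
# Check that the replica and consumer fetch limits cover message.max.bytes.
# Arguments: message.max.bytes replica.fetch.max.bytes max.partition.fetch.bytes
check_large_message_sizes() {
  message_max="$1"
  replica_fetch_max="$2"
  partition_fetch_max="$3"
  status=0
  if [ "$replica_fetch_max" -lt "$message_max" ]; then
    echo "replica.fetch.max.bytes ($replica_fetch_max) < message.max.bytes ($message_max)"
    status=1
  fi
  if [ "$partition_fetch_max" -lt "$message_max" ]; then
    echo "max.partition.fetch.bytes ($partition_fetch_max) < message.max.bytes ($message_max)"
    status=1
  fi
  return "$status"
}

# Illustrative values: 5 MiB for all three settings.
check_large_message_sizes 5242880 5242880 5242880 && echo "sizes are consistent"
```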

Network and I/O Threads

num.network.threads
queued.max.requests
num.io.threads

ISR Management

replica.lag.time.max.ms
num.replica.fetchers
replica.fetch.min.bytes
replica.fetch.wait.max.ms
unclean.leader.election.enable

Kafka Zookeeper Performance Tuning

zookeeper.session.timeout.ms
jute.maxbuffer
maxClientCnxns
