ivanursul

kafka

Kafka Consumer memory usage

I’ve been working with Kafka for more than 2 years, and I wasn’t sure whether a Kafka Consumer eats more RAM when it has more partitions. I couldn’t find any useful information on the internet, so I decided to measure everything myself. Inputs I started with 1 broker, since I am interested in the actual memory consumption for topics with 1 and 1000 partitions. I know that launching Kafka as a cluster can differ, because we have replication, acknowledgments, and other cluster concerns, but let’s skip that for now. Two basic commands for launching a single-node Kafka cluster: bin/zookeeper-server-start.sh config/zookeeper.properties bin/kafka-server-start.sh config/server.properties I created two topics: topic1 with 1 partition and topic2 with 1000 partitions. I believe the difference in partition count is enough for understanding memory consumption. bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topic1 bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1000 --topic topic2 It’s good that Kafka provides us with kafka-producer-perf-test.sh, a performance script which lets us load test Kafka. bin/kafka-producer-perf-test.sh --topic topic1 --num-records 99999999999999 --throughput 1 --producer-props bootstrap.servers=localhost:9092 key.serializer=org.apache.kafka.common.serialization.StringSerializer value.serializer=org.apache.kafka.common.serialization.StringSerializer --record-size 100 So I then launched load tests to insert data into the two topics with a throughput of 1, 200, 500 and 1000 messages/second. I collected all...
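On the consuming side, a minimal sketch of the kind of probe one could point at topic1 or topic2 to compare heap usage is shown below. It assumes the standard Java kafka-clients library on the classpath; the class name, group id and heap-logging approach are my own illustration, not the article's actual measurement setup.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class MemoryProbeConsumer {
    public static void main(String[] args) {
        // Pass "topic1" (1 partition) or "topic2" (1000 partitions) as the first argument.
        String topic = args.length > 0 ? args[0] : "topic1";

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "memory-test");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList(topic));
            while (true) {
                consumer.poll(Duration.ofSeconds(1));
                // Used heap = currently allocated heap minus its free part.
                long usedMb = (Runtime.getRuntime().totalMemory()
                        - Runtime.getRuntime().freeMemory()) / (1024 * 1024);
                System.out.printf("topic=%s usedHeapMb=%d%n", topic, usedMb);
            }
        }
    }
}

Running the same class once against each topic while the perf-test producer is loading it gives a rough side-by-side view of heap growth per partition count.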

Microservices interaction at scale using Apache Kafka

Intro Why do people adopt a microservices architecture? Let’s start with a review of a classical, old-school, n-tier application, which is often called a monolith: Everyone has worked with this type of application, we studied them at university, we did lots of projects with this architecture, and at first glance it’s a good architecture. We can start building systems on it immediately; they are perfectly simple and easy to implement. But as such a system becomes bigger, it starts running into lots of different issues. The first issue is scalability: it doesn’t scale well. If one part of the system needs to scale, the whole system has to be scaled. Even though the system is divided into modules, it runs as one physical process, and there’s no option to scale a single module. This can lead to expensive large instances, which you will have to maintain. The second issue is about engineers. Since the system will become enormously big, you will have to hire a really big team for it. Evolution How can you address these issues? One of the ways you can evolve is to make a transition to a microservices architecture. The way it works is that you have a...

Kafka differences: topics, partitions, publish/subscribe and queue.

When I first started to look at Kafka as an event publishing system, my first question was ‘How do I create pub/sub and queue types of events?’. Unfortunately, there was no quick answer, because of the way Kafka works. My previous article was about Kafka basics, which you can, of course, read to get more information about this cool commit log. In this article I’ll try to explain why Kafka is different from other similar systems, how it differs, and answer all the interesting questions I had in the beginning. Why is Kafka a commit log? Simply because Kafka works differently from other pub/sub systems. It’s a commit log, where new messages are constantly appended. Each message has its own unique id, which is called the offset. Okay, and how do you consume this so-called ‘commit log’? The consumer stores one thing, the offset, and it is responsible for reading messages. Consider this console consumer: bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning As you see, this consumer will read the whole log from the beginning. Are messages ever deleted? Yes, after some time; there’s a retention policy. So, how do you create a pub/sub model? Every consumer should...
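As a rough illustration of what consumer groups give you, here is a minimal sketch using the standard Java client, assuming a local broker and the topic test from the console example above; the group names billing-group and audit-group are invented for the example and are not from the article. Consumers with different group.ids each receive every message (publish/subscribe), while consumers sharing a group.id divide the partitions between them (queue).

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class GroupSemanticsExample {

    // Builds a consumer for the "test" topic with the given group id.
    static KafkaConsumer<String, String> newConsumer(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", groupId);
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        props.put("auto.offset.reset", "earliest");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("test"));
        return consumer;
    }

    public static void main(String[] args) {
        // Pub/sub: two different group ids, so each consumer independently
        // receives every message on the topic.
        KafkaConsumer<String, String> billing = newConsumer("billing-group");
        KafkaConsumer<String, String> audit = newConsumer("audit-group");

        // Queue: starting a second process with the SAME group id (e.g. another
        // "billing-group" member) would instead split the partitions between the
        // members, so each message is handled by only one of them.
        for (KafkaConsumer<String, String> consumer : java.util.Arrays.asList(billing, audit)) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(2));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
            consumer.close();
        }
    }
}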

Apache Kafka. Basics

Preface Today we live in a world with no fixed IP addresses, one that changes dynamically minute by minute. As the number of services increases every day, we run into new problems. Story Just imagine that you have a monolithic application which you want to rewrite into a microservices architecture. You start with a single service which, let’s say, handles profile functionality. You use MongoDB as its database. At this step you don’t have any troubles, because it’s a single service and it doesn’t interact with the outside world. You develop the required endpoints, and everything works fine. Then imagine that your next step is to build another service, let’s say a billing service, which uses PostgreSQL for storing transactions. Besides that, you need to know when the Profile service receives a new PUT request and updates some profile. So you have two options: develop an external API for this case or work with the PostgreSQL db directly. So you chose to work with the db, and this is the point where your problems begin: you have coupled the profile and billing services together. You are signing a contract that, from now on, you need to think about two databases. And this sucks. Here’s why. After some time you receive...