Kafka Consumer memory usage

I’ve been working with Kafka for more than 2 years, and I wasn’t sure whether a Kafka Consumer uses more RAM when it has more partitions. I couldn’t find any useful information on the internet, so I decided to measure it myself.

Inputs

I started with 1 broker, since I am interested in the actual memory consumption for topics with 1 and 1000 partitions. I know that running Kafka as a cluster can differ, because there are replication processes, acknowledgments and other cluster-level activity, but let’s skip that for now. Two basic commands for launching a single-node Kafka cluster:

bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties

I created two topics: topic1, with 1 partition, and topic2, with 1000 partitions. I believe the difference in partition count is large enough to understand memory consumption.

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topic1
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1000 --topic topic2

Conveniently, Kafka ships with kafka-producer-perf-test.sh, a performance script that lets us load test Kafka.

bin/kafka-producer-perf-test.sh --topic topic1 --num-records 99999999999999 --throughput 1 --producer-props bootstrap.servers=localhost:9092 key.serializer=org.apache.kafka.common.serialization.StringSerializer value.serializer=org.apache.kafka.common.serialization.StringSerializer --record-size 100

So I ran load tests one after another, inserting data into the two topics at throughputs of 1, 200, 500 and 1000 messages/second. I collected all...
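The excerpt doesn’t show how the consumer’s memory was sampled. One option, sketched here as an assumption and not as the method behind these measurements, is to have the consumer process log its own heap usage through the standard java.lang.management API. The class name HeapSampler and its methods are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: samples the current JVM's heap usage so it can be
// logged alongside a Kafka consumer running in the same process.
public final class HeapSampler {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    private HeapSampler() {}

    // Bytes of heap currently in use (this includes garbage not yet collected).
    public static long usedHeapBytes() {
        MemoryUsage heap = MEMORY.getHeapMemoryUsage();
        return heap.getUsed();
    }

    // Takes `count` samples, `intervalMillis` apart, and returns them in order.
    public static List<Long> sample(int count, long intervalMillis) {
        List<Long> samples = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            samples.add(usedHeapBytes());
            if (i < count - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt status
                    break;                              // and stop sampling early
                }
            }
        }
        return samples;
    }

    public static void main(String[] args) {
        // In a real test this would run next to the consumer's poll loop.
        for (long bytes : sample(5, 1000)) {
            System.out.printf("used heap: %.1f MB%n", bytes / (1024.0 * 1024.0));
        }
    }
}
```

Because "used" heap rises and falls with garbage collection, comparing trends over many samples (for the 1-partition and 1000-partition topics at the same throughput) is more meaningful than comparing single readings.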

Key things you should know about being a freelancer

Hi, my name is Ivan Ursul and I have been a freelance engineer since 2015. It’s been a while since I started my career as an independent freelancer. I started as an engineer at Upwork, in one of their teams, where I worked first on the reporting backend service and then on the time-tracker pipeline, which serves as the backend for the Upwork Tracker Application (UTA) client. Today I continue to work with Upwork, but I am also actively working with other customers, who are very different. I’ve successfully completed 22 projects since the very beginning. That’s why I decided to write an article about different aspects of the everyday life of a freelance engineer. You may agree or disagree; either way, I encourage you to leave your thoughts under this article. This article is based on the Upwork platform. I haven’t used other platforms, but I am quite sure the approach is the same.

Learn your customer

You will have to find out what your clients have in common. Are all of them technical? Do you prefer to work with non-technical people? What is your industry domain? These are the questions you should have answers to. After you realize what unites your customers...

Detecting memory leaks using JVisualVM and Memory Analyzer Tool

A few days ago I had a problem on one of the projects I am working on: we had a memory leak. Over a two-day period our services crashed three times, so I decided to investigate. Nothing I’m going to talk about is rocket science, and there are no clever or tricky tips; it’s just a straightforward explanation of how you can find memory leaks.

Exposing JMX

The problem was on a production instance, so I started my services with JMX enabled. Just start your apps with the following parameters:

-Djavax.management.builder.initial= -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=${whatever_port} -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

Starting JVisualVM

Just open your terminal and type jvisualvm. You should see the following screen: Add a remote connection, specify the JMX port and connect.

Waiting

You have to wait some time for retained memory to build up before you can analyse it. It’s up to you how long to wait; in my case, 4-5 hours was enough to get 100% proof of which part of the system was leaking.

Getting heap dump

Now go to the Monitor section, press the Heap Dump button and specify the path where the heap dump should be saved. In my case it was /tmp/**.hprof. Then...
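If clicking through JVisualVM isn’t convenient, the same kind of .hprof file can be written from inside the application with the HotSpot-specific HotSpotDiagnosticMXBean.dumpHeap method. This is a minimal sketch assuming a HotSpot-based JDK; the HeapDumper class, its method names and the temp-file location are hypothetical choices for illustration. The resulting file can be opened in the Memory Analyzer Tool just like a dump saved from JVisualVM.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes an .hprof heap dump of the current JVM,
// the same kind of file the JVisualVM "Heap Dump" button produces.
public final class HeapDumper {
    private HeapDumper() {}

    // live = true dumps only reachable objects (a full GC runs first).
    // dumpHeap fails if the file already exists; recent JDKs also
    // require the file name to end with ".hprof".
    public static Path dump(Path target, boolean live) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(target.toString(), live);
        return target;
    }

    // Dumps into a fresh temporary directory so the target never pre-exists.
    public static Path dumpToTempFile(boolean live) {
        try {
            Path dir = Files.createTempDirectory("heapdump");
            return dump(dir.resolve("app.hprof"), live);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = dumpToTempFile(true);
        System.out.println("heap dump written to " + file
                + " (" + file.toFile().length() + " bytes)");
    }
}
```

Dumping only live objects keeps the file smaller and hides garbage that would be collected anyway, which is usually what you want when looking for retained memory.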

How we wrote a chicken egg counter on a Raspberry Pi

How it started

Besides my main work on Upwork, I quite often pick up different projects. So I found a project where I had to write a program for recognizing chicken eggs on a factory production line. The customer wanted to install the application on a computer with a web camera, mount the camera above the production line, and have the application count the eggs and send the counts to the DB. He also wanted to run this program on a cheap computer. The network quality in the factory isn’t stable, so the program had to be resilient to network issues. There were enough challenges for me, so I decided to take part in this project. The biggest challenge was that I had no serious experience with OpenCV and image recognition, so I wanted to test whether I could deep dive into an unknown field and come back with a successful result. The customer wanted 99% recognition accuracy. This whole post is the story of how this application was designed, how it was written and what problems I faced during development. I will try to explain each architecture decision, from the beginning to the end of the...
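The excerpt states the requirement (keep counting even when the factory network drops) but not how it was met. One common pattern for this is store-and-forward: queue counts locally and remove them only after the database write succeeds. The sketch below illustrates that pattern under stated assumptions; CountBuffer, its capacity rule and the Predicate-based sender are hypothetical and not necessarily the design described in the post.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical store-and-forward buffer: egg counts are queued locally and
// only removed once the sender reports a successful write to the database.
public final class CountBuffer {
    private final Deque<Integer> pending = new ArrayDeque<>();
    private final int capacity;
    private final Predicate<Integer> sender; // returns true if the count reached the DB

    public CountBuffer(int capacity, Predicate<Integer> sender) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
        this.sender = sender;
    }

    // Records a new count. When the buffer is full, the count is folded into the
    // newest entry: counts are additive, so the total is kept while memory stays bounded.
    public synchronized void record(int count) {
        if (pending.size() == capacity) {
            pending.addLast(pending.pollLast() + count);
        } else {
            pending.addLast(count);
        }
    }

    // Sends queued counts in order and stops at the first failure,
    // so nothing is lost or reordered. Returns how many entries were sent.
    public synchronized int flush() {
        int sent = 0;
        while (!pending.isEmpty()) {
            int next = pending.peekFirst();
            if (!sender.test(next)) {
                break; // network still down; retry on the next flush
            }
            pending.pollFirst();
            sent++;
        }
        return sent;
    }

    public synchronized int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        // Simulated sender that is offline, so everything stays buffered.
        CountBuffer buffer = new CountBuffer(100, count -> false);
        buffer.record(12);
        buffer.record(9);
        System.out.println("sent: " + buffer.flush() + ", pending: " + buffer.pendingCount());
    }
}
```

In practice flush() would be called periodically (for example from a scheduled thread), and on a device with a small disk or little RAM the capacity bound matters more than it would on a server.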

Migrating from Ghost to Jekyll

Why Jekyll?

I decided to use Jekyll because I had a blog on the Ghost platform. I was waiting for the new 1.0 release, but then I suddenly realized that I didn’t want to use it, because: I have to maintain it on my own; I have to pay $10 every month for a 1GB DigitalOcean instance; I have to deal with SSL certificates; and Ghost is written in JavaScript, which brings its own scalability concerns. A Jekyll blog, on the other hand, can be hosted on GitHub, and Jekyll is a great, modern tool for writing your blog. The idea is that you store all your images and posts on GitHub.

Speed

My first question was about performance. If these are static files hosted on GitHub, surely they must be extremely slow? No, that’s not true: according to my measurements, the new Jekyll version is even faster than the Ghost version.

Convenience

Another question I asked myself was how I would write posts. Because Jekyll has no admin GUI, I needed to find a way to write them. I find the MacDown tool very convenient for writing posts on my laptop. Another problem is previewing posts. For example, you want to see how your post will look...