This post covers the design philosophy, components, and architecture of Kafka, i.e. everything that matters!

Before we delve deeper, let’s understand what stream processing is. Jay Kreps, who built Kafka with other engineers while working at LinkedIn, explains stream processing here.

Let’s cover the programming paradigms:

This post focuses on a particular type of recursive problem: SELECTION! This is one of the most commonly occurring recursion (and Dynamic Programming) patterns. The post discusses some of the famous problems in this category and the approach to solving them.

In the Selection pattern, we find all combinations of the given input that match certain criteria. As the name suggests, we simply include or exclude (i.e. select) each item in our input list and add it to the (potential) response list. Finding all the combinations that satisfy the given criteria is the core of the selection problem.
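The include/exclude idea can be sketched as a small backtracking function. This is a minimal illustration, not the post's own solution: the name `select_combinations` and the `criterion` parameter are hypothetical, and the criterion is only checked once every item has been decided. Each item produces two branches, so n items yield 2^n candidate combinations, which is why pruning or Dynamic Programming matters for larger inputs.

```python
from typing import Callable, List

def select_combinations(
    items: List[int],
    criterion: Callable[[List[int]], bool] = lambda combo: True,
) -> List[List[int]]:
    """Return every combination of `items` for which `criterion` holds."""
    result: List[List[int]] = []
    chosen: List[int] = []

    def select(i: int) -> None:
        if i == len(items):
            # Every item has been included or excluded: one full combination.
            if criterion(chosen):
                result.append(chosen.copy())
            return
        select(i + 1)            # exclude items[i]
        chosen.append(items[i])  # include items[i]
        select(i + 1)
        chosen.pop()             # undo the choice before returning (backtrack)

    select(0)
    return result

print(select_combinations([1, 2, 3]))
# [[], [3], [2], [2, 3], [1], [1, 3], [1, 2], [1, 2, 3]]
print(select_combinations([1, 2, 3, 4], lambda combo: sum(combo) == 5))
# [[2, 3], [1, 4]]
```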

Subset Problem

Generate all combinations…

Consensus is a fundamental problem in fault-tolerant distributed systems: it involves multiple servers agreeing on values. Once the servers reach a decision on a value, that decision is final. It is also referred to simply as agreement; the variant that tolerates arbitrarily misbehaving (Byzantine) nodes is called Byzantine agreement. It’s called fundamental because the system depends on it to function correctly (for example, Kubernetes will not work properly if etcd, which uses the Raft consensus algorithm, fails). Consensus lets us get the reliability and performance of a distributed system without having to deal with the consequences of distribution (e.g. disagreement or divergence between nodes). This post gives a very high-level overview of consensus patterns.
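To make "agreeing on a value" concrete, here is a deliberately simplified sketch of the majority-quorum rule that protocols such as Raft and Paxos build on. The function names are hypothetical, and the sketch only checks the decision rule: it does not implement leader election, terms or ballots, or how nodes come to accept values.

```python
from collections import Counter
from typing import Optional, Sequence

def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of an n-node cluster."""
    return n // 2 + 1

def decided_value(votes: Sequence[Optional[str]]) -> Optional[str]:
    """Return the value accepted by a majority of nodes, or None.

    votes[i] is the value node i has accepted, or None if node i has not
    responded (e.g. it crashed).
    """
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None
    value, count = counts.most_common(1)[0]
    return value if count >= majority(len(votes)) else None

# Five-node cluster: three nodes accepted "x", one crashed, one accepted "y".
print(decided_value(["x", "x", "x", None, "y"]))   # x
print(decided_value(["x", "x", None, None, "y"]))  # None (no majority yet)
```

The majority threshold is what makes a decision final: any two majorities of the same cluster overlap in at least one node, so two different values can never both be decided.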


This is a series of posts that tries to uncover the common patterns in Distributed Data Systems (referred to in the post as DDS). The objective is to balance theory and practicality, covering the major aspects in the quickest and crispest possible way. DDS refers to a wide range of systems like Cassandra, DynamoDB, HBase, Kafka, Redis, Couchbase, ZooKeeper, Elasticsearch, and even clustered relational DBs. In short, it covers any system that deals with data (irrespective of the data model, format/representation, or any other constraints) in a distributed environment. …

Partitioning (part of the DDS Patterns series) is about dividing the data set evenly and spreading it among multiple machines. What happens if a node goes down and we have only partitioning in place? Reads and writes for the data on that node stop, since there is no data redundancy. So the solution is to keep multiple copies, or replicas, of the data set.

Replication is about keeping a copy of the same data set on multiple machines. Each node that stores a copy is known as a replica. It helps to achieve high availability and durability. One of the most fundamental aspects of replication is…
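A minimal sketch of why replicas keep data available when a node fails, assuming a simple placement rule: the key's primary node is picked by hashing, and the remaining replicas go to the next nodes in ring order (loosely similar to Cassandra's SimpleStrategy). The node names, key, and helper functions are hypothetical.

```python
import hashlib
from typing import List, Set

def primary_index(key: str, num_nodes: int) -> int:
    # md5 is used only for a stable, evenly spread hash (Python's built-in
    # hash() of a str changes between processes).
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def replicas_for(key: str, nodes: List[str], replication_factor: int) -> List[str]:
    """Primary node for `key` followed by the next nodes in ring order."""
    if not 1 <= replication_factor <= len(nodes):
        raise ValueError("replication_factor must be between 1 and len(nodes)")
    start = primary_index(key, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]

def is_readable(key: str, nodes: List[str], replication_factor: int, down: Set[str]) -> bool:
    """A key stays readable while at least one of its replicas is up."""
    return any(node not in down for node in replicas_for(key, nodes, replication_factor))

nodes = ["node-a", "node-b", "node-c", "node-d"]
owners = replicas_for("user:42", nodes, 3)
failed = {owners[0]}  # take the key's primary node down
print(is_readable("user:42", nodes, 1, failed))  # False: partitioning only, no copies
print(is_readable("user:42", nodes, 3, failed))  # True: two replicas remain
```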

This post talks briefly about Partitioning, one of the patterns used by Distributed Data Systems (DDS). It is how DDS apply the famous divide-and-conquer approach to solve the problem of scale. Along with scale, partitioning also helps achieve fault tolerance and low latency. Partitioning is also referred to as Sharding.

There is a limit to the amount of data that can be put in a single node. And even if we can put all the data in a single node, what if the cost of doing so is very high and we are not able to achieve the required…
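One common way to divide the data set evenly is hash partitioning. The sketch below assumes a stable hash of the key modulo a fixed partition count; `partition_for`, the key format, and the partition count are illustrative choices, not taken from the post.

```python
import hashlib
from collections import Counter

def partition_for(key: str, num_partitions: int) -> int:
    """Hash partitioning: a stable hash of the key, modulo the partition count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

NUM_PARTITIONS = 4
keys = [f"user-{i}" for i in range(10_000)]
load = Counter(partition_for(k, NUM_PARTITIONS) for k in keys)
print(sorted(load.items()))  # each partition gets close to 10,000 / 4 keys
```

A caveat of plain modulo hashing: changing `NUM_PARTITIONS` remaps most keys, which is why systems like Cassandra use consistent hashing to limit data movement when nodes join or leave.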

Distribution means we have more than one node. The nodes could sit side by side on the same rack in a data center, or in two data centers far apart (say, one in the USA and one in India). These nodes serve requests from clients, and they also talk to each other for replication and for maintaining cluster health (like gossip in Cassandra). This page deals with the limitations imposed by embracing distribution.
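Distance alone sets a floor on latency, no matter how good the software is. A back-of-the-envelope sketch, assuming light in optical fiber covers roughly 200 km per millisecond (about two-thirds of its speed in a vacuum); the distances below are hypothetical round numbers, not measurements. Real paths are longer than a straight line and add switching and queueing delays, so actual round trips are higher.

```python
# Assumption: signal speed in optical fiber, roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0

def min_round_trip_ms(distance_km: float) -> float:
    """Physical lower bound on one request/response round trip over fiber."""
    return 2 * distance_km / FIBER_KM_PER_MS

# Hypothetical distances, for illustration only.
print(min_round_trip_ms(1))       # 0.01  (nodes in the same data center)
print(min_round_trip_ms(13_000))  # 130.0 (a rough USA-India distance)
```

Any protocol step that needs a cross-data-center round trip (for example, synchronous replication) pays at least this cost.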

The author of Distributed Systems for Fun and Profit points out two physical limitations of distributed systems:

The post covers some of the important aspects of solving graph problems. Graph in this post refers not only to general graphs but also to their specialized variants, like binary trees, tries, etc. Here, I focus on handling these questions in interviews.

Graph and its specialized subtypes

Let’s go through these in detail:

A graph is a collection of nodes with edges between some of them. A graph can be either directed or undirected. Directed edges are like a one-way street (i.e. source → destination); undirected edges are like a two-way street. In an undirected graph, an edge…
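The one-way versus two-way street distinction shows up directly in an adjacency-list representation. In this minimal sketch (function names are illustrative), an undirected edge is simply stored in both directions, and a breadth-first search shows how that changes which nodes are reachable.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

def build_graph(edges: Iterable[Tuple[str, str]], directed: bool) -> Dict[str, List[str]]:
    """Adjacency list; an undirected edge is stored in both directions."""
    adj: Dict[str, List[str]] = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
        if not directed:
            adj[dst].append(src)
    return adj

def reachable(adj: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search: every node reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

edges = [("A", "B"), ("B", "C")]
print(sorted(reachable(build_graph(edges, directed=True), "C")))   # ['C']
print(sorted(reachable(build_graph(edges, directed=False), "C")))  # ['A', 'B', 'C']
```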

The master and node instances (nodes were formerly known as minions) form the Kubernetes cluster. This post focuses mainly on one of the most important services that runs on a node, the Kubelet. The post covers the setup/installation of the Kubelet and the deployment of Pods.

Before jumping into the Kubelet, let’s understand the Pod.

The Pod is the lowest level of abstraction in the Kubernetes world. It is a group of one or more containers that are treated as a single unit of deployment, and all containers in a Pod share the same resources, i.e. the network (IP address) and volumes.

A normal Docker container gets its own IP address; Kubernetes simplifies this further by assigning a single IP address to the Pod, shared by all of its containers. The…
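To illustrate containers deployed as one unit and sharing a volume, here is a sketch that builds a Pod manifest as a Python dict and prints it as JSON, a format `kubectl apply -f` accepts. The Pod name, container names, images, and mount paths are hypothetical; only the standard Pod fields (`apiVersion`, `kind`, `metadata`, `spec.volumes`, `spec.containers`, `volumeMounts`) are relied on.

```python
import json

def two_container_pod(name: str) -> dict:
    """Pod spec: two containers deployed together, sharing one emptyDir volume."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Volume defined once at Pod level, mounted by both containers.
            "volumes": [{"name": "shared-data", "emptyDir": {}}],
            "containers": [
                {
                    "name": "writer",
                    "image": "busybox",
                    "command": ["sh", "-c", "echo hello > /data/index.html && sleep 3600"],
                    "volumeMounts": [{"name": "shared-data", "mountPath": "/data"}],
                },
                {
                    "name": "web",
                    "image": "nginx",
                    "ports": [{"containerPort": 80}],
                    "volumeMounts": [
                        {"name": "shared-data", "mountPath": "/usr/share/nginx/html"}
                    ],
                },
            ],
        },
    }

print(json.dumps(two_container_pod("shared-demo"), indent=2))
```

Saving the output to a file and running `kubectl apply -f` on it would schedule both containers together on one node. Because they share the Pod's network, the `writer` container could also reach nginx at `localhost:80`.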

Every time you hear about a string or text, ask: what’s the encoding? Notions like “default encoding”, “plain text”, or “no encoding” might sound practical, but they have no real meaning. Whether you are designing a microservice or facing an interviewer, start by clarifying the character set (i.e. charset) or encoding of the strings. This is required because computers represent every character, from any script in human civilization, as a sequence of bits, and the same character maps to different bits under different encodings. If you want to go through encoding at a more fundamental level, I recommend this classic article. …
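A quick way to see why the encoding question matters: the same four-character string becomes different byte sequences under different encodings, and decoding with the wrong one can silently garble the text instead of failing. The sample string is arbitrary; the codecs used (`utf-8`, `latin-1`, `utf-16-le`) are standard Python codec names.

```python
text = "café"

# Same four characters, different bytes depending on the encoding.
utf8 = text.encode("utf-8")       # b'caf\xc3\xa9'  (é takes 2 bytes)
latin1 = text.encode("latin-1")   # b'caf\xe9'      (é takes 1 byte)
utf16 = text.encode("utf-16-le")  # b'c\x00a\x00f\x00\xe9\x00'

print(len(text), len(utf8), len(latin1), len(utf16))  # 4 5 4 8

# Decoding with the wrong encoding does not raise here; it garbles the text.
print(utf8.decode("latin-1"))  # cafÃ©
```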

Siddheshwar Kumar

Distributed System Engineer | ML/DL Enthusiast | Java, Python, and Go Programmer
