Distributed system design

By the time we get here, we have already understood why algorithmic complexity is essential, or maybe not yet. Nothing is impossible. When we run something huge, constant complexity is critical, and we are better off trading space to reduce computational complexity. We can survive with linear complexity, but with quadratic or exponential complexity, caching is the only option. Developing a distributed system, we are doomed to operate at the edge of each machine's computational power.

Assume we have one million daily users, or maybe two. We want to track their relations and let them publish posts. We cannot put all of this into a single Postgres instance and query the database directly every time a user wants to refresh their feed. Using one database for both media content and user metadata would also be unwise. Imagine joining users, posts, and friendship relations at such a scale if we put them into separate tables of one database. To execute join statements in logarithmic time, we need indexes, but once we add an index, we pay logarithmic time for both reads and writes. To fix that, we start to use caching. Moreover, our marketing department asks us to build a messenger on top of it, so we have to add this low-latency functionality to our heavily cached system. That would be challenging. Then we go global: our user base becomes geographically distributed, with round trips of more than 100 ms between its parts.
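
To make the caching idea concrete, here is a minimal Python sketch of a read-through cache; the load_feed_from_db callable and the 60-second TTL are illustrative assumptions, not a prescription.

    import time

    CACHE_TTL_SECONDS = 60      # assumption: a feed may be up to a minute stale
    _feed_cache = {}            # user_id -> (expires_at, cached_feed)

    def get_feed(user_id, load_feed_from_db):
        """Read-through cache: answer from memory, fall back to the database."""
        entry = _feed_cache.get(user_id)
        if entry is not None and entry[0] > time.time():
            return entry[1]                    # cache hit, constant time
        feed = load_feed_from_db(user_id)      # the expensive joins happen here
        _feed_cache[user_id] = (time.time() + CACHE_TTL_SECONDS, feed)
        return feed

We pay extra memory for every cached feed, but most refreshes never touch the indexed tables at all.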

Key-value data structure

We have a bunch of computers executing our application. It is very similar to a zoo with animals: if you let them exist by themselves, you get a jungle. So we need Zookeeper, or something of that kind. An elephant needs a giant cage, but for a parrot a small one is enough. We need to maintain order and navigate throughout our zoo. So the name of the animal could be a key, and a cage could be a value. Some parameters, such as big rocks, pools, and so on, could be stored as well. Since we already know that an elephant needs a vast cage, we can easily set up a new zoo; we don't need to specify the size next time.

The best data structure for this is the associative array. Its hash table implementation gives us constant complexity for accessing a value by a given key. It uses the random-access nature of the underlying array and some additional space to reduce the probability of hash collisions. In return, it provides us with a tool for mapping, or dimensionality reduction. It is elementary, but it is crucial to wrap our minds around this concept. Still, having such storage alone is not enough to run a distributed system; we also have to think about concurrent access. There are several ways to handle it. First of all, we should consider what proportion of reads and writes we have. If writes are rare, we can use write locking and keep values immutable. One way or another, we have to linearize computation to prevent divergence of the distributed system's state. If we have ten thousand clients writing concurrently, the system slows down whether we use a lock or optimistic locking with a verification phase.
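
Here is a minimal sketch of such storage with optimistic locking and a verification phase; the VersionedStore name and its API are made up for illustration. Readers never block, and a writer succeeds only if nobody has changed the value since it was read.

    import threading

    class VersionedStore:
        """In-memory key-value store with optimistic concurrency control."""

        def __init__(self):
            self._data = {}                  # key -> (version, value)
            self._lock = threading.Lock()    # guards only the short verification phase

        def get(self, key):
            return self._data.get(key, (0, None))

        def compare_and_set(self, key, expected_version, new_value):
            """Write only if the value has not changed since we read it."""
            with self._lock:
                current_version, _ = self._data.get(key, (0, None))
                if current_version != expected_version:
                    return False             # lost the race: re-read and retry
                self._data[key] = (current_version + 1, new_value)
                return True

    store = VersionedStore()
    version, cage = store.get("elephant")
    store.compare_and_set("elephant", version, {"size": "vast", "pool": True})

With rare writes, retries are cheap; with ten thousand concurrent writers, the retry loop itself becomes the bottleneck.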

Vector clock

Once, my curiosity brought me to the term microservice architecture. After that, I discovered event-driven architecture. Next was event sourcing. Still, I did not get why the consistency problem of distributed systems needs all those abstractions. But once Martin Fowler threw them at us, we have to deal with them. Focusing on events is another technique for fighting the consistency problem, and it is about linearizing events without raising latency. If we build a banking system, we develop client applications that generate transactions and services that accept or decline them. Let's presume we send a transaction while riding the underground; the train moves before we receive the response, so we repeat the transaction, and the payment service ends up with two transactions with different identifiers. We can attribute the current balance with the identifier of the last processed transaction, implementing a vector clock data structure with partial ordering and a compare-and-swap mechanism. That way, we can guarantee that, among all transactions generated on the client side before the previously processed one was acknowledged, we process only the first.
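
A minimal sketch of the vector clock structure follows; the class and method names are illustrative. Each participant keeps its own counter, and two events are comparable only when one clock dominates the other componentwise.

    class VectorClock:
        """Vector clock: one logical counter per node, partially ordered."""

        def __init__(self, counters=None):
            self.counters = dict(counters or {})    # node_id -> logical time

        def increment(self, node_id):
            self.counters[node_id] = self.counters.get(node_id, 0) + 1

        def happened_before(self, other):
            """True if self <= other componentwise and strictly smaller somewhere."""
            nodes = set(self.counters) | set(other.counters)
            not_greater = all(self.counters.get(n, 0) <= other.counters.get(n, 0) for n in nodes)
            smaller = any(self.counters.get(n, 0) < other.counters.get(n, 0) for n in nodes)
            return not_greater and smaller

        def concurrent_with(self, other):
            """Neither clock happened before the other: the events conflict."""
            return not self.happened_before(other) and not other.happened_before(self)

A retried transaction carries a clock that does not strictly follow the one attached to the current balance, so the compare-and-swap on the balance rejects it instead of charging twice.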

Eventual consistency

In order to boost the read throughput of storage, we can replicate data, but from the moment we have several copies of the data, we have to enforce consistency. The approach to keeping data consistent is to serialize modification operations and decline out-of-order ones. There are several ways to achieve it. Mutual exclusion, or an exclusive lock, can slow the application down by increasing the time spent waiting to enter the critical section. Optimistic locking can also hurt performance, for example through starvation. We can limit the number of servers competing for one lock, and that works fine while we are in a single datacenter with low network latency between our servers. But now parts of our system can be separated geographically, and we may face network partitioning problems. Sometimes we can accept a small divergence between the states in different datacenters, and it is fine to have consistency with a short delay in the end. But guaranteed writing to all replicas has linear complexity. We can do better.
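
The difference is easiest to see in code. A hedged sketch, assuming each replica object exposes a put method: the first function acknowledges only after every replica has the write, the second acknowledges after the local write and leaves replication to a background process.

    def write_strongly(replicas, key, value):
        """Synchronous write: linear in the number of replicas, blocks on the slowest one."""
        for replica in replicas:
            replica.put(key, value)          # must succeed everywhere before we acknowledge
        return "acknowledged"

    def write_eventually(local_replica, peers, key, value, replication_queue):
        """Eventually consistent write: acknowledge immediately, replicate asynchronously."""
        local_replica.put(key, value)
        for peer in peers:
            replication_queue.append((peer, key, value))   # drained by a background worker
        return "acknowledged"

The background queue is where a protocol such as gossiping, described next, can come in.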

Gossiping

The following scenario roughly characterizes the gossip protocol: one node selects another randomly from the pool of known peers, then the two nodes exchange information and update their state. It disseminates data across servers, making our replication process smart and reliable, with logarithmic complexity. It is based on the peer-to-peer principle, avoiding the need for a single master or a single point of failure. There are also underlying concepts mitigating contention over lock acquisition: fine-grained and coarse-grained locking. We reckon that our services may live in different datacenters, and sometimes a whole datacenter may be down, yet we should still be able to serve clients. That is how, by employing the right algorithms, we can build a highly available system.
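
A rough sketch of one gossip round under simple assumptions: a Node holds a key -> (version, value) map, there are at least two nodes, and conflicts are resolved by keeping the newer version.

    import random

    class Node:
        def __init__(self, name):
            self.name = name
            self.state = {}                  # key -> (version, value)

    def merge(a, b):
        """Keep the newer version of every key (last-writer-wins)."""
        result = dict(a)
        for key, (version, value) in b.items():
            if key not in result or result[key][0] < version:
                result[key] = (version, value)
        return result

    def gossip_round(nodes):
        """Every node exchanges and merges state with one randomly chosen peer."""
        for node in nodes:
            peer = random.choice([n for n in nodes if n is not node])
            merged = merge(node.state, peer.state)
            node.state = dict(merged)
            peer.state = dict(merged)

An update injected at one node reaches all n nodes after roughly log(n) such rounds with high probability, and no node plays the role of a master.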

Conclusion: hiding complexity

The service mesh concept was introduced to separate the complexity of the distributed system from the business logic. It encapsulates the application business logic in the data plane and everything else in the control plane. The control plane abstracts service interconnection away from the developer, configures the topology, and organizes application monitoring and maintenance. That is how a developer no longer needs to spend time going deep under the hood of Zookeeper, Chubby, or etcd, and can immediately use Kafka, Kubernetes, Hadoop, and other ready-to-use, out-of-the-box solutions.