Thursday, November 16, 2017

Replicated Object. Part 3: Subjector Model

Parallel execution

Preface

This article continues the series on asynchrony:

  1. Asynchronous Programming: Back to the Future.
  2. Asynchronous Programming Part 2: Teleportation through Portals.

After 3 years, I have decided to expand and generalize the available spectrum of coroutine-based asynchronous interactions. In addition to these articles, I also recommend reading the article on the God Adapter:

  1. Replicated Object. Part 2: God Adapter.

Introduction

Consider an electron. What do we know about it? It is a negatively charged elementary particle, a lepton with some mass. This means that it can participate in at least electromagnetic and gravitational interactions.

Saturday, August 19, 2017

Kinetics of Large Distributed Clusters

Summary

  1. Martin Kleppmann's fatal mistake.
  2. Physicochemical kinetics does mathematics.
  3. The half-life of the cluster.
  4. We solve nonlinear differential equations without solving them.
  5. Nodes as a catalyst.
  6. The predictive power of graphs.
  7. 100 million years.
  8. Synergy.

In the previous article, we discussed Brewer's article and Brewer's theorem in detail. This time we analyze Martin Kleppmann's post "The probability of data loss in large clusters".

In that post, the author attempts to model the following problem. To keep data safe, it is usually replicated. Whether erasure coding is used or not does not actually matter here. In the original post, the author fixes the probability of a single node failing and then asks: what happens to the probability of data loss as the number of nodes increases?

The answer is shown in this picture:

Data loss
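
To get a feel for the question, here is a minimal sketch. It is not Kleppmann's model: it only computes a necessary condition for data loss, assuming that nodes fail independently with the same probability and that data with replication factor r can be lost only when at least r nodes are down at once. The function name, the replication factor of 3, and the failure probability of 0.001 are illustrative assumptions, not values taken from the post.

```python
from math import comb


def prob_at_least_r_failures(n: int, r: int, p: float) -> float:
    """Probability that at least r of n nodes are down at the same time.

    Assumption: nodes fail independently, each with probability p.
    This is an upper bound on data loss for replication factor r,
    not the data-loss probability itself.
    """
    # Complement of "fewer than r nodes are down" (binomial distribution).
    fewer_than_r = sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r)
    )
    return 1.0 - fewer_than_r


# Hypothetical parameters: replication factor 3, per-node failure
# probability 0.001, growing cluster size.
for nodes in (3, 10, 100, 1000):
    print(nodes, prob_at_least_r_failures(nodes, r=3, p=0.001))
```

Even in this crude form, the bound grows with the number of nodes, which is the effect the question is about.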

Sunday, August 13, 2017

Latency of Geo-Distributed Databases

Theorem 0. The minimum guaranteed latency for a globally highly available, strongly consistent database is 133 ms.

Earth
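
A figure of this size can be checked with simple arithmetic. The sketch below assumes (the excerpt does not state it) that the bound corresponds to one round trip at the speed of light in vacuum between two antipodal points on the Earth's surface, with the half-circumference rounded to 20,000 km. The constant and function names are illustrative.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum, km/s
HALF_CIRCUMFERENCE_KM = 20_000.0   # assumption: rounded antipodal distance


def round_trip_ms(distance_km: float,
                  speed_km_s: float = SPEED_OF_LIGHT_KM_S) -> float:
    """Round-trip time in milliseconds over distance_km at speed_km_s."""
    return 2.0 * distance_km / speed_km_s * 1000.0


print(f"{round_trip_ms(HALF_CIRCUMFERENCE_KM):.1f} ms")  # prints 133.4 ms
```

Under these assumptions the result lands at about 133 ms; real links are slower, since light in fiber travels at roughly two thirds of this speed and cables do not follow great circles.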

1 Abstract

The article introduces, step by step, the formal vocabulary and the auxiliary lemmas and theorems needed to prove the main Theorem 0. Finally, as a consequence, the CAL theorem is formulated.

2 Introduction

Modern applications work intensively with huge amounts of data, including both massive transactional processing and analytical research. In response to this demand, a new generation of databases has appeared: NewSQL databases. These databases provide the following important characteristics: horizontal scalability, geo-availability, and strong consistency.

The NewSQL era opens new possibilities for storing and processing so-called Big Data. At the same time, an important question arises: "How fast can these databases be?" Improving performance and latency is a very challenging task because it involves almost every layer of a database: from hardware questions about data center connectivity and availability to sophisticated software algorithms and architectural design.

Thus, we need to understand how far latency can be optimized and which limitations we have to deal with. The article tries to answer that question.

Saturday, March 4, 2017

CAP Theorem Myths

Introduction

The article explains the most widespread myths about the CAP theorem. One of the reasons for writing it is to analyze the recent article "Spanner, TrueTime & The CAP Theorem" and to clarify the terms that the theorem involves and that are widely discussed in different contexts.

We turn to that article closer to the end, armed with the necessary concepts and knowledge. Before that, we analyze the most common myths associated with the CAP theorem.