Introduction to Akka Cluster Sharding

Reading Time: 3 minutes

When we think of sharding or partitioning it’s typically related to databases. Databases uses sharding to improve resilience and elasticity. The Akka Toolkit provides Cluster Sharding as a way to introduce sharding in your application. Instead of distributing database records across a cluster, we’re distributing actors across nodes in the cluster, and it enables running at most one instance of a given actor at any point in time, acting as a consistency boundary.

Common use cases for cluster sharding are when you have many stateful actors that together consume more resources for example memory that can run safely on one machine and allows us to interact with them using a logical identifier without having to know the physical location in the cluster, which can change over time.

Akka Cluster Sharding consists of four main components

Entities
Shards
Shard Region
Shard Coordinator

Let us discuss each of these in detail.

Entities

The basic unit in akka cluster sharding is an actor called an entity and entities represent a specific data type. Entities has a unique identifier or Entity ID. For example, a unique device or a user. So if you have 3000 devices, you could chart them as 3000 device actors, each with a unique ID. Only one instance per ID runs in the cluster at any time. Even when we bounce to another node. There is only one entity per entity ID in the cluster. Messages are addressed to the entity ID and processed by the entity. This allows the entity to act as a single source of truth, acting as a consistency boundary for the data that it manages.

Shards

Entities are distributed in shards. Each shard manages a number of entities and creates entity actors on demand. And each shard has a unique ID mapping entities to a shard ID is how we control the distribution. And this distribution of entities is defined by a user function. Usually it’s a hashing function of the entity ID. By default the shard ID is the absolute value of the hash code of the entity ID modular, the total number of shards, which again is user define.

Shard Region

Shards gets distribute into different shard regions. Each shard region contains a number of shards. For a type of entity, there is usually one shard region per JVM. A shard region will look up the location of the shard for the entity the first time when it doesn’t already know its location, and then forwards the messages to the appropriate node region, and the entity. There are two types of regions:

A non proxy and
A proxy mode.

The non proxy mode is the one that will share a region that will actually create the entities, and the proxy mode is the mode where no entities will create. It’s simply the proxy, which will forward and send the messages onto the appropriate shard region.

Shard coordinator

The shard coordinator is responsible to manage shards, it’s a cluster singleton. And this will monitor all cluster nodes, and their status. It’s responsible for ensuring that the system knows where to send messages addressed to a specific entity. And it decides which shard gets to live in which region, which is to stay on which node. A customizable shard allocation strategy decides where to allocate a shard. Cluster sharding comes with the default version.

If the shard coordinator itself goes down, as a result, it starts up another node and replaces its state. And this is backed either by akka distributed data, which is custom crdt by default, or you can use akka persistence plugin.

Conclusion

Cluster sharding is useful when you need to distribute actors across several nodes in the cluster and want to be able to interact with them using their logical identifier, but without having to care about their physical location in the cluster, which might also change over time. Sharding has many different features and we’ll come to know about all of them in my upcoming blogs. So stay tuned!

References

https://doc.akka.io/docs/akka/2.5/cluster-sharding.html

Introduction to Akka Cluster Sharding

Entities

Shards

Shard Region

Shard coordinator

Conclusion

References

Recommend

Google released Google Ads API version 7.0

主流 DEX 五大维度对比，你最看好谁？

How to Build a Serverless Full-stack Application Using Git, Google Drive, and Pu...

AWS Serverless Services: Lambda - Knoldus Blogs

支付巨头 Visa 为何没有率先支持比特币，而选择了稳定币？

特斯拉晒奶茶外卖单卖惨，这公关是请的外包团队吗

ZKSwap已于今日发放治理代币gZKS

以太坊抛开大饼一路狂奔为哪般？

Fighting Against a Pandemic of Dread

Why Google’s cookieless data policy isn’t the only concern in search marketing

About Joyk