How Cell-Based Architecture Helps Big Systems Scale

by Raju Ansari, May 10th, 2025

Too Long; Didn't Read

As systems scale, availability and fault tolerance become harder to maintain. Cell-based architecture addresses these challenges by breaking services into self-contained "cells," each with its own dependencies and traffic routing. This design improves resilience, simplifies scaling, and reduces deployment risk. It’s a proven model for building modern distributed systems that are safer, faster, and more scalable.

One of the biggest challenges in modern systems is operating at scale. As systems grow, they must meet rising demands for high availability, scalability, and resiliency. To address these needs, we rely on techniques like horizontal and vertical scaling, caching, and redundancy. However, treating a service as a single unit introduces risk: any software bug or operational error can bring down the entire system, which can be disastrous. In this article, we explore cell-based architecture, a design approach that improves resiliency, boosts availability, and lets a service keep scaling out by adding cells.

Why is Adopting Cell-Based Architecture Important?

By adopting cell architecture in distributed systems, you’ll enjoy:


  1. High Availability: Cells enhance system availability by limiting the impact of each failure rather than the number of failures. A system with n cells will have roughly n times as many failure events, but each affects only about 1/n of the traffic.


  2. High Scalability: Cell-based architecture enables horizontal scaling, allowing services to scale out rather than up. This makes it especially effective in scenarios where quota limits on dependent services become a bottleneck.


  3. System Resiliency: Cells are self-contained service images, so an issue or outage in one cell doesn’t impact another cell. This design also helps increase the mean time between failures and shorten the mean time to recovery.


  4. Safer Deployment: Cells promote phase-wise deployments, where a change is rolled out to one cell (or a few cells) at a time. This reduces the blast radius of a problematic deployment.
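
The "1/n of the impact" arithmetic from the first point can be made concrete with a small sketch. It assumes customers are spread evenly across cells, which the article does not guarantee; the class and method names are hypothetical.

```java
// Hypothetical helper: estimates how many customers one failing cell affects.
final class BlastRadius {
    // Customers affected by a single-cell failure, assuming an even spread
    // of customers across cells (an assumption, not a guarantee).
    static long affectedCustomers(long totalCustomers, int cellCount) {
        if (cellCount <= 0) {
            throw new IllegalArgumentException("cellCount must be positive");
        }
        // Ceiling division: the fullest cell holds at most this many customers.
        return (totalCustomers + cellCount - 1) / cellCount;
    }
}
```

Under these assumptions, 10,000 customers in a single cell means one failure affects all 10,000, while the same customers spread over 10 cells means one cell failure affects about 1,000.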

What is Cell-Based Architecture?

In cell-based architecture, a service is made up of two things:

  1. Many cells, where each cell is a complete replica of your service with all of its dependencies, and

  2. A routing layer that routes traffic to cells. Each customer request goes to a dedicated cell, chosen by a routing algorithm or mapping structure. Below is an example to help illustrate this.


Consider your mobile device connected to a mobile tower. This tower serves the devices in a limited range. If another tower malfunctions, it doesn't impact your mobile service as each tower forms a dedicated cell. Moreover, if your tower faces any issue like an outage, the mobile device will reconnect to another nearby tower. This ensures uninterrupted service, leading to enhanced service availability & resiliency.


Cell-based architecture in software works similarly. Each cell serves a limited set of users, which contains service degradation within that cell and helps achieve high scalability. The diagram below depicts a real-world example of cell architecture.


Real-world example of cell architecture.

Cell-Based Architecture Design

As discussed in the previous section, a service comprises:

  1. many cells and

  2. a routing layer.


Cell architecture design appears pretty straightforward. However, several factors make it challenging, and the technique must be applied judiciously. In the real world, most services are not designed around cells from inception. They evolve as they grow, and at the point where they can’t be scaled further, we adopt cell architecture. It is better to incorporate this approach from day one of designing your service, which avoids later scaling issues and re-architecture challenges such as migration, backward compatibility, and service operation. Below is a high-level design of a cell-based architecture of a service.


An illustration of a cell-based architecture service.

Let’s look at the building blocks of cell-based architecture.

Cells

A cell is an instance of your service that comprises:

  1. All the dependencies and their interconnections.

  2. Storage. Each cell has a fixed maximum capacity; for example, it can serve a maximum of 1000 customers. To serve increased demand, spin up a new cell.
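
The fixed-capacity idea above can be sketched as follows. This is a minimal in-memory illustration, not a provisioning system: `CellAllocator` and its methods are hypothetical names, and "spinning up" a cell is reduced to adding a counter to a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assigns customers to cells that have a fixed maximum capacity.
final class CellAllocator {
    private final int maxCustomersPerCell;
    // customersPerCell.get(i) is the number of customers currently in cell i.
    private final List<Integer> customersPerCell = new ArrayList<>();

    CellAllocator(int maxCustomersPerCell) {
        if (maxCustomersPerCell <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.maxCustomersPerCell = maxCustomersPerCell;
    }

    // Returns the index of the cell that receives the new customer,
    // creating a new cell when every existing cell is full.
    int assignCustomer() {
        for (int cell = 0; cell < customersPerCell.size(); cell++) {
            int current = customersPerCell.get(cell);
            if (current < maxCustomersPerCell) {
                customersPerCell.set(cell, current + 1);
                return cell;
            }
        }
        customersPerCell.add(1); // stand-in for provisioning a real cell
        return customersPerCell.size() - 1;
    }

    int cellCount() {
        return customersPerCell.size();
    }
}
```

With a capacity of 1000, the first 1000 calls to `assignCustomer()` return cell 0 and the next call creates cell 1.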


The diagram below shows a service with all the dependencies (and without any cells). It is still a scalable service employing horizontal/vertical scaling, caching, etc.

A service with dependencies (but without any cells)


Below is the cellular architecture of the same service. We have the same service replica in two cells (accounts) with all the dependencies. A request to the service is routed to one of the cells based on the routing algorithm.


Cellular architecture of the same service in the previous figure.

Routing Layer

This is a thin layer that handles request routing. It maps the customer’s request to a dedicated cell. For example, you can apply a modulo operation to the hash value of the request to pick a cell. Usually, we keep this layer as light as possible so that it adds minimal latency and few additional points of failure to request processing.

Routing Key and Algorithms

The routing key is the dimension on which cells are allocated. It can be userId, accountId, or a hash of a combination of fields in the request. Choose attributes whose values are well distributed, so that no single cell receives a disproportionate share of the traffic.
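
As a rough sketch of "a hash of a combination of fields", the example below derives a routing key from two request fields. The field names (`accountId`, `region`) and the choice of CRC32 are assumptions made for illustration; any stable, well-distributed hash would serve.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: derives a routing key from request fields.
final class RoutingKeys {
    // Combines two fields into one non-negative key in the range 0 .. 2^32 - 1.
    // The unit-separator character keeps ("ab", "c") distinct from ("a", "bc").
    static long routingKey(String accountId, String region) {
        CRC32 crc = new CRC32();
        crc.update((accountId + '\u001F' + region).getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // Maps a routing key to a cell index in the range 0 .. cellCount - 1.
    static int cellFor(long routingKey, int cellCount) {
        return (int) (routingKey % cellCount);
    }
}
```

Because the key depends only on the field values, the same request always lands in the same cell as long as the cell count is unchanged.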


Routing algorithms range from a simple modulo operation to a database lookup. Some of the algorithms that you can employ are as follows:


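  • Modulo mapping: Use the modulo operator to map a routing key to a cell.
// Assumes routingKey is non-negative; getNumberOfCell() returns the current cell count.
int getCell(int routingKey) {
  return routingKey % getNumberOfCell();
}


  • Consistent hashing: Use a consistent hash function, so that only a small share of keys move when the number of cells changes.
int getCell(int routingKey) {
    return consistentHash(routingKey, getNumberOfCell());
}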


  • Table Mapping: Use a mapping table that maps routing keys to cells.
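
The `consistentHash` helper used in the list above is not defined in the article. One common way to realize it is a hash ring with virtual nodes, sketched below under several assumptions: the class name, the string-typed routing key, the use of CRC32 as the ring hash, and the virtual-node count are all illustrative choices, not the author's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical sketch of a consistent hash ring that maps routing keys to cells.
final class ConsistentHashRing {
    // Ring position -> cell index. Each cell owns several positions (virtual nodes).
    private final TreeMap<Long, Integer> ring = new TreeMap<>();
    private final int virtualNodesPerCell;

    ConsistentHashRing(int cellCount, int virtualNodesPerCell) {
        this.virtualNodesPerCell = virtualNodesPerCell;
        for (int cell = 0; cell < cellCount; cell++) {
            addCell(cell);
        }
    }

    void addCell(int cell) {
        for (int v = 0; v < virtualNodesPerCell; v++) {
            ring.put(hash("cell-" + cell + "#" + v), cell);
        }
    }

    void removeCell(int cell) {
        ring.values().removeIf(owner -> owner == cell);
    }

    // Walks clockwise from the key's position to the next virtual node,
    // wrapping around to the first node at the end of the ring.
    int getCell(String routingKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no cells on the ring");
        }
        Map.Entry<Long, Integer> entry = ring.ceilingEntry(hash(routingKey));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    private static long hash(String value) {
        CRC32 crc = new CRC32();
        crc.update(value.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }
}
```

The benefit over plain modulo mapping is that removing a cell only reassigns the keys that lived on that cell; keys on the other cells keep their mapping.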




The most common approaches are hashing and table mapping. Table mapping allows you to override the mapping for individual routing keys. However, it comes with the challenge of refreshing the map whenever the number of cells in the architecture changes. We can also use the routing algorithm to dedicate a cell to a large customer or a test environment, which addresses the noisy neighbor problem.
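
A minimal sketch of table mapping with overrides might look like the following. It assumes a hypothetical design in which pinned keys are looked up in a table and every other key falls back to modulo mapping across the shared cells; the class and method names are made up for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: table mapping with per-key overrides and a hash fallback.
final class TableRouter {
    // Routing key -> dedicated cell, for keys that must not share a cell.
    private final Map<String, Integer> overrides = new ConcurrentHashMap<>();
    private final int sharedCellCount;

    TableRouter(int sharedCellCount) {
        if (sharedCellCount <= 0) {
            throw new IllegalArgumentException("sharedCellCount must be positive");
        }
        this.sharedCellCount = sharedCellCount;
    }

    // Pins a routing key (for example, a large customer) to a specific cell.
    void pin(String routingKey, int cell) {
        overrides.put(routingKey, cell);
    }

    // Pinned keys go to their dedicated cell; all others are spread
    // over the shared cells 0 .. sharedCellCount - 1.
    int getCell(String routingKey) {
        Integer pinned = overrides.get(routingKey);
        if (pinned != null) {
            return pinned;
        }
        return Math.floorMod(routingKey.hashCode(), sharedCellCount);
    }
}
```

For instance, with four shared cells (0 to 3), calling `pin("big-customer", 4)` sends that customer to a fifth, dedicated cell while everyone else continues to share cells 0 to 3.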

Cell Migration

In some situations, we may need to perform cell migration if our services are stateful. It is always advisable to make your service stateless to avoid this situation. If migration is unavoidable, perform it carefully to avoid any service disruption. Migration involves three steps:

  1. Moving the data from one cell to another,

  2. shifting the traffic to the new cell, and

  3. cleaning up the data from the older cell.


We start by copying the data from one cell to another, then we update the routing logic to shift the traffic. Once the traffic is completely shifted to the new cell, we clean up the data from the older cell.
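
The three steps can be sketched with in-memory maps standing in for cell storage and the routing table. Everything here is hypothetical (class, fields, method), and the sketch deliberately ignores a hard part of real migrations: writes that arrive in the old cell while the copy is in progress.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the copy / shift / clean-up sequence for one customer.
final class CellMigration {
    // Cell id -> (customer id -> customer data); a stand-in for per-cell storage.
    final Map<Integer, Map<String, String>> cellStorage = new HashMap<>();
    // Customer id -> cell id; a stand-in for the routing layer's mapping table.
    final Map<String, Integer> routingTable = new HashMap<>();

    void migrate(String customerId, int fromCell, int toCell) {
        Map<String, String> source = cellStorage.get(fromCell);
        if (source == null || !source.containsKey(customerId)) {
            throw new IllegalArgumentException("customer not found in source cell");
        }
        Map<String, String> target = cellStorage.computeIfAbsent(toCell, id -> new HashMap<>());

        // Step 1: copy the data to the new cell.
        target.put(customerId, source.get(customerId));
        // Step 2: shift traffic by updating the routing table.
        routingTable.put(customerId, toCell);
        // Step 3: clean up the data in the old cell once traffic has moved.
        source.remove(customerId);
    }
}
```

Keeping the old copy until the routing change has taken effect means a failed migration can be abandoned after step 1 without losing data.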

Tips and Tricks

  1. Design stateless service components. Stateless services are easy to maintain in cell architecture, whereas stateful services add cell migration overhead.
  2. Configure cells based on the regional traffic. You don’t need to have the same number of cells in each region. For example, you can have 10 cells in the US region and 5 cells in Asia based on the demand.
  3. Carefully choose the attributes for the routingKey.
  4. Incorporate cells from day 1 in your design.
  5. Define the lightest possible component for the routing layer.
  6. Design carefully for cell migration.
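
Tip 2 could be expressed as a small per-region configuration, as in the sketch below. The region names and cell counts come from the example in that tip; the class, the map, and the `"US-3"` style cell naming are assumptions for illustration.

```java
import java.util.Map;

// Hypothetical sketch: a different number of cells per region.
final class RegionalCells {
    // Cell counts per region, taken from the example in the tips above.
    static final Map<String, Integer> CELLS_PER_REGION = Map.of("US", 10, "ASIA", 5);

    // Returns a cell name such as "US-3" for a routing key within a region.
    static String getCell(String region, long routingKey) {
        Integer cellCount = CELLS_PER_REGION.get(region);
        if (cellCount == null) {
            throw new IllegalArgumentException("unknown region: " + region);
        }
        // floorMod keeps the index non-negative even for negative keys.
        return region + "-" + Math.floorMod(routingKey, (long) cellCount);
    }
}
```

The same routing key can map to different cell indexes in different regions, since each region applies its own cell count.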

Conclusion

Cell-based architecture is a powerful and proven way to make systems more resilient and highly scalable. It lets a service keep scaling even when a dependent service's quota is the limiting factor. The pattern is simple, but it requires careful service design so that cells operate independently. I will present a case study of adopting cell architecture to address a scaling issue after launching and running the service in production for 6 months.

Sources

https://docs.aws.amazon.com/wellarchitected/latest/reducing-scope-of-impact-with-cell-based-architecture/what-is-a-cell-based-architecture.html


