paint-brush
Getting started with Distributed SQLby@alejandroduarte
628 reads
628 reads

Getting started with Distributed SQL

by Alejandro DuarteJanuary 9th, 2023
Read on Terminal Reader
Read this story w/o Javascript
tldt arrow

Too Long; Didn't Read

Distributed databases allow you to scale reads and writes by adding nodes to the cluster while keeping all the benefits of relational databases. A key feature of a distributed SQL database is that it makes a cluster look like a single logical database. MariaDB Xpand rebalances the data when a node goes down, a new node is added, or a hotspot is detected.
featured image - Getting started with Distributed SQL
Alejandro Duarte HackerNoon profile picture


A distributed SQL database allows you to scale reads and writes by adding nodes to the cluster while keeping all the benefits of relational databases. A key feature of a distributed SQL database is that it makes a cluster look like a single logical database. Applications don’t need to know the number of nodes or implement sharding logic.

MariaDB Xpand

MariaDB Xpand is a distributed SQL database compatible with MariaDB in which SQL tables are automatically partitioned based on the hash of a subset of columns (the primary key by default). Each partition forms a slice of data that is placed in a node. When a write happens, the hash is calculated, and the operation is forwarded to the correct node. MariaDB Xpand rebalances the data when a node goes down, a new node is added, or a hotspot is detected. All this happens automatically and transparently to the application.


Multiple writes can be sent in parallel to multiple nodes. Reads are forwarded to the nodes that contain the slices with the data and can run in parallel minimizing lag. This architecture is able to handle large amounts of data and maintain high levels of performance that surpass non-distributed databases. The query is sent to the data instead of data to the query.


The following diagram is an example of a MariaDB Xpand database cluster with three nodes. Each slice is shown in a different color. The diagram also shows how replicas are automatically maintained for high availability:


A distributed SQL database with three slices


During a scale-out process, a new node is added (xpand_4 in the following figure). MariaDB automatically rebalances the data to increase the overall capacity of the system:


MariaDB Xpand database after an automatic rebalancing of data


Distributed SQL is the best alternative to manual database sharding. It frees DevOps engineers and DBAs from tedious management operations such as data rebalancing. It also frees developers from implementing and maintaining custom sharding logic that increases the complexity of queries (especially with cross-node joins). A developer can simply write a SQL query as if it was a single-node database.


Getting started with Docker

The best way to get started with Distributed SQL is to try it out. Docker makes this process extremely simple. To spin up a single-node MariaDB Xpand instance suitable for testing and experimentation using Docker, run the following:


docker run --rm -p3306:3306 --ulimit memlock=-1 --name xpand mariadb/xpand-single


You can connect to the database as if it was a single-node MariaDB server. See this website to learn how to connect from multiple programming languages (Java, JavaScript/NodeJS, Python, C++). You can also connect from SQL clients such as DBeaver, DataGrip, or any IDE extensions for SQL databases.


Once you have tried your application with a single-node Xpand instance, the best way to test its scalability and other advanced features (like automatic scaling and columnar indexing for analytics) is to head through SkySQL, a cloud service that offers a free tier (no credit card required).


There also are plenty of online resources to learn more about Distributed SQL and MariaDB Xpand: