Why Do We Use Hexagons And Not Squares to Aggregate Location Data

by Aditi Sinha (@aditi1002), April 4th, 2020


If you are a two-degree marketplace like Uber, you serve millions of users who request rides, and your driver partners accept and fulfill those requests. A three-degree marketplace like Swiggy adds a static component (such as restaurants or stores) where delivery partners pick up the orders.

Borrowing from the quote, “Everything happens somewhere” — all of these events and actions described take place at a specific location!

Often, companies end up not leveraging the lat/long component in their data and run their analyses at the city level. But cities are too large and geographically diverse, and the parameters vary too much within them!

Area-level polygons are much more practical but still broad. They don’t have uniform shapes or sizes and change frequently. Even the zones or clusters that operations teams draw from local knowledge need regular updating and have arbitrary edges.

Making sense of your spatial data and deriving precise insights requires these analyses to become more granular and uniform.

A grid system brings that fine granularity to the table. It works brilliantly for bucketing all your lat-longs into “cells”. These cells can also be clustered to represent a particular neighborhood or area and can be aggregated at different levels.

Hence, this system becomes critical for crunching large spatial data sets to match the supply & demand fragmented across the city.

What is meant by grids?

In Spatial Data Science, we use grids made of regular polygons that repeat over a surface, edge to edge, to cover any space without overlaps or gaps. This is called tessellation. Each cell can be assigned a unique id for spatial indexing (aggregating the points inside that cell).
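To make the idea of spatial indexing concrete, here is a minimal sketch in Python. It assumes the simplest possible grid, square cells of a fixed size in degrees, and the function name `cell_id`, the cell size and the sample coordinates are all made up for illustration. Cells measured in degrees do not have equal area on the ground, so treat this as a toy and not as a production indexing scheme.

```python
import math
from collections import Counter

def cell_id(lat, lon, cell_deg=0.01):
    """Return the (row, col) index of the square cell that contains a point.

    cell_deg is the cell size in degrees (an illustrative assumption).
    """
    row = math.floor((lat + 90.0) / cell_deg)
    col = math.floor((lon + 180.0) / cell_deg)
    return row, col

# Hypothetical ride-request locations as (lat, lon) pairs.
requests = [
    (12.9716, 77.5946),
    (12.9719, 77.5941),
    (12.9352, 77.6245),
]

# Aggregate the points by the cell they fall into.
counts = Counter(cell_id(lat, lon) for lat, lon in requests)
for cell, n in counts.items():
    print(cell, n)
```

The first two points land in the same cell and are counted together, which is all that "bucketing lat-longs into cells" means.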

A wide variety of grids of different shapes have been proposed, including squares, rectangles, triangles, hexagons and diamonds. A global grid system covers the entire surface of the earth.

If you are a hyperlocal, on-demand company, grids as small as 0.5 sq. km can be very useful to run models that work in context to location and in real-time. Examples include surge pricing in high demand areas, promotions in low demand areas & distribution models of delivery folks on the ground.

What are the different grid types?

Only three regular polygons can tessellate on their own: squares, equilateral triangles & hexagons.
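A quick way to see why there are only three: copies of a regular polygon can meet around a vertex without a gap only if the interior angle divides 360 degrees evenly. The short Python check below tests that condition for polygons with 3 to 12 sides (the function name is made up for this sketch).

```python
from fractions import Fraction

def fits_around_a_vertex(n):
    """True if regular n-gons can meet at a vertex with no gap or overlap."""
    interior = Fraction(180 * (n - 2), n)  # interior angle in degrees
    # The number of polygons meeting at a vertex must be a whole number.
    return (Fraction(360) / interior).denominator == 1

regular_tilers = [n for n in range(3, 13) if fits_around_a_vertex(n)]
print(regular_tilers)  # [3, 4, 6]
```

For polygons with more than six sides, fewer than three copies would fit around a vertex, so nothing beyond the hexagon can qualify.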

1. Square grids:

The most common application of square grids occurs in raster datasets and geohashes. For the scope of this piece, we will focus on geohashes.

Geohash is a hierarchical encoding that transforms a 2D spatial point (lat & long) into a short string of letters and digits. It divides the world into a grid of 32 cells with 4 rows and 8 columns.

You can keep splitting each cell into a further grid of 32 cells. Thus, the longer the geohash string, the higher the precision! You can also identify geohashes that are close together by their common prefix: the longer the common prefix, the closer they are. (The reverse does not always hold, since two nearby points on either side of a cell boundary can have very different hashes.)

For example, the coordinate pair (57.64911, 10.40744) near the tip of the peninsula of Jutland, Denmark produces the hash u4pruydqqvj. [1]
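The encoding can be sketched in a few lines of Python. This follows the standard public geohash scheme (alternate longitude and latitude bits, then map every 5 bits to a base-32 character). The function names are chosen here for illustration and do not come from any particular library.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=11):
    """Encode a lat/lon pair as a geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)

def common_prefix_len(a, b):
    """Length of the shared prefix of two geohashes."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

print(geohash_encode(57.64911, 10.40744))     # u4pruydqqvj
print(geohash_encode(57.64911, 10.40744, 5))  # u4pru
```

Truncating a geohash gives the enclosing coarser cell, which is why `common_prefix_len` works as a rough proximity signal.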

2. Triangular grids:

Triangular grids are not very commonly used. Besides their unfamiliarity, one reason is that triangles have a large perimeter relative to their area, which makes them harder to piece together on the map.

Another reason is that each triangle shares an edge with only three adjacent triangles, which limits the number of options to move and to make connections. (Check the image below)

Moreover, hexagons and squares always have two faces parallel to each other, while for triangles there are two directions in which lines are parallel centered from the axis of movement. Thus, in a way, they are not completely symmetrical. [2]

3. Hexagonal grids:

Besides looking appealing, hexagons are more symmetric than the squares that geohashes use. Their shape is very close to a circle, which gives more accurate sampling. [3]

As a result, this system has been increasingly adopted by companies like Uber.

Fun fact: hex grids and triangular grids are duals of each other. Put a dot in the center of each hexagon and connect it to the dots in all the adjacent hexagons, and you get a triangular grid, and vice versa! [4]

Moving on the vertices of a hexagonal grid is equivalent to playing in the spaces on a triangular grid. Square grids, on the other hand, are a dual of themselves.

Why hexagons?

At Locale, one question that we get a lot from our customers is, “why are we using hexbins and not geohashes?”

Well, the choice depends on your exact use case and you would have to make some tradeoffs, no matter what you use. So, let’s take some parameters and deep dive.

Distance from Nearest Cells:
This diagram shows the distance from the center of a triangle, square & hexagon to the centers of its neighbors.

A triangle has three kinds of distance (through the edge, vertex and across the center of the edge), a square has two (across the edge & the diagonal) and a hexagon has only one, which is another reason why triangles are not really favored.
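The square and hexagon cases are easy to verify numerically. The sketch below places the neighbor centers of one cell on a unit-spaced grid and lists the distinct center-to-center distances. The triangle case is left out to keep the example short, and the helper name is made up for illustration.

```python
import math

def neighbor_distances(offsets):
    """Sorted distinct center-to-center distances, rounded to hide float noise."""
    return sorted({round(math.hypot(dx, dy), 6) for dx, dy in offsets})

# Square grid with unit spacing: 4 edge neighbors and 4 diagonal neighbors.
square = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

# Hexagonal grid with unit spacing: 6 neighbors, 60 degrees apart.
hexagon = [
    (math.cos(math.radians(60 * i)), math.sin(math.radians(60 * i)))
    for i in range(6)
]

print(neighbor_distances(square))   # [1.0, 1.414214]
print(neighbor_distances(hexagon))  # [1.0]
```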

This property makes analysis on hexagons much simpler, and they are preferred when your analysis includes aspects of connectivity or movement. [4]

All the neighbors of a hexagon form a ring around it at equal distance from its center. The kRing function provides the grid cells within distance “k” of an origin index. The diagram below shows the 1st kRing of the shaded hexagon and square.
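The idea behind a k-ring can be sketched on a flat hexagonal grid using axial (q, r) coordinates. This is not how any particular library implements kRing. It is a plain-Python illustration under that coordinate assumption, and the function names are chosen here.

```python
def hex_distance(a, b):
    """Grid distance between two hexagons given in axial (q, r) coordinates."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2

def k_ring(origin, k):
    """All cells within grid distance k of origin, including origin itself."""
    q0, r0 = origin
    cells = []
    for dq in range(-k, k + 1):
        # Limit dr so that the third cube coordinate also stays within k.
        for dr in range(max(-k, -dq - k), min(k, -dq + k) + 1):
            cells.append((q0 + dq, r0 + dr))
    return cells

ring1 = k_ring((0, 0), 1)
print(len(ring1))               # 7  (the origin plus its 6 neighbors)
print(len(k_ring((0, 0), 2)))   # 19
```

Every cell returned for k = 1 is exactly one step from the origin, whereas the equivalent 3x3 window on a square grid mixes edge neighbors and diagonal neighbors at two different distances.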

Fitting on Curved Surfaces:

A hexagonal arrangement is the densest way to pack circles, and hexagonal cells reduce edge effects. (Circles have the lowest perimeter-to-area ratio but can’t form a continuous grid.)

The more similar a polygon is to a circle, the closer the points near its border are to the center. Thus, the farthest points of a hexagon are closer to its center than those of an equal-area square or triangle.
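Both claims can be checked with basic geometry. The sketch below scales a regular triangle, square and hexagon to the same area of 1 and compares their perimeter and the distance from the center to the farthest point (the circumradius). The function name is made up for this example.

```python
import math

def unit_area_ngon(n):
    """Return (perimeter, farthest-point distance) for a regular n-gon of area 1."""
    # Area of a regular n-gon with circumradius R is (n / 2) * R^2 * sin(2*pi / n).
    R = math.sqrt(2.0 / (n * math.sin(2 * math.pi / n)))
    perimeter = 2 * n * R * math.sin(math.pi / n)
    return perimeter, R

for name, n in [("triangle", 3), ("square", 4), ("hexagon", 6)]:
    perimeter, farthest = unit_area_ngon(n)
    print(f"{name:8s} perimeter={perimeter:.3f} farthest={farthest:.3f}")
# triangle perimeter=4.559 farthest=0.877
# square   perimeter=4.000 farthest=0.707
# hexagon  perimeter=3.722 farthest=0.620

circle_perimeter = 2 * math.sqrt(math.pi)  # circle of area 1
print(f"circle   perimeter={circle_perimeter:.3f}")  # circle   perimeter=3.545
```

Of the three shapes that tessellate, the hexagon comes closest to the circle on both measures.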

Now, when large areas come into play and the curvature of the earth becomes important, hexagons are better suited to fit that curvature and suffer less distortion. [5]

Explicit Patterns in Data:

Hexagons let any curvature in the patterns of the data show easily and explicitly because they break up lines.

For rectilinear shapes like squares and rectangles, this becomes tricky. These shapes draw our attention to the straight, unbroken and parallel lines, which can obscure the patterns present in the data. Refer to the diagram below. [6]

Why geohashes?

That brings me to my next question, “When would someone use fishnets or square grids?”

Aggregation/ Division of Cells:

Different kinds of models require different granularity and that’s where aggregation and division become important.

If you need to increase the spatial resolution of a square grid, you just need to divide each cell into 4. Similarly, to aggregate, you combine four cells into one.

For hexagons, aggregations and divisions are not uniform at different scales as shown in the image below. The finer cells are only approximately contained in the parent cell. [7]

Squares are preferred over hexagons for hierarchical analysis. Combining square grids is fairly simple. No spatial operations are necessary for combining multiple grids built on the same template, since you can use matrix algebra.
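As a rough illustration of how simple this is for squares, the sketch below treats a grid as a plain list of rows. It shows the 4-to-1 aggregation and the cell-by-cell combination of two grids on the same template. The grids, their values and the function names are hypothetical, and the grid dimensions are assumed to be even.

```python
def aggregate_2x2(grid):
    """Sum each 2x2 block of child cells into one parent cell.

    Assumes the grid has an even number of rows and columns.
    """
    rows, cols = len(grid), len(grid[0])
    parent = [[0] * (cols // 2) for _ in range(rows // 2)]
    for r in range(rows):
        for c in range(cols):
            parent[r // 2][c // 2] += grid[r][c]
    return parent

def add_grids(a, b):
    """Combine two grids built on the same template, cell by cell."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

# Hypothetical order counts per cell for two time windows.
morning = [
    [1, 0, 2, 1],
    [0, 3, 0, 0],
    [4, 1, 1, 1],
    [0, 0, 2, 2],
]
evening = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [2, 0, 0, 1],
    [1, 1, 1, 0],
]

print(aggregate_2x2(morning))                       # [[4, 3], [5, 6]]
print(aggregate_2x2(add_grids(morning, evening)))   # [[6, 5], [9, 8]]
```

The parent of cell (r, c) is simply (r // 2, c // 2), so no geometry is involved at any step.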

Very Intuitive & Familiar:

We also think in terms of “squares”. Up, down, left, right are simple to understand. We have built cities and civilizations on squares and rectangles. Since our primary coordinate system is based on squares, people find it difficult to work with other systems.

Square grids are also sometimes used for connectivity analysis, as each cell has eight neighbors (including diagonals).

Some Real-Life Examples

Hexagons are found widely in nature, for example in honeycombs, graphite, benzene and silicene. Chinese Checkers is played on a hex grid and several variants of chess have also been invented for a hex board.

The hex grid is a distinguishing feature of the games from many wargame publishers, and a few other games (such as The Settlers of Catan)! [8]

About Locale:

At Locale, we are building an “operational” analytics platform using location data for supply and operations teams in on-demand companies. If you want to delve further, check our website out or get in touch with me on LinkedIn or Twitter.



References:

If you wish to read more, check out the links to delve further:

Previously published at https://blog.locale.ai/spatial-modelling-tidbits-honeycomb-or-fishnets/