How to Use Postgres to Recreate YikYak

Written by adam-fallon | Published 2020/08/26
Tech Story Tags: postgres | yikyak | database | geospatial | location-based-technology | social-network | maps | geolocation

TLDR YikYak was an anonymous social network that used your location to show you posts 5km around you. Users of the app could create new posts and the people around them could view the posts and vote up or down. Postgres enables users to incrementally add features that help us do new things with our data. The Postgres database enables us to make use of this data-structure via geospatial extensions. We can now use these built in functions from those extensions to show us the two nearby posts.via the TL;DR App

YikYak was an anonymous social network that used your location to show you posts 5km around you. Users of the app could create new posts and the people around them could view the posts and vote up or down.
YikYak filed a few patents for the tech that helped them achieve this. The patents mention segmenting users into buckets by their physical location. One modern tool we have to recreate this type of user segmentation is a data-structure called an R-Tree.
R-trees are tree data structures used for spatial access methods, i.e., for indexing multi-dimensional information such as geographical coordinates, rectangles or polygons.
Luckily the Postgres database enables us to make use of this data-structure via geospatial extensions. In this post I am going to;
Show how we can enable those extensions.Seed a few posts into our database.Find the posts in a small around a specific latitude and longitude using a SQL query.
Let’s get started!

Creating tables.

Firstly you will need an instance of Postgres. It is easy to set up in Docker (I’ve detailed a post here showing how).
I am going to be using DBeaver for this tutorial but you could use psql or any other Postgres connector. Let’s creating a new table for our posts.
Ready to go — So below we have a simple example of table for storing new posts. I am using a split latitude and longitude to show how the extensions work, but you could also combine the two into a POINT datatype if you are planning to use a lot of columns.
CREATE TABLE post ( id int8 NOT NULL GENERATED ALWAYS AS IDENTITY, post_content text NOT NULL, latitude float8 NOT NULL, longitude float8 NOT NULL );
On executing that you should have a table you can start insert values into.

Inserting posts.

So let’s start out by inserting two posts, the first posted from 10 Downing Street, and the second from Buckingham Palace.
INSERT INTO post VALUES ( default, 'I absolutely love the Queen. I hope she thinks I am doing a good job.', 51.5034, 0.1276 ); 
INSERT INTO post VALUES ( default, 'The new Prime Minister is a prat! I do hope he doesnt come over often', 51.5014, 0.1419 );
Now let’s put another post in from an aspiring politics student who is located in Cambridge University (65 miles away). Now we have an outlier that won’t show up once we do location bound queries later in this tutorial.
INSERT INTO post VALUES ( default, 'Day one of my politics degree. Shall be most fun to stalk the halls of Westminister in 4 years.', 52.2053, 0.1218 );

Installing Postgres extensions

We would like to be able to stand in St. James park (a large park between 10 Downing Street and Buckingham Palace) and see the two posts close by, but not the one from Cambridge.
So how do we do that? Through extensions! Postgres enables users to incrementally add features that help us do new things with our data.
Once they are installed we can use the latitude and longitude of 51.5032, -0.1349 to create a new select query on our posts table.
You can install extensions in Postgres simply by running a query. The two extensions we need are and earthdistance.
CREATE EXTENSION IF NOT EXISTS cube; 
CREATE EXTENSION IF NOT EXISTS earthdistance;
After executing those two queries, you should see them under the ‘Extensions’ tab in DBeaver.

Finding nearby posts.

We can now use these built in functions from those extensions to show us the two nearby posts.
SELECT * FROM post
WHERE earth_box(ll_to_earth(51.5032,-0.1349), 50000) 
@> ll_to_earth(latitude, longitude);
The earth_box function takes two parameters, a point (which is returned by the ll_to_earth function) and a value for the size of the bounding box we want which is in metres.
By using the contains? operator (@>) we are saying we only want values in the table in the bounding box generated by the earth_box function.
When executing that query we will see the two posts we were expecting! Try increasing the bounding box range out and you will be able to see the Cambridge post.
So now we have a working example of how to recreate the YikYak location-based functionality.

So…how?

Okay so why did we need those extensions? Can’t we just take the world, split it into squares and determine which box a latitude and longitude falls into?
Thats what we would like to do — but there are complications caused by the fact that the world is a sphere. To find posts “in your area” you are querying to find straight line distances between two points, your lat-long and for each row in the database. In a sphere there are no straight lines.
There is a way to determine the distance between two points known as the Great-Circle distance. Instead of using straight lines we use circles or curves known as geodesics. Through any two points on a sphere that are not directly opposite each other, there is a unique great circle.
The earthdistance extension allows us to generate queries using the contains? operator from the cube extension to generate efficient distance lookups between points.

Conclusion

One thing to note is that this query will do a sequential scan of the entire table, which can be slow once you get up to thousands of posts.
If you do decide to use this setup in your application you should create an index on the latitude, longitude to dramatically speed up queries. That would look like this.
CREATE INDEX loc_index ON post USING gist (ll_to_earth(latitude, longitude));
Postgres will then determine whether it needs to use this index to speed up queries. You can check if the index is being used by using a tool to view the execution plan when you run the query detailed above. If it says SEQ_SCAN it is not using the index.
And we’re done! If you’ve noticed any mistakes or improvements I can make please drop me an email at [email protected]

Written by adam-fallon | London Based Software Engineer
Published by HackerNoon on 2020/08/26