Data Mastery: SQL — Full Join

Written by laurenjglass9 | Published 2018/11/11
Tech Story Tags: programming | data-science | data | sql | interview-questions

TLDRvia the TL;DR App

Data Mastery: SQL — Full Join

There are so many exciting projects out there in the Data World. Artificial Intelligence, Machine Learning, Neural Nets, Blockchain, and more are sweeping the technology industry. In order to get to the cutting-edge stuff, first and foremost, data needs to be stored, evaluated, and tested. The best place to do that is SQL (or a library that operates with SQL-like commands, see my article on Python’s Pandas library).

This series Data Mastery: SQL, will teach you the essential subjects. These are not exhaustive tutorials. Instead they are focused preparation guides — with brevity and efficiency in mind. It is meant for:

  • Software Engineers who want to analyze their creation’s data
  • Product Managers, Marketers, and others who want to be data-driven
  • Beginning Data Scientists, Data Engineers, Data Analysts, or Business Intelligence Developers preparing for interviews

See my resource list of the books I used to prepare for my big interview

Each article will contain a brief technical explanation of the subject, an example question, and an answer. Follow up articles will contain challenging questions so you can evaluate your learning.

This series does not come with accompanying data sets. The advantage to this is when you are on the drawing board, whether in an interview or project design, you do not have test data to play with. You have to think abstract.

Full Join

In order to see the full power of the FULL JOIN we will introduce a new table, user_messages, that shows the daily number of messages each user sent to their friends.

date       | userid    | num_messages
-------------------------------------
2018–09–01 | 983489272 | 6
2018–09–01 | 234342423 | 3
2018–09–01 | 305852827 | 4
2018–09–01 | 867736361 | 2
2018–09–02 | 874726454 | 15

Here is our daily_user_score table from earlier in the series:

date       | userid    | sessionid | score
------------------------------------------
2018–09–01 | 983489272 | 125       | 112
2018–09–01 | 234342423 | 34        | 112
2018–09–01 | 567584329 | 207       | 618
2018–09–02 | 983489272 | 126       | 410
2018–09–02 | 983489272 | 127       | 339

A FULL JOIN allows you to combine all data from both tables in your query.

The next important item to have in your toolbox is the COALESCE function. This function takes a list of columns. It will check each column in the list and select the first non-NULL value. If all are NULL it will return NULL. It is often used with a FULL JOIN because some values may only exist in one table or the other.

Example: Userid 567584329 only exists in daily_user_score. COALESCE ensures we will have it in results using a FULL JOIN.

If we wanted to compare the number of messages and scores for users who played or messaged on 2018–09–01, we would employ a FULL JOIN and our query would look like this:

SELECT COALESCE(m.userid, s.userid) AS userid, 
       m.num_messages, 
       s.score
FROM user_messages AS m
FULL OUTER JOIN daily_user_score AS s
ON m.userid = s.userid
AND m.date = s.date
AND m.date = ‘2018–09–01’;

The results of this query are:

userid    | num_messages | score
--------------------------------
983489272 | 6            | 112
234342423 | 3            | 112
567584329 | NULL         | 618
305852827 | 4            | NULL
867736361 | 2            | NULL

Notice the NULL values in num_messages and score? You will find the corresponding userid in only one of the two tables. COALESCE allowed us to stitch them together into one resulting dataset.

Try it yourself

Show the revenue and number of messages of each user who did either.

Recall our table user_revenue:

userid    | revenue
--------------------
983489272 | 100
234342423 | 33
567584329 | 57
305852827 | 8
867736361 | 29

Answer

SELECT COALESCE(m.userid, r.userid) AS userid, 
       m.num_messages,
       r.revenue
FROM user_messages AS m
FULL OUTER JOIN user_revenue AS r
ON m.userid = r.userid;

(You can also switch the tables to either side of the FULL JOIN, it does not matter)

This query returns:

userid    | num_messages | revenue
-------------------------------
983489272 | 6            | 100
234342423 | 3            | 33
305852827 | 4            | 8
867736361 | 2            | 29
874726454 | 15           | NULL
567584329 | NULL         | 57

Thanks for reading! If you have questions feel free to comment & I will try to get back to you.

Connect with me on Instagram @ljglass

Connect with me on LinkedIn

Check out my essentials list on Amazon

Search for me using my nametag on Instagram!


Published by HackerNoon on 2018/11/11