Data Mastery: SQL — Inner Join

Written by laurenjglass9 | Published 2018/11/12
Tech Story Tags: interview-questions | data-science | programming | data | sql

TLDRvia the TL;DR App

Data Mastery: SQL — Inner Join

There are so many exciting projects out there in the Data World. Artificial Intelligence, Machine Learning, Neural Nets, Blockchain, and more are sweeping the technology industry. In order to get to the cutting-edge stuff, first and foremost, data needs to be stored, evaluated, and tested. The best place to do that is SQL (or a library that operates with SQL-like commands, see my article on Python’s Pandas library).

This series Data Mastery: SQL, will teach you the essential subjects. These are not exhaustive tutorials. Instead they are focused preparation guides — with brevity and efficiency in mind. It is meant for:

  • Software Engineers who want to analyze their creation’s data
  • Product Managers, Marketers, and others who want to be data-driven
  • Beginning Data Scientists, Data Engineers, Data Analysts, or Business Intelligence Developers preparing for interviews

See my resource list of the books I used to prepare for my big interview

Each article will contain a brief technical explanation of the subject, an example question, and an answer. Follow up articles will contain challenging questions so you can evaluate your learning.

This series does not come with accompanying data sets. The advantage to this is when you are on the drawing board, whether in an interview or project design, you do not have test data to play with. You have to think abstract.

Inner Join

A join matches up two tables around chosen columns that contain the same data. Then you can analyze data from both tables.

Recall our daily_user_score table from earlier in the series:

date       | userid    | sessionid | score
------------------------------------------
2018–09–01 | 983489272 | 125       | 112
2018–09–01 | 234342423 | 34        | 112
2018–09–01 | 567584329 | 207       | 618
2018–09–02 | 983489272 | 126       | 410
2018–09–02 | 983489272 | 127       | 339

Let’s introduce a new table, user_revenue:

userid    | revenue
--------------------
983489272 | 100
234342423 | 33
567584329 | 57
305852827 | 8
867736361 | 29

An INNER JOIN only returns data where the chosen columns match up exactly in both tables. You choose columns with the ON/AND keywords and a condition.

Say we want to see for each user who has played a game their total score and how much they paid our company.

Here is the query:

SELECT d.userid,
       SUM(d.score) AS total_score, 
       r.revenue
FROM daily_user_score AS d
INNER JOIN user_revenue AS r
ON d.userid = r.userid
GROUP BY d.userid, r.revenue;

And here are the results:

userid    | total_score | revenue
---------------------------------
234342423 | 112         | 33
567584329 | 618         | 57
983489272 | 861         | 100

Two things to note: It is good practice is to assign an alias to each table. Also notice the users in the results exist in both tables — this is the INNER JOIN.

Try it yourself

Find each user who has played a game, the most recent day they played, and how much money they paid.

Answer

SELECT d.userid, MAX(d.date) AS recent_date, r.revenue
FROM daily_user_score AS d
INNER JOIN user_revenue AS r
ON d.userid = r.userid
GROUP BY d.userid, r.revenue;

This query returns:

userid    | recent_date | revenue
---------------------------------
234342423 | 2018–09–01  | 33
567584329 | 2018–09–01  | 57
983489272 | 2018–09–02  | 100

Thanks for reading! If you have questions feel free to comment & I will try to get back to you.

Connect with me on Instagram @lauren__glass

Connect with me on LinkedIn

Check out my essentials list on Amazon

Search for me using my nametag on Instagram!


Published by HackerNoon on 2018/11/12