I have done a number of SQL-based projects as part of my school work, but due to academic integrity limitations (ie, not allowed to post my solutions), I cannot share these projects on my website. So what's the next best alternative? Solve a murder mystery using SQL of course! 🔪 All queries are written in SQLite, let's dive right in!
Step 0
Before I start querying the database, I want to get a better understanding of each of the table, particularly the PRIMARY KEY, FOREIGN KEY, and REFERENCES.
QUERY:
SELECT sql
FROM sqlite_master
RESULT:
It would also be helpful to take a look at the schema diagram to visually see how the tables are related to one another.
DIAGRAM:
Before I query the selected tables based on the criteria, I usually run below query on the table first to gain an idea of what the data in the table looks like so I can tailor my query to match the data in the table.
QUERY:
SELECT *
FROM table_name
LIMIT 5
Step 1
Now I'm ready to get started! I know that 'the crime was a murder that occurred sometime on Jan.15, 2018 and that it took place in SQL City', so I first queried the crime_scene_report table to find the relevant report.
QUERY:
SELECT *
FROM crime_scene_report
WHERE date = 20180115 AND city = 'SQL City' AND type = 'murder'
RESULT:
Based on the description of the crime report, I know there are two witnesses. I have the first name of one of the witness and the street address of both witnesses.
Step 2
Let's start with the witness named Annabel first by querying the person table based on the street name and the first name.
QUERY:
SELECT *
FROM person
WHERE address_street_name = 'Franklin Ave'
AND name LIKE 'Annabel%'
RESULT:
Step 3
I can then use Annabel's id from the person table to find the interview transcript from the interview table. For this query, I used a subquery within the main query to directly get Annabel's id from the person table.
QUERY:
SELECT *
FROM interview
WHERE person_id = (SELECT id
FROM person
WHERE address_street_name = 'Franklin Ave'
AND name LIKE 'Annabel%')
RESULT:
Looks like Annabel identified the killer as someone from her gym and the killer went to the gym on January 9th.
Step 4
Since the get_fit_now_check_in table does not include additional member information, I joined the get_fit_now_check_in table with the get_fit_now_member table to more easily identify the members that used the gym on January 9th.
QUERY:
SELECT *
FROM get_fit_now_check_in AS ci
JOIN get_fit_now_member AS mbr
ON ci.membership_id = mbr.id
WHERE ci.check_in_date = 20180109
RESULT:
We can see that Annabel indeed went to the gym on January 9th, as she claimed in the interview. The suspects would be the gym members that attended gym during the same period as Annabel.
Step 5
To find the suspects that attended the gym during the same period as Annabel, I added on to the query from previous step to filter by the check_in_time and check_out_time. For the time period to overlap, they must have left the gym after Annabel entered and entered the gym before Annabel left. Since the tables are relatively small in this example, we can visually inspect the result from the previous query and confirm the result is reasonable. To make the query more readable, I try to use common table expressions (CTEs) for most of the queries that involved more than one subquery.
QUERY:
WITH gym_named AS (
SELECT *
FROM get_fit_now_check_in AS ci
JOIN get_fit_now_member AS mbr
ON ci.membership_id = mbr.id
WHERE ci.check_in_date = 20180109
),
annabel AS (
SELECT *
FROM gym_named
WHERE check_in_date = 20180109
AND name LIKE 'Annabel%'
)
SELECT *
FROM gym_named
WHERE check_out_time >= (SELECT check_in_time FROM annabel)
AND check_in_time <= (SELECT check_out_time FROM annabel)
RESULT:
We can identify two suspects, Joe and Jeremy.
Step 6
Before moving on to finding out more information about the two suspects, let's identify the other witness and find out what the other witness has to say about the case first. I know the witness lives at the last house of Northwestern Dr so assuming the house numbers are in increasing order, we can find out the witness from the person table.
QUERY:
SELECT *
FROM person
WHERE address_number = (SELECT MAX(address_number)
FROM person
WHERE address_street_name = 'Northwestern Dr')
AND address_street_name = 'Northwestern Dr'
RESULT:
We now know who the other witness is, we can find out what this witness said in their interview.
Step 7
Similar to what was done for Annabel, I queried the interview table to find the interview transcript for Morty. Now we have two nested subqueries which makes the query a little hard to read, but since the logic here is pretty simple, I just left it as is.
QUERY:
SELECT *
FROM interview
WHERE person_id = (SELECT id
FROM person
WHERE address_number = (SELECT MAX(address_number)
FROM person
WHERE address_street_name = 'Northwestern Dr')
AND address_street_name = 'Northwestern Dr')
RESULT:
From the transcript, we can confirm that the witness goes to the gym and is a gold member. We also know his membership number and his license plate number.
Step 8
Let's now cross check the information Morty provided with the information we found out based on what Annabel provided.
QUERY:
WITH gym_named AS (
SELECT *
FROM get_fit_now_check_in AS ci
JOIN get_fit_now_member AS mbr
ON ci.membership_id = mbr.id
WHERE ci.check_in_date = 20180109
),
annabel AS (
SELECT *
FROM gym_named
WHERE check_in_date = 20180109 AND name LIKE 'Annabel%'
)
SELECT *
FROM gym_named
WHERE check_out_time >= (SELECT check_in_time FROM annabel)
AND check_in_time <= (SELECT check_out_time FROM annabel)
AND membership_status = 'gold'
AND membership_id LIKE '48Z%'
RESULT:
Unfortunately, knowing the gym information did not help us narrow down the suspects as both Joe and Jeremy are gold members with membership_id starting with 48Z. But at least we now know the Annabel is not a suspect.
Step 9
Luckily, Morty gave us one more clue, which is the suspect's car license plate. We can find the license_id from the person table and then find the plate_number from the drivers_license table. Since the drivers_license table does not include the name of the person, I join the drivers_license table to the person table to create the dl_named table to easily identify the name of the killer.
QUERY:
WITH gym_named AS (
SELECT *
FROM get_fit_now_check_in AS ci
JOIN get_fit_now_member AS mbr
ON ci.membership_id = mbr.id
WHERE ci.check_in_date = 20180109
),
annabel AS (
SELECT *
FROM gym_named
WHERE check_in_date = 20180109
AND name LIKE 'Annabel%'
),
gym_suspects AS (
SELECT *
FROM gym_named
WHERE check_out_time >= (SELECT check_in_time FROM annabel)
AND check_in_time <= (SELECT check_out_time FROM annabel)
AND membership_status = 'gold'
AND membership_id LIKE '48Z%'
),
dl_named AS (
SELECT ps.id AS person_id, ps.name, ps.license_id, ps.ssn, dl.*
FROM person AS ps
JOIN drivers_license AS dl
ON ps.license_id = dl.id
)
SELECT *
FROM dl_named
WHERE person_id IN (SELECT person_id
FROM gym_suspects)
AND plate_number LIKE '%H42W%'
RESULT:
Voila!
Step 10
After finding out the killer, I used the provided script to check my answers, and sure enough, I had correctly identified the killer. But wait there is more!
RESULT:
So I made a small change to the previous query to find the transcript from the killer.
QUERY:
WITH gym_named AS (
SELECT *
FROM get_fit_now_check_in AS ci
JOIN get_fit_now_member AS mbr
ON ci.membership_id = mbr.id
WHERE ci.check_in_date = 20180109
),
annabel AS (
SELECT *
FROM gym_named
WHERE check_in_date = 20180109
AND name LIKE 'Annabel%'
),
gym_suspects AS (
SELECT *
FROM gym_named
WHERE check_out_time >= (SELECT check_in_time FROM annabel)
AND check_in_time <= (SELECT check_out_time FROM annabel)
AND membership_status = 'gold'
AND membership_id LIKE '48Z%'
),
dl_named AS (
SELECT ps.id AS person_id, ps.name, ps.license_id, ps.ssn, dl.*
FROM person AS ps
JOIN drivers_license AS dl
ON ps.license_id = dl.id
),
killer AS (
SELECT *
FROM dl_named
WHERE person_id IN (SELECT person_id
FROM gym_suspects)
AND plate_number LIKE '%H42W%'
)
SELECT *
FROM interview
WHERE person_id = (SELECT person_id
FROM killer)
RESULT:
Evidently, this was a hired murder! 😲 The killer's transcript provided very detailed description of the woman that hired him, so let's try to find out who the 'real' killer is!
Step 11
We know that we are looking for someone who is 1) female, 2) has red hair, 3) drives Tesla Model S, 4) is around 65" - 67" tall, and attended the SQL Symphony Concert 3 times in December 2017.
I first identified the suspects from the drivers_license table matching the physical and car description of the suspect, joining the drivers_license table with the person table so the person_id can be easily queried and the name can be easily identified.
I then identified the relevant concerts from the facebook_event_checkin table, grouped the table by the person_id to find the number of attendance for each person to identify suspects that have attended the event 3 times.
Finally, the 'real' killer is the only suspect in both the dl_suspects and concert_suspects subquery tables.
QUERY:
WITH dl_suspects AS (
SELECT ps.id AS person_id, ps.name, ps.license_id, ps.ssn, dl.*
FROM person AS ps
JOIN drivers_license AS dl
ON ps.license_id = dl.id
WHERE dl.gender = 'female'
AND dl.hair_color = 'red'
AND dl.car_make = 'Tesla'
AND dl.car_model = 'Model S'
AND 65 <= dl.height <= 67
),
concert AS (
SELECT *
FROM facebook_event_checkin
WHERE 20171201 <= date <= 20171231
AND event_name LIKE '%SQL Symphony Concert%'
),
concert_suspects AS (
SELECT person_id, COUNT(*) AS num_attendance
FROM concert
GROUP BY person_id
HAVING num_attendance = 3
)
SELECT *
FROM dl_suspects
WHERE person_id IN (SELECT person_id
FROM concert_suspects)
RESULT:
At last! 🍾🍾🍾
That was really fun! Of course, the dataset used in this project was very small and quite easy to work through, real life data would be much larger and much messier!
Also see my GitHub for the full list of SQL commands used.