SQL Case Study Part 1
1. E-Commerce Sales Analysis
Scenario: You work for an online retail company that wants to analyze customer purchasing patterns.
Tasks:
- Design a database to store customers, orders, products, and payments.
- Find the top 5 best-selling products.
- Identify customers who haven’t purchased anything in the last 6 months.
- Calculate monthly revenue trends.
Schema:
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name VARCHAR(100),
email VARCHAR(100)
);
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
name VARCHAR(100),
price DECIMAL(10,2)
);
CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(customer_id),
order_date DATE
);
CREATE TABLE order_items (
order_item_id SERIAL PRIMARY KEY,
order_id INT REFERENCES orders(order_id),
product_id INT REFERENCES products(product_id),
quantity INT
);
Data
INSERT INTO customers (name, email) VALUES
('Alice Johnson', 'alice@example.com'),
('Bob Smith', 'bob@example.com'),
('Charlie Brown', 'charlie@example.com');
INSERT INTO products (name, price) VALUES
('Laptop', 1200.00),
('Smartphone', 800.00),
('Tablet', 400.00);
INSERT INTO orders (customer_id, order_date) VALUES
(1, '2024-01-10'),
(2, '2024-01-12'),
(3, '2024-02-15');
INSERT INTO order_items (order_id, product_id, quantity) VALUES
(1, 1, 2),
(1, 2, 1),
(2, 3, 5),
(3, 2, 3);
Queries:
1️⃣ Find the top 5 best-selling products:
SELECT p.name, SUM(oi.quantity) AS total_sold
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
GROUP BY p.name
ORDER BY total_sold DESC
LIMIT 5;
name | total_sold |
Tablet | 5 |
Smartphone | 4 |
Laptop | 2 |
2️⃣ Identify inactive customers (no purchases in the last 6 months):
SELECT c.name, c.email
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
AND o.order_date >= CURRENT_DATE - INTERVAL '6 months'
WHERE o.order_id IS NULL;
OutPut
(Since all customers have orders within 6 months, no inactive customers.)
3️⃣ Calculate monthly revenue:
SELECT DATE_TRUNC('month', o.order_date) AS month, SUM(p.price * oi.quantity) AS total_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
GROUP BY month
ORDER BY month;
month | total_revenue |
2024-01-01 | 4400.00 |
2024-02-01 | 2400.00 |
2. Hospital Management System
Scenario: A hospital wants to track patient visits, doctor appointments, and medical prescriptions.
Tasks:
- Design tables for patients, doctors, appointments, prescriptions, and treatments.
- Retrieve all appointments for a specific doctor in a given week.
- Find the most common diagnosis in the past year.
- Identify patients who have missed more than 3 appointments.
Schema:
CREATE TABLE patients (
patient_id SERIAL PRIMARY KEY,
name VARCHAR(100),
date_of_birth DATE
);
CREATE TABLE doctors (
doctor_id SERIAL PRIMARY KEY,
name VARCHAR(100),
specialty VARCHAR(100)
);
CREATE TABLE appointments (
appointment_id SERIAL PRIMARY KEY,
patient_id INT REFERENCES patients(patient_id),
doctor_id INT REFERENCES doctors(doctor_id),
appointment_date TIMESTAMP
);
Data
INSERT INTO patients (name, date_of_birth) VALUES
('John Doe', '1980-05-20'),
('Jane Smith', '1992-08-15'),
('Michael Johnson', '1975-12-10');
INSERT INTO doctors (name, specialty) VALUES
('Dr. Wilson', 'Cardiology'),
('Dr. Adams', 'Neurology'),
('Dr. Brown', 'General Medicine');
INSERT INTO appointments (patient_id, doctor_id, appointment_date) VALUES
(1, 1, '2024-02-10 10:00:00'),
(2, 2, '2024-02-12 11:30:00'),
(3, 3, '2024-02-15 09:00:00');
Queries:
1️⃣ Retrieve all appointments for a doctor in a given week:
SELECT p.name, a.appointment_date
FROM appointments a
JOIN patients p ON a.patient_id = p.patient_id
WHERE a.doctor_id = 1
AND a.appointment_date BETWEEN CURRENT_DATE AND CURRENT_DATE + INTERVAL '7 days'
ORDER BY a.appointment_date;
OutPut
name | appointment_date |
John Doe | 2024-02-10 10:00:00 |
2️⃣ Find the most common diagnosis:
SELECT diagnosis, COUNT(*) AS count
FROM medical_records
GROUP BY diagnosis
ORDER BY count DESC
LIMIT 1;
3️⃣ Identify patients who missed more than 3 appointments:
SELECT p.name, COUNT(a.appointment_id) AS missed_appointments
FROM appointments a
JOIN patients p ON a.patient_id = p.patient_id
WHERE a.status = 'Missed'
GROUP BY p.name
HAVING COUNT(a.appointment_id) > 3;
3. Banking Transaction Monitoring
Scenario: A bank wants to track customer transactions and detect potential fraud.
Tasks:
- Create tables for customers, accounts, transactions, and branches.
- Identify customers with unusually high withdrawals in a single day.
- Find accounts with no transactions in the last year.
- Determine the total deposits and withdrawals per branch.
CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE accounts (
account_id SERIAL PRIMARY KEY,
customer_id INT REFERENCES customers(customer_id),
balance DECIMAL(15,2)
);
CREATE TABLE transactions (
transaction_id SERIAL PRIMARY KEY,
account_id INT REFERENCES accounts(account_id),
transaction_date TIMESTAMP,
amount DECIMAL(15,2),
transaction_type VARCHAR(10) CHECK (transaction_type IN ('Deposit', 'Withdrawal'))
);
Data
INSERT INTO customers (name) VALUES
('Emma Watson'),
('John Wick'),
('Robert Downey');
INSERT INTO accounts (customer_id, balance) VALUES
(1, 5000.00),
(2, 15000.00),
(3, 7500.00);
INSERT INTO transactions (account_id, transaction_date, amount, transaction_type) VALUES
(1, '2024-02-01 14:30:00', 1000.00, 'Deposit'),
(2, '2024-02-03 09:15:00', 500.00, 'Withdrawal'),
(3, '2024-02-05 17:45:00', 3000.00, 'Deposit');
Query
1️⃣ Identify customers with unusually high withdrawals in a day:
SELECT account_id, transaction_date::DATE AS date, SUM(amount) AS total_withdrawn
FROM transactions
WHERE transaction_type = 'Withdrawal'
GROUP BY account_id, date
HAVING SUM(amount) > 10000; -- Assuming $10,000 as the threshold
OutPut : (No Rows Returned)
2️⃣ Find inactive accounts:
SELECT a.account_id
FROM accounts a
LEFT JOIN transactions t ON a.account_id = t.account_id
AND t.transaction_date >= CURRENT_DATE - INTERVAL '1 year'
WHERE t.transaction_id IS NULL;
3️⃣ Total deposits and withdrawals per branch:
SELECT branch_id, transaction_type, SUM(amount) AS total_amount
FROM transactions
GROUP BY branch_id, transaction_type;
4. Ride-Sharing Analytics (Uber/Lyft Model)
Scenario: A ride-sharing company wants to analyze trip data for better pricing and service optimization.
Tasks:
- Design tables for drivers, riders, trips, and payments.
- Calculate the average fare per mile in each city.
- Identify the busiest times of the day for rides.
- Find drivers with the highest number of completed trips.
CREATE TABLE drivers (
driver_id SERIAL PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE riders (
rider_id SERIAL PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE trips (
trip_id SERIAL PRIMARY KEY,
driver_id INT REFERENCES drivers(driver_id),
rider_id INT REFERENCES riders(rider_id),
fare DECIMAL(10,2),
distance DECIMAL(5,2),
trip_date TIMESTAMP
);
Data
INSERT INTO drivers (name) VALUES
('James Carter'),
('Sophia Green'),
('Liam Miller');
INSERT INTO riders (name) VALUES
('Ethan Taylor'),
('Olivia Brown'),
('Mason White');
INSERT INTO trips (driver_id, rider_id, fare, distance, trip_date) VALUES
(1, 1, 25.00, 10.5, '2024-02-01 08:30:00'),
(2, 2, 18.00, 7.2, '2024-02-01 09:00:00'),
(3, 3, 35.00, 15.0, '2024-02-01 10:30:00');
Query
1️⃣ Calculate the average fare per mile:
SELECT AVG(fare / distance) AS avg_fare_per_mile FROM trips WHERE distance > 0;
avg_fare_per_mile |
2.14 |
2️⃣ Find the busiest times of the day:
SELECT EXTRACT(HOUR FROM trip_date) AS hour_of_day, COUNT(*) AS trip_count
FROM trips
GROUP BY hour_of_day
ORDER BY trip_count DESC
LIMIT 3;
OutPut
hour_of_day | trip_count |
8 | 1 |
9 | 1 |
10 | 1 |
3️⃣ Find top 5 drivers:
SELECT d.name, COUNT(t.trip_id) AS trips_completed
FROM trips t
JOIN drivers d ON t.driver_id = d.driver_id
GROUP BY d.name
ORDER BY trips_completed DESC
LIMIT 5;
5. Movie Streaming Platform (Netflix Model)
Scenario: A streaming service wants to understand user preferences and engagement.
Tasks:
- Create tables for users, movies, watch history, and subscriptions.
- Determine the most-watched movie genre in the last 6 months.
- Find users who started but didn’t finish a movie.
- Identify users eligible for a subscription renewal discount.
Schema
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
name VARCHAR(100)
);
CREATE TABLE movies (
movie_id SERIAL PRIMARY KEY,
title VARCHAR(100),
genre VARCHAR(50)
);
CREATE TABLE watch_history (
history_id SERIAL PRIMARY KEY,
user_id INT REFERENCES users(user_id),
movie_id INT REFERENCES movies(movie_id),
watch_date DATE,
completed BOOLEAN
);
Data
INSERT INTO users (name) VALUES
('Sarah Johnson'),
('David Smith'),
('Emily Brown');
INSERT INTO movies (title, genre) VALUES
('Inception', 'Sci-Fi'),
('Titanic', 'Romance'),
('Avengers', 'Action');
INSERT INTO watch_history (user_id, movie_id, watch_date, completed) VALUES
(1, 1, '2024-01-15', TRUE),
(2, 2, '2024-01-16', FALSE),
(3, 3, '2024-01-17', TRUE);
Query
1️⃣ Most-watched genre:
SELECT m.genre, COUNT(*) AS views
FROM watch_history w
JOIN movies m ON w.movie_id = m.movie_id
GROUP BY m.genre
ORDER BY views DESC
LIMIT 1;
OutPut
genre | views |
Sci-Fi | 1 |
Romance | 1 |
Action | 1 |
2️⃣ Find users who started but didn’t finish a movie:
SELECT u.name, m.title
FROM watch_history w
JOIN users u ON w.user_id = u.user_id
JOIN movies m ON w.movie_id = m.movie_id
WHERE w.completed = FALSE;
3️⃣ Identify users eligible for a discount (inactive users for 3+ months):
SELECT u.name, u.user_id
FROM users u
LEFT JOIN watch_history w ON u.user_id = w.user_id
AND w.watch_date >= CURRENT_DATE - INTERVAL '3 months'
WHERE w.history_id IS NULL;
6. HR Employee Performance Tracking
This case study focuses on tracking employee performance, including attendance, projects completed, and performance reviews.
1️⃣ Database Schema
CREATE TABLE employees (
employee_id SERIAL PRIMARY KEY,
name VARCHAR(100),
department VARCHAR(100),
hire_date DATE
);
CREATE TABLE attendance (
attendance_id SERIAL PRIMARY KEY,
employee_id INT REFERENCES employees(employee_id),
attendance_date DATE,
status VARCHAR(10) CHECK (status IN ('Present', 'Absent', 'Late'))
);
CREATE TABLE projects (
project_id SERIAL PRIMARY KEY,
employee_id INT REFERENCES employees(employee_id),
project_name VARCHAR(255),
completion_date DATE
);
CREATE TABLE performance_reviews (
review_id SERIAL PRIMARY KEY,
employee_id INT REFERENCES employees(employee_id),
review_date DATE,
rating INT CHECK (rating BETWEEN 1 AND 5),
comments TEXT
);
2️⃣ Sample Data
-- Insert employees
INSERT INTO employees (name, department, hire_date) VALUES
('Alice Johnson', 'IT', '2020-03-15'),
('Bob Smith', 'Marketing', '2019-07-20'),
('Charlie Brown', 'HR', '2021-06-01');
-- Insert attendance records
INSERT INTO attendance (employee_id, attendance_date, status) VALUES
(1, '2024-02-01', 'Present'),
(1, '2024-02-02', 'Absent'),
(2, '2024-02-01', 'Late'),
(2, '2024-02-02', 'Present'),
(3, '2024-02-01', 'Present'),
(3, '2024-02-02', 'Present');
-- Insert projects completed
INSERT INTO projects (employee_id, project_name, completion_date) VALUES
(1, 'Website Redesign', '2024-01-30'),
(2, 'Marketing Campaign Q1', '2024-02-01'),
(3, 'Employee Benefits Survey', '2024-01-25');
-- Insert performance reviews
INSERT INTO performance_reviews (employee_id, review_date, rating, comments) VALUES
(1, '2024-01-31', 4, 'Great technical skills, needs better communication.'),
(2, '2024-02-01', 5, 'Excellent performance in campaign strategy.'),
(3, '2024-01-29', 3, 'Good but needs improvement in team collaboration.');
3️⃣ Queries and Expected Output
1️⃣ Count Employee Attendance for the Last Month
name | days_present |
Charlie Brown | 2 |
Alice Johnson | 1 |
Bob Smith | 1 |
2️⃣ Employees with the Most Completed Projects
SELECT e.name, COUNT(p.project_id) AS total_projects
FROM projects p
JOIN employees e ON p.employee_id = e.employee_id
GROUP BY e.name
ORDER BY total_projects DESC
LIMIT 3;
name | total_projects |
Alice Johnson | 1 |
Bob Smith | 1 |
Charlie Brown | 1 |
3️⃣ Employees with the Best Performance Ratings
SELECT e.name, AVG(pr.rating) AS avg_rating
FROM performance_reviews pr
JOIN employees e ON pr.employee_id = e.employee_id
GROUP BY e.name
ORDER BY avg_rating DESC
LIMIT 3;
name | avg_rating |
Bob Smith | 5.0 |
Alice Johnson | 4.0 |
Charlie Brown | 3.0 |
4️⃣ Employees with Frequent Absences (More than 1 absence)
SELECT e.name, COUNT(a.attendance_id) AS absent_days
FROM attendance a
JOIN employees e ON a.employee_id = e.employee_id
WHERE a.status = 'Absent'
GROUP BY e.name
HAVING COUNT(a.attendance_id) > 1;
Output : (No Rows Returned)