SQL Case Study Part 1

1. E-Commerce Sales Analysis

Scenario: You work for an online retail company that wants to analyze customer purchasing patterns.
Tasks:

Design a database to store customers, orders, products, and payments.
Find the top 5 best-selling products.
Identify customers who haven’t purchased anything in the last 6 months.
Calculate monthly revenue trends.

Schema:

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100)
);

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL(10,2)
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    order_date DATE
);

CREATE TABLE order_items (
    order_item_id SERIAL PRIMARY KEY,
    order_id INT REFERENCES orders(order_id),
    product_id INT REFERENCES products(product_id),
    quantity INT
);

Data

INSERT INTO customers (name, email) VALUES
('Alice Johnson', 'alice@example.com'),
('Bob Smith', 'bob@example.com'),
('Charlie Brown', 'charlie@example.com');

INSERT INTO products (name, price) VALUES
('Laptop', 1200.00),
('Smartphone', 800.00),
('Tablet', 400.00);

INSERT INTO orders (customer_id, order_date) VALUES
(1, '2024-01-10'),
(2, '2024-01-12'),
(3, '2024-02-15');

INSERT INTO order_items (order_id, product_id, quantity) VALUES
(1, 1, 2),
(1, 2, 1),
(2, 3, 5),
(3, 2, 3);

Queries:

1️⃣ Find the top 5 best-selling products:

SELECT p.name, SUM(oi.quantity) AS total_sold
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
GROUP BY p.name
ORDER BY total_sold DESC
LIMIT 5;

name	total_sold
Tablet	5
Smartphone	4
Laptop	2

2️⃣ Identify inactive customers (no purchases in the last 6 months):

SELECT c.name, c.email
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id 
      AND o.order_date >= CURRENT_DATE - INTERVAL '6 months'
WHERE o.order_id IS NULL;

OutPut

(Since all customers have orders within 6 months, no inactive customers.)

3️⃣ Calculate monthly revenue:

SELECT DATE_TRUNC('month', o.order_date) AS month, SUM(p.price * oi.quantity) AS total_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
GROUP BY month
ORDER BY month;

month	total_revenue
2024-01-01	4400.00
2024-02-01	2400.00

2. Hospital Management System

Scenario: A hospital wants to track patient visits, doctor appointments, and medical prescriptions.
Tasks:

Design tables for patients, doctors, appointments, prescriptions, and treatments.
Retrieve all appointments for a specific doctor in a given week.
Find the most common diagnosis in the past year.
Identify patients who have missed more than 3 appointments.

Schema:

CREATE TABLE patients (
    patient_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    date_of_birth DATE
);

CREATE TABLE doctors (
    doctor_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    specialty VARCHAR(100)
);

CREATE TABLE appointments (
    appointment_id SERIAL PRIMARY KEY,
    patient_id INT REFERENCES patients(patient_id),
    doctor_id INT REFERENCES doctors(doctor_id),
    appointment_date TIMESTAMP
);

Data

INSERT INTO patients (name, date_of_birth) VALUES
('John Doe', '1980-05-20'),
('Jane Smith', '1992-08-15'),
('Michael Johnson', '1975-12-10');

INSERT INTO doctors (name, specialty) VALUES
('Dr. Wilson', 'Cardiology'),
('Dr. Adams', 'Neurology'),
('Dr. Brown', 'General Medicine');

INSERT INTO appointments (patient_id, doctor_id, appointment_date) VALUES
(1, 1, '2024-02-10 10:00:00'),
(2, 2, '2024-02-12 11:30:00'),
(3, 3, '2024-02-15 09:00:00');

Queries:

1️⃣ Retrieve all appointments for a doctor in a given week:

SELECT p.name, a.appointment_date
FROM appointments a
JOIN patients p ON a.patient_id = p.patient_id
WHERE a.doctor_id = 1
AND a.appointment_date BETWEEN CURRENT_DATE AND CURRENT_DATE + INTERVAL '7 days'
ORDER BY a.appointment_date;

OutPut

name	appointment_date
John Doe	2024-02-10 10:00:00

2️⃣ Find the most common diagnosis:

SELECT diagnosis, COUNT(*) AS count
FROM medical_records
GROUP BY diagnosis
ORDER BY count DESC
LIMIT 1;

3️⃣ Identify patients who missed more than 3 appointments:

SELECT p.name, COUNT(a.appointment_id) AS missed_appointments
FROM appointments a
JOIN patients p ON a.patient_id = p.patient_id
WHERE a.status = 'Missed'
GROUP BY p.name
HAVING COUNT(a.appointment_id) > 3;

3. Banking Transaction Monitoring

Scenario: A bank wants to track customer transactions and detect potential fraud.
Tasks:

Create tables for customers, accounts, transactions, and branches.
Identify customers with unusually high withdrawals in a single day.
Find accounts with no transactions in the last year.
Determine the total deposits and withdrawals per branch.

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE accounts (
    account_id SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    balance DECIMAL(15,2)
);

CREATE TABLE transactions (
    transaction_id SERIAL PRIMARY KEY,
    account_id INT REFERENCES accounts(account_id),
    transaction_date TIMESTAMP,
    amount DECIMAL(15,2),
    transaction_type VARCHAR(10) CHECK (transaction_type IN ('Deposit', 'Withdrawal'))
);

Data

INSERT INTO customers (name) VALUES
('Emma Watson'),
('John Wick'),
('Robert Downey');

INSERT INTO accounts (customer_id, balance) VALUES
(1, 5000.00),
(2, 15000.00),
(3, 7500.00);

INSERT INTO transactions (account_id, transaction_date, amount, transaction_type) VALUES
(1, '2024-02-01 14:30:00', 1000.00, 'Deposit'),
(2, '2024-02-03 09:15:00', 500.00, 'Withdrawal'),
(3, '2024-02-05 17:45:00', 3000.00, 'Deposit');

Query

1️⃣ Identify customers with unusually high withdrawals in a day:

SELECT account_id, transaction_date::DATE AS date, SUM(amount) AS total_withdrawn
FROM transactions
WHERE transaction_type = 'Withdrawal'
GROUP BY account_id, date
HAVING SUM(amount) > 10000;  -- Assuming $10,000 as the threshold

OutPut : (No Rows Returned)

2️⃣ Find inactive accounts:

SELECT a.account_id
FROM accounts a
LEFT JOIN transactions t ON a.account_id = t.account_id 
      AND t.transaction_date >= CURRENT_DATE - INTERVAL '1 year'
WHERE t.transaction_id IS NULL;

3️⃣ Total deposits and withdrawals per branch:

SELECT branch_id, transaction_type, SUM(amount) AS total_amount
FROM transactions
GROUP BY branch_id, transaction_type;

4. Ride-Sharing Analytics (Uber/Lyft Model)

Scenario: A ride-sharing company wants to analyze trip data for better pricing and service optimization.
Tasks:

Design tables for drivers, riders, trips, and payments.
Calculate the average fare per mile in each city.
Identify the busiest times of the day for rides.
Find drivers with the highest number of completed trips.

CREATE TABLE drivers (
    driver_id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE riders (
    rider_id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE trips (
    trip_id SERIAL PRIMARY KEY,
    driver_id INT REFERENCES drivers(driver_id),
    rider_id INT REFERENCES riders(rider_id),
    fare DECIMAL(10,2),
    distance DECIMAL(5,2),
    trip_date TIMESTAMP
);

Data

INSERT INTO drivers (name) VALUES
('James Carter'),
('Sophia Green'),
('Liam Miller');

INSERT INTO riders (name) VALUES
('Ethan Taylor'),
('Olivia Brown'),
('Mason White');

INSERT INTO trips (driver_id, rider_id, fare, distance, trip_date) VALUES
(1, 1, 25.00, 10.5, '2024-02-01 08:30:00'),
(2, 2, 18.00, 7.2, '2024-02-01 09:00:00'),
(3, 3, 35.00, 15.0, '2024-02-01 10:30:00');

Query

1️⃣ Calculate the average fare per mile:

SELECT AVG(fare / distance) AS avg_fare_per_mile FROM trips WHERE distance > 0;

avg_fare_per_mile

2.14

2️⃣ Find the busiest times of the day:

SELECT EXTRACT(HOUR FROM trip_date) AS hour_of_day, COUNT(*) AS trip_count
FROM trips
GROUP BY hour_of_day
ORDER BY trip_count DESC
LIMIT 3;

OutPut

hour_of_day	trip_count
8	1
9	1
10	1

3️⃣ Find top 5 drivers:

SELECT d.name, COUNT(t.trip_id) AS trips_completed
FROM trips t
JOIN drivers d ON t.driver_id = d.driver_id
GROUP BY d.name
ORDER BY trips_completed DESC
LIMIT 5;

5. Movie Streaming Platform (Netflix Model)

Scenario: A streaming service wants to understand user preferences and engagement.
Tasks:

Create tables for users, movies, watch history, and subscriptions.
Determine the most-watched movie genre in the last 6 months.
Find users who started but didn’t finish a movie.
Identify users eligible for a subscription renewal discount.

Schema

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE movies (
    movie_id SERIAL PRIMARY KEY,
    title VARCHAR(100),
    genre VARCHAR(50)
);

CREATE TABLE watch_history (
    history_id SERIAL PRIMARY KEY,
    user_id INT REFERENCES users(user_id),
    movie_id INT REFERENCES movies(movie_id),
    watch_date DATE,
    completed BOOLEAN
);

Data

INSERT INTO users (name) VALUES
('Sarah Johnson'),
('David Smith'),
('Emily Brown');

INSERT INTO movies (title, genre) VALUES
('Inception', 'Sci-Fi'),
('Titanic', 'Romance'),
('Avengers', 'Action');

INSERT INTO watch_history (user_id, movie_id, watch_date, completed) VALUES
(1, 1, '2024-01-15', TRUE),
(2, 2, '2024-01-16', FALSE),
(3, 3, '2024-01-17', TRUE);

Query

1️⃣ Most-watched genre:

SELECT m.genre, COUNT(*) AS views
FROM watch_history w
JOIN movies m ON w.movie_id = m.movie_id
GROUP BY m.genre
ORDER BY views DESC
LIMIT 1;

OutPut

genre	views
Sci-Fi	1
Romance	1
Action	1

2️⃣ Find users who started but didn’t finish a movie:

SELECT u.name, m.title
FROM watch_history w
JOIN users u ON w.user_id = u.user_id
JOIN movies m ON w.movie_id = m.movie_id
WHERE w.completed = FALSE;

3️⃣ Identify users eligible for a discount (inactive users for 3+ months):

SELECT u.name, u.user_id
FROM users u
LEFT JOIN watch_history w ON u.user_id = w.user_id 
     AND w.watch_date >= CURRENT_DATE - INTERVAL '3 months'
WHERE w.history_id IS NULL;

6. HR Employee Performance Tracking

This case study focuses on tracking employee performance, including attendance, projects completed, and performance reviews.

1️⃣ Database Schema

CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    department VARCHAR(100),
    hire_date DATE
);

CREATE TABLE attendance (
    attendance_id SERIAL PRIMARY KEY,
    employee_id INT REFERENCES employees(employee_id),
    attendance_date DATE,
    status VARCHAR(10) CHECK (status IN ('Present', 'Absent', 'Late'))
);

CREATE TABLE projects (
    project_id SERIAL PRIMARY KEY,
    employee_id INT REFERENCES employees(employee_id),
    project_name VARCHAR(255),
    completion_date DATE
);

CREATE TABLE performance_reviews (
    review_id SERIAL PRIMARY KEY,
    employee_id INT REFERENCES employees(employee_id),
    review_date DATE,
    rating INT CHECK (rating BETWEEN 1 AND 5),
    comments TEXT
);

2️⃣ Sample Data

-- Insert employees
INSERT INTO employees (name, department, hire_date) VALUES
('Alice Johnson', 'IT', '2020-03-15'),
('Bob Smith', 'Marketing', '2019-07-20'),
('Charlie Brown', 'HR', '2021-06-01');

-- Insert attendance records
INSERT INTO attendance (employee_id, attendance_date, status) VALUES
(1, '2024-02-01', 'Present'),
(1, '2024-02-02', 'Absent'),
(2, '2024-02-01', 'Late'),
(2, '2024-02-02', 'Present'),
(3, '2024-02-01', 'Present'),
(3, '2024-02-02', 'Present');

-- Insert projects completed
INSERT INTO projects (employee_id, project_name, completion_date) VALUES
(1, 'Website Redesign', '2024-01-30'),
(2, 'Marketing Campaign Q1', '2024-02-01'),
(3, 'Employee Benefits Survey', '2024-01-25');

-- Insert performance reviews
INSERT INTO performance_reviews (employee_id, review_date, rating, comments) VALUES
(1, '2024-01-31', 4, 'Great technical skills, needs better communication.'),
(2, '2024-02-01', 5, 'Excellent performance in campaign strategy.'),
(3, '2024-01-29', 3, 'Good but needs improvement in team collaboration.');

3️⃣ Queries and Expected Output

1️⃣ Count Employee Attendance for the Last Month

name	days_present
Charlie Brown	2
Alice Johnson	1
Bob Smith	1

2️⃣ Employees with the Most Completed Projects

SELECT e.name, COUNT(p.project_id) AS total_projects
FROM projects p
JOIN employees e ON p.employee_id = e.employee_id
GROUP BY e.name
ORDER BY total_projects DESC
LIMIT 3;

name	total_projects
Alice Johnson	1
Bob Smith	1
Charlie Brown	1

3️⃣ Employees with the Best Performance Ratings

SELECT e.name, AVG(pr.rating) AS avg_rating
FROM performance_reviews pr
JOIN employees e ON pr.employee_id = e.employee_id
GROUP BY e.name
ORDER BY avg_rating DESC
LIMIT 3;

name	avg_rating
Bob Smith	5.0
Alice Johnson	4.0
Charlie Brown	3.0

4️⃣ Employees with Frequent Absences (More than 1 absence)

SELECT e.name, COUNT(a.attendance_id) AS absent_days
FROM attendance a
JOIN employees e ON a.employee_id = e.employee_id
WHERE a.status = 'Absent'
GROUP BY e.name
HAVING COUNT(a.attendance_id) > 1;

Output : (No Rows Returned)