SQL Case Study Part 1

1. E-Commerce Sales Analysis

Scenario: You work for an online retail company that wants to analyze customer purchasing patterns.
Tasks:

  • Design a database to store customers, orders, products, and payments.
  • Find the top 5 best-selling products.
  • Identify customers who haven’t purchased anything in the last 6 months.
  • Calculate monthly revenue trends.

Schema:

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100)
);

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    price DECIMAL(10,2)
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    order_date DATE
);

CREATE TABLE order_items (
    order_item_id SERIAL PRIMARY KEY,
    order_id INT REFERENCES orders(order_id),
    product_id INT REFERENCES products(product_id),
    quantity INT
);

Data

INSERT INTO customers (name, email) VALUES
('Alice Johnson', 'alice@example.com'),
('Bob Smith', 'bob@example.com'),
('Charlie Brown', 'charlie@example.com');

INSERT INTO products (name, price) VALUES
('Laptop', 1200.00),
('Smartphone', 800.00),
('Tablet', 400.00);

INSERT INTO orders (customer_id, order_date) VALUES
(1, '2024-01-10'),
(2, '2024-01-12'),
(3, '2024-02-15');

INSERT INTO order_items (order_id, product_id, quantity) VALUES
(1, 1, 2),
(1, 2, 1),
(2, 3, 5),
(3, 2, 3);

Queries:

1️⃣ Find the top 5 best-selling products:

SELECT p.name, SUM(oi.quantity) AS total_sold
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
GROUP BY p.name
ORDER BY total_sold DESC
LIMIT 5;
nametotal_sold
Tablet5
Smartphone4
Laptop2

2️⃣ Identify inactive customers (no purchases in the last 6 months):

SELECT c.name, c.email
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id 
      AND o.order_date >= CURRENT_DATE - INTERVAL '6 months'
WHERE o.order_id IS NULL;

OutPut

(Since all customers have orders within 6 months, no inactive customers.)

3️⃣ Calculate monthly revenue:

SELECT DATE_TRUNC('month', o.order_date) AS month, SUM(p.price * oi.quantity) AS total_revenue
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
GROUP BY month
ORDER BY month;
monthtotal_revenue
2024-01-014400.00
2024-02-012400.00

2. Hospital Management System

Scenario: A hospital wants to track patient visits, doctor appointments, and medical prescriptions.
Tasks:

  • Design tables for patients, doctors, appointments, prescriptions, and treatments.
  • Retrieve all appointments for a specific doctor in a given week.
  • Find the most common diagnosis in the past year.
  • Identify patients who have missed more than 3 appointments.

Schema:

CREATE TABLE patients (
    patient_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    date_of_birth DATE
);

CREATE TABLE doctors (
    doctor_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    specialty VARCHAR(100)
);

CREATE TABLE appointments (
    appointment_id SERIAL PRIMARY KEY,
    patient_id INT REFERENCES patients(patient_id),
    doctor_id INT REFERENCES doctors(doctor_id),
    appointment_date TIMESTAMP
);

Data

INSERT INTO patients (name, date_of_birth) VALUES
('John Doe', '1980-05-20'),
('Jane Smith', '1992-08-15'),
('Michael Johnson', '1975-12-10');

INSERT INTO doctors (name, specialty) VALUES
('Dr. Wilson', 'Cardiology'),
('Dr. Adams', 'Neurology'),
('Dr. Brown', 'General Medicine');

INSERT INTO appointments (patient_id, doctor_id, appointment_date) VALUES
(1, 1, '2024-02-10 10:00:00'),
(2, 2, '2024-02-12 11:30:00'),
(3, 3, '2024-02-15 09:00:00');

Queries:

1️⃣ Retrieve all appointments for a doctor in a given week:

SELECT p.name, a.appointment_date
FROM appointments a
JOIN patients p ON a.patient_id = p.patient_id
WHERE a.doctor_id = 1
AND a.appointment_date BETWEEN CURRENT_DATE AND CURRENT_DATE + INTERVAL '7 days'
ORDER BY a.appointment_date;

OutPut

nameappointment_date
John Doe2024-02-10 10:00:00

2️⃣ Find the most common diagnosis:

SELECT diagnosis, COUNT(*) AS count
FROM medical_records
GROUP BY diagnosis
ORDER BY count DESC
LIMIT 1;

3️⃣ Identify patients who missed more than 3 appointments:

SELECT p.name, COUNT(a.appointment_id) AS missed_appointments
FROM appointments a
JOIN patients p ON a.patient_id = p.patient_id
WHERE a.status = 'Missed'
GROUP BY p.name
HAVING COUNT(a.appointment_id) > 3;

3. Banking Transaction Monitoring

Scenario: A bank wants to track customer transactions and detect potential fraud.
Tasks:

  • Create tables for customers, accounts, transactions, and branches.
  • Identify customers with unusually high withdrawals in a single day.
  • Find accounts with no transactions in the last year.
  • Determine the total deposits and withdrawals per branch.
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE accounts (
    account_id SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    balance DECIMAL(15,2)
);

CREATE TABLE transactions (
    transaction_id SERIAL PRIMARY KEY,
    account_id INT REFERENCES accounts(account_id),
    transaction_date TIMESTAMP,
    amount DECIMAL(15,2),
    transaction_type VARCHAR(10) CHECK (transaction_type IN ('Deposit', 'Withdrawal'))
);

Data

INSERT INTO customers (name) VALUES
('Emma Watson'),
('John Wick'),
('Robert Downey');

INSERT INTO accounts (customer_id, balance) VALUES
(1, 5000.00),
(2, 15000.00),
(3, 7500.00);

INSERT INTO transactions (account_id, transaction_date, amount, transaction_type) VALUES
(1, '2024-02-01 14:30:00', 1000.00, 'Deposit'),
(2, '2024-02-03 09:15:00', 500.00, 'Withdrawal'),
(3, '2024-02-05 17:45:00', 3000.00, 'Deposit');

Query

1️⃣ Identify customers with unusually high withdrawals in a day:

SELECT account_id, transaction_date::DATE AS date, SUM(amount) AS total_withdrawn
FROM transactions
WHERE transaction_type = 'Withdrawal'
GROUP BY account_id, date
HAVING SUM(amount) > 10000;  -- Assuming $10,000 as the threshold

OutPut : (No Rows Returned)

2️⃣ Find inactive accounts:

SELECT a.account_id
FROM accounts a
LEFT JOIN transactions t ON a.account_id = t.account_id 
      AND t.transaction_date >= CURRENT_DATE - INTERVAL '1 year'
WHERE t.transaction_id IS NULL;

3️⃣ Total deposits and withdrawals per branch:

SELECT branch_id, transaction_type, SUM(amount) AS total_amount
FROM transactions
GROUP BY branch_id, transaction_type;

4. Ride-Sharing Analytics (Uber/Lyft Model)

Scenario: A ride-sharing company wants to analyze trip data for better pricing and service optimization.
Tasks:

  • Design tables for drivers, riders, trips, and payments.
  • Calculate the average fare per mile in each city.
  • Identify the busiest times of the day for rides.
  • Find drivers with the highest number of completed trips.
CREATE TABLE drivers (
    driver_id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE riders (
    rider_id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE trips (
    trip_id SERIAL PRIMARY KEY,
    driver_id INT REFERENCES drivers(driver_id),
    rider_id INT REFERENCES riders(rider_id),
    fare DECIMAL(10,2),
    distance DECIMAL(5,2),
    trip_date TIMESTAMP
);

Data

INSERT INTO drivers (name) VALUES
('James Carter'),
('Sophia Green'),
('Liam Miller');

INSERT INTO riders (name) VALUES
('Ethan Taylor'),
('Olivia Brown'),
('Mason White');

INSERT INTO trips (driver_id, rider_id, fare, distance, trip_date) VALUES
(1, 1, 25.00, 10.5, '2024-02-01 08:30:00'),
(2, 2, 18.00, 7.2, '2024-02-01 09:00:00'),
(3, 3, 35.00, 15.0, '2024-02-01 10:30:00');

Query

1️⃣ Calculate the average fare per mile:

SELECT AVG(fare / distance) AS avg_fare_per_mile FROM trips WHERE distance > 0;
avg_fare_per_mile
2.14

2️⃣ Find the busiest times of the day:

SELECT EXTRACT(HOUR FROM trip_date) AS hour_of_day, COUNT(*) AS trip_count
FROM trips
GROUP BY hour_of_day
ORDER BY trip_count DESC
LIMIT 3;

OutPut

hour_of_daytrip_count
81
91
101

3️⃣ Find top 5 drivers:

SELECT d.name, COUNT(t.trip_id) AS trips_completed
FROM trips t
JOIN drivers d ON t.driver_id = d.driver_id
GROUP BY d.name
ORDER BY trips_completed DESC
LIMIT 5;

5. Movie Streaming Platform (Netflix Model)

Scenario: A streaming service wants to understand user preferences and engagement.
Tasks:

  • Create tables for users, movies, watch history, and subscriptions.
  • Determine the most-watched movie genre in the last 6 months.
  • Find users who started but didn’t finish a movie.
  • Identify users eligible for a subscription renewal discount.

Schema

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE movies (
    movie_id SERIAL PRIMARY KEY,
    title VARCHAR(100),
    genre VARCHAR(50)
);

CREATE TABLE watch_history (
    history_id SERIAL PRIMARY KEY,
    user_id INT REFERENCES users(user_id),
    movie_id INT REFERENCES movies(movie_id),
    watch_date DATE,
    completed BOOLEAN
);

Data

INSERT INTO users (name) VALUES
('Sarah Johnson'),
('David Smith'),
('Emily Brown');

INSERT INTO movies (title, genre) VALUES
('Inception', 'Sci-Fi'),
('Titanic', 'Romance'),
('Avengers', 'Action');

INSERT INTO watch_history (user_id, movie_id, watch_date, completed) VALUES
(1, 1, '2024-01-15', TRUE),
(2, 2, '2024-01-16', FALSE),
(3, 3, '2024-01-17', TRUE);

Query

1️⃣ Most-watched genre:

SELECT m.genre, COUNT(*) AS views
FROM watch_history w
JOIN movies m ON w.movie_id = m.movie_id
GROUP BY m.genre
ORDER BY views DESC
LIMIT 1;

OutPut

genreviews
Sci-Fi1
Romance1
Action1

2️⃣ Find users who started but didn’t finish a movie:

SELECT u.name, m.title
FROM watch_history w
JOIN users u ON w.user_id = u.user_id
JOIN movies m ON w.movie_id = m.movie_id
WHERE w.completed = FALSE;

3️⃣ Identify users eligible for a discount (inactive users for 3+ months):

SELECT u.name, u.user_id
FROM users u
LEFT JOIN watch_history w ON u.user_id = w.user_id 
     AND w.watch_date >= CURRENT_DATE - INTERVAL '3 months'
WHERE w.history_id IS NULL;

6. HR Employee Performance Tracking

This case study focuses on tracking employee performance, including attendance, projects completed, and performance reviews.

1️⃣ Database Schema

CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    department VARCHAR(100),
    hire_date DATE
);

CREATE TABLE attendance (
    attendance_id SERIAL PRIMARY KEY,
    employee_id INT REFERENCES employees(employee_id),
    attendance_date DATE,
    status VARCHAR(10) CHECK (status IN ('Present', 'Absent', 'Late'))
);

CREATE TABLE projects (
    project_id SERIAL PRIMARY KEY,
    employee_id INT REFERENCES employees(employee_id),
    project_name VARCHAR(255),
    completion_date DATE
);

CREATE TABLE performance_reviews (
    review_id SERIAL PRIMARY KEY,
    employee_id INT REFERENCES employees(employee_id),
    review_date DATE,
    rating INT CHECK (rating BETWEEN 1 AND 5),
    comments TEXT
);

2️⃣ Sample Data

-- Insert employees
INSERT INTO employees (name, department, hire_date) VALUES
('Alice Johnson', 'IT', '2020-03-15'),
('Bob Smith', 'Marketing', '2019-07-20'),
('Charlie Brown', 'HR', '2021-06-01');

-- Insert attendance records
INSERT INTO attendance (employee_id, attendance_date, status) VALUES
(1, '2024-02-01', 'Present'),
(1, '2024-02-02', 'Absent'),
(2, '2024-02-01', 'Late'),
(2, '2024-02-02', 'Present'),
(3, '2024-02-01', 'Present'),
(3, '2024-02-02', 'Present');

-- Insert projects completed
INSERT INTO projects (employee_id, project_name, completion_date) VALUES
(1, 'Website Redesign', '2024-01-30'),
(2, 'Marketing Campaign Q1', '2024-02-01'),
(3, 'Employee Benefits Survey', '2024-01-25');

-- Insert performance reviews
INSERT INTO performance_reviews (employee_id, review_date, rating, comments) VALUES
(1, '2024-01-31', 4, 'Great technical skills, needs better communication.'),
(2, '2024-02-01', 5, 'Excellent performance in campaign strategy.'),
(3, '2024-01-29', 3, 'Good but needs improvement in team collaboration.');

3️⃣ Queries and Expected Output

1️⃣ Count Employee Attendance for the Last Month

namedays_present
Charlie Brown2
Alice Johnson1
Bob Smith1

2️⃣ Employees with the Most Completed Projects

SELECT e.name, COUNT(p.project_id) AS total_projects
FROM projects p
JOIN employees e ON p.employee_id = e.employee_id
GROUP BY e.name
ORDER BY total_projects DESC
LIMIT 3;
nametotal_projects
Alice Johnson1
Bob Smith1
Charlie Brown1

3️⃣ Employees with the Best Performance Ratings

SELECT e.name, AVG(pr.rating) AS avg_rating
FROM performance_reviews pr
JOIN employees e ON pr.employee_id = e.employee_id
GROUP BY e.name
ORDER BY avg_rating DESC
LIMIT 3;
nameavg_rating
Bob Smith5.0
Alice Johnson4.0
Charlie Brown3.0

4️⃣ Employees with Frequent Absences (More than 1 absence)

SELECT e.name, COUNT(a.attendance_id) AS absent_days
FROM attendance a
JOIN employees e ON a.employee_id = e.employee_id
WHERE a.status = 'Absent'
GROUP BY e.name
HAVING COUNT(a.attendance_id) > 1;

Output : (No Rows Returned)