technicalmedium
Given a dataset of customer transactions including `customer_id`, `product_id`, `transaction_date`, and `revenue`, write a SQL query to identify the top 5 customers by total revenue for each month in 2023. The output should include `month`, `customer_id`, and `total_revenue`.
technical screen · 10-15 minutes
How to structure your answer
Utilize a CTE-based approach for clarity and modularity. First, extract the month from transaction_date and calculate monthly_revenue per customer_id using GROUP BY. Second, apply the RANK() window function partitioned by month and ordered by monthly_revenue in descending order to assign a rank to each customer within their respective month. Finally, filter the results to include only customers with a rank of 5 or less, ensuring the output includes month, customer_id, and total_revenue for 2023. This MECE approach ensures all relevant data is processed and filtered efficiently.
Sample answer
WITH MonthlyCustomerRevenue AS (
SELECT
EXTRACT(MONTH FROM transaction_date) AS month,
customer_id,
SUM(revenue) AS total_revenue
FROM
customer_transactions
WHERE
EXTRACT(YEAR FROM transaction_date) = 2023
GROUP BY
EXTRACT(MONTH FROM transaction_date),
customer_id
),
RankedCustomerRevenue AS (
SELECT
month,
customer_id,
total_revenue,
RANK() OVER (PARTITION BY month ORDER BY total_revenue DESC) AS rnk
FROM
MonthlyCustomerRevenue
)
SELECT
month,
customer_id,
total_revenue
FROM
RankedCustomerRevenue
WHERE
rnk <= 5
ORDER BY
month, total_revenue DESC;
Key points to mention
- • Use of `STRFTIME` or equivalent date functions (`DATE_TRUNC`, `EXTRACT`) for month extraction.
- • Aggregation (`SUM`, `GROUP BY`) to calculate total revenue per customer per month.
- • Application of window functions (`ROW_NUMBER`, `RANK`, `DENSE_RANK`) for ranking within partitions.
- • Understanding of `PARTITION BY` in window functions to reset ranking per group (month).
- • Filtering (`WHERE`) to select the top N results after ranking.
Common mistakes to avoid
- ✗ Forgetting to `PARTITION BY month` in the window function, leading to a single global ranking instead of per-month ranking.
- ✗ Not filtering for the year 2023, resulting in data from all years.
- ✗ Using `GROUP BY` on `transaction_date` directly instead of extracting the month, which would group by specific dates, not months.
- ✗ Incorrectly using `RANK()` or `DENSE_RANK()` when `ROW_NUMBER()` is more appropriate for a strict 'top N' without ties being an issue (though `RANK()` would also work, it might return more than 5 if there are ties at the 5th position).
- ✗ Performance issues with very large datasets if not optimizing CTEs or subqueries.