🚀 AI-Powered Mock Interviews Launching Soon - Join the Waitlist for Early Access

technicalmedium

Given a dataset of customer transactions including `customer_id`, `product_id`, `transaction_date`, and `revenue`, write a SQL query to identify the top 5 customers by total revenue for each month in 2023. The output should include `month`, `customer_id`, and `total_revenue`.

technical screen · 10-15 minutes

How to structure your answer

Utilize a CTE-based approach for clarity and modularity. First, extract the month from transaction_date and calculate monthly_revenue per customer_id using GROUP BY. Second, apply the RANK() window function partitioned by month and ordered by monthly_revenue in descending order to assign a rank to each customer within their respective month. Finally, filter the results to include only customers with a rank of 5 or less, ensuring the output includes month, customer_id, and total_revenue for 2023. This MECE approach ensures all relevant data is processed and filtered efficiently.

Sample answer

WITH MonthlyCustomerRevenue AS (
    SELECT
        EXTRACT(MONTH FROM transaction_date) AS month,
        customer_id,
        SUM(revenue) AS total_revenue
    FROM
        customer_transactions
    WHERE
        EXTRACT(YEAR FROM transaction_date) = 2023
    GROUP BY
        EXTRACT(MONTH FROM transaction_date),
        customer_id
),
RankedCustomerRevenue AS (
    SELECT
        month,
        customer_id,
        total_revenue,
        RANK() OVER (PARTITION BY month ORDER BY total_revenue DESC) AS rnk
    FROM
        MonthlyCustomerRevenue
)
SELECT
    month,
    customer_id,
    total_revenue
FROM
    RankedCustomerRevenue
WHERE
    rnk <= 5
ORDER BY
    month, total_revenue DESC;

Key points to mention

  • • Use of `STRFTIME` or equivalent date functions (`DATE_TRUNC`, `EXTRACT`) for month extraction.
  • • Aggregation (`SUM`, `GROUP BY`) to calculate total revenue per customer per month.
  • • Application of window functions (`ROW_NUMBER`, `RANK`, `DENSE_RANK`) for ranking within partitions.
  • • Understanding of `PARTITION BY` in window functions to reset ranking per group (month).
  • • Filtering (`WHERE`) to select the top N results after ranking.

Common mistakes to avoid

  • ✗ Forgetting to `PARTITION BY month` in the window function, leading to a single global ranking instead of per-month ranking.
  • ✗ Not filtering for the year 2023, resulting in data from all years.
  • ✗ Using `GROUP BY` on `transaction_date` directly instead of extracting the month, which would group by specific dates, not months.
  • ✗ Incorrectly using `RANK()` or `DENSE_RANK()` when `ROW_NUMBER()` is more appropriate for a strict 'top N' without ties being an issue (though `RANK()` would also work, it might return more than 5 if there are ties at the 5th position).
  • ✗ Performance issues with very large datasets if not optimizing CTEs or subqueries.