
technical · medium

You're given a dataset of daily stock prices for a portfolio of 100 stocks over the last year. Write Python code to calculate the daily percentage change for each stock and identify the top 5 stocks with the highest average daily percentage gain over the entire period.

technical screen · 10-15 minutes

How to structure your answer

Employ a MECE (Mutually Exclusive, Collectively Exhaustive) approach to data processing. First, load the dataset into a Pandas DataFrame, ensuring 'Date' is a datetime object and 'Stock_ID' is a categorical type. Second, pivot the DataFrame to have 'Date' as index and 'Stock_ID' as columns, with daily prices as values, and sort it by date. Third, calculate the daily percentage change for each stock using the .pct_change() method, noting that the first row is NaN because it has no previous day. Fourth, compute the average daily percentage change for each stock across all trading days; restrict the average to positive days only if the interviewer defines "gain" that way, since dropping the losing days inflates the result. Fifth, sort stocks by their average daily change in descending order and select the top 5. Finally, present the identified top 5 stocks and their average daily percentage gains. This ensures every stock is ranked on the same set of days.

Sample answer

To address this, I would use Python with the Pandas library for efficient data manipulation. First, I'd load the dataset, assuming it's in a CSV or similar format, into a Pandas DataFrame. The DataFrame would ideally have columns for 'Date', 'Stock_ID', and 'Price'. I'd then pivot the DataFrame so that 'Date' becomes the index and each 'Stock_ID' becomes a column, with the corresponding daily prices as values. Next, I'd apply the .pct_change() method, which by default works down each column (axis=0) and computes (current price / previous price) - 1 for consecutive rows, so the dates must be sorted first. After obtaining the daily percentage changes, I would calculate the mean of these changes for each stock over all trading days, and I'd confirm with the interviewer whether "gain" means the overall average or only the up days. Finally, I'd sort these averages in descending order and select the top 5 stocks. This approach avoids explicit loops, relies on Pandas' built-in time-series functions, and works unchanged for more stocks or a longer history.
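The steps in the sample answer can be sketched as follows. This is a minimal illustration, not a definitive implementation: the column names 'Date', 'Stock_ID', and 'Price' are the ones assumed in the answer, the function name `top_gainers` and the three-stock demo data are made up for the example, and the average is taken over all trading days. If only up days should count, `daily_pct.where(daily_pct > 0).mean()` would be the one-line variation.

```python
import pandas as pd


def top_gainers(prices_long: pd.DataFrame, n: int = 5) -> pd.Series:
    """Return the n stocks with the highest mean daily percentage change.

    Assumes a long-format frame with 'Date', 'Stock_ID' and 'Price' columns
    (hypothetical names taken from the sample answer).
    """
    # Long -> wide: one row per date, one column per stock, sorted by date.
    wide = (
        prices_long
        .assign(Date=pd.to_datetime(prices_long["Date"]))
        .pivot(index="Date", columns="Stock_ID", values="Price")
        .sort_index()
    )
    # Row-over-row change within each column, in percent.
    # The first row is NaN because it has no previous day.
    daily_pct = wide.pct_change(fill_method=None) * 100
    # mean() skips NaN by default, so the first row needs no special handling.
    return daily_pct.mean().nlargest(n)


# Tiny synthetic dataset (illustrative values only, not real prices).
demo = pd.DataFrame({
    "Date": ["2024-01-02", "2024-01-03", "2024-01-04"] * 3,
    "Stock_ID": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "Price": [100.0, 110.0, 121.0,   # A: +10%, +10%
              100.0, 90.0, 99.0,     # B: -10%, +10%
              100.0, 100.0, 105.0],  # C:   0%,  +5%
})

result = top_gainers(demo, n=2)
print(result)  # A ranks first (about 10%), then C (about 2.5%)
```

With the real dataset, the call would be `top_gainers(df, n=5)` after loading the file with `pd.read_csv`.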

Key points to mention

  • **Data Structure:** Emphasize using a pandas DataFrame where rows are dates and columns are stock tickers, so that each operation is vectorized across all 100 stocks at once.
  • **Percentage Change Calculation:** Explain the formula: `((Current Price - Previous Price) / Previous Price) * 100` or simply `df.pct_change() * 100` for percentage.
  • **Handling NaNs:** Discuss how `pct_change()` introduces `NaN` in the first row and how to handle it: drop that row with `dropna()`, or rely on `mean()` skipping `NaN` by default. Note that `fillna(0)` counts the first row as a zero-return day and slightly lowers the average.
  • **Aggregation:** Clearly state the use of `mean()` to calculate the average daily gain for each stock.
  • **Sorting and Selection:** Explain using `sort_values(ascending=False)` and `head(5)`, or `nlargest(5)`, to identify the top performers.
  • **Scalability:** Briefly touch upon how this approach scales well for a larger number of stocks or longer time periods because pandas runs these operations as vectorized, compiled code rather than Python loops.
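To make the NaN point concrete, here is a small sketch on a made-up three-day price series (the values are illustrative only). It compares the default `mean()`, which skips the leading `NaN`, with `fillna(0)`, which treats the first row as a real day with a zero return.

```python
import pandas as pd

# Synthetic prices for one hypothetical stock: two consecutive +10% days.
prices = pd.Series([100.0, 110.0, 121.0], name="A")

# First entry is NaN because there is no previous price to compare against.
pct = prices.pct_change(fill_method=None) * 100

# Default mean() ignores NaN: average of the two real daily changes.
mean_skipna = pct.mean()

# fillna(0) adds a fake 0% day, which drags the average down.
mean_filled = pct.fillna(0).mean()

print(mean_skipna, mean_filled)  # roughly 10 versus roughly 6.67
```

Over a full year of data the difference is small, but it is worth stating which convention is in use.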

Common mistakes to avoid

  • ✗ **Looping through rows/columns:** Inefficiently iterating through the DataFrame instead of using vectorized pandas operations (e.g., `df.pct_change()`, `df.mean()`). Note that `df.apply()` with a Python function is still a loop under the hood.
  • ✗ **Incorrect percentage change formula:** Miscalculating the daily percentage change.
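  • ✗ **Mishandling NaN values:** Not accounting for the `NaN` that `pct_change()` produces in the first row, or for gaps in the price data. `mean()` skips `NaN` by default, but filling with 0 or with stale prices can skew average calculations.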
  • ✗ **Using simple close price:** Not considering adjusted close prices for accurate historical performance, especially if dividends or stock splits occurred.
  • ✗ **Off-by-one errors:** Incorrectly aligning prices for percentage change calculation if not using built-in functions.
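As a rough check against the formula and off-by-one mistakes above, the manual calculation can be compared with the built-in one. The sketch below uses two made-up price columns; `shift(1)` is what lines each price up with the previous day's price.

```python
import numpy as np
import pandas as pd

# Synthetic prices for two hypothetical stocks (illustrative values only).
prices = pd.DataFrame({
    "A": [100.0, 110.0, 121.0],
    "B": [50.0, 45.0, 49.5],
})

# shift(1) moves every price down one row, so row t holds the price from t-1.
previous = prices.shift(1)
manual_pct = (prices - previous) / previous * 100

# Built-in equivalent.
builtin_pct = prices.pct_change(fill_method=None) * 100

# Both versions should agree, including the NaN first row.
same = np.allclose(manual_pct.to_numpy(), builtin_pct.to_numpy(), equal_nan=True)
print(same)
```

Using `shift(-1)` by mistake would divide by the next day's price instead, which is the kind of misalignment the built-in method avoids.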