
technical · medium

Write a Python function that accepts a Pandas DataFrame with columns ['facility_id','timestamp','kwh'] and a dictionary mapping facility_id to emission_factor (kg CO₂/kWh). The function should return a DataFrame with daily total emissions per facility and overall, and flag days where emissions exceed a configurable threshold. Optimize for large datasets.

onsite · 3-5 minutes

How to structure your answer

Framework + step-by-step strategy (120-150 words, no story)

Sample answer

The function begins by validating inputs: it checks that the DataFrame has the required columns and that the emission_factor mapping covers every facility_id. Next, it converts 'timestamp' to datetime, floors it to the calendar day, and groups by ['facility_id','date'] to sum kWh. Using the mapping, it multiplies each facility's daily kWh by its emission factor to get emissions in kg CO₂. Summing these per-facility figures for each date gives an overall daily total. Comparing daily emissions against a configurable threshold yields a boolean flag for each high‑emission day. Every step is a vectorized Pandas operation (groupby, map, column arithmetic) with no row-level loops, so it scales to large datasets. Non-numeric kWh readings are coerced to NaN and skipped by the sum, unmapped facilities fail validation instead of silently producing NaN emissions, and a zero emission factor needs no special case because it simply yields zero emissions. The function returns a tidy DataFrame ready for downstream reporting.
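The steps in the sample answer can be sketched as follows. This is one possible implementation, not a reference solution: the function name `daily_emissions`, the parameter `threshold_kg`, and the output column names are hypothetical, and it assumes one threshold applies to both the per-facility and the overall daily totals, since the question does not say which total the threshold targets.

```python
import pandas as pd

REQUIRED_COLUMNS = {"facility_id", "timestamp", "kwh"}


def daily_emissions(df: pd.DataFrame, emission_factors: dict, threshold_kg: float) -> pd.DataFrame:
    """Daily kg CO2 per facility plus an overall daily total, with threshold flags."""
    # 1. Validate the schema and the factor mapping before doing any work.
    missing_cols = REQUIRED_COLUMNS - set(df.columns)
    if missing_cols:
        raise ValueError(f"missing columns: {sorted(missing_cols)}")
    unmapped = set(df["facility_id"].dropna().unique()) - set(emission_factors)
    if unmapped:
        raise KeyError(f"no emission factor for: {sorted(map(str, unmapped))}")

    # 2. Parse timestamps and floor to the calendar day. floor("D") keeps a
    #    datetime64 column rather than the Python date objects from .dt.date.
    work = pd.DataFrame({
        "facility_id": df["facility_id"],
        "date": pd.to_datetime(df["timestamp"]).dt.floor("D"),
        # Non-numeric readings become NaN; the groupby sum skips NaN
        # (a facility-day with only NaN readings sums to 0.0).
        "kwh": pd.to_numeric(df["kwh"], errors="coerce"),
    })

    # 3. Vectorized daily kWh per facility (no row loops).
    daily = work.groupby(["facility_id", "date"], as_index=False, observed=True)["kwh"].sum()

    # 4. Look up each facility's factor (kg CO2 per kWh); astype guards against
    #    a categorical result when facility_id is a category column.
    factors = daily["facility_id"].map(emission_factors).astype("float64")
    daily["emission_kg"] = daily["kwh"] * factors

    # 5. Overall total for each date, broadcast back onto every row of that date.
    daily["overall_emission_kg"] = daily.groupby("date")["emission_kg"].transform("sum")

    # 6. Boolean flags against the configurable threshold.
    daily["exceeds_facility"] = daily["emission_kg"] > threshold_kg
    daily["exceeds_overall"] = daily["overall_emission_kg"] > threshold_kg
    return daily
```

Broadcasting the overall total with `transform("sum")` keeps a single tidy DataFrame, matching the "overall emissions column" described above; returning a separate per-date frame would work equally well.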

Key points to mention

  • vectorized operations
  • input validation
  • threshold logic
  • performance considerations

Common mistakes to avoid

  • ✗ Using for‑loops over rows
  • ✗ Ignoring missing or NaN values
  • ✗ Unit mismatch (kg vs metric tons)
  • ✗ Not validating input schema
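The NaN and unit pitfalls above can be avoided with two small habits: keep a facility-day with no valid reading as NaN (pandas' `sum(min_count=1)`) instead of letting it collapse to 0.0, and convert between kg and metric tonnes in exactly one place. In this sketch the function name `daily_emissions_tonnes`, the constant `KG_PER_TONNE`, and the `n_missing` column are hypothetical, and the threshold is assumed to be given in tonnes.

```python
import pandas as pd

KG_PER_TONNE = 1_000.0  # 1 metric tonne = 1,000 kg


def daily_emissions_tonnes(readings: pd.DataFrame, emission_factors: dict,
                           threshold_tonnes: float) -> pd.DataFrame:
    """Daily tonnes CO2 per facility, keeping gaps visible instead of zero-filling."""
    kwh = pd.to_numeric(readings["kwh"], errors="coerce")
    keys = [readings["facility_id"],
            pd.to_datetime(readings["timestamp"]).dt.floor("D").rename("date")]
    daily = pd.DataFrame({
        # min_count=1: an all-NaN facility-day sums to NaN rather than 0.0
        "kwh": kwh.groupby(keys).sum(min_count=1),
        # Count of missing or unparseable readings on that facility-day
        "n_missing": kwh.isna().groupby(keys).sum(),
    }).reset_index()
    factors = daily["facility_id"].map(emission_factors).astype("float64")  # kg CO2/kWh
    # The single kg -> tonne conversion; everything downstream is in tonnes.
    daily["emission_t"] = daily["kwh"] * factors / KG_PER_TONNE
    # NaN compares False, so gap days are not flagged; check n_missing for them.
    daily["exceeds_threshold"] = daily["emission_t"] > threshold_tonnes
    return daily
```

Reporting `n_missing` alongside the flag lets a reviewer tell a genuinely low-emission day apart from a day with missing meter data.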