How to structure your answer

Framework + step-by-step strategy (120-150 words, no story)

Sample answer

The function begins by validating inputs: it confirms that the DataFrame contains the required columns and that the emission_factor mapping covers every facility_id. Next, it converts the 'timestamp' column to datetime, extracts the date, and groups by ['facility_id','date'] to sum kWh. Using the mapping, it multiplies each facility's daily kWh by its emission factor to compute emissions in kg CO₂. These per-facility values are then summed by date to produce a total daily emissions column. A threshold parameter flags high‑emission days in a boolean column. The implementation relies on vectorized pandas operations instead of explicit loops, so it scales to large datasets. Edge cases such as missing readings or zero emission factors are handled explicitly with NaN checks and default values. The function returns a tidy DataFrame ready for downstream reporting.
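
If the interviewer asks for code, the steps in the sample answer could be sketched roughly as follows. This is one possible shape, not a reference implementation: the function name daily_emissions, the column name 'kWh', the threshold_kg parameter, the assumption that emission factors are a dict in kg CO₂ per kWh, and the choice to drop rows with missing readings are all illustrative assumptions.

```python
import pandas as pd

# Assumed column names; adjust to the actual schema.
REQUIRED_COLUMNS = {"facility_id", "timestamp", "kWh"}


def daily_emissions(df, emission_factor, threshold_kg):
    """Return total daily emissions (kg CO2) with a high-emission flag."""
    # Validate the schema before doing any work.
    missing_cols = REQUIRED_COLUMNS - set(df.columns)
    if missing_cols:
        raise ValueError(f"missing columns: {sorted(missing_cols)}")

    # Every facility in the data needs an emission factor (dict assumed).
    unknown = set(df["facility_id"].dropna().unique()) - set(emission_factor)
    if unknown:
        raise ValueError(f"no emission factor for: {sorted(unknown)}")

    # Assumed policy: drop rows with missing readings rather than fill them.
    clean = df.dropna(subset=["facility_id", "timestamp", "kWh"]).copy()
    clean["date"] = pd.to_datetime(clean["timestamp"]).dt.date

    # Daily kWh per facility.
    daily = clean.groupby(["facility_id", "date"], as_index=False)["kWh"].sum()

    # Vectorized lookup of each facility's factor (kg CO2 per kWh assumed).
    daily["emissions_kg"] = daily["kWh"] * daily["facility_id"].map(emission_factor)

    # Aggregate across facilities and flag days above the threshold.
    total = daily.groupby("date", as_index=False)["emissions_kg"].sum()
    total["high_emission"] = total["emissions_kg"] > threshold_kg
    return total


# Small made-up dataset to show the call.
readings = pd.DataFrame({
    "facility_id": ["A", "A", "B", "B"],
    "timestamp": ["2024-01-01 00:00", "2024-01-01 12:00",
                  "2024-01-01 06:00", "2024-01-02 06:00"],
    "kWh": [100.0, 50.0, 200.0, 10.0],
})
factors = {"A": 0.5, "B": 0.25}
result = daily_emissions(readings, factors, threshold_kg=100.0)
```

With this made-up data, the first day totals 125.0 kg and is flagged, while the second totals 2.5 kg and is not. Returning the flag as a column keeps the output a single tidy DataFrame.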

Key points to mention

• vectorized operations
• input validation
• threshold logic
• performance considerations

Common mistakes to avoid

✗ Using for‑loops over rows
✗ Ignoring missing or NaN values
✗ Unit mismatch (kg vs metric tons)
✗ Not validating input schema
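
Two of these mistakes are easy to demonstrate in a few lines. The sketch below uses made-up data and hypothetical column names (emissions_kg, emissions_t); it shows how a default sum can hide missing readings, and one way to keep units explicit.

```python
import pandas as pd

KG_PER_METRIC_TON = 1000.0

# Made-up daily readings; facility B has no valid reading at all.
daily = pd.DataFrame({
    "facility_id": ["A", "A", "B"],
    "date": ["2024-01-01", "2024-01-02", "2024-01-01"],
    "kWh": [120.0, float("nan"), float("nan")],
})

# The default sum skips NaN, so facility B silently reports 0 kWh.
silent = daily.groupby("facility_id")["kWh"].sum()

# min_count=1 keeps the result NaN when no valid reading exists.
explicit = daily.groupby("facility_id")["kWh"].sum(min_count=1)

# Carry the unit in the column name and convert in a single place.
emissions = pd.DataFrame({"emissions_kg": [125.0, 2500.0]})
emissions["emissions_t"] = emissions["emissions_kg"] / KG_PER_METRIC_TON
```

Here silent reports 0.0 for facility B while explicit keeps it as NaN, which makes the gap visible to whoever reads the report.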