technicalmedium
Given a dataset of financial transactions in a CSV file, write a Python script that reads the data, calculates the total debits and credits for each account, and identifies any accounts where the debits do not equal the credits, outputting these discrepancies to a new CSV file.
technical screen · 15-20 minutes
How to structure your answer
MECE Framework: 1. Data Ingestion: Define function to read CSV using pandas, specifying columns like 'Account', 'Debit', 'Credit'. 2. Data Transformation: Group data by 'Account' and sum 'Debit' and 'Credit' columns. Calculate 'Net_Balance' (Debit - Credit). 3. Discrepancy Identification: Filter grouped data where 'Net_Balance' is not equal to zero. 4. Output Generation: Write the filtered discrepancies to a new CSV file, including 'Account', 'Total_Debit', 'Total_Credit', and 'Net_Balance'. 5. Error Handling: Implement try-except blocks for file operations and data type conversions.
Sample answer
import pandas as pd
def reconcile_transactions(input_csv_path, output_csv_path):
try:
df = pd.read_csv(input_csv_path)
# Ensure Debit and Credit columns are numeric, coercing errors
df['Debit'] = pd.to_numeric(df['Debit'], errors='coerce').fillna(0)
df['Credit'] = pd.to_numeric(df['Credit'], errors='coerce').fillna(0)
# Group by Account and calculate total debits and credits
account_summary = df.groupby('Account').agg(
Total_Debit=('Debit', 'sum'),
Total_Credit=('Credit', 'sum')
).reset_index()
# Calculate net balance and identify discrepancies
account_summary['Net_Balance'] = account_summary['Total_Debit'] - account_summary['Total_Credit']
discrepancies = account_summary[account_summary['Net_Balance'] != 0]
# Output discrepancies to a new CSV
if not discrepancies.empty:
discrepancies.to_csv(output_csv_path, index=False)
print(f"Discrepancies found and saved to '{output_csv_path}'")
else:
print("No discrepancies found.")
except FileNotFoundError:
print(f"Error: Input file '{input_csv_path}' not found.")
except Exception as e:
print(f"An unexpected error occurred: {e}")
# Example usage:
# reconcile_transactions('transactions.csv', 'discrepancies.csv')
Key points to mention
- • **Data Integrity & Reconciliation:** Emphasize the importance of ensuring debits equal credits in double-entry accounting and how this script aids in reconciliation.
- • **Error Handling:** Discuss potential issues like missing columns, incorrect data types (e.g., non-numeric debit/credit values), and how to handle them (e.g., `try-except` blocks, `pd.to_numeric`).
- • **Scalability:** Mention how `pandas` is well-suited for larger datasets compared to manual row-by-row processing.
- • **Modularity:** Suggest breaking down the script into functions (e.g., `read_transactions`, `calculate_balances`, `report_discrepancies`) for better readability and maintainability.
- • **Testing:** Briefly touch upon how one would test this script with sample data, including cases with and without discrepancies.
Common mistakes to avoid
- ✗ Not handling non-numeric debit/credit entries, leading to script crashes or incorrect calculations.
- ✗ Failing to consider edge cases, such as an account having only debits or only credits.
- ✗ Inefficient data processing for large files (e.g., iterating rows manually instead of using `groupby`).
- ✗ Incorrectly summing debits and credits (e.g., summing all values in a column without distinguishing between debit and credit types if not explicitly provided as separate columns).
- ✗ Overlooking the need for robust error handling for file I/O operations.