Experience

Intern, Data Analytics

TD Bank Group

Sep 2025 - Dec 2025 | Toronto, ON

Built a Python pipeline for Oracle data extraction, statistical feature engineering, and multi-source data quality validation.

What I Did

I built a Python pipeline that queried Oracle records via parameterized SQL, applied Pandas and NumPy to extract statistical features and flag anomalous entries, and produced structured summaries for downstream analysis. I also ran null-rate, schema-drift, and distributional outlier checks across multi-source datasets to block malformed data from entering downstream systems.
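To illustrate the shape of that extraction and feature step, here is a minimal sketch rather than the production code. The table name `transactions`, its columns, the median/MAD scoring, and the 3.5 cutoff are all assumptions chosen for illustration, and an in-memory `sqlite3` database stands in for Oracle so the example runs anywhere. An Oracle driver that follows the Python DB-API (for example python-oracledb) accepts the same `:name` bind-variable style on `cursor.execute`.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical table and column names, used only for illustration.
QUERY = """
    SELECT account_id, amount, posted_at
    FROM transactions
    WHERE posted_at >= :start_date AND posted_at < :end_date
"""


def load_records(conn, start_date, end_date):
    """Run the query with named bind variables instead of string formatting."""
    cur = conn.cursor()
    cur.execute(QUERY, {"start_date": start_date, "end_date": end_date})
    # Lower-case the names so the frame looks the same whichever driver is used.
    columns = [d[0].lower() for d in cur.description]
    return pd.DataFrame(cur.fetchall(), columns=columns)


def flag_anomalies(df, column="amount", threshold=3.5):
    """Add a median/MAD-based robust z-score and a boolean anomaly flag.

    The scoring method and the threshold are assumptions for this sketch.
    """
    values = df[column].to_numpy(dtype=float)
    median = np.nanmedian(values)
    mad = np.nanmedian(np.abs(values - median))
    if mad == 0:
        scores = np.zeros_like(values)
    else:
        # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores.
        scores = 0.6745 * (values - median) / mad
    out = df.copy()
    out["robust_z"] = scores
    out["is_anomaly"] = np.abs(scores) > threshold
    return out


def summarize(df):
    """Produce one structured summary row per account."""
    return (
        df.groupby("account_id")
        .agg(
            n_rows=("amount", "size"),
            mean_amount=("amount", "mean"),
            n_anomalies=("is_anomaly", "sum"),
        )
        .reset_index()
    )
```

Passing the date bounds as bind variables keeps user-supplied values out of the SQL text, which is the point of parameterizing the query.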

Impact

The pipeline automated data extraction and feature computation that had previously been done manually. The quality checks blocked malformed data from reaching downstream systems.

What I Learned

I gained experience writing parameterized SQL queries against Oracle, using Pandas and NumPy for statistical feature extraction, and designing data quality checks for multi-source pipelines, including null-rate, schema-drift, and distributional outlier detection.
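The three kinds of checks named above (null-rate, schema-drift, and distributional outliers) can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the checks as they ran in the pipeline: the function names, the 5% null-rate and 1% outlier-share thresholds, and the IQR rule for outliers are all hypothetical choices.

```python
import numpy as np
import pandas as pd


def null_rate_violations(df, max_null_rate=0.05):
    """Return {column: null rate} for columns above the allowed rate."""
    rates = df.isna().mean()
    return rates[rates > max_null_rate].to_dict()


def schema_drift(df, expected_schema):
    """Compare actual columns and dtypes with an expected {column: dtype} map."""
    actual = {col: str(dtype) for col, dtype in df.dtypes.items()}
    return {
        "missing": sorted(set(expected_schema) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected_schema)),
        "type_changed": sorted(
            col for col in expected_schema
            if col in actual and actual[col] != expected_schema[col]
        ),
    }


def iqr_outlier_share(series, k=1.5):
    """Share of non-null values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = series.dropna().to_numpy(dtype=float)
    if values.size == 0:
        return 0.0
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return float(mask.mean())


def quality_report(df, expected_schema, numeric_column,
                   max_null_rate=0.05, max_outlier_share=0.01):
    """Bundle the three checks; 'passed' is False if any check fails."""
    nulls = null_rate_violations(df, max_null_rate)
    drift = schema_drift(df, expected_schema)
    # If the numeric column itself is missing, schema_drift already reports it.
    share = iqr_outlier_share(df[numeric_column]) if numeric_column in df.columns else 0.0
    passed = not nulls and not any(drift.values()) and share <= max_outlier_share
    return {"null_rate": nulls, "schema_drift": drift,
            "outlier_share": share, "passed": passed}
```

A caller could refuse to load any batch whose report has `passed` set to `False`, which is one simple way to keep malformed data out of downstream systems.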

Key Highlights

  • Built a Python pipeline querying Oracle records via parameterized SQL, applied Pandas and NumPy to extract statistical features and flag anomalous entries, and produced structured summaries for downstream analysis.

  • Ran null-rate, schema-drift, and distributional outlier checks across multi-source datasets to block malformed data.

Tech Stack

Python, Oracle, SQL, Pandas, NumPy, Data Quality

Tags

industry, data-engineering
