Intern, Data Analytics
TD Bank Group
Built a Python pipeline for Oracle data extraction, statistical feature engineering, and multi-source data quality validation.
What I Did
I built a Python pipeline that queried Oracle records via parameterized SQL, applied Pandas and NumPy to extract statistical features and flag anomalous entries, and produced structured summaries for downstream analysis. I also ran null-rate, schema-drift, and distributional outlier checks across multi-source datasets to block malformed data from entering downstream systems.
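The three quality checks named above can be sketched as follows. This is a minimal illustration, not the production code: the column names, the expected schema, the sample batch, and the IQR rule used for outlier detection are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical expected schema; the real column names and dtypes are not given here.
EXPECTED_SCHEMA = {"account_id": "int64", "amount": "float64"}


def null_rates(df: pd.DataFrame) -> pd.Series:
    """Fraction of missing values per column."""
    return df.isna().mean()


def schema_drift(df: pd.DataFrame, expected: dict) -> dict:
    """Report columns that are missing, unexpected, or have a changed dtype."""
    actual = {col: str(dtype) for col, dtype in df.dtypes.items()}
    shared = set(expected) & set(actual)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "dtype_changed": sorted(c for c in shared if expected[c] != actual[c]),
    }


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]; NaNs are not flagged."""
    q1, q3 = np.nanpercentile(values.to_numpy(dtype=float), [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Made-up batch with one missing amount, one extreme amount, and one extra column.
batch = pd.DataFrame(
    {
        "account_id": [1, 2, 3, 4, 5, 6],
        "amount": [10.0, 12.0, 11.0, np.nan, 9.0, 500.0],
        "note": ["a", "b", "c", "d", "e", "f"],
    }
)

rates = null_rates(batch)
drift = schema_drift(batch, EXPECTED_SCHEMA)
outliers = iqr_outliers(batch["amount"])
```

In a pipeline, a batch would typically be rejected or quarantined when a null rate exceeds a chosen threshold, when `drift` reports any missing or retyped column, or when the share of flagged outliers is unusually high. Those thresholds are a design choice and are not specified here.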
Impact
The pipeline automated data extraction and feature computation that had previously been done manually. The quality checks blocked malformed data from reaching downstream systems.
What I Learned
I gained experience writing parameterized SQL queries against Oracle, using Pandas and NumPy for statistical feature extraction, and designing data quality checks for multi-source pipelines, including null-rate, schema-drift, and distributional outlier detection.
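To illustrate what parameterization buys, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for the Oracle connection, so that it runs anywhere. The table, columns, and values are hypothetical. The named `:name` placeholder style shown is also the form Oracle drivers for Python accept for bind variables.

```python
import sqlite3

# Stand-in for the Oracle connection: an in-memory SQLite database
# with a hypothetical table and made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (account_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions (account_id, amount) VALUES (?, ?)",
    [(1, 10.0), (1, 250.0), (2, 40.0)],
)


def fetch_transactions(conn, account_id, min_amount):
    """Run a query whose values are bound as parameters, never formatted into the SQL text."""
    query = (
        "SELECT account_id, amount FROM transactions "
        "WHERE account_id = :account_id AND amount >= :min_amount "
        "ORDER BY amount"
    )
    params = {"account_id": account_id, "min_amount": min_amount}
    return conn.execute(query, params).fetchall()


rows = fetch_transactions(conn, account_id=1, min_amount=50.0)

# A hostile value is treated as plain data, so it matches no account
# instead of rewriting the WHERE clause.
hostile = fetch_transactions(conn, account_id="1 OR 1=1", min_amount=0.0)
```

Because the driver sends the SQL text and the values separately, the query shape stays fixed regardless of input, which is the main reason to prefer bind parameters over string formatting.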
Key Highlights
Built a Python pipeline querying Oracle records via parameterized SQL, applied Pandas and NumPy to extract statistical features and flag anomalous entries, and produced structured summaries for downstream analysis.
Ran null-rate, schema-drift, and distributional outlier checks across multi-source datasets to block malformed data.