Ananya Kukreti
Gurugram, Haryana, India | 7895448416 | [email protected] | www.linkedin.com/in/ananya-kukreti
SUMMARY
Senior data engineer with over 3 years of experience building, deploying, tuning, and scheduling data pipelines and ETL
processes. Designed and optimized data pipelines that improved processing efficiency by up to 40%. Works well in
high-pressure environments, applying expertise in Big Data, Python, Spark, Hadoop, SQL, and shell scripting to drive
team success.
TECHNICAL SKILLS:
Programming: Python, SQL, Shell Scripting
Big Data: Spark, PySpark, Hadoop, Hive
Cloud: Amazon MSK, AWS Glue, Amazon EMR, AWS Lambda, Amazon RDS
Others: CA7 mainframe job scheduling
CERTIFICATIONS: Microsoft Azure Fundamentals (AZ-900)
PROFESSIONAL EXPERIENCE
Incedo Inc. | Gurugram, Haryana, India
Project 1 September 2022 – Present
Senior Data Engineer
Python | Hive | Hadoop | Spark | Shell Scripting | PySpark
Working as a senior data engineer for a client that ranks among the top 5 banks in the USA.
● Enhanced and streamlined ETL processes for data ingestion, enrichment, and analysis, improving data accuracy and
overall pipeline performance.
● Collaborated with data analysts to convert their source-to-target mappings (STMs) into optimized PySpark code for
robust data pipelines.
● Wrote code to ingest and enrich more than 20 TB of data from diverse sources.
● Engineered more than 50 features from ingested data, generating leads that were pivotal in identifying prospective
mortgage customers for the client.
● Led a team of 3 data engineers in automating data enrichment and ingestion processes across multiple client projects
using CA7 job scheduling.
● Achieved a 40% reduction in data processing time and query response time by optimizing and performance-tuning data
enrichment procedures.
● Contributed to the development of a control and validation framework using Python and Spark for data quality checks,
leading to the identification and resolution of 10+ data anomalies monthly.
Project 2 April 2021 – September 2022
Data Engineer
Python | SQL | Kafka | AWS
Worked for a client that is one of the top designers, manufacturers, and distributors of end-to-end networking, security, and
connectivity products in the USA, developing a pipeline that detects anomalies in data gathered by IoT devices.
● Engineered an ETL process from the ground up using Python to efficiently analyze 500 GB of wired and wireless data
generated by the client's IoT devices.
● Used Amazon MSK to ingest data from Kafka and send results back to Kafka topics, AWS Glue and Amazon EMR
for data processing, and Amazon RDS for data storage.
● Improved program performance by leveraging multithreading in Python, achieving a 60-70% reduction in runtime.
● Designed and implemented modules to collect real-time data from Kafka and analyze baseline patterns and anomalies
across network protocols (TCP, UDP, and ICMP).
● Pushed refined data to Kafka topics and MySQL tables, enabling data scientists to use it for visualization.
EDUCATION & OTHER
UNIVERSITY: Banasthali Vidyapith | GRADUATION YEAR: 2021
Bachelor of Technology (Computer Science) | CGPA: 7.8
LANGUAGES: Hindi (Native), English (Fluent), French (Elementary Proficiency)
INTERESTS: Reading, listening to podcasts