Data Engineer • AWS | PySpark | Real-Time Pipelines
Building production-grade PySpark ETLs on AWS Glue & EMR on EKS • Migrated mainframe/COBOL workloads for Vanguard
I'm a Data Engineer with production experience delivering high-scale, regulated financial workloads at TCS for Vanguard.
I specialize in modernizing legacy ETL systems: migrating mainframe/COBOL and DB2 jobs to serverless and container-native AWS architectures built on PySpark, Glue, Lambda, Step Functions, and EMR on EKS.
Currently focused on building real-time streaming pipelines, data lakehouses with Apache Iceberg, and Kubernetes-native data platforms using Argo Workflows and Karpenter.
Tata Consultancy Services (TCS)
Vanguard Project — Client-Embedded Team
Indore, India
Built a 50k TPS streaming pipeline (Kinesis → PySpark streaming → SageMaker inference) that cut false positives by 41% (streaming sketch below).
Built a zero-ETL lakehouse with schema evolution, reducing Athena query costs by 68% (lakehouse sketch below).
Ran a 90% Spot Graviton fleet on EMR on EKS provisioned by Karpenter, with Slack-triggered Step Functions orchestration, for 72% cost savings (job-submission sketch below).
Implemented active-passive disaster recovery for 2 TB of analytics data using S3 Cross-Region Replication, DynamoDB Global Tables, and Route 53 health checks, automated with Terraform.
Generated Bedrock Titan embeddings with PySpark on EKS, indexed them into OpenSearch, and surfaced plain-English Slack alerts via LLM summaries (embedding sketch below).
Built a Kubernetes-native data mesh managed with ArgoCD, with isolated scaling and cost tagging.
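
A minimal sketch of the streaming pattern behind the 50k TPS highlight. It assumes the Kinesis Structured Streaming connector available on EMR/Glue (option names vary slightly by connector version); the stream name, event schema, and SageMaker endpoint are hypothetical placeholders, not the production values.

```python
# Sketch only: Kinesis -> PySpark streaming -> SageMaker scoring.
# Stream name, schema, and endpoint name are placeholders.
import json

import boto3
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("txn-stream-scoring").getOrCreate()

event_schema = StructType([
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
])

raw = (
    spark.readStream
    .format("kinesis")                                  # EMR/Glue Kinesis connector
    .option("streamName", "txn-events")                 # placeholder stream
    .option("endpointUrl", "https://kinesis.us-east-1.amazonaws.com")
    .option("startingPosition", "LATEST")
    .load()
)

# The connector exposes the record payload as a binary `data` column.
events = (
    raw.select(F.from_json(F.col("data").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

def score_batch(batch_df, batch_id):
    """Score each micro-batch against a SageMaker endpoint (placeholder name)."""
    runtime = boto3.client("sagemaker-runtime")
    for row in batch_df.toLocalIterator():  # a per-partition call would be used at scale
        runtime.invoke_endpoint(
            EndpointName="txn-anomaly-endpoint",
            ContentType="application/json",
            Body=json.dumps({"amount": row["amount"]}),
        )
        # ...route high-score responses to an alerts sink

(
    events.writeStream
    .foreachBatch(score_batch)
    .option("checkpointLocation", "s3://example-bucket/checkpoints/txn-stream/")
    .start()
    .awaitTermination()
)
```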
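The lakehouse highlight relies on in-place schema evolution; the sketch below assumes Apache Iceberg as the table format (per the profile summary) with the AWS Glue catalog. The catalog, database, and table names and the S3 warehouse path are placeholders.

```python
# Sketch only: Iceberg table in the Glue catalog with metadata-only schema evolution.
# Assumes the iceberg-spark-runtime and Iceberg AWS bundle jars are on the classpath;
# the `lake` catalog, `finance` database, and S3 warehouse path are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-lakehouse-sketch")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.lake.warehouse", "s3://example-lake/warehouse/")
    .config("spark.sql.catalog.lake.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# Hidden partitioning: Athena can query the same table without a separate ETL copy.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.finance.trades (
        trade_id   STRING,
        account_id STRING,
        amount     DOUBLE,
        trade_ts   TIMESTAMP
    )
    USING iceberg
    PARTITIONED BY (days(trade_ts))
""")

# Schema evolution is a metadata-only change: no data files are rewritten.
spark.sql("ALTER TABLE lake.finance.trades ADD COLUMNS (settlement_ccy STRING)")

# New writes carry the evolved schema; existing snapshots stay readable.
spark.sql("""
    INSERT INTO lake.finance.trades
    VALUES ('t-1001', 'a-42', 250.75, TIMESTAMP '2024-06-01 10:15:00', 'USD')
""")
```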
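In the EMR on EKS setup, the Slack-triggered Step Functions flow ends in an EMR containers StartJobRun call; the boto3 equivalent is sketched below for illustration. The virtual cluster ID, role ARN, release label, and S3 paths are placeholders, and the Karpenter/Spot fleet configuration sits outside this snippet.

```python
# Sketch only: submitting a PySpark job to an EMR on EKS virtual cluster.
# In the real setup Step Functions invokes the same StartJobRun API.
import boto3

emr = boto3.client("emr-containers", region_name="us-east-1")

response = emr.start_job_run(
    name="nightly-positions-etl",
    virtualClusterId="abc123examplecluster",                      # placeholder
    executionRoleArn="arn:aws:iam::123456789012:role/EmrEksJobRole",
    releaseLabel="emr-6.15.0-latest",
    jobDriver={
        "sparkSubmitJobDriver": {
            "entryPoint": "s3://example-bucket/jobs/positions_etl.py",
            "sparkSubmitParameters": (
                "--conf spark.executor.instances=4 "
                "--conf spark.kubernetes.executor.request.cores=2"
            ),
        }
    },
    configurationOverrides={
        "monitoringConfiguration": {
            "s3MonitoringConfiguration": {"logUri": "s3://example-bucket/emr-logs/"}
        }
    },
)
print(response["id"])  # job run ID to poll or surface back to Slack
```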
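For the embeddings pipeline, the core step is a Bedrock Titan embedding call per document. A self-contained sketch follows: the model ID is the public Titan text-embeddings model, while the sample documents are invented and the PySpark UDF wrapper and OpenSearch bulk write are omitted.

```python
# Sketch only: Titan embeddings via Bedrock, prepared for an OpenSearch k-NN index.
# Documents and downstream plumbing are placeholders or omitted.
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> list[float]:
    """Return the Titan embedding vector for one piece of text."""
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        contentType="application/json",
        accept="application/json",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

# In the pipeline this runs inside a PySpark pandas UDF / mapPartitions;
# a plain loop keeps the sketch self-contained.
docs = ["GL batch 042 failed schema validation", "Position feed delayed 35 minutes"]
indexed = [{"text": d, "vector": embed(d)} for d in docs]
# ...bulk-index `indexed` into an OpenSearch k-NN index (omitted).
```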
AWS Certified Data Engineer – Associate
Issued 2025
AWS Certified Developer – Associate
Issued 2025
AWS Certified Solutions Architect – Associate
Issued 2025
AWS Certified Cloud Practitioner
Issued 2023
All certifications are publicly verifiable on Credly