Ankit Saxena

Data Engineer • GenAI Engineer

About Me

I’m an AI and Machine Learning Engineer focused on building reliable, production-ready systems. I enjoy working on problems where data quality, modeling decisions, and system design matter as much as raw performance.

I take a pragmatic, production-first approach, thinking beyond notebooks to how solutions behave in the real world. I value clarity, thoughtful trade-offs, and systems that are stable, explainable, and built to last.

Skills & Technologies

Python
SQL
ETL
Databricks
Machine Learning
Apache Spark
AWS
Docker
Snowflake / Redshift
GitHub Actions
MLOps
LLMs & Generative AI
RAG Pipelines
LangChain & LlamaIndex
Hugging Face

Featured Projects

Retail SKU Demand Prediction
  • End-to-end demand forecasting system to predict SKU-level daily demand for a retail store over a 21-day future horizon.
  • The solution uses classical machine learning / statistical techniques, applies standard time-series feature engineering, and evaluates performance using RMSE.
Brain Box

    Brain Box is a Retrieval-Augmented Generation (RAG) chatbot. The backend ingests documents, builds a vector store, and serves a chat API that answers questions using retrieved context.

  • Vector embedding and storage using ChromaDB
  • Semantic retrieval and prompt orchestration with LangChain
  • Models powered by Azure AI Foundry
  • Document ingestion and vector store rebuild on upload
  • Pluggable LLM / embedding configuration via environment variables
Real-Time Streaming ETL Pipeline

    A real-time ETL pipeline designed to ingest, process, and transform streaming data with low latency, enabling timely analytics and downstream machine learning use cases.

  • Built a scalable streaming pipeline to ingest and process events in near real time, ensuring data consistency and fault tolerance across the ETL flow.
  • Implemented real-time transformations and aggregations with monitoring and error handling to support reliable downstream analytics and ML systems.

Experience

AI/ML Engineer — Simplifix Systems (Vijay Sales)

Jan 2026 – Present

  • Built and maintained scalable data pipelines for retail transaction and inventory data using Databricks and AWS.
  • Developed machine learning models for inventory prediction, improving demand forecasting and stock planning across multiple branches.
  • Leveraged LLMs and AI-based approaches to analyze retail data and generate business insights for operational decision-making.
  • Designed and optimized data engineering workflows including ETL/ELT pipelines, data transformations, and data quality checks.
  • Created interactive dashboards and analytics reports for business stakeholders to track sales performance, inventory levels, and operational KPIs.
Associate Software Developer — Bosch Global Software Technologies

Feb 2025 – Jan 2026

  • Developed and deployed LLM-based microservices and data-driven automation solutions, improving team productivity by 32%.
  • Built scalable NLP and embedding pipelines using PyTorch, Hugging Face, and TensorFlow, reducing inference latency by 28%.
  • Designed reusable AI components enabling a multi-use architecture, reducing development time for new use cases by 40%.
  • Implemented production-grade applications using Python, Spark, FastAPI, and Docker, with monitoring for model drift and performance regression.
  • Collaborated in Agile / SAFe environments to translate requirements into reliable AI and data solutions.
Data Engineer Trainee — FranConnect Pvt. Ltd.

Sep 2024 – Feb 2025

  • Engineered scalable ETL and feature pipelines using PySpark, Hive, and SQL, improving processing performance by 45%.
  • Optimized ingestion of 30M+ records across distributed systems, ensuring data quality, governance, and lineage.
  • Reduced Spark job costs by 22% through partitioning, caching, and schema evolution strategies.
  • Automated data and ML pipeline deployments using Docker and GitHub Actions, reducing deployment failures by 80%.
Data Scientist Trainee — Dyno India Pvt. Ltd.

Feb 2023 – Aug 2023

  • Built and evaluated ML models (Logistic Regression, Random Forest, XGBoost, SVM, K-Means, PCA, ARIMA) achieving 84–92% accuracy.
  • Developed deep learning models (CNN, LSTM) for text and image data, reducing validation loss by 30%.
  • Implemented recommender systems using Collaborative Filtering and FISM, improving precision by 25%.

Certifications

• AWS Certified Cloud Practitioner

Contact

Socials