Arsalan Shaikh
I ship production LLM and ML systems end-to-end. Currently at Adastraa AI, building an Amazon Ads LLM copilot (Azure OpenAI + vector memory + function calling) and a real-time XGBoost bidding engine processing 50K-100K events/day. Focused on cost-aware AI engineering, not weekend prototypes.
Current Role
Machine Learning Engineer
Adastraa AI
Ads AI Copilot — Streaming LLM with Vector Memory & Function-Calling
- Built an Amazon Ads copilot on Azure OpenAI (gpt-4o-mini + o4-mini reasoning), Express, and MongoDB Atlas for natural-language querying of live campaign data.
- Designed a 3-tier memory system (Redis short-term, MongoDB episodic with 30-day TTL, Atlas Vector Search with 0.92 cosine dedup) that bounds per-user memory to ~200 facts regardless of usage.
- Added intent-based model routing (gpt-4o-mini vs o4-mini) and 7 LLM-callable MongoDB tools via function calling, cutting per-turn token usage from ~2,264 to ~800 tokens (roughly 65%).
- Implemented SSE streaming, prompt-injection defense (17 input patterns + output sanitizer), and index-level per-user isolation via $vectorSearch pre-filters.
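The dedup-and-bound idea in the memory bullet above can be illustrated with a minimal in-memory sketch. It assumes plain Python lists instead of Atlas Vector Search, and the `remember` helper and the drop-the-oldest eviction policy are hypothetical; only the 0.92 threshold and the ~200-fact bound come from the description above.

```python
import math

DEDUP_THRESHOLD = 0.92  # cosine threshold quoted in the memory bullet
MAX_FACTS = 200         # approximate per-user bound quoted in the memory bullet


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def remember(facts, text, embedding):
    """Store a fact unless a near-duplicate already exists.

    `facts` is a list of (text, embedding) tuples, oldest first.
    Returns True if the fact was stored, False if it was deduplicated.
    """
    for _, existing in facts:
        if cosine(existing, embedding) >= DEDUP_THRESHOLD:
            return False  # near-duplicate: skip the write
    facts.append((text, embedding))
    if len(facts) > MAX_FACTS:
        del facts[0]  # hypothetical eviction policy: drop the oldest fact
    return True


facts = []
remember(facts, "prefers weekly ACoS reports", [1.0, 0.0])
remember(facts, "likes weekly ACoS reports", [0.99, 0.05])  # rejected as a near-duplicate
```

Because every write either gets rejected as a duplicate or pushes out an older entry once the cap is reached, the store size stays bounded no matter how many turns a user has.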
Autopilot — Real-time Bidding Engine for Amazon Sponsored Products
- Replaced rule-based bidding with an XGBoost model predicting daily base bids and hourly adjustments, retrained daily on 90 days of Amazon Ads reports and live AMS streaming events (50K-100K/day).
- Built an anomaly-detection guardrail to reject outlier bid predictions before live execution, preventing runaway ad spend.
- Deployed on AWS (EC2, Amplify) with automated daily retraining; live in production on client campaigns.
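One way a guardrail like the one described above might work is a robust z-score check against recent bids. This is a sketch under stated assumptions, not the production logic: the `is_safe_bid` name, the median/MAD method, the 3.5 cutoff, and the minimum history of 5 bids are all illustrative choices.

```python
import statistics


def is_safe_bid(predicted, recent_bids, max_z=3.5, min_history=5):
    """Return True if `predicted` looks consistent with `recent_bids`.

    Uses a robust z-score based on the median and the median absolute
    deviation (MAD), so a few past outliers do not distort the check.
    """
    if predicted <= 0:
        return False  # a bid must be positive
    if len(recent_bids) < min_history:
        return False  # too little history to judge: reject and fall back
    median = statistics.median(recent_bids)
    mad = statistics.median([abs(b - median) for b in recent_bids])
    if mad == 0:
        return predicted == median  # history is constant: allow only that value
    robust_z = 0.6745 * abs(predicted - median) / mad
    return robust_z <= max_z


history = [1.00, 1.10, 0.90, 1.05, 0.95, 1.00]
is_safe_bid(1.08, history)  # close to recent bids
is_safe_bid(5.00, history)  # far outside the recent range
```

A rejected prediction would typically fall back to the last accepted bid or a rule-based default rather than being sent to the ad platform.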
Professional Experience
Gen AI Engineer (Remote)
Journalyst
- Built a real-time voice-driven AI coach using Groq Llama 3.3, Whisper ASR, and PlayAI TTS, with a Pinecone RAG pipeline grounded on user session history.
- Designed a custom Voice Activity Detection (VAD) pipeline for clean speech segmentation across variable speaking patterns.
- Implemented intent-based prompting that routed user queries through different response strategies based on speech and behavior signals.
- Shipped the full stack: Flask backend + React frontend with low-latency audio streaming.
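For readers unfamiliar with Voice Activity Detection, the simplest form is an energy threshold over fixed-size frames. The sketch below shows that baseline only; it is not the custom pipeline described above, and the `segment_speech` name, frame length, threshold, and minimum run length are assumptions chosen for illustration.

```python
def segment_speech(samples, frame_len=160, threshold=0.02, min_frames=3):
    """Return (start, end) sample indices of runs of high-energy frames.

    `samples` is a sequence of floats in [-1.0, 1.0]. A frame counts as
    speech when its RMS energy is at least `threshold`; runs shorter than
    `min_frames` frames are discarded as noise.
    """
    segments = []
    run_start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        rms = (sum(s * s for s in frame) / frame_len) ** 0.5
        if rms >= threshold:
            if run_start is None:
                run_start = i  # a speech run begins
        elif run_start is not None:
            if i - run_start >= min_frames:
                segments.append((run_start * frame_len, i * frame_len))
            run_start = None  # the run ended on this silent frame
    # Close a run that lasts until the end of the audio.
    if run_start is not None and n_frames - run_start >= min_frames:
        segments.append((run_start * frame_len, n_frames * frame_len))
    return segments


audio = [0.0] * 480 + [0.5] * 800 + [0.0] * 480
print(segment_speech(audio))  # [(480, 1280)]
```

A fixed threshold like this breaks down with background noise and variable speaking volume, which is the usual motivation for adaptive or model-based VAD.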
Software Engineer Intern (Remote)
Dictation Daddy
- Fine-tuned OpenAI Whisper Large for domain-specific transcription, improving accuracy on production audio.
- Optimized the inference pipeline, reducing latency by 40% (1 s → 600 ms) for near real-time response.
- Built REST APIs with monitoring, logging, and fault tolerance for reliable integration with the main product.
- Integrated Gemini 1.5 for adaptive text formatting (formal / informal) based on transcript context.
Key Achievements
Mumbai Hacks 2025 — Finalist
Top finalists out of 15,000+ participants
Selected from a pool of over 15,000 participants across India to compete in the finals.
Led the team as technical decision-maker, owning architecture and problem-solving strategy across all phases.
Delivered the final pitch presentation, demonstrating a working AI solution to judges.
Academic Background
B.Tech in AI & Data Science
MIT Aurangabad, Maharashtra
Technical Skills
Generative AI / LLMs
Vector Search & Retrieval
Machine Learning
Backend & Languages
Databases & Caching
Cloud & MLOps
AI Projects
Iris Tumor Detection (CNN)
CNN model for medical image classification with image augmentation pipeline and hyperparameter tuning. Achieved 92% accuracy on held-out validation data.
My Resume
The Full Engineering Story
Download my latest resume for a complete view of my work history, technical depth, and projects. Updated with my current Adastraa AI work.
Let's Talk
Build something with AI?
Open to AI/ML Engineer roles, freelance LLM work, and collaborations on production AI systems. Drop a message or reach out directly.