Data Scientist · ML Engineer · Full Stack Developer · USC '26
Arpit Sutariya
Turning raw data into decisions. I build predictive models, AI agents, and analytics systems that bridge complex engineering with real-world impact.
01 / Expertise
Skills & Tools
A deep toolkit spanning machine learning, data engineering, and full-stack AI development.
Languages
ML & Deep Learning
Statistical Methods
LLMs & AI Agents
Data & Visualization
Infrastructure
02 / Experience
Where I've Worked
Professional experience in data science, machine learning, and building production-grade data systems.
May 2024 — Aug 2024
Data Scientist Intern
RelTime Pvt. Ltd · Mumbai, India
- Analyzed 2M+ daily transaction records using Python and SQL; performed EDA, hypothesis testing, and feature engineering to identify key revenue drivers that informed pricing strategy changes.
- Built predictive models (Logistic Regression, Random Forest, XGBoost) to forecast customer behavior; validated them with cross-validation and communicated results via Tableau/Power BI dashboards.
- Developed a data quality framework with Great Expectations for anomaly detection, catching 95% of issues pre-production and reducing data prep time by 40%.
Jan 2023 — May 2023
Web Developer Intern
Vidyavardhini's College of Engineering & Technology · Mumbai, India
- Built a full-stack web application using React and Flask that extracts and displays data from user-provided sources, with Tailwind CSS for a responsive UI and JWT authentication for secure access.
- Reduced API latency by 35% and supported 500+ concurrent users through Redis caching and optimized RESTful API design with request validation.
- Developed a text classification feature using scikit-learn with TF-IDF vectorization, achieving 82% accuracy and exposing predictions via REST endpoints.
03 / Projects
Selected Work
Projects combining data science, AI engineering, and thoughtful product design.
Codeflow
Real-Time Collaborative Code Editor
Real-time collaborative code editor with conflict resolution using Operational Transformation (OT). Features delta-based change tracking, multi-user concurrent edits, and owner-controlled accept/reject workflows. Includes dual-mode support for live preview (HTML, CSS, JS) and multi-language code execution (C++, Java, Python, JS).
ChatMyDB
Natural Language Data Exploration
NL-to-SQL/NoSQL system using GPT-4 with schema-aware prompting. Enables non-technical users to run complex aggregations, joins, and time-series queries without writing SQL, reducing manual query effort by 70%. The query validation pipeline achieves 92% first-pass accuracy.
OpsBrainLLM
AI-Powered Manufacturing Analytics
Multi-agent analytics system built with LangGraph that performs end-to-end manufacturing data analysis: automated SQL querying, regression analysis, K-Means clustering, ANOVA, and correlation analysis on factory sensor data with auto-generated visualizations.
GymAI
Fitness Analytics Platform
Computer vision pipeline using MediaPipe Pose to track joint angles across 6 exercises in real time. Signal processing and angle interpolation count reps with high precision, with aggregate analytics displayed on a React dashboard.
04 / Education
Academic Background
University of Southern California
M.S. in Applied Data Science
Coursework: Machine Learning, Data Mining, Database Systems, Research Methods & Statistical Analysis
University of Mumbai
B.Tech in Computer Science & Engineering (Data Science)
Coursework: Data Structures & Algorithms, Probability & Statistics, Cloud Computing, Operating Systems
05 / Contact
Let's Connect
Open to opportunities in data science, machine learning, and AI engineering. Let's talk.