Data Scientist · ML Engineer · Full Stack Developer · USC '26

Arpit Sutariya

Turning raw data into decisions. I build predictive models, AI agents, and analytics systems that bridge complex engineering with real-world impact.

01 / Expertise

Skills & Tools

A deep toolkit spanning machine learning, data engineering, and full-stack AI development.

Languages

Python · R · SQL · PL/SQL · C/C++ · JavaScript · Bash

ML & Deep Learning

scikit-learn · PyTorch · TensorFlow · XGBoost · CNN · RNN / LSTM · Transformers

Statistical Methods

Regression · Classification · Clustering · A/B Testing · Time Series · Bayesian Methods · Experimental Design

LLMs & AI Agents

LangChain · LangGraph · GPT-4 · RAG · LoRA / QLoRA · PEFT · Prompt Engineering

Data & Visualization

Pandas · NumPy · SciPy · Plotly · Tableau · Power BI · Seaborn

Infrastructure

PostgreSQL · MongoDB · Docker · AWS (EC2, S3) · CI/CD · REST APIs · Streamlit

02 / Experience

Where I've Worked

Professional experience in data science, machine learning, and building production-grade data systems.

May 2024 — Aug 2024

Data Scientist Intern

RelTime Pvt. Ltd · Mumbai, India

  • Analyzed 2M+ daily transaction records using Python and SQL; performed EDA, hypothesis testing, and feature engineering to identify key revenue drivers informing pricing strategy changes.
  • Built predictive models (Logistic Regression, Random Forest, XGBoost) to forecast customer behavior; validated them with cross-validation and communicated results via Tableau/Power BI dashboards.
  • Developed a data quality framework with Great Expectations for anomaly detection, catching 95% of issues pre-production and reducing data prep time by 40%.

Jan 2023 — May 2023

Web Developer Intern

Vidyavardhini's College of Engineering & Technology · Mumbai, India

  • Built a full-stack web application using React and Flask that extracts and displays data from user-provided sources, with Tailwind CSS for a responsive UI and JWT authentication for secure access.
  • Reduced API latency by 35% and supported 500+ concurrent users through Redis caching and optimized RESTful API design with request validation.
  • Developed a text classification feature using scikit-learn with TF-IDF vectorization, achieving 82% accuracy and exposing predictions via REST endpoints.

03 / Projects

Selected Work

Projects combining data science, AI engineering, and thoughtful product design.

01

Codeflow

Real-Time Collaborative Code Editor

Real-time collaborative code editor with conflict resolution using Operational Transformation (OT). Features delta-based change tracking, multi-user concurrent edits, and owner-controlled accept/reject workflows. Includes dual-mode support for live preview (HTML, CSS, JS) and multi-language code execution (C++, Java, Python, JS).

30% Better persistence
OT Conflict resolution
Python · Flask · JavaScript · Socket.io · Redux · React
View on GitHub

02

ChatMyDB

Natural Language Data Exploration

NL-to-SQL/NoSQL system using GPT-4 with schema-aware prompting. Enables non-technical users to run complex aggregations, joins, and time-series queries without writing SQL — reducing manual query effort by 70%. Query validation pipeline achieves 92% first-pass accuracy.

70% Less manual effort
92% Query accuracy
Python · GPT-4 · PostgreSQL · MongoDB · Streamlit · Plotly
View on GitHub

03

OpsBrainLLM

AI-Powered Manufacturing Analytics

Multi-agent analytics system built with LangGraph that performs end-to-end manufacturing data analysis: automated SQL querying, regression analysis, K-Means clustering, ANOVA, and correlation analysis on factory sensor data with auto-generated visualizations.

6 Analysis types
E2E Automation
LangChain · LangGraph · OpenAI · PostgreSQL · Streamlit · Plotly
View on GitHub

04

GymAI

Fitness Analytics Platform

Computer vision pipeline using MediaPipe Pose to track joint angles across 6 exercises in real time. Signal processing and angle interpolation count reps with high precision, with aggregate analytics displayed on a React dashboard.

6 Exercises tracked
RT Real-time
Python · Flask · React · MediaPipe · PostgreSQL
View on GitHub

04 / Education

Academic Background

University of Southern California

M.S. in Applied Data Science

GPA: 3.7/4 · Jan 2025 — Dec 2026

Coursework: Machine Learning, Data Mining, Database Systems, Research Methods & Statistical Analysis

University of Mumbai

B.Tech in Computer Science & Engineering (Data Science)

GPA: 8.84/10 · Sep 2020 — May 2024

Coursework: Data Structures & Algorithms, Probability & Statistics, Cloud Computing, Operating Systems

05 / Contact

Let's Connect

Open to opportunities in data science, machine learning, and AI engineering. Let's talk.