I am an Applied Data Scientist, Data Engineer, and Forward-Deployed Engineer focused on building end-to-end data and machine learning systems that operate in real-world environments.
I am currently pursuing an M.S. in Applied Data Science at Clarkson University (GPA: 4.0), where my work centers on deploying production-grade pipelines, interactive analytical systems, and decision-support tools across industrial, operational, and policy domains.
My work sits at the intersection of:
- Engineering-driven problem solving
- Full-stack data systems (OLTP → ETL → analytics → application layer)
- Applied machine learning and statistical modeling
- Forward-deployed systems integrated directly with users and workflows
Data Engineering & Systems
SQL (PostgreSQL, MySQL), data warehousing, dimensional modeling, star schemas, SCD Type 2, ETL/ELT pipelines, incremental processing
Machine Learning & Analytics
Python, R, regression, classification, clustering, PCA, feature engineering, statistical diagnostics
Visualization & Decision Support
Plotly, Tableau, Shiny, interactive dashboards, analytical reporting systems
Applications & Backend Systems
Flask, SQLAlchemy, REST APIs, full CRUD systems, Streamlit
Tools & Infrastructure
Git, Docker, Airflow, Snowflake, AWS (EC2, S3, Lambda)
React (Vite), TypeScript, LLM APIs, Adversarial Testing
Built an adversarial evaluation system for LLM agents that exploits and behaviorally validates vulnerabilities such as prompt injection, role impersonation, and data exfiltration in real time. Designed a closed-loop pipeline that combines agent parsing, automated vulnerability detection, simulation, and model-based exploit validation.
🔗 https://github.com/Thooms-coder/agent-breaker-studio
Python, SQL, Streamlit, Plotly, PostgreSQL, LLMs
Developed a civic analytics platform powered by large-scale census data. It includes a normalized metric warehouse and an LLM-driven copilot for natural-language querying, statistical analysis, and interactive visualization.
🔗 https://github.com/Thooms-coder/ma-gateway-cities-dashboard
Python, PyTorch, Signal Processing, Pandas, Plotly
Engineered a multi-branch ETL pipeline integrating audio, image, and sensor data to perform cross-modal validation of traffic systems. Built independent feature pipelines and statistical workflows, reducing false-positive anomaly alerts by 22%.
🔗 https://github.com/Thooms-coder/multimodal-taxi-data-analysis-big-data
SQL, Data Engineering
Designed and implemented a full OLTP → staging → warehouse pipeline with dimensional modeling, SCD Type 2 handling, incremental loads, and analytical aggregation for a retail and rental system.
🔗 https://github.com/Thooms-coder/zagi-data-warehouse
Research Assistant — Applied Data Science (Clarkson University)
Built time-series pipelines and analytical models on 50,000+ high-frequency wastewater observations, developing predictive insights and decision-support tools for operational optimization.
Software Developer & Database Engineer (Clarkson University)
Designed and deployed a SQL-backed, forward-deployed data system for a 200+ member rowing club that automates scheduling, reporting, and operational workflows.
- 📍 Boston, Massachusetts
- 📧 mungosmj@clarkson.edu
- 💻 https://github.com/Thooms-coder
I am interested in roles and collaborations involving:
- Data Engineering
- Applied Machine Learning
- Forward-Deployed Engineering
- Analytics Systems & Decision Support
- Research-driven data applications