Industry partnerships.
Deloitte · University of Miami
Jan 2026 – Present
Building predictive models that integrate Census ACS, GTFS, and ArcGIS data to identify transit equity gaps in Miami-Dade County. Work runs in sprints tracked through GitHub issues and covers SHAP-based interpretability, simulation models, and an interactive dashboard for equity-driven policy recommendations.
Python · SHAP · GTFS · ArcGIS · Census ACS
UHealth Bascom Palmer Eye Institute · University of Miami
Sep 2025 – Present
Prototyped a multi-agent LLM chatbot with LangGraph, LangChain, and GPT-4 Turbo, featuring RAG over ChromaDB, deterministic safety gates, confidence scoring, and LangSmith monitoring. Then shipped the production solution on Microsoft Copilot Studio, integrated with UHealth’s scheduling systems for 100+ providers.
Copilot Studio · LangGraph · LangChain · GPT-4 · RAG · ChromaDB
Things I’ve built.
Personal Project — Fine-Tuning & LLMOps
2026
Multi-model SQL agent orchestrating three specialized fine-tuned LLMs: Qwen 2.5 Coder 7B for SQL generation, Phi-3 Mini 3.8B for chart reasoning (knowledge-distilled), and DeepSeek Coder 1.3B for SVG rendering. Schema RAG over ChromaDB, DuckDB execution, Plotly fallback, and three public training datasets on the HuggingFace Hub.
Qwen 2.5 Coder · Phi-3 · DeepSeek · Unsloth · ChromaDB · DuckDB · LLMOps
Personal Project — Triton Kernels, Unsloth-inspired
2026
Research project applying Unsloth’s hand-derived-backward + Triton-fusion methodology to Mixture-of-Experts. Targets sparse O(k) router backward, fused dispatch/combine with expert GEMMs, and LoRA-on-MoE selective grad accumulation. Validation against Megablocks, ScatterMoE, and DeepSeek-MoE on OLMoE and Mixtral.
Triton · PyTorch · Mixture-of-Experts · Unsloth · Kernel Fusion
ASA South Florida Data Challenge — Distinguished Achievement (Top 10%)
2026
Developed a CatBoost model on NHANES survey data that predicts HDL cholesterol levels with R² = 0.75. Domain-guided feature engineering and SHAP analysis identified gender and waist circumference as the top predictors. Presented at the FIU Winner Showcase. In collaboration with Miguel Rocha.
CatBoost · SHAP · NHANES · Feature Engineering
Personal Project — Python Library
2025 – Present
Iterative diagnostic boosting for time series forecasting. A systematic EDA → Stationarity → Decomposition → Features → Split → Model → Evaluate pipeline with an editorial, minimal visual theme and a modular architecture.
Python · Time Series · XGBoost · SARIMA · Library Design
MSBA @ University of Miami
2026
End-to-end time series forecasting of product sales — exploratory data analysis, 14 model families, and cross-validated model selection for optimal forecasting performance.
Python · Time Series · Cross-Validation · EDA
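A minimal sketch of what cross-validated model selection can look like for time series, where each fold must train only on data that precedes its test window. The expanding-window scheme, the function name `expanding_window_splits`, and its parameters are illustrative assumptions, not the project's actual code.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_samples: int, n_folds: int, horizon: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) pairs in chronological order.

    Hypothetical helper: each fold trains on everything before its test
    window, and the training window grows by `horizon` rows per fold.
    """
    first_train_end = n_samples - n_folds * horizon
    if first_train_end <= 0:
        raise ValueError("not enough samples for the requested folds and horizon")
    for fold in range(n_folds):
        train_end = first_train_end + fold * horizon
        train_indices = list(range(train_end))
        test_indices = list(range(train_end, train_end + horizon))
        yield train_indices, test_indices


# Usage: 10 observations, 3 folds, 2-step forecast horizon.
for train_idx, test_idx in expanding_window_splits(10, 3, 2):
    print(len(train_idx), test_idx)
```

Each candidate model would be fit on `train_idx` and scored on `test_idx`, with the average score across folds used to rank the candidates.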
Big Data Final Project — PySpark + XGBoost
2026
End-to-end big-data ML on ~1.8M coffee-shop transactions. PySpark for distributed ingest, cleaning, and EDA; scikit-learn + XGBoost + Optuna for modeling; SHAP for global and local explanations. Wait time R² 0.34 (+34.8%), purchase amount R² 0.86, rewards-member AUC 0.98 / F1 0.89. Cyclic-hour encodings, peak-hour interactions, and ordinal income features.
PySpark · XGBoost · Optuna · SHAP · scikit-learn · Big Data
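To illustrate the cyclic-hour encodings mentioned above, here is a minimal sketch that maps an hour of day onto the unit circle so that 23:00 and 00:00 end up close together. The function names `encode_hour_cyclic` and `circular_distance` are hypothetical, and the sine/cosine form is the common approach assumed here rather than the project's confirmed implementation.

```python
import math
from typing import Tuple


def encode_hour_cyclic(hour: float, period: float = 24.0) -> Tuple[float, float]:
    """Return (sin, cos) coordinates of the hour on a circle of the given period."""
    angle = 2.0 * math.pi * (hour % period) / period
    return math.sin(angle), math.cos(angle)


def circular_distance(hour_a: float, hour_b: float) -> float:
    """Euclidean distance between two hours in the encoded (sin, cos) space."""
    return math.dist(encode_hour_cyclic(hour_a), encode_hour_cyclic(hour_b))


# Usage: midnight is as close to 23:00 as it is to 01:00,
# which a raw 0-23 integer feature would not capture.
print(circular_distance(23, 0), circular_distance(0, 1))
```

The two resulting columns replace the raw hour, which lets tree or linear models treat late night and early morning as neighbors.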
MAS 651 — Deep Learning Final Project
2026
Recommendation system and churn prediction model for Tampa Bay businesses using Yelp data. Deep learning approaches for user behavior modeling and business analytics.
Deep Learning · Recommendation Systems · Churn · NLP
MGT 642 — Supply Chain Analytics Final Project
2026
Supply chain network design and optimization for CanDi, applying analytical models for facility location, capacity planning, and logistics optimization.
Supply Chain · Optimization · Network Design · Analytics
Personal Project — Live on HuggingFace Spaces
2026
Full-stack AI tool that analyzes resumes against job descriptions using a hybrid ML scoring system: 60% ATS keyword match (Llama 3.1-70B) + 40% semantic similarity (Sentence-Transformers). Generates an optimized, ATS-tailored resume with downloadable PDF.
Llama 3.1-70B · Sentence-Transformers · Gradio · HuggingFace · NLP
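The 60/40 hybrid score described above reduces to a weighted average. The sketch below assumes both sub-scores are already normalized to the range 0 to 1; the function name `hybrid_score` and its parameters are hypothetical, and how the keyword and semantic scores are produced upstream is not shown.

```python
def hybrid_score(
    keyword_match: float,
    semantic_similarity: float,
    keyword_weight: float = 0.6,
) -> float:
    """Blend an ATS keyword-match score with a semantic-similarity score.

    Assumes both inputs are in [0, 1]. The default weight of 0.6 mirrors
    the 60% / 40% split described for the tool.
    """
    for name, value in (
        ("keyword_match", keyword_match),
        ("semantic_similarity", semantic_similarity),
        ("keyword_weight", keyword_weight),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return keyword_weight * keyword_match + (1.0 - keyword_weight) * semantic_similarity


# Usage: half of the keywords matched, with a perfect semantic match.
print(round(hybrid_score(0.5, 1.0), 2))
```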
Personal Project — Generative AI for Indigenous Languages
2026
Multilingual GenAI application for translation and conversation in Guatemalan Mayan languages. Bridges the gap between low-resource indigenous languages and modern AI through fine-tuned LLMs, preserving cultural heritage through technology.
LLMs · NLP · Low-Resource Languages · HuggingFace · Gradio
Personal Project — Open-Source Python Library
2026
Maya-inspired vigesimal numerical encodings for machine learning. Scikit-learn compatible transformers that convert numerical features into base-20 representations inspired by the ancient Maya numeral system, offering a novel feature engineering approach.
Python · scikit-learn · Feature Engineering · Library Design
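As a rough illustration of the base-20 idea, the sketch below converts a non-negative integer into its vigesimal digits, most significant first. The function name `to_vigesimal_digits` and the `width` padding parameter are hypothetical and are not the library's API; a scikit-learn compatible transformer would presumably wrap a conversion like this and expose each digit as its own feature column.

```python
from typing import List


def to_vigesimal_digits(value: int, width: int = 0) -> List[int]:
    """Return the base-20 digits of a non-negative integer, most significant first.

    `width` left-pads the result with zeros so that every value in a column
    produces the same number of digit features.
    """
    if value < 0:
        raise ValueError("only non-negative integers are supported in this sketch")
    digits = []
    while True:
        value, remainder = divmod(value, 20)
        digits.append(remainder)
        if value == 0:
            break
    digits.reverse()
    padding = [0] * max(0, width - len(digits))
    return padding + digits


# Usage: 2026 = 5 * 400 + 1 * 20 + 6
print(to_vigesimal_digits(2026))
```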
Personal Project — Novel Architecture
2026
A novel neural network architecture inspired by counter-current flow in chemical engineering unit operations. Applies principles of heat and mass transfer to information flow in deep learning, creating bidirectional feature exchange between parallel network streams.
PyTorch · Deep Learning · Neural Architecture · Chemical Engineering
LoRA adapters I’ve trained and shipped.
5 public LoRA adapters
3 base-model families
3 downstream tasks
2 production pipelines powered
PEFT LoRA · low-resource indigenous NLP
Fine-tuned Llama 3.1 8B for translation and conversation across Guatemalan Mayan languages — tagged for low-resource Americas NLP. Currently powers the MayaVoice Space.
Llama 3.1 8B PEFT LoRA Translation Mayan
Baseline release · Unsloth + TRL
First-pass LoRA on Llama 3.1 8B for Mayan-language generation. Baseline that v2 iterates on.
Llama 3.1 8B Unsloth TRL LoRA
Knowledge-distilled · SQL-agent orchestration
Phi-3 Mini 3.8B fine-tuned to pick the right chart type for a given query and schema. Distilled from a larger teacher and deployed as the reasoning node of the SQL Agent with LLMOps.
Phi-3 Mini 3.8B Knowledge Distillation Chart Reasoning
Lightweight release · pair with base
Adapter-only variant of the Phi-3 chart reasoner for users who want to load the LoRA on top of their own base-model cache.
Phi-3 Mini Adapter Only PEFT
1.3B code model · structured-data → SVG
DeepSeek Coder 1.3B fine-tuned to render structured data as inline SVG charts. Powers the visualization node of the SQL agent when the Plotly fallback isn’t needed.
DeepSeek Coder 1.3B SVG Generation Code-LLM
Training corpora I built and released.
4 public datasets
220K+ training rows
2 languages (EN / ZH)
190+ dataset downloads
100K+ rows · bilingual EN / ZH
Second-generation text-to-SQL corpus. Adds Chinese coverage and task-ID metadata for multilingual SQL-generation fine-tuning. Current training signal for the SQL-generator stack.
Text-to-SQL EN / ZH 100K+ rows Parquet
100K+ rows · English baseline
First-gen text-to-SQL training set (English). The earlier version, used to train the SQL-generator LoRA and to benchmark against v2.
Text-to-SQL English 100K+ rows Parquet
10K–100K rows · query + schema → chart type
Natural-language query and schema pairs mapped to chart-type decisions. Distillation source for the Phi-3 chart reasoner.
Chart Reasoning Distillation Corpus Parquet
10K–100K rows · structured data → SVG
Structured-data to inline-SVG rendering pairs. Training signal for the DeepSeek SVG renderer.
SVG Rendering Code Generation Parquet
Interested in collaborating?
I’m always open to discussing new projects and opportunities.