Daniel Regalado Cardoso

Capstone Projects

Industry partnerships.

AI for Equitable Public Transportation

Deloitte · University of Miami

Jan 2026 – Present

Building predictive models that integrate Census ACS, GTFS, and ArcGIS data to identify transit equity gaps in Miami-Dade County. Development runs in sprints tracked with GitHub issues, and the work combines SHAP-based interpretability, simulation models, and an interactive dashboard that supports equity-driven policy recommendations.

Python SHAP GTFS ArcGIS Census ACS

View Repository · Live Demo
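A useful first quantity in a transit-equity gap analysis is how much scheduled service each stop actually receives. The sketch below is a minimal illustration, not the project's model. It counts departures per stop inside a time window from GTFS `stop_times.txt`, where the `stop_id` and `departure_time` columns are defined by the GTFS spec. The function names, the window, and the sample rows are hypothetical. Counts like these could then be aggregated to Census ACS tracts and compared with demographic need.

```python
import csv
import io
from collections import Counter


def gtfs_seconds(hms: str) -> int:
    """Convert a GTFS time string (HH:MM:SS; hours may exceed 23) to seconds."""
    h, m, s = (int(part) for part in hms.strip().split(":"))
    return h * 3600 + m * 60 + s


def departures_per_stop(stop_times_csv: str, start: str, end: str) -> Counter:
    """Count departures at each stop within [start, end) from stop_times.txt content."""
    lo, hi = gtfs_seconds(start), gtfs_seconds(end)
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(stop_times_csv)):
        dep = (row.get("departure_time") or "").strip()
        if not dep:  # departure_time may be blank for non-timepoint stops
            continue
        if lo <= gtfs_seconds(dep) < hi:
            counts[row["stop_id"]] += 1
    return counts
```

GTFS allows hours above 23 for trips that run past midnight on the same service day, so the parser deliberately does not wrap times at 24:00.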

AI Scheduling Assistant

UHealth Bascom Palmer Eye Institute · University of Miami

Sep 2025 – Present

Prototyped a multi-agent LLM chatbot with LangGraph, LangChain, and GPT-4 Turbo, featuring RAG over ChromaDB, deterministic safety gates, confidence scoring, and LangSmith monitoring. Then shipped the production solution on Microsoft Copilot Studio, integrated with UHealth’s scheduling systems for 100+ providers.

Copilot Studio LangGraph LangChain GPT-4 RAG ChromaDB
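Deterministic safety gates sit around the LLM so that certain messages never depend on model judgment. The sketch below is a minimal illustration of that pattern, not the deployed system. The phrase list, the confidence threshold, and the route names are all hypothetical. Urgent phrases always escalate, and answers below the confidence cut-off are handed to staff instead of being sent to the patient.

```python
from dataclasses import dataclass

# Hypothetical red-flag phrases; a real deployment would use a clinically reviewed list.
URGENT_PHRASES = ("sudden vision loss", "eye injury", "chemical in eye", "flashes and floaters")
CONFIDENCE_THRESHOLD = 0.75  # hypothetical cut-off


@dataclass
class GateDecision:
    route: str   # "escalate_urgent", "handoff_staff", or "answer"
    reason: str


def safety_gate(message: str, model_confidence: float) -> GateDecision:
    """Apply rule-based checks to the user message and the model's confidence score."""
    text = message.lower()
    for phrase in URGENT_PHRASES:
        if phrase in text:
            # This rule fires regardless of what the model would have answered.
            return GateDecision("escalate_urgent", f"matched '{phrase}'")
    if model_confidence < CONFIDENCE_THRESHOLD:
        return GateDecision("handoff_staff", f"confidence {model_confidence:.2f} below threshold")
    return GateDecision("answer", "passed all gates")
```

Because the gates are plain rules, their behavior is testable and auditable independently of the LLM.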


Featured Projects

Things I’ve built.

Multi-Model SQL Agent with LLMOps

Personal Project — Fine-Tuning & LLMOps

2026

Multi-model SQL agent that orchestrates three specialized fine-tuned LLMs: Qwen 2.5 Coder 7B for SQL generation, Phi-3 Mini 3.8B for chart reasoning (knowledge-distilled), and DeepSeek Coder 1.3B for SVG rendering. The agent retrieves relevant schemas with RAG over ChromaDB, executes queries in DuckDB, and falls back to Plotly for charts when needed. All three training datasets are public on the Hugging Face Hub.

Qwen 2.5 Coder Phi-3 DeepSeek Unsloth ChromaDB DuckDB LLMOps

View Repository · Live Demo
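Schema RAG keeps the SQL generator's prompt small by retrieving only the tables relevant to a question. The sketch below illustrates the retrieval step under stated assumptions. It swaps ChromaDB's embedding search for a toy bag-of-words cosine similarity so it runs without dependencies. The table definitions and the `build_prompt` format are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_schemas(question: str, schemas: dict[str, str], k: int = 2) -> list[str]:
    """Return the names of the k table schemas most similar to the question."""
    q = bag_of_words(question)
    ranked = sorted(schemas, key=lambda name: cosine(q, bag_of_words(schemas[name])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, schemas: dict[str, str], k: int = 2) -> str:
    """Assemble a hypothetical text-to-SQL prompt containing only the retrieved schemas."""
    context = "\n".join(schemas[name] for name in retrieve_schemas(question, schemas, k))
    return f"-- Schemas:\n{context}\n-- Question: {question}\n-- SQL:"
```

With a real vector store, only `bag_of_words` and `cosine` would change. The retrieve-then-prompt structure stays the same.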

MoE Backward Optimization (moesloth)

Personal Project — Triton Kernels, Unsloth-inspired

2026

Research project applying Unsloth’s methodology (hand-derived backward passes plus Triton kernel fusion) to Mixture-of-Experts layers. Targets include an O(k) sparse router backward pass (k = experts selected per token), dispatch/combine fused with the expert GEMMs, and selective gradient accumulation for LoRA on MoE. Validation compares against MegaBlocks, ScatterMoE, and DeepSeek-MoE on OLMoE and Mixtral.

Triton PyTorch Mixture-of-Experts Unsloth Kernel Fusion

View Repository
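The sparse router backward rests on a simple fact. When the router keeps the top-k logits and applies softmax over only those k, the gradient with respect to every unselected logit is exactly zero, so the backward arithmetic only touches k entries. The sketch below shows this in pure Python. It assumes Mixtral-style top-k-then-softmax gating and is an illustration, not the repository's Triton kernel. The gradient is returned as a sparse dict to make the O(k) structure explicit.

```python
import math


def topk_softmax_forward(logits: list[float], k: int) -> tuple[list[int], list[float]]:
    """Select the top-k logits and apply softmax over just those k (Mixtral-style gating)."""
    idx = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    m = max(logits[e] for e in idx)  # subtract the max for numerical stability
    exps = [math.exp(logits[e] - m) for e in idx]
    total = sum(exps)
    return idx, [x / total for x in exps]


def topk_softmax_backward(idx: list[int], probs: list[float],
                          grad_probs: list[float]) -> dict[int, float]:
    """Gradient w.r.t. the logits as {expert: grad}; unselected experts are implicitly 0."""
    # Softmax vector-Jacobian product: dz_j = p_j * (g_j - sum_i p_i * g_i)
    dot = sum(p * g for p, g in zip(probs, grad_probs))
    return {e: p * (g - dot) for e, p, g in zip(idx, probs, grad_probs)}
```

The forward pass still scans all experts to pick the top k, but the backward pass does O(k) work per token, which is what makes it a good candidate for fusion.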

HDL Cholesterol Prediction

ASA South Florida Data Challenge — Distinguished Achievement (Top 10%)

2026

Developed a CatBoost model that predicts HDL cholesterol levels from NHANES survey data with R² = 0.75. Domain-guided feature engineering combined with SHAP analysis identified gender and waist circumference as the top predictors. Presented at the FIU Winner Showcase. Joint work with Miguel Rocha.

CatBoost SHAP NHANES Feature Engineering

View Repository

DiagBoostTS

Personal Project — Python Library

2025 – Present

Iterative diagnostic boosting for time series forecasting. The library follows a systematic EDA → Stationarity → Decomposition → Features → Split → Model → Evaluate pipeline, with a modular architecture and a minimal, editorial-style visual theme.

Python Time Series XGBoost SARIMA Library Design

View Repository
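Two steps in that pipeline are easy to get wrong: building lag features, and splitting the data without leaking the future into training. The sketch below covers the Features → Split steps for a plain list of observations. The function names are hypothetical and are not DiagBoostTS's API.

```python
def make_lag_features(series: list[float], lags: list[int]) -> tuple[list[list[float]], list[float]]:
    """Build rows [y[t - lag] for lag in lags] with target y[t], skipping rows without full history."""
    max_lag = max(lags)
    X, y = [], []
    for t in range(max_lag, len(series)):
        X.append([series[t - lag] for lag in lags])
        y.append(series[t])
    return X, y


def chronological_split(X: list, y: list, test_size: int) -> tuple[list, list, list, list]:
    """Hold out the last `test_size` rows; no shuffling, so the test set is strictly later in time."""
    if not 0 < test_size < len(y):
        raise ValueError("test_size must be between 1 and len(y) - 1")
    cut = len(y) - test_size
    return X[:cut], X[cut:], y[:cut], y[cut:]
```

Lag features are built before splitting so the first test row can still use the last training observations as its history. This is legitimate because those values are already known at forecast time.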

Product Sales Forecasting

MSBA @ University of Miami

2026

End-to-end time series forecasting of product sales: exploratory data analysis, 14 model families, and cross-validated model selection to pick the best-performing forecaster.

Python Time Series Cross-Validation EDA

View Repository
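For time series, cross-validated model selection usually means rolling-origin (expanding-window) evaluation rather than shuffled k-fold. The sketch below makes two assumptions. First, two simple stand-in forecasters (naive and seasonal naive, with a hypothetical season length of 4) take the place of the 14 model families. Second, mean absolute error serves as the selection metric.

```python
from statistics import mean
from typing import Callable

Forecaster = Callable[[list[float], int], list[float]]


def naive(history: list[float], horizon: int) -> list[float]:
    """Repeat the last observed value."""
    return [history[-1]] * horizon


def seasonal_naive(history: list[float], horizon: int, season: int = 4) -> list[float]:
    """Repeat the values from the most recent full season."""
    return [history[len(history) - season + (h % season)] for h in range(horizon)]


def rolling_origin_splits(n: int, initial: int, horizon: int, step: int = 1):
    """Yield (train_end, test_end): train on [0, train_end), test on [train_end, test_end)."""
    for train_end in range(initial, n - horizon + 1, step):
        yield train_end, train_end + horizon


def cv_mae(series: list[float], model: Forecaster, initial: int, horizon: int) -> float:
    """Mean absolute error pooled over every rolling-origin fold."""
    errors = []
    for train_end, test_end in rolling_origin_splits(len(series), initial, horizon):
        forecast = model(series[:train_end], horizon)
        errors.extend(abs(f - a) for f, a in zip(forecast, series[train_end:test_end]))
    return mean(errors)


def select_model(series: list[float], models: dict[str, Forecaster], initial: int, horizon: int) -> str:
    """Return the name of the model with the lowest cross-validated MAE."""
    scores = {name: cv_mae(series, m, initial, horizon) for name, m in models.items()}
    return min(scores, key=scores.get)
```

Every fold trains only on data that precedes its test window, which mirrors how the chosen model would actually be used.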

Coffee Shop Big Data Analytics

Big Data Final Project — PySpark + XGBoost

2026

End-to-end big-data ML on ~1.8M coffee-shop transactions. PySpark handles distributed ingestion, cleaning, and EDA; scikit-learn, XGBoost, and Optuna handle modeling; SHAP provides global and local explanations. Feature engineering includes cyclic hour encodings, peak-hour interactions, and ordinal income features. Results: wait-time R² 0.34 (+34.8%), purchase-amount R² 0.86, and rewards-member classification AUC 0.98 / F1 0.89.

PySpark XGBoost Optuna SHAP scikit-learn Big Data

View Repository
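Cyclic encodings let a model see that 23:00 and 00:00 are adjacent, which a raw 0 to 23 hour column hides. The sketch below shows the three feature types named above. The peak hours (7:00 to 9:00 and 12:00 to 13:00), the weekend interaction, and the income brackets are hypothetical, and the project's actual definitions may differ.

```python
import math

PEAK_HOURS = {7, 8, 12}  # hypothetical peak windows: 7:00-9:00 and 12:00-13:00
INCOME_ORDER = ["<25k", "25-50k", "50-100k", ">100k"]  # hypothetical brackets, low to high


def hour_features(hour: int, is_weekend: bool) -> dict[str, float]:
    """Map hour 0-23 onto the unit circle and add a peak flag plus one interaction term."""
    angle = 2 * math.pi * hour / 24
    is_peak = 1.0 if hour in PEAK_HOURS else 0.0
    return {
        "hour_sin": math.sin(angle),
        "hour_cos": math.cos(angle),
        "is_peak": is_peak,
        # Interaction: lets the model learn a different peak effect on weekends (assumption).
        "peak_x_weekend": is_peak * float(is_weekend),
    }


def income_ordinal(bracket: str) -> int:
    """Encode an income bracket by its rank so tree models can split on order."""
    return INCOME_ORDER.index(bracket)
```

Both sine and cosine are needed, because either one alone maps two different hours to the same value.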

Yelp Recommendation & Churn Prediction

MAS 651 — Deep Learning Final Project

2026

Recommendation system and churn prediction model for Tampa Bay businesses built on Yelp data, using deep learning to model user behavior and support business analytics.

Deep Learning Recommendation Systems Churn NLP

View Repository

CanDi Supply Chain Network Design

MGT 642 — Supply Chain Analytics Final Project

2026

Supply chain network design for CanDi, applying analytical optimization models to facility location, capacity planning, and logistics.

Supply Chain Optimization Network Design Analytics

View Repository

ATS Resume Optimizer

Personal Project — Live on HuggingFace Spaces

2026

Full-stack AI tool that scores a resume against a job description with a hybrid system: ATS keyword match (Llama 3.1 70B) weighted at 60% and semantic similarity (Sentence-Transformers) weighted at 40%. It then generates an optimized, ATS-tailored resume as a downloadable PDF.

Llama 3.1 70B Sentence-Transformers Gradio HuggingFace NLP

View Repository · Live Demo
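The 60/40 split is a weighted sum of two scores on a 0 to 1 scale. The sketch below makes several assumptions. The keyword score is computed as the fraction of job-description keywords found in the resume, whereas the live tool uses Llama 3.1 70B for this step. The semantic score is the cosine similarity of two precomputed embedding vectors, such as Sentence-Transformers would produce. Clipping negative cosines to 0 and scaling to 0 to 100 are also assumptions.

```python
import math


def keyword_score(resume: str, keywords: list[str]) -> float:
    """Fraction of required keywords that appear in the resume (case-insensitive substring match)."""
    if not keywords:
        return 0.0
    text = resume.lower()
    return sum(1 for kw in keywords if kw.lower() in text) / len(keywords)


def cosine_similarity(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ats_score(resume: str, keywords: list[str], resume_emb: list[float], jd_emb: list[float],
              w_kw: float = 0.6, w_sem: float = 0.4) -> float:
    """Hybrid score = 0.6 * keyword match + 0.4 * semantic similarity, scaled to 0-100."""
    semantic = max(0.0, cosine_similarity(resume_emb, jd_emb))  # clip negatives (assumption)
    return 100 * (w_kw * keyword_score(resume, keywords) + w_sem * semantic)
```

Keeping the weights as parameters makes it easy to check how sensitive the final ranking is to the 60/40 choice.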

MayaVoice LLM

Personal Project — Generative AI for Indigenous Languages

2026

Multilingual GenAI application for translation and conversation in Guatemalan Mayan languages. It uses fine-tuned LLMs to bring modern AI to low-resource indigenous languages and to help preserve their cultural heritage.

LLMs NLP Low-Resource Languages HuggingFace Gradio

View Repository · Live Demo

Maya Encoding

Personal Project — Open-Source Python Library

2026

Maya-inspired vigesimal numerical encodings for machine learning: scikit-learn-compatible transformers that convert numerical features into base-20 representations modeled on the ancient Maya numeral system, offering a novel approach to feature engineering.

Python scikit-learn Feature Engineering Library Design

View Repository · Live Demo
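The core transformation is a change of base. Each non-negative integer becomes a fixed-width vector of base-20 digits, most significant first, so one numeric column expands into several digit columns. The sketch below uses a hypothetical `VigesimalEncoder` class that follows scikit-learn's `fit`/`transform` convention without importing scikit-learn. The library's real API, its handling of negatives and floats, and any Maya-specific conventions may differ; for example, Maya calendrical counts use 18 × 20 rather than 20 × 20 in the third place, and the sketch ignores this.

```python
class VigesimalEncoder:
    """Hypothetical transformer: expand non-negative integer columns into base-20 digit columns."""

    def __init__(self, base: int = 20):
        self.base = base

    def fit(self, X: list[list[int]], y=None):
        # Learn how many base-20 digits each column needs from its largest value.
        self.widths_ = []
        for col in zip(*X):
            largest, width = max(col), 1
            while largest >= self.base ** width:
                width += 1
            self.widths_.append(width)
        return self

    def _digits(self, n: int, width: int) -> list[int]:
        if n < 0:
            raise ValueError("this sketch only handles non-negative integers")
        out = []
        for _ in range(width):
            n, d = divmod(n, self.base)
            out.append(d)
        if n:
            raise ValueError("value exceeds the width learned in fit")
        return out[::-1]  # most significant digit first

    def transform(self, X: list[list[int]]) -> list[list[int]]:
        return [
            [d for value, w in zip(row, self.widths_) for d in self._digits(value, w)]
            for row in X
        ]

    def fit_transform(self, X: list[list[int]], y=None) -> list[list[int]]:
        return self.fit(X, y).transform(X)
```

Learning column widths in `fit` keeps the output shape fixed between training and inference, which downstream scikit-learn estimators require.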

CounterFlow Neural Network

Personal Project — Novel Architecture

2026

A novel neural network architecture inspired by counter-current flow in chemical engineering unit operations. Applies principles of heat and mass transfer to information flow in deep learning, creating bidirectional feature exchange between parallel network streams.

PyTorch Deep Learning Neural Architecture Chemical Engineering

View Repository · Live Demo
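For readers without a chemical engineering background, the counter-current idea can be written as a stage-wise balance. The block below describes the unit operation the architecture borrows from, not the network's actual equations. It assumes N ideal stages and a single exchange coefficient k, with transfer driven by the local difference between the streams. Stream a enters at stage 0 as a_0 and leaves as a_N; stream b flows the opposite way, entering as b_N and leaving as b_0.

```latex
\begin{aligned}
q_i &= k\,(a_i - b_{i+1}), \qquad i = 0,\dots,N-1 \\
a_{i+1} &= a_i - q_i, \qquad b_i = b_{i+1} + q_i \\
a_N + b_0 &= a_0 + b_N \qquad \text{(summing over stages: the total is conserved)}
\end{aligned}
```

Because the streams move in opposite directions, each stage pairs a partly depleted a with a partly enriched b. This keeps a driving difference along the whole exchanger instead of letting it collapse early, as it would in co-current flow.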


Fine-Tuned Models

LoRA adapters I’ve trained and shipped.

5 public LoRA adapters · 3 base-model families · 3 downstream tasks · 2 production pipelines powered

MayaVoice — Llama 3.1 8B (v2)

PEFT LoRA · low-resource indigenous NLP

Llama 3.1 8B fine-tuned for translation and conversation across Guatemalan Mayan languages and tagged for low-resource Americas NLP. It currently powers the MayaVoice Space.

Llama 3.1 8B PEFT LoRA Translation Mayan

🤗 View Model

MayaVoice — Llama 3.1 8B (v1)

Baseline release · Unsloth + TRL

First-pass LoRA on Llama 3.1 8B for Mayan-language generation, serving as the baseline that v2 iterates on.

Llama 3.1 8B Unsloth TRL LoRA

🤗 View Model

Chart Reasoner — Phi-3 Mini

Knowledge-distilled · SQL-agent orchestration

Phi-3 Mini 3.8B fine-tuned to pick the right chart type for a given query and schema. Distilled from a larger teacher model and deployed as the reasoning node of the SQL Agent with LLMOps.

Phi-3 Mini 3.8B Knowledge Distillation Chart Reasoning

🤗 View Model

Chart Reasoner — Adapter Only

Lightweight release · pair with base

Adapter-only variant of the Phi-3 chart reasoner for users who want to load the LoRA on top of their own base-model cache.

Phi-3 Mini Adapter Only PEFT

🤗 View Model
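An adapter can ship on its own because LoRA stores only a low-rank update: the effective weight is W + (alpha / r) · B A, with the base weight W left frozen. The sketch below shows that arithmetic in pure Python, assuming toy 2 × 2 weights, rank r = 1, and a hypothetical alpha. It checks that running the adapter alongside the base matches folding it into the base. In practice, the PEFT library's `PeftModel.from_pretrained` attaches a saved adapter to an already loaded base model.

```python
def matmul(A: list[list[float]], B: list[list[float]]) -> list[list[float]]:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(W: list[list[float]], x: list[float]) -> list[float]:
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def lora_forward(W, A, B, x, alpha: float, r: int) -> list[float]:
    """h = W x + (alpha / r) * B (A x): frozen base path plus the low-rank adapter path."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))  # A: r x d_in, B: d_out x r
    scale = alpha / r
    return [h + scale * u for h, u in zip(base, update)]


def merge_lora(W, A, B, alpha: float, r: int) -> list[list[float]]:
    """Fold the adapter into the base weight: W' = W + (alpha / r) * B A."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]
```

The adapter holds only r × (d_in + d_out) numbers per layer instead of d_in × d_out, which is why an adapter-only release is small enough to pair with a locally cached base model.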

SVG Renderer — DeepSeek Coder

1.3B code model · structured-data → SVG

DeepSeek Coder 1.3B fine-tuned to render structured data as inline SVG charts. Powers the visualization node of the SQL agent when the Plotly fallback isn’t needed.

DeepSeek Coder 1.3B SVG Generation Code-LLM

🤗 View Model
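A fallback like this needs a cheap, deterministic check on the renderer's output. The sketch below assumes a simple rule: use the model's SVG only if it parses as XML and its root element is `<svg>`. The agent's real criteria are not described here. The Plotly path is represented by a hypothetical label rather than an actual Plotly call.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def is_valid_svg(text: str) -> bool:
    """True if the text parses as XML and the root element is <svg> (with or without namespace)."""
    try:
        root = ET.fromstring(text.strip())
    except ET.ParseError:
        return False
    return root.tag == "svg" or root.tag.endswith("}svg")


def choose_renderer(svg_candidate: Optional[str]) -> str:
    """Hypothetical routing: keep the model's SVG when valid, otherwise fall back to Plotly."""
    if svg_candidate and is_valid_svg(svg_candidate):
        return "svg"
    return "plotly"
```

Parsing catches truncated or malformed generations, which are the most common failure mode for small code models emitting markup.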


Curated Datasets

Training corpora I built and released.

4 public datasets · 220K+ training rows · 2 languages (EN / ZH) · 190+ dataset downloads

text-to-sql-mix-v2

100K+ rows · bilingual EN / ZH

Second-generation text-to-SQL corpus that adds Chinese coverage and task-ID metadata for multilingual SQL-generation fine-tuning. It is the current training data for the SQL-generator stack.

Text-to-SQL EN / ZH 100K+ rows Parquet

🤗 View Dataset

text-to-sql-mix-v1

100K+ rows · English baseline

First-generation, English-only text-to-SQL training set. This earlier version was used to train the SQL-generator LoRA and serves as the comparison point for v2.

Text-to-SQL English 100K+ rows Parquet

🤗 View Dataset

chart-reasoning-mix-v1

10K–100K rows · query + schema → chart type

Natural-language query and schema pairs mapped to chart-type decisions. Distillation source for the Phi-3 chart reasoner.

Chart Reasoning Distillation Corpus Parquet

🤗 View Dataset

svg-chart-render-v1

10K–100K rows · structured data → SVG

Structured-data to inline-SVG rendering pairs. Training signal for the DeepSeek SVG renderer.

SVG Rendering Code Generation Parquet

🤗 View Dataset

Interested in collaborating?

I’m always open to discussing new projects and opportunities.

LinkedIn GitHub 🤗 Hugging Face Email

dxr1491@miami.edu

© 2026 Daniel Regalado Cardoso

 