Eric Martinez
@ericmtztx.bsky.social
UTRGV School of Medicine / UTHealth RGV
Asst Dir Business Intelligence & Enterprise Engineering
Adjunct Lecturer @ UTRGV CS
Jiu-Jitsu Black Belt
🧵 Following our healthcare AI journey? I'll share:

- Implementation & design details
- Evaluation methodology and dataset development
- AI engineering process
- Data platform infrastructure / HIPAA compliance

Learn how we're making healthcare more accessible in the RGV, one search at a time. (7/7)
November 28, 2024 at 6:06 PM
Step 5: Address Bias Head-On
We're expanding our evaluation dataset to detect unwanted outcomes that could impact patients, providers, or departments. No system is perfect, but measuring bias lets us iterate and improve responsibly. (6/7)
November 28, 2024 at 6:06 PM
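The bias measurement described above could be sketched roughly like this: given per-slice accuracy (e.g. by query language or patient descriptor), flag slices that lag the best-performing one. The function name and the 5-point gap threshold are illustrative assumptions, not the team's actual code.

```python
def flag_bias(slice_accuracy: dict[str, float], max_gap: float = 0.05) -> list[str]:
    """Return descriptor values whose accuracy trails the best-performing
    slice by more than `max_gap` -- candidates for targeted iteration.
    (Threshold is an illustrative assumption.)"""
    best = max(slice_accuracy.values())
    return sorted(d for d, acc in slice_accuracy.items() if best - acc > max_gap)
```

Flagged slices then feed back into the "iterate and improve responsibly" loop, e.g. by adding targeted evaluation examples for the lagging group.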
Step 4: Systematic Development

- Build comprehensive evaluation dataset
- Perfect basic provider matching
- Enable structured query rewriting with LLMs

Each step verified through testing to ensure reliable results. (5/7)
November 28, 2024 at 6:06 PM
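A minimal sketch of the "structured query rewriting with LLMs" step above: the LLM rewrites a free-text patient query into JSON fields, and the parser validates that output, falling back to an empty structure so search can still run on the raw text. The schema, prompt wording, and field names (`symptom`, `specialty`, `language`) are assumptions for illustration.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class StructuredQuery:
    symptom: Optional[str] = None
    specialty: Optional[str] = None
    language: Optional[str] = None  # preferred provider language, if stated

# Hypothetical rewrite prompt; exact wording is an assumption.
REWRITE_PROMPT = (
    "Rewrite the patient search query as JSON with keys "
    '"symptom", "specialty", "language" (null if absent). Query: {query}'
)

def parse_rewrite(llm_output: str) -> StructuredQuery:
    """Validate the LLM's JSON; fall back to an empty structure on bad
    output so retrieval can degrade gracefully to the raw query text."""
    try:
        data = json.loads(llm_output)
        return StructuredQuery(
            symptom=data.get("symptom"),
            specialty=data.get("specialty"),
            language=data.get("language"),
        )
    except (json.JSONDecodeError, AttributeError):
        return StructuredQuery()
```

The fallback matters in production: a malformed LLM response should never take the search down, only reduce it to plain-text matching.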
Step 3: Measure What Matters
Key metrics we're tracking:

- Query-to-(provider/specialty/department) match accuracy
- Cross-lingual search performance
- Response time (targeting <300ms)

These guide our improvements where they matter most and can be sliced across descriptors to measure bias. (4/7)
November 28, 2024 at 6:06 PM
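The "sliced across descriptors" idea above could look roughly like this: compute top-1 match accuracy grouped by a descriptor such as query language, specialty, or department. The tuple layout is an illustrative assumption.

```python
from collections import defaultdict

def match_accuracy_by_slice(results):
    """Top-1 match accuracy per descriptor value.
    `results` is a list of (descriptor, predicted_id, expected_id) tuples;
    the descriptor could be query language, specialty, or department."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for descriptor, predicted, expected in results:
        totals[descriptor] += 1
        hits[descriptor] += int(predicted == expected)
    return {d: hits[d] / totals[d] for d in totals}
```

The same grouping works for cross-lingual performance (descriptor = query language) or latency percentiles against the <300ms target.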
Step 2: Fast Iteration Cycles
We built an evaluation pipeline measuring precision and other key metrics. This enables rapid testing of different embedding models, scoring methods, and chunking strategies - focusing on getting provider retrieval right before adding complexity. (3/7)
November 28, 2024 at 6:06 PM
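The evaluation pipeline above centers on precision; a minimal sketch of precision@k over a run of test queries (function names are illustrative, not the team's actual pipeline):

```python
def precision_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Fraction of the top-k retrieved provider IDs that are relevant."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for pid in top_k if pid in relevant) / len(top_k)

def evaluate(run, k: int = 5) -> float:
    """Mean precision@k over an evaluation run: a list of
    (retrieved_ids, relevant_ids) pairs, one per test query."""
    scores = [precision_at_k(r, rel, k) for r, rel in run]
    return sum(scores) / len(scores)
```

Because the harness is decoupled from retrieval, swapping embedding models, scoring methods, or chunking strategies only changes what produces `retrieved_ids`; the metric stays fixed, which is what enables the fast iteration cycles.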
Step 1: Real Data, Synthetic Queries
We extract actual patient-provider matches from our EHR, then use LLMs to generate natural search queries. This creates a robust test dataset that reflects real patient needs - from "my toe hurts" to "doctor for diabetes control". (2/7)
November 28, 2024 at 6:06 PM
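The Step 1 process above (real EHR matches in, LLM-generated queries out) could be sketched like this. The prompt wording, record fields, and function names are assumptions for illustration; the thread only says LLMs turn actual patient-provider matches into natural search queries.

```python
from dataclasses import dataclass

@dataclass
class EvalRecord:
    query: str         # synthetic patient-style search query
    provider_id: str   # ground-truth provider from the EHR match
    specialty: str     # ground-truth specialty label

def build_prompt(chief_complaint: str, specialty: str) -> str:
    """Prompt an LLM to turn a real encounter into a natural search query.
    (Wording is a hypothetical example, not the actual prompt.)"""
    return (
        "A patient saw a provider in the specialty below.\n"
        f"Chief complaint: {chief_complaint}\n"
        f"Specialty: {specialty}\n"
        "Write one short search query this patient might type, "
        'e.g. "my toe hurts" or "doctor for diabetes control".'
    )

def make_eval_records(ehr_matches, synthetic_queries):
    """Pair each EHR-derived match with its LLM-generated query."""
    return [
        EvalRecord(query=q, provider_id=m["provider_id"], specialty=m["specialty"])
        for m, q in zip(ehr_matches, synthetic_queries)
    ]
```

Grounding the labels in real encounters keeps the test set honest: the synthetic part is only the query phrasing, never the answer.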
I do think small, task-oriented models trained on real-world data could be very useful (speed, cost, task performance). But I'm skeptical that throwing PubMed articles and things of that sort into training will improve performance on real-world clinical tasks, especially without significant curation and synthesis.
November 27, 2024 at 5:28 AM
Great paper. I'm really skeptical of the fine-tuning techniques and training data being used in many of these Med* models. Not much effort seems to be going into building legitimately high-quality datasets for clinical tasks. I yawn at yet another Med* model eval'd on low-quality QA datasets.
November 27, 2024 at 5:22 AM
I'd be curious to see whether this phenomenon occurs with 1) larger variants of the same model teaching smaller models, and 2) prompting the teacher to curate data of higher quality than the original. It would be interesting to see how long it takes for collapse to outweigh the benefits.
November 23, 2024 at 5:27 PM