John Gilhuly
johngilhuly.bsky.social
John Gilhuly
@johngilhuly.bsky.social
Field Engineering @ Anysphere
🧪 📊 The @arize-phoenix.bsky.social TS/JS client now supports Experiments and Datasets!

You can now create datasets, run experiments, and attach evaluations to experiments using the Phoenix TS/JS client.

Shoutout to @anthonypowell.me and @mikeldking.bsky.social for the work here!
May 21, 2025 at 2:26 PM
Tired of tweaking prompts yourself? Let the machines do it for you! 🤖

This guide is a great primer on common approaches we see towards automated prompt optimization. If you've already read 100 "prompting tips and tricks" blogs but aren't yet a full DSPy contributor, then let this be your bridge!
May 12, 2025 at 6:20 PM
Reposted by John Gilhuly
🆕 New in OpenInference: Python auto-instrumentation for the Google GenAI SDK!

Add GenAI tracing to your @arize-phoenix.bsky.social applications in just a few lines. Works great with Span Replay so you can debug, tweak, and explore agent behavior in prompt playground.

Check Notebook + docs below!👇
May 8, 2025 at 8:41 PM
Reposted by John Gilhuly
Learn to prompt better
May 7, 2025 at 7:26 PM
@pydantic.dev evals 🤝 @arize-phoenix.bsky.social tracing and UI

I’ve been really liking some of the eval tools from Pydantic's evals package.

Wanted to see if I could combine these with Phoenix’s tracing so I could run Pydantic evals on traces captured in Phoenix
May 2, 2025 at 6:02 PM
Had a fantastic time talking about Self-Improving AI Agents with @arize-phoenix.bsky.social at the AI Camp NYC meetup this past week!

It's amazing to see how fast the discourse has moved from "just agents" to now multi-agent flows, optimized evals, and automated improvement strategies.
April 19, 2025 at 4:48 PM
Reposted by John Gilhuly
Just dropped a tutorial on using the OpenAI Agents SDK + @arize-phoenix.bsky.social to go from building to evaluating agents.

✔️ Trace agent decisions at every step
✔️ Offline and Online Evals using LLM as a Judge

If you're building agents, measuring them is essential.

Full vid and cookbook below
April 18, 2025 at 6:51 PM
We've added new LLM decorators to @arize-phoenix.bsky.social 's OpenInference library 🎁

Tag a function with `@ tracer.llm` to automatically capture it as an @opentelemetry.io span.
- Automatically parses input and output messages
- Comes in decorator or context manager flavors
April 18, 2025 at 2:21 AM
Reposted by John Gilhuly
We've added GPT-4.1 models to the @arize-phoenix.bsky.social Prompt Playground.

My go-to way to test out these new models: grab a failed trace from a previous run, pull it into playground, switch the model and see if 4.1 can succeed where 4o failed.

Early signs are promising!
April 16, 2025 at 6:43 PM
Reposted by John Gilhuly
Trace Flowise apps with Arize Phoenix 🔍

Flowise is fast, visual, and low-code — but what happens under the hood?

With the new Arize Phoenix integration, you can debug, inspect, and visualize your LLM applications and agent workflows with 1 configuration step - no code required.
April 15, 2025 at 9:03 PM
Google’s new Agent Development Kit (ADK) makes it dead easy to build, manage, and deploy AI agent systems.

Here's how you can easily setup a LangGraph agent, deploy it with Google ADK, and trace it with @arize-phoenix.bsky.social.
Deploying an Agent with Google Vertex Agent Engine and Arize Phoenix
YouTube video by Arize AI
www.youtube.com
April 10, 2025 at 7:10 PM
Reposted by John Gilhuly
Use Ragas with Arize AI @arize.bsky.social or ArizePhoenix to improve the evaluation of your LLM applications

Together you can:

✅ Evaluate performance with Ragas metrics
✅ Visualize and understand LLM behavior through traces & experiments in Arize or Phoenix

Dive into our docs & notebooks ⬇️
April 9, 2025 at 12:20 AM
Reposted by John Gilhuly
LLM as a Judge allows models to evaluate outputs in a single prompt—but a good judging needs a good prompt

In my new tutorial, learn techniques on how to optimize your prompt so your judge can improve accuracy, cost, fairness, and robustness

better prompts ➡️ better evals
April 7, 2025 at 5:15 PM
Reposted by John Gilhuly
New in the Phoenix client: Prompt Tagging 🏷️

📌Tag prompts in code and see those tags reflected in the UI
📌Tag prompt versions as development, staging, or production — or define your own
📌Add in tag descriptions for more clarity

Manage your prompt lifecycles with confidence🚀
April 4, 2025 at 7:24 PM
Reposted by John Gilhuly
Better LLMs start with better data and observability

We’ve integrated @CleanlabAI’s Trustworthy Language Model (TLM) with Phoenix to help teams improve LLM reliability and performance

🔗 Dive into the full implementation in our docs & notebook:
March 20, 2025 at 7:50 PM
In case you missed it, Arize AI Phoenix crossed the 5k GitHub star mark last week! ⭐️

Phoenix has changed a TON since its first iteration.

I'm constantly in awe of the execution speed and quality of this team. Here's to the next 5k and beyond!
March 20, 2025 at 4:07 PM
Reposted by John Gilhuly
How much LLM reasoning can you drive through your prompt itself?

I’ve been using Chain of Thought (CoT) prompting to help LLMs replicate logical step-by-step thinking.

For the next segment in my prompting series, I use @arize-phoenix.bsky.social to test the performance of various CoT methods
March 19, 2025 at 11:13 PM
Reposted by John Gilhuly
How much more data does an LLM app really need?

In my latest tutorial, I explore how few-shot prompting boosts accuracy without massive datasets or retraining—using @arize-phoenix.bsky.social prompts and experiments to break it down.

This kicks off my prompting series... more to come!
March 18, 2025 at 11:50 PM
For all my NYC friends! 🗽🍎

We're hosting an in-person office hours tomorrow all around LLM and Agent Evals.

Join for the free snacks/drinks, stay for the heated discussions about the validity of Pokemon-based model evaluations ⚡️🐀
LLM Evals Office Hours with Arize · Luma
Join us for an open coworking session focused on LLM and Agent Evaluations! Whether you're actively working on evaluation strategies or just exploring the…
lu.ma
March 18, 2025 at 6:20 PM
Reposted by John Gilhuly
Prompt optimization is essential, and automating it with frameworks like DSPy gives you scalable and data-driven improvements.

There's also a tutorial linked in here where you can use Phoenix to compare the performance of different techniques. 👇

arize.com/blog/prompt-...
Prompt Optimization Techniques
Explore different prompt optimization techniques and learn how Arize Phoenix and DSPy can be used to automate and enhance the process.
arize.com
March 17, 2025 at 9:22 PM
Shoutout to Aman Khan and the rest of the @arize.bsky.social team for delivering a top-notch talk at @deeplearningai.bsky.social's inaugural AI Dev 25 conference! 📣

Aman combined our recent Agent Evaluation course with the latest prompt optimization techniques to automate the improvement process.
March 15, 2025 at 3:48 AM
Reposted by John Gilhuly
⚡️OpenAI Agents Instrumentation is out—we’re continuously improving to bring you the latest.

With just a few lines of code, you'll be ready to trace and run agents. Check it out to level up your agent monitoring.

Our docs have all the details for a quick setup! 👇

#OpenAI #Agents #Tracing
March 15, 2025 at 1:21 AM
How can you programmatically improve your prompts? 🤔 🤖

Forget manual prompt engineering - there are better (read: "more automatic") ways to improve your prompts.

This video and notebook break down these techniques.

Featuring:
- DSPy
- @arize-phoenix.bsky.social
March 3, 2025 at 5:01 PM
Separate AI tools for dev and prod aren't just inefficient—they're actively sabotaging your model performance.

Too often, teams are stuck using disconnected tools—one for evaluation, another for monitoring, and yet another for debugging.

So, we built a unified approach.

arize.com/blog/why-ai-...
March 1, 2025 at 4:16 PM
🤖 Building agents, but not sure how to measure their performance?

Our newest blog post on @hf.co has you covered!

This post shows you how to use @arize-phoenix.bsky.social to trace and evaluate your smolagents.

Credit to @srichavali.bsky.social and @aymeric-roucher.bsky.social
February 28, 2025 at 5:19 PM