The new TypeScript Evals package is designed to be a simple & powerful way to evaluate your agents:
✅ Define a task (what the agent does)
✅ Build a dataset
✅ Use an LLM-as-a-Judge evaluator to score outputs
✅ Run evals and see results in Phoenix
Docs 👇
Splits let you define named subsets of your dataset & filter your experiments to run only on those subsets.
Learn more & check out this walkthrough:
⚪️ Create a split directly in the Phoenix UI
⚪️ Run an experiment scoped to that subset
👉 Full demo + code below 👇
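To make the experiment half concrete, here's a hedged sketch with the Python client; the dataset name, task, and evaluator are placeholders, and the split-scoping step itself is covered in the walkthrough:

```python
# A minimal sketch, assuming a dataset named "support-questions" already
# exists in Phoenix. The task and evaluator are stand-ins for your own.
import phoenix as px
from phoenix.experiments import run_experiment

dataset = px.Client().get_dataset(name="support-questions")

def task(input):
    # Replace with your agent/LLM call; `input` is the example's input dict.
    return f"Answer: {input}"

def has_answer(output):
    # Toy evaluator; in practice you'd use an LLM-as-a-Judge here.
    return "Answer" in output

experiment = run_experiment(dataset, task, evaluators=[has_answer])
```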
Our new live Phoenix Demos let you explore every step of an agent’s reasoning just by chatting with pre-built agents, with traces appearing instantly as you go.
With Mastra now integrating directly with Phoenix, you can trace your TypeScript agents with almost zero friction.
And now… you can evaluate them too: directly from TypeScript using Phoenix Evals.
✨ Create tailored Spaces
🔑 Manage user permissions
👥 Easy team collaboration
More than a feature, it’s Phoenix adapting to you.
Spin up a new Phoenix project & test it out!
@arize-phoenix.bsky.social
You can now create datasets, run experiments, and attach evaluations to experiments using the Phoenix TS/JS client.
Shoutout to @anthonypowell.me and @mikeldking.bsky.social for the work here!
Add GenAI tracing to your applications with @arize-phoenix.bsky.social in just a few lines. Works great with Span Replay, so you can debug, tweak, and explore agent behavior in the prompt playground.
Check out the notebook + docs below! 👇
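For a sense of scale, the setup really is a few lines. A minimal sketch, assuming the openinference OpenAI instrumentor; swap in the one for whichever GenAI SDK you use:

```python
# Point Phoenix at your app and auto-instrument the client library.
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

tracer_provider = register(project_name="my-agent")  # project name is illustrative
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# From here, every model call is captured as a span you can open in Phoenix,
# replay with Span Replay, and tweak in the prompt playground.
```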
I’ve been really liking some of the eval tools in Pydantic's evals package.
I wanted to see if I could combine them with Phoenix’s tracing so I could run Pydantic evals on traces captured in Phoenix.
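The rough shape of what I ended up with, sketched with placeholder names; the span columns are Phoenix's standard ones, and the rubric and replay task are simplifications:

```python
import phoenix as px
from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import LLMJudge

# Pull previously captured LLM spans out of Phoenix...
spans = px.Client().get_spans_dataframe("span_kind == 'LLM'")

# ...turn each traced call into a Case...
cases = [
    Case(name=f"span-{i}", inputs=row["attributes.input.value"])
    for i, (_, row) in enumerate(spans.iterrows())
]

# ...and let pydantic-evals score them with an LLM judge (rubric is illustrative).
dataset = Dataset(
    cases=cases,
    evaluators=[LLMJudge(rubric="The response fully answers the question.")],
)

# The "task" just replays the output that was already captured in the trace.
replay = dict(zip(spans["attributes.input.value"], spans["attributes.output.value"]))
report = dataset.evaluate_sync(lambda q: replay[q])
report.print()
```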
✔️ Trace agent decisions at every step
✔️ Offline and Online Evals using LLM as a Judge
If you're building agents, measuring them is essential.
Full vid and cookbook below
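To make the offline half concrete, a hedged sketch of judging captured spans with phoenix.evals; the template wording, model choice, and eval name are placeholders:

```python
# Pull traced LLM spans, judge them, and log the labels back to Phoenix.
import phoenix as px
from phoenix.evals import OpenAIModel, llm_classify
from phoenix.trace import SpanEvaluations

spans = px.Client().get_spans_dataframe("span_kind == 'LLM'").rename(
    columns={"attributes.input.value": "input", "attributes.output.value": "output"}
)

TEMPLATE = (
    "You are evaluating one step of an agent.\n"
    "Input: {input}\nOutput: {output}\n"
    "Is the output a correct response to the input? "
    "Answer with exactly one word: correct or incorrect."
)

labels = llm_classify(
    dataframe=spans,
    model=OpenAIModel(model="gpt-4o"),
    template=TEMPLATE,
    rails=["correct", "incorrect"],
    provide_explanation=True,  # explanations make judge errors auditable
)

px.Client().log_evaluations(SpanEvaluations(eval_name="correctness", dataframe=labels))
```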
Tag a function with `@tracer.llm` to automatically capture it as an @opentelemetry.io span.
- Automatically parses input and output messages
- Comes in decorator or context manager flavors
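Both flavors in one hedged sketch (the function body and span names are placeholders):

```python
# Setup follows the usual phoenix.otel pattern.
from phoenix.otel import register

tracer = register(project_name="my-app").get_tracer(__name__)

# Decorator flavor: input and output messages are parsed onto an LLM span.
@tracer.llm
def invoke_llm(messages: list[dict]) -> str:
    ...  # your model call here

# Context-manager flavor: trace an arbitrary block as an LLM span.
with tracer.start_as_current_span("llm-call", openinference_span_kind="llm") as span:
    response = "..."  # your model call here
    span.set_output(response)
```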
My go-to way to test out these new models: grab a failed trace from a previous run, pull it into the playground, switch the model, and see if 4.1 succeeds where 4o failed.
Early signs are promising!
Flowise is fast, visual, and low-code. But what happens under the hood?
With the new Arize Phoenix integration, you can debug, inspect, and visualize your LLM applications and agent workflows in a single configuration step, no code required.
Here's how you can easily set up a LangGraph agent, deploy it with Google ADK, and trace it with @arize-phoenix.bsky.social.
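A compressed sketch of the tracing side (the real agent logic and the ADK deployment are elided; instrumenting LangChain is what captures LangGraph's steps as spans):

```python
from typing import TypedDict

from langgraph.graph import END, StateGraph
from openinference.instrumentation.langchain import LangChainInstrumentor
from phoenix.otel import register

tracer_provider = register(project_name="adk-agent")  # project name is illustrative
LangChainInstrumentor().instrument(tracer_provider=tracer_provider)

class State(TypedDict):
    question: str
    answer: str

def respond(state: State) -> dict:
    return {"answer": f"stub answer to: {state['question']}"}  # your LLM call here

graph = StateGraph(State)
graph.add_node("respond", respond)
graph.set_entry_point("respond")
graph.add_edge("respond", END)
agent = graph.compile()

print(agent.invoke({"question": "What does Phoenix trace?"}))  # spans land in Phoenix
```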
With Ragas and Phoenix together, you can:
✅ Evaluate performance with Ragas metrics
✅ Visualize and understand LLM behavior through traces & experiments in Arize or Phoenix
Dive into our docs & notebooks ⬇️
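A minimal sketch of the Ragas half, with a toy one-row eval set; exact imports and the dataset shape vary a bit across Ragas versions, and an LLM (e.g. via OPENAI_API_KEY) must be configured for the metrics to run:

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

eval_data = Dataset.from_dict({
    "question": ["What does Phoenix do?"],
    "answer": ["Phoenix traces and evaluates LLM applications."],
    "contexts": [["Phoenix is an open-source LLM observability tool."]],
})

scores = evaluate(eval_data, metrics=[faithfulness, answer_relevancy])
print(scores)
```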
In my new tutorial, learn techniques for optimizing your judge's prompt to improve accuracy, fairness, and robustness while keeping costs down.
better prompts ➡️ better evals
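One technique from the family covered there: narrow the judge's output space and make it show its reasoning. A before/after sketch (wording is illustrative, not the tutorial's exact prompts):

```python
# Vague judge prompt: unconstrained output, no rubric, hard to parse or audit.
VAGUE_JUDGE = "Is this answer good? Question: {input} Answer: {output}"

# Sharpened judge prompt: a single criterion, reasoning first, and a closed
# label set that downstream code can parse reliably.
SHARP_JUDGE = (
    "You are grading an answer for factual accuracy only; ignore style.\n"
    "Question: {input}\n"
    "Answer: {output}\n"
    "Explain your reasoning in one sentence, then output exactly one label "
    "from: accurate, inaccurate."
)
```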
📌Tag prompts in code and see those tags reflected in the UI
📌Tag prompt versions as development, staging, or production — or define your own
📌Add in tag descriptions for more clarity
Manage your prompt lifecycles with confidence🚀
Apply here: docs.google.com/forms/d/e/1F...
In this tutorial, I apply ReAct principles to prompt LLMs to Reason + Act like humans. When you spell out these steps, the LLM generates explicit reasoning and interacts with tools for greater accuracy.
Full Video Tutorial: youtu.be/PB7hrp0mz54?...
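The core of the pattern is just a prompt scaffold plus a tool loop. A bare-bones sketch (tool names and stop handling are illustrative):

```python
# The prompt interleaves Thought / Action / Observation steps; the runtime
# fills in tool results between model calls.
REACT_PROMPT = """Answer the question by interleaving these steps:

Thought: reason about what to do next
Action: one of [search, calculator], with its input
Observation: the tool's result (inserted by the runtime)
... (repeat Thought/Action/Observation as needed)
Final Answer: the final answer to the question

Question: {question}
Thought:"""

def react_loop(question: str) -> str:
    """Sketch of the driver: call the LLM with stop='Observation:', run the
    requested tool, append the observation to the prompt, and repeat until
    the model emits 'Final Answer:'."""
    ...
```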