✨ Create tailored Spaces
🔑 Manage user permissions
👥 Easy team collaboration
More than a feature, it’s Phoenix adapting to you.
Spin up a new Phoenix project & test it out!
@arize-phoenix.bsky.social
Add GenAI tracing to your @arize-phoenix.bsky.social applications in just a few lines. Works great with Span Replay so you can debug, tweak, and explore agent behavior in prompt playground.
Check Notebook + docs below!👇
✔️ Trace agent decisions at every step
✔️ Offline and Online Evals using LLM as a Judge
If you're building agents, measuring them is essential.
Full vid and cookbook below
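
For a rough idea of what an LLM-as-a-Judge eval looks like, here is a minimal sketch. The template wording, the `correct`/`incorrect` labels, and the `stub_llm` function are all assumptions made for illustration. They are not the Phoenix evals API, and a real setup would call an actual model instead of the stub.

```python
from typing import Callable, Optional

# Hypothetical judge prompt; the wording and labels are illustrative assumptions.
JUDGE_TEMPLATE = """You are evaluating an answer to a question.
Question: {question}
Answer: {answer}
Respond with exactly one word: correct or incorrect."""

ALLOWED_LABELS = ("correct", "incorrect")


def parse_label(raw: str) -> Optional[str]:
    """Normalize the judge's raw text; return None if it is not an allowed label."""
    cleaned = raw.strip().lower().rstrip(".")
    return cleaned if cleaned in ALLOWED_LABELS else None


def judge(llm: Callable[[str], str], question: str, answer: str) -> Optional[str]:
    """Fill the template, ask the judge model, and parse its verdict."""
    prompt = JUDGE_TEMPLATE.format(question=question, answer=answer)
    return parse_label(llm(prompt))


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption: any str -> str callable works)."""
    return " Correct.\n"


label = judge(stub_llm, "What is 2 + 2?", "4")
print(label)
```

Constraining the judge to a small label set and rejecting anything else keeps the eval output easy to aggregate, whether it runs offline over a dataset or online over incoming traces.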
My go-to way to test out these new models: grab a failed trace from a previous run, pull it into playground, switch the model and see if 4.1 can succeed where 4o failed.
Early signs are promising!
In my new tutorial, learn techniques for optimizing your judge prompt to improve its accuracy, cost, fairness, and robustness.
better prompts ➡️ better evals
In this tutorial, I apply ReAct principles to prompt LLMs to Reason + Act like humans. By specifying these steps, the LLM generates reasoning and interacts with tools for greater accuracy.
Full Video Tutorial: youtu.be/PB7hrp0mz54?...
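
To make the Reason + Act loop concrete, here is a minimal sketch, not the code from the tutorial. It assumes a `Thought` / `Action: tool[input]` / `Final Answer` text format, a hypothetical `calculator` tool, and a scripted stand-in for the LLM so it runs without any API key.

```python
import operator
import re
from typing import Callable

# Hypothetical tool set; a real agent would register its own tools here.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def calculator(expression: str) -> str:
    """Evaluate a simple 'a op b' expression such as '6 * 7'."""
    left, op, right = expression.split()
    return str(OPS[op](float(left), float(right)))


TOOLS = {"calculator": calculator}

ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")
FINAL_RE = re.compile(r"Final Answer: (.*)")


def react_loop(model: Callable[[str], str], question: str, max_steps: int = 5) -> str:
    """Alternate model reasoning with tool calls until a final answer appears."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = model(transcript)
        transcript += reply + "\n"
        final = FINAL_RE.search(reply)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(reply)
        if action:
            tool = TOOLS.get(action.group(1))
            observation = tool(action.group(2)) if tool else "unknown tool"
            # Feed the tool result back so the next reasoning step can use it.
            transcript += f"Observation: {observation}\n"
    return "no answer"


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM: first requests a tool call, then answers."""
    if "Observation:" not in transcript:
        return "Thought: I need to multiply.\nAction: calculator[6 * 7]"
    return "Thought: I have the result.\nFinal Answer: 42.0"


answer = react_loop(scripted_model, "What is 6 times 7?")
print(answer)
```

Swapping `scripted_model` for a real model call keeps the loop unchanged: the model only ever sees the growing transcript of thoughts, actions, and observations.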
I’ve been using Chain of Thought (CoT) prompting to help LLMs replicate logical step-by-step thinking.
For the next segment in my prompting series, I use @arize-phoenix.bsky.social to test the performance of various CoT methods
We're celebrating Phoenix reaching 5000 stars on GitHub! This milestone underscores the growing demand for robust, open-source tools that tackle the complexities of AI and LLM development.
Check it out: github.com/Arize-ai/pho...
www.youtube.com/watch?v=bW5Z...
In my latest tutorial, I explore how few-shot prompting boosts accuracy without massive datasets or retraining—using @arize-phoenix.bsky.social prompts and experiments to break it down.
This kicks off my prompting series... more to come!
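
If few-shot prompting is new to you, here is a minimal sketch of the idea: a handful of labeled examples go into the prompt ahead of the new input. The `Input:` / `Output:` layout, the sentiment task, and the example data are assumptions for illustration, not the format used in the tutorial.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    instruction: str, examples: List[Tuple[str, str]], query: str
) -> str:
    """Assemble an instruction, labeled examples, and the new input into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {label}")
        lines.append("")
    # Leave the final Output blank so the model completes it.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Classify the sentiment as positive or negative.",
    [("I love this tool", "positive"), ("The app keeps crashing", "negative")],
    "Setup was painless",
)
print(prompt)
```

Varying the number and choice of examples while holding the instruction fixed is one simple way to compare few-shot variants against a zero-shot baseline.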
This makes Prompt Playground ideal for side-by-side reasoning tests: o3 vs. Anthropic vs. R1.
Plus, GPT-4.5 support keeps it up to date with the latest from OpenAI & Anthropic - test them all out in the playground! ⚡️
📌 Persistent column selection for consistent views
🔍 Filter data directly from tables with metadata and quick metadata filters
⏳ Set custom time ranges for traces & spans
🌳 Option to filter spans by root spans
Check out the demo👇
There's also a tutorial linked in here where you can use Phoenix to compare the performance of different techniques. 👇
arize.com/blog/prompt-...