Kwindla Hultman Kramer
@kwindla.bsky.social
Low, low, low latency. Daily.co and Pipecat.ai
March Voice AI Meetup - Wednesday the 5th

lu.ma/ffpyl57n
February 17, 2025 at 1:58 AM
Source code is here:

github.com/pipecat-ai/p...

My favorite thing about this demo is that it's a really nice example of composite function calling.

Here are the function definitions. Gemini figures out solely from the argument descriptions how to find a conversation from "a few minutes ago"!
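
The definitions themselves were attached as an image, so here is a rough sketch of what declarations like this could look like. Everything in it is an assumption for illustration: the function names, the argument names, the filename format, and the JSON-Schema-style dict layout are hypothetical, not copied from the demo or from any SDK. The point is that the argument descriptions carry the instructions, which is what lets the model chain the calls on its own (list the files, pick the one whose timestamp fits, then load it).

```python
# Hypothetical tool declarations in a JSON-Schema-style layout.
# Names, wording, and the filename format are illustrative assumptions.
FUNCTION_DECLARATIONS = [
    {
        "name": "get_saved_conversation_filenames",
        "description": (
            "List the saved conversation files. Each filename is the time the "
            "conversation was saved, formatted as YYYY-MM-DD_HH-MM-SS.json."
        ),
        "parameters": {"type": "object", "properties": {}},
    },
    {
        "name": "load_conversation",
        "description": "Load a previously saved conversation.",
        "parameters": {
            "type": "object",
            "properties": {
                "filename": {
                    "type": "string",
                    "description": (
                        "One of the names returned by "
                        "get_saved_conversation_filenames. To find a "
                        "conversation by time, call that function first and "
                        "choose the filename whose timestamp matches best."
                    ),
                }
            },
            "required": ["filename"],
        },
    },
    {
        "name": "save_conversation",
        "description": "Save the current conversation to a new file.",
        "parameters": {"type": "object", "properties": {}},
    },
]


def undocumented_arguments(declarations):
    """Return (function, argument) pairs that have no description."""
    missing = []
    for decl in declarations:
        for arg, schema in decl["parameters"]["properties"].items():
            if not schema.get("description"):
                missing.append((decl["name"], arg))
    return missing
```

Since the descriptions do the work here, a quick check like `undocumented_arguments(FUNCTION_DECLARATIONS)` is a cheap way to catch an argument that was left unexplained.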
February 4, 2025 at 3:51 PM
Memory for voice AI agents (and composite function calling) ...

There are several ways to store (and later, retrieve) conversation state. One of the simplest is just to define a couple of functions and use your local filesystem!

Here, @chadbailey.net shows how to do that, using Gemini 2.0 Flash.
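
For a sense of how little code the filesystem approach needs, here is a minimal sketch using only the Python standard library. It is not the code from the demo: the function names, the timestamped filename format, and the JSON message layout are assumptions made for illustration, and wiring these functions up to Gemini is left out.

```python
import json
from datetime import datetime
from pathlib import Path


def save_conversation(messages, directory, now=None):
    """Write the message list to a timestamped JSON file and return its name."""
    now = now or datetime.now()
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    # Assumed filename format: the save time, e.g. 2025-02-04_15-51-00.json
    path = directory / f"{now:%Y-%m-%d_%H-%M-%S}.json"
    path.write_text(json.dumps(messages, indent=2), encoding="utf-8")
    return path.name


def get_saved_conversation_filenames(directory):
    """Return the saved filenames, oldest first (the names sort by time)."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.json"))


def load_conversation(filename, directory):
    """Read one saved conversation back into a list of messages."""
    # Keep only the final path component so a stray "../" cannot
    # reach outside the storage directory.
    path = Path(directory) / Path(filename).name
    return json.loads(path.read_text(encoding="utf-8"))
```

Putting the save time in the filename means the list of names is all the model needs to see in order to pick a conversation from "a few minutes ago".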
February 4, 2025 at 3:51 PM
Sean DuBois is one of my favorite people to talk to about WebRTC, audio and video, designing good libraries, and hacking in general.

Sean is the creator of Pion. Pion is an Open Source WebRTC implementation that is influential and very widely used (including at OpenAI, where Sean works).
February 3, 2025 at 8:25 PM
My favorite part of the DeepSeek-V3 Technical Report is the stuff about the all-to-all communication kernels. (Mostly in section 3.2.2. "Efficient Implementation of Cross-Node All-to-All Communication.")
January 30, 2025 at 8:46 PM
Sunday morning listening ... and hacking.
January 12, 2025 at 2:52 PM
Oh, wait. I take it back.
January 10, 2025 at 11:15 PM
They know what they’re doing over there in Cupertino (and Shenzhen).
January 10, 2025 at 11:13 PM
iOS + Gemini Multimodal Live + WebRTC

Filipi Fuchter added an iOS example to the Pipecat "Simple Chatbot" repo. With the Pipecat iOS SDK, you can build apps that use Gemini Multimodal Live and Gemini Flash with WebRTC, WebSockets, and HTTP networking.
January 10, 2025 at 7:23 PM
The voice-to-voice AI Pareto frontier ...

Gemini 1.5 Flash occupies an interesting place in the capabilities matrix for voice AI. It's fast, very inexpensive, has a long context window, and has native audio input.

I've been experimenting with Gemini a lot. Here's an interesting Pipecat pipeline:
December 5, 2024 at 4:48 PM
Sunset. Double overhead day.
December 3, 2024 at 12:54 AM
Team Suparova at the @supabase / @ycombinator hackathon.

There was a four-participant limit on the team size. We have five, but two are robots.

Last night was a very long session with lots of tiny little screws and some heavy ifconfig action.
November 23, 2024 at 7:46 PM