Heiko Hotz
banner
heikohotz.bsky.social
Heiko Hotz
@heikohotz.bsky.social
AI Engineer @ Google 👨‍💻 — Educator 👨‍🏫 — Traveller ✈️ — Hobby photographer 📷 — Foodie 🌮 — Film fan 🍿 — Boardgamer 🎲 — Londoner💂‍♂️

Medium: https://heiko-hotz.medium.com/
Github: https://github.com/heiko-hotz
LI: https://www.linkedin.com/in/heikohotz/
I really like tiny (you could even say "nano") bananas. They are so full of flavour 😋
August 19, 2025 at 7:07 AM
Official results are in - Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆
July 21, 2025 at 10:22 PM
Evaluating voice-driven agents got you pulling your hair out? 😩 Evaluating voice agents is WILD. Accents, noise, weird speech... how do you even test?! Manual prompt engineering for that? A total nightmare. 👇
July 15, 2025 at 7:05 AM
WWDC interviews with Apple executives just got a facelift - and it is refreshing!
For years, high-level Apple execs would come to John Gruber's (from Daring Fireball) Talk Show at WWDC. I often found these interviews less than insightful, and sometimes even annoying.
June 12, 2025 at 3:39 PM
Introducing Gemini-Powered Slide Creation by Voice!

In this quick demo, I’ve integrated a “Slide Creation Agent” into my personal project, Project Pastra. Watch how it effortlessly generates slides based on voice instructions.
January 14, 2025 at 7:47 AM
Multimodal AI models have the potential to finally deliver on the dream of language being the ultimate human-computer interface 🎙️

youtu.be/0OEDHAjY6LM
Gemini's Multimodal Live API with Calendar Tool
YouTube video by Heiko Hotz
youtu.be
January 8, 2025 at 7:42 AM
Fade Out. Directed by Jason Zada. Created with Google’s Veo 2.

youtu.be/9yQXkdA3u8k?...
Fade Out
YouTube video by Secret Level
youtu.be
December 30, 2024 at 8:56 PM
The Gemini Multimodal Live API Developer Guide is live!
December 30, 2024 at 9:25 AM
𝗖𝗵𝗮𝗽𝘁𝗲𝗿 𝟲 𝗼𝗳 𝘁𝗵𝗲 𝗚𝗲𝗺𝗶𝗻𝗶 𝗠𝘂𝗹𝘁𝗶𝗺𝗼𝗱𝗮𝗹 𝗟𝗶𝘃𝗲 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗲𝗿 𝗚𝘂𝗶𝗱𝗲: 𝙒𝙝𝙖𝙩 𝙞𝙨 𝙖 𝙫𝙞𝙙𝙚𝙤, 𝙖𝙣𝙮𝙬𝙖𝙮?

After the hard fought battle of implementing proper audio communication in chapter 5, adding video capabilities to the multimodal live app a la Project Astra was a breeze.
December 27, 2024 at 8:59 AM
HELLO?? CAN YOU HEAR ME????

More times than I'm proud to admit did I utter these words into my laptop over the past few days 😅
December 26, 2024 at 10:15 AM
𝗔 𝗱𝗲𝘃𝗲𝗹𝗼𝗽𝗲𝗿 𝗴𝘂𝗶𝗱𝗲 𝗳𝗼𝗿 𝗚𝗲𝗺𝗶𝗻𝗶’𝘀 𝗠𝘂𝗹𝘁𝗶𝗺𝗼𝗱𝗮𝗹 𝗟𝗶𝘃𝗲 𝗔𝗣𝗜
December 24, 2024 at 9:36 AM
Developers are loving the Gemini 2.0 Multimodal Live API - we see so many of you starting to build with it 🤗

To get started with the API I wrote a small Python script (83 lines of code) that demonstrates how to set up a real-time, two-way audio communication with a Gemini language model.
December 23, 2024 at 1:10 PM
One more, because it's so much fun 🤩

#google #gemini #deepmind
December 19, 2024 at 10:04 PM
Math puzzle contest!

Gemini 2.0 Flash Thinking vs GPT-4o vs Claude 3.5 Sonnet. I was honestly surprised by the results.

Would love if someone could check with o1(Pro) 🤗

(Credit to the Bluesky community where I saw this puzzle a few days ago)
December 19, 2024 at 9:39 PM
Gemini 2.0 Flash Thinking released!

You thought we were done shipping, am I right? But the Google DeepMind folks had one more ace up their sleeves, and it's a big one!

#google #gemini #gemini2.0 #deepmind
December 19, 2024 at 8:57 PM
our web console serves as a valuable tool for developers exploring the vast capabilities of google's multimodal live api and its gemini foundation. #google #gemini #multimodal #api #devtools
December 18, 2024 at 6:53 PM
the combination of react, websockets, and audio worklets creates a powerful and flexible development environment for the google multimodal live api. #google #gemini #react #websockets #audioworklets
December 18, 2024 at 4:53 PM
this project is a testament to the power of websockets and their ability to enable seamless, real-time communication between applications and apis. #google #gemini #websockets #realtime #api
December 18, 2024 at 2:53 PM
experience the future of multimodal development with our react-based web console for the google multimodal live api. #google #gemini #react #multimodal #innovation
December 18, 2024 at 12:53 PM
our application is designed to facilitate real-time interactions with the google multimodal live api, simplifying the development of complex applications. #google #gemini #realtime #api #development
December 18, 2024 at 10:53 AM
the console features an event-driven architecture, emitting events for connection status, incoming data, and outgoing data. #google #gemini #events #websocket #programming
December 18, 2024 at 8:53 AM
Talk (and I really mean 🗣️) to your docs!

The Gemini Multimodal Live API has taken the developer community by storm and many have already started building with it. Here I show how to talk to your docs.

youtu.be/0ak684rtRvA

#google #gemini #live #multimodal
Talk to docs
YouTube video by Heiko Hotz
youtu.be
December 18, 2024 at 7:10 AM
developers can send text, real-time audio, and image input to the google multimodal live api through our intuitive console. #google #gemini #multimodality #api #webdev
December 18, 2024 at 6:53 AM
our app provides a comprehensive development environment for interacting with google's cutting-edge multimodal live api. #google #gemini #multimodal #ai #developers
December 17, 2024 at 8:53 PM
the user interface of our web console is meticulously crafted using scss for a cohesive and visually appealing experience. #google #gemini #scss #ui #design #webdesign
December 17, 2024 at 6:53 PM