Albert Adler
@iamalbertadler.bsky.social
I build mobile apps and websites trying to solve problems. Also, I share my daily progress here (some people call it #buildinpublic).
As an initial disclaimer, the way of doing this depends on what you are using. In this case I will use the OpenAI API, but I will publish other guides for other AIs!
March 12, 2025 at 11:47 PM
So... in the OpenAI API, prompt caching is enabled by default for prompts with 1024 tokens or more, at least in gpt-4.5-preview, gpt-4o, gpt-4o-mini...
March 12, 2025 at 11:47 PM
End of the guide?
March 12, 2025 at 11:47 PM
Joking. It is good that caching is already enabled for you, but to get the most out of it you should structure your prompts in a "special" way:
March 12, 2025 at 11:47 PM
Set the static content of the prompt always at the beginning and the variable data at the end.

Let me give an example (sad that X does not allow code formatting yet...):
March 12, 2025 at 11:47 PM
In the example below I am prompting a basic article writer.

The system prompt will define the basics of every article, so it is my "static" content; it goes at the beginning of the final prompt.
March 12, 2025 at 11:48 PM
The user prompt is the "dynamic" part: the content specific to the article I am creating now. Imagine it like the custom instructions your users provide. That goes at the end.
March 12, 2025 at 11:48 PM
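A minimal sketch of that structure with the official openai Python SDK (the article-writer instructions and the write_article helper are illustrative placeholders, not literal code from the thread):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Static part: identical on every request, so it can be cached.
# It must come first, and the full prompt needs at least 1024 tokens
# before caching kicks in.
SYSTEM_PROMPT = (
    "You are an article writer. Every article must have a title, "
    "an introduction, three sections with subheadings, and a conclusion. "
    "Write in a friendly, accessible tone and keep paragraphs short."
    # ...imagine many more static instructions here, pushing this
    # past the 1024-token minimum
)

def write_article(topic: str, custom_instructions: str) -> str:
    # Dynamic part: changes for every article, so it goes at the end.
    user_prompt = (
        f"Write an article about: {topic}\n"
        f"Extra instructions from the user: {custom_instructions}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},  # static first
            {"role": "user", "content": user_prompt},      # dynamic last
        ],
    )
    return response.choices[0].message.content

print(write_article("prompt caching", "mention the 1024-token minimum"))
```

Because the static system prompt is byte-for-byte identical across calls, every request after the first can reuse the cached prefix; only the short user prompt at the end changes.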
By just organising the prompt a bit, you get some nice benefits:
- Lower latency for big prompts (OpenAI says up to 80% less latency!!)
- Reduced costs for the input tokens (since a big part will be cached; OpenAI says up to 50% lower costs)
- And caching is free, so do it!
March 12, 2025 at 11:48 PM
A few extra details:
- Input images are also cached
- Tool definitions (like search or code) are also cached
- If you use a structured output schema, that is also cached!
- The cache is cleared from the OpenAI servers after 5-10 minutes of no requests (although it can last up to 1 hour)
March 12, 2025 at 11:48 PM
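If you want to check whether the cache is actually being hit, the usage block of the response reports the cached portion; a sketch, assuming the prompt_tokens_details.cached_tokens field from the Chat Completions API and a placeholder system prompt:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # Same static-first / dynamic-last layout as before.
        {"role": "system", "content": "...your long static instructions..."},
        {"role": "user", "content": "Write an article about prompt caching"},
    ],
)

usage = response.usage
# cached_tokens reports how many prompt tokens were served from the cache;
# it stays 0 until the prompt crosses 1024 tokens and the prefix is reused.
print(f"prompt tokens: {usage.prompt_tokens}, "
      f"cached: {usage.prompt_tokens_details.cached_tokens}")
```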