At first it's magical to run 4o-mini-level models locally, but then you realize any non-chat use is bursty, and spending 20c on parallel cloud inference calls saves ten minutes of waiting around.
Not sure the equation would shift even with double the VRAM.
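For concreteness, this is the kind of back-of-envelope I mean (a minimal sketch; the batch size, token counts, decode speeds, and per-token price below are all assumptions, swap in your own workload and your provider's actual rate):

```python
# Rough back-of-envelope for the bursty-batch tradeoff.
# Every number here is an assumption, not a measurement.

n_prompts = 100                 # size of one bursty batch
out_tokens = 1_000              # assumed output tokens per prompt

# Local: requests run one after another on a single GPU.
local_tok_per_s = 25            # assumed local decode speed
local_minutes = n_prompts * out_tokens / local_tok_per_s / 60

# Cloud: fan all requests out in parallel, so wall-clock ~= one request.
cloud_tok_per_s = 50            # assumed per-request cloud decode speed
price_per_1m_out = 2.00         # assumed $/1M output tokens; use your provider's rate
cloud_minutes = out_tokens / cloud_tok_per_s / 60
cloud_cost = n_prompts * out_tokens * price_per_1m_out / 1e6

print(f"local: ~{local_minutes:.0f} min of waiting, $0 marginal cost")
print(f"cloud: ~{cloud_minutes:.1f} min of waiting, ~${cloud_cost:.2f}")
```

With those made-up numbers the local run is about an hour of sequential decoding, while the parallel cloud version finishes in well under a minute for roughly 20 cents; the exact figures move around a lot with your batch and pricing, but the shape of the tradeoff doesn't.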