Andreas Steiner
andreaspsteiner.bsky.social
Researching #ComputerVision at #GoogleDeepMind using JAX/Flax (http://github.com/google/flax). Views are my own.
December 5, 2024 at 6:19 PM
If you want to know more, now is a good time to head over to the 31-page tech report.

Brought to you by an amazing team of collaborators from @GoogleDeepMind and @GoogleAI.

arxiv.org/abs/2412.03555

6/7
December 5, 2024 at 6:18 PM
In addition to the pre-trained checkpoints, we also release two checkpoints fine-tuned on the DOCCI dataset, which generate fine-grained captions with a great quality/compute trade-off – and no yapping!

5/7
December 5, 2024 at 6:18 PM
After 🪄finetuning🪄 on your data, you can expect great results, like the state-of-the-art (SOTA) results we got on recognizing table structures, music scores, molecular structures, and text, and on radiography report generation.

4/7
December 5, 2024 at 6:17 PM
Like the original PaliGemma, the pre-trained PaliGemma 2 models have segmentation and detection capabilities and excel at OCR – which makes them extremely versatile for 🪄finetuning🪄. The original demo hf.co/spaces/big-v... gives you an idea of their capabilities.

3/7
December 5, 2024 at 6:17 PM
Adding this new "model size" dimension unlocks substantial improvements for some tasks (blue, e.g. AI2D), and compounds with improvements from increased resolution for most tasks (green, e.g. InfoVQA).

2/7
December 5, 2024 at 6:16 PM