arxiv.org
January 10, 2025 at 5:13 PM
Results:

Improved Qwen2.5-Math-7B accuracy from 58.8% to 90.0% on MATH dataset
Solved 53.3% of AIME test problems, placing it in the top 20% among high-school participants
Outperformed much larger models, including OpenAI o1-preview, on several key benchmarks
January 10, 2025 at 5:12 PM
Uses a Process Preference Model (PPM) to evaluate and improve intermediate reasoning steps
Created a dataset of 747k math problems with verified solutions
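
For intuition, here is a minimal Python sketch of how step-level preference pairs could be built from MCTS Q-values to train the PPM. This is my own construction, not code from the paper; the step texts, Q-values, and margin threshold are all illustrative.

from itertools import product

# Illustrative candidate steps: each carries the Q-value its subtree
# earned during MCTS (texts and values are made up for this sketch).
candidate_steps = [
    {"text": "Let x = 5, so 2x = 10.", "q": 0.91},  # mostly led to correct answers
    {"text": "Let x = 5, so 2x = 12.", "q": 0.07},  # mostly led to wrong answers
]

def preference_pairs(steps, margin=0.5):
    """Pair steps whose Q-values differ by at least `margin`: the
    higher-Q step is 'preferred', the lower-Q step 'rejected'."""
    return [
        (a["text"], b["text"])
        for a, b in product(steps, steps)
        if a["q"] - b["q"] >= margin
    ]

for preferred, rejected in preference_pairs(candidate_steps):
    print(f"preferred: {preferred!r}  rejected: {rejected!r}")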
January 10, 2025 at 5:12 PM
MCTS simulates step-by-step reasoning paths, mimicking deep human thinking
Each step includes a natural language explanation and Python code for validation
Self-evolution through four rounds in which the policy model and the PPM iteratively improve each other
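
As a rough illustration of the code-validation idea (my own sketch, not the paper's implementation): a candidate step survives only if its attached Python snippet executes cleanly. The helper name and timeout are my choices.

import subprocess
import sys

def step_is_valid(code: str, timeout: float = 5.0) -> bool:
    """Run the step's Python snippet in a subprocess; a non-zero exit
    code or a timeout marks the reasoning step as invalid."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, timeout=timeout,
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False

print(step_is_valid("x = 3 + 4\nassert x == 7"))  # True: the step checks out
print(step_is_valid("assert 3 + 4 == 8"))         # False: the step gets pruned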
January 10, 2025 at 5:11 PM
rStar-Math is a Microsoft technique that enhances the mathematical abilities of small language models using Monte Carlo Tree Search (MCTS) and self-evolution strategies.

Key features:
January 10, 2025 at 5:11 PM
arxiv.org
January 10, 2025 at 2:30 PM
To enable these settings for all models, run:

$ OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q4_0 ollama serve
January 10, 2025 at 1:59 PM
Ollama 0.5.0

Added an experimental flag to set KV cache quantization to 4-bit (q4_0), 8-bit (q8_0), or 16-bit (f16). This reduces VRAM requirements for longer context windows
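
For a sense of the savings, here is a back-of-the-envelope calculator. The bytes-per-element figures fold in the llama.cpp block scales, and the model shape is a hypothetical 8B-class model, so treat the numbers as approximations rather than Ollama's exact accounting.

# KV cache size = 2 (K and V) x layers x KV heads x head_dim x context x bytes/element
BYTES_PER_ELT = {"f16": 2.0, "q8_0": 34 / 32, "q4_0": 18 / 32}  # incl. block scales

def kv_cache_gib(layers, kv_heads, head_dim, ctx, dtype):
    elements = 2 * layers * kv_heads * head_dim * ctx
    return elements * BYTES_PER_ELT[dtype] / 2**30

# Hypothetical 8B-class model: 32 layers, 8 KV heads, head_dim 128, 32k context
for dtype in ("f16", "q8_0", "q4_0"):
    print(dtype, round(kv_cache_gib(32, 8, 128, 32768, dtype), 2), "GiB")
# -> f16 4.0 GiB, q8_0 2.12 GiB, q4_0 1.12 GiB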

Note: in the future, flash attention will be enabled by default where available, with KV cache quantization available on a per-model basis
January 10, 2025 at 1:57 PM
However, the approach has two significant limitations:

- Difficulty handling dynamic data (the cache must be rebuilt whenever the documents change)
- Dependence on the model's maximum context length (the entire knowledge base has to fit in the context window)
January 10, 2025 at 1:31 PM
Main advantages:

- Instant generation without document search delays
- Reduced retrieval errors thanks to the pre-computed KV-cache
- Simplified architecture without a separate search component
- Faster query processing
- Improved accuracy through unified, complete context
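
As a minimal sketch of how this could look with Hugging Face transformers (my construction, not code from any specific paper): encode the corpus once, keep its KV cache, and reuse it for every query. The model name and prompts are placeholders, and a recent transformers version that accepts past_key_values in generate() is assumed.

import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

corpus = "Doc 1: ...\nDoc 2: ...\n"  # the entire knowledge base, loaded up front
corpus_inputs = tok(corpus, return_tensors="pt")
with torch.no_grad():  # one-time prefill: compute and keep the corpus KV cache
    corpus_cache = model(**corpus_inputs, use_cache=True).past_key_values

def answer(question: str) -> str:
    # Assumes the corpus prefix tokenizes identically when concatenated.
    full = tok(corpus + question, return_tensors="pt")
    out = model.generate(
        **full,
        past_key_values=copy.deepcopy(corpus_cache),  # generate mutates the cache
        max_new_tokens=64,
    )
    return tok.decode(out[0][full.input_ids.shape[1]:], skip_special_tokens=True)

print(answer("Question: ...\nAnswer:"))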
January 10, 2025 at 1:31 PM
The results are astonishing: the chip completed in just 5 minutes a task that would take its classical “competitor” longer than the lifetime of the universe.

Google plans to use Willow to train neural networks as well.

Welcome to the future.
December 10, 2024 at 11:36 AM
⚡️ This quantum chip features a record-breaking 105 qubits.

The breakthrough comes with the Willow chip, a powerful quantum processor that reduces error rates as it scales—a challenge scientists have struggled with for three decades.
December 10, 2024 at 11:36 AM
2. Willow performed a standard benchmark computation in under five minutes that would take one of today’s fastest supercomputers 10 septillion (that is, 10^25) years — a number that vastly exceeds the age of the Universe.
December 9, 2024 at 10:45 PM