Uses Group Relative Policy Optimization (GRPO) instead of Proximal Policy Optimization (PPO): forgoes a critic model the same size as the policy model and instead estimates the baseline from group scores, using the average reward of multiple samples to reduce memory use.
January 21, 2025 at 2:19 AM
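A minimal sketch of the group baseline idea in the post above: sample several completions for one prompt, score them, and use the group mean as the baseline in place of a learned critic. The function name `group_relative_advantages`, the example rewards, and the `eps` term are illustrative assumptions. Dividing by the group standard deviation is a common normalization and is also treated as an assumption here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Return one advantage per sampled completion in a group.

    The baseline is the average reward of the group, so no critic
    model is needed. eps (an assumption) avoids division by zero
    when every sample in the group gets the same reward.
    """
    baseline = mean(rewards)
    spread = pstdev(rewards)
    return [(r - baseline) / (spread + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group average get a positive advantage and those below get a negative one. The only memory cost is the list of rewards, not a second model.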
Besides being open source, DeepSeek-R1 is significant because its R1-Zero variant is pure reinforcement learning (RL) with no supervised fine-tuning (SFT); R1 itself adds only a small “cold start” SFT stage. Reminiscent of AlphaZero (which mastered Go, Shogi, and Chess from scratch, without playing against human grandmasters).
January 21, 2025 at 2:18 AM
11. Bob Nelson: market is provisionally open. If you already have a strong shareholder base and a book ready, then the market’s open. Biotech IPOs are funding events: ARCH doesn’t view IPOs as exits, will stay past IPO for 3–4 yrs until a clinical milestone.
January 17, 2025 at 2:03 AM
9. Org structure matters in decision making, e.g. Merck is organized into Research vs Development, J&J along indication areas. Do you invest on risk, or on inflection points?
January 17, 2025 at 2:03 AM
7. Pharmas need to look over their shoulders before billion-dollar acquisitions in case generics come out of China in a few years with the same MOA. One pharma CEO: “we have to get the cost of R&D down to be competitive.”
January 17, 2025 at 2:03 AM
6. Large deals tend to result in cost cuts, not topline growth, and this industry trades on topline growth rate. A barbell strategy of bolt-ons and mega-billion-dollar deals may be on the table in 2025.
January 17, 2025 at 2:03 AM
5. 2023 was a record M&A year, $130bn. 2024 was a digestion year: not horrible for the number of deals, but they were private deals because capital markets were closed. Scale is important in pharma; it drives how much is allocated to R&D. Previous admin was against large deals. New admin is not.
January 17, 2025 at 2:02 AM
4. IRA shifted focus to bigger cancers, and may be here to stay. Biologics and small-molecule timelines may not be aligned, 13 vs 9 yrs. Small molecules have challenges with tox, and only have 9 yrs to recoup investment. Hopefully there could be bipartisan support to even out this 9 vs 13 gap.
January 17, 2025 at 2:02 AM
3. Saw a lot of fast following the last few years: with 3–4 drugs on the same MOA, it is hard to get a return. Do VCs shift to lower-risk, lower-reward investments instead?
January 17, 2025 at 2:02 AM
2. 80% of last year’s IPOs are under water, so capitalize your company such that you aren’t dependent on an IPO. Have optionality. Is M&A the goal? If you are taking a drug to market, you may have no option but to IPO.
January 17, 2025 at 2:02 AM
An SLM serves as a process preference model (PPM) to predict reward labels for each reasoning step. Q-values can reliably distinguish positive (correct) steps from negative ones. Using preference pairs and a pairwise ranking loss, instead of direct Q-values, avoids their inherent noise. 6/n
January 12, 2025 at 4:47 PM
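A minimal sketch of the pairwise ranking loss mentioned in the post above, assuming a Bradley-Terry style objective, -log(sigmoid(s_pos - s_neg)). This is a common choice and not necessarily the exact loss used in the paper. The function name `pairwise_ranking_loss` and the example scores are hypothetical; in practice the scores would come from the PPM.

```python
import math


def pairwise_ranking_loss(score_preferred, score_rejected):
    """Loss for one preference pair of reasoning steps.

    Only the ordering of the two scores matters, so the model is
    never asked to regress onto a noisy Q-value directly.
    Computes -log(sigmoid(margin)) in a numerically stable form.
    """
    margin = score_preferred - score_rejected
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical PPM scores for a preferred and a rejected step.
well_ranked = pairwise_ranking_loss(2.0, 0.0)   # preferred step scored higher
mis_ranked = pairwise_ranking_loss(0.0, 2.0)    # preferred step scored lower
```

The loss is small when the preferred step already scores higher and grows as the ranking is violated, which is why only relative order, not the Q-value magnitude, drives training.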
SLM samples candidate nodes, each generating CoT and corresponding Python code. Only nodes whose code executes successfully are retained. MCTS automatically assigns (self-annotates) a Q-value to each intermediate step based on its contribution: steps that lead to more correct trajectories get a higher Q. 5/n
January 12, 2025 at 4:47 PM
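A rough sketch of the self-annotation idea in the post above, under a simplifying assumption: each rollout ends with a reward of +1 (correct final answer) or -1 (incorrect), and a step's Q-value is the average terminal reward over the rollouts that pass through it. The actual MCTS update in the paper may differ. The function name `annotate_q_values`, the step labels, and the rollouts are hypothetical.

```python
from collections import defaultdict


def annotate_q_values(trajectories):
    """Assign a Q-value to every step seen in the rollouts.

    trajectories: list of (steps, is_correct) pairs, where steps is
    a list of hashable step identifiers along one rollout.
    """
    totals = defaultdict(float)
    visits = defaultdict(int)
    for steps, is_correct in trajectories:
        reward = 1.0 if is_correct else -1.0  # assumed terminal reward
        for step in steps:
            totals[step] += reward
            visits[step] += 1
    return {step: totals[step] / visits[step] for step in totals}


# Hypothetical rollouts sharing a first step "a".
rollouts = [
    (["a", "b1"], True),
    (["a", "b1"], True),
    (["a", "b2"], False),
]
q_values = annotate_q_values(rollouts)
```

Here step `b1` only appears in correct rollouts and `b2` only in an incorrect one, so they end up at opposite ends of the scale, while the shared step `a` sits in between.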
Process reward modeling (PRM) provides fine-grained feedback on intermediate steps because incorrect intermediate steps significantly decrease data quality in math. 4/n
January 12, 2025 at 4:47 PM
Result: “4 rounds of self-evolution with millions of synthesized solutions for 747k math problems … it improves Qwen2.5-Math-7B from 58.8% to 90.0% and Phi3-mini-3.8B from 41.4% to 86.4%, surpassing o1-preview by +4.5% and +0.9%.” 3/n
January 12, 2025 at 4:47 PM