💡Mixtures of In-Context Learners (𝗠𝗼𝗜𝗖𝗟): we treat LLMs prompted with subsets of demonstrations as experts and learn a weighting function to combine their output distributions over the continuation. (🧵1/n)
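Roughly, the forward pass looks like this. A minimal PyTorch sketch of my reading of the idea (gpt2, the toy sentiment prompts, and the helper expert_next_token_dist are my stand-ins, not the paper's code):

```python
# Minimal sketch of the MoICL mixture as I read it (not the authors' code):
# the same frozen LLM, prompted with K different demonstration subsets,
# acts as K experts whose next-token distributions are mixed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; any causal LM would do
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name).eval()

def expert_next_token_dist(demos: str, query: str) -> torch.Tensor:
    """Next-token distribution of the LLM prompted with one demonstration subset."""
    ids = tok(demos + query, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits[0, -1]       # logits at the last position
    return torch.softmax(logits, dim=-1)     # shape: [vocab_size]

demo_subsets = [
    "Review: great film. Sentiment: positive\n",
    "Review: dull plot. Sentiment: negative\n",
]
query = "Review: loved every minute. Sentiment:"

# Per-expert mixture weights (uniform here; MoICL learns them).
w = torch.softmax(torch.zeros(len(demo_subsets)), dim=-1)

expert_dists = torch.stack([expert_next_token_dist(d, query) for d in demo_subsets])
mixture = (w.unsqueeze(-1) * expert_dists).sum(dim=0)  # p(y|x) = sum_k w_k p_k(y | D_k, x)
print(tok.decode(int(mixture.argmax())))
```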
Using 𝙨𝙘𝙖𝙡𝙖𝙧 weights—a vector of trainable parameters that assigns each expert a weight—we fine-tune how the demonstration subsets are combined. (🧵3/n)
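A toy sketch of how those scalar weights could be trained, assuming the standard NLL objective with the LLM kept frozen (the random expert_dists below stand in for real expert outputs from the previous snippet):

```python
# Toy sketch of the scalar-weight variant: one trainable parameter per expert,
# softmax-normalised and trained with an NLL loss while the LLM stays frozen.
# (Random distributions stand in for the frozen experts' outputs here.)
import torch

K, vocab_size = 4, 50257
weights = torch.nn.Parameter(torch.zeros(K))     # one scalar per expert
opt = torch.optim.Adam([weights], lr=1e-2)

def mixture_nll(expert_dists: torch.Tensor, target_id: int) -> torch.Tensor:
    """expert_dists: [K, vocab] next-token distributions from the frozen experts."""
    w = torch.softmax(weights, dim=-1)                # convex combination over experts
    p = (w.unsqueeze(-1) * expert_dists).sum(dim=0)   # mixture distribution over tokens
    return -torch.log(p[target_id] + 1e-12)           # NLL of the observed continuation token

expert_dists = torch.softmax(torch.randn(K, vocab_size), dim=-1)
loss = mixture_nll(expert_dists, target_id=123)
loss.backward()
opt.step()
print(torch.softmax(weights, dim=-1))  # the learned expert weights
```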
Using a 𝙃𝙮𝙥𝙚𝙧-𝙣𝙚𝙩𝙬𝙤𝙧𝙠—a smaller fine-tuned network that dynamically generates a weight for each expert from the concatenation of all demonstration subsets—we condition the expert weights on the demonstrations themselves instead of learning them as free parameters. (🧵4/n)
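And a shape-level sketch of the hyper-network variant; the HyperNetwork class and its mean-pooled embedding encoder are toy stand-ins for the smaller fine-tuned model the thread describes:

```python
# Shape-level sketch of the hyper-network variant: a small trainable network
# maps the concatenated demonstration subsets to one weight per expert.
# (A mean-pooled embedding is a toy stand-in for the smaller fine-tuned model.)
import torch
import torch.nn as nn

class HyperNetwork(nn.Module):
    def __init__(self, vocab_size: int, n_experts: int, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)  # toy encoder stand-in
        self.head = nn.Linear(dim, n_experts)       # one scalar weight per expert

    def forward(self, concat_demo_ids: torch.Tensor) -> torch.Tensor:
        """concat_demo_ids: token ids of all demonstration subsets, concatenated."""
        pooled = self.embed(concat_demo_ids).mean(dim=0)
        return torch.softmax(self.head(pooled), dim=-1)  # [n_experts]

hyper = HyperNetwork(vocab_size=50257, n_experts=4)
concat_demo_ids = torch.randint(0, 50257, (256,))  # stand-in token ids
w = hyper(concat_demo_ids)                         # dynamically generated expert weights
# These weights would replace the scalar ones in the mixture above; the
# hyper-network's parameters are trained with the same NLL objective.
print(w)
```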