Fernando Velasco Lozano
@fermaat.bsky.social
Father. Mathematician. Machine Learning person. Research, Deep Learning, Tolkien, Sanderson and Atleti
Overall: CrewAI is super powerful & complete! Great intuitive framework for high-level users. Worth trying!
January 25, 2025 at 11:04 AM
The callbacks feature is neat for hooking into mid-process steps, but tool logging could be more flexible.

Process flows can be sequential/hierarchical. Works for most cases but feels a bit rigid. Custom processing is possible but not as user-friendly.
January 25, 2025 at 11:04 AM
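For anyone curious about the processes/callbacks bit, a minimal sketch from memory of the API (agent and task names are just illustrative, and you need an LLM configured, e.g. via an API key in the environment):

```python
from crewai import Agent, Task, Crew, Process

def log_task_output(output):
    # Task callback: fires when the task finishes, handy for mid-process logging
    print(f"[callback] task finished, raw output:\n{output.raw[:200]}")

researcher = Agent(
    role="Researcher",
    goal="Summarise recent work on LoRA fine-tuning",
    backstory="A careful ML researcher who cites sources.",
)
writer = Agent(
    role="Writer",
    goal="Turn research notes into a short post",
    backstory="A concise technical writer.",
)

research = Task(
    description="Collect 3 key findings about LoRA forgetting.",
    expected_output="A bullet list of findings.",
    agent=researcher,
    callback=log_task_output,   # hook into the mid-process step
)
write_up = Task(
    description="Write a 100-word summary from the findings.",
    expected_output="A short paragraph.",
    agent=writer,
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research, write_up],
    process=Process.sequential,  # Process.hierarchical also exists (it needs a manager LLM)
)
result = crew.kickoff()
```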
Setup is super intuitive! Love how YAML files handle prompt governance, even without built-in tracking. The out-of-the-box tools are amazing - plug & play ready for quick experiments.

One thing though: RAG integration feels off as a tool. IMO retrievers should be part of the core logic.
January 25, 2025 at 11:04 AM
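To illustrate the YAML point: you keep the prompt material (role/goal/backstory, task descriptions) in YAML and build the objects from it. CrewAI's own project scaffolding does this with config/agents.yaml and config/tasks.yaml; the inline YAML below is just a hand-rolled, self-contained illustration, not the framework's built-in loader:

```python
import yaml
from crewai import Agent

# In a real project this would live in config/agents.yaml;
# inlined here so the sketch is self-contained.
AGENTS_YAML = """
researcher:
  role: Senior Researcher
  goal: Summarise recent work on LoRA fine-tuning
  backstory: A careful ML researcher who cites sources.
"""

agents_config = yaml.safe_load(AGENTS_YAML)
researcher = Agent(**agents_config["researcher"])
print(researcher.role)
```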
Back to intruder dimensions: I feel like the different hyperparameters play a key role here, though the matrix product would be my main suspect. Thinking about phenomena like exploding gradients in RNNs, it seems like this effect kicks in earlier.
November 23, 2024 at 11:19 AM
It is interesting how the paper studies the concept of pseudo-loss (unrigorously, how much of pre-training has been forgotten). I quite loved the U-shape on certain LoRAs. Is there an optimal rank value for LoRA defined by this? So cool!

November 23, 2024 at 11:19 AM
The paper suggests LoRA tends to forget part of the knowledge the pre-trained model had. This is quite intuitive: in my head, LoRA specialises with fewer parameters on a narrower task.

November 23, 2024 at 11:19 AM
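To make the "fewer parameters" point concrete, here is the basic LoRA bookkeeping on a single made-up weight matrix (the sizes and rank are mine, just for illustration):

```python
import torch

d, k, r = 4096, 4096, 8            # made-up layer size and LoRA rank
W = torch.randn(d, k)              # frozen pre-trained weight
A = torch.randn(r, k) * 0.01       # trainable down-projection (Gaussian init)
B = torch.zeros(d, r)              # trainable up-projection (zero init, so the update starts at 0)

delta_W = B @ A                    # rank <= r update learned during fine-tuning
W_adapted = W + delta_W

print(f"full fine-tune params: {d * k:,}")        # 16,777,216
print(f"LoRA params (r={r}):   {r * (d + k):,}")  # 65,536
```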
Essentially, if we were to decompose the model weights with an SVD for both the pre-trained and LoRA fine-tuned versions, intruder dimensions would be singular vectors with high singular values in the LoRA SVD that are orthogonal to the pre-training ones.

November 23, 2024 at 11:19 AM
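Roughly, that check could look like this: compare the top singular vectors of the fine-tuned weight against all pre-trained singular vectors and flag the ones with no good match. The rank, top-k and similarity cutoff below are mine, not the paper's:

```python
import torch

torch.manual_seed(0)
d, r = 512, 16
W_pre = torch.randn(d, d)                    # stand-in for a pre-trained weight matrix
B, A = torch.randn(d, r), torch.randn(r, d)
W_ft = W_pre + 0.05 * (B @ A)                # stand-in for the LoRA-merged weight

U_pre, _, _ = torch.linalg.svd(W_pre)
U_ft, S_ft, _ = torch.linalg.svd(W_ft)

top_k, tau = 20, 0.6                          # top singular vectors to inspect, similarity cutoff
# |cosine| of each top fine-tuned left singular vector against every pre-trained one
sims = (U_ft[:, :top_k].T @ U_pre).abs()      # shape (top_k, d); columns are unit-norm
best_match = sims.max(dim=1).values
intruders = torch.nonzero(best_match < tau).flatten().tolist()
print("intruder-like directions among the top singular vectors:", intruders)
```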
LoRA forgetting part of the pre-training knowledge is not unexpected (LoRA has fewer params and thus lower complexity, so it is more prone to adapt faster and perhaps forget pre-training). However, the way it is described in the paper is fascinating, especially the concept of intruder dimensions.
November 23, 2024 at 11:19 AM
Perhaps we need to change perspectives and try new things. Are we ready for a new BERT-like moment?
November 17, 2024 at 7:49 AM
BERT and Transformer-like architectures (please remember LLMs are still mostly in this set) are still very active and evolving, but it seems (and some leading figures have been talking about this recently) that their evolution might have plateaued.
November 17, 2024 at 7:48 AM
Quite astonishing tech features! Since then, I've used BERT or BERT-like architectures, primarily adapting them to specific domains. Their fine-tuning adaptability seemed to be one of their biggest strengths. It really democratised access to SoTA NLP and inspired a wave of new research.
November 17, 2024 at 7:47 AM
This had a big impact on BERT topping all the leaderboards back then. Another cool tech feature is how it combined two techniques for training: Masked Language Modelling (in my mind, pretty similar to CBOW) and Next Sentence Prediction. Both were novel techniques for training transformer architectures.
November 17, 2024 at 7:47 AM
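For the MLM part, the core trick is just: randomly mask ~15% of the input tokens and train the model to predict them (NSP is then a binary "does sentence B follow sentence A?" classification on the [CLS] representation). A toy version of the masking, with a made-up vocabulary and skipping BERT's 80/10/10 replace/random/keep split:

```python
import torch

vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2, "[MASK]": 3,
         "the": 4, "cat": 5, "sat": 6, "on": 7, "mat": 8}
tokens = torch.tensor([[1, 4, 5, 6, 7, 4, 8, 2]])  # "[CLS] the cat sat on the mat [SEP]"

mask_prob = 0.15
special = tokens <= 3                               # never mask the special tokens
to_mask = (torch.rand(tokens.shape) < mask_prob) & ~special

labels = tokens.clone()
labels[~to_mask] = -100                             # ignored by the cross-entropy loss
inputs = tokens.clone()
inputs[to_mask] = vocab["[MASK]"]                   # the model must recover these positions

print(inputs)
print(labels)
```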
BERT stands for Bidirectional Encoder Representations from Transformers, and indeed, its architecture reflects this: it's actually an encoder-only model (though you can always plug two BERTs together into an encoder-decoder architecture) that uses bidirectional self-attention.
November 17, 2024 at 7:45 AM
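The "bidirectional" bit is simply that the encoder applies no causal mask: every token attends to both left and right context, unlike a GPT-style decoder. In raw attention terms (toy sizes, single head):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
T, d = 5, 16
q = k = v = torch.randn(1, T, d)
scores = q @ k.transpose(-2, -1) / d ** 0.5

# BERT-style encoder: no mask, token i sees the whole sequence (left and right context)
bidirectional = F.softmax(scores, dim=-1) @ v

# GPT-style decoder, for contrast: causal mask, token i only sees positions <= i
causal_mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
causal = F.softmax(scores.masked_fill(causal_mask, float("-inf")), dim=-1) @ v
```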
I was amazed by BERT's generative capabilities. It could actually generate text! Not at today's LLM level, but for me it was great. SQuAD2 link: lnkd.in/dsKYz5PK
November 17, 2024 at 7:44 AM
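These days the quickest way to poke at those "generative" capabilities is the fill-mask head via Hugging Face (not what I was using back in 2018, just the easy route now):

```python
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
for pred in unmasker("BERT was a [MASK] moment for NLP."):
    print(f'{pred["token_str"]:>12}  {pred["score"]:.3f}')
```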
I remember my first BERT fine-tuning, a few days after the release: adapting it to Spanish SQuAD2. A pretty straightforward approach: translation and training. The results were not the best, of course, but I deeply enjoyed the process and came to see BERT as a true keystone in the development of ML.
November 17, 2024 at 7:43 AM