Ramchalam K R

@ramchalamkr.bsky.social

110 followers 740 following 15 posts

ML Research Engineer at the intersection of model training and efficient inference on NPUs

Ramchalam K R

@ramchalamkr.bsky.social

@qualcomm.bsky.social @machinelearning.bsky.social

September 19, 2025 at 11:41 AM

Ramchalam K R

@ramchalamkr.bsky.social

Gosh, thanks. I got confused about whether this was another pack. Apologies :)

December 14, 2024 at 12:06 AM

Ramchalam K R

@ramchalamkr.bsky.social

I'd like to be added. Thanks

December 14, 2024 at 12:02 AM

Ramchalam K R

@ramchalamkr.bsky.social

Alright, thank you for the clarification.

December 11, 2024 at 10:19 PM

Ramchalam K R

@ramchalamkr.bsky.social

I worked on structured pruning of weights a few years ago, and since we were focused on deployment to edge devices, it was critical. This approach on the activations/attention heads is interesting, although the inference graph wouldn't really change on the base model. Would love to discuss further.

December 10, 2024 at 4:11 PM

Ramchalam K R

@ramchalamkr.bsky.social

I do have a few points of clarification. Maybe I will drop by the poster session.
But stage 1 looks to me like structured pruning on the activations. What I am curious about is whether this approach helps improve inference. I presume we won't need to do the compute for certain heads.

December 10, 2024 at 4:08 PM