Flora Salim
@florasalim.bsky.social
Professor, CSE, UNSW Sydney. #AI #ML #UbiComp #LLM #MFM #timeseries #ST #multimodal #sensors #continuallearning #trustworthyAI ❤️ #coffee
Why am I here? Scouting for a new platform to discover and learn new papers (let’s see if it’s the one)
Finally, we compared Bisecle to frontier models like GPT-4o and Gemini 2.5 (which we could only test via their APIs). We show that these frontier models still struggle with temporal reasoning and with non-stationary, evolving video tasks. On some tasks, Bisecle shows superior performance.
Bisecle: Binding and Separation in Continual Learning for Video Language Understanding
Frontier vision-language models (VLMs) have made remarkable improvements in video understanding tasks. However, real-world videos typically exist as continuously evolving data streams (e.g., dynamic s...
arxiv.org
September 24, 2025 at 1:07 PM
- Bisecle exhibits remarkable resistance to forgetting
- Bisecle is compatible with LLMs from 1B to 13B, adding only a small number of parameters and little computational overhead.
- Bisecle can achieve superior performance even in low-resource settings.
September 24, 2025 at 1:07 PM
The results show that:
- Bisecle establishes a new SOTA, surpassing others in both accuracy (+15.79%) and forgetting reduction (8.49% lower forgetting rate).
- Our method Bisecle consistently outperforms others, indicating strong robustness even when training data is limited.
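For readers unfamiliar with these continual-learning metrics, here is a minimal sketch of how average accuracy and forgetting are typically computed from a task-accuracy matrix. The numbers below are toy values for illustration, not Bisecle's results:

```python
import numpy as np

def continual_metrics(acc):
    """acc[i, j] = accuracy on task j after training on task i (T x T matrix)."""
    T = acc.shape[0]
    final = acc[-1]                    # accuracy on each task after the last task
    avg_acc = final.mean()
    # Forgetting for task j: best accuracy achieved before the final task,
    # minus its final accuracy, averaged over all but the last task.
    forgetting = np.mean([acc[:-1, j].max() - final[j] for j in range(T - 1)])
    return avg_acc, forgetting

# Toy 3-task run (rows: after training task i; columns: evaluated task j)
acc = np.array([
    [0.80, 0.10, 0.05],
    [0.70, 0.85, 0.10],
    [0.65, 0.75, 0.90],
])
avg_acc, forgetting = continual_metrics(acc)  # avg_acc ~ 0.767, forgetting = 0.125
```

A lower forgetting value means the model retains more of its earlier-task accuracy as new tasks arrive.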
September 24, 2025 at 1:07 PM
Bisecle has two components with complementary roles:
- a multi-directional supervision mechanism that improves knowledge preservation.
- a contrastive prompt learning scheme that isolates task-specific knowledge for efficient memory storage and explicitly mitigates update conflicts.
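To illustrate the idea behind the second component (this is a generic supervised-contrastive sketch, not the paper's exact loss), prompts belonging to the same task are pulled together while prompts from different tasks are pushed apart:

```python
import numpy as np

def prompt_contrastive_loss(prompts, task_ids, temperature=0.1):
    """Supervised-contrastive objective over prompt vectors:
    same-task prompts attract, different-task prompts repel."""
    z = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
    sim = z @ z.T / temperature                   # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)                # exclude self-pairs
    m = sim.max(axis=1, keepdims=True)            # stable log-softmax
    log_prob = sim - (m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True)))
    pos = task_ids[:, None] == task_ids[None, :]  # same-task pairs
    np.fill_diagonal(pos, False)
    per_anchor = -np.where(pos, log_prob, 0.0).sum(1) / np.maximum(pos.sum(1), 1)
    return float(per_anchor.mean())

# Two tasks, two prompts each; well-separated task prompts give a near-zero loss
prompts = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
loss = prompt_contrastive_loss(prompts, np.array([0, 0, 1, 1]))
```

Minimizing such a loss tends to keep per-task prompt subspaces separated, which is one way to reduce interference between task updates.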
September 24, 2025 at 1:07 PM
* Potential for LLM steering: The research explored the potential to manipulate ToM-related information within the LLMs to generate more aligned and contextually appropriate responses.
The first author, first-year student Mehdi Jafari, is attending his first academic conference #ACL2025.
July 30, 2025 at 8:50 PM
* ToM-informed Alignment Improves Response Quality: Empirical evaluations of LLaMA-3 models (3B and 8B) demonstrated that incorporating ToM principles into the conversational agents improved response quality significantly, achieving win rates of 63% and 67% respectively.
July 30, 2025 at 8:50 PM
Key Findings:
* LLMs Can Represent and Retain ToM-related Constructs: The study investigated whether LLMs could represent and retain ToM-related constructs and found evidence supporting this ability.
* ToM-informed Alignment Improves Response Quality:
July 30, 2025 at 8:50 PM
In Beyond Words, we explore:
a) The extent to which the activation space of LLMs represents ToM of interlocutors,
b) Whether these representations form a consistent model of ToM,
and
c) How we can leverage ToM-related features to generate more aligned responses.
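Question (a) is commonly tested with a linear probe on hidden-state activations. A minimal sketch with synthetic stand-ins for the activations and the belief-state labels (the paper's actual setup will differ):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for hidden-state activations: (n_samples, hidden_dim)
X = rng.normal(size=(400, 64))
w_true = rng.normal(size=64)
y = np.sign(X @ w_true)          # stand-in binary "belief state" label (+1/-1)

# Linear probe: least-squares fit on a train split, evaluated on the rest
X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]
w_hat, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
acc = np.mean(np.sign(X_te @ w_hat) == y_te)
# High probe accuracy suggests the construct is linearly decodable
```

If a simple linear readout recovers the construct well above chance, the activation space plausibly represents it; consistency across layers and contexts is then the harder question (b).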
July 30, 2025 at 8:50 PM
Current LLMs often generate contextually appropriate responses, but they don’t truly understand the user's goals, beliefs, or misunderstandings.
Using ToM, we can analyse interlocutor behaviours based on the understanding of their mental and emotional states.
July 30, 2025 at 8:50 PM
The Brick-by-Brick 2024 challenge focuses only on the multi-label classification problem, which we consider the harder task and the holy grail for the automation and management of net-zero, sustainable buildings.
Round 2 of Brick by Brick 2024 has commenced! To join: www.aicrowd.com/challenges/b...
AIcrowd | Brick by Brick 2024 | Challenges
Automating Building Data Classification
www.aicrowd.com
January 19, 2025 at 11:23 AM
The task also tackles issues like imbalanced data and sparse labels, all while addressing real-world problems like building optimization and sustainability.
Our NeurIPS 2024 paper includes both a multi-label classification benchmark and a zero-shot forecasting benchmark.
neurips.cc/virtual/2024...
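One common way to handle the imbalanced, sparse labels mentioned above is per-label positive re-weighting in a multi-label loss. A sketch of the idea (a generic technique, not necessarily the benchmark's official baseline):

```python
import numpy as np

def weighted_bce(logits, targets, pos_weight):
    """Multi-label BCE with per-label positive weights (numerically stable)."""
    log_sig = -np.logaddexp(0.0, -logits)        # log sigmoid(x)
    log_one_minus = -np.logaddexp(0.0, logits)   # log (1 - sigmoid(x))
    loss = -(pos_weight * targets * log_sig + (1 - targets) * log_one_minus)
    return loss.mean()

# Sparse targets: label 0 positive in 5% of samples, label 1 in 50%
targets = np.zeros((1000, 2))
targets[:50, 0] = 1.0
targets[:500, 1] = 1.0
pos = targets.sum(0)
pos_weight = (len(targets) - pos) / pos          # neg/pos ratio per label
loss = weighted_bce(np.zeros((1000, 2)), targets, pos_weight)
```

Upweighting rare positives by the negative-to-positive ratio keeps the gradient signal from being dominated by the overwhelmingly negative labels.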
NeurIPS Poster | Building Timeseries Dataset: Empowering Large-Scale Building Analytics | NeurIPS 2024
neurips.cc
January 19, 2025 at 11:23 AM
BTS is more challenging than existing TS datasets and a lot more interesting because BTS captures the complexities of real-world operations: 1) Temporal Irregularity; 2) Spatial Heterogeneity; 3) Long-tail Distribution.
-- This requires models to manage hierarchical dependencies and ensure consistency.
January 19, 2025 at 11:23 AM
** Knowledge Graph (KG)
BTS also includes a KG that captures the relationships between TS and their physical, logical, and virtual entities.
This makes it a great case for Hierarchical Multi-Label Classification: the TS are classified across nested categories (e.g. Point > Sensor > Air Quality > CO2).
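The nesting means predictions must respect the hierarchy: predicting CO2 implies Air Quality, Sensor, and Point. A small sketch of enforcing that consistency (the label set here is hypothetical, mirroring the example):

```python
# Hypothetical label hierarchy mirroring the example: child -> parent
PARENT = {
    "Point": None,
    "Sensor": "Point",
    "Air Quality": "Sensor",
    "CO2": "Air Quality",
    "Temperature": "Sensor",
}

def close_upward(labels):
    """Add every ancestor so a prediction set respects the hierarchy."""
    closed = set(labels)
    for lab in labels:
        parent = PARENT.get(lab)
        while parent is not None:
            closed.add(parent)
            parent = PARENT[parent]
    return closed

consistent = close_upward({"CO2"})
# -> {"CO2", "Air Quality", "Sensor", "Point"}
```

Applying such an upward closure as a post-processing step guarantees that a flat multi-label classifier never emits a child label without its ancestors.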
January 19, 2025 at 11:23 AM