Ali Behrouz
@alibehrouz.bsky.social
Intern @Google, Ph.D. Student @Cornell_CS.
Interested in machine learning, LLMs, the brain, and healthcare.

abehrouz.github.io
Are you interested in building your own graph sequence model on top of a specific sequence model? Stay tuned for the release of our framework, which lets you experiment with any combination of sequence models and tokenizations. Here is a sample of our results:
December 3, 2024 at 10:05 PM
What about hybrid models? We show that combining recurrent models with Transformers yields a model that is more efficient for some tasks:
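A minimal sketch of the idea (our illustrative code, not the paper's architecture; all names are made up): a toy diagonal linear recurrence stands in for the recurrent/SSM part, followed by a standard Transformer encoder layer that mixes tokens globally.

```python
import torch
import torch.nn as nn

class LinearRecurrence(nn.Module):
    """Toy diagonal SSM stand-in: h_t = a * h_{t-1} + b * x_t, y_t = h_t."""
    def __init__(self, dim):
        super().__init__()
        self.a_logit = nn.Parameter(torch.zeros(dim) - 1.0)  # per-channel decay in (0, 1) via sigmoid
        self.b = nn.Parameter(torch.ones(dim))

    def forward(self, x):                  # x: (batch, length, dim)
        a = torch.sigmoid(self.a_logit)
        h = torch.zeros_like(x[:, 0])
        ys = []
        for t in range(x.size(1)):         # sequential for clarity; real SSMs use a parallel scan
            h = a * h + self.b * x[:, t]
            ys.append(h)
        return torch.stack(ys, dim=1)

class HybridBlock(nn.Module):
    """Recurrent (SSM-style) pass with a residual connection, then an attention layer."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.ssm = LinearRecurrence(dim)
        self.attn = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)

    def forward(self, x):
        return self.attn(x + self.ssm(x))

x = torch.randn(2, 32, 64)
print(HybridBlock(64)(x).shape)  # torch.Size([2, 32, 64])
```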
December 3, 2024 at 10:05 PM
Finally, using HAC and the lessons from our theoretical results, we present GSM++, a model that uses HAC to tokenize the graph, a GCN to encode local neighborhoods, and a hybrid SSM+Transformer architecture to learn global dependencies.
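As a hypothetical illustration of HAC-style tokenization (the paper's procedure likely differs in its clustering and ordering details), one can hierarchically cluster nodes by shortest-path distance and read the dendrogram's leaf order as the token sequence, so that structurally close nodes stay adjacent:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage, leaves_list

# A small graph: two 4-node cliques joined by a single edge.
adj = np.zeros((8, 8))
adj[:4, :4] = 1
adj[4:, 4:] = 1
adj[3, 4] = adj[4, 3] = 1
np.fill_diagonal(adj, 0)

dist = shortest_path(adj, unweighted=True)       # pairwise hop distances
Z = linkage(squareform(dist), method="average")  # agglomerative clustering of nodes
order = leaves_list(Z)                           # dendrogram leaf order = node sequence
print(order)  # nodes of each clique appear contiguously in the sequence
```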
December 3, 2024 at 10:04 PM
What about connectivity tasks? Attempts to avoid the quadratic cost by using recurrent models result in a lack of parameter-efficient solutions for connectivity. Again, with a proper ordering, recurrent models are efficient for these tasks! What is a proper ordering? One under which the graph has small node locality:
December 3, 2024 at 10:03 PM
We motivate this by the sensitivity of SSMs, which is linear with respect to token distance: similar to causal Transformers, SSMs suffer from representational collapse as the number of layers increases. So how can we build a model that has good sensitivity but is robust to representational collapse?
December 3, 2024 at 10:01 PM
We present a simple framework that captures many existing graph sequence models through three simple steps: (1) tokenization, which turns the graph into a set of sequences; (2) local encoding, which encodes each token; and (3) global encoding, which uses a sequence model to learn long-range dependencies.
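A minimal sketch of this recipe (class and argument names are ours, not the released framework's API): the three steps are pluggable callables, so any tokenizer, local encoder, and sequence model can be combined.

```python
import torch
import torch.nn as nn

class GraphSequenceModel(nn.Module):
    def __init__(self, tokenizer, local_encoder, sequence_model):
        super().__init__()
        self.tokenizer = tokenizer            # adjacency -> node ordering
        self.local_encoder = local_encoder    # node features + adjacency -> token embeddings
        self.sequence_model = sequence_model  # token sequence -> contextualized tokens

    def forward(self, x, adj):
        order = self.tokenizer(adj)                      # (1) tokenization
        tokens = self.local_encoder(x, adj)              # (2) local encoding
        return self.sequence_model(tokens[order][None])  # (3) global encoding

# Example combination: degree ordering, one propagation step as the local
# encoder, and a Transformer layer as the sequence model.
n, d = 10, 32
model = GraphSequenceModel(
    tokenizer=lambda adj: torch.argsort(adj.sum(-1), descending=True),
    local_encoder=lambda x, adj: torch.relu(adj @ x),
    sequence_model=nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
)
adj = (torch.rand(n, n) < 0.3).float()
out = model(torch.randn(n, d), adj)
print(out.shape)  # torch.Size([1, 10, 32])
```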
December 3, 2024 at 9:57 PM
🌟 Best of Both Worlds!

❓Have you ever wondered why hybrid models (RNN + Transformers) are powerful? We answer this through the lens of circuit complexity and graphs!

Excited to share our work on understanding Graph Sequence Models (GSMs), which allow the use of any sequence model on graphs.
December 3, 2024 at 9:57 PM
We show the importance of data dependency in Chimera with a case study on image classification from a subject's brain responses. We further show that the selection mechanism of the S6 block and Mamba also appears in Chimera, but along both dimensions, time and variates: (7/8)
November 20, 2024 at 1:47 AM
Using this 2D SSM, we present Chimera, a three-headed architecture that can learn both long-term progression and seasonal patterns by using different discretization processes: (6/8)
November 20, 2024 at 1:46 AM
We further discuss how S4ND-like extensions of Mamba and Mamba-2 are special cases of our 2D SSMs when the transition matrices are restricted: (5/8)
November 20, 2024 at 1:46 AM
Similar to S6, to enhance the power of the 2D SSM we let its parameters be functions of the input, which sacrifices the convolutional form. We show that this 2D recurrence can still be computed with a parallel 2D scan based on a new associative operator, enabling fast, parallelizable training.
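As a sketch of why a scan makes such recurrences parallelizable, here is the standard 1D case: each step h -> a*h + b is an affine map, and affine maps compose associatively, so the recurrence can be evaluated with any bracketing (and hence on a tree, in parallel). The paper's contribution is an analogous associative operator for the 2D recurrence, which we do not reproduce here.

```python
import numpy as np

def combine(left, right):
    # Compose two affine maps: applying (a1, b1) then (a2, b2) to h gives
    # a2*(a1*h + b1) + b2 = (a1*a2)*h + (a2*b1 + b2).
    a1, b1 = left
    a2, b2 = right
    return a1 * a2, a2 * b1 + b2

rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.0, size=8)
b = rng.normal(size=8)

# Sequential evaluation of h_t = a_t * h_{t-1} + b_t with h_0 = 0.
h = 0.0
for t in range(8):
    h = a[t] * h + b[t]

# Tree-style (parallelizable) evaluation: combine pairs, then pairs of pairs.
pairs = list(zip(a, b))
left = combine(combine(pairs[0], pairs[1]), combine(pairs[2], pairs[3]))
right = combine(combine(pairs[4], pairs[5]), combine(pairs[6], pairs[7]))
a_all, b_all = combine(left, right)
print(np.allclose(h, a_all * 0.0 + b_all))  # True
```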
November 20, 2024 at 1:45 AM
2D SSMs are based on a linear partial differential equation (PDE) with two variables (here, time and the time series's variates). Using zero-order hold (ZOH), we discretize the PDE, resulting in a 2D recurrence of the following form: (2/8)
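The exact discretized formula is in the paper; schematically (our notation, not necessarily the paper's parameterization), a 2D linear recurrence over time index t and variate index v couples the two directions as

h_{t,v} = A_1 h_{t-1,v} + A_2 h_{t,v-1} + B x_{t,v},    y_{t,v} = C h_{t,v}

so each hidden state aggregates information from both the previous time step and the neighboring variate.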
November 20, 2024 at 1:44 AM
Are univariate SSMs effective when there are 2D dependencies?

✨In our #NeurIPS paper we show how to effectively model multivariate time series with input-dependent (selective) 2-dimensional state space models that train fast using a 2D parallel scan. (1/8)
November 20, 2024 at 1:43 AM