Tivadar Danka
@tivadardanka.bsky.social
I make math accessible for everyone. Mathematician with an INTJ personality. Chaotic good. Writing https://thepalindrome.org
Join 35,000+ ML practitioners who get 2 actionable emails every week to help them understand the math behind ML, make smarter decisions, and avoid costly mistakes.
Subscribe here (it’s free): thepalindrome.org/
November 11, 2025 at 1:01 PM
Peter Lax sums it up perfectly: "So what is gained by abstraction? First of all, the freedom to use a single symbol for an array; this way we can think of vectors as basic building blocks, unencumbered by components."
November 11, 2025 at 1:00 PM
Linear algebra is powerful exactly because it abstracts away the complexity of manipulating data structures like vectors and matrices.
Instead of explicitly dealing with arrays and convoluted sums, we can use simple expressions like AB.
That's a huge deal.
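Here's the contrast in numpy (a minimal sketch with my own toy matrices):

import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

# The "convoluted sums": every entry of AB spelled out by hand.
C = np.zeros((2, 2))
for i in range(2):
    for j in range(2):
        for k in range(2):
            C[i, j] += A[i, k] * B[k, j]

# The abstraction: a single symbol for the same array.
assert np.allclose(C, A @ B)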
November 11, 2025 at 1:00 PM
The same logic applies to the rest of the columns, giving an explicit formula for the elements of the matrix product.
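With eⱼ in place of e₁, entrywise this is exactly the formula we started from:

(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}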
November 11, 2025 at 1:00 PM
We can collapse the linear combination into a single vector, resulting in a formula for the first column of AB.
This is straight from the mysterious matrix product formula.
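Written entrywise (in my notation, standing in for the attached image), the i-th component of that first column is

\big( (AB) e_1 \big)_i = \sum_{k=1}^{n} a_{ik} b_{k1}.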
November 11, 2025 at 1:00 PM
Recall that matrix-vector products are linear combinations of column vectors.
With this in mind, we see that the first column of AB is a linear combination of A's columns, with coefficients taken from the first column of B.
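In symbols, writing a₁, …, aₙ for the columns of A and b₁ for the first column of B (my notation):

(AB) e_1 = A (B e_1) = A b_1 = b_{11} a_1 + b_{21} a_2 + \dots + b_{n1} a_n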
November 11, 2025 at 1:00 PM
Now, about the matrix product formula.
From a geometric perspective, the product AB is the same as first applying B, then A to our underlying space.
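A quick numpy sanity check of the composition view (my own toy example):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2))
x = rng.standard_normal(2)

# First apply B, then A -- the same as applying AB in one step.
assert np.allclose(A @ (B @ x), (A @ B) @ x)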
November 11, 2025 at 1:00 PM
(If unwrapping the matrix-vector product seems too complex, I got you.
The computation below is the same as in the above tweet, only in vectorized form.)
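A sketch of that vectorized form, writing a₁, …, aₙ for the columns of A (my notation, standing in for the attached image):

Ax = (a_1 \mid a_2 \mid \dots \mid a_n) \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = x_1 a_1 + x_2 a_2 + \dots + x_n a_n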
November 11, 2025 at 1:00 PM
Moreover, we can look at a matrix-vector product as a linear combination of the column vectors.
Make a mental note of this, because it is important.
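Concretely, grouping the entries of Ax by the components of x (a reconstruction in my notation):

Ax = \begin{pmatrix} a_{11} x_1 + \dots + a_{1n} x_n \\ \vdots \\ a_{m1} x_1 + \dots + a_{mn} x_n \end{pmatrix} = x_1 \begin{pmatrix} a_{11} \\ \vdots \\ a_{m1} \end{pmatrix} + \dots + x_n \begin{pmatrix} a_{1n} \\ \vdots \\ a_{mn} \end{pmatrix}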
November 11, 2025 at 1:00 PM
Matrices represent linear transformations. You know, those that stretch, skew, rotate, flip, or otherwise linearly distort the space.
The images of basis vectors form the columns of the matrix.
We can visualize this in two dimensions.
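A standard two-dimensional example (mine, not the original figure): rotation by an angle θ sends e₁ to (cos θ, sin θ) and e₂ to (−sin θ, cos θ), and these images are exactly the columns of the rotation matrix:

R(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}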
November 11, 2025 at 1:00 PM
By the same logic, we conclude that A times eₖ equals the k-th column of A.
This sounds a bit algebra-y, so let's see this idea in geometric terms.
Yes, you heard right: geometric terms.
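Before we go geometric, the pattern in symbols (using aᵢₖ for the entries of A):

A e_k = \begin{pmatrix} a_{1k} \\ \vdots \\ a_{mk} \end{pmatrix}, \quad \text{the } k\text{-th column of } A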
November 11, 2025 at 1:00 PM
Similarly, multiplying A with a (column) vector whose second component is 1 and the rest are 0 yields the second column of A.
That's a pattern!
November 11, 2025 at 1:00 PM
Now, let's look at a special case: multiplying the matrix A with a (column) vector whose first component is 1 and the rest are 0.
Let's name this special vector e₁.
Turns out that the product of A and e₁ is the first column of A.
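Here is a small 2×2 instance (my own, standing in for the attached image):

\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} a_{11} \cdot 1 + a_{12} \cdot 0 \\ a_{21} \cdot 1 + a_{22} \cdot 0 \end{pmatrix} = \begin{pmatrix} a_{11} \\ a_{21} \end{pmatrix}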
November 11, 2025 at 1:00 PM
Here is a quick visualization before the technical details.
The element in the i-th row and j-th column of AB is the dot product of A's i-th row and B's j-th column.
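A two-line numpy check of this rule (a sketch; the matrices and the indices i, j are arbitrary choices of mine):

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))
i, j = 1, 0

# Entry (i, j) of AB is the dot product of A's i-th row and B's j-th column.
assert np.isclose((A @ B)[i, j], A[i, :] @ B[:, j])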
November 11, 2025 at 1:00 PM
First, the raw definition.
This is how the product of A and B is given. Not the easiest (or most pleasant) to look at.
We are going to unwrap this.
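For reference, since the original image isn't reproduced here: for an m×n matrix A and an n×l matrix B, the definition reads

(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, \qquad 1 \le i \le m, \quad 1 \le j \le l.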
November 11, 2025 at 1:00 PM
Most machine learning practitioners don’t understand the math behind their models.
That's why I've created a FREE roadmap so you can master the 3 main topics you'll ever need: linear algebra, calculus, and probability.
Get the roadmap here: thepalindrome.org/p/the-roadm...
November 10, 2025 at 1:00 PM
There are drawbacks, like the slow convergence, which has a rate of 1/√n, where n is the number of points selected.
However, there is no denying it: estimating the area of an object by throwing random points is pretty awesome.
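The 1/√n rate is the usual variance-of-an-average argument: for i.i.d. samples with variance σ²,

\operatorname{Var}\left( \frac{1}{n} \sum_{i=1}^{n} X_i \right) = \frac{\sigma^2}{n},

so the typical error shrinks like σ/√n. Halving the error takes four times as many points.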
November 10, 2025 at 1:00 PM
The general method is called "Monte Carlo integration", and as the name suggests, it can be used to evaluate integrals of chunky functions. Even ones with lots of variables.
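A minimal numpy sketch of the idea (the integrand and the unit-cube domain are my own toy choices):

import numpy as np

def mc_integrate(f, dim, n=1_000_000, seed=0):
    # Estimate the integral of f over the unit cube [0, 1]^dim
    # by averaging f at n uniform random points.
    rng = np.random.default_rng(seed)
    points = rng.random((n, dim))
    return f(points).mean()

# Toy integrand in three variables: f(x, y, z) = xyz; exact integral is 1/8.
print(mc_integrate(lambda p: p.prod(axis=1), dim=3))  # ~0.125

The same function works unchanged in 3 variables or 300, which is exactly why the method shines in high dimensions.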
November 10, 2025 at 1:00 PM
Combining these two observations, we get that the frequency of hits converges to the ratio of the areas.
Thus, we can approximate the area by simply counting the number of hits.
This is one of the coolest ideas in mathematics.
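A minimal sketch of the hit-counting idea, using the unit disk as the shape (my choice of example):

import numpy as np

rng = np.random.default_rng(42)
n = 1_000_000

# Throw points uniformly at the square [-1, 1] x [-1, 1] (area 4).
x = rng.uniform(-1, 1, n)
y = rng.uniform(-1, 1, n)
hits = x**2 + y**2 <= 1  # did the point land inside the unit disk?

# hits/n ~ area(disk) / area(square), so:
print(4 * hits.mean())  # ~ pi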
November 10, 2025 at 1:00 PM
On the other hand, the expected value is the probability of a hit.
That is, the area of our shape divided by the area of the rectangular board!
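Spelled out for our 0-or-1 variables:

E[X_i] = 1 \cdot P(\text{hit}) + 0 \cdot P(\text{miss}) = P(\text{hit}) = \frac{\text{area of the shape}}{\text{area of the board}}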
November 10, 2025 at 1:00 PM
On a second look, the average can be written as the total number of hits divided by the total number of points.
(Recall that the value of each variable is 0 if it is a miss and 1 if it is a hit.)
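Since each miss contributes 0 and each hit contributes 1 to the sum:

\frac{1}{n} \sum_{i=1}^{n} X_i = \frac{\text{number of hits}}{n}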
November 10, 2025 at 1:00 PM
Thus, the Law of Large Numbers applies: their average (almost surely) converges to the expected value.
(Our Bernoulli variables are identically distributed, so their expected values are equal.)
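In symbols:

\frac{1}{n} \sum_{i=1}^{n} X_i \xrightarrow{\text{a.s.}} E[X_1] \quad \text{as } n \to \infty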
November 10, 2025 at 1:00 PM