Tuan Trinh
tuantx.bsky.social
Data, Machine Learning and Entrepreneurship
Pinned
Happy that our AI recommendation technology has been granted a US patent.
Reposted by Tuan Trinh
Entropy is one of those formulas that many of us learn, swallow whole, and even use regularly without really understanding.

(E.g., where does that “log” come from? Are there other possible formulas?)

Yet there's an intuitive & almost inevitable way to arrive at this expression.
December 9, 2024 at 10:44 PM
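For readers who want the expression the post above is talking about, here is a minimal sketch in Python. It assumes the post means Shannon entropy, H(p) = -Σ p_i log p_i, which the post itself does not spell out. The function name `shannon_entropy` and the example distributions are made up for illustration. The second half checks one property that motivates the log: entropies of independent sources add up.

```python
import math
from itertools import product


def shannon_entropy(probs):
    """Shannon entropy in bits: H(p) = -sum(p_i * log2(p_i)).

    Zero-probability outcomes are skipped, following the usual
    convention that 0 * log(0) is treated as 0.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical example distributions (not taken from the post).
fair_coin = [0.5, 0.5]
biased_die = [0.4, 0.3, 0.2, 0.1]

# Joint distribution of two independent draws: products of the marginals.
joint = [p * q for p, q in product(fair_coin, biased_die)]

h_coin = shannon_entropy(fair_coin)
h_die = shannon_entropy(biased_die)
h_joint = shannon_entropy(joint)

# Additivity: products of probabilities become sums of logs.
print(h_coin, h_die, h_joint)
```

The log is what turns the product of independent probabilities into a sum, so that the uncertainty of two unrelated sources is the sum of their individual uncertainties.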
Reposted by Tuan Trinh
Super interesting (IMHO) example of HNSW limits and approximate behavior. In this test, 2 out of 19167 nodes are not reachable when doing hnsw_search(their_vector). Why? Because of ambiguity: they are also names, so the search gets stuck in a local minimum (at EF=200).
December 17, 2024 at 11:49 AM
Reposted by Tuan Trinh
Slides for "Table Foundation Models"

I explain why these models can strongly outperform tree-based models, what are the intuitions,
hopefully pointing to ways forward for more improvement

speakerdeck.com/gaelvaroquau...
Table foundation models for analytics
Deep-learning typically does not outperform tree-based models on tabular data. Often this may be explained by the small size of such datasets. For image…
speakerdeck.com
December 15, 2024 at 10:43 PM
Reposted by Tuan Trinh
I was recently on a panel on "What are important data systems problems, ignored by research?" with @andypavlo.bsky.social and Allison Lee moderated by Viktor Leis - here is the write-up of the discussion databasearchitects.blogspot.com/2024/12/what...
What are important data systems problems, ignored by research?
A blog by and for database architects.
databasearchitects.blogspot.com
December 13, 2024 at 7:58 AM
Reposted by Tuan Trinh
Some folk from software internals discord and I read the series of disaggregated oltp papers and met to talk about them. I wrote an informal overview of the papers and a summary of some of the discussion after each paper: transactional.blog/n...
Notes On: Disaggregated OLTP Systems
Aurora, Socrates, PolarDB, and Taurus.
transactional.blog
December 7, 2024 at 5:52 AM
Reposted by Tuan Trinh
While there are countless code examples to learn from, formal models are harder to find 👀

Blog posts exploring modeling approaches are a rare chance to sharpen your skills ❤️

bsky.app/profile/domi...
I am enjoying Lorin Hochstein's series of blog posts exploring complex concepts using formal modeling tools like TLA+ and Alloy

Uncompromising understanding 🏴‍☠️

bsky.app/profile/noro...
November 28, 2024 at 4:25 PM
Reposted by Tuan Trinh
The Porcupine linearizability checker is really cool: github.com/anishathalye...

I love ideas like P-compositionality (arxiv.org/pdf/1504.00204) - it's something nobody else thought of, that seems so obvious in hindsight. A relatively small insight that vastly simplifies a tough problem.
November 13, 2024 at 4:35 PM
Reposted by Tuan Trinh


New version (v1.9.1) of Geogram, the award-winning geometry processing library is out !

New in this version:

- much faster (2x speed) large-scale periodic Delaunay triangulation and power diagrams

- Linsolve/GPU: AMGCL + new nlCuda backend goes brrrr !

github.com/BrunoLevy/ge...
November 26, 2024 at 8:22 AM
Reposted by Tuan Trinh
When trying to compute a dual of a composite problem involving two functions and two linear operators (e.g., Total Variation regularization of inverse problems), it is sometimes useful to consider either of the operators as the dual operator.
November 28, 2024 at 6:00 AM
Reposted by Tuan Trinh
When you first learn about the fork() syscall, it can seem magical. How can a single system call produce two different return values at the same time?!

In my latest article, I demystify the hidden magic of fork and also show how it is implemented in Linux.
blog.codingconfessions.com/p/the-magic-...
Disillusioning the Magic of the fork System Call
How the kernels implement the fork system call
blog.codingconfessions.com
November 27, 2024 at 11:37 AM
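A small sketch of the "two return values" behavior described in the post above, using Python's `os.fork()` wrapper. It is not taken from the linked article, and it assumes a POSIX system such as Linux or macOS. The helper name `fork_demo` and the pipe used to pass a message back are illustrative choices. The call is made once, but afterwards two processes continue from the same point: the child sees 0 and the parent sees the child's PID.

```python
import os


def fork_demo():
    """Fork once and report what each process saw as fork()'s return value."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()  # one call; execution continues in two processes

    if pid == 0:
        # Child process: fork() returned 0 here.
        os.close(read_fd)
        os.write(write_fd, f"child saw fork() return {pid}".encode())
        os.close(write_fd)
        os._exit(0)  # leave immediately without running parent cleanup code

    # Parent process: fork() returned the child's PID (a positive integer).
    os.close(write_fd)
    child_message = os.read(read_fd, 128).decode()
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)  # reap the child to avoid a zombie
    return pid, child_message, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child_pid, message, exit_code = fork_demo()
    print(f"parent saw fork() return {child_pid}")
    print(message)
    print(f"child exit code: {exit_code}")
```

Nothing returns "twice" inside a single process. Each process gets its own copy of the return value, which is how the same line of code can take different branches in the parent and the child.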
Reposted by Tuan Trinh
Andrew Ng released "aisuite", so we added it to observers. Start observing your AI models, but in a lightweight way.

`pip install observers[aisuite] # or observers[litellm]`

Release:
github.com/cfahlgren1/o...
Release 0.1.3 - Support for `aisuite` and `litellm` · cfahlgren1/observers
What's Changed feat: initial packaged version by @davidberenstein1957 in #2 feat: argilla support by @davidberenstein1957 in #3 add datasets example by @cfahlgren1 in #4 Improve quickstart example...
github.com
November 27, 2024 at 11:19 AM
Reposted by Tuan Trinh
I made a notebook with a few notes on Diffusion Models for a "tutorial" in a project-workshop yesterday. Not really an introduction, but I give some insights that I usually don't see elsewhere. Feel free to reuse.
🔗https://colab.research.google.com/drive/1EyqALXFvgKGsTiFDALGEHH5-WnuGjOKU?usp=sharing
Google Colab
colab.research.google.com
November 23, 2024 at 11:32 AM
Reposted by Tuan Trinh
Anne Gagneux, Ségolène Martin, @quentinbertrand.bsky.social Remi Emonet and I wrote a tutorial blog post on flow matching: dl.heeere.com/conditional-... with lots of illustrations and intuition!

We got this idea after their cool work on improving Plug and Play with FM: arxiv.org/abs/2410.02423
November 27, 2024 at 9:00 AM
Reposted by Tuan Trinh
Releasing SmolVLM, a small 2-billion-parameter Vision+Language Model (VLM) built for on-device/in-browser inference with images/videos.

Outperforms all models at similar GPU RAM usage and token throughput

Blog post: huggingface.co/blog/smolvlm
November 26, 2024 at 4:58 PM
Reposted by Tuan Trinh
My deep learning course at the University of Geneva is available on-line. 1000+ slides, ~20h of screen-casts. Full of examples in PyTorch.

fleuret.org/dlc/

And my "Little Book of Deep Learning" is available as a phone-formatted pdf (nearing 700k downloads!)

fleuret.org/lbdl/
November 26, 2024 at 6:15 AM
Reposted by Tuan Trinh
SmolLM - run, pre-train, fine-tune, evaluate SoTA fully open source LM 🔥

Run with Transformers, MLX, Transformers.js, MLC Web-LLM, Ollama, Candle and more!

Apache 2.0 licensed codebase - go explore now!
November 25, 2024 at 1:17 PM
Reposted by Tuan Trinh
Minhyuk Sung's course "Diffusion Models and Their Applications" at KAIST is now fully online, including all lectures, slides, and programming assignments: mhsung.github.io/kaist-cs492d...
CS492(D) Diffusion Models and Their Applications (KAIST, Fall 2024)
mhsung.github.io
November 25, 2024 at 2:18 PM
Happy that our AI recommendation technology has been granted a US patent.
November 24, 2024 at 11:02 AM