Krishna Balasubramanian
@krizna.bsky.social
https://sites.google.com/view/kriznakumar/ Associate professor at @ucdavis
#machinelearning #deeplearning #probability #statistics #optimization #sampling
Reposted by Krishna Balasubramanian
Krishnakumar Balasubramanian, Nathan Ross
Finite-Dimensional Gaussian Approximation for Deep Neural Networks: Universality in Random Weights
https://arxiv.org/abs/2507.12686
July 18, 2025 at 4:14 AM
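Not from the paper, but for context on the title: at any fixed finite set of inputs, the output of a wide network with i.i.d. random weights is approximately jointly Gaussian. A minimal NumPy sanity check, assuming an illustrative one-hidden-layer ReLU net (the paper itself treats deep networks and general weight distributions):

import numpy as np

def random_relu_net(x, width=4096, seed=0):
    # One-hidden-layer ReLU network with i.i.d. N(0, 1/fan-in) weights (illustrative only).
    rng = np.random.default_rng(seed)
    d = x.shape[-1]
    W1 = rng.standard_normal((d, width)) / np.sqrt(d)
    w2 = rng.standard_normal(width) / np.sqrt(width)
    return np.maximum(x @ W1, 0.0) @ w2

# Finite-dimensional check: the network's outputs at three fixed inputs, across many
# independent weight draws, should look approximately jointly Gaussian at large width.
xs = np.random.default_rng(1).standard_normal((3, 10))  # three fixed inputs in R^10
draws = np.stack([random_relu_net(xs, seed=s) for s in range(2000)])
print(draws.shape)      # (2000, 3): 2000 draws of the 3-dimensional output vector
print(np.cov(draws.T))  # empirical 3x3 covariance, approximating the limiting Gaussian's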
New theory for simulated tempering under multimodality: a restricted spectral gap decomposition that applies with an arbitrary local MCMC sampler.
When applied to the simulated tempering Metropolis-Hastings algorithm for sampling from Gaussian mixture models, we obtain high-accuracy TV guarantees.
Restricted Spectral Gap Decomposition for Simulated Tempering Targeting Mixture Distributions (arxiv.org)
May 22, 2025 at 2:41 AM
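For readers outside sampling, a minimal sketch of a simulated tempering Metropolis-Hastings chain of the kind analyzed (the 1D two-component Gaussian mixture, the inverse-temperature ladder, and the equal level weights below are illustrative assumptions, not the paper's setting):

import numpy as np

def log_target(x):
    # Two-component 1D Gaussian mixture with well-separated modes (illustrative).
    return np.logaddexp(-0.5 * (x - 4.0) ** 2, -0.5 * (x + 4.0) ** 2)

betas = np.array([1.0, 0.5, 0.25, 0.1])  # inverse-temperature ladder (assumed)

def simulated_tempering_mh(n_iters=50_000, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x, k = 0.0, len(betas) - 1  # start at the hottest level
    samples = []
    for _ in range(n_iters):
        # Local MH move in x at inverse temperature betas[k] (the "local sampler").
        y = x + step * rng.standard_normal()
        if np.log(rng.uniform()) < betas[k] * (log_target(y) - log_target(x)):
            x = y
        # Random-walk MH move on the temperature index k. This assumes equal level
        # weights; in practice the weights need normalizing-constant estimates.
        j = k + rng.choice([-1, 1])
        if 0 <= j < len(betas):
            if np.log(rng.uniform()) < (betas[j] - betas[k]) * log_target(x):
                k = j
        if k == 0:  # keep only draws at the target temperature
            samples.append(x)
    return np.array(samples)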
New work on the Riemannian Proximal Sampler, for sampling on Riemannian manifolds:
arxiv.org/abs/2502.07265
It comes with high-accuracy guarantees (i.e., complexity scaling as log(1/eps), where eps is the tolerance) under both exact and inexact oracles for Manifold Brownian Increments and Riemannian heat kernels.
Riemannian Proximal Sampler for High-accuracy Sampling on Manifolds (arxiv.org)
February 12, 2025 at 9:59 PM
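Structurally, a proximal sampler alternates two conditional draws; a schematic of that loop (the two oracle functions below are hypothetical placeholders for the paper's Manifold Brownian Increment and Riemannian heat-kernel oracles, which are the hard part):

def riemannian_proximal_sampler(x0, eta, n_iters, brownian_increment_oracle,
                                heat_kernel_posterior_oracle):
    # Schematic only: both oracle arguments are placeholders, not implementations.
    x = x0
    for _ in range(n_iters):
        # Forward step: y ~ manifold Brownian motion run for time eta, started at x.
        y = brownian_increment_oracle(x, eta)
        # Backward step: x ~ density proportional to pi(x) * p_eta(x, y), where
        # p_eta is the heat kernel; the paper also analyzes inexact oracle versions.
        x = heat_kernel_posterior_oracle(y, eta)
    return x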
Happy to have this paper on improved rates for Stein Variational Gradient Descent accepted as an oral presentation at #ICLR2025
arxiv.org/abs/2409.08469
Only theory, no deep learning (although the techniques are useful for DL), and no experiments, in this time of scale and AGI :)
Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent (arxiv.org)
February 11, 2025 at 4:25 PM
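For context, the finite-particle SVGD update the rates concern, in a minimal NumPy sketch (the RBF kernel with fixed bandwidth and the standard-Gaussian target in the usage lines are illustrative choices, not from the paper):

import numpy as np

def svgd_step(X, grad_log_p, step=0.1, h=1.0):
    # One SVGD update for n particles X (shape (n, d)) with an RBF kernel.
    n = X.shape[0]
    diffs = X[:, None, :] - X[None, :, :]    # (n, n, d)
    K = np.exp(-(diffs ** 2).sum(-1) / (2 * h ** 2))
    gradK = -diffs / h ** 2 * K[:, :, None]  # gradK[i, j] = grad_{x_i} K(x_i, x_j)
    # phi(x_i) = (1/n) sum_j [ K(x_j, x_i) grad log p(x_j) + grad_{x_j} K(x_j, x_i) ]
    phi = (K @ grad_log_p(X) + gradK.sum(axis=0)) / n
    return X + step * phi

# Usage: 100 particles flowing toward a standard Gaussian target.
X = np.random.default_rng(0).standard_normal((100, 2))
for _ in range(500):
    X = svgd_step(X, lambda X: -X)  # grad log p(x) = -x for N(0, I)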
Got this paper out in 2024, just in time before AGI takes over in 2025:
arxiv.org/abs/2412.17181
We develop Gaussian approximation bounds and non-asymptotically valid confidence intervals for matching-based Average Treatment Effect (ATE) estimators.
Gaussian and Bootstrap Approximation for Matching-based Average Treatment Effect Estimators (arxiv.org)
January 2, 2025 at 7:01 PM
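A minimal sketch of the kind of estimator the bounds cover: nearest-neighbor covariate matching with replacement, in the Abadie-Imbens style (the Euclidean metric and M=1 match are illustrative choices):

import numpy as np

def matching_ate(X, T, Y, M=1):
    # X: (n, d) covariates, T: (n,) binary treatment, Y: (n,) outcomes.
    n = len(Y)
    Y_hat = np.empty(n)  # imputed counterfactual outcome for each unit
    for i in range(n):
        opposite = np.flatnonzero(T != T[i])              # units in the other arm
        d = np.linalg.norm(X[opposite] - X[i], axis=-1)   # covariate distances
        Y_hat[i] = Y[opposite[np.argsort(d)[:M]]].mean()  # average of M closest matches
    # ATE estimate: mean of Y_i(1) - Y_i(0), with the missing arm imputed by matching.
    return np.where(T == 1, Y - Y_hat, Y_hat - Y).mean()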
Reposted by Krishna Balasubramanian
It seems that OpenAI's latest model, o3, can solve 25% of problems on a database called FrontierMath, created by EpochAI, where previous LLMs could only solve 2%. On Twitter I am quoted as saying, "Getting even one question right would be well beyond what we can do now, let alone saturating them."
December 20, 2024 at 11:15 PM
Von Neumann: With 4 parameters, I can fit an elephant. With 5, I can make it wiggle its trunk.
OpenAI: Hold my gazillion parameter Sora model - I’ll make the elephant out of leaves and teach it to dance.
youtu.be/4QG_MGEBQow?...
Video: an elephant generated by Sora (YouTube, AI Creation Today)
December 11, 2024 at 12:49 AM
@iclr-conf.bsky.social Would greatly appreciate any guidance on what to do if the reviewers, AC, and PC did not respond. Thanks a lot!
cc: @yisongyue.bsky.social
GIF: Jack Skellington from The Nightmare Before Christmas, standing in the dark and asking what to do.
December 2, 2024 at 7:17 PM
How to characterize the learnability of local algorithms?
The Merged Staircase Property (MSP), proposed by Abbe et al. (2022), completely characterizes the learnability of SGD-trained 2-layer neural networks (NNs) in the regime where the mean-field approximation of SGD holds.
November 27, 2024 at 3:07 PM
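A standard example of the staircase structure (my paraphrase; the precise MSP definition is in Abbe et al., 2022):

$f(x) = x_1 + x_1 x_2 + x_1 x_2 x_3$ satisfies MSP: each monomial adds at most one new coordinate to the union of the previous monomials' supports.
$f(x) = x_1 x_2 x_3$ does not: the degree-3 monomial appears without any lower-degree staircase leading up to it.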