ML Beyond Gradients

We are a research group at Concordia University and Mila in Montreal, led by Felix Dangel.

Topics. Our research focuses on building more powerful machine learning algorithms by incorporating richer information about the loss landscape. This includes:

Optimization: Second-order and natural gradient descent methods for LLM pre-training & scientific ML (Shampoo/SOAP, K-FAC, …).
LLMs: Curvature-based methods to better understand, regulate, and customize LLMs (influence functions, unlearning, merging, …).
Automatic differentiation: Efficiently computing and using higher-order derivatives (e.g. for geometry- and physics-informed ML).

Other interests include neural network parameter space symmetries, tensor networks and their application in quantum chemistry and physics, randomized linear algebra, and information geometry.

News

Mar 01, 2026	🎉 Moved to Montreal and officially assumed my position!

What is 'Beyond Gradients'?

Machine learning centers around the gradient, which is easily accessible via backpropagation. But there is richer information that captures more aspects of the loss landscape. Our goal is to make it accessible, and to use it to accelerate ML algorithms.

Stochasticity. The loss is an average of per-datum losses, and so is the gradient. Higher-order moments of the per-datum gradients, such as their covariance, contain information about the noise.
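
As a minimal sketch of this idea, the snippet below computes per-datum gradients for a toy 1-D linear regression and then their mean (the usual gradient) and covariance. The data, the squared-error loss, and the helper names per_datum_grad and grad_mean_and_cov are assumptions made for illustration only.

```python
# Toy data for 1-D linear regression y ≈ w * x + b (all values are made up).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.8]


def per_datum_grad(w, b, x, y):
    """Gradient of the per-datum loss 0.5 * (w * x + b - y)**2 w.r.t. (w, b)."""
    residual = w * x + b - y
    return [residual * x, residual]


def grad_mean_and_cov(w, b):
    """Return the mean gradient and the (biased, 1/N) gradient covariance."""
    grads = [per_datum_grad(w, b, x, y) for x, y in zip(xs, ys)]
    n, dim = len(grads), len(grads[0])
    # First moment: the gradient of the averaged loss.
    mean = [sum(g[i] for g in grads) / n for i in range(dim)]
    # Second central moment: how much per-datum gradients scatter around the mean.
    cov = [
        [sum((g[i] - mean[i]) * (g[j] - mean[j]) for g in grads) / n
         for j in range(dim)]
        for i in range(dim)
    ]
    return mean, cov


mean_grad, cov_grad = grad_mean_and_cov(w=0.5, b=0.0)
```

Here the covariance is only a 2 x 2 matrix. For a model with D parameters it has D x D entries, which is why it is rarely formed explicitly in practice.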

Geometry. Gradients locally approximate the loss with a linear function. Considering higher-order terms, such as the second-order term (curvature), captures the landscape more faithfully.
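
In symbols, this is the second-order Taylor expansion of the loss around the current parameters. The notation is chosen here for illustration: θ for the parameters, δ for a small step, g for the gradient, and H for the Hessian.

```latex
\mathcal{L}(\theta + \delta)
  \approx \mathcal{L}(\theta) + g^\top \delta + \tfrac{1}{2}\, \delta^\top H \delta,
\qquad
g = \nabla_\theta \mathcal{L}(\theta), \quad
H = \nabla_\theta^2 \mathcal{L}(\theta).
```

Dropping the last term recovers the linear model that gradient-based methods rely on. Keeping it adds the curvature information.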

These objects (e.g. Hessian or gradient covariance matrices) offer promising avenues to improve algorithms. The challenge is that they are high-dimensional (a Hessian has as many rows and columns as the model has parameters) and usually infeasible to store or compute exactly, so working with them requires efficient estimation and structural approximations.
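
One common flavor of such estimation is to access a matrix only through matrix-vector products. The sketch below estimates the trace of the Hessian with Hutchinson's estimator without ever building the Hessian. It assumes a toy 1-D linear regression, and it uses central differences of the gradient as a stand-in for the Hessian-vector products that automatic differentiation would normally provide. The names grad, hvp, and hutchinson_trace are hypothetical helpers.

```python
import random

# Toy 1-D linear regression data (made-up values); parameters are theta = (w, b).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.8]


def grad(theta):
    """Gradient of the mean squared-error loss 0.5 * mean((w * x + b - y)**2)."""
    w, b = theta
    residuals = [w * x + b - y for x, y in zip(xs, ys)]
    n = len(xs)
    return [sum(r * x for r, x in zip(residuals, xs)) / n, sum(residuals) / n]


def hvp(theta, v, eps=1e-3):
    """Hessian-vector product H @ v via central differences of the gradient."""
    plus = grad([t + eps * vi for t, vi in zip(theta, v)])
    minus = grad([t - eps * vi for t, vi in zip(theta, v)])
    return [(p - m) / (2 * eps) for p, m in zip(plus, minus)]


def hutchinson_trace(theta, num_samples=200, seed=0):
    """Estimate trace(H) as the average of v^T H v over random sign vectors v."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice([-1.0, 1.0]) for _ in theta]
        total += sum(vi * hi for vi, hi in zip(v, hvp(theta, v)))
    return total / num_samples


theta = [0.5, 0.0]
trace_estimate = hutchinson_trace(theta)
```

For this toy problem the exact Hessian is [[3.5, 1.5], [1.5, 1.0]] with trace 4.5, so the estimate can be checked directly. The estimator only needs products with H, so the same pattern carries over to models where H is far too large to store.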