**AI and Mathematics**

**Day 1 (April 6) @ 15:30–16:30**

**General abstract:** In close collaboration with NWO, the AIM Network wants to make fundamental research in the mathematics of AI visible to the broader research community, and to facilitate participation of mathematicians in research consortia within AI. As such, AIM unites the Dutch research groups working on mathematics for and in Artificial Intelligence.

The AIM session consists of five presentations covering a broad spectrum of AI-related topics. The session also includes a short interactive introduction to our activities and goals. The session will be hosted by Karen Aardal (TUD), Christoph Brune (Twente), Peter Grunwald (CWI, Leiden) and Amber Kerkhofs (NWO).

**Christoph Brune** (University of Twente)

#### Deep learning of normal form autoencoders for universal, parameter-dependent dynamics

A long-standing goal in dynamical systems is to construct reduced-order models for high-dimensional spatiotemporal data that capture the underlying parametric dependence of the data. In this work, we introduce a deep learning technique to discover a single low-dimensional model for such data that captures the underlying parametric dependence in terms of a normal form. A normal form is a symbolic expression, or universal unfolding, that describes how the reduced-order differential equation model varies with respect to a bifurcation parameter. Our approach introduces coupled autoencoders for the state and parameter, with the latent variables constrained to adhere to a given normal form. (joint work with Manu Kalia, Steven Brunton, Hil Meijer and Nathan Kutz)
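As a concrete illustration of what a normal form looks like (a textbook example, not taken from the talk), consider the supercritical pitchfork bifurcation for a scalar latent variable $z$ and bifurcation parameter $\mu$; both symbols are chosen here for illustration only:

$$
\dot{z} = \mu z - z^{3}
$$

For $\mu \le 0$ the only equilibrium is $z = 0$, and it is stable. For $\mu > 0$ that equilibrium becomes unstable and two stable equilibria $z = \pm\sqrt{\mu}$ appear. A single symbolic expression thus describes how the reduced dynamics change with the parameter.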

**Allard Hendriksen** (CWI, Amsterdam)

#### Practical deep denoising for tomography without reference data

In X-ray computed tomography, a 3D model of the interior of an object is computed from a sequence of X-ray images. As the exposure time of the X-ray acquisition is reduced, noise is introduced in the images and the reconstructed model. For removing this noise, deep convolutional neural networks (CNNs) have been shown to be effective, but have so far required a dataset of noise-free target images for training. Our research suggests that it is possible to train such networks without any additional noise-free data by changing the training strategy. This opens the door to using deep CNNs in settings where obtaining noise-free images is infeasible, such as battery research and tomography of quickly evolving dynamic systems.

**Oxana Manita** (Eindhoven University of Technology)

#### Universal Approximation in Dropout Neural Networks

Dropout is a commonly used regularization algorithm. During training with dropout, edges or nodes of a network are randomly deleted in order to prevent co-adaptation. In this talk, the approximation properties of dropout networks are discussed. We consider two versions of dropout networks: networks with random output, and networks with deterministic output obtained by replacing random filters by their expected values. The latter is the mode of operation commonly used in practice. It turns out that for a given function, one can construct a dropout network that approximates it well in both modes simultaneously. (Joint work with M. Peletier, J. Portegies, J. Sanders and A. Senen-Cerda)
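The two modes of operation can be made concrete with a minimal sketch. It assumes a one-hidden-layer ReLU network with dropout applied to the hidden nodes; the function name `forward`, the weights, and the keep probability are all hypothetical choices for illustration and are not taken from the talk.

```python
import random


def relu(value):
    return max(0.0, value)


def forward(x, hidden_params, out_weights, keep_prob, mode, rng=None):
    """Evaluate a one-hidden-layer network with dropout on the hidden nodes.

    mode="random":   each hidden node is kept with probability keep_prob,
                     so the output is a random variable.
    mode="expected": each random 0/1 filter is replaced by its expected
                     value keep_prob, so the output is deterministic.
    """
    hidden = [relu(w * x + b) for (w, b) in hidden_params]
    if mode == "random":
        rng = rng or random
        filters = [1.0 if rng.random() < keep_prob else 0.0 for _ in hidden]
    elif mode == "expected":
        filters = [keep_prob] * len(hidden)
    else:
        raise ValueError("mode must be 'random' or 'expected'")
    return sum(f * v * h for f, v, h in zip(filters, out_weights, hidden))


# Hypothetical toy network (illustrative values only).
hidden_params = [(1.0, 0.0), (-1.0, 0.5), (2.0, -1.0)]  # (weight, bias) pairs
out_weights = [0.5, -1.0, 0.25]
rng = random.Random(0)

deterministic = forward(0.7, hidden_params, out_weights, 0.8, "expected")
samples = [
    forward(0.7, hidden_params, out_weights, 0.8, "random", rng)
    for _ in range(20000)
]
mean_random = sum(samples) / len(samples)
```

In this shallow example the filters act after the nonlinearity and the output layer is linear, so the average of the random outputs matches the deterministic output. In deeper networks the two modes generally differ, which is why approximating a function well in both modes at once is a nontrivial statement.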

**Mathias Staudigl** (Maastricht University)

#### Generalized Self-concordant analysis of Frank-Wolfe Algorithms

Projection-free optimization via different variants of the Frank-Wolfe method has become one of the cornerstones of large-scale optimization for machine learning and computational statistics. Numerous applications within these fields involve the minimization of functions with self-concordant-like properties. Such Generalized Self-Concordant (GSC) functions do not necessarily feature a Lipschitz continuous gradient, nor are they strongly convex. Indeed, in a number of applications within machine learning, e.g. inverse covariance estimation or distance-weighted discrimination problems in support vector machines, the loss is given by a GSC function having unbounded curvature, implying the absence of theoretical guarantees for the existing Frank-Wolfe methods. This paper closes this apparent gap in the literature by developing provably convergent Frank-Wolfe algorithms with the standard O(1/k) convergence rate guarantees. If the problem formulation allows the efficient construction of a local linear minimization oracle, we develop a Frank-Wolfe method with linear convergence rate. (joint work with Kamil Safin (Moscow Institute of Physics and Technology), Pavel Dvurechensky (Weierstrass Institute for Applied Analysis and Stochastics), Shimrit Shtern (Technion-The Israel Institute of Technology))
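For readers unfamiliar with the method, the following is a minimal sketch of the classical Frank-Wolfe iteration with the open-loop step size 2/(k+2). It is not the GSC-adapted variant developed in the talk. The feasible set (the probability simplex), the quadratic objective, and all names such as `frank_wolfe_simplex` are assumptions chosen for illustration.

```python
def frank_wolfe_simplex(grad, x0, num_iters):
    """Classical Frank-Wolfe over the probability simplex.

    grad: function returning the gradient (a list) at a point x.
    x0:   starting point inside the simplex.
    """
    x = list(x0)
    for k in range(num_iters):
        g = grad(x)
        # Linear minimization oracle: over the simplex, a linear function
        # is minimized at the vertex e_i with the smallest gradient entry.
        i = min(range(len(x)), key=lambda j: g[j])
        gamma = 2.0 / (k + 2.0)
        # Move towards e_i by a convex combination; the iterate stays
        # feasible, so no projection step is needed.
        x = [(1.0 - gamma) * xj for xj in x]
        x[i] += gamma
    return x


# Illustrative objective f(x) = 0.5 * ||x - c||^2 with c inside the simplex,
# so the minimizer is c itself. This objective has a Lipschitz gradient,
# unlike the GSC functions discussed in the abstract.
c = [0.2, 0.5, 0.3]


def grad(x):
    return [xj - cj for xj, cj in zip(x, c)]


x_final = frank_wolfe_simplex(grad, [1.0, 0.0, 0.0], 5000)
```

The only problem-specific ingredient is the linear minimization oracle, which is why the method is called projection-free. The O(1/k) guarantee for this classical scheme relies on bounded curvature, which is the assumption that fails for GSC functions.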

**Lara Scavuzzo** (Delft University of Technology)

#### Learning to branch for Mixed Integer Programming

Mixed Integer Programming is a powerful mathematical modeling tool for optimization problems, with numerous applications in real-world scenarios. In spite of the NP-hardness of Mixed Integer Programs (MIPs), our capabilities to tackle such problems have dramatically increased as a result of advancements in the algorithms that solve them. However, in practice, solving MIPs to optimality remains challenging. In this talk, I will discuss the potential of Machine Learning (ML) tools to augment algorithms for Mixed Integer Programming. In particular, I will discuss how ML can support decision-making for critical tasks within the solver, such as choosing a branching candidate.
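To show where a learned model could plug into a solver, here is a minimal sketch of the branching decision inside branch-and-bound. It assumes the solution of the LP relaxation is already available as a dictionary; the function names and the example values are hypothetical, and the default rule shown is the classical most-fractional heuristic, not a method from the talk.

```python
def most_fractional_score(name, value):
    """Hand-crafted score: distance of the LP value to the nearest integer."""
    return abs(value - round(value))


def choose_branching_variable(lp_solution, integer_vars,
                              score=most_fractional_score, tol=1e-6):
    """Pick the integer variable to branch on, or None if all are integral.

    lp_solution:  dict mapping variable name to its LP relaxation value.
    integer_vars: names of variables that must be integer in the MIP.
    score:        scoring function (name, value) -> float; a trained ML
                  model could be substituted here (assumption).
    """
    candidates = [
        name for name in integer_vars
        if abs(lp_solution[name] - round(lp_solution[name])) > tol
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda name: score(name, lp_solution[name]))


# Hypothetical LP relaxation solution (illustrative values only).
lp_solution = {"x1": 2.0, "x2": 0.5, "x3": 3.9}
chosen = choose_branching_variable(lp_solution, ["x1", "x2", "x3"])
```

Here `chosen` is `"x2"`, the variable furthest from an integer value. A learned branching policy would replace `score` with a model that typically uses richer features of the current node than the LP value alone.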