4th IMA Conference on The Mathematical Challenges of Big Data

Event


Date: Monday 19 September – Tuesday 20 September 2022

Time: 9:00 am – 5:00 pm

Oxford University

Oxford, OX2 6GG, England

Organiser: IMA

Event Link: https://ima.org.uk/17625/4th-ima-conference-on-the-mathematical-challenges-of-big-data/


Programme

Abstract Book

Lectures in Rooms L1 & L6
Mathematical Institute
University of Oxford
Andrew Wiles Building
Woodstock Road
Oxford OX2 6GG

 

The 4th IMA Conference on The Mathematical Challenges of Big Data is issuing a Call for Papers for both contributed talks and posters. The mathematical foundations of data science and its ongoing challenges are rapidly growing fields, encompassing areas such as network science, machine learning, modelling, information theory, deep and reinforcement learning, applied probability and random matrix theory. Applying deeper mathematics to data is changing the way we understand the environment, health, technology, the quantitative humanities, the natural sciences and beyond, with increasing roles in society and industry. This conference brings together researchers and practitioners to highlight key developments in the state of the art and to find common ground where theory and practice meet, in order to shape future directions and maximise impact. We particularly welcome talks that aim to inform on recent developments in theory or methodology with potential applied consequences, as well as reports of diverse applications that have led to interesting successes or uncovered new challenges.

Contributed talks and posters are welcomed from the mathematically oriented data science community. Contributions will be selected on the basis of brief abstracts and may cover previously unpresented results or recent material originally presented elsewhere. We encourage contributions from both established and early career researchers. Contributions will be assigned to talks or posters according to the authors' request and the organising committee's view of the suitability of the results. The conference will be held in person, with the option to attend remotely where needed.

Inducement of sparsity, Heather Battey
Sparsity, the existence of many zeros or near-zeros in some domain, is widely assumed throughout the high-dimensional literature and plays at least two roles depending on context. Parameter orthogonalisation (Cox and Reid, 1987) is presented as inducement of population-level sparsity. The latter is taken as a unifying theme for the talk, in which sparsity-inducing parameterisations or data transformations are sought. Three recent examples are framed in this light: sparse parameterisations of covariance models; systematic construction of factorisable transformations for the elimination of nuisance parameters; and inference in high-dimensional regression. The solution strategy for the problem of exact or approximate sparsity inducement appears to be context specific and may entail, for instance, solving one or more partial differential equations, or specifying a parameterised path through transformation or parameterisation space.
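
For orientation, the parameter orthogonalisation cited above (Cox and Reid, 1987) can be written as a differential equation; the following is the standard textbook statement, included for context rather than taken from the abstract. With interest parameter $\psi$ and nuisance parameter $\lambda$, one seeks a reparameterisation $\lambda = \lambda(\psi, \phi)$ that makes $\psi$ and $\phi$ information-orthogonal:

```latex
% Cox-Reid orthogonalisation condition (standard form; not from the abstract).
% i_{ab} are blocks of the Fisher information in the original (psi, lambda)
% parameterisation, and q is the dimension of the nuisance parameter lambda.
i_{\psi\phi} = 0
\quad\Longleftrightarrow\quad
\sum_{j=1}^{q} i_{\lambda_j \lambda_k}\,
  \frac{\partial \lambda_j}{\partial \psi}
  = -\, i_{\psi \lambda_k},
\qquad k = 1, \dots, q.
```

This is a system of partial differential equations for $\lambda(\psi, \phi)$, which is the sense in which sparsity inducement "may entail solving one or more partial differential equations".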

Over-parametrization: Insights from solvable models, Lenka Zdeborová

This talk gives an overview of recent results in a line of theoretical work that started three decades ago in statistical physics. We will first discuss the teacher-student setting of generalized linear regression. We illustrate the presence of the interpolation peak for classification with ridge loss and its vanishing with regularization. We show that, in the spherical perceptron, optimally regularized logistic regression approaches the Bayes-optimal accuracy very closely. We contrast this with the non-convex case of phase retrieval, where canonical empirical risk minimization performs poorly compared to the Bayes-optimal error. We then move towards learning with hidden units and analyze double descent in learning with generic fixed features and any convex loss. The formulas we obtain are generic enough to describe the learning of the last layer of neural networks for realistic data and networks. Finally, for phase retrieval, we are able to analyze gradient descent in the feature-learning regime of a two-layer neural network, where we show that over-parametrization allows a considerable reduction of the sample complexity. Concretely, an over-parametrized neural network only needs a number of samples equal to twice the input dimension, while a non-over-parametrized network needs a constant factor more, and kernel regression needs quadratically many samples in the input dimension.
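
The interpolation peak and double descent described above are easy to reproduce numerically. The sketch below is a toy illustration of my own, under assumed settings (a linear teacher, random subsets of input coordinates as the student's features, a near-zero ridge penalty); it is not code from the talk.

```python
# Toy double-descent experiment (an illustration, not the talk's setup):
# fit near-ridgeless least squares on p out of d input coordinates and
# watch the test error spike where p matches the number of samples n.
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 400                       # samples, input dimension
X = rng.standard_normal((n, d))
w_teacher = rng.standard_normal(d)    # linear "teacher" generating labels
y = X @ w_teacher / np.sqrt(d)

X_test = rng.standard_normal((2000, d))
y_test = X_test @ w_teacher / np.sqrt(d)

lam = 1e-6                            # near-interpolating ridge penalty
for p in (25, 50, 100, 200, 400):     # number of features the student uses
    idx = rng.choice(d, size=p, replace=False)
    A, A_test = X[:, idx], X_test[:, idx]
    w = np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ y)
    mse = np.mean((A_test @ w - y_test) ** 2)
    print(f"p = {p:3d}   test MSE = {mse:.3f}")   # peak near p = n = 100
```

Increasing lam suppresses the spike at p = n, mirroring the abstract's point that the interpolation peak vanishes with regularization.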

Disentangling homophily, community structure and triadic closure in networks, Tiago de Paula Peixoto

One of the most typical properties of network data is the presence of homophily, i.e. the increased tendency of an edge to exist between two nodes if they share the same underlying characteristic, such as a social parameter, metabolic role, etc. More broadly, when the underlying similarity parameter is not specified a priori, the same homophily pattern is known as community structure. Another pervasive pattern encountered in the same kinds of networks is transitivity, i.e. the increased tendency of observing an edge between two nodes if they share a neighbor in common. Although these patterns are indicative of two distinct mechanisms of network formation, namely choice or constraint homophily and triadic closure, respectively, they are generically conflated in non-longitudinal data. This is because both processes can result in the same kinds of observation: (1) the preferred connection between nodes of the same kind can induce the presence of triangles involving similar nodes, and (2) the tendency of triangles to be formed can induce the formation of groups of nodes with a higher density of connections between them, compared to the rest of the network. This conflation means we cannot reliably interpret the underlying mechanisms of network formation merely from the abundance of triangles or the observed community structure in network data.

In this talk I present a solution to this problem, consisting of a principled method to disentangle homophily and community structure from triadic closure in network data. This is achieved by formulating a generative model that includes community structure in a first stage and an iterated process of triadic closure in a second. Based on this model, we develop a Bayesian inference algorithm that is capable of identifying which edges are more likely to be due to community structure or to triadic closure, in addition to the underlying community structure itself. As we show, this reconstruction yields a detailed interpretation of the underlying mechanisms of network formation, allowing us to identify macro-scale structures that emerge spontaneously from micro-scale higher-order interactions and, in this way, to separate them from inherently macro-scale structures. I show how the method can avoid detecting spurious communities caused solely by the formation of triangles in the network, and how it can improve the performance of link prediction compared to the pure version of the model without triadic closure.
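
To make the two-stage generative picture concrete, here is a minimal simulation sketch (my own construction with assumed parameter values, not the speaker's code; the Bayesian inference that inverts this model is beyond a few lines):

```python
# Minimal sketch of the two-stage generative model described above:
# stage 1 plants community structure, stage 2 runs iterated triadic closure.
import random
import networkx as nx

random.seed(0)

# Stage 1: community structure via a stochastic block model.
sizes = [50, 50]
probs = [[0.10, 0.01],
         [0.01, 0.10]]                  # assortative: dense within blocks
G = nx.stochastic_block_model(sizes, probs, seed=0)

# Stage 2: iterated triadic closure -- repeatedly close open wedges.
p_close = 0.05                          # prob. of closing each open wedge
for _ in range(3):                      # iterations of the closure process
    new_edges = []
    for u in G.nodes:
        nbrs = list(G.neighbors(u))
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                v, w = nbrs[i], nbrs[j]
                if not G.has_edge(v, w) and random.random() < p_close:
                    new_edges.append((v, w))
    G.add_edges_from(new_edges)

print(nx.transitivity(G))               # clustering rises with each pass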

The transformer leap, Nando de Freitas

In the last five years we have witnessed an extraordinary leap in our understanding of intelligence and the capabilities of AI systems. Large-scale training of transformer neural networks was the driving force, but it soon became apparent that very simple supervisory signals could enable us to train foundation models, which can be easily adapted to solve a myriad of problems. Transformers have taken over: solving protein folding, generating language, generating images and videos to illustrate a sentence, mapping natural language descriptions to code, and ultimately paving the way for the design of intelligent generalist agents. In this talk, I will share a personal view of this transformer leap.
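
For readers less familiar with the architecture, the core computation of a transformer layer is scaled dot-product self-attention; the sketch below is the standard textbook form, not material from the talk.

```python
# Single-head scaled dot-product self-attention (standard textbook form).
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Attend over a sequence of token embeddings X (seq_len x d_model)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # scaled dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V                             # weighted mix of values

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 8, 16, 4
X = rng.standard_normal((seq_len, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)         # (8, 4)
```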

Confirmed Invited Speakers

Dr Heather Battey, Imperial College London
Prof. Lenka Zdeborová, EPFL (Swiss Federal Institute of Technology Lausanne)
Prof. Nando de Freitas, DeepMind
Prof. Tiago de Paula Peixoto, Central European University

Registration

Click here to register for the 4th Big Data Conference

Conference Fee – Non IMA Member £260
Conference Fee – IMA Member £220
Conference Fee – IMA Student £100
Conference Fee – Non IMA Student £140

If you are attending the conference please use the hashtag #IMABigData2022 and tag the IMA on socials!

Accommodation Links

Somerville College

Bed & Breakfast

Via the link below you can find rooms in all colleges (please note that some colleges are far from the Mathematical Institute; check the location before booking):

https://www.universityrooms.com/en-GB/city/oxford/home

Organising Committee

Dr. Daniel Lawson, University of Bristol (Chair)
Prof. Ginestra Bianconi, Queen Mary, University of London
Prof. Jared Tanner, University of Oxford

Contact information

For scientific queries please contact: Prof. Jared Tanner, tanner@maths.ox.ac.uk
For general conference queries please contact the IMA Conference Department. Email: conferences@ima.org.uk Tel: +44 (0) 1702 354 020 Institute of Mathematics and its Applications, Catherine Richards House, 16 Nelson Street, Southend‐on‐Sea, Essex, SS1 1EF, UK.

Image credit: The Radcliffe Camera, Oxford University by Ben Seymour / Unsplash / Unsplash Licence
