One of the most exciting developments of the past decade is the emergence of neural networks and deep learning for solving real-world problems. The idea, inspired by the way the neurons in the brain are connected, is deceptively simple: data is passed to a network of nodes, which “fire” when the weighted sum of the data exceeds some threshold value. This process is then iterated over many “layers” until the network outputs something useful: a classification, a probability, or maybe a reconstruction of the data set itself. The parameters of the weighted sum (known as “weights” and “biases”) are randomly initialized, and then “trained” over many passes through the network according to some user-defined criterion that measures how well the output matches a desired value. The mathematical techniques are familiar from first-year multivariable calculus and linear algebra, but somehow, this iterated structure can recognize handwritten digits and symbols, beat the world’s best players at games like Go and poker, recognize faces and people, translate foreign languages, and even generate realistic conversation from a prompt.
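To make the description above concrete, here is a minimal sketch of the forward pass through such a network, written with NumPy. The function names (`init_layers`, `forward`), the layer sizes, and the input values are all illustrative assumptions, and the hard threshold is the idealized “firing” rule from the text; networks trained in practice typically use smoother activation functions so that the weights and biases can be adjusted by gradient-based methods.

```python
import numpy as np

def init_layers(sizes, rng):
    """Randomly initialize one (weights, biases) pair per layer.

    `sizes` lists the number of nodes in each layer, input first.
    """
    return [
        (rng.normal(size=(n_out, n_in)), rng.normal(size=n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(x, layers):
    """Pass x through every layer in turn.

    A node "fires" (outputs 1.0) when its weighted sum plus bias is
    positive; the bias plays the role of minus the threshold.
    """
    for W, b in layers:
        x = (W @ x + b > 0).astype(float)
    return x

rng = np.random.default_rng(0)           # fixed seed, for reproducibility only
layers = init_layers([4, 3, 2], rng)     # 4 inputs -> 3 hidden nodes -> 2 outputs
output = forward(np.array([0.5, -1.0, 2.0, 0.1]), layers)
```

Training, which is not shown here, would repeatedly adjust each `W` and `b` to reduce the mismatch between `output` and the desired value.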

What is going on inside the neural network? Right now, very little is known: the tool works, so why ask too many questions? For one, if we would like to use these tools to do data analysis for physics experiments, we need to have some idea of the systematic uncertainties of the output with respect to the input parameters. Another important reason is that principles from physics, like Lorentz invariance, can be baked into the network architecture from the beginning, which is much more efficient than forcing the network to discover these symmetry principles from scratch each time it is trained on new data. Finally, there are many tantalizing analogies between neural networks and situations we encounter in all branches of physics. The way the interactions of a large number of entities give rise to simple collective behavior is strongly reminiscent of statistical mechanics and condensed matter physics, and in many common situations, the equations by which the weights and biases are optimized are analogous to equations of motion from classical mechanics with stochastic force terms.
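One common way to make the last analogy explicit, assuming the network is trained with plain minibatch stochastic gradient descent, is sketched below. The symbols are introduced here for illustration only: $\theta$ collects all weights and biases, $L$ is the loss on the full data set, $L_{B_t}$ is the loss on the minibatch drawn uniformly at random at step $t$, and $\eta$ is the learning rate.

```latex
\theta_{t+1} = \theta_t - \eta\,\nabla L_{B_t}(\theta_t)
             = \theta_t - \eta\,\nabla L(\theta_t) + \eta\,\xi_t,
\qquad
\xi_t \equiv \nabla L(\theta_t) - \nabla L_{B_t}(\theta_t),
\qquad
\mathbb{E}[\xi_t] = 0 .
```

Treating $\eta$ as a small time step, this has the form of a discretized overdamped Langevin equation, $\dot{\theta} = -\nabla L(\theta) + \xi(t)$, in which the loss acts as a potential energy and the minibatch noise acts as the stochastic force.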

Since my expertise is in high-energy physics, I’m interested in how concepts used in field theory can help explain the behavior of neural networks, and ultimately in turning the problem around to design better neural networks that help us do high-energy physics. One recent project showed that topological properties of a dataset (for example, how many “holes” it has, or how many points need to be removed before it can be mapped to a plane) affect the performance of autoencoders, neural networks that attempt to “compress” a dataset to its essential features. Since the manifold of Lorentz-invariant phase space has the topology of a sphere, our results have important implications for autoencoders attempting to perform anomaly detection on high-energy physics events.