One of the most exciting developments of the past decade is the emergence of neural networks and deep learning for solving real-world problems. The idea, inspired by the way the neurons in the brain are connected, is deceptively simple: data is passed to a network of nodes, which “fire” when the weighted sum of the data exceeds some threshold value. This process is then iterated over many “layers” until the network outputs something useful: a classification, a probability, or maybe a reconstruction of the dataset itself. The parameters of the weighted sum (known as “weights” and “biases”) are randomly initialized, and “trained” over many passes through the network according to some user-defined criterion by which the output matches a desired value. The mathematical techniques are familiar from first-year multivariable calculus and linear algebra, but somehow, this iterated structure can recognize handwritten digits and symbols, beat the world’s best players at games like Go and poker, recognize faces and people, translate foreign languages, and even generate realistic conversation from a prompt.
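To make the picture above concrete, here is a minimal sketch in NumPy of the ingredients just described: randomly initialized weights and biases, layers of weighted sums passed through a nonlinearity, and training by gradient descent against a user-defined criterion. The architecture, learning rate, and toy XOR data are illustrative choices of mine, not anything specific from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the XOR function, which a single "neuron" cannot represent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly initialized weights and biases for a 2 -> 8 -> 1 network.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(10000):
    # Forward pass: each layer is a weighted sum plus bias, then a nonlinearity.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Criterion by which the output should match the desired value (mean squared error).
    loss = np.mean((out - y) ** 2)

    # Backward pass: the chain rule (first-year multivariable calculus) gives the gradients.
    d_out = 2 * (out - y) / len(X) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient-descent update of the weights and biases.
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # after training, typically close to the XOR targets [0, 1, 1, 0]
```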

Since my expertise is in high-energy physics, I’m interested in how concepts used in field theory can help explain the behavior of neural networks, and, ultimately, let us turn the problem around and design better neural networks to help us do high-energy physics. My first foray into the field involved showing that topological properties of a dataset (for example, how many “holes” it has, or how many points need to be removed before it can be mapped to a plane) affect the performance of autoencoders, neural networks that attempt to “compress” a dataset to its essential features. Since the manifold of Lorentz-invariant phase space has the topology of a sphere, our results have important implications for autoencoders attempting to perform anomaly detection on high-energy physics events. I recently identified the appearance of “scaling laws” in simulated collider physics data that resemble those first noticed in large language models like GPT-3. Since the underlying theory from which this data is drawn is known, and calculable to a large extent, this suggests that physics data could be an interesting playground: data “in the wild” that carries features of both simple toy models and natural image and language data.
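For readers unfamiliar with autoencoders, the sketch below shows the generic idea in PyTorch: an encoder compresses each event to a small latent space, a decoder reconstructs it, and the reconstruction error serves as the anomaly score. The layer sizes, optimizer settings, and random stand-in data are my own illustrative assumptions, not the architecture or dataset used in the work described above.

```python
import torch
from torch import nn

# Generic dense autoencoder: compress d-dimensional events to a small latent space.
d, latent = 16, 2  # illustrative sizes

encoder = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, latent))
decoder = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(), nn.Linear(32, d))
model = nn.Sequential(encoder, decoder)

opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(1024, d)  # stand-in for a batch of preprocessed physics events

for epoch in range(200):
    recon = model(x)
    loss = ((recon - x) ** 2).mean()  # reconstruction error as the training criterion
    opt.zero_grad()
    loss.backward()
    opt.step()

# Events the trained network reconstructs poorly are flagged as candidate anomalies.
scores = ((model(x) - x) ** 2).mean(dim=1)
```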
