One of the most exciting developments of the past decade is the emergence of neural networks and deep learning for solving real-world problems. The idea, inspired by the way neurons in the brain are connected, is deceptively simple: data is passed to a network of nodes, which “fire” when the weighted sum of the data exceeds some threshold value. This process is then iterated over many “layers” until the network outputs something useful: a classification, a probability, or maybe a reconstruction of the data set itself. The parameters of the weighted sum (known as “weights” and “biases”) are randomly initialized, then “trained” over many passes through the network according to some user-defined criterion that measures how well the output matches a desired value. The mathematical techniques are familiar from first-year multivariable calculus and linear algebra, but somehow, this iterated structure can recognize handwritten digits and symbols, beat the world’s best players at games like Go and poker, recognize faces and people, translate foreign languages, and even generate realistic conversation from a prompt.
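The iterated structure described above can be sketched in a few lines of Python. This is a minimal toy, not a practical implementation: the layer sizes and inputs are made up for illustration, and a smooth sigmoid stands in for the hard “fire above threshold” rule, as is standard in trainable networks.

```python
import math
import random

random.seed(0)

def layer(inputs, weights, biases):
    """One layer: each node takes a weighted sum of its inputs plus a bias,
    then "fires" smoothly via a sigmoid instead of a hard threshold."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(1.0 / (1.0 + math.exp(-z)))
    return outputs

def network(x, params):
    """Iterate the layer map: the output of each layer feeds the next."""
    for weights, biases in params:
        x = layer(x, weights, biases)
    return x

# Randomly initialized weights and biases for a 2 -> 3 -> 1 network.
params = [
    ([[random.gauss(0, 1) for _ in range(2)] for _ in range(3)],
     [random.gauss(0, 1) for _ in range(3)]),
    ([[random.gauss(0, 1) for _ in range(3)] for _ in range(1)],
     [random.gauss(0, 1) for _ in range(1)]),
]

print(network([0.5, -1.0], params))  # a single probability-like output in (0, 1)
```

Training would consist of nudging `params` to reduce some loss on labeled data; here the parameters are left at their random initial values, which is exactly the starting point of the training process described above.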
What is going on inside the neural network? Right now, surprisingly little is known: the tools work, so why ask too many questions? There are several good reasons. For one, if we would like to use these tools to do data analysis for physics experiments, we need to have some idea of the systematic uncertainties of the output with respect to the input parameters. Another important reason is that principles from physics, like Lorentz invariance, can be baked into the network architecture from the beginning, which is much more efficient than forcing the network to discover these symmetry principles from scratch each time it is trained on new data. Finally, there are many tantalizing analogies between neural networks and situations we encounter in all branches of physics. The emergence of simple collective behavior from the interactions of a large number of entities is strongly reminiscent of statistical mechanics and condensed matter physics, and in many common situations, the equations by which the weights and biases are optimized are analogous to equations of motion from classical mechanics with stochastic force terms.
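The analogy in the last sentence can be made concrete with a toy example. Stochastic gradient descent updates a parameter as θ → θ − η∇L(θ) plus a noise term from minibatch sampling, which has the same form as a discretized overdamped Langevin equation: a deterministic drift down the loss landscape plus a stochastic force. The loss function and noise strength below are invented for illustration.

```python
import random

random.seed(1)

def grad_loss(theta):
    """Gradient of a toy quadratic loss L(theta) = theta**2 / 2."""
    return theta

eta = 0.1     # learning rate, playing the role of a time step
sigma = 0.05  # strength of the stochastic "force" (stand-in for minibatch noise)

theta = 5.0
for _ in range(1000):
    # Drift term -eta * grad L plus Gaussian noise: a discretized Langevin step.
    theta = theta - eta * grad_loss(theta) + sigma * random.gauss(0, 1)

print(theta)  # fluctuates near the minimum at theta = 0
```

Just as in statistical mechanics, the parameter does not settle exactly at the minimum but samples a stationary distribution around it, with a width set by the balance between the learning rate and the noise.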
Since my expertise is in high-energy physics, I’m interested in how concepts from field theory can help explain the behavior of neural networks, and, ultimately, in turning the problem around to design better neural networks that help us do high-energy physics. As I’m very new to this field, I don’t currently have any publications on this topic, but I encourage you to check out the suggested references below, which I think represent some of the best examples of this philosophy.
- D. Boyda et al. Sampling using SU(N) gauge equivariant flows. arXiv:2008.05456.
- D. Roberts. SGD Implicitly Regularizes Generalization Error.
- S. Yaida. Non-Gaussian processes and neural networks at finite widths. arXiv:1910.00019.
- S. Yaida. Fluctuation-dissipation relations for stochastic gradient descent. arXiv:1810.00004.
- P. Komiske, E. Metodiev, and J. Thaler. Energy flow networks: deep sets for particle jets. JHEP 01 (2019) 121. arXiv:1810.05165.