Fitting elephants in modern machine learning by statistically consistent interpolation


ABSTRACT

Textbook wisdom advocates for smooth function fits and implies that interpolation of noisy data should lead to poor generalization. A related heuristic is that fitting parameters should be fewer than measurements (Occam's razor). Surprisingly, contemporary machine learning approaches, such as deep nets, generalize well, despite interpolating noisy data. This may be understood via statistically consistent interpolation (SCI), that is, data interpolation techniques that generalize optimally for big data. Here, we elucidate SCI using the weighted interpolating nearest neighbours algorithm, which adds singular weight functions to _k_ nearest neighbours. This shows that data interpolation can be a valid machine learning strategy for big data. SCI clarifies the relation between two ways of modelling natural phenomena: the rationalist approach (strong priors) of theoretical physics with few parameters, and the empiricist (weak priors) approach of modern machine learning with more parameters than data. SCI shows that the purely empirical approach can successfully predict. However, data interpolation does not provide theoretical insights, and the training data requirements may be prohibitive. Complex animal brains are between these extremes, with many parameters but modest training data, and with prior structure encoded in species-specific mesoscale circuitry. Thus, modern machine learning provides a distinct epistemological approach that is different both from physical theories and animal brains.
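
To make the abstract's central construction concrete, the following is a minimal sketch of a weighted interpolating nearest-neighbour regressor: an ordinary _k_-nearest-neighbour average whose weights are a singular function of distance, so that the fit passes exactly through the training data while still averaging over a neighbourhood away from it. The function name `winn_predict`, the power-law weight w(r) = r^(-delta) and the default values of `k` and `delta` are illustrative assumptions, not the article's prescription; the conditions under which such estimators are statistically consistent are the subject of the article itself.

```python
import numpy as np

def winn_predict(X_train, y_train, X_query, k=10, delta=0.5, eps=1e-12):
    """Weighted interpolating nearest-neighbour regression (illustrative sketch).

    For each query point, the k nearest training points are averaged with
    singular weights w_i = dist_i ** (-delta).  Because the weight diverges as
    the query approaches a training point, the estimator reproduces training
    labels exactly (it interpolates), yet away from the data it behaves like a
    local average.  Parameter choices here are illustrative only.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)

    preds = np.empty(len(X_query))
    for j, x in enumerate(X_query):
        # Squared Euclidean distances from the query to every training point.
        d2 = np.sum((X_train - x) ** 2, axis=1)
        nn = np.argsort(d2)[:k]            # indices of the k nearest neighbours
        dist = np.sqrt(d2[nn])
        if dist[0] < eps:                  # query coincides with a training point:
            preds[j] = y_train[nn[0]]      # return its label exactly (interpolation)
            continue
        w = dist ** (-delta)               # singular weight function
        preds[j] = np.dot(w, y_train[nn]) / w.sum()
    return preds

# Toy usage (hypothetical data): noisy samples of a one-dimensional function.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.3 * rng.standard_normal(200)
print(winn_predict(X, y, X[:5]))             # reproduces the training labels exactly
print(winn_predict(X, y, [[0.25], [0.75]]))  # smooth local averages away from the data
```

Run on noisy training data, the estimator fits every training label exactly yet gives averaged predictions a short distance away; this coexistence of interpolation and local averaging is the mechanism behind the statistically consistent interpolation discussed above.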


ACKNOWLEDGEMENTS

This work was supported by the Crick–Clay Professorship (CSHL) and the H. N. Mahabala Chair Professorship (IIT Madras).

AUTHOR INFORMATION

AUTHORS AND AFFILIATIONS

* Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA: Partha P. Mitra
* Center for Computational Brain Research, IIT Madras, India: Partha P. Mitra

CORRESPONDING AUTHOR

Correspondence to Partha P. Mitra.

ETHICS DECLARATIONS

COMPETING INTERESTS

The author declares no competing interests.

ADDITIONAL INFORMATION

PEER REVIEW INFORMATION

_Nature Machine Intelligence_ thanks Samet Oymak and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

PUBLISHER'S NOTE

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

ABOUT THIS ARTICLE

CITE THIS ARTICLE

Mitra, P. P. Fitting elephants in modern machine learning by statistically consistent interpolation. _Nat Mach Intell_ 3, 378–386 (2021). https://doi.org/10.1038/s42256-021-00345-8

* Received: 26 November 2019
* Accepted: 15 April 2021
* Published: 19 May 2021
* Issue Date: May 2021
* DOI: https://doi.org/10.1038/s42256-021-00345-8