# Statistics and Deep Learning

At TOELT LLC, we develop mathematical frameworks that give a more solid foundation to the way machine learning is used, especially in the natural sciences. We publish many papers and books that analyze machine learning and make it more suitable for natural sciences such as physics, chemistry, biology, and medicine.

## Research interests

• Fundamentals of machine learning: statistical study of large neural networks, random matrix theory.
• Statistics: we are especially interested in mixture distributions, extreme value theory (applications of the Fisher–Tippett–Gnedenko theorem), and the modelling of complex systems.
• Correct application of machine learning techniques: study of metric development via the central limit theorem, effect of label errors in supervised learning (Michelucci & Venturini, 2022), determination of the exact Bayes error for classifiers (Michelucci et al., 2021), bootstrapping and validation techniques (Michelucci & Venturini, 2021).
• Development of new algorithms: for example, applying Fourier transform techniques to interference-immune tunable absorption spectroscopy gas sensing (Michelucci & Venturini, 2017), and denoising.
• Advanced deep learning: superresolution statistical stability and significance, symmetry-aware neural networks.

# Books

Umberto Michelucci has published 3 books with Springer Nature on the theory of deep learning and its applications with TensorFlow; together they have received more than 100 citations and have been downloaded more than 100,000 times. He is also writing a fourth book (available at the end of 2023), a university textbook that will enable scientists to apply machine learning in their research projects.

## Book References

1. Deep Learning for Natural Sciences. (2023). [Book]. In Springer Nature. Springer Nature.
@book{michelucci2023book,
title = {Deep Learning for Natural Sciences},
journal = {Springer Nature},
year = {2023},
publisher = {Springer Nature},
bibtex_show = {true},
abbr = {Book},
type = {Book},
topic = {mltheory}
}

2. Michelucci, U. (2022). Applied Deep Learning with TensorFlow 2 [Book]. Springer Nature/Apress.
@book{michelucciapplied,
title = {Applied Deep Learning with TensorFlow 2},
author = {Michelucci, Umberto},
publisher = {Springer Nature/Apress},
year = {2022},
abbr = {Book},
topic = {mltheory},
type = {Book},
selected = {true},
bibtex_show = {true}
}

3. Michelucci, U. (2019). Advanced applied deep learning: convolutional neural networks and object detection [Book]. Springer Nature.
@book{michelucci2019advanced,
title = {Advanced applied deep learning: convolutional neural networks and object detection},
author = {Michelucci, Umberto},
year = {2019},
publisher = {Springer Nature/Apress},
journal = {Springer Nature},
bibtex_show = {true},
abbr = {Book},
type = {Book},
topic = {mltheory}
}

4. Michelucci, U. (2018). Applied Deep Learning - A Case-Based Approach to Understanding Deep Neural Networks [Book]. In Springer Nature. Springer Nature/Apress.
@book{michelucci2018applied,
title = {Applied Deep Learning - A Case-Based Approach to Understanding Deep Neural Networks},
author = {Michelucci, Umberto},
journal = {Springer Nature},
year = {2018},
publisher = {Springer Nature/Apress},
bibtex_show = {true},
abbr = {Book},
type = {Book},
topic = {mltheory}
}


# Publications

Below you can find all the references to publications that deal with statistics and machine learning theory.

1. Deep Learning for Natural Sciences. (2023). [Book]. In Springer Nature. Springer Nature.
@book{michelucci2023book,
title = {Deep Learning for Natural Sciences},
journal = {Springer Nature},
year = {2023},
publisher = {Springer Nature},
bibtex_show = {true},
abbr = {Book},
type = {Book},
topic = {mltheory}
}

2. Michelucci, U., & Venturini, F. (2022). New Metric Formulas that Include Measurement Errors in Machine Learning for Natural Sciences. ArXiv Preprint ArXiv:2209.15588.
@article{michelucci2022errorsArxiv,
title = {New Metric Formulas that Include Measurement Errors in Machine Learning for Natural Sciences},
author = {Michelucci, Umberto and Venturini, Francesca},
journal = {arXiv preprint arXiv:2209.15588},
year = {2022},
month = oct,
topic = {mltheory},
abbr = {Theory},
selected = {true}
}

3. Michelucci, U. (2022). An Introduction to Autoencoders [Theory]. ArXiv Preprint ArXiv:2201.03898.
@article{michelucci2022introduction,
title = {An Introduction to Autoencoders},
author = {Michelucci, Umberto},
journal = {arXiv preprint arXiv:2201.03898},
year = {2022},
topic = {mltheory},
type = {Theory},
abbr = {Theory}
}

4. Michelucci, U. (2022). Applied Deep Learning with TensorFlow 2 [Book]. Springer Nature/Apress.
@book{michelucciapplied,
title = {Applied Deep Learning with TensorFlow 2},
author = {Michelucci, Umberto},
publisher = {Springer Nature/Apress},
year = {2022},
abbr = {Book},
topic = {mltheory},
type = {Book},
selected = {true},
bibtex_show = {true}
}

5. Michelucci, U., & Venturini, F. (2021). Estimating Neural Network’s Performance with Bootstrap: A Tutorial. Machine Learning and Knowledge Extraction, 3(2), 357–373. https://doi.org/10.3390/make3020018

Neural networks present characteristics where the results are strongly dependent on the training data, the weight initialisation, and the hyperparameters chosen. The determination of the distribution of a statistical estimator, as the Mean Squared Error (MSE) or the accuracy, is fundamental to evaluate the performance of a neural network model (NNM). For many machine learning models, as linear regression, it is possible to analytically obtain information as variance or confidence intervals on the results. Neural networks present the difficulty of not being analytically tractable due to their complexity. Therefore, it is impossible to easily estimate distributions of statistical estimators. When estimating the global performance of an NNM by estimating the MSE in a regression problem, for example, it is important to know the variance of the MSE. Bootstrap is one of the most important resampling techniques to estimate averages and variances, between other properties, of statistical estimators. In this tutorial, the application of resampling techniques (including bootstrap) to the evaluation of neural networks’ performance is explained from both a theoretical and practical point of view. The pseudo-code of the algorithms is provided to facilitate their implementation. Computational aspects, as the training time, are discussed, since resampling techniques always require simulations to be run many thousands of times and, therefore, are computationally intensive. A specific version of the bootstrap algorithm is presented that allows the estimation of the distribution of a statistical estimator when dealing with an NNM in a computationally effective way. Finally, algorithms are compared on both synthetically generated and real data to demonstrate their performance.

@article{make3020018,
author = {Michelucci, Umberto and Venturini, Francesca},
title = {Estimating Neural Network’s Performance with Bootstrap: A Tutorial},
journal = {Machine Learning and Knowledge Extraction},
volume = {3},
year = {2021},
number = {2},
pages = {357--373},
issn = {2504-4990},
bibtex_show = {true},
abbr = {Theory},
doi = {10.3390/make3020018},
topic = {mltheory}
}

6. Michelucci, U., Sperti, M., Piga, D., Venturini, F., & Deriu, M. A. (2021). A Model-Agnostic Algorithm for Bayes Error Determination in Binary Classification. Algorithms, 14(11), 301.
@article{michelucci2021model,
title = {A Model-Agnostic Algorithm for Bayes Error Determination in Binary Classification},
author = {Michelucci, Umberto and Sperti, Michela and Piga, Dario and Venturini, Francesca and Deriu, Marco A},
journal = {Algorithms},
volume = {14},
number = {11},
pages = {301},
year = {2021},
bibtex_show = {true},
abbr = {Theory},
publisher = {Multidisciplinary Digital Publishing Institute},
topic = {mltheory}
}

7. Michelucci, U. (2019). Advanced applied deep learning: convolutional neural networks and object detection [Book]. Springer Nature.
@book{michelucci2019advanced,
title = {Advanced applied deep learning: convolutional neural networks and object detection},
author = {Michelucci, Umberto},
year = {2019},
publisher = {Springer Nature/Apress},
journal = {Springer Nature},
bibtex_show = {true},
abbr = {Book},
type = {Book},
topic = {mltheory}
}

8. Michelucci, U. (2018). Applied Deep Learning - A Case-Based Approach to Understanding Deep Neural Networks [Book]. In Springer Nature. Springer Nature/Apress.
@book{michelucci2018applied,
title = {Applied Deep Learning - A Case-Based Approach to Understanding Deep Neural Networks},
author = {Michelucci, Umberto},
journal = {Springer Nature},
year = {2018},
publisher = {Springer Nature/Apress},
bibtex_show = {true},
abbr = {Book},
type = {Book},
topic = {mltheory}
}

9. Michelucci, U., & Venturini, F. (2017). Novel semi-parametric algorithm for interference-immune tunable absorption spectroscopy gas Sensing. Sensors, 17(10), 2281.
@article{michelucci2017novel,
title = {Novel semi-parametric algorithm for interference-immune tunable absorption spectroscopy gas Sensing},
author = {Michelucci, Umberto and Venturini, Francesca},
journal = {Sensors},
volume = {17},
number = {10},
pages = {2281},
year = {2017},
publisher = {Multidisciplinary Digital Publishing Institute},
bibtex_show = {true},
topic = {mltheory},
abbr = {Algorithms}
}
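
The bootstrap tutorial listed above (Michelucci & Venturini, 2021) concerns estimating the distribution of a metric such as the MSE by resampling. As an illustration only, the following minimal sketch applies a plain nonparametric bootstrap to the predictions of an already trained model; it is not the specific algorithm presented in the paper. The function names `mse` and `bootstrap_mse`, the synthetic data, and the choice of 1000 resamples are all assumptions made for this example.

```python
import random
import statistics


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_mse(y_true, y_pred, n_resamples=1000, seed=0):
    """Resample (target, prediction) pairs with replacement and
    return one MSE value per bootstrap resample."""
    rng = random.Random(seed)
    n = len(y_true)
    values = []
    for _ in range(n_resamples):
        # Draw n indices with replacement from the evaluation set.
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(mse([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    return values


# Hypothetical synthetic data: targets plus Gaussian prediction noise.
data_rng = random.Random(42)
y_true = [data_rng.uniform(0.0, 1.0) for _ in range(200)]
y_pred = [t + data_rng.gauss(0.0, 0.1) for t in y_true]

samples = sorted(bootstrap_mse(y_true, y_pred))
mean_mse = statistics.mean(samples)
std_mse = statistics.stdev(samples)

# Percentile interval from the sorted bootstrap values (about 95%).
lower = samples[int(0.025 * len(samples))]
upper = samples[int(0.975 * len(samples)) - 1]

print(f"MSE = {mean_mse:.5f} +/- {std_mse:.5f}")
print(f"approx. 95% interval: [{lower:.5f}, {upper:.5f}]")
```

This sketch only captures the variability due to the finite evaluation set. Capturing the dependence on training data and weight initialisation would require retraining the model on each resample, which is far more expensive computationally.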

