- Lucas Boettcher
Bottom-up approach in forecasting COVID-19 outbreak adopting ARIMA time-series model
Stefanelli's team uses an ARIMA-based time-series model to predict the evolution of COVID-19 case numbers.

Stefanelli’s team:
Marcello Stefanelli (33, individual participant) obtained a degree in Economics and Statistics at Università degli Studi di Pavia. I’m a Business Strategy Manager at a global consulting firm. I’m passionate about the management of public and private health care systems and, of course, data science.
The approach I apply to predicting daily time series is based on a classical statistical model for analyzing and forecasting time-series data, ARIMA (Autoregressive Integrated Moving Average). As highlighted by the Columbia University Mailman School of Public Health, ARIMA models have varied uses in epidemiology, ranging from outbreak detection for infectious diseases to the evaluation of population-level health interventions through interrupted time series analysis.
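To make the three parts of the name concrete, here is a minimal sketch of an ARIMA(1,1,0) forecast in plain Python: the series is differenced once (the "integrated" part), an AR(1) model is fitted to the differences by least squares (the "autoregressive" part), and the forecast differences are summed back onto the last observed level. The order (1,1,0), the function names, and the numbers are assumptions chosen for illustration; they are not the specification or the data used in the GRETL analyses, and there is no moving-average term here.

```python
from statistics import mean


def difference(series):
    """First difference: d[t] = y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(x):
    """Least-squares fit of x[t] = c + phi * x[t-1] + e[t]; returns (c, phi)."""
    lagged, current = x[:-1], x[1:]
    m_lag, m_cur = mean(lagged), mean(current)
    sxx = sum((a - m_lag) ** 2 for a in lagged)
    sxy = sum((a - m_lag) * (b - m_cur) for a, b in zip(lagged, current))
    # If the lagged values never vary, the slope is not identified; use 0.
    phi = sxy / sxx if sxx != 0 else 0.0
    c = m_cur - phi * m_lag
    return c, phi


def forecast_arima_110(series, steps):
    """Forecast `steps` values ahead with an AR(1) on first differences."""
    d = difference(series)
    c, phi = fit_ar1(d)
    last_level, last_diff = series[-1], d[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = c + phi * last_diff   # next predicted daily change
        last_level = last_level + last_diff  # integrate back to the level
        forecasts.append(last_level)
    return forecasts


# Made-up daily case counts, for illustration only (not official Italian data).
daily_cases = [120, 150, 170, 230, 240, 310, 330, 420, 430, 520, 560, 640]
print(forecast_arima_110(daily_cases, steps=3))
```

A dedicated package would normally estimate the model by maximum likelihood and choose the order from the data; the hand-rolled fit above only shows the mechanics.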
However, there are some constraints that we have to take into account:
- ARIMA models do not predict rare “black swan” events, as there is no pattern for them in the time series.
- The data sample must be consistent; the performance of the model can be biased when the number of observations is small, as happens in the early stages of the infection.
I started adopting this methodology while working on official Italian data, with the aim of assessing whether there is some degree of correlation between daily COVID-19 outbreak observations and the volatility of daily economic variables (e.g., stock market volatility transmission modeled with GARCH time series analysis).
What I’m interested in is predicting the daily observations, treated as “flow” variables, rather than the stock (cumulative) variables, which I derive as a function of the daily observations. I have therefore applied ARIMA models to flow time series variables (e.g., daily cases, daily recovered cases), obtaining better performance as the number of observations increases. Below are some examples of observed data and forecasts on daily time series in Italy.
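The relation between flow and stock variables can be sketched in a few lines of Python: the cumulative series is the running sum of the daily series, and the daily series is recovered by differencing the cumulative one. The function names and the numbers are hypothetical and serve only as an illustration; the forecast values are placeholders, not model output.

```python
from itertools import accumulate


def to_cumulative(daily):
    """Stock variable: running total of the daily (flow) observations."""
    return list(accumulate(daily))


def to_daily(cumulative):
    """Flow variable: day-over-day change of the cumulative series."""
    if not cumulative:
        return []
    changes = [b - a for a, b in zip(cumulative, cumulative[1:])]
    return [cumulative[0]] + changes


# Made-up numbers, for illustration only.
observed_daily = [5, 8, 13, 21]
forecast_daily = [30, 41]  # placeholder values standing in for a flow forecast

# A cumulative forecast follows directly from the forecast of the flow.
cumulative = to_cumulative(observed_daily + forecast_daily)
print(cumulative)
print(to_daily(cumulative))
```

Modelling the flow and deriving the stock afterwards keeps the cumulative curve consistent with the daily forecasts by construction.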
This project has given me the opportunity to enrich my knowledge of how epidemiological data generation processes work and of how time series analysis can help to predict outbreak evolution at a broader level.
Furthermore, it’s my first time in this kind of “competition”, and it’s thrilling for me despite the lockdown and all its impacts on our everyday life.
All my analyses are performed using GRETL, an open source econometrics software package. I’m considering extending my bottom-up approach to other countries that until now I’ve analyzed with a top-down approach, as some other participants did.
What I’m interested in is sharing my results on the socio-economic impacts of COVID-19 and providing policy recommendations for western governments on how to deal with pandemics in a global economy.
Main references:
Estimation of COVID-19 prevalence in Italy, Spain, and France - Z. Ceylan
https://doi.org/10.1016/j.scitotenv.2020.138817
Mathematical Modeling and Epidemic Prediction of COVID-19 and Its Significance to Epidemic Prevention and Control Measures - Yichi Li
Application of the ARIMA model on the COVID-2019 epidemic dataset - Domenico Benvenuto
https://doi.org/10.1016/j.dib.2020.105340
Coronavirus (COVID-19): ARIMA based time-series analysis to forecast near future - Hiteshi Tandon
https://arxiv.org/ftp/arxiv/papers/2004/2004.07859.pdf