• Lucas Boettcher

We present the SIR model approach of Team Valencia.

My name is Francisco Collado.

I am a high-school teacher in Valencia, Spain. I teach Automation and Industrial Robotics.

Regarding COVID-19: by early March, it was already known that several Valencia CF fans who had travelled to Milan for the Champions League game against Atalanta Bergamo had fallen ill. Although the disease was still being treated as a joke, as a class exercise we started to think about building a field hospital out of containers, with infrared sensors for fever, humidity sensors for sweat, and CO2/airflow sensors for coughing and laboured breathing. So we started doing calculations to find out how many people would get sick, and therefore how many containers we would need to assemble.

As we wrapped up in the first week of March, I told my students that, according to a SIR model spreadsheet, we would reach 100,000 sick and 500 dead within one month; at the time we were still at about 400 infected and 8 dead. Call me an alarmist. How were we going to reach 500 dead? We weren't in China!

Then came March 8: International Women's Day, and hundreds of stadiums hosting football games. Tens of thousands of people gathered at each event, with no warning from the authorities; on the contrary, several ministers encouraged attendance at feminist events. After March 8, the numbers rose to 1,000 sick and 16 dead. Doubled! A qualitative and quantitative leap. That Tuesday I ran the sequence of numbers again. My forecast for April 1 was 100,000 infected and 10,000 dead.

But we were in the middle of the spring festival season throughout Spain, including Las Fallas de Valencia, where mass gatherings of 10,000 people are held. In the worst-case scenario, a million people would die. I stared at the screen. These were figures typical of 1349, the Black Death.

Then mid-March came, and with it the order to stay home. So I continued on my own the calculations I had shown in the classroom.

I started with the SIR model. Unfortunately, the health ministry changed its method for counting infections about fifteen days ago, which left me with a choice: continue with the initial numbers or modify them. I kept the initial sequence, so the forecast does not track the official figure, but rather what I believe the figure should be.
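For readers who have not seen a SIR spreadsheet, the idea can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet used in class: the function name `simulate_sir`, the one-day time step, and the values of `beta` and `gamma` are assumptions chosen only to show the mechanics, and the population figure is a rough approximation for Spain.

```python
def simulate_sir(population, infected0, beta, gamma, days):
    """Discrete-time SIR model with one-day steps, laid out like spreadsheet rows.

    beta  : assumed daily transmission rate (illustrative, must be < 1 here)
    gamma : assumed daily removal rate (recovery or death)
    Returns a list of (day, susceptible, infected, removed) tuples.
    """
    s = float(population - infected0)
    i = float(infected0)
    r = 0.0
    rows = [(0, s, i, r)]
    for day in range(1, days + 1):
        new_infections = beta * s * i / population  # S -> I
        new_removed = gamma * i                     # I -> R
        s -= new_infections
        i += new_infections - new_removed
        r += new_removed
        rows.append((day, s, i, r))
    return rows


if __name__ == "__main__":
    # Hypothetical inputs: about 47 million inhabitants, 400 known infections.
    for day, s, i, r in simulate_sir(47_000_000, 400, beta=0.35, gamma=0.1, days=30)[::5]:
        print(f"day {day:2d}: infected ~ {i:,.0f}, removed ~ {r:,.0f}")
```

Each row depends only on the previous one, which is why the model fits naturally into a spreadsheet. The forecast is very sensitive to `beta`, so the printed numbers should not be read as a reconstruction of the figures quoted above.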

For a future datathon, there would be several options:

A Markov process could be used to model the evolution of the cycle. Alternatively, a newer method could be applied: machine learning, analyzing images of pandemic curve patterns with a bot that learns to predict their behavior. For this, I would emphasize that a disease curve pattern can be found based on these variables (by country):

1.- When was the first outbreak (outbreak zero) discovered?

2.- When were restrictive confinement measures taken?

3.- When were 1,000 cases of contagion verified?

4.- Hospital capacity in beds per 100,000 inhabitants, and sanitary equipment and materials.

  • Lucas Boettcher

In this week's blog post, we present the linear-regression approach of team C.F.R.S.S.

Our team C.F.R.S.S consists of five ETH Zurich students from different backgrounds: ManHin Cheng (Master in Physics), Leonardo Fossati (Master in Physics), Philipp Rupp (Master in Electrical Engineering), Antoine Suter (Bachelor in Computer Science), and Giona Sala (Bachelor in Physics). As you may have noticed, our team name is formed from the first letters of our last names.

You may be surprised by the method we are currently using for prediction: it is simply a linear regression model, although we are still working on better models that capture more features. In this linear model, we assume that the data evolve as a short-memory Markov chain process. For the 2-day prediction, we fit a line to the last 5 days of data and extrapolate it to forecast the total confirmed, total recovered, and total deaths after 2 days, each separately. For predictions over a longer period, a single linear fit may not capture the data well, so we modify the model slightly. Instead of extrapolating one linear fit directly into the future, we predict day by day: a linear fit to the data from day (t-5) to day (t-1) predicts day t; the data from day (t-4) to day t, including the value just predicted, then give the value at day (t+1); and so on until we reach the required date.
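The two variants described above can be sketched as follows. This is a minimal illustration under our own naming, not the team's actual code: `linear_fit`, `direct_forecast`, and `rolling_forecast` are hypothetical helper names, and the sample numbers are invented.

```python
def linear_fit(ys):
    """Least-squares line through the points (0, ys[0]), ..., (n-1, ys[n-1]).

    Returns (slope, intercept).
    """
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def direct_forecast(series, horizon=2, window=5):
    """Fit the last `window` days once and extrapolate `horizon` days ahead."""
    slope, intercept = linear_fit(series[-window:])
    # The last observed day sits at x = window - 1.
    return intercept + slope * (window - 1 + horizon)


def rolling_forecast(series, horizon, window=5):
    """Predict one day at a time, feeding each prediction back into the window."""
    history = list(series)
    for _ in range(horizon):
        slope, intercept = linear_fit(history[-window:])
        history.append(intercept + slope * window)  # next day is x = window
    return history[len(series):]


if __name__ == "__main__":
    confirmed = [1200, 1350, 1490, 1660, 1800]  # invented cumulative counts
    print(direct_forecast(confirmed, horizon=2))
    print(rolling_forecast(confirmed, horizon=7))
```

The same functions would be called separately on the confirmed, recovered, and death series, since the model treats each of them on its own.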

Up to this point, our naive model treats confirmed, recovered, and deaths independently, which shouldn’t be the case. At least one trivial constraint needs to be satisfied: the total number of confirmed cases must not be less than the sum of total recovered cases and total deaths. If a prediction violates this constraint, we modify it slightly by adding some random noise that reduces the over-predicted numbers of deaths and recovered.
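One possible way to apply such a correction is sketched below. How exactly the noise enters is not spelled out above, so this is only one reading: the function name `enforce_constraint` and the choice to split the excess between recovered and deaths with a uniformly random share are assumptions made for illustration.

```python
import random


def enforce_constraint(confirmed, recovered, deaths, rng=None):
    """Return (confirmed, recovered, deaths) with recovered + deaths <= confirmed.

    Assumption for illustration: when the rule is violated, the excess is
    removed from recovered and deaths, split by a random share.
    """
    rng = rng or random.Random()
    excess = recovered + deaths - confirmed
    if excess <= 0:
        return confirmed, recovered, deaths  # constraint already satisfied

    share = rng.random()  # random fraction of the excess taken from recovered
    cut_recovered = min(recovered, share * excess)
    cut_deaths = min(deaths, excess - cut_recovered)
    # If deaths could not absorb its part, recovered takes the remainder.
    cut_recovered = excess - cut_deaths
    return confirmed, recovered - cut_recovered, deaths - cut_deaths


if __name__ == "__main__":
    # Invented predictions that violate the rule (80 + 30 > 100).
    print(enforce_constraint(100, 80, 30, rng=random.Random(0)))
```

After the correction, recovered plus deaths equals confirmed (up to rounding), and neither value becomes negative as long as the confirmed prediction itself is non-negative.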

The fact that such a linear regression model can be successful suggests that the epidemic is still seriously affecting our daily lives and that we should continue to keep social distance. Some of you may have also noticed that some countries have started to ease their lockdowns, after which the infection rate has started to increase again, possibly leading to a second wave of cases. Given how easily the virus spreads, easing a lockdown should be considered only when there have been no new local infections for at least 2 weeks.

Lastly, we are interested in how accurately modeling can forecast real-life data. We are still working on other models and hope to also include mobility data in the prediction to make our model more accurate in the future. We hope to gain more insight into better modeling through this Datathon as the course goes on.

If you are interested, you may take a look at our current linear regression model at:

Stefanelli's team uses an ARIMA-based time-series model to predict the evolution of COVID-19 case numbers.

Stefanelli’s team:

Marcello Stefanelli (33, individual participant) obtained a degree in Economics and Statistics at Università degli Studi di Pavia. I’m a Business Strategy Manager in a global consulting firm. I’m passionate about public and private health care systems management and, of course, data science.

The approach applied to predict daily time series is based on a classical statistical model for analyzing and forecasting time-series data, ARIMA (Autoregressive Integrated Moving Average). As highlighted by the Columbia University Mailman School of Public Health, ARIMA models are used in epidemiology for purposes ranging from outbreak detection in the area of infectious diseases to the evaluation of population-level health interventions in the form of interrupted time-series analysis.

However, there are some constraints that we have to take into account:

o ARIMA models do not predict rare “black swan” events, as these leave no pattern in the time series.

o The data sample must be consistent; the model's performance can be biased when the number of observations is small, as happens in the early stages of the infection.

I started adopting this methodology on official data for Italy, with the aim of assessing whether there is some degree of correlation between daily COVID-19 outbreak observations and the volatility of daily economic variables (e.g., stock market volatility transmission modeled with GARCH time-series analysis).

What I’m interested in is predicting the daily observations, treated as “flow” variables, rather than the stock (cumulative) variables, which I derive as a function of the daily observations. Therefore I’ve applied ARIMA models to flow time-series variables (e.g., daily cases, daily recovered cases), obtaining better performance as the number of observations increases. Below are some examples of observed data and forecasts for daily time series in Italy.
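The flow-versus-stock idea can also be illustrated with a short, self-contained sketch. It is not the analysis behind the Italian results and uses invented numbers. It assumes the simplest non-trivial specification, ARIMA(1,1,0) without a constant, fitted by conditional least squares on the first differences of the daily series; a full ARIMA workflow would also select the orders and estimate moving-average terms. The function names are hypothetical.

```python
def fit_arima_110(daily):
    """Estimate phi in d_t = phi * d_(t-1) + e_t, where d is the first
    difference of the daily (flow) series. Needs at least 3 observations."""
    diffs = [b - a for a, b in zip(daily, daily[1:])]
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den if den else 0.0


def forecast_daily(daily, steps):
    """Forecast the flow variable `steps` days ahead with the fitted model."""
    phi = fit_arima_110(daily)
    level = daily[-1]
    diff = daily[-1] - daily[-2]
    forecasts = []
    for _ in range(steps):
        diff = phi * diff                   # expected next difference
        level = level + diff                # integrate back to the daily level
        forecasts.append(max(level, 0.0))   # reported daily counts cannot be negative
    return forecasts


def to_cumulative(start_total, daily_values):
    """Derive the stock (cumulative) variable from forecast daily flows."""
    totals, running = [], start_total
    for value in daily_values:
        running += value
        totals.append(running)
    return totals


if __name__ == "__main__":
    daily_cases = [4050, 4782, 4668, 4585, 4805, 4316, 3599, 3039]  # invented
    next_days = forecast_daily(daily_cases, steps=3)
    print(next_days)
    print(to_cumulative(sum(daily_cases), next_days))
```

Forecasting the daily flow and summing afterwards keeps the cumulative series non-decreasing by construction, which a model fitted directly to cumulative totals does not guarantee.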

This project has given me the opportunity to enrich my knowledge of epidemiological data-generating processes and of how time-series analysis can help predict outbreak evolution at a broader level.

Furthermore, it’s my first time in this kind of “competition”, and it’s thrilling for me despite the lockdown and all its impacts on our everyday life.

All my analyses are performed using GRETL, an open-source econometrics software package. I’m considering extending my bottom-up approach to other countries, which so far I’ve analyzed with a top-down approach, as some other participants did.

I’m also interested in sharing my results on the socio-economic impacts of COVID-19 and in providing policy recommendations for Western governments on how to deal with pandemics in a global economy.

Main references:

Estimation of COVID-19 prevalence in Italy, Spain, and France - Z. Ceylan

Mathematical Modeling and Epidemic Prediction of COVID-19 and Its Significance to Epidemic Prevention and Control Measures - Yichi Li

Application of the ARIMA model on the COVID-2019 epidemic dataset - Domenico Benvenuto

Coronavirus (COVID-19): ARIMA based time-series analysis to forecast near future - Hiteshi Tandon 2020.