• Lucas Boettcher

In this week's blog post, we have the pleasure of taking a look at the prediction methodology of the "GNTM team". Their autoregressive moving-average model is currently outperforming the other prediction frameworks (see our prediction score table).

Our team consists of five ETH Zürich Master’s students from Greece:

Iliana Papadopoulou (team leader) obtained a Diploma in Electrical & Computer Engineering and has been a research member at the Diana Bioinformatics lab. She is a Master’s student in Computational Biology & Bioinformatics. Her interests are predictive models for therapeutic drugs and cancer research. She is about to start an internship at Roche.

Chrysa Papadopoulou completed her Diploma in Electrical and Computer Engineering. She was involved in the comparison of miRNA-mRNA target interactions as a research member of the Diana Bioinformatics group. She is a Master’s student in Computational Biology & Bioinformatics. Currently, she is working on the inference of B cell phylogenetic networks.

Athina Nisioti has received a degree in Mathematics. She is currently in her second semester of the Data Science Master’s program. She is particularly interested in applications of statistics and machine learning.

Dimitrios Gkouletsos has received a Mechanical Engineering Diploma and has worked as a Research Engineer at CERTH Greece. Currently, he is pursuing his Master’s degree in Robotics, Systems & Control. His main interests include advanced control theory, optimization and design with applications in intelligent robots, UAVs and refrigeration cycles.

Marina Panteli has received a Chemical Engineering Diploma and is now a Computational Biology and Bioinformatics Master’s student. She is mainly interested in applying computational techniques in innovative areas of synthetic biology and in studying population dynamics of infectious diseases and ways of limiting their spread.

The approach is based on a statistical model for analyzing and forecasting time-series data, the autoregressive integrated moving average model, ARIMA(p,d,q). Since the case numbers are reported cumulatively, the time series is not stationary, and we therefore need the “integrated” part of the ARIMA model. The parameter d, the number of times consecutive observations are differenced, must be estimated so that the differenced time series is stationary. To check stationarity, we use the augmented Dickey-Fuller test at the α = 0.1 significance level.
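To illustrate the “integrated” part, here is a minimal sketch of differencing in plain Python. The helper name `difference` and the case numbers are hypothetical and not taken from the team's code; the stationarity test itself is not reproduced here.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing to a sequence of numbers."""
    values = list(series)
    for _ in range(d):
        # Each round replaces the series by the gaps between consecutive entries.
        values = [later - earlier for earlier, later in zip(values, values[1:])]
    return values


# Hypothetical cumulative confirmed-case counts for one country.
cumulative_cases = [100, 130, 170, 225, 290, 370]

# d = 1 turns cumulative totals into daily new cases.
daily_new_cases = difference(cumulative_cases, d=1)

# d = 2 gives the day-to-day change in daily new cases.
daily_change = difference(cumulative_cases, d=2)
```

In practice, d would be increased step by step until a stationarity test on the differenced series passes; each round shortens the series by one observation.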

To estimate the remaining parameters of the model, we use the Akaike information criterion (AIC) in a grid search provided by auto_arima, a function available in Python. To avoid overfitting, we restrict ourselves to small values of p and q. This is especially helpful for countries with shorter time series. Apart from the point predictions for the confirmed, recovered and deceased cases, we also calculate the 95% confidence intervals, which give us a better overview of the uncertainty of the predicted values. Currently, we are working on enriching our predictions with further groups of interest, such as the number of people getting tested for COVID-19.
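For reference, the ARIMA(p,d,q) model and the AIC used to rank candidate orders can be written in standard textbook notation as follows. The symbols B (backshift operator, B y_t = y_{t-1}), φ_i, θ_j, c, ε_t (white noise), k (number of estimated parameters) and L̂ (maximized likelihood) do not appear in the description above and are introduced here only for illustration.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i B^i\right) (1 - B)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^j\right) \varepsilon_t,
\qquad
\mathrm{AIC} = 2k - 2 \ln \hat{L}
```

Among the candidate pairs (p, q) in the grid, the one with the lowest AIC is preferred; since k grows with p and q, the criterion penalizes larger models.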

We are very intrigued by the idea of participating in a project relevant to this pandemic, which has afflicted the entire globe and galvanized the scientific community. Our passion for Data Science, for employing and enhancing existing tools, along with our prior experience, has motivated us to actively participate in the Epidemic Datathon and to contribute to the scientific effort during these seemingly unpredictable times.

Out of respect for social distancing etiquette, we have been working very hard and have spent many hours on video calls in order to develop the most accurate and effective prediction model.

P.S. If you are curious about our team name… it comes from “Greece’s Next Top (epidemic) Model”.

The relevant code and additional supporting material are provided in the following link:

  • Lucas Boettcher

Updated: Apr 19, 2020

Three weeks after the launch of the Epidemic Datathon, the team "stayhome" has the highest prediction score thanks to their accurate case number predictions for a large number of countries. In this blog post, we want to give the members of the team "stayhome" a chance to introduce themselves and briefly describe their methodology.

About the team "stayhome"

We are Stefan Strub (25) and Hannes Löbner (24), two students who started their PhDs at the beginning of the year in signal processing of gravitational wave recordings and medical physics at D-ERDW and D-ITET at ETH Zurich. We started studying physics together at ETH back in 2014 and, apart from exchange semesters abroad in Taiwan and Stockholm, we have been good friends and made it through the Bachelor and Master together. Our first programming experience was in the second semester with “Numerische Methoden für Physiker”, which opened up a new and, in the beginning, quite difficult world to us. However, having finished our Master's theses on simulation and optimization, we now feel quite comfortable in the programming world.

Predicting case numbers

There are basically two versions of the code: the first is written as a script and is our main code; the other is cleaner and object-oriented. Our predictions are based on the Johns Hopkins University datasets.

All code is publicly accessible at:

Due to the current lockdown, we were not able to meet in person, so each of us started on his own to get an overview of the data, how it is organized and how to handle it. The class structure of the second version is not yet implemented in the first code, but will be in the next few days. Because the data represent the actual case numbers poorly (due to insufficient testing or unreliable communication), we refrained from implementing an actual model with R0, R1, … etc. Further, since the developments in each country are quite different, and it would take a lot of time to implement factors such as overfilled hospitals or age-dependent demographics, we do not infer the development in one country from that in another. Instead, we fit exponential functions to the last 5 days of the averaged new confirmed cases for each country. This enables us to predict the average number of new confirmed cases and therefore the total number of confirmed cases.
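The fitting step described above could look roughly like the following sketch. It is not the team's code: the function names, the 3-day averaging window, the log-linear least-squares fit and the case numbers are all assumptions made for illustration.

```python
import math


def moving_average(values, window):
    """Trailing moving average; the first window - 1 entries are dropped."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


def fit_exponential(y):
    """Least-squares fit of log(y) = log(a) + b * t for t = 0, 1, ...

    Returns (a, b) such that y is approximated by a * exp(b * t).
    All values in y must be strictly positive.
    """
    n = len(y)
    t = list(range(n))
    log_y = [math.log(v) for v in y]
    t_mean = sum(t) / n
    log_mean = sum(log_y) / n
    slope_num = sum((ti - t_mean) * (li - log_mean) for ti, li in zip(t, log_y))
    slope_den = sum((ti - t_mean) ** 2 for ti in t)
    b = slope_num / slope_den
    a = math.exp(log_mean - b * t_mean)
    return a, b


def predict_total_cases(cumulative, horizon, fit_days=5, window=3):
    """Extrapolate smoothed daily new cases and add them to the last total."""
    new_cases = [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]
    recent = moving_average(new_cases, window)[-fit_days:]
    a, b = fit_exponential(recent)
    total = cumulative[-1]
    forecast = []
    for step in range(1, horizon + 1):
        t = len(recent) - 1 + step  # continue the time axis of the fit
        total += a * math.exp(b * t)
        forecast.append(total)
    return forecast


# Hypothetical cumulative confirmed cases for one country.
cumulative = [100, 110, 122, 137, 155, 177, 203, 235, 273]
forecast = predict_total_cases(cumulative, horizon=3)
```

Fitting in log space keeps the sketch dependency-free, but it requires strictly positive averaged case numbers; a nonlinear least-squares fit would be an alternative.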

Predicting the case fatality rate

To predict the number of deaths, we use grid search, a tool from inverse theory, to estimate the death latency and the final case fatality rate (CFR) of the disease. Knowing these two numbers, we can shift in time and scale down the averaged new confirmed cases so that they match the averaged new deaths. This lets us predict the deaths from the confirmed cases, given that the two numbers are correlated. The advantage of this method is that a new development in the outbreak, for example due to a lockdown, that is already visible in the number of confirmed cases can hopefully be carried over to the number of deaths before it shows up there.
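A grid search of this kind might be sketched as follows. The function names, the mean-squared-error criterion, the grids and the synthetic data (deaths generated with a known latency of 3 days and a CFR of 5%) are assumptions for illustration, not the team's implementation.

```python
def grid_search_latency_cfr(new_cases, new_deaths, latencies, cfrs):
    """Find the (latency, cfr) pair that best maps cases onto deaths.

    Deaths on day t are compared with cfr * cases on day t - latency,
    and the pair with the lowest mean squared error is returned.
    """
    best = None
    for latency in latencies:
        # Pair cases on day i with deaths on day i + latency.
        pairs = list(zip(new_cases, new_deaths[latency:]))
        if not pairs:
            continue
        for cfr in cfrs:
            error = sum((cfr * c - d) ** 2 for c, d in pairs) / len(pairs)
            if best is None or error < best[0]:
                best = (error, latency, cfr)
    if best is None:
        raise ValueError("no overlap between cases and deaths for any latency")
    return best[1], best[2]


def predict_new_deaths(new_cases, latency, cfr, n_observed_deaths):
    """Project deaths for the days after the last observed one."""
    return [cfr * new_cases[t - latency]
            for t in range(n_observed_deaths, len(new_cases) + latency)
            if t - latency >= 0]


# Synthetic averaged daily new cases (hypothetical numbers).
new_cases = [10, 20, 40, 80, 160, 200, 220, 210, 180, 150, 120, 100]
# Synthetic deaths built with a known latency of 3 days and a CFR of 5%.
new_deaths = [0, 0, 0] + [0.05 * c for c in new_cases[:-3]]

latency, cfr = grid_search_latency_cfr(
    new_cases, new_deaths,
    latencies=range(0, 6),
    cfrs=[i / 100 for i in range(1, 11)],
)
projected_deaths = predict_new_deaths(new_cases, latency, cfr, len(new_deaths))
```

Because reported cases lead deaths by the latency, the projection covers as many future days as the estimated latency without needing any case forecast.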
