- Lucas Boettcher
In this week's blog post, we have the pleasure of taking a look at the prediction methodology of the "GNTM team". Their autoregressive integrated moving average (ARIMA) model is currently outperforming the other prediction frameworks (see our prediction score table).
Our team consists of five ETH Zürich Master’s students from Greece:
Iliana Papadopoulou (team leader) obtained a Diploma in Electrical & Computer Engineering and has been a research member at the Diana Bioinformatics lab. She is a Master's student in Computational Biology & Bioinformatics. Her interests are predictive models for therapeutic drugs and cancer research. She is about to start an internship at Roche.
Chrysa Papadopoulou completed her Diploma in Electrical and Computer Engineering. She worked on the comparison of miRNA-mRNA target interactions as a research member at the Diana Bioinformatics group. She is a Master's student in Computational Biology & Bioinformatics. Currently, she is working on the inference of B cell phylogenetic networks.
Athina Nisioti has received a degree in Mathematics. She is currently in her second semester of the Data Science Master’s program. She is particularly interested in applications of statistics and machine learning.
Dimitrios Gkouletsos has received a Mechanical Engineering Diploma and has worked as a Research Engineer at CERTH Greece. Currently, he is pursuing his Master's degree in Robotics, Systems & Control. His main interests include advanced control theory, optimization and design with applications in intelligent robots, UAVs and refrigeration cycles.
Marina Panteli has received a Chemical Engineering Diploma and is now a Computational Biology and Bioinformatics Master’s student. She is mainly interested in applying computational techniques in innovative areas of synthetic biology and in studying population dynamics of infectious diseases and ways of limiting their spread.
Our approach is based on a statistical model for analyzing and forecasting time-series data, the Autoregressive Integrated Moving Average model, ARIMA(p,d,q). Since the cases in the data are reported cumulatively, the time series is not stationary, and we therefore need the “integrated” part of the ARIMA model. The parameter d, the number of times consecutive observations are differenced, has to be estimated so that the differenced time series is stationary. To check stationarity we use the augmented Dickey-Fuller statistical test at the α = 0.1 significance level.
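To make the role of d more concrete, here is a minimal sketch of differencing in plain Python. The helper names (`difference`, `undifference`) and the case numbers are made up for illustration and are not taken from our code. The stationarity check itself (the augmented Dickey-Fuller test) is not implemented here; in practice it would be run on the differenced series after each round.

```python
def difference(series, d=1):
    """Apply first-order differencing d times (the "I" in ARIMA)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference(diff_forecasts, last_observed):
    """Turn forecasts of first differences (d=1 only) back into cumulative values."""
    out, level = [], last_observed
    for step in diff_forecasts:
        level += step
        out.append(level)
    return out


# Hypothetical cumulative confirmed cases (illustrative numbers, not real data).
cumulative = [1, 3, 6, 10, 15, 21, 28]

daily_new = difference(cumulative, d=1)        # new cases per day
change_in_daily = difference(cumulative, d=2)  # day-to-day change in new cases

print(daily_new)        # [2, 3, 4, 5, 6, 7]
print(change_in_daily)  # [1, 1, 1, 1, 1]

# Forecasts made on the differenced scale are summed back onto the last observation.
print(undifference([8, 9], cumulative[-1]))  # [36, 45]
```

Each round of differencing shortens the series by one observation, which is one reason to keep d as small as the stationarity test allows.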
For the estimation of the remaining parameters, p and q, we use the Akaike information criterion (AIC) in a grid search provided by auto_arima, a function available in Python. To avoid overfitting, we restrict ourselves to small values of p and q. This is especially helpful for countries with shorter time series. Apart from the point predictions for the confirmed, recovered and deceased cases, we also calculate 95% confidence intervals, which give us a better overview of the uncertainty of the predicted values. Currently, we are working on enriching our predictions and are considering adding more groups of interest, such as the number of people getting tested for COVID-19.
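The idea behind an AIC-based grid search can be sketched without any external library. The example below is a simplified stand-in, not what auto_arima does internally: it only considers pure autoregressive models AR(p), fits them by ordinary least squares, and uses a synthetic series. All names (`solve`, `fit_ar`, `MAX_P`) are hypothetical, and the AIC is computed only up to an additive constant that is the same for every candidate.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ar(y, p, max_p):
    """Least-squares AR(p) fit with intercept; returns (coefficients, AIC).

    Every order is fitted on the same targets y[max_p:], so the AIC
    values of different orders are comparable.
    """
    rows = [[1.0] + [y[t - k] for k in range(1, p + 1)]
            for t in range(max_p, len(y))]
    targets = y[max_p:]
    k = p + 1  # intercept plus p lag coefficients
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    beta = solve(XtX, Xty)
    rss = sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, t in zip(rows, targets))
    n = len(targets)
    # Gaussian AIC up to a constant; the +1 counts the noise variance.
    aic = n * math.log(rss / n) + 2 * (k + 1)
    return beta, aic


# Synthetic stationary AR(2) series (assumed data, for illustration only).
random.seed(0)
y = [0.0, 0.0]
for _ in range(500):
    y.append(0.6 * y[-1] - 0.3 * y[-2] + random.gauss(0, 1))

MAX_P = 3  # keeping the grid small limits overfitting on short series
aic_by_order = {p: fit_ar(y, p, MAX_P)[1] for p in range(MAX_P + 1)}
best_p = min(aic_by_order, key=aic_by_order.get)
print(aic_by_order)
print("selected p =", best_p)
```

The penalty term grows with the number of parameters, so a larger model is selected only if it reduces the residual error enough to pay for its extra coefficients. Moving-average terms (q > 0) need iterative likelihood estimation, which is why they are left out of this sketch.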
We are very intrigued by the idea of participating in a project relevant to this pandemic, which has afflicted the entire globe and ignited the scientific community. Our passion for data science, for employing and enhancing existing tools, along with our prior experience, has motivated us to actively participate in the Epidemic Datathon and to contribute to the scientific effort during these seemingly unpredictable times.
Out of respect for social distancing etiquette, we have been working very hard and have dedicated many hours to video calls in order to develop the most accurate and effective prediction model we can.
P.S. If you are curious about our team name… it comes from “Greece’s Next Top (epidemic) Model”
The relevant code and additional supporting material are provided in the following link: