Naive Linear Regression Model Prediction
In this week's blog post, we present the linear-regression approach of the C.F.R.S.S team.
Our team C.F.R.S.S consists of five ETH Zurich students from different backgrounds: ManHin Cheng (Master in Physics), Leonardo Fossati (Master in Physics), Philipp Rupp (Master in Electrical Engineering), Antoine Suter (Bachelor in Computer Science), and Giona Sala (Bachelor in Physics). As you may have noticed, our team name is made up of the first letters of our surnames.
You may be surprised by the method we currently use for prediction: it is simply a linear regression model, although we are still working on better models that capture more features. In this linear model, we assume that the data evolve as a short-memory Markov process, so that only the most recent days matter for the next value. For the 2-day prediction, we fit a straight line to the last 5 days of data and extrapolate it to forecast the total confirmed cases, total recoveries, and total deaths 2 days ahead, treating each series separately. For longer horizons, a single linear fit may not capture the data well, so the model is slightly modified: instead of extrapolating one fit directly into the future, we predict consecutively, day by day. That is, a linear fit to the data from day t-5 to day t-1 predicts day t; a fit to the data from day t-4 to day t (including the value just predicted for day t) then gives day t+1, and so on until we reach the required date.
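The two forecasting schemes described above can be sketched in Python as follows. This is a minimal sketch under our own assumptions, not the team's linked code: it uses NumPy's `np.polyfit` for the least-squares line, and the function names `direct_forecast` and `rolling_forecast` and the `window` parameter are hypothetical. Each series (confirmed, recovered, deaths) would be passed in separately.

```python
import numpy as np

WINDOW = 5  # number of past days used in each linear fit, as described in the post


def direct_forecast(series, horizon, window=WINDOW):
    """Fit a straight line to the last `window` values and extrapolate `horizon` days ahead."""
    if len(series) < window:
        raise ValueError("need at least `window` observations")
    y = np.asarray(series[-window:], dtype=float)
    x = np.arange(window)  # day indices 0 .. window-1 inside the fitting window
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line, highest power first
    # The last observed day has index window-1, so the target day is window-1+horizon.
    return slope * (window - 1 + horizon) + intercept


def rolling_forecast(series, horizon, window=WINDOW):
    """Predict one day at a time; each prediction is appended and the window slides forward."""
    history = [float(v) for v in series]
    for _ in range(horizon):
        history.append(direct_forecast(history, horizon=1, window=window))
    return history[-horizon:]
```

For example, `direct_forecast(confirmed, horizon=2)` gives the 2-day prediction, while `rolling_forecast(confirmed, horizon=7)` produces a day-by-day forecast for a week, where `confirmed` is a hypothetical list of daily cumulative counts. The two schemes agree on perfectly linear data and generally diverge otherwise, because the rolling fit keeps re-anchoring on its own most recent predictions.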
Up to this point, our simple naive model treats confirmed cases, recoveries, and deaths independently, which should not be the case. At least one trivial constraint must be satisfied: the total number of confirmed cases must not be less than the sum of total recoveries and total deaths. To take this constraint into account, whenever a prediction violates it, we modify the prediction slightly, adding some random noise to reduce the over-predicted numbers of recoveries and deaths.
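The post does not specify the exact form of the random noise, so the following is only one possible interpretation, written as a minimal sketch with a hypothetical function name `enforce_constraint`. It computes the excess of recoveries plus deaths over confirmed cases and removes it by cutting a uniformly random share from recoveries and the remainder from deaths, bounding the random draw so that neither count becomes negative. It assumes the confirmed count is non-negative.

```python
import numpy as np


def enforce_constraint(confirmed, recovered, deaths, rng=None):
    """Ensure recovered + deaths <= confirmed by removing the excess at random.

    Assumption (not from the post): the excess is split between recoveries and
    deaths via a uniform random draw; confirmed cases are left unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    excess = recovered + deaths - confirmed
    if excess <= 0:
        return confirmed, recovered, deaths  # constraint already satisfied
    # Cut from recoveries an amount in [max(0, excess - deaths), min(recovered, excess)],
    # so that both adjusted counts stay non-negative; deaths absorb the rest.
    low = max(0.0, excess - deaths)
    high = min(recovered, excess)
    cut_recovered = rng.uniform(low, high)
    cut_deaths = excess - cut_recovered
    # After the cut, recovered + deaths equals confirmed (up to float rounding).
    return confirmed, recovered - cut_recovered, deaths - cut_deaths
```

Passing a seeded generator, e.g. `enforce_constraint(100.0, 80.0, 30.0, rng=np.random.default_rng(0))`, makes the adjustment reproducible. A deterministic alternative would be to scale recoveries and deaths down proportionally; the random split is closer to the noise-based correction described above.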
As a side note, the fact that such a simple linear regression model can be successful hints that the epidemic is still seriously affecting our daily lives and that we should continue to keep social distance. Some of you may also have noticed that some countries have started to ease their lockdowns, and that their infection rates have begun to rise again, possibly leading to a second wave of cases. Because the virus spreads so easily, easing a lockdown should be considered only when there have been no new local infections for at least 2 weeks.
Lastly, we are interested in how accurately modeling can forecast real-life data. We are still working on another model and hope to also include mobility data in the prediction to make our model more accurate in the future. We hope to gain more insight into better modeling through this Datathon as the course goes on.
If you are interested, you may take a look at our current linear regression model at: