Monday, March 23, 2020

COVID-19 data analysis using Pentaho tools



The world is going through a pandemic caused by the novel coronavirus (the disease is called COVID-19). To help understand its impact around the world, JHU's coronavirus research center has provided data sets.
Data Source: https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases

For this analysis, I chose the global "narrow" data sets (all dates in one column) for ETL processing. The source also offers "wide" data sets, in which each date is a separate column.
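If only the wide files are at hand, they can be unpivoted into the narrow shape before loading. Below is a minimal Python sketch. It assumes the identifier columns are named `Province/State`, `Country/Region`, `Lat`, and `Long` and that every other column is a date. These names are assumptions, so check them against the downloaded file.

```python
import csv
import io

# Assumed identifier columns; every other column is treated as a date.
ID_COLUMNS = ["Province/State", "Country/Region", "Lat", "Long"]


def wide_to_narrow(wide_csv):
    """Unpivot a wide CSV (one column per date) into narrow rows (one row per date)."""
    reader = csv.DictReader(io.StringIO(wide_csv))
    date_columns = [c for c in reader.fieldnames if c not in ID_COLUMNS]
    narrow = []
    for row in reader:
        for day in date_columns:
            record = {col: row[col] for col in ID_COLUMNS}
            record["Date"] = day        # the former column header
            record["Value"] = row[day]  # the count for that date
            narrow.append(record)
    return narrow
```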


Pentaho Data Integration (PDI) was used here. A job was designed to download the three files (Confirmed, Recovered, Deaths), and a transformation was created to load the data into a MySQL table. Next, I used the Data Source Wizard (DSW) in the Pentaho User Console (PUC) to build a Mondrian-based model, and created Pentaho Analyzer (PAZ) reports on it to place in a Pentaho Dashboard Designer (PDD) dashboard. Snapshots of the process are below.
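The load step of the transformation can be approximated outside PDI. This is a hypothetical Python sketch, not the actual transformation. It uses the standard-library `sqlite3` module as a stand-in for MySQL, and the table name `covid_cases` and the assumed column headers are illustrative only. A real downloaded file may need extra cleanup, such as skipping non-data header rows.

```python
import csv
import io
import sqlite3


def load_narrow_csv(conn, csv_text, case_type):
    """Insert narrow rows for one file (e.g. 'Confirmed') and return the row count.

    sqlite3 stands in for MySQL; table and column names are hypothetical.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS covid_cases (
               country TEXT, province TEXT, report_date TEXT,
               case_type TEXT, value INTEGER)"""
    )
    rows = [
        (r["Country/Region"], r["Province/State"], r["Date"], case_type, int(r["Value"]))
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    # Parameterized insert: one row per (country, province, date).
    conn.executemany("INSERT INTO covid_cases VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

Loading each of the three files with its own `case_type` keeps Confirmed, Recovered, and Deaths in a single table that a reporting model can slice by type.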

ETL via Pentaho Data Integration:

Dashboard for analysis:

As you can see, the dashboard combines three reports, which are controlled through two dashboard prompts (country and date). The confirmed-cases report makes it clear that China's curve has already flattened, whereas other countries are still trending upward. The next most prominent country is Italy, where we have also seen many deaths, and improving recoveries is still a challenge (see the recovered report).
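Each dashboard prompt acts as a filter applied to all three reports. Here is a minimal sketch of that filtering. It assumes hypothetical records with `country` and ISO-formatted `date` fields, since ISO strings such as `2020-03-22` compare correctly as text.

```python
def filter_by_prompts(records, countries, start, end):
    """Keep records whose country is selected and whose ISO date is in [start, end]."""
    selected = set(countries)
    return [
        r for r in records
        if r["country"] in selected and start <= r["date"] <= end
    ]
```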

Mar 22, 2020

Mar 27, 2020

As you can see, the US and Italy have surpassed China in confirmed cases and deaths, but their recovery numbers have yet to grow.


Hoping the recovery curve rises in the weeks to come, the sooner the better.

Apr 3, 2020

Confirmed cases are still rising (each country's chart is scaled independently). It will take some time before the recovery and confirmed lines converge (or come close, as they have for China). Hoping it happens soon.
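One way to measure how close the recovery and confirmed lines are is to count the cases still open each day. Here is a minimal sketch. It assumes aligned, date-ordered cumulative lists of confirmed, recovered, and death counts for one country. This metric is an illustration and not a calculation taken from the dashboard.

```python
def open_cases(confirmed, recovered, deaths):
    """Cases not yet closed on each day: confirmed minus recovered minus deaths."""
    return [c - r - d for c, r, d in zip(confirmed, recovered, deaths)]
```

A shrinking result over time means the two lines are converging.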



Social Distancing Does Matter: a Python program run via PDI shows how a spread variable can change the size of the infected population:
- Spread = 1: with no social distancing, the entire population of 100K was infected.
- Spread = 0.5 (for every 1 infected person, 1 other person maintained social distancing): the infected population came down to 80K.
- Spread = 0.25 (1 infected, 3 maintained social distancing): the infected population came down to 35K.
- Spread = 0.2 (1 infected, 4 maintained social distancing): the infected population came down to 20K.
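The original Python program is not shown here, so the following is only a hypothetical sketch of the same idea using a simple discrete-time SIR (susceptible-infected-recovered) model. It assumes `spread` scales the daily transmission rate. The values for `contact_rate`, `recovery_rate`, `initial_infected`, and `days` are illustrative, and the sketch is not expected to reproduce the 80K/35K/20K figures above.

```python
def total_infected(population=100_000, spread=1.0, initial_infected=10,
                   contact_rate=0.5, recovery_rate=0.1, days=1000):
    """Discrete-time SIR model in which `spread` scales the transmission rate.

    Returns the cumulative number of people ever infected after `days` days.
    All default parameter values are illustrative assumptions.
    """
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    recovered = 0.0
    beta = contact_rate * spread  # effective daily transmission rate
    for _ in range(days):
        new_infections = beta * susceptible * infected / population
        new_recoveries = recovery_rate * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        recovered += new_recoveries
    return population - susceptible
```

Calling `total_infected(spread=s)` for s = 1, 0.5, 0.25, and 0.2 gives a shrinking total. That is the qualitative point above: the lower the spread, the fewer people are infected.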



Apr 13, 2020:

This chart captures the daily delta (new cases compared with the previous day) in confirmed COVID-19 cases, and it shows an early sign that the US curve has started flattening. Other European countries show a similar trend. Over the next few weeks, it will be important to keep up social distancing to flatten these curves completely!
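The daily delta is the difference between consecutive cumulative counts. Here is a minimal sketch, assuming a date-ordered list of cumulative confirmed cases. The result has one fewer element, because the first day has no previous day to compare against.

```python
def daily_delta(cumulative):
    """New cases per day: each day's cumulative value minus the previous day's."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]
```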

Apr 27, 2020:

The scale is independent for each country.
The recovery curve is still progressing slowly in the US, and the UK's recovery numbers are very small. Germany and Spain are on a better recovery trajectory.

July 29, 2020:

As you can see, cases are now growing in countries such as Brazil and India. Here is a chart of the daily delta in confirmed cases.

Here is the trend by country as of nearly the end of July 2020. The US, India, and Brazil are trending upward, as is Spain. Germany, Italy, and the UK are clearly tending toward a stable curve.



Chart of the deaths trend by country:

Sep 10, 2020:

Looking at the daily delta (new cases compared with the previous day), India has overtaken Brazil and the US.
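To compare countries on this measure, rank them by their most recent day-over-day increase. Here is a minimal sketch, assuming a hypothetical mapping from country name to a date-ordered list of cumulative confirmed cases. Countries with fewer than two data points are skipped.

```python
def rank_by_latest_delta(cumulative_by_country):
    """Order countries by their most recent day-over-day increase, largest first."""
    latest = {
        country: series[-1] - series[-2]
        for country, series in cumulative_by_country.items()
        if len(series) >= 2
    }
    return sorted(latest, key=latest.get, reverse=True)
```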

Recovery pattern analysis: in Brazil, India, and Germany, the recovery curve stays close to the confirmed-cases curve, whereas in Spain, the US, and Italy the two curves are not close. The UK's recovery numbers may be an exception to this.
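One way to put a number on "close" is to express recoveries as a share of confirmed cases. A ratio near 1 means the two curves are close. Here is a minimal sketch, assuming aligned cumulative lists for one country. The function name is hypothetical.

```python
def recovery_ratio(confirmed, recovered):
    """Recovered cases as a fraction of confirmed cases (None when nothing is confirmed)."""
    return [r / c if c else None for c, r in zip(confirmed, recovered)]
```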

Nov 11, 2020:

US confirmed cases are increasing rapidly. The current daily numbers are more than twice the July peak.
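The comparison with the July peak is the latest daily value divided by the highest daily value in a chosen window. Here is a minimal sketch, assuming a hypothetical mapping from ISO date strings to daily new cases that includes at least one date inside the window.

```python
def ratio_to_peak(daily_by_date, peak_start, peak_end):
    """Latest daily value divided by the highest daily value in [peak_start, peak_end].

    Keys are ISO date strings (e.g. '2020-07-15'), which sort chronologically.
    """
    window = [v for d, v in daily_by_date.items() if peak_start <= d <= peak_end]
    latest = daily_by_date[max(daily_by_date)]  # value on the most recent date
    return latest / max(window)
```

A result above 2 corresponds to "more than twice the peak".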

Nov 19, 2020:

Looking at the 5-day moving average of confirmed cases, it's clear that cases in the US are going up.
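A 5-day moving average smooths out day-to-day reporting noise by averaging each day with the four days before it. Here is a minimal trailing-window sketch. The trailing window is an assumption; a centered window is another common choice.

```python
def moving_average(values, window=5):
    """Trailing moving average; the first window-1 days have no value (None)."""
    averages = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        averages.append(running / window if i >= window - 1 else None)
    return averages
```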

Apr 12, 2021:

As the world gradually moves toward vaccination and reopens the economy, the outcomes across countries are mixed. Cases are spiking in India and Brazil, which is clear from these charts when the number of vaccinated individuals is compared with population size.
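Comparing vaccinations with population size amounts to a per-country percentage. Here is a minimal sketch, assuming hypothetical dictionaries of vaccinated counts and population, both keyed by country name. Countries with a missing or zero population are skipped.

```python
def vaccinated_percent(vaccinated_by_country, population_by_country):
    """Percent of each country's population vaccinated, for countries with a known population."""
    return {
        country: 100.0 * vaccinated_by_country[country] / population_by_country[country]
        for country in vaccinated_by_country
        if population_by_country.get(country)
    }
```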

--5-Day Moving Average of Confirmed Cases

--Daily Delta of Confirmed Cases

Apr 20, 2021:

As coronavirus cases spike in India's second wave, the CDC has issued guidelines: CDC Guidelines on travel to India.

It is going to be some time before we see the curve flatten for India.

Apr 27, 2021:

As you can see, daily cases in India are still growing. With the second wave of the virus, cases have surged within a matter of weeks.