Covid-19 Spread Analysis using Python
In this blog, we will visualize the total number of confirmed COVID-19 cases by country.
First of all, import the required libraries:
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import numpy as np
import plotly
import plotly.graph_objects as go
from plotly.subplots import make_subplots
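import requests

# Getting data
# "covid-data.json" is abbreviated in the original; requests.get() needs the full
# http(s) URL of a JSON endpoint whose 'features' list holds one
# {'attributes': {...}} record per row.
url_request = requests.get("covid-data.json")
url_request.raise_for_status()  # stop early on an HTTP error
url_json = url_request.json()
df = pd.DataFrame(url_json['features'])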
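The code above expects each entry of `features` to carry an `attributes` dict, which matches the layout of ArcGIS feature-service query results (an assumption, since the post does not name its source). As a minimal sketch under that assumption, `pd.json_normalize` flattens the payload in one step and would also flatten any other nested dicts a feature carries. The two-record `sample_payload` and the helper `features_to_frame` are hypothetical; the field names and case counts are taken from this post.

```python
import pandas as pd

# Hypothetical payload mimicking the structure the post reads:
# a top-level "features" list whose items each carry an "attributes" dict.
sample_payload = {
    "features": [
        {"attributes": {"OBJECTID": 1, "Province_State": None,
                        "Country_Region": "Afghanistan", "Confirmed": 57898}},
        {"attributes": {"OBJECTID": 2, "Province_State": "Ohio",
                        "Country_Region": "US", "Confirmed": 1054807}},
    ]
}

def features_to_frame(payload: dict) -> pd.DataFrame:
    """Flatten every feature into one row (hypothetical helper)."""
    # json_normalize turns nested dict keys into dotted column names,
    # e.g. "attributes.Confirmed"; strip the prefix to get plain names.
    frame = pd.json_normalize(payload["features"])
    frame.columns = [c.removeprefix("attributes.") for c in frame.columns]
    return frame

sample_frame = features_to_frame(sample_payload)
print(sample_frame)
```

For the flat `attributes`-only case this gives the same columns as `pd.DataFrame(df['attributes'].tolist())`; `str.removeprefix` needs Python 3.9 or later.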
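import datetime as dt

# a. transforming data
# each row of df holds an 'attributes' dict; expand those dicts into columns
data_list = df['attributes'].tolist()
data = pd.DataFrame(data_list)
# keep only the fields used below (OBJECTID is dropped by this selection)
data = data[['Province_State', 'Country_Region', 'Last_Update', 'Lat', 'Long_',
             'Confirmed', 'Recovered', 'Deaths', 'Active']].copy()
data.columns = ['State', 'Country', 'Last Update', 'Lat', 'Long',
                'Confirmed', 'Recovered', 'Deaths', 'Active']
# country-level rows have no state; show them as blank instead of None/NaN
data['State'] = data['State'].fillna('')
data

# b. cleaning data
def convert_time(t):
    # t is epoch seconds; fromtimestamp returns a naive datetime in the
    # machine's local time zone
    t = int(t)
    return dt.datetime.fromtimestamp(t)

# rows without a timestamp cannot be converted, so drop them
data = data.dropna(subset=['Last Update']).copy()
data['Last Update'] = data['Last Update'] / 1000  # milliseconds -> seconds
data['Last Update'] = data['Last Update'].apply(convert_time)
data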
| | State | Country | Last Update | Lat | Long | Confirmed | Recovered | Deaths | Active |
|---|---|---|---|---|---|---|---|---|---|
| 0 | | Afghanistan | 2021-04-20 14:50:41 | 33.939110 | 67.709953 | 57898 | 52272 | 2546 | 3080 |
| 1 | | Albania | 2021-04-20 14:50:41 | 41.153300 | 20.168300 | 129694 | 102171 | 2347 | 25176 |
| 2 | | Algeria | 2021-04-20 14:50:41 | 28.033900 | 1.659600 | 119805 | 83514 | 3160 | 33131 |
| 3 | | Andorra | 2021-04-20 14:50:41 | 42.506300 | 1.521800 | 12805 | 12203 | 123 | 479 |
| 4 | | Angola | 2021-04-20 14:50:41 | -11.202700 | 17.873900 | 24518 | 22600 | 563 | 1355 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 761 | Ohio | US | 2021-04-20 14:50:41 | 40.388783 | -82.764915 | 1054807 | 0 | 18991 | 0 |
| 762 | Alabama | US | 2021-04-20 14:50:41 | 32.318200 | -86.902300 | 522512 | 0 | 10790 | 0 |
| 763 | North Carolina | US | 2021-04-20 14:50:41 | 35.630066 | -79.806419 | 949366 | 0 | 12418 | 0 |
| 764 | District of Columbia | US | 2021-04-20 14:50:41 | 38.897438 | -77.026817 | 46740 | 0 | 1096 | 0 |
| 765 | Maine | US | 2021-04-20 14:50:41 | 44.693947 | -69.381927 | 57545 | 0 | 767 | 0 |
764 rows × 9 columns
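`convert_time` converts one value at a time through `apply`, and `datetime.fromtimestamp` returns the machine's local time, so the printed times depend on where the notebook runs. As a sketch, assuming `Last_Update` holds Unix epoch milliseconds (as the division by 1000 implies), `pd.to_datetime(..., unit='ms')` converts the whole column at once and interprets the values as UTC. The values below are illustrative, not read from the feed.

```python
import pandas as pd

# Illustrative epoch-millisecond values (not from the real feed);
# None stands in for a missing Last_Update.
raw = pd.DataFrame({"Last Update": [1618930241000, None, 1618930241000]})

# Drop rows without a timestamp, then convert the whole column at once.
# unit="ms" treats the numbers as milliseconds since 1970-01-01 UTC,
# so the result does not depend on the machine's time zone.
clean = raw.dropna(subset=["Last Update"]).copy()
clean["Last Update"] = pd.to_datetime(clean["Last Update"], unit="ms")

print(clean)
```

The result is timezone-naive but expressed in UTC; `dt.datetime.fromtimestamp` gives the same wall-clock times only when the notebook's local zone is UTC.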
# a. Top 10 confirmed countries (Bubble plot)
# sum the state-level rows of each country, then keep the 10 largest totals
# (nlargest already returns them in descending order)
top10_confirmed = pd.DataFrame(data.groupby('Country')['Confirmed'].sum().nlargest(10))
fig1 = px.scatter(top10_confirmed,
                  x=top10_confirmed.index,
                  y='Confirmed',
                  size='Confirmed',
                  size_max=120,
                  color=top10_confirmed.index,
                  title='Top 10 Confirmed Cases Countries')
fig1.show()
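Because some countries (the US here) are reported per state, both a single country's total and the top-10 ranking come from summing rows by `Country`. A minimal sketch of that logic as two small helpers, `country_total` and `top_n_confirmed` (hypothetical names, not part of the original post); the sample rows reuse figures from the table above, while the real frame has 764 rows.

```python
import pandas as pd

# Sample rows reusing figures from the table above
# (three country-level rows and two US state rows).
sample = pd.DataFrame({
    "State": ["", "", "", "Ohio", "Alabama"],
    "Country": ["Afghanistan", "Albania", "Algeria", "US", "US"],
    "Confirmed": [57898, 129694, 119805, 1054807, 522512],
})

def country_total(df: pd.DataFrame, country: str) -> int:
    """Total confirmed cases for one country, summed over all its rows."""
    return int(df.loc[df["Country"] == country, "Confirmed"].sum())

def top_n_confirmed(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """The n countries with the most confirmed cases, as plain columns."""
    totals = df.groupby("Country")["Confirmed"].sum()  # collapse state rows
    # nlargest already sorts descending; reset_index turns the Country
    # index into a regular column that plotly can reference by name
    return totals.nlargest(n).reset_index()

print(country_total(sample, "US"))  # → 1577319
top3 = top_n_confirmed(sample, n=3)
print(top3)
```

With the full `data` frame, `px.scatter(top_n_confirmed(data), x='Country', y='Confirmed', size='Confirmed', color='Country', size_max=120)` draws the same bubble chart using column names instead of the index.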