Life Expectancy Analysis with Python
In this article, we will go through a data science project on life expectancy analysis with Python and Matplotlib, a plotting library for Python. It helps to plot graphs of numerical datasets. It also provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter.
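The object-oriented API mentioned above can be sketched as follows. This is only a minimal illustration: the values are made up and are not taken from Life_expectancy.csv, and the `Agg` backend is an assumption so that the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt

# Illustrative values only, not taken from Life_expectancy.csv
years = [1990, 2000, 2010]
life_expectancy = [70.0, 72.5, 75.0]

# Create the Figure and Axes objects explicitly, then call methods on them
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(years, life_expectancy, color="red", marker="o")
ax.set_title("Life expectancy (illustrative values)")
ax.set_xlabel("Year")
ax.set_ylabel("Life expectancy")

# Render the figure in memory; in a notebook, plt.show() would display it instead
fig.canvas.draw()
```

Working on `fig` and `ax` directly, rather than through the `plt.*` state machine, tends to be easier to follow once a notebook contains several figures.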
!pip3 install matplotlib
Collecting matplotlib
  Downloading https://files.pythonhosted.org/packages/09/03/b7b30fa81cb687d1178e085d0f01111ceaea3bf81f9330c937fb6f6c8ca0/matplotlib-3.3.4-cp36-cp36m-manylinux1_x86_64.whl (11.5MB)
Collecting python-dateutil>=2.1 (from matplotlib)
  Using cached https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 (from matplotlib)
  Downloading https://files.pythonhosted.org/packages/8a/bb/488841f56197b13700afd5658fc279a2025a39e22449b7cf29864669b15d/pyparsing-2.4.7-py2.py3-none-any.whl (67kB)
Collecting kiwisolver>=1.0.1 (from matplotlib)
  Downloading https://files.pythonhosted.org/packages/a7/1b/cbd8ae738719b5f41592a12057ef5442e2ed5f5cb5451f8fc7e9f8875a1a/kiwisolver-1.3.1-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
Collecting cycler>=0.10 (from matplotlib)
  Downloading https://files.pythonhosted.org/packages/f7/d2/e07d3ebb2bd7af696440ce7e754c59dd546ffe1bbe732c8ab68b9c834e61/cycler-0.10.0-py2.py3-none-any.whl
Collecting pillow>=6.2.0 (from matplotlib)
  Downloading https://files.pythonhosted.org/packages/df/74/4a981d12fa26b83c9230b67dee44d1361a372e0f22785f093969fd98b964/Pillow-8.3.1-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (3.0MB)
Collecting numpy>=1.15 (from matplotlib)
  Using cached https://files.pythonhosted.org/packages/45/b2/6c7545bb7a38754d63048c7696804a0d947328125d81bf12beaa692c3ae3/numpy-1.19.5-cp36-cp36m-manylinux1_x86_64.whl
Collecting six>=1.5 (from python-dateutil>=2.1->matplotlib)
  Using cached https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl
Installing collected packages: six, python-dateutil, pyparsing, kiwisolver, cycler, pillow, numpy, matplotlib
Successfully installed cycler-0.10.0 kiwisolver-1.3.1 matplotlib-3.3.4 numpy-1.19.5 pillow-8.3.1 pyparsing-2.4.7 python-dateutil-2.8.2 six-1.16.0
import pandas as pd
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

print()
df = pd.read_csv('Life_expectancy.csv')
print(df)
print(df.values)
print(df.index.values)
print(df['Life expectancy'].mean())
df.plot(kind='scatter', x='Year', y='Life expectancy', color='red')
# set the title
plt.title('ScatterPlot')
# show the plot
plt.show()
             Entity  Year  Life expectancy
0         Australia  1802        34.049999
1         Australia  1803        34.049999
2         Australia  1804        34.049999
3         Australia  1805        34.049999
4         Australia  1806        34.049999
...             ...   ...              ...
3248  United States  2012        78.940002
3249  United States  2013        78.959999
3250  United States  2014        78.940002
3251  United States  2015        78.870003
3252  United States  2016        78.860001

[3253 rows x 3 columns]
[['Australia' 1802 34.049999]
 ['Australia' 1803 34.049999]
 ['Australia' 1804 34.049999]
 ...
 ['United States' 2014 78.940002]
 ['United States' 2015 78.870003]
 ['United States' 2016 78.860001]]
[   0    1    2 ... 3250 3251 3252]
48.68037967543806
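The mean printed above pools every country and every year into one number. A per-country mean can be sketched with `groupby`. The small frame below is a hypothetical stand-in with the same column names as Life_expectancy.csv; its values are invented for illustration.

```python
import pandas as pd

# Hypothetical sample with the same columns as Life_expectancy.csv
sample = pd.DataFrame({
    "Entity": ["Australia", "Australia", "India", "India"],
    "Year": [2015, 2016, 2015, 2016],
    "Life expectancy": [82.0, 83.0, 68.0, 69.0],
})

# One mean over all rows, as in the cell above
overall_mean = sample["Life expectancy"].mean()

# One mean per country: group the rows by Entity first
per_country_mean = sample.groupby("Entity")["Life expectancy"].mean()

print(overall_mean)
print(per_country_mean)
```

On the real data, replacing `sample` with `df` should give one mean per entry of the `Entity` column.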
df['Life expectancy'].plot(kind='bar', x='Year', y='Life expectancy', color='red')
# set the title
plt.title('BarGraph of Life expectancy')
plt.show()
df['Year'].plot(kind='line', x='Year', y='Life expectancy', color='green')
# set the title
plt.title('LinearGraph of year')
plt.show()
df.plot.barh(stacked=True)
<AxesSubplot:>
df.plot.hist(alpha=1)
<AxesSubplot:ylabel='Frequency'>
bp = df.boxplot()
mean1 = df['Life expectancy'].mean()
max1 = df['Life expectancy'].max()
min1 = df['Life expectancy'].min()
sum1 = df['Life expectancy'].sum()
print(mean1, "\n", max1, "\n", min1, "\n", sum1, '\n')
#print(df[0:200])
# boolean mask: True for the rows that belong to India
filter2 = df["Entity"] == "India"
# note: where() keeps every row and fills the non-matching rows with NaN
df1 = df.where(filter2)
print(df1)
plt.bar(df1['Life expectancy'], df1['Year'], color='maroon', width=0.2)
48.68037967543806 
 83.940002 
 8.1088362 
 158357.27508420002 

     Entity  Year  Life expectancy
0     India  1802        34.049999
1     India  1803        34.049999
2     India  1804        34.049999
3     India  1805        34.049999
4     India  1806        34.049999
...     ...   ...              ...
3248  India  2012        78.940002
3249  India  2013        78.959999
3250  India  2014        78.940002
3251  India  2015        78.870003
3252  India  2016        78.860001

[3253 rows x 3 columns]
<BarContainer object of 3253 artists>
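The bar container above still holds 3253 artists because `DataFrame.where` keeps the original shape and only masks the rows that do not match. The following sketch contrasts it with plain boolean indexing, which drops the non-matching rows. The frame is a hypothetical three-row sample, not the real dataset.

```python
import pandas as pd

# Hypothetical sample with the same columns as Life_expectancy.csv
sample = pd.DataFrame({
    "Entity": ["Australia", "India", "India"],
    "Year": [2016, 2015, 2016],
    "Life expectancy": [83.0, 68.0, 69.0],
})
mask = sample["Entity"] == "India"

# where(): same number of rows, non-matching rows become NaN
masked = sample.where(mask)

# boolean indexing: only the matching rows are kept
india_only = sample[mask]

print(len(masked), len(india_only))
print(masked["Year"].dtype, india_only["Year"].dtype)
```

A side effect worth noticing is that the integer `Year` column becomes `float64` after `where`, since NaN cannot be stored in a plain integer column.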
plt.bar(df1['Year'],df1['Life expectancy']-mean1)
<BarContainer object of 3253 artists>
plt.bar(df1['Year'].head(100),df1['Life expectancy'].head(100)-min1,color="green")
<BarContainer object of 100 artists>
plt.hist2d(df1["Life expectancy"].head(217),df1['Year'].head(217))
(array([[22., 22., 21., 11.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0., 10.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  1.,  8.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0., 10.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  3.,  7.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  8.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  7., 14.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  8., 21.,  2.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., 17.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  3., 22.]]),
 array([31.999807 , 37.0578265, 42.115846 , 47.1738655, 52.231885 ,
        57.2899045, 62.347924 , 67.4059435, 72.463963 , 77.5219825,
        82.580002 ]),
 array([1800. , 1821.6, 1843.2, 1864.8, 1886.4, 1908. , 1929.6, 1951.2,
        1972.8, 1994.4, 2016. ]),
 <matplotlib.collections.QuadMesh at 0x7f46fc73f438>)
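The tuple printed above is the return value of `hist2d`: the 2D array of bin counts, the bin edges along x, the bin edges along y, and the image object. A short sketch of unpacking it is shown below. The data is synthetic (100 evenly spaced points), and the `Agg` backend and the colorbar label are assumptions made for the illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data, not taken from Life_expectancy.csv
years = np.arange(1800, 1900)                       # 100 years
life_expectancy = np.linspace(25.0, 45.0, num=100)  # 100 values

fig, ax = plt.subplots()
# hist2d returns (counts, x bin edges, y bin edges, image)
counts, x_edges, y_edges, image = ax.hist2d(life_expectancy, years, bins=10)

# A colorbar makes the bin counts readable in the figure
fig.colorbar(image, ax=ax, label="Number of observations")
ax.set_xlabel("Life expectancy")
ax.set_ylabel("Year")

print(counts.shape, len(x_edges), len(y_edges))
```

With 10 bins per axis there are 11 edges per axis, which matches the two edge arrays of length 11 in the output above.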
df.describe().iloc[:, 1:]
| | life_expectancy |
|---|---|
| count | 217.000000 |
| mean | 34.292578 |
| std | 14.646002 |
| min | 8.108836 |
| 25% | 25.252296 |
| 50% | 25.442400 |
| 75% | 42.860001 |
| max | 68.550003 |
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3253 entries, 1300 to 3252
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   entity           217 non-null    object
 1   year             217 non-null    float64
 2   life_expectancy  217 non-null    float64
dtypes: float64(2), object(1)
memory usage: 261.7+ KB
df.isnull().sum()
entity             3036
year               3036
life_expectancy    3036
dtype: int64
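The 3036 missing values per column are the rows that were masked out: 3253 rows in total minus the 217 rows that remain non-null. One possible way to get rid of them is `dropna`, followed by restoring the integer type of the year column. The sketch below uses a hypothetical four-row frame with the lowercase column names seen in the output above.

```python
import pandas as pd

# Hypothetical sample using the lowercase column names from the output above
sample = pd.DataFrame({
    "entity": ["Australia", "Australia", "India", "India"],
    "year": [2015, 2016, 2015, 2016],
    "life_expectancy": [82.0, 83.0, 68.0, 69.0],
})

# Masking with where() leaves NaN in every column of the non-matching rows
masked = sample.where(sample["entity"] == "India")
print(masked.isnull().sum())

# Drop the all-NaN rows and turn year back into integers
cleaned = masked.dropna().astype({"year": "int64"})
print(cleaned)
```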
df['Year'].unique()
array([1802, 1803, 1804, 1805, 1806, 1807, 1808, 1809, 1810, 1811, 1812, 1813, 1814, 1815, 1816, 1817, 1818, 1819, 1820, 1821, 1822, 1823, 1824, 1825, 1826, 1827, 1828, 1829, 1830, 1831, 1832, 1833, 1834, 1835, 1836, 1837, 1838, 1839, 1840, 1841, 1842, 1843, 1844, 1845, 1846, 1847, 1848, 1849, 1850, 1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859, 1860, 1861, 1862, 1863, 1864, 1865, 1866, 1867, 1868, 1869, 1870, 1871, 1872, 1873, 1874, 1875, 1876, 1877, 1878, 1879, 1880, 1881, 1882, 1883, 1884, 1885, 1886, 1887, 1888, 1889, 1890, 1891, 1892, 1893, 1894, 1895, 1896, 1897, 1898, 1899, 1900, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 1800, 1801])
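The array above ends with 1800 and 1801 because `unique()` returns values in order of first appearance, not in sorted order. If a sorted list of years is wanted, `np.sort` can be applied to the result. The series below is a short hypothetical example.

```python
import numpy as np
import pandas as pd

# Hypothetical year column where 1800 and 1801 first appear late
years = pd.Series([1802, 1803, 2016, 1800, 1801, 1802])

unique_in_order = years.unique()          # order of first appearance
unique_sorted = np.sort(unique_in_order)  # ascending order

print(unique_in_order)
print(unique_sorted)
```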
df['Entity'].unique()
array(['Australia', 'Brazil', 'Canada', 'China', 'France', 'Germany', 'India', 'Italy', 'Japan', 'Mexico', 'Russia', 'Spain', 'Switzerland', 'United Kingdom', 'United States'], dtype=object)
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3253 entries, 0 to 3252
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Entity           3253 non-null   object
 1   Year             3253 non-null   int64
 2   Life expectancy  3253 non-null   float64
dtypes: float64(1), int64(1), object(1)
memory usage: 76.4+ KB
df['Life expectancy'].describe()
count    3253.000000
mean       48.680380
std        17.965669
min         8.108836
25%        32.000000
50%        41.880001
75%        66.820000
max        83.940002
Name: Life expectancy, dtype: float64
plt.style.use('fivethirtyeight')
plt.figure(figsize=(10, 5))
plt.title('Countries data')
plt.xlabel('Frequency')
plt.ylabel('Country')
df.Entity.value_counts().plot(kind='pie')
plt.show()
plt.style.use('fivethirtyeight')
plt.figure(figsize=(10, 5))
plt.xlabel("Year")
plt.ylabel('Frequency')
plt.title("Frequency data per year")
df.Year.plot(kind="hist", rwidth=0.2)
plt.show()
print(df)
country = df['entity'].unique()
for i in country:
    dta = df[["life_expectancy", "year"]][df['entity'] == i]
    plt.figure(figsize=(7, 7))
    plt.plot(dta['year'], dta['life_expectancy'], color='b', linewidth=1)
    plt.title(f"{i}'s Life expectancy")
    plt.xlabel('year')
    plt.ylabel('Life Expectancy')
    plt.show()
    print('\n')
             entity  year  life_expectancy
0         Australia  1802        34.049999
1         Australia  1803        34.049999
2         Australia  1804        34.049999
3         Australia  1805        34.049999
4         Australia  1806        34.049999
...             ...   ...              ...
3248  United States  2012        78.940002
3249  United States  2013        78.959999
3250  United States  2014        78.940002
3251  United States  2015        78.870003
3252  United States  2016        78.860001

[3253 rows x 3 columns]
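The loop above draws one figure per country. To compare countries directly, the same lines can be drawn on a single set of axes with a legend. This is a sketch of that variant, not the notebook's own method. It uses a hypothetical four-row frame with the lowercase column names, and the `Agg` backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample using the lowercase column names from the cell above
sample = pd.DataFrame({
    "entity": ["Australia", "Australia", "India", "India"],
    "year": [2015, 2016, 2015, 2016],
    "life_expectancy": [82.0, 83.0, 68.0, 69.0],
})

fig, ax = plt.subplots(figsize=(7, 7))

# groupby yields one (name, sub-frame) pair per country
for country, group in sample.groupby("entity"):
    ax.plot(group["year"], group["life_expectancy"], linewidth=1, label=country)

ax.set_title("Life expectancy by country")
ax.set_xlabel("Year")
ax.set_ylabel("Life expectancy")
legend = ax.legend()

# Render in memory; in a notebook, plt.show() would display the figure
fig.canvas.draw()
```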
import plotly.express as px

fig = px.scatter_3d(df.iloc[:3000], x='Year', y='Life expectancy', z='Entity', color='Year')
fig.show()