Life Expectancy Analysis with Python

In this article, we walk through a data science project on life expectancy analysis using Python and Matplotlib, a plotting library for Python. Matplotlib helps plot graphs of numerical datasets and provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter.

Inequality Visualization in Life Expectancy across countries

In [1]:
!pip3 install matplotlib
Collecting matplotlib
  Downloading https://files.pythonhosted.org/packages/09/03/b7b30fa81cb687d1178e085d0f01111ceaea3bf81f9330c937fb6f6c8ca0/matplotlib-3.3.4-cp36-cp36m-manylinux1_x86_64.whl (11.5MB)
    100% |████████████████████████████████| 11.5MB 130kB/s ta 0:00:01
Collecting python-dateutil>=2.1 (from matplotlib)
  Using cached https://files.pythonhosted.org/packages/36/7a/87837f39d0296e723bb9b62bbb257d0355c7f6128853c78955f57342a56d/python_dateutil-2.8.2-py2.py3-none-any.whl
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 (from matplotlib)
  Downloading https://files.pythonhosted.org/packages/8a/bb/488841f56197b13700afd5658fc279a2025a39e22449b7cf29864669b15d/pyparsing-2.4.7-py2.py3-none-any.whl (67kB)
    100% |████████████████████████████████| 71kB 336kB/s ta 0:00:01
Collecting kiwisolver>=1.0.1 (from matplotlib)
  Downloading https://files.pythonhosted.org/packages/a7/1b/cbd8ae738719b5f41592a12057ef5442e2ed5f5cb5451f8fc7e9f8875a1a/kiwisolver-1.3.1-cp36-cp36m-manylinux1_x86_64.whl (1.1MB)
    100% |████████████████████████████████| 1.1MB 482kB/s ta 0:00:01
Collecting cycler>=0.10 (from matplotlib)
  Downloading https://files.pythonhosted.org/packages/f7/d2/e07d3ebb2bd7af696440ce7e754c59dd546ffe1bbe732c8ab68b9c834e61/cycler-0.10.0-py2.py3-none-any.whl
Collecting pillow>=6.2.0 (from matplotlib)
  Downloading https://files.pythonhosted.org/packages/df/74/4a981d12fa26b83c9230b67dee44d1361a372e0f22785f093969fd98b964/Pillow-8.3.1-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (3.0MB)
    100% |████████████████████████████████| 3.0MB 282kB/s ta 0:00:01
Collecting numpy>=1.15 (from matplotlib)
  Using cached https://files.pythonhosted.org/packages/45/b2/6c7545bb7a38754d63048c7696804a0d947328125d81bf12beaa692c3ae3/numpy-1.19.5-cp36-cp36m-manylinux1_x86_64.whl
Collecting six>=1.5 (from python-dateutil>=2.1->matplotlib)
  Using cached https://files.pythonhosted.org/packages/d9/5a/e7c31adbe875f2abbb91bd84cf2dc52d792b5a01506781dbcf25c91daf11/six-1.16.0-py2.py3-none-any.whl
Installing collected packages: six, python-dateutil, pyparsing, kiwisolver, cycler, pillow, numpy, matplotlib
Successfully installed cycler-0.10.0 kiwisolver-1.3.1 matplotlib-3.3.4 numpy-1.19.5 pillow-8.3.1 pyparsing-2.4.7 python-dateutil-2.8.2 six-1.16.0

To analyze the country life expectancy dataset, download the Life_expectancy.csv file, read it into a DataFrame, inspect the data, and then draw a scatter plot.

In [31]:
import pandas as pd
import matplotlib.pyplot as plt

# Read the CSV file into a DataFrame
df = pd.read_csv('Life_expectancy.csv')

# Inspect the data: full table, raw values, index, and overall mean
print(df)
print(df.values)
print(df.index.values)
print(df['Life expectancy'].mean())

# Scatter plot of life expectancy against year
df.plot(kind='scatter', x='Year', y='Life expectancy', color='red')

# set the title
plt.title('ScatterPlot')

# show the plot
plt.show()
             Entity  Year  Life expectancy
0         Australia  1802        34.049999
1         Australia  1803        34.049999
2         Australia  1804        34.049999
3         Australia  1805        34.049999
4         Australia  1806        34.049999
...             ...   ...              ...
3248  United States  2012        78.940002
3249  United States  2013        78.959999
3250  United States  2014        78.940002
3251  United States  2015        78.870003
3252  United States  2016        78.860001

[3253 rows x 3 columns]
[['Australia' 1802 34.049999]
 ['Australia' 1803 34.049999]
 ['Australia' 1804 34.049999]
 ...
 ['United States' 2014 78.940002]
 ['United States' 2015 78.870003]
 ['United States' 2016 78.860001]]
[   0    1    2 ... 3250 3251 3252]
48.68037967543806
Figure: scatter plot of life expectancy against year.
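
As a rough sketch of the loading step above, the snippet below reads a tiny inline sample instead of the real file and checks that the expected columns exist before any plotting. The sample rows are hypothetical values made up for illustration; only the column names come from the dataset shown above.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for Life_expectancy.csv
sample_csv = io.StringIO(
    "Entity,Year,Life expectancy\n"
    "Australia,1802,34.05\n"
    "Australia,2016,82.5\n"
    "India,2016,68.5\n"
)
sample = pd.read_csv(sample_csv)

# Check that the expected columns are present before plotting
expected = ["Entity", "Year", "Life expectancy"]
missing = [c for c in expected if c not in sample.columns]
if missing:
    raise ValueError(f"missing columns: {missing}")

print(sample.shape)   # (rows, columns)
print(sample.head())  # first rows instead of the full table
print(sample.dtypes)  # column types as inferred by read_csv
```

For a file with thousands of rows, `head()` and `shape` are usually easier to read than printing the whole DataFrame.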

Plotting a bar graph of a DataFrame column

In [187]:
# x and y are ignored when plotting a single column (a Series),
# so the bars are drawn against the row index
df['Life expectancy'].plot(kind='bar', color='red')

# set the title
plt.title('BarGraph of Life expectancy')

plt.show()
Figure: bar graph of the Life expectancy column.
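
With one bar per row, a chart of several thousand rows is hard to read. One possible alternative, sketched here on hypothetical sample values, is to aggregate to one bar per country first. The `Agg` backend line is only there so the sketch runs as a script without a display; it is not needed in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # run without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values, for illustration only
sample = pd.DataFrame({
    "Entity": ["Australia", "Australia", "India", "India"],
    "Year": [1900, 2000, 1900, 2000],
    "Life expectancy": [50.0, 80.0, 24.0, 62.0],
})

# Mean life expectancy per country, sorted so the bars rise left to right
mean_by_country = (
    sample.groupby("Entity")["Life expectancy"].mean().sort_values()
)

fig, ax = plt.subplots()
mean_by_country.plot(kind="bar", color="red", ax=ax)
ax.set_ylabel("Mean life expectancy (years)")
ax.set_title("Mean life expectancy by country")
fig.tight_layout()
```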
In [7]:
# Plot the Year column against the row index
# (x and y do not apply when plotting a single column)
df['Year'].plot(kind='line', color='green')

# set the title
plt.title('Line graph of Year')
plt.show()
Figure: line plot of the Year column.

Plotting a horizontal stacked bar graph

In [17]:
df.plot.barh(stacked=True)
Out[17]:
<AxesSubplot:>
Figure: horizontal stacked bar graph.

Plotting a histogram

In [13]:
df.plot.hist(alpha=1)
Out[13]:
<AxesSubplot:ylabel='Frequency'>
Figure: histogram of the numeric columns.
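
Calling `plot.hist` on the whole DataFrame draws every numeric column on one shared axis, so years and life expectancies end up in the same bins. A minimal sketch of a histogram restricted to a single column follows; the data is hypothetical and the bin count of 5 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # run without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values, for illustration only
sample = pd.DataFrame({
    "Year": [1900, 1950, 2000, 1900, 1950, 2000],
    "Life expectancy": [50.0, 69.0, 80.0, 24.0, 35.0, 62.0],
})

fig, ax = plt.subplots()
# Histogram of one column only, with an explicit number of bins
sample["Life expectancy"].plot.hist(bins=5, alpha=0.8, ax=ax)
ax.set_xlabel("Life expectancy (years)")
ax.set_title("Distribution of life expectancy")
```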

Plotting a box plot

In [14]:
bp = df.boxplot()
Figure: box plot of the numeric columns.
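
To compare countries rather than columns, `DataFrame.boxplot` accepts `column` and `by` arguments. The sketch below, on hypothetical values, draws one box per country and also computes the medians that the boxes mark.

```python
import matplotlib
matplotlib.use("Agg")  # run without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values, for illustration only
sample = pd.DataFrame({
    "Entity": ["Australia"] * 3 + ["India"] * 3,
    "Life expectancy": [50.0, 69.0, 80.0, 24.0, 35.0, 62.0],
})

fig, ax = plt.subplots()
# One box per country for the 'Life expectancy' column
sample.boxplot(column="Life expectancy", by="Entity", ax=ax)
fig.suptitle("")  # remove the automatic "Boxplot grouped by ..." title
ax.set_ylabel("Life expectancy (years)")

# The median of each group is the line drawn inside each box
medians = sample.groupby("Entity")["Life expectancy"].median()
print(medians)
```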

Performing some operations on the DataFrame: finding the mean, max, min, and sum, and filtering the data

In [20]:
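# Summary statistics of the 'Life expectancy' column
mean1 = df['Life expectancy'].mean()
max1 = df['Life expectancy'].max()
min1 = df['Life expectancy'].min()
sum1 = df['Life expectancy'].sum()
print(mean1, "\n", max1, "\n", min1, "\n", sum1, '\n')

# Select the rows for India; where() preserves the shape of df
# and fills every non-matching row with NaN
filter2 = df["Entity"] == "India"
df1 = df.where(filter2)
print(df1)

# Bar chart with life expectancy on the x-axis and year as the bar height
plt.bar(df1['Life expectancy'], df1['Year'], color='maroon', width=0.2)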
48.68037967543806 
 83.940002 
 8.1088362 
 158357.27508420002 

     Entity  Year  Life expectancy
0     India  1802        34.049999
1     India  1803        34.049999
2     India  1804        34.049999
3     India  1805        34.049999
4     India  1806        34.049999
...     ...   ...              ...
3248  India  2012        78.940002
3249  India  2013        78.959999
3250  India  2014        78.940002
3251  India  2015        78.870003
3252  India  2016        78.860001

[3253 rows x 3 columns]
Out[20]:
<BarContainer object of 3253 artists>
Figure: bar chart of year against life expectancy.
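
The filtering step above uses `where()`, which keeps every row and masks the non-matching ones with NaN. A small sketch on hypothetical data contrasts this with plain boolean indexing, which returns only the matching rows.

```python
import pandas as pd

# Hypothetical values, for illustration only
sample = pd.DataFrame({
    "Entity": ["Australia", "India", "India"],
    "Year": [2016, 2015, 2016],
    "Life expectancy": [82.5, 68.3, 68.6],
})
is_india = sample["Entity"] == "India"

masked = sample.where(is_india)  # same shape; non-matching rows become NaN
subset = sample[is_india]        # only the matching rows

print(masked.shape, subset.shape)
print(masked["Entity"].isnull().sum())  # number of masked rows
print(masked["Year"].dtype)             # integers become floats to hold NaN
```

Boolean indexing is usually the more convenient form when only the matching rows are needed, since the result carries no NaN rows and the integer columns keep their type.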
In [46]:
plt.bar(df1['Year'],df1['Life expectancy']-mean1)
Out[46]:
<BarContainer object of 3253 artists>
Figure: bar chart of life expectancy minus the overall mean, by year.
In [21]:
plt.bar(df1['Year'].head(100),df1['Life expectancy'].head(100)-min1,color="green")
Out[21]:
<BarContainer object of 100 artists>
Figure: bar chart of life expectancy minus the overall minimum, for the first 100 rows.

Plotting a 2-D histogram

In [22]:
plt.hist2d(df1["Life expectancy"].head(217),df1['Year'].head(217))
Out[22]:
(array([[22., 22., 21., 11.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0., 10.,  0.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  1.,  8.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0., 10.,  0.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  3.,  7.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  8.,  0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  7., 14.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  8., 21.,  2.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., 17.,  0.],
        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  3., 22.]]),
 array([31.999807 , 37.0578265, 42.115846 , 47.1738655, 52.231885 ,
        57.2899045, 62.347924 , 67.4059435, 72.463963 , 77.5219825,
        82.580002 ]),
 array([1800. , 1821.6, 1843.2, 1864.8, 1886.4, 1908. , 1929.6, 1951.2,
        1972.8, 1994.4, 2016. ]),
 <matplotlib.collections.QuadMesh at 0x7f46fc73f438>)
Figure: 2-D histogram of life expectancy and year.
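
The tuple printed above is the return value of `plt.hist2d`: the bin counts, the bin edges along x, the bin edges along y, and the drawn image. A minimal sketch with hypothetical values and only two bins per axis shows how to unpack it.

```python
import matplotlib
matplotlib.use("Agg")  # run without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical values, for illustration only
life = np.array([30.0, 32.0, 45.0, 60.0, 68.0])
year = np.array([1800, 1850, 1920, 1980, 2016])

# counts[i, j] is the number of points in x-bin i and y-bin j
counts, x_edges, y_edges, mesh = plt.hist2d(life, year, bins=2)

print(counts)   # 2 x 2 array of counts
print(x_edges)  # 3 edges for 2 life-expectancy bins
print(y_edges)  # 3 edges for 2 year bins
```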

Describing the DataFrame and selecting columns by integer position (iloc)

In [105]:
df.describe().iloc[:, 1:]
Out[105]:
life_expectancy
count 217.000000
mean 34.292578
std 14.646002
min 8.108836
25% 25.252296
50% 25.442400
75% 42.860001
max 68.550003
info() provides a summary of the DataFrame
isnull() checks for null values
unique() lists the distinct values in a column
In [103]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3253 entries, 1300 to 3252
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   entity           217 non-null    object
 1   year             217 non-null    float64
 2   life_expectancy  217 non-null    float64
dtypes: float64(2), object(1)
memory usage: 261.7+ KB
In [106]:
df.isnull().sum()
Out[106]:
entity             3036
year               3036
life_expectancy    3036
dtype: int64
In [114]:
df['Year'].unique()
Out[114]:
array([1802, 1803, 1804, 1805, 1806, 1807, 1808, 1809, 1810, 1811, 1812,
       1813, 1814, 1815, 1816, 1817, 1818, 1819, 1820, 1821, 1822, 1823,
       1824, 1825, 1826, 1827, 1828, 1829, 1830, 1831, 1832, 1833, 1834,
       1835, 1836, 1837, 1838, 1839, 1840, 1841, 1842, 1843, 1844, 1845,
       1846, 1847, 1848, 1849, 1850, 1851, 1852, 1853, 1854, 1855, 1856,
       1857, 1858, 1859, 1860, 1861, 1862, 1863, 1864, 1865, 1866, 1867,
       1868, 1869, 1870, 1871, 1872, 1873, 1874, 1875, 1876, 1877, 1878,
       1879, 1880, 1881, 1882, 1883, 1884, 1885, 1886, 1887, 1888, 1889,
       1890, 1891, 1892, 1893, 1894, 1895, 1896, 1897, 1898, 1899, 1900,
       1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911,
       1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922,
       1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933,
       1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944,
       1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955,
       1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966,
       1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977,
       1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988,
       1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999,
       2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010,
       2011, 2012, 2013, 2014, 2015, 2016, 1800, 1801])
In [113]:
df['Entity'].unique()
Out[113]:
array(['Australia', 'Brazil', 'Canada', 'China', 'France', 'Germany',
       'India', 'Italy', 'Japan', 'Mexico', 'Russia', 'Spain',
       'Switzerland', 'United Kingdom', 'United States'], dtype=object)
In [115]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3253 entries, 0 to 3252
Data columns (total 3 columns):
 #   Column           Non-Null Count  Dtype
---  ------           --------------  -----
 0   Entity           3253 non-null   object
 1   Year             3253 non-null   int64
 2   Life expectancy  3253 non-null   float64
dtypes: float64(1), int64(1), object(1)
memory usage: 76.4+ KB
In [116]:
df['Life expectancy'].describe()
Out[116]:
count    3253.000000
mean       48.680380
std        17.965669
min         8.108836
25%        32.000000
50%        41.880001
75%        66.820000
max        83.940002
Name: Life expectancy, dtype: float64
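
`describe()` summarizes the column across all countries at once. As a sketch of how the same statistics could be broken down per country, the snippet below uses `groupby` with `agg` on hypothetical sample values.

```python
import pandas as pd

# Hypothetical values, for illustration only
sample = pd.DataFrame({
    "Entity": ["Australia", "Australia", "India", "India"],
    "Life expectancy": [34.0, 82.0, 25.0, 68.0],
})

# One row per country, one column per statistic
summary = sample.groupby("Entity")["Life expectancy"].agg(
    ["count", "mean", "min", "max"]
)
print(summary)
```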

Plotting a pie chart of the number of records per country

In [33]:
plt.style.use('fivethirtyeight')
plt.figure(figsize=(10,5))
plt.title('Countries data')
plt.xlabel('Frequency')
plt.ylabel('Country')
df.Entity.value_counts().plot(kind='pie')
plt.show()
Figure: pie chart of records per country.

Analyzing the frequency of records over the years and plotting a histogram

In [161]:
plt.style.use('fivethirtyeight')
plt.figure(figsize=(10,5))
plt.xlabel("Year")
plt.ylabel('Frequency')
plt.title("Frequency data per year")
df.Year.plot(kind="hist",rwidth=0.2)
plt.show()
Figure: histogram of records per year.
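
The histogram groups years into bins chosen by Matplotlib. If exact counts are wanted instead, one option is to bin the years by hand, for example into decades, and count the rows in each. The sketch below does this on hypothetical years.

```python
import pandas as pd

# Hypothetical years, for illustration only
sample = pd.DataFrame({"Year": [1802, 1809, 1815, 2010, 2016, 2016]})

# Floor each year to the start of its decade, then count rows per decade
decade = sample["Year"] // 10 * 10
per_decade = decade.value_counts().sort_index()
print(per_decade)
```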

Analyzing life expectancy over time for each country

In [180]:
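# Assumption: the output below shows lowercase column names, so the columns
# were renamed in a step not shown; the rename is reproduced on a copy here
df_lower = df.rename(columns={'Entity': 'entity', 'Year': 'year',
                              'Life expectancy': 'life_expectancy'})
print(df_lower)

# Draw one line plot per country
for country in df_lower['entity'].unique():
    dta = df_lower.loc[df_lower['entity'] == country, ['year', 'life_expectancy']]

    plt.figure(figsize=(7, 7))
    plt.plot(dta['year'], dta['life_expectancy'], color='b', linewidth=1)
    plt.title(f"{country}'s Life expectancy")
    plt.xlabel('year')
    plt.ylabel('Life Expectancy')
    plt.show()
print('\n')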
             entity  year  life_expectancy
0         Australia  1802        34.049999
1         Australia  1803        34.049999
2         Australia  1804        34.049999
3         Australia  1805        34.049999
4         Australia  1806        34.049999
...             ...   ...              ...
3248  United States  2012        78.940002
3249  United States  2013        78.959999
3250  United States  2014        78.940002
3251  United States  2015        78.870003
3252  United States  2016        78.860001

[3253 rows x 3 columns]
Figures: one line plot of life expectancy over time for each country (Australia, Brazil, Canada, China, France, Germany, India, Italy, Japan, Mexico, Russia, Spain, Switzerland, United Kingdom, United States).
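
Separate figures make it hard to compare countries directly. As one possible variation, the sketch below reshapes hypothetical data with `pivot` so that each country becomes a column, then draws all the lines on a single set of axes.

```python
import matplotlib
matplotlib.use("Agg")  # run without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values, for illustration only
sample = pd.DataFrame({
    "Entity": ["Australia", "Australia", "India", "India"],
    "Year": [1900, 2000, 1900, 2000],
    "Life expectancy": [50.0, 80.0, 24.0, 62.0],
})

# Wide layout: one row per year, one column per country
wide = sample.pivot(index="Year", columns="Entity", values="Life expectancy")

fig, ax = plt.subplots(figsize=(7, 7))
wide.plot(ax=ax, linewidth=1)  # one line per country, with a legend
ax.set_xlabel("Year")
ax.set_ylabel("Life expectancy")
ax.set_title("Life expectancy by country")
```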

Plotting a 3-D view of life expectancy by country and year

In [34]:
import plotly.express as px
fig = px.scatter_3d(df.iloc[:3000], x='Year', y='Life expectancy',z='Entity',color='Year')
fig.show()
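
If Plotly is not available, a similar static view can be sketched with Matplotlib's 3-D axes. This is an alternative technique, not the method used above. Matplotlib needs numeric coordinates, so the sketch assumes the countries are encoded as integer codes with `pd.factorize` and then labeled on the z-axis. The data is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # run without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values, for illustration only
sample = pd.DataFrame({
    "Entity": ["Australia", "Australia", "India", "India"],
    "Year": [1900, 2000, 1900, 2000],
    "Life expectancy": [50.0, 80.0, 24.0, 62.0],
})

# Encode each country as an integer so it can be placed on the z-axis
codes, names = pd.factorize(sample["Entity"])

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(sample["Year"], sample["Life expectancy"], codes, c=sample["Year"])
ax.set_xlabel("Year")
ax.set_ylabel("Life expectancy")
ax.set_zticks(range(len(names)))
ax.set_zticklabels(names)  # show country names instead of integer codes
```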