Pandas Advanced DataFrame Operations
Hello guys!,here we will work with some bsic operations of pandas,we have learn the pandas operations in last modules.so this is now the 2nd module in which we will go through Pandas Merging, Joining, and Concatenating .so rest we will disscuss in another module.
We can join, merge, and concat dataframe using different methods. In Dataframe df.merge(),df.join(), and df.concat() methods help in joining, merging and concating different dataframe.
concat() function helps in concatenating a dataframe. We can concat a dataframe in many different ways, they are:
# importing pandas module import pandas as pd # Define a dictionary containing employee data df1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Msc', 'MA', 'MCA', 'Phd']} # Define a dictionary containing employee data df2 = {'Name':['Abhi', 'Ayushi', 'Dhiraj', 'Hitesh'], 'Age':[17, 14, 12, 52], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} d1 = pd.DataFrame(df1,index=[0, 1, 2, 3]) # Convert the dictionary into DataFrame d2 = pd.DataFrame(df2, index=[2, 5, 6, 7]) print(d1, "\n\n", d2)
Name Age Address Qualification 0 Jai 27 Nagpur Msc 1 Princi 24 Kanpur MA 2 Gaurav 22 Allahabad MCA 3 Anuj 32 Kannuaj Phd Name Age Address Qualification 2 Abhi 17 Nagpur Btech 5 Ayushi 14 Kanpur B.A 6 Dhiraj 12 Allahabad Bcom 7 Hitesh 52 Kannuaj B.hons
# using a .concat() method frames = [d1, d2] tabular = pd.concat(frames) tabular
Name | Age | Address | Qualification | |
---|---|---|---|---|
0 | Jai | 27 | Nagpur | Msc |
1 | Princi | 24 | Kanpur | MA |
2 | Gaurav | 22 | Allahabad | MCA |
3 | Anuj | 32 | Kannuaj | Phd |
2 | Abhi | 17 | Nagpur | Btech |
5 | Ayushi | 14 | Kanpur | B.A |
6 | Dhiraj | 12 | Allahabad | Bcom |
7 | Hitesh | 52 | Kannuaj | B.hons |
In concat dataframe, it set different logic on axes. We can set axes in the following three ways:
dff1 = pd.DataFrame(df1,index=[0, 1, 2, 3]) # Convert the dictionary into DataFrame dff2 = pd.DataFrame(df2, index=[2, 3, 4, 5]) inner = pd.concat([dff1, dff2], axis=1, join='inner') inner
Name | Age | Address | Qualification | Name | Age | Address | Qualification | |
---|---|---|---|---|---|---|---|---|
2 | Gaurav | 22 | Allahabad | MCA | Abhi | 17 | Nagpur | Btech |
3 | Anuj | 32 | Kannuaj | Phd | Ayushi | 14 | Kanpur | B.A |
# using a .concat for # union of dataframe outer = pd.concat([dff1, dff2], axis=1, sort=False) outer
Name | Age | Address | Qualification | Name | Age | Address | Qualification | |
---|---|---|---|---|---|---|---|---|
0 | Jai | 27.0 | Nagpur | Msc | NaN | NaN | NaN | NaN |
1 | Princi | 24.0 | Kanpur | MA | NaN | NaN | NaN | NaN |
2 | Gaurav | 22.0 | Allahabad | MCA | Abhi | 17.0 | Nagpur | Btech |
3 | Anuj | 32.0 | Kannuaj | Phd | Ayushi | 14.0 | Kanpur | B.A |
4 | NaN | NaN | NaN | NaN | Dhiraj | 12.0 | Allahabad | Bcom |
5 | NaN | NaN | NaN | NaN | Hitesh | 52.0 | Kannuaj | B.hons |
In order to concat a dataframe, we use .append() function this function concatenate along axis=0, namely the index. This function exist before .concat.
print(d1, "\n\n", d2) app = d1.append(d2) app
Name Age Address Qualification 0 Jai 27 Nagpur Msc 1 Princi 24 Kanpur MA 2 Gaurav 22 Allahabad MCA 3 Anuj 32 Kannuaj Phd Name Age Address Qualification 2 Abhi 17 Nagpur Btech 5 Ayushi 14 Kanpur B.A 6 Dhiraj 12 Allahabad Bcom 7 Hitesh 52 Kannuaj B.hons
Name | Age | Address | Qualification | |
---|---|---|---|---|
0 | Jai | 27 | Nagpur | Msc |
1 | Princi | 24 | Kanpur | MA |
2 | Gaurav | 22 | Allahabad | MCA |
3 | Anuj | 32 | Kannuaj | Phd |
2 | Abhi | 17 | Nagpur | Btech |
5 | Ayushi | 14 | Kanpur | B.A |
6 | Dhiraj | 12 | Allahabad | Bcom |
7 | Hitesh | 52 | Kannuaj | B.hons |
In order to concat dataframe with group keys, we override the column names with the use of the keys argument. Keys argument is to override the column names when creating a new DataFrame based on existing Series.
frames = [d1, d2 ] grp = pd.concat(frames, keys=['x', 'y']) grp
Name | Age | Address | Qualification | ||
---|---|---|---|---|---|
x | 0 | Jai | 27 | Nagpur | Msc |
1 | Princi | 24 | Kanpur | MA | |
2 | Gaurav | 22 | Allahabad | MCA | |
3 | Anuj | 32 | Kannuaj | Phd | |
y | 2 | Abhi | 17 | Nagpur | Btech |
5 | Ayushi | 14 | Kanpur | B.A | |
6 | Dhiraj | 12 | Allahabad | Bcom | |
7 | Hitesh | 52 | Kannuaj | B.hons |
User can concatenate a mix of Series and DataFrame. The Series will be transformed to DataFrame with the column name as the name of the Series.
# importing pandas module import pandas as pd # Define a dictionary containing employee data data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Msc', 'MA', 'MCA', 'Phd']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1,index=[0, 1, 2, 3]) # creating a series s1 = pd.Series([1000, 2000, 3000, 4000], name='Salary') print(df, "\n\n", s1) # combining series and dataframe mix = pd.concat([df, s1], axis=1) mix
Name Age Address Qualification 0 Jai 27 Nagpur Msc 1 Princi 24 Kanpur MA 2 Gaurav 22 Allahabad MCA 3 Anuj 32 Kannuaj Phd 0 1000 1 2000 2 3000 3 4000 Name: Salary, dtype: int64
Name | Age | Address | Qualification | Salary | |
---|---|---|---|---|---|
0 | Jai | 27 | Nagpur | Msc | 1000 |
1 | Princi | 24 | Kanpur | MA | 2000 |
2 | Gaurav | 22 | Allahabad | MCA | 3000 |
3 | Anuj | 32 | Kannuaj | Phd | 4000 |
Pandas provide a single function, merge(), as the entry point for all standard database join operations between DataFrame objects. There are four basic ways to handle the join (inner, left, right, and outer), depending on which rows must retain their data.
# importing pandas module import pandas as pd # Define a dictionary containing employee data data1 = {'key': ['K0', 'K1', 'K2', 'K3'], 'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32],} # Define a dictionary containing employee data data2 = {'key': ['K0', 'K1', 'K2', 'K3'], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2) print(df, "\n\n", df1) key_merge = pd.merge(df, df1, on='key') key_merge
key Name Age 0 K0 Jai 27 1 K1 Princi 24 2 K2 Gaurav 22 3 K3 Anuj 32 key Address Qualification 0 K0 Nagpur Btech 1 K1 Kanpur B.A 2 K2 Allahabad Bcom 3 K3 Kannuaj B.hons
key | Name | Age | Address | Qualification | |
---|---|---|---|---|---|
0 | K0 | Jai | 27 | Nagpur | Btech |
1 | K1 | Princi | 24 | Kanpur | B.A |
2 | K2 | Gaurav | 22 | Allahabad | Bcom |
3 | K3 | Anuj | 32 | Kannuaj | B.hons |
# importing pandas module import pandas as pd # Define a dictionary containing employee data data1 = {'key': ['K0', 'K1', 'K2', 'K3'], 'key1': ['K0', 'K1', 'K0', 'K1'], 'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32],} # Define a dictionary containing employee data data2 = {'key': ['K0', 'K1', 'K2', 'K3'], 'key1': ['K0', 'K0', 'K0', 'K0'], 'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'], 'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2) print(df, "\n\n", df1) #Now we set how = 'left' in order to use keys from left frame only. # using keys from left frame left = pd.merge(df, df1, how='left', on=['key', 'key1']) left
key key1 Name Age 0 K0 K0 Jai 27 1 K1 K1 Princi 24 2 K2 K0 Gaurav 22 3 K3 K1 Anuj 32 key key1 Address Qualification 0 K0 K0 Nagpur Btech 1 K1 K0 Kanpur B.A 2 K2 K0 Allahabad Bcom 3 K3 K0 Kannuaj B.hons
key | key1 | Name | Age | Address | Qualification | |
---|---|---|---|---|---|---|
0 | K0 | K0 | Jai | 27 | Nagpur | Btech |
1 | K1 | K1 | Princi | 24 | NaN | NaN |
2 | K2 | K0 | Gaurav | 22 | Allahabad | Bcom |
3 | K3 | K1 | Anuj | 32 | NaN | NaN |
# using keys from right frame right = pd.merge(df, df1, how='right', on=['key', 'key1']) right
key | key1 | Name | Age | Address | Qualification | |
---|---|---|---|---|---|---|
0 | K0 | K0 | Jai | 27.0 | Nagpur | Btech |
1 | K2 | K0 | Gaurav | 22.0 | Allahabad | Bcom |
2 | K1 | K0 | NaN | NaN | Kanpur | B.A |
3 | K3 | K0 | NaN | NaN | Kannuaj | B.hons |
# getting intersection of keys inner = pd.merge(df, df1, how='inner', on=['key', 'key1']) inner
key | key1 | Name | Age | Address | Qualification | |
---|---|---|---|---|---|---|
0 | K0 | K0 | Jai | 27 | Nagpur | Btech |
1 | K2 | K0 | Gaurav | 22 | Allahabad | Bcom |
# getting union of keys outer = pd.merge(df, df1, how='outer', on=['key', 'key1']) outer
key | key1 | Name | Age | Address | Qualification | |
---|---|---|---|---|---|---|
0 | K0 | K0 | Jai | 27.0 | Nagpur | Btech |
1 | K1 | K1 | Princi | 24.0 | NaN | NaN |
2 | K2 | K0 | Gaurav | 22.0 | Allahabad | Bcom |
3 | K3 | K1 | Anuj | 32.0 | NaN | NaN |
4 | K1 | K0 | NaN | NaN | Kanpur | B.A |
5 | K3 | K0 | NaN | NaN | Kannuaj | B.hons |
In order to join dataframe, we use .join() function this function is used for combining the columns of two potentially differently-indexed DataFrames into a single result DataFrame.
# importing pandas module import pandas as pd # Define a dictionary containing employee data data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'], 'Age':[27, 24, 22, 32]} # Define a dictionary containing employee data data2 = {'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'], 'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']} # Convert the dictionary into DataFrame df = pd.DataFrame(data1,index=['K0', 'K1', 'K2', 'K3']) # Convert the dictionary into DataFrame df1 = pd.DataFrame(data2, index=['K0', 'K2', 'K3', 'K4']) print(df, "\n\n", df1) # joining dataframe jo = df.join(df1) jo
Name Age K0 Jai 27 K1 Princi 24 K2 Gaurav 22 K3 Anuj 32 Address Qualification K0 Allahabad MCA K2 Kannuaj Phd K3 Allahabad Bcom K4 Kannuaj B.hons
Name | Age | Address | Qualification | |
---|---|---|---|---|
K0 | Jai | 27 | Allahabad | MCA |
K1 | Princi | 24 | NaN | NaN |
K2 | Gaurav | 22 | Kannuaj | Phd |
K3 | Anuj | 32 | Allahabad | Bcom |