• For any query, contact us at
  • +91-9872993883
  • +91-8283824812
  • info@ris-ai.com

Hello guys!,here we will work with some bsic operations of pandas,we have learn the pandas operations in last modules.so this is now the 2nd module in which we will go through Pandas Merging, Joining, and Concatenating .so rest we will disscuss in another module.

Pandas DataFrame Operation 2

We can join, merge, and concat dataframe using different methods. In Dataframe df.merge(),df.join(), and df.concat() methods help in joining, merging and concating different dataframe.

Concatenating

concat() function helps in concatenating a dataframe. We can concat a dataframe in many different ways, they are:

  1. Concatenating DataFrame using .concat()
  2. Concatenating DataFrame by setting logic on axes
  3. Concatenating DataFrame using .append()
  4. Concatenating DataFrame by ignoring indexes
  5. Concatenating DataFrame with group keys
  6. Concatenating with mixed ndims

1. Concatenating DataFrame using .concat() :

In [27]:
# importing pandas module
import pandas as pd
 
# Define a dictionary containing employee data 
df1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],
        'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
   
# Define a dictionary containing employee data 
df2 = {'Name':['Abhi', 'Ayushi', 'Dhiraj', 'Hitesh'],
        'Age':[17, 14, 12, 52],
        'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
        'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']}
  
d1 = pd.DataFrame(df1,index=[0, 1, 2, 3])
 
# Convert the dictionary into DataFrame  
d2 = pd.DataFrame(df2, index=[2, 5, 6, 7])
 
print(d1, "\n\n", d2)
     Name  Age    Address Qualification
0     Jai   27     Nagpur           Msc
1  Princi   24     Kanpur            MA
2  Gaurav   22  Allahabad           MCA
3    Anuj   32    Kannuaj           Phd 

      Name  Age    Address Qualification
2    Abhi   17     Nagpur         Btech
5  Ayushi   14     Kanpur           B.A
6  Dhiraj   12  Allahabad          Bcom
7  Hitesh   52    Kannuaj        B.hons
In [28]:
# using a .concat() method
frames = [d1, d2]
 
tabular = pd.concat(frames)
tabular
Out[28]:
Name Age Address Qualification
0 Jai 27 Nagpur Msc
1 Princi 24 Kanpur MA
2 Gaurav 22 Allahabad MCA
3 Anuj 32 Kannuaj Phd
2 Abhi 17 Nagpur Btech
5 Ayushi 14 Kanpur B.A
6 Dhiraj 12 Allahabad Bcom
7 Hitesh 52 Kannuaj B.hons

2.Concatenating DataFrame by setting logic on axes :

In concat dataframe, it set different logic on axes. We can set axes in the following three ways:

  1. Taking the union of them all, join='outer'. This is the default option as it results in zero information loss.
  2. Taking the intersection, join='inner'.
In [24]:
dff1 = pd.DataFrame(df1,index=[0, 1, 2, 3])
# Convert the dictionary into DataFrame  
dff2 = pd.DataFrame(df2, index=[2, 3, 4, 5])
inner = pd.concat([dff1, dff2], axis=1, join='inner')
inner
Out[24]:
Name Age Address Qualification Name Age Address Qualification
2 Gaurav 22 Allahabad MCA Abhi 17 Nagpur Btech
3 Anuj 32 Kannuaj Phd Ayushi 14 Kanpur B.A
In [25]:
# using a .concat for
# union of dataframe
outer = pd.concat([dff1, dff2], axis=1, sort=False)
outer
Out[25]:
Name Age Address Qualification Name Age Address Qualification
0 Jai 27.0 Nagpur Msc NaN NaN NaN NaN
1 Princi 24.0 Kanpur MA NaN NaN NaN NaN
2 Gaurav 22.0 Allahabad MCA Abhi 17.0 Nagpur Btech
3 Anuj 32.0 Kannuaj Phd Ayushi 14.0 Kanpur B.A
4 NaN NaN NaN NaN Dhiraj 12.0 Allahabad Bcom
5 NaN NaN NaN NaN Hitesh 52.0 Kannuaj B.hons

3.Concatenating DataFrame using .append()

In order to concat a dataframe, we use .append() function this function concatenate along axis=0, namely the index. This function exist before .concat.

In [30]:
print(d1, "\n\n", d2)
app = d1.append(d2)
app
     Name  Age    Address Qualification
0     Jai   27     Nagpur           Msc
1  Princi   24     Kanpur            MA
2  Gaurav   22  Allahabad           MCA
3    Anuj   32    Kannuaj           Phd 

      Name  Age    Address Qualification
2    Abhi   17     Nagpur         Btech
5  Ayushi   14     Kanpur           B.A
6  Dhiraj   12  Allahabad          Bcom
7  Hitesh   52    Kannuaj        B.hons
Out[30]:
Name Age Address Qualification
0 Jai 27 Nagpur Msc
1 Princi 24 Kanpur MA
2 Gaurav 22 Allahabad MCA
3 Anuj 32 Kannuaj Phd
2 Abhi 17 Nagpur Btech
5 Ayushi 14 Kanpur B.A
6 Dhiraj 12 Allahabad Bcom
7 Hitesh 52 Kannuaj B.hons

4.Concatenating DataFrame with group keys :

In order to concat dataframe with group keys, we override the column names with the use of the keys argument. Keys argument is to override the column names when creating a new DataFrame based on existing Series.

In [31]:
frames = [d1, d2 ]
 
grp = pd.concat(frames, keys=['x', 'y'])
grp
Out[31]:
Name Age Address Qualification
x 0 Jai 27 Nagpur Msc
1 Princi 24 Kanpur MA
2 Gaurav 22 Allahabad MCA
3 Anuj 32 Kannuaj Phd
y 2 Abhi 17 Nagpur Btech
5 Ayushi 14 Kanpur B.A
6 Dhiraj 12 Allahabad Bcom
7 Hitesh 52 Kannuaj B.hons

5.Concatenating with mixed ndims :

User can concatenate a mix of Series and DataFrame. The Series will be transformed to DataFrame with the column name as the name of the Series.

In [32]:
# importing pandas module
import pandas as pd
 
# Define a dictionary containing employee data 
data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],
        'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
   
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data1,index=[0, 1, 2, 3])
 
# creating a series
s1 = pd.Series([1000, 2000, 3000, 4000], name='Salary')
 
print(df, "\n\n", s1)

# combining series and dataframe
mix = pd.concat([df, s1], axis=1)
 
mix
     Name  Age    Address Qualification
0     Jai   27     Nagpur           Msc
1  Princi   24     Kanpur            MA
2  Gaurav   22  Allahabad           MCA
3    Anuj   32    Kannuaj           Phd 

 0    1000
1    2000
2    3000
3    4000
Name: Salary, dtype: int64
Out[32]:
Name Age Address Qualification Salary
0 Jai 27 Nagpur Msc 1000
1 Princi 24 Kanpur MA 2000
2 Gaurav 22 Allahabad MCA 3000
3 Anuj 32 Kannuaj Phd 4000

Merging DataFrame

Pandas provide a single function, merge(), as the entry point for all standard database join operations between DataFrame objects. There are four basic ways to handle the join (inner, left, right, and outer), depending on which rows must retain their data.

Pandas DataFrame Operations
In [35]:
# importing pandas module
import pandas as pd
 
# Define a dictionary containing employee data 
data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],}
   
# Define a dictionary containing employee data 
data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
        'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']}
 
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data1)
 
# Convert the dictionary into DataFrame  
df1 = pd.DataFrame(data2)
print(df, "\n\n", df1)

key_merge = pd.merge(df, df1, on='key')
 
key_merge 
  key    Name  Age
0  K0     Jai   27
1  K1  Princi   24
2  K2  Gaurav   22
3  K3    Anuj   32 

   key    Address Qualification
0  K0     Nagpur         Btech
1  K1     Kanpur           B.A
2  K2  Allahabad          Bcom
3  K3    Kannuaj        B.hons
Out[35]:
key Name Age Address Qualification
0 K0 Jai 27 Nagpur Btech
1 K1 Princi 24 Kanpur B.A
2 K2 Gaurav 22 Allahabad Bcom
3 K3 Anuj 32 Kannuaj B.hons
Pandas Dataframe Operations Table
In [36]:
# importing pandas module
import pandas as pd
 
# Define a dictionary containing employee data 
data1 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'key1': ['K0', 'K1', 'K0', 'K1'],
         'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],}
   
# Define a dictionary containing employee data 
data2 = {'key': ['K0', 'K1', 'K2', 'K3'],
         'key1': ['K0', 'K0', 'K0', 'K0'],
         'Address':['Nagpur', 'Kanpur', 'Allahabad', 'Kannuaj'],
        'Qualification':['Btech', 'B.A', 'Bcom', 'B.hons']}
 
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data1)
 
# Convert the dictionary into DataFrame  
df1 = pd.DataFrame(data2)
  
print(df, "\n\n", df1)

#Now we set how = 'left' in order to use keys from left frame only.
# using keys from left frame
left = pd.merge(df, df1, how='left', on=['key', 'key1'])
left
  key key1    Name  Age
0  K0   K0     Jai   27
1  K1   K1  Princi   24
2  K2   K0  Gaurav   22
3  K3   K1    Anuj   32 

   key key1    Address Qualification
0  K0   K0     Nagpur         Btech
1  K1   K0     Kanpur           B.A
2  K2   K0  Allahabad          Bcom
3  K3   K0    Kannuaj        B.hons
Out[36]:
key key1 Name Age Address Qualification
0 K0 K0 Jai 27 Nagpur Btech
1 K1 K1 Princi 24 NaN NaN
2 K2 K0 Gaurav 22 Allahabad Bcom
3 K3 K1 Anuj 32 NaN NaN
In [37]:
# using keys from right frame
right = pd.merge(df, df1, how='right', on=['key', 'key1'])
right
Out[37]:
key key1 Name Age Address Qualification
0 K0 K0 Jai 27.0 Nagpur Btech
1 K2 K0 Gaurav 22.0 Allahabad Bcom
2 K1 K0 NaN NaN Kanpur B.A
3 K3 K0 NaN NaN Kannuaj B.hons
In [38]:
# getting intersection of keys
inner = pd.merge(df, df1, how='inner', on=['key', 'key1'])
inner
Out[38]:
key key1 Name Age Address Qualification
0 K0 K0 Jai 27 Nagpur Btech
1 K2 K0 Gaurav 22 Allahabad Bcom
In [39]:
# getting union  of keys
outer = pd.merge(df, df1, how='outer', on=['key', 'key1'])
outer
Out[39]:
key key1 Name Age Address Qualification
0 K0 K0 Jai 27.0 Nagpur Btech
1 K1 K1 Princi 24.0 NaN NaN
2 K2 K0 Gaurav 22.0 Allahabad Bcom
3 K3 K1 Anuj 32.0 NaN NaN
4 K1 K0 NaN NaN Kanpur B.A
5 K3 K0 NaN NaN Kannuaj B.hons

Joining DataFrame

In order to join dataframe, we use .join() function this function is used for combining the columns of two potentially differently-indexed DataFrames into a single result DataFrame.

In [41]:
# importing pandas module
import pandas as pd
  
# Define a dictionary containing employee data 
data1 = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32]}
    
# Define a dictionary containing employee data 
data2 = {'Address':['Allahabad', 'Kannuaj', 'Allahabad', 'Kannuaj'],
        'Qualification':['MCA', 'Phd', 'Bcom', 'B.hons']}
  
# Convert the dictionary into DataFrame  
df = pd.DataFrame(data1,index=['K0', 'K1', 'K2', 'K3'])
  
# Convert the dictionary into DataFrame  
df1 = pd.DataFrame(data2, index=['K0', 'K2', 'K3', 'K4'])
 
 
print(df, "\n\n", df1)


# joining dataframe
jo = df.join(df1)
 
jo
      Name  Age
K0     Jai   27
K1  Princi   24
K2  Gaurav   22
K3    Anuj   32 

       Address Qualification
K0  Allahabad           MCA
K2    Kannuaj           Phd
K3  Allahabad          Bcom
K4    Kannuaj        B.hons
Out[41]:
Name Age Address Qualification
K0 Jai 27 Allahabad MCA
K1 Princi 24 NaN NaN
K2 Gaurav 22 Kannuaj Phd
K3 Anuj 32 Allahabad Bcom
In [ ]:

Resources You Will Ever Need