Concat & Append Dataframes in Pandas

Hi Enthusiastic Learners! When working with dataframes, we often need to combine several of them into one, or in other words “Concat & Append Dataframes”. In this article we will look at different ways of combining datasets with concat & append. We will also learn the difference between the two functions and when to use which.

To learn how to handle missing values in Pandas, check out these 2 articles:

  1. Handling Missing values Part-1
  2. Handling Missing Values Part-2

Concat Dataframes

Before we jump to the function pd.concat(), let’s first create a few sample dataframes.

In [1]:
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'], 
                    'B': ['B0', 'B1', 'B2', 'B3']}) 

print("Dataframe_1")
print(df1)

df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'], 
                    'D': ['D0', 'D1', 'D2', 'D3']})

print("Dataframe_2")
print(df2)
Dataframe_1
    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
Dataframe_2
    C   D
0  C0  D0
1  C1  D1
2  C2  D2
3  C3  D3

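The syntax of the concat() function in pandas 1.x is as follows (older releases also accepted a join_axes argument, which was removed in pandas 1.0):

pd.concat(objs, axis=0, join='outer', ignore_index=False,
keys=None, levels=None, names=None, verify_integrity=False,
sort=False, copy=True)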

Let’s start by simply calling pd.concat() on both dataframes, df1 & df2.

In [2]:
pd.concat([df1, df2], sort=True)
Out[2]:
A B C D
0 A0 B0 NaN NaN
1 A1 B1 NaN NaN
2 A2 B2 NaN NaN
3 A3 B3 NaN NaN
0 NaN NaN C0 D0
1 NaN NaN C1 D1
2 NaN NaN C2 D2
3 NaN NaN C3 D3

What happened here is that the rows of the 2nd dataframe got stacked below the rows of the 1st one (the default is axis=0), and the result has all 4 columns from both dataframes.

We got NaN values because, as you can see in the function definition above, the default value for join is 'outer'. concat() therefore keeps the union of all columns and fills in NaN wherever a dataframe does not have that column.
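To make the role of the axis argument concrete, here is a small sketch that compares the default row-wise stacking with axis=1, which places the dataframes side by side. It rebuilds the same df1 and df2 as above; the variable names stacked and side_by_side are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

# axis=0 (default): rows are stacked, columns are the union of both frames,
# so cells with no source value become NaN
stacked = pd.concat([df1, df2], sort=True)

# axis=1: frames are aligned on the index and placed next to each other
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked.shape)       # (8, 4)
print(side_by_side.shape)  # (4, 4)
print(side_by_side)
```

Since both dataframes share the index 0 to 3, the axis=1 result has no NaN values at all.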

Other than that, the index values haven’t been reset either: 0 to 3 appears twice. To reset them we can call the .reset_index() function, which we will do further below.

Now, let’s set the value of join to 'inner'.

In [3]:
pd.concat([df1, df2], join='inner')
Out[3]:
0
1
2
3
0
1
2
3

We didn’t get any columns in the result because the two dataframes have no column names in common. Only the index values remain.
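For contrast, here is a minimal sketch of an inner join where one column name is shared. The dataframe df_extra is a hypothetical example made up for this illustration; it is not one of the dataframes used elsewhere in the article.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})

# Hypothetical frame: shares column 'A' with df1, while 'E' is new
df_extra = pd.DataFrame({'A': ['A4', 'A5'],
                         'E': ['E4', 'E5']})

# join='inner' keeps only the columns present in every dataframe
inner = pd.concat([df1, df_extra], join='inner')

print(list(inner.columns))  # ['A']
print(inner.shape)          # (6, 1)
```

Columns B and E are dropped because each of them exists in only one of the two dataframes.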

Let’s create a 3rd dataframe that reuses some of the existing column names & concatenate all 3 dataframes together.

In [4]:
df3 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],   
                    'D': ['D4', 'D5', 'D6', 'D7']})

df3
Out[4]:
A D
0 A4 D4
1 A5 D5
2 A6 D6
3 A7 D7
In [5]:
print("Merging all 3 dataframes!!")

pd.concat([df1, df2, df3], sort=True).reset_index()
Merging all 3 dataframes!!
Out[5]:
index A B C D
0 0 A0 B0 NaN NaN
1 1 A1 B1 NaN NaN
2 2 A2 B2 NaN NaN
3 3 A3 B3 NaN NaN
4 0 NaN NaN C0 D0
5 1 NaN NaN C1 D1
6 2 NaN NaN C2 D2
7 3 NaN NaN C3 D3
8 0 A4 NaN NaN D4
9 1 A5 NaN NaN D5
10 2 A6 NaN NaN D6
11 3 A7 NaN NaN D7

As you can see, the 3rd dataframe’s data got added to the existing columns A and D, because its columns have the same names. Note that .reset_index() kept the old index values as a new column named index.
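If you do not need the old index values, one option is to let concat() renumber the rows itself with ignore_index=True, or to pass drop=True to .reset_index(). The sketch below assumes the same three dataframes as above; the names renumbered and dropped are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
df3 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'D': ['D4', 'D5', 'D6', 'D7']})

# Option 1: ask concat() for a fresh 0..n-1 index
renumbered = pd.concat([df1, df2, df3], sort=True, ignore_index=True)

# Option 2: reset afterwards and drop the old index instead of keeping it
dropped = pd.concat([df1, df2, df3], sort=True).reset_index(drop=True)

print(renumbered.index.tolist())  # [0, 1, 2, ..., 11]
print(list(renumbered.columns))   # ['A', 'B', 'C', 'D']
```

Both options give 12 rows numbered 0 to 11 without the extra index column.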

Findings:

  1. By default (axis=0), pd.concat() stacks dataframes on top of each other, that is, the number of rows increases. Columns that exist in only some of the dataframes are added as new columns.
  2. By default, it applies an OUTER join on the columns of the data-sets.
  3. If dataframes share column names, their data gets added to the same columns.

Append Dataframes

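DataFrame.append() is similar to pd.concat(). Note that it is a method of a dataframe, so there is no pd.append(). The output of append() is the same as that of a basic row-wise concatenation of dataframes.

That is –> df1.append(df2) is same as pd.concat([df1, df2])

So, basically append() covers a subset of what concat() can do.

The only advantage is that we have to write a little less code. It is not faster than concat(), since append() calls concat() internally. Also note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions you have to use pd.concat() instead.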

In [6]:
# Runs only on pandas < 2.0, where DataFrame.append() still exists
df1.append(df2, sort=True)
Out[6]:
A B C D
0 A0 B0 NaN NaN
1 A1 B1 NaN NaN
2 A2 B2 NaN NaN
3 A3 B3 NaN NaN
0 NaN NaN C0 D0
1 NaN NaN C1 D1
2 NaN NaN C2 D2
3 NaN NaN C3 D3
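On pandas versions without DataFrame.append(), the same results can be obtained with pd.concat(). Here is a minimal sketch, assuming the df1 and df2 from above; new_row is a hypothetical single-row dataframe added only to show how one row can be appended.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

# Same result as df1.append(df2, sort=True) on older pandas
appended = pd.concat([df1, df2], sort=True)

# Appending a single row: wrap it in a one-row dataframe first
new_row = pd.DataFrame([{'A': 'A4', 'B': 'B4'}])  # hypothetical row
with_row = pd.concat([df1, new_row], ignore_index=True)

print(appended.shape)  # (8, 4)
print(with_row)
```

If you need to add many rows or dataframes, it is usually better to collect them in a list and call pd.concat() once at the end rather than concatenating inside a loop.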

In our next article we will discuss MERGE & JOINS on dataframes using a few advanced examples. Stay tuned & keep learning!!
