Concat & Append Dataframes in Pandas
Hi Enthusiastic Learners! When working with dataframes, we often need to combine several of them into one, in other words to “concat & append dataframes”. In this article we will look at different ways of combining datasets with concat and append. We will also learn the difference between the two and when to use which function.
Concat Dataframes
Before we jump into the function pd.concat(), let’s first create a few sample dataframes.
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
print("Dataframe_1")
print(df1)
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
print("Dataframe_2")
print(df2)
The syntax of the concat() function is as follows (as of pandas 1.x; the join_axes argument found in older versions was removed in pandas 1.0):
pd.concat(objs, axis=0, join='outer', ignore_index=False,
          keys=None, levels=None, names=None, verify_integrity=False,
          sort=False, copy=True)
Let’s try simply calling pd.concat() on both dataframes, df1 & df2:
pd.concat([df1, df2], sort=True)
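What happened here is that the rows of the 2nd dataframe were stacked below the rows of the 1st (the default is axis=0). Because the two dataframes have different column names, the result has all 4 columns: A, B, C and D.
We got NaN values because, as you can see in the function definition above, the default value of join is 'outer'. The concat() function therefore keeps the union of all columns and fills every cell for which a dataframe has no data with NaN.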
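As a quick check of the call above, the following minimal sketch rebuilds the two sample dataframes and inspects the result. The variable name outer is only an illustrative choice, not something from the original example.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

# Default behaviour: axis=0 (stack rows) and join='outer' (union of columns)
outer = pd.concat([df1, df2], sort=True)

print(outer.shape)          # rows of both frames, union of their columns
print(list(outer.columns))  # column labels after the outer join
print(list(outer.index))    # original row labels are kept, so they repeat
print(outer.isna().sum())   # number of NaN cells in each column
```

Each dataframe contributes 4 rows, and each column is missing for the 4 rows that came from the other dataframe.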
Also notice that the index values haven’t been reset: the labels 0 to 3 appear twice. To reset them, we can call .reset_index(drop=True) on the result, or pass ignore_index=True to concat().
Now, let’s set the value of join to 'inner':
pd.concat([df1, df2], join='inner')
We didn’t get any columns back because the two dataframes have no column names in common; an inner join keeps only the columns that are present in every dataframe.
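To make the inner join more concrete, here is a small sketch. It assumes a hypothetical extra dataframe, df_shared, which is not part of the article's examples and exists only to show what happens when one column name overlaps.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

# No shared column names: the rows are still stacked, but no column survives
no_overlap = pd.concat([df1, df2], join='inner')
print(no_overlap.shape)

# Hypothetical frame that shares only column 'A' with df1
df_shared = pd.DataFrame({'A': ['A8', 'A9'],
                          'X': ['X8', 'X9']})

# Only the common column 'A' is kept; 'B' and 'X' are dropped
overlap = pd.concat([df1, df_shared], join='inner')
print(overlap)
```

So join='inner' does not filter rows here; it only decides which columns are kept.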
Let’s create a 3rd dataframe that reuses some of the existing column names & concatenate all 3 dataframes together.
df3 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'D': ['D4', 'D5', 'D6', 'D7']})
df3
print("Merging all 3 dataframes!!")
# drop=True discards the old row labels instead of adding them as an extra 'index' column
pd.concat([df1, df2, df3], sort=True).reset_index(drop=True)
As you can see, the 3rd dataframe’s data was added to the existing columns A and D, because its columns have the same names as those columns. Columns B and C hold NaN for those rows.
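The sketch below repeats the three-way concatenation and compares two ways of getting a clean 0..n-1 index. The names combined, via_reset and with_old_index are illustrative assumptions, not part of the original example.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})
df3 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7'],
                    'D': ['D4', 'D5', 'D6', 'D7']})

# Option 1: let concat() build a fresh index directly
combined = pd.concat([df1, df2, df3], sort=True, ignore_index=True)

# Option 2: reset the index afterwards and drop the old labels
via_reset = pd.concat([df1, df2, df3], sort=True).reset_index(drop=True)

# Without drop=True, the old labels are kept as an extra column named 'index'
with_old_index = pd.concat([df1, df2, df3], sort=True).reset_index()

print(combined)
print(combined.equals(via_reset))
print(list(with_old_index.columns))
```

The rows that came from df3 are filled in columns A and D and are NaN in B and C.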
Findings:
- By default (axis=0), pd.concat() stacks the rows of the dataframes. Any column name that is not shared shows up as an additional column, which is why the number of columns increased.
- By default, it applies an OUTER join on the columns of the datasets.
- If the same column names are provided, data gets added to the same columns for all dataframes.
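The axis argument from the signature controls whether dataframes are stacked as rows or placed next to each other as columns. A minimal sketch, with stacked and side_by_side as illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

# axis=0 (default): rows are stacked, columns are matched by name
stacked = pd.concat([df1, df2], sort=True)

# axis=1: columns are placed side by side, rows are matched by index label
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked.shape)
print(side_by_side.shape)
print(side_by_side)
```

Since both dataframes share the row labels 0 to 3, the axis=1 result lines up row by row and contains no NaN values.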
Append Dataframes
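DataFrame.append() is similar to pd.concat(). It is called on a dataframe (there is no pd.append() function), and its output is the same as that of a basic row-wise concatenation of dataframes.
That is, df1.append(df2) is the same as pd.concat([df1, df2]).
So, basically, append() covers a subset of what concat() can do.
The only advantage is that we have to write a little less code. It is not faster than concat(), since it calls concat() internally. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions use pd.concat() instead.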
# Works on pandas < 2.0 only; on newer versions use pd.concat([df1, df2], sort=True)
df1.append(df2, sort=True)
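Because the availability of append() depends on the installed pandas version, a version-tolerant sketch can check for the method before calling it. The hasattr guard and the names via_concat and via_append are assumptions made for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                    'B': ['B0', 'B1', 'B2', 'B3']})
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                    'D': ['D0', 'D1', 'D2', 'D3']})

# Row-wise concatenation works on every pandas version
via_concat = pd.concat([df1, df2], sort=True)

if hasattr(pd.DataFrame, "append"):
    # Older pandas: append() is still available (it may emit a FutureWarning)
    via_append = df1.append(df2, sort=True)
    print("append() matches concat():", via_append.equals(via_concat))
else:
    # Newer pandas: append() has been removed, so concat() is the only option
    print("DataFrame.append() is not available; use pd.concat() instead")

print(via_concat)
```

When building a dataframe from many pieces, collecting them in a list and calling pd.concat() once is generally preferable to appending repeatedly in a loop.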
In our next article we will be discussing MERGE & JOINS in dataframes using a few advanced examples. Stay tuned & keep learning!!