A dataframe is a two-dimensional, labeled data structure made up of rows and columns, with row labels and column labels; each column is a Series. A dataframe can contain heterogeneous data: each Series can have a data type different from the other Series. A dataframe can also have missing data, duplicates, garbage values, etc., and pandas helps us munge this data.
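As a rough illustration of the points above, here is a minimal sketch of a dataframe with mixed column types and one missing value. The column names and values are made up for this example, and the import lines it uses are explained in the sections that follow.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each column holds a different data type,
# and np.nan marks a missing value.
sample = pd.DataFrame({
    "name": ["A", "B", "C"],
    "rooms": [2, 3, 4],
    "price": [5000000.0, np.nan, 7000000.0],
})

# One dtype per column (each column is a Series)
print(sample.dtypes)

# Count the missing values in each column
missing_counts = sample.isna().sum()
print(missing_counts)
```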

Introduction to Pandas – Importing pandas

Now, let’s learn how to load pandas and what operations we can perform with it.
To import pandas, we use the following command:

import pandas

Introduction to Pandas – Checking version

If you want to know which version of the pandas package is installed, the following command will tell you:

pandas.__version__

'0.25.3'

We will be calling pandas many times in the code, and typing the full package name each time is tedious, so we use an alias instead. We define the alias pd for the pandas package (and, in the same way, np for NumPy, which we use later) as follows:

import pandas as pd
import numpy as np

Fetching documentation and namespace

To display the built-in documentation of the pandas package in IPython or a Jupyter notebook, use the following command:

pd?

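To see the list of functions available in the pandas namespace, type pd. and press the Tab key in IPython or Jupyter, then pick a name from the list, for example:

#Type pd. and press Tab to list the available names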
pd.melt
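The pd? syntax and Tab completion only work in IPython and Jupyter. In a plain Python session, a similar result can be sketched with the built-in dir function and the function's docstring; the variable name public_names is just an illustration.

```python
import pandas as pd

# Public names in the pandas namespace (those not starting with an underscore)
public_names = [name for name in dir(pd) if not name.startswith("_")]
print(len(public_names))
print(public_names[:10])

# First part of the documentation of a single function, here pd.melt
print(pd.melt.__doc__[:300])
```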

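A Series is a one-dimensional labeled array. It can be created from a list as follows (note that the variable is named df here, although it holds a Series):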

df = pd.Series([0, 1, 2, 3, 4])
df

0    0
1    1
2    2
3    3
4    4
dtype: int64

If we want to fetch the values in a series, we use the following command:

df.values

array([0, 1, 2, 3, 4])
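Besides its values, a Series also carries an index of labels. The short sketch below shows both parts; it uses to_numpy(), which returns the same data as .values, and the variable names are chosen only for this example.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4])

values = s.to_numpy()   # same data as s.values, as a NumPy array
labels = list(s.index)  # the default index is 0, 1, 2, ...

print(values)
print(labels)
```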

To access a single value of df at a particular index, e.g., the value at index 1 (the second element, since counting starts at 0), use the following command:

df[1]

1

To extract a part of the series, e.g., all values from index 1 to index 3, use the following command:

df[1:4]

1    1
2    2
3    3
dtype: int64

The value at index 4 was excluded and the value at index 1 was included, i.e., in 1:4 the start (1) is included and the stop (4) is excluded, so we extract the values at positions >= 1 and < 4. To fetch the value at index 4 as well, use either of the following commands:

df[1:5] #or
df[1:]

1    1
2    2
3    3
4    4
dtype: int64
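The slices above select by position, so the stop value is excluded. Slicing by label with loc behaves differently: the end label is included. The sketch below contrasts the two on a hypothetical series with string labels, which is not part of the original example.

```python
import pandas as pd

# Hypothetical series with string labels instead of the default 0, 1, 2, ...
s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

by_position = s.iloc[1:4]   # positions 1, 2, 3 (position 4 is excluded)
by_label = s.loc["b":"d"]   # labels b, c, d (the end label is included)

print(by_position)
print(by_label)
```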

Dataframe from series

Now, let’s see how to create a dataframe from two or more series:

#Dictionary having keys as houses and values as no of rooms
dict1 = {"House A": 2, "House B": 3, "House C": 4}   

#Converting dictionary to series
NoOfRooms = pd.Series(dict1)   
NoOfRooms

House A    2
House B    3
House C    4
dtype: int64

dict2 = {"House A": 5000000, "House B": 6000000, "House C": 7000000}
PriceInRupees = pd.Series(dict2)
PriceInRupees

House A    5000000
House B    6000000
House C    7000000
dtype: int64

HousesData = pd.DataFrame({"No of rooms": NoOfRooms, "Price in Rupees": PriceInRupees})
HousesData

Here, we created two dictionaries, converted each of them to a pandas series, put both series into a new dictionary, and then converted that dictionary into a dataframe. The dictionary keys became the column names, and the labels of the series became the row labels. To get the row labels (index) and the column names, we use the following commands:

HousesData.index

Index(['House A', 'House B', 'House C'], dtype='object')

HousesData.columns

Index(['No of rooms', 'Price in Rupees'], dtype='object')
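When the series passed to pd.DataFrame do not share exactly the same labels, pandas aligns them on the index and fills the gaps with NaN. The sketch below assumes a hypothetical price series that has no entry for House C; the variable names are chosen only for this example.

```python
import pandas as pd

rooms = pd.Series({"House A": 2, "House B": 3, "House C": 4})
# Hypothetical price data with no entry for House C
prices = pd.Series({"House A": 5000000, "House B": 6000000})

# The two series are aligned on their labels; House C gets NaN for the price
houses = pd.DataFrame({"No of rooms": rooms, "Price in Rupees": prices})
print(houses)
```

Because NaN is a floating-point value, the price column is stored as float64 in this case rather than int64.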

Accessing rows and columns

Accessing a particular column/row can be done in the following way:

HousesData['No of rooms']

House A    2
House B    3
House C    4
Name: No of rooms, dtype: int64

#loc helps us in fetching the row corresponding to the row name mentioned in the square brackets
HousesData.loc['House A']

No of rooms              2
Price in Rupees    5000000
Name: House A, dtype: int64

#To access multiple rows by their row names, we pass a list of row names
HousesData.loc[['House A', 'House B']]

#iloc is used when we want to access rows by their integer positions (here, all rows from position 1 onward)
HousesData.iloc[1:]
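Both loc and iloc also accept a column selector after the row selector, which makes it possible to pick a single cell. The following is a small sketch that rebuilds the houses data from lists so that it runs on its own; the variable names are assumptions made for this example.

```python
import pandas as pd

houses = pd.DataFrame(
    {"No of rooms": [2, 3, 4], "Price in Rupees": [5000000, 6000000, 7000000]},
    index=["House A", "House B", "House C"],
)

price_b = houses.loc["House B", "Price in Rupees"]  # by row and column label
rooms_last = houses.iloc[2, 0]                      # by row and column position
from_second = houses.iloc[1:]                       # rows from position 1 onward

print(price_b, rooms_last)
print(from_second)
```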

Preparing a dataframe with a single series can be done as follows:

pd.DataFrame(NoOfRooms, columns = ['No of rooms'])
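An alternative that is often used for the same purpose is the Series.to_frame method, which converts a series into a one-column dataframe. This is a sketch of that alternative, not the approach used above; the variable names are chosen only for this example.

```python
import pandas as pd

rooms = pd.Series({"House A": 2, "House B": 3, "House C": 4})

# Convert the series to a dataframe with a single, named column
rooms_frame = rooms.to_frame(name="No of rooms")
print(rooms_frame)
```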

Dataframe from array

A dataframe can be built from a NumPy structured array as follows; the field names of the array (here X and Y) become the column names:

onesArray = np.ones(3, dtype = [('X', 'i8'), ('Y', 'f8')])
onesArray

array([(1, 1.), (1, 1.), (1, 1.)], dtype=[('X', '<i8'), ('Y', '<f8')])

pd.DataFrame(onesArray)

   X    Y
0  1  1.0
1  1  1.0
2  1  1.0

Link to Introduction to Pandas video tutorial

Here is the link to the YouTube video for this blog post Introduction to Pandas.

So guys, with this we conclude this tutorial on pandas. In the next tutorial, we will be learning how to handle missing values using python (Pandas & Numpy).
