# Introduction to Pandas

Hi ML Enthusiasts! Today, we will be learning about Pandas, one of the most popular and powerful Python packages, and its usage in the world of data science.

### Dataframe and series

The pandas package is built on top of NumPy and provides efficient tools for manipulating dataframes.

A dataframe is a two-dimensional, labelled data structure organized in rows and columns, with row labels (the index) and column labels. Each column is a Series. A dataframe can hold heterogeneous data: each Series can have a data type different from the other Series. Real-world dataframes often contain missing data, duplicates, garbage values, etc., and pandas helps us munge (clean and reshape) this data.
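To make this concrete, here is a minimal sketch with made-up house data (the column names and values are illustrative assumptions, not from any real dataset). It shows that each column is a Series with its own dtype, and how missing values can be counted:

```python
import numpy as np
import pandas as pd

# Illustrative data only; np.nan marks a missing number of rooms
houses = pd.DataFrame({
    "rooms": [2, 3, np.nan],              # NaN turns this column into float64
    "type": ["flat", "villa", "flat"],    # a text column
    "price": [5.0e6, 6.0e6, 7.0e6],       # float64
})

print(houses.dtypes)           # one dtype per column: heterogeneous data
print(type(houses["type"]))    # a single column is a pandas Series
print(houses.isna().sum())     # number of missing values in each column
```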

### Introduction to Pandas – Importing pandas

Now, let’s learn how to import pandas and what operations it supports.

In order to import pandas, we use the following command:

```
import pandas
```

### Introduction to Pandas – Checking version

To check which version of pandas is installed, use the following command:

```
pandas.__version__
```

Since we will be calling pandas many times in the code, typing the full package name gets tedious, so we use an alias. We import pandas under the alias pd, and NumPy (which we will need later) under the alias np:

```
import pandas as pd
import numpy as np
```

### Fetching documentation and namespace

To display the built-in documentation of the pandas package in IPython or a Jupyter notebook, append a question mark to its name:

```
pd?
```
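The ? syntax only works in IPython and Jupyter. In a plain Python script, the same documentation is available through help() or the __doc__ attribute. A minimal sketch, where summary is a hypothetical helper that returns the first line of a docstring:

```python
import pandas as pd


def summary(obj):
    """Return the first line of obj's docstring, or '' if it has none."""
    lines = (obj.__doc__ or "").strip().splitlines()
    return lines[0] if lines else ""


# __doc__ holds the text that pd? (IPython) or help(pd) would display
print(summary(pd))
# Individual functions carry their own docstrings, e.g. pd.melt
print(summary(pd.melt))
```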

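To see the list of names (functions, classes, submodules) in the pandas namespace, type pd. and press the Tab key in IPython or Jupyter; autocompletion then shows every available name, such as pd.melt:

```
# In IPython/Jupyter, type pd. and press Tab to list the names in the namespace
pd.melt  # one of the functions that autocompletion shows
```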
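Autocompletion needs an interactive shell. As a scriptable alternative, the built-in dir() returns every name in the pandas module; a sketch that filters out underscore-prefixed internals and non-callables, leaving the public functions and classes:

```python
import pandas as pd

# dir() lists every attribute of the pandas module
public_names = [name for name in dir(pd) if not name.startswith("_")]

# Keep only callables: functions such as melt and classes such as DataFrame
public_callables = [name for name in public_names if callable(getattr(pd, name))]

print(len(public_callables))
print(public_callables[:10])
```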

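### Creating a series

A Series is a one-dimensional labelled array. Here we create one from a Python list, and pandas assigns it the default integer index 0 to 4: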

```
df = pd.Series([0, 1, 2, 3, 4])
df
```

To fetch the values of a Series as a NumPy array, use the values attribute:

```
df.values
```
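The pandas documentation recommends Series.to_numpy() over .values when you specifically want a NumPy array, because .values can return other array types for some extension dtypes (for example, categorical data). A minimal comparison on the same series:

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4])

vals = s.values      # attribute used above
arr = s.to_numpy()   # explicit conversion to a NumPy array

print(isinstance(arr, np.ndarray), arr)
print((vals == arr).all())  # same numbers either way for an integer series
```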

To access a single value by its index, e.g., the value at index 1 (the second element, since indexing starts at 0), use the following command:

```
df[1]
```
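With the default index 0 to 4, an element's label and its position are the same number, so df[1] is unambiguous. When the labels differ from the positions, .loc (by label) and .iloc (by position) make the intent explicit. A small sketch with a hypothetical reversed index:

```python
import pandas as pd

# Hypothetical series whose labels do not match positions
s = pd.Series([10, 20, 30], index=[3, 2, 1])

by_label = s.loc[1]      # element whose label is 1 -> 30
by_position = s.iloc[1]  # second element (position 1) -> 20

print(by_label, by_position)
```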

To extract a slice of the Series, e.g., all values from index 1 to index 3, use the following command:

```
df[1:4]
```

The value at index 4 is excluded while the value at index 1 is included: in 1:4, the start is inclusive and the stop is exclusive, so we get the values at indices >= 1 and < 4. To include the value at index 4 as well, use either of the following:

```
df[1:5] #or
df[1:]
```
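Positional slicing excludes the stop, but slicing by label with .loc includes it. A sketch using hypothetical letter labels, where both slices return the same three values:

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4], index=["a", "b", "c", "d", "e"])

positional = s.iloc[1:4]    # positions 1, 2, 3; position 4 is excluded
by_label = s.loc["b":"d"]   # labels b, c, d; the stop label "d" is included

print(positional.tolist())  # [1, 2, 3]
print(by_label.tolist())    # [1, 2, 3]
```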

### Dataframe from series

Now, let’s see how to create a dataframe from two or more Series.

```
#Dictionary having keys as houses and values as no of rooms
dict1 = {"House A": 2, "House B": 3, "House C": 4}
#Converting dictionary to series
NoOfRooms = pd.Series(dict1)
NoOfRooms
```

```
dict2 = {"House A": 5000000, "House B": 6000000, "House C": 7000000}
PriceInRupees = pd.Series(dict2)
PriceInRupees
```

```
HousesData = pd.DataFrame({"No of rooms": NoOfRooms, "Price in Rupees": PriceInRupees})
HousesData
```
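Because pd.DataFrame aligns the Series on their index labels, a house missing from one Series still gets a row, with NaN in the missing cell. A minimal sketch, assuming (hypothetically) that the price of House C is unknown:

```python
import pandas as pd

NoOfRooms = pd.Series({"House A": 2, "House B": 3, "House C": 4})
# Hypothetical: no price recorded for House C
PartialPrices = pd.Series({"House A": 5000000, "House B": 6000000})

partial = pd.DataFrame({"No of rooms": NoOfRooms, "Price in Rupees": PartialPrices})
print(partial)  # House C has NaN under "Price in Rupees"
```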

Here, we created two dictionaries, converted each of them into a pandas Series, collected both Series into a dictionary keyed by column name, and passed that dictionary to pd.DataFrame. The Series are aligned on their shared index (the house names), which becomes the row labels. To get the row labels (index) and the column labels, use the following commands:

```
HousesData.index
```

```
HousesData.columns
```
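Both attributes return pandas Index objects; .tolist() turns them into plain Python lists, and .shape gives the (rows, columns) count. A sketch that rebuilds the same table directly from lists, passing the house names as the index:

```python
import pandas as pd

HousesData = pd.DataFrame(
    {"No of rooms": [2, 3, 4], "Price in Rupees": [5000000, 6000000, 7000000]},
    index=["House A", "House B", "House C"],
)

print(HousesData.index.tolist())    # ['House A', 'House B', 'House C']
print(HousesData.columns.tolist())  # ['No of rooms', 'Price in Rupees']
print(HousesData.shape)             # (3, 2)
```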

### Accessing rows and columns

Accessing a particular column/row can be done in the following way:

```
HousesData['No of rooms']
```

```
#loc helps us in fetching the row corresponding to the row name mentioned in the square brackets
HousesData.loc['House A']
```

```
#If you want to access the multiple rows by means of row names, we pass the list of row names
HousesData.loc[['House A', 'House B']]
```

```
#iloc selects rows by integer position; 1: returns every row from position 1 onward
HousesData.iloc[1:]
```
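.loc and .iloc also accept a row selector and a column selector together, separated by a comma. A minimal sketch on the same houses table, rebuilt here so it runs on its own:

```python
import pandas as pd

HousesData = pd.DataFrame(
    {"No of rooms": [2, 3, 4], "Price in Rupees": [5000000, 6000000, 7000000]},
    index=["House A", "House B", "House C"],
)

price_b = HousesData.loc["House B", "Price in Rupees"]  # one cell, by labels
same_cell = HousesData.iloc[1, 1]                       # same cell, by positions
rooms_ac = HousesData.loc[["House A", "House C"], ["No of rooms"]]  # rows x columns

print(price_b, same_cell)
print(rooms_ac)
```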

Preparing a dataframe with a single series can be done as follows:

```
pd.DataFrame(NoOfRooms, columns = ['No of rooms'])
```
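An alternative that names the single column directly is Series.to_frame(). A minimal sketch:

```python
import pandas as pd

NoOfRooms = pd.Series({"House A": 2, "House B": 3, "House C": 4})

# to_frame turns the Series into a one-column dataframe, keeping its index
rooms_frame = NoOfRooms.to_frame(name="No of rooms")
print(rooms_frame)
```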

### Dataframe from array

A dataframe can also be made from a NumPy structured array, in which each named field (here X and Y) becomes a column:

```
onesArray = np.ones(3, dtype = [('X', 'i8'), ('Y', 'f8')])
onesArray
```

```
pd.DataFrame(onesArray)
```
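Field dtypes carry over from the structured array (X stays integer, Y stays float). A plain two-dimensional array has no field names, so column names (and, optionally, row labels) are passed explicitly; the labels r0 to r2 below are hypothetical:

```python
import numpy as np
import pandas as pd

# Structured array: field names become column names, field dtypes are kept
ones = np.ones(3, dtype=[("X", "i8"), ("Y", "f8")])
from_struct = pd.DataFrame(ones)
print(from_struct.dtypes)

# Plain 2-D array: supply column names and row labels ourselves
values = np.arange(6).reshape(3, 2)   # [[0, 1], [2, 3], [4, 5]]
from_plain = pd.DataFrame(values, columns=["X", "Y"], index=["r0", "r1", "r2"])
print(from_plain)
```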


So guys, with this we conclude this tutorial on pandas. In the next tutorial, we will learn how to handle missing values using Python (pandas and NumPy).
