
Pandas Tutorial Part #6 – Introduction to DataFrame

 2 years ago
source link: https://thispointer.com/pandas-tutorial-part-6-introduction-to-dataframe/

In this tutorial, we will discuss what a Pandas DataFrame is and how to create a DataFrame from a CSV file or from other Python data structures such as lists and dictionaries.


What is a DataFrame in Pandas?

In Python, the Pandas module provides the DataFrame, a data structure that stores data in a tabular format. A DataFrame is two-dimensional: it stores data in rows and columns. Imagine it like an Excel worksheet, where data is organized in rows and columns. A DataFrame looks like this:

Pandas DataFrame – Structure

Each row has an index label associated with it, and each column has a column name associated with it. We can select and process individual rows, columns, or cells in a DataFrame.
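The snippet below is a minimal sketch of these labels in practice. It uses a tiny two-row DataFrame with illustrative data. The row labels live in df.index, the column names live in df.columns, and df.loc[row, column] picks out a single cell.

```python
import pandas as pd

# A tiny DataFrame with two rows and two columns (illustrative data only)
df = pd.DataFrame({'Name': ['John', 'Mark'], 'Age': [29, 24]})

print(df.ndim)            # 2 -> rows and columns
print(list(df.index))     # [0, 1] -> default row index labels
print(list(df.columns))   # ['Name', 'Age'] -> column names
print(df.shape)           # (2, 2) -> (number of rows, number of columns)

# A single cell, addressed by row label and column name
print(df.loc[1, 'Name'])  # Mark
```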

How to create a Pandas DataFrame?

There are different ways to create a DataFrame. We can build one from other Python data structures, or we can load the contents of a CSV or Excel file. Let’s look at some of them.

Create DataFrame from a CSV file

Suppose we have a CSV file, employees.csv, in the same folder as our Python file. The contents of employees.csv are as follows:

Name,Age,City,Experience
John,29,London,15
Mark,24,New York,13
Joseph,28,Tokyo,14
Ritika,31,Delhi,11
Vinod,33,Mumbai,13
Saurav,31,Sydney,13
Lucy,32,Paris,13

It contains employee data: name, age, city, and experience. Now we want to create a Pandas DataFrame object from this CSV file. First, we import the pandas module as pd:

import pandas as pd

Here, pd is an alias for the pandas module.

The Pandas module provides a function, read_csv(). It takes the path or name of a CSV file as an argument and loads the file's contents into a DataFrame object. We will use it to create our DataFrame. For example,

import pandas as pd

# Load the csv file and create a DataFrame object
df = pd.read_csv('employees.csv')

# Display the DataFrame
print(df)

Output:

     Name  Age      City  Experience
0    John   29    London          15
1    Mark   24  New York          13
2  Joseph   28     Tokyo          14
3  Ritika   31     Delhi          11
4   Vinod   33    Mumbai          13
5  Saurav   31    Sydney          13
6    Lucy   32     Paris          13

We called the read_csv() function and passed the CSV file name to it as an argument. The read_csv() function loads the CSV file and returns a DataFrame object populated with its contents. Then we printed the contents of the DataFrame.
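You can also try read_csv() without creating employees.csv on disk, because it accepts a file-like object as well as a path. The sketch below assumes the file contents shown above and wraps them in io.StringIO. The variable name csv_text is only for illustration.

```python
import io
import pandas as pd

# In-memory copy of employees.csv (illustrative; mirrors the file shown above)
csv_text = """Name,Age,City,Experience
John,29,London,15
Mark,24,New York,13
Joseph,28,Tokyo,14
Ritika,31,Delhi,11
Vinod,33,Mumbai,13
Saurav,31,Sydney,13
Lucy,32,Paris,13
"""

# read_csv() reads from any file-like object, not only from a file path
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (7, 4) -> 7 rows, 4 columns
print(list(df.columns))  # ['Name', 'Age', 'City', 'Experience']
```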

A DataFrame stores its content in a tabular format, which means that our data is organized in rows and columns. Because we created the DataFrame from a CSV file, the first row of the file was used as the column labels. The file has no index column, so Pandas assigned the default row index labels 0 to 6. DataFrame provides various ways to select content. We can select a single row or column, or a subset of the DataFrame, and perform various operations on it. We will discuss that later in this series.
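Both points can be sketched as follows, using the first three rows of employees.csv held in memory via io.StringIO. By default, the first line becomes the column labels. Passing header=None makes Pandas treat that line as ordinary data and label the columns 0, 1, 2, 3 instead. The last lines show selecting a column, a row, a cell, and a subset of columns.

```python
import io
import pandas as pd

# First three rows of employees.csv, held in memory for the example
csv_text = """Name,Age,City,Experience
John,29,London,15
Mark,24,New York,13
Joseph,28,Tokyo,14
"""

# Default: the first line is used as the column labels
df = pd.read_csv(io.StringIO(csv_text))
print(list(df.columns))   # ['Name', 'Age', 'City', 'Experience']
print(len(df))            # 3

# header=None: the first line is read as a data row; columns are labelled 0..3
raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(raw.columns))  # [0, 1, 2, 3]
print(len(raw))           # 4

# Selecting content from df
ages = df['Age']                # one column, as a Series
second = df.loc[1]              # one row, by its index label
city = df.loc[2, 'City']        # one cell -> 'Tokyo'
subset = df[['Name', 'City']]   # a subset of the columns
print(subset)
```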

There are other ways to create a DataFrame object as well. For example, we can create a DataFrame from a dictionary of lists.

Create DataFrame from a dictionary of lists

The Pandas module provides the DataFrame class. Its constructor, pd.DataFrame(), accepts several kinds of input, and one of them is a dictionary of lists. Each key-value pair of this dictionary holds the contents of one column. The key acts as the column label, and the value is a list containing the values of that column. The constructor returns a DataFrame object populated with all the provided values.

Let’s see some practical examples,

First, import the pandas module as pd and create a dictionary that contains the column names and their values. Here, the dictionary holds information about employees. Then use this dictionary to create a DataFrame object:

import pandas as pd

# Create a dictionary of lists
employees = { 'Name': ['John', 'Mark', 'Joseph', 'Ritika', 'Vinod', 'Saurav', 'Lucy'],
              'Age': [29, 24, 28, 31, 33, 32, 31],
              'City': ['London', 'Tokyo', 'Delhi', 'Mumbai', 'Sydney', 'Paris', 'New York'],
              'Experience': [15, 13, 14, 11, 13, 12, 15]}

# Create a Pandas DataFrame from a dictionary of lists
df = pd.DataFrame(employees)

# Display the DataFrame
print(df)

Output

     Name  Age      City  Experience
0    John   29    London          15
1    Mark   24     Tokyo          13
2  Joseph   28     Delhi          14
3  Ritika   31    Mumbai          11
4   Vinod   33    Sydney          13
5  Saurav   32     Paris          12
6    Lucy   31  New York          15

We passed the dictionary to the pd.DataFrame() constructor, and it returned a DataFrame object filled with the provided values.
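Two details of this constructor are worth knowing, and the sketch below shows both with illustrative data. First, the column order follows the order of the dictionary's keys. Second, every list must have the same length; otherwise pd.DataFrame() raises a ValueError. The exact wording of the error message may differ between pandas versions, so the sketch only prints it.

```python
import pandas as pd

# Keys become column labels, in the dictionary's insertion order
df = pd.DataFrame({'City': ['London', 'Tokyo'], 'Name': ['John', 'Mark']})
print(list(df.columns))  # ['City', 'Name']

# Lists of different lengths are rejected with a ValueError
error_message = None
try:
    pd.DataFrame({'Name': ['John', 'Mark'], 'Age': [29]})
except ValueError as err:
    error_message = str(err)
print(error_message)
```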

Summary

We learned the basics of DataFrame and how to create a Pandas DataFrame from a CSV file and from a dictionary of lists.
