The basics of the Pandas library

Shivang Kainthola
4 min read · Aug 31, 2023

Pandas is a Python library for data analysis and manipulation. For any data-related task, a good knowledge of Pandas is essential.

It is in Pandas that we carry out data wrangling: we identify the conversions and changes the data requires and prepare it for further use.

In this article, I will cover the basics of using Pandas and go over the most basic functions in the library.

Dataframe :

A dataframe is like a 2-dimensional array, which stores the data in a tabular format with rows signifying individual entries, and columns describing different attributes. When you read data using Pandas, it is stored as a dataframe.
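To make the row and column picture concrete, here is a minimal sketch that builds a small dataframe by hand. The values are made up and the column names are hypothetical; they are not taken from any real dataset.

```python
import pandas as pd

# Made-up values for illustration; the column names are hypothetical.
df = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "time_on_app": [12.6, 11.1, 11.3],
})

# Each row is one entry; each column is one attribute.
print(df)
print(type(df))
```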

Basic datatypes in a dataframe :

1. Numeric

  • Integer (int64) : 8 bytes per value
  • Float (float64) : 8 bytes per value

2. Character

  • category : a string variable that takes only a few distinct values (uses less memory than storing every string)
  • object : the general-purpose type, covering strings, mixes of numbers and strings, and the NaN marker used to denote missing values
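The sketch below shows how these types might appear in practice, using made-up values. The memory comparison assumes a column with many repeated strings, which is where category tends to help; the exact byte counts depend on the pandas version.

```python
import pandas as pd

# Made-up values, purely for illustration.
visits = pd.Series([3, 7, 2, 5])            # whole numbers -> int64
spend = pd.Series([10.5, 20.0, 7.25, 13.0])  # decimals -> float64
mixed = pd.Series([1, "two", float("nan")])  # mixed content -> object

print(visits.dtype, spend.dtype, mixed.dtype)

# A string column with only two distinct values, repeated many times.
plan = pd.Series(["free", "paid"] * 1000)
plan_cat = plan.astype("category")

# deep=True counts the memory used by the strings themselves.
print(plan.memory_usage(deep=True))
print(plan_cat.memory_usage(deep=True))
```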

Using the Ecommerce Customers dataset (https://www.kaggle.com/datasets/srolka/ecommerce-customers), let us apply the basic functions in detail :

1. Read data files

The most common formats for data files are .xls, .csv, and .txt.

Let us read the dataset :
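A minimal sketch of reading a CSV file follows. To keep it runnable without the download, it reads a few made-up rows from an in-memory string instead of the real file. The column names are assumptions chosen to resemble the dataset, and the file path in the comment is a placeholder.

```python
import io
import pandas as pd

# Stand-in for the downloaded file: a few made-up rows in CSV form.
# The column names are assumptions, not copied from the real file.
csv_text = """Email,Time on App,Yearly Amount Spent
a@example.com,12.6,587.95
b@example.com,11.1,392.20
c@example.com,11.3,487.55
"""

# With the real file this would be pd.read_csv("path/to/file.csv").
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

For the other formats, pd.read_excel() reads .xls files, and pd.read_csv() with a suitable sep argument (for example sep="\t") can read delimited .txt files.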

2. Getting a glimpse of data

The .head(n) and .tail(n) methods return the first and last n rows of the data respectively. If n is omitted, it defaults to 5.
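A short sketch of both methods on a made-up ten-row dataframe (the column names are hypothetical):

```python
import pandas as pd

# Ten made-up rows; column names are hypothetical.
df = pd.DataFrame({
    "customer_id": range(1, 11),
    "time_on_app": [12.6, 11.1, 11.3, 13.7, 12.8, 12.0, 13.3, 10.7, 11.6, 12.4],
})

first_three = df.head(3)  # rows for customers 1 to 3
last_two = df.tail(2)     # rows for customers 9 and 10
default_head = df.head()  # no argument: first 5 rows

print(first_three)
print(last_two)
```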

3. Get the shape of dataframe

The shape of a dataframe is its number of rows and columns. The dataframe.shape attribute returns it as a tuple of (rows, columns).
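As a quick illustration on made-up data with hypothetical column names:

```python
import pandas as pd

# Four made-up rows and two hypothetical columns.
df = pd.DataFrame({
    "time_on_app": [12.6, 11.1, 11.3, 13.7],
    "yearly_spend": [587.95, 392.20, 487.55, 581.85],
})

rows, cols = df.shape  # shape is an attribute, so no parentheses
print(rows, cols)
```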

4. Get the index of the dataframe

dataframe.index returns the row labels of the dataframe.
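A small sketch of what the index looks like, first with the default integer labels and then with custom labels. The data and the labels "c1" to "c3" are made up for illustration.

```python
import pandas as pd

# Default index: integer labels starting at 0.
df = pd.DataFrame({"time_on_app": [12.6, 11.1, 11.3]})
print(df.index)

# Custom row labels (hypothetical customer codes).
labelled = pd.DataFrame(
    {"time_on_app": [12.6, 11.1, 11.3]},
    index=["c1", "c2", "c3"],
)
print(labelled.index)
```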

5. Get the columns of the dataframe

dataframe.columns returns all the column names of the dataframe as an Index object, which can be converted to a plain Python list if needed.
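A brief sketch with made-up data and hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({
    "email": ["a@example.com", "b@example.com"],
    "time_on_app": [12.6, 11.1],
})

print(df.columns)                # an Index object holding the names
column_names = list(df.columns)  # plain Python list of the same names
print(column_names)
```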

6. Check all the data types present in dataframe (.dtypes attribute)
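A minimal sketch on made-up data (hypothetical column names). Note that the label reported for string columns can differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({
    "email": ["a@example.com", "b@example.com"],  # strings
    "visits": [3, 7],                             # int64
    "time_on_app": [12.6, 11.1],                  # float64
})

# dtypes is an attribute; it gives one data type per column.
print(df.dtypes)
```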

7. Get summary statistics of numerical data in dataframe

For all the numerical attributes in the dataframe, pandas can conveniently provide the count, mean, standard deviation, min, max and quartiles using the describe() method.

This is useful in understanding the ranges of values of the data.
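A short sketch of describe() on made-up data with hypothetical column names. By default only the numerical columns are summarised when the dataframe contains any.

```python
import pandas as pd

df = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "time_on_app": [10.0, 12.0, 14.0],
})

stats = df.describe()
print(stats)

# Individual statistics can be looked up by row label and column name.
mean_time = stats.loc["mean", "time_on_app"]
print(mean_time)
```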

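8. Get concise summary of dataframe

The .info() method prints the index range, the column names with their non-null counts and data types, and the memory usage.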
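A minimal sketch of the .info() method on made-up data (hypothetical column names). Calling df.info() with no arguments prints the summary directly; here it is written to a text buffer first so that the summary is also available as a string.

```python
import io
import pandas as pd

df = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "time_on_app": [12.6, 11.1, 11.3],
})

# info() writes to the given buffer instead of the screen.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)
```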

9. Look up data

At a specified row and column label ( .at[] accessor, used with square brackets )

At a specified row and column number ( .iat[] accessor, used with square brackets )
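A brief sketch of both lookups on made-up data with hypothetical column names. Both .at and .iat are used with square brackets rather than called like functions.

```python
import pandas as pd

df = pd.DataFrame({
    "email": ["a@example.com", "b@example.com", "c@example.com"],
    "time_on_app": [12.6, 11.1, 11.3],
})

# By label: row label 1, column label "time_on_app".
by_label = df.at[1, "time_on_app"]

# By position: second row, second column (counting from 0).
by_position = df.iat[1, 1]

print(by_label, by_position)
```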

10. Get number of null values in dataframe

dataframe.isnull().sum() returns the number of null values in each column.
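One common way to count nulls is to chain isnull() with sum(), sketched here on made-up data with hypothetical column names:

```python
import pandas as pd

df = pd.DataFrame({
    "email": ["a@example.com", None, "c@example.com"],
    "time_on_app": [12.6, float("nan"), float("nan")],
})

# isnull() marks each missing cell as True; sum() counts them per column.
null_counts = df.isnull().sum()
print(null_counts)

# Summing again gives the total for the whole dataframe.
total_nulls = int(null_counts.sum())
print(total_nulls)
```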

These are the most basic commands for reading data and gaining fundamental insights from it.

In follow-up posts, we will look at operations for accessing, manipulating and formatting data.

Hope this was useful. Do subscribe!
