Pandas Indexing 🐼

Pandas is a powerful library in Python that facilitates data manipulation and analysis. A fundamental concept in Pandas is indexing, which is used to select specific rows or columns from a DataFrame.

Indexing Methods

There are several ways to index a DataFrame in Pandas:

.loc: Label-based indexing. Selects rows or columns by their labels and also accepts boolean masks. Works on both DataFrame and Series.
.iloc: Position-based indexing. Selects rows or columns by integer position. Works on both DataFrame and Series.
[] operator: A shorthand whose meaning depends on what you pass. On a DataFrame, a label or list of labels selects columns, a slice selects rows, and a boolean mask filters rows.
.at: Label-based access to a single scalar value. It is faster than .loc when you need exactly one cell.
.iat: Like .at, but addresses the cell by integer position instead of by label.
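
The difference between label-based and position-based access is easiest to see when the row labels are not integers. The following is a minimal sketch using a made-up DataFrame; the name scores and its contents are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical data with string row labels, so labels and positions differ.
scores = pd.DataFrame(
    {"math": [90, 75, 60], "art": [70, 85, 95]},
    index=["ann", "ben", "cat"],
)

by_label = scores.loc["ben", "art"]       # row label 'ben', column label 'art'
by_position = scores.iloc[1, 1]           # second row, second column
fast_label = scores.at["ben", "art"]      # scalar lookup by label
fast_position = scores.iat[1, 1]          # scalar lookup by position

column = scores["math"]                   # [] with a label selects a column
first_two_rows = scores[0:2]              # [] with a slice selects rows
high_math = scores[scores["math"] > 70]   # [] with a boolean mask filters rows

print(by_label, by_position, fast_label, fast_position)
print(list(high_math.index))
```

All four scalar lookups address the same cell, so they return the same value.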

Examples

Selecting a single column by label: df.loc[:, 'column_name']
Selecting multiple columns by label: df.loc[:, ['column_1', 'column_2']]
Selecting a single row by integer position: df.iloc[0]
Selecting multiple rows by integer position: df.iloc[0:5]
Selecting a single element by label: df.loc['row_label', 'column_label']
Selecting a single element by integer position: df.iloc[0, 0]

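Positional indexing in Pandas is zero-based, so with .iloc and .iat the first row and the first column are at position 0. Labels used with .loc and .at are whatever the index contains; they only look like positions when the DataFrame has the default integer index. Slices also differ: .iloc[0:5] excludes the end position, while .loc[0:5] includes the end label.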
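
As a small illustration of how slice endpoints are treated, the sketch below compares a positional slice with a label slice on a DataFrame that has the default integer index. The name df_demo and its values are made up for this example.

```python
import pandas as pd

# Default RangeIndex, so the row labels are 0, 1, 2, 3, 4.
df_demo = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

by_position = df_demo.iloc[0:3]  # positions 0, 1, 2 (end excluded)
by_label = df_demo.loc[0:3]      # labels 0, 1, 2, 3 (end included)

print(len(by_position))  # 3
print(len(by_label))     # 4
```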

Test DataFrame

Here is an example of a small DataFrame to test these concepts:

import pandas as pd

data = {
    'name': ['John', 'Mike', 'Sara', 'Kate', 'Bob'],
    'age': [35, 28, 31, 22, 45],
    'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}

df = pd.DataFrame(data, columns=['name', 'age', 'city'])

This DataFrame has three columns: 'name', 'age', and 'city', and five rows. You can test the different indexing methods using the following commands:

Selecting a single column by label: print(df.loc[:, 'name'])
Selecting multiple columns by label: print(df.loc[:, ['name', 'age']])
Selecting a single row by integer position: print(df.iloc[0])
Selecting multiple rows by integer position: print(df.iloc[0:3])
Selecting a single element by label: print(df.at[0, 'name'])
Selecting a single element by integer position: print(df.iat[0, 0])

Experiment with different indices to see how the results change.
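
It can also help to check what kind of object each selection returns. The sketch below rebuilds the same test DataFrame so that it runs on its own; the variable names on the left are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Mike', 'Sara', 'Kate', 'Bob'],
    'age': [35, 28, 31, 22, 45],
    'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

one_column = df.loc[:, 'name']            # a Series
two_columns = df.loc[:, ['name', 'age']]  # a DataFrame
first_row = df.iloc[0]                    # a Series indexed by column name
first_three = df.iloc[0:3]                # a DataFrame with three rows
cell = df.at[0, 'name']                   # a single scalar value

print(type(one_column).__name__, type(two_columns).__name__)
print(first_row['city'], first_three.shape, cell)
```

Passing a single label returns a Series, while passing a list of labels returns a DataFrame, even if the list has only one entry.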

Advanced Indexing Techniques

In addition to these basic methods, Pandas offers advanced indexing techniques:

Boolean indexing: Select rows that meet certain conditions, e.g., df[df['age'] > 30].
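.query() method: Filter a DataFrame with an expression written as a string, much like a SQL WHERE clause. It is convenient when combining several conditions.
.reindex() method: Conform the rows or columns of a DataFrame to a new set of labels. Existing labels are reordered to match, and labels that were not present are filled with NaN.
.set_index() method: Use one or more existing columns as the index of a DataFrame. It returns a new DataFrame by default.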
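
The techniques in this list can be sketched on the test DataFrame as follows. The DataFrame is rebuilt here so the snippet is self-contained, and the label 'Zoe' is a made-up name used only to show what .reindex() does with a label that is missing.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Mike', 'Sara', 'Kate', 'Bob'],
    'age': [35, 28, 31, 22, 45],
    'city': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
})

# Boolean indexing: keep the rows where the condition is True.
over_30 = df[df['age'] > 30]

# Combine conditions with & and wrap each one in parentheses.
over_30_not_ny = df[(df['age'] > 30) & (df['city'] != 'New York')]

# The same filter written as a query string.
same_with_query = df.query("age > 30 and city != 'New York'")

# Use the 'name' column as the index; df itself is left unchanged.
by_name = df.set_index('name')

# Conform to a new list of labels; 'Zoe' is not in the data, so its row is NaN.
reordered = by_name.reindex(['Bob', 'Sara', 'Zoe'])

print(list(over_30['name']))
print(list(same_with_query['name']))
print(reordered)
```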

These examples cover the basics, but Pandas provides many more advanced indexing options to explore.
