Additional Handy Functions
You can use a few functions that apply to both DataFrames and Series. Here is a rundown:
iloc on DataFrame
For a DataFrame you can give both row and column index numbers to retrieve values.
Now let us understand how to retrieve values in specific row(s) and column(s) using iloc on a DataFrame:
dataFrame.iloc[<row selection>, <column selection>] selects rows and columns by index number, in the order that they appear in the DataFrame. Row and column indexes start at 0 and run up to the number of rows or columns minus 1. Therefore, the last row index can be calculated by:
df.shape[0] - 1
And the last column index can be calculated by:
df.shape[1] - 1
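As a quick check of the arithmetic above, the sketch below builds a small DataFrame (the column names and values are made up for illustration) and confirms that the computed last positions select the same data as the negative-index shorthand -1.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({
    'marks': [70, 66, 100, 88],
    'age': [29, 32, 31, 28],
})

last_row = df.shape[0] - 1   # number of rows minus one
last_col = df.shape[1] - 1   # number of columns minus one
print(last_row, last_col)    # 3 1

# The computed positions select the same data as the -1 shorthand
print(df.iloc[last_row].equals(df.iloc[-1]))        # True
print(df.iloc[:, last_col].equals(df.iloc[:, -1]))  # True
```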
Get the entire first row
import pandas as pd

df = pd.DataFrame({
    'marks': [70, 66, 100, 88], 'age': [29, 32, 31, 28],
    'sex': ['F', 'M', 'F', 'F'], 'name': ['Jane', 'John', 'Sally', 'Sandy'],
    'ssn': ['1234', '3456', '4567', '5678']
})
row = df.iloc[0] # Gets the first row values as a series
print(type(row))
print(row)
Output:
<class 'pandas.core.series.Series'>
marks      70
age        29
sex         F
name     Jane
ssn      1234
Name: 0, dtype: object
Related functions for both Series and DataFrames
- df.iloc[-1] - To get all the values in the last row (or the last value of a Series).
Related functions for DataFrames only
- df.iloc[:, 0] - To get all the values in the first column.
- df.iloc[:, -1] - To get all the values in the last column.
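The shorthand forms listed above can be tried out as follows. This is a small sketch that rebuilds the sample df used in this section; the Series named ages is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'marks': [70, 66, 100, 88], 'age': [29, 32, 31, 28],
    'sex': ['F', 'M', 'F', 'F'], 'name': ['Jane', 'John', 'Sally', 'Sandy'],
    'ssn': ['1234', '3456', '4567', '5678']
})

last_row = df.iloc[-1]         # Series: all values in the last row
first_column = df.iloc[:, 0]   # Series: the 'marks' column
last_column = df.iloc[:, -1]   # Series: the 'ssn' column

print(last_row['name'])        # Sandy
print(first_column.tolist())   # [70, 66, 100, 88]
print(last_column.tolist())    # ['1234', '3456', '4567', '5678']

# On a Series, iloc takes a single position
ages = df['age']
print(ages.iloc[-1])           # 28
```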
More Examples
Get the first row, second column value only
single_value = df.iloc[0, 1] # Gets the value as a scalar (column index 1 is 'age')
print(type(single_value))
print(single_value)
Output:
<class 'numpy.int64'>
29
Get multiple row values of a single column
To get rows 0 to 2 of the second column:
row_0_to_2_of_2nd_column = df.iloc[0:3, 1] # row index 3 is excluded
print(type(row_0_to_2_of_2nd_column)) # Returned type is a Series
print(row_0_to_2_of_2nd_column)
Output:
<class 'pandas.core.series.Series'>
0    29
1    32
2    31
Name: age, dtype: int64
Get multiple rows and multiple columns
row_0_2_and_column_0_3 = df.iloc[0:2, 0:3] # row index 2 and column index 3 are excluded
print(type(row_0_2_and_column_0_3))
print(row_0_2_and_column_0_3)
Output:
<class 'pandas.core.frame.DataFrame'>
   marks  age  sex
0     70   29    F
1     66   32    M
Note
- iloc returns a scalar, a Series, or a DataFrame depending on the selection. A single value comes back as a scalar (typically a NumPy type such as numpy.int64 for numeric data, or a plain Python object such as str for object columns). A one-dimensional result, such as a single row or a single column, comes back as a Series. A two-dimensional result comes back as a DataFrame.
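The return types described in this note can be inspected directly. The sketch below rebuilds the sample df and prints the type name of each result; the last selection, which passes a list, is an extra case added here to show one way of keeping a DataFrame even when only one row is selected.

```python
import pandas as pd

df = pd.DataFrame({
    'marks': [70, 66, 100, 88], 'age': [29, 32, 31, 28],
    'sex': ['F', 'M', 'F', 'F'], 'name': ['Jane', 'John', 'Sally', 'Sandy'],
    'ssn': ['1234', '3456', '4567', '5678']
})

scalar = df.iloc[0, 1]         # one row, one column -> scalar
row = df.iloc[0]               # one row, all columns -> Series
column = df.iloc[:, 1]         # all rows, one column -> Series
block = df.iloc[0:2, 0:3]      # several rows and columns -> DataFrame
one_row_frame = df.iloc[[0]]   # a list selector keeps the result two-dimensional

for result in (scalar, row, column, block, one_row_frame):
    print(type(result).__name__)
```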
loc
loc retrieves row or column values by label. If the index has no custom labels, the labels are the natural index numbers, so loc accepts the same numbers as iloc. Once the index is replaced with custom labels, loc accepts those labels instead.
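To make the two cases concrete, here is a minimal sketch using a Series. The variable names and the labels 'jane', 'john' and 'sally' are invented for this illustration.

```python
import pandas as pd

# Default integer index: the labels are 0, 1, 2
marks = pd.Series([70, 66, 100])
print(marks.loc[1])                 # 66

# Custom labels replace the integers
marks_by_name = pd.Series([70, 66, 100], index=['jane', 'john', 'sally'])
print(marks_by_name.loc['john'])    # 66
print(marks_by_name.iloc[1])        # 66, position still works through iloc
```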
loc on DataFrame
Here are the various ways you can use loc on a DataFrame:
Use the index number just like iloc to retrieve row values
df = pd.DataFrame({
    'marks': [70, 66, 100, 88], 'age': [29, 32, 31, 28],
    'sex': ['F', 'M', 'F', 'F'], 'name': ['Jane', 'John', 'Sally', 'Sandy'],
    'ssn': ['1234', '3456', '4567', '5678']
})
row_values = df.loc[0]
print(type(row_values))
Output:
<class 'pandas.core.series.Series'>
Notice that the returned object is a Series. Now let us print the Series object values:
print(row_values)
Output:
marks      70
age        29
sex         F
name     Jane
ssn      1234
Name: 0, dtype: object
Use multiple specific row indexes and column label to get values
row_column = df.loc[[1, 3], 'name']
print(row_column) # Series is returned
Output:
1     John
3    Sandy
Name: name, dtype: object
Use multiple specific column labels with a single row index
row_column = df.loc[1, ['name', 'age']]
print(row_column) # Series is returned
Output:
name    John
age       32
Name: 1, dtype: object
Use multiple specific row indexes and multiple column labels
row_column = df.loc[[1, 3], ['name', 'age']]
print(type(row_column)) # DataFrame is returned
print(row_column)
Output:
<class 'pandas.core.frame.DataFrame'>
    name  age
1   John   32
3  Sandy   28
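One difference from iloc that is worth checking for yourself: a slice passed to loc includes its end label, whereas an iloc slice excludes its end position. The sketch below rebuilds the sample df to compare the two; the variable names by_label and by_position are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'marks': [70, 66, 100, 88], 'age': [29, 32, 31, 28],
    'sex': ['F', 'M', 'F', 'F'], 'name': ['Jane', 'John', 'Sally', 'Sandy'],
    'ssn': ['1234', '3456', '4567', '5678']
})

by_label = df.loc[0:2, 'age']     # labels 0, 1 and 2 -> three rows
by_position = df.iloc[0:2, 1]     # positions 0 and 1 -> two rows

print(len(by_label))      # 3
print(len(by_position))   # 2

# Column labels can be sliced too, again inclusive at both ends
print(df.loc[0:1, 'marks':'sex'].columns.tolist())   # ['marks', 'age', 'sex']
```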
loc will not work with the natural index numbers once the natural index is replaced with a custom one. You can only use the custom labels in that case. In the example below, we replace the existing natural index with the 'ssn' column:
df.set_index('ssn', drop=True, inplace=True)
row_values = df.loc['1234'] # select the row by its custom label
print(row_values)
Output:
marks      70
age        29
sex         F
name     Jane
Name: 1234, dtype: object
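To see the restriction in action, the following sketch rebuilds the sample df, sets 'ssn' as the index, and then tries the old index number. It assumes the 'ssn' values are strings, as in the example data; set_index is called without inplace here purely as a stylistic choice.

```python
import pandas as pd

df = pd.DataFrame({
    'marks': [70, 66, 100, 88], 'age': [29, 32, 31, 28],
    'sex': ['F', 'M', 'F', 'F'], 'name': ['Jane', 'John', 'Sally', 'Sandy'],
    'ssn': ['1234', '3456', '4567', '5678']
})
df = df.set_index('ssn')   # the 'ssn' values become the row labels

try:
    df.loc[0]              # 0 is no longer a label in the index
except KeyError as err:
    print('KeyError:', err)

print(df.iloc[0]['name'])        # Jane, selected by position
print(df.loc['1234', 'name'])    # Jane, selected by label
```

Positional access through iloc is unaffected by the new index, so it remains available when you need the n-th row regardless of its label.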