Additional Handy Functions

A few functions apply to both DataFrames and Series. Here is a rundown:

iloc on DataFrame

For a DataFrame you can give both row and column index numbers to retrieve values.

Now let us see how to retrieve values from specific row(s) and column(s) using iloc on a DataFrame:

dataFrame.iloc[<row selection>, <column selection>] selects rows and columns by integer position, in the order in which they appear in the DataFrame. Row and column positions start at 0 and run up to the number of rows/columns minus 1. Therefore, the last row index can be calculated by:

df.shape[0] - 1

And the last column index is:

df.shape[1] - 1
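
As a quick check of the arithmetic above, the following minimal sketch computes the last row and column positions from df.shape and confirms that they select the same values as the negative position -1. The small DataFrame and the variable names are illustrative only.

```python
import pandas as pd

# Illustrative data; any DataFrame behaves the same way.
df = pd.DataFrame({'marks': [70, 66, 100, 88], 'age': [29, 32, 31, 28]})

n_rows, n_cols = df.shape   # shape is (number of rows, number of columns)
last_row = n_rows - 1       # position of the last row
last_col = n_cols - 1       # position of the last column

# Position -1 refers to the same row/column as the computed position.
same_row = df.iloc[last_row].equals(df.iloc[-1])
same_col = df.iloc[:, last_col].equals(df.iloc[:, -1])
print(last_row, last_col, same_row, same_col)
```

In practice the negative position is usually the shorter way to write this; the shape-based form is mainly useful when the position itself is needed.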

Get the entire first row

import pandas as pd

df = pd.DataFrame({
    'marks': [70, 66, 100, 88],
    'age': [29, 32, 31, 28],
    'sex': ['F', 'M', 'F', 'F'],
    'name': ['Jane', 'John', 'Sally', 'Sandy'],
    'ssn': ['1234', '3456', '4567', '5678'],
})

row = df.iloc[0]  # Gets the first row's values as a Series
print(type(row))
print(row)

Output:

<class 'pandas.core.series.Series'>
marks      70
age        29
sex         F
name     Jane
ssn      1234
Name: 0, dtype: object

 

  • df.iloc[-1] - To get all the values in the last row.
  • df.iloc[:, 0] - To get all the values in the first column.
  • df.iloc[:, -1] - To get all the values in the last column.
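
The three selections listed above can be tried out with a short sketch like the one below. The DataFrame is a trimmed-down, illustrative version of the one used in this section.

```python
import pandas as pd

# Illustrative data with three columns.
df = pd.DataFrame({
    'marks': [70, 66, 100, 88],
    'age': [29, 32, 31, 28],
    'ssn': ['1234', '3456', '4567', '5678'],
})

last_row = df.iloc[-1]       # Series indexed by the column labels
first_col = df.iloc[:, 0]    # the 'marks' column as a Series
last_col = df.iloc[:, -1]    # the 'ssn' column as a Series

print(first_col.tolist())    # [70, 66, 100, 88]
print(last_col.tolist())     # ['1234', '3456', '4567', '5678']
print(last_row['marks'])     # 88
```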

More Examples

Get the first row, second column value only

single_value = df.iloc[0, 1]  # Row 0, column 1 ('age'); gets the value as a scalar
print(type(single_value))  
print(single_value)

Output:

<class 'numpy.int64'>
29

Get multiple row values of a single column

To get rows 0 through 2 of the second column:

row_0_to_2_of_2nd_column = df.iloc[0:3, 1]  # the stop index 3 is excluded
print(type(row_0_to_2_of_2nd_column))  # Returned type is a Series
print(row_0_to_2_of_2nd_column)

Output:

<class 'pandas.core.series.Series'>
0    29
1    32
2    31
Name: age, dtype: int64

Get multiple rows and multiple columns

row_0_2_and_column_0_3 = df.iloc[0:2, 0:3]  # row index 2 and column index 3 are excluded
print(type(row_0_2_and_column_0_3))
print(row_0_2_and_column_0_3)

Output:

<class 'pandas.core.frame.DataFrame'>
   marks  age sex
0     70   29   F
1     66   32   M

Note

  • iloc returns a scalar, a Series or a DataFrame depending on the selection. A single value is returned as a scalar (a NumPy type such as numpy.int64 for numeric columns). A one-dimensional result (a single row or a single column) is returned as a Series. A two-dimensional result is returned as a DataFrame.
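
The note above can be illustrated with a minimal sketch that checks the returned type for each kind of selection. The last line goes slightly beyond the examples in this section: passing a list as the row selector keeps that axis, so even a one-row selection comes back as a DataFrame. The data and variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'marks': [70, 66, 100, 88], 'age': [29, 32, 31, 28]})

scalar = df.iloc[0, 1]       # single row, single column -> scalar
series = df.iloc[0:3, 1]     # slice of rows, single column -> Series
frame = df.iloc[0:2, 0:2]    # slices on both axes -> DataFrame

# A list selector keeps the axis, so one row still yields a DataFrame.
one_row_frame = df.iloc[[0]]

print(type(series).__name__, type(frame).__name__, type(one_row_frame).__name__)
```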

loc

loc retrieves row or column values by label. If the DataFrame has no custom labels, the labels are the natural index numbers, so loc is used much like iloc; if the index has been replaced with custom labels, you use those custom labels instead.

loc on DataFrame

Here are the various ways you can use loc on a DataFrame:

  1. Use the index number, just like iloc, to retrieve row values

    df = pd.DataFrame({
        'marks': [70, 66, 100, 88],
        'age': [29, 32, 31, 28],
        'sex': ['F', 'M', 'F', 'F'],
        'name': ['Jane', 'John', 'Sally', 'Sandy'],
        'ssn': ['1234', '3456', '4567', '5678'],
    })

    row_values = df.loc[0]
    print(type(row_values))
    print(row_values)

    Output:

    <class 'pandas.core.series.Series'>
    marks      70
    age        29
    sex         F
    name     Jane
    ssn      1234
    Name: 0, dtype: object

    Notice that the returned object is a Series.

  2. Use multiple specific row indexes and a column label to get values

    row_column = df.loc[[1, 3], 'name']
    print(row_column)  # Series is returned

    Output:

    1     John
    3    Sandy
    Name: name, dtype: object

  3. Use multiple specific column labels with a single row index

    row_column = df.loc[1, ['name', 'age']]
    print(row_column)  # Series is returned

    Output:

    name    John
    age       32
    Name: 1, dtype: object

  4. Use multiple specific row indexes and multiple column labels

    row_column = df.loc[[1, 3], ['name', 'age']]
    print(type(row_column))
    print(row_column)  # DataFrame is returned

    Output:

    <class 'pandas.core.frame.DataFrame'>
        name  age
    1   John   32
    3  Sandy   28
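
One difference from iloc that the examples above do not show: when loc is given a slice, both endpoints are included, whereas an iloc slice excludes the stop position. The following is a small sketch of that behaviour, using an illustrative DataFrame with the natural index.

```python
import pandas as pd

df = pd.DataFrame({
    'marks': [70, 66, 100, 88],
    'age': [29, 32, 31, 28],
    'name': ['Jane', 'John', 'Sally', 'Sandy'],
})

by_label = df.loc[0:2, 'name']    # label slice: rows 0, 1 and 2
by_position = df.iloc[0:2, 2]     # positional slice: rows 0 and 1 only

print(by_label.tolist())      # ['Jane', 'John', 'Sally']
print(by_position.tolist())   # ['Jane', 'John']
```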
    

loc does not work with the natural index numbers once the natural index is replaced with a custom one; in that case you can only use the custom labels. In the example below, we replace the existing natural index with the 'ssn' column:


df.set_index('ssn', drop=True, inplace=True)
row_values = df.loc['1234']  # Look up the row by its 'ssn' label
print(row_values)

Output:

marks      70
age        29
sex         F
name     Jane
Name: 1234, dtype: object
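
To make the point about custom labels concrete, here is a minimal sketch, assuming a two-row illustrative DataFrame indexed by 'ssn'. It shows that a label lookup and a positional lookup return the same row, while using the old index number with loc raises a KeyError.

```python
import pandas as pd

# Illustrative data; 'ssn' becomes the custom index.
df = pd.DataFrame({
    'marks': [70, 66],
    'name': ['Jane', 'John'],
    'ssn': ['1234', '3456'],
}).set_index('ssn')

by_label = df.loc['1234']    # label lookup on the new index
by_position = df.iloc[0]     # positional lookup is unaffected by the index

try:
    df.loc[0]                # 0 is not a label in the 'ssn' index
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(by_label.equals(by_position), lookup_failed)
```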
