Pivot Tables and more

You can create spreadsheet-type pivot tables to help aggregate and present column data in different ways. Refer: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html

You can also refer to the melt function that provides a similar functionality: https://pandas.pydata.org/docs/reference/api/pandas.melt.html

Here is an example of using some of these functions. Create a simple dataframe with rows and columns as shown below:


df = pd.DataFrame({
                'county': ['Wayne', 'Oakland', 'Wasthenaw', 'Kent'], '2020-09-01':[5, 11, \
                10, 5], '2020-09-02':[10, 5, 9, 15], '2020-09-03':[4, 8, 12, 16]
                })

df.head()

Output:

    county    2020-09-01    2020-09-02    2020-09-03
0    Wayne        5             10            4
1    Oakland        11              5            8
2    Wasthenaw    10              9            12
3    Kent        5              15        16

Using melt: Now using melt, you can convert the columns values into a date column as shown below


df2 = df.melt(id_vars=['county'], value_vars=['2020-09-01', '2020-09-02', '2020-09-03'], value_name='count', var_name='date')
df2.head()

Output:

    county        date        count
0    Wayne        2020-09-01    5
1    Oakland        2020-09-01    11
2    Wasthenaw    2020-09-01    10
3    Kent        2020-09-01    5
4    Wayne        2020-09-02    10

While the above statement has hardcoded column values for value_vars variable, you can make it dynamic so that the columns are picked up based on the index positions in the df.columns index, shown in the expression below



df2 = df.melt(id_vars=['county'], value_vars=df.columns[1:4], value_name='count', var_name='date')

And now you can get the county values as column values by using the pivot_table as shown below:


df3 = df2.pivot_table(index=['date'], columns=['county'])
df3.head()

Output:

county        Kent        Oakland        Wasthenaw   Wayne
date                
2020-09-01    5            11            10            5
2020-09-02    15            5            9            10
2020-09-03    16            8            12            4

Once we have in this format, we can now get a timeseries plot across all the dates in one statement as shown below:

ax = df3.plot()
ax.legend(bbox_to_anchor=(1.1, 1.05))

Notice the keyword arguments and the values that are used on the Matplotlib's AxesSubplot object that is returned from the plot() function

Example 2 using pivot_table


import pandas as pd

df = pd.read_csv('https://storage.googleapis.com/mbcc/datasets/pc_per_100.csv')

df.set_index('Country', inplace=True)
df2 = df.loc[['India', 'Algeria']]
df2


df2.pivot_table( columns=['Country']).plot()

results matching ""

    No results matching ""