October 14, 2021

How can I get the difference between rows in a group in pandas?

I have a dataframe like this:

import pandas as pd

df = pd.DataFrame({'group': [1, 1, 1, 2, 2, 3, 3, 3, 3], 'time': [12, 44, 55, 2, 7, 100, 105, 106, 200]})

#    group  time
# 0      1    12
# 1      1    44
# 2      1    55
# 3      2     2
# 4      2     7
# 5      3   100
# 6      3   105
# 7      3   106
# 8      3   200

And I want to get something like:

#    group  time diff_to_next_step
# 0      1    12  32     (44-12)
# 1      1    44  11     (55-44)
# 2      1    55  None
# 3      2     2  5      (7-2)
# 4      2     7  None
# 5      3   100  5      (105 - 100)
# 6      3   105  1      (106 - 105)
# 7      3   106  94     (200 - 106)
# 8      3   200  None

I think this should be possible with some combination of groupby / stack / shift, but I can't get it to work. Can somebody please help me?

Use diff + shift

import pandas as pd

df = pd.DataFrame({'group': [1, 1, 1, 2, 2, 3, 3, 3, 3], 'time': [12, 44, 55, 2, 7, 100, 105, 106, 200]})

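# Difference to the previous row within each group, then shifted up one row
# so each value sits on the earlier row. The shift(-1) is not per-group; it
# lines up here only because each group's rows are contiguous.
df["diff"] = df.groupby("group")["time"].diff().shift(-1)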
print(df)

Output

   group  time  diff
0      1    12  32.0
1      1    44  11.0
2      1    55   NaN
3      2     2   5.0
4      2     7   NaN
5      3   100   5.0
6      3   105   1.0
7      3   106  94.0
8      3   200   NaN
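
Note that the shift(-1) above is applied to the whole column, so the result lines up only when each group's rows sit next to each other. A minimal sketch, assuming the rows might be interleaved (the shuffled order below is made up for illustration), keeps the whole calculation inside the groupby by using diff(-1) and negating it:

```python
import pandas as pd

df = pd.DataFrame({'group': [1, 1, 1, 2, 2, 3, 3, 3, 3],
                   'time': [12, 44, 55, 2, 7, 100, 105, 106, 200]})

# Hypothetical reordering so that the groups are no longer contiguous.
shuffled = df.iloc[[0, 3, 5, 1, 4, 6, 2, 7, 8]]

# diff(-1) gives (current - next) within each group; negating it gives
# (next - current). The last row of each group has no next row, so it is NaN.
shuffled = shuffled.assign(diff=-shuffled.groupby("group")["time"].diff(-1))

# Restore the original row order for comparison with the output above.
print(shuffled.sort_index())
```

Because the result is aligned on the index, sorting by index afterwards should give the same diff column as the output above.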

Try with shift

df['new'] = df.groupby('group')['time'].shift(-1) - df.time
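
For reference, here is the same line as a self-contained sketch on the question's data. The cast to the nullable Int64 dtype is an optional addition, not part of the answer; it is assumed here only to keep the differences as integers, with <NA> where a group has no next row:

```python
import pandas as pd

df = pd.DataFrame({'group': [1, 1, 1, 2, 2, 3, 3, 3, 3],
                   'time': [12, 44, 55, 2, 7, 100, 105, 106, 200]})

# Next time within each group minus the current time. shift(-1) is applied
# per group here, so the last row of every group gets NaN.
df['new'] = df.groupby('group')['time'].shift(-1) - df['time']

# Optional: nullable integer dtype instead of float64 with NaN.
df['new'] = df['new'].astype('Int64')
print(df)
```

Since the shift happens inside the groupby, this version does not depend on the rows of a group being adjacent.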