Pandas DataFrame | rolling method
Pandas DataFrame.rolling(~) method is used to compute statistics using moving windows. Note that a window is simply a sequence of values used to compute statistics like the mean.
Parameters
1. window | int or offset or BaseIndexer subclass
The size of the moving window.
When dealing with time series, that is, when the index of the source DataFrame is a DatetimeIndex, an offset represents the time span covered by each window.
2. min_periods | int | optional
The minimum number of values in the window. If a window contains fewer than min_periods observations, then NaN is returned for the computed statistic of that window. By default, min_periods=1 if window is offset-based, and min_periods=window otherwise.
3. center | boolean | optional
If True, then the observation is set to the center of the window. If False, then the observation is set to the right of the window.
By default, center=False. Consult examples below for clarification.
4. win_type | string | optional
The type of the window (e.g. boxcar, triang). For more information, consult the official documentation.
5. on | string | optional
The label of the datetime-like column to use instead of the DatetimeIndex. This is only relevant when dealing with time series.
6. axis | int or string | optional
Whether to compute statistics for each column or each row. By default, axis=0, that is, the statistic is computed for each column.
7. closed | string | optional
Whether the endpoints are inclusive or exclusive:
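| Value | Description |
|---|---|
| "right" | The right endpoint is inclusive; the left endpoint is exclusive. |
| "left" | The left endpoint is inclusive; the right endpoint is exclusive. |
| "both" | Both endpoints are inclusive. |
| "neither" | Both endpoints are exclusive. |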
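By default, closed="right" for offset-based windows. For fixed-size windows, older pandas releases defaulted to closed="both", while recent releases document "right" as the default for all window types.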
Return Value
A Window or Rolling object that will be used to compute some statistic.
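As a minimal sketch of what this means in practice, the returned object only holds the window configuration; values are computed once a statistic method is called on it. The DataFrame and the variable names (roller, means, stats) below are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 8, 10], "B": [4, 5, 6, 7]},
                  index=["a", "b", "c", "d"])

# rolling() alone computes nothing; it returns a Rolling object
roller = df.rolling(window=2)
print(type(roller).__name__)   # Rolling

# The same object can be reused for several statistics
means = roller.mean()               # one column per source column
stats = roller.agg(["min", "max"])  # columns become (column, statistic) pairs
print(means)
print(stats)
```

Reusing a single Rolling object like this avoids repeating the window arguments for each statistic.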
Examples
Basic usage
Consider the following DataFrame:
import pandas as pd
df = pd.DataFrame({"A":[2,4,8,10],"B":[4,5,6,7]}, index=["a","b","c","d"])
df
    A  B
a   2  4
b   4  5
c   8  6
d  10  7
To compute the sum of values with a moving window of size 2:
df.rolling(window=2).sum()
      A     B
a   NaN   NaN
b   6.0   9.0
c  12.0  11.0
d  18.0  13.0
Here, note the following:
- since axis=0 (default), we are computing the statistic (sum) down each column.
- window=2 means that the sum is computed using two consecutive observations:
  - we get 6.0 in the first column because 2+4=6.
  - we get 12.0 because 4+8=12.
  - we get 18.0 because 8+10=18.
- we get NaN for the first row because, when the window is not offset-based, min_periods defaults to the value of window. The minimum number of observations required to compute the statistic is therefore 2, but the window for the very first row contains only one value, so NaN is returned.
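To see the effect of min_periods on the same data, the sketch below lowers it to 1, so the first row is computed from the single observation available instead of returning NaN. The variable name partial is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [2, 4, 8, 10], "B": [4, 5, 6, 7]},
                  index=["a", "b", "c", "d"])

# With min_periods=1, a window holding a single value is still evaluated
partial = df.rolling(window=2, min_periods=1).sum()
print(partial)
# Row "a" now holds 2.0 and 4.0 (the lone values); other rows are unchanged
```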
Specifying center
Consider the following DataFrame:
df = pd.DataFrame({"A":[2,4,8,10]}, index=["a","b","c","d"])
df
    A
a   2
b   4
c   8
d  10
By default, center=False, which means that the window will not be centered around an observation:
df.rolling(window=3, min_periods=0).sum() # center=False
      A
a   2.0
b   6.0
c  14.0
d  22.0
Here, the numbers are computed like so:
A[a]: 2 = 2
A[b]: 2 + 4 = 6        # the observation is 4 (see how 4 is right-aligned)
A[c]: 2 + 4 + 8 = 14   # the observation is 8
A[d]: 4 + 8 + 10 = 22  # the observation is 10
Compare this with the output of center=True:
df.rolling(window=3, min_periods=0, center=True).sum()
      A
a   6.0
b  14.0
c  22.0
d  18.0
Here, the numbers are computed like so:
A[a]: 2 + 4 = 6
A[b]: 2 + 4 + 8 = 14   # the observation is 4 (see how 4 is centered here)
A[c]: 4 + 8 + 10 = 22  # the observation is 8
A[d]: 8 + 10 = 18
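Neither setting of center places the observation at the left edge of the window. One way to get such a forward-looking window is to pass a BaseIndexer subclass as window. The sketch below uses FixedForwardWindowIndexer from pandas.api.indexers and assumes a pandas version that ships it (1.1 or later, as far as the documentation indicates); min_periods is set explicitly so the result does not depend on its default.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

df = pd.DataFrame({"A": [2, 4, 8, 10]}, index=["a", "b", "c", "d"])

# Each window starts at the current row and extends forward over 2 rows
indexer = FixedForwardWindowIndexer(window_size=2)
forward_sum = df.rolling(window=indexer, min_periods=1).sum()
print(forward_sum)
# A[a]: 2 + 4 = 6
# A[b]: 4 + 8 = 12
# A[c]: 8 + 10 = 18
# A[d]: 10 (only one value remains in the window)
```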
Time-series case
Consider the following time-series DataFrame:
idx = [pd.Timestamp('20201220 15:00:00'),
       pd.Timestamp('20201220 15:00:01'),
       pd.Timestamp('20201220 15:00:02'),
       pd.Timestamp('20201220 15:00:04'),
       pd.Timestamp('20201220 15:00:05')]
df = pd.DataFrame({"A":[1,10,100,1000,10000]}, index=idx)
df
                         A
2020-12-20 15:00:00      1
2020-12-20 15:00:01     10
2020-12-20 15:00:02    100
2020-12-20 15:00:04   1000
2020-12-20 15:00:05  10000
Summing a window with a period of 2 seconds:
df.rolling(window="2S").sum()
                           A
2020-12-20 15:00:00      1.0
2020-12-20 15:00:01     11.0
2020-12-20 15:00:02    110.0
2020-12-20 15:00:04   1000.0
2020-12-20 15:00:05  11000.0
Note that since window is offset-based, min_periods=1 by default.
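When the timestamps live in a regular column rather than in the index, the on parameter can point rolling(~) at that column. The sketch below is a variant of the time-series example under a few assumptions: the column name "time" is made up for illustration, only four rows are used, and the window is given as pd.Timedelta(seconds=2) rather than an offset string.

```python
import pandas as pd

df = pd.DataFrame({
    "time": pd.to_datetime(["2020-12-20 15:00:00",
                            "2020-12-20 15:00:01",
                            "2020-12-20 15:00:02",
                            "2020-12-20 15:00:04"]),
    "A": [1, 10, 100, 1000],
})

# The 2-second windows are built from the "time" column, not from the index
on_sum = df.rolling(window=pd.Timedelta(seconds=2), on="time").sum()
print(on_sum["A"].tolist())   # [1.0, 11.0, 110.0, 1000.0]
```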
You can specify the closed parameter to indicate whether the endpoints should be inclusive/exclusive:
df.rolling(window="2S", closed="both").sum() # both endpoints are inclusive
                           A
2020-12-20 15:00:00      1.0
2020-12-20 15:00:01     11.0
2020-12-20 15:00:02    111.0
2020-12-20 15:00:04   1100.0
2020-12-20 15:00:05  11000.0
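For a side-by-side view, the following sketch runs the same 2-second window with each of the four closed values. It is self-contained and rebuilds the time-series DataFrame; the window is passed as pd.Timedelta(seconds=2), which is a substitution for the offset string used above, and the results dictionary is only an illustrative helper.

```python
import pandas as pd

idx = pd.to_datetime(["2020-12-20 15:00:00", "2020-12-20 15:00:01",
                      "2020-12-20 15:00:02", "2020-12-20 15:00:04",
                      "2020-12-20 15:00:05"])
df = pd.DataFrame({"A": [1, 10, 100, 1000, 10000]}, index=idx)

window = pd.Timedelta(seconds=2)

# Collect the rolling sums of column A for every closed option
results = {}
for closed in ["right", "left", "both", "neither"]:
    results[closed] = df.rolling(window=window, closed=closed).sum()["A"].tolist()

for closed, values in results.items():
    print(closed, values)
```

With "left" and "neither", the current observation is excluded from its own window, so the first row should come back as NaN because no earlier observation exists.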