Rollup functions
riweather.rollup_starting
Roll up data, labelled with the period start.
Parameters:
- data (Series | DataFrame) – Time series data with a datetime index.
- period (str, default: 'h') – Period to resample to. Defaults to "h", which is hourly.
- upsample_first (bool, default: True) – Perform minute-level upsampling prior to calculating the period average.
Returns:
Examples:
>>> import pandas as pd
>>> t = pd.Series(
... [1, 2, 10],
... index=pd.date_range(
... "2023-01-01 00:01",
... "2023-01-01 01:05",
... freq="32min",
... ),
... )
>>> t
2023-01-01 00:01:00 1
2023-01-01 00:33:00 2
2023-01-01 01:05:00 10
Freq: 32min, dtype: int64
By default, the data are upsampled to minute-level before aggregation.
>>> rollup_starting(t)
2023-01-01 00:00:00 3.207627
2023-01-01 01:00:00 9.375000
Freq: h, dtype: float64
The above is equivalent to upsampling and then aggregating with Pandas resample():
>>> x = upsample(t, period="min")
>>> x
2023-01-01 00:01:00 1.00000
2023-01-01 00:02:00 1.03125
2023-01-01 00:03:00 1.06250
2023-01-01 00:04:00 1.09375
2023-01-01 00:05:00 1.12500
...
2023-01-01 01:01:00 9.00000
2023-01-01 01:02:00 9.25000
2023-01-01 01:03:00 9.50000
2023-01-01 01:04:00 9.75000
2023-01-01 01:05:00 10.00000
Freq: min, Length: 65, dtype: float64
>>> x.resample("h").mean()
2023-01-01 00:00:00 3.207627
2023-01-01 01:00:00 9.375000
Freq: h, dtype: float64
To skip upsampling and aggregate raw values only, use upsample_first=False.
>>> rollup_starting(t, upsample_first=False)
2023-01-01 00:00:00 1.5
2023-01-01 01:00:00 10.0
Freq: h, dtype: float64
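The gap between the two results for the first hour (3.207627 versus 1.5) comes from time weighting: after minute-level interpolation every minute contributes equally to the hourly mean, whereas the raw mean only sees the two original observations. The sketch below reproduces that comparison with pandas alone. It assumes that linear interpolation on a one-minute grid matches what upsample does for this series, which the outputs above suggest but this page does not state.

```python
import pandas as pd

# Same three-point series as in the examples above.
t = pd.Series(
    [1, 2, 10],
    index=pd.date_range("2023-01-01 00:01", "2023-01-01 01:05", freq="32min"),
)

# Assumption: a one-minute grid filled by linear interpolation stands in for upsample(t).
x = t.resample("min").mean().interpolate()

# Time-weighted hourly mean: each interpolated minute counts once.
weighted = x.resample("h").mean()

# Raw hourly mean: only the original observations count.
raw = t.resample("h").mean()

print(weighted.round(6).tolist())  # [3.207627, 9.375]
print(raw.tolist())                # [1.5, 10.0]
```

The first hour holds 59 interpolated minutes (00:01 to 00:59), most of them on the steep segment between 2 and 10, which is why the weighted mean is well above 1.5.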
Source code in src/riweather/stations.py
riweather.rollup_ending
Roll up data, labelled with the period end.
Parameters:
- data (Series | DataFrame) – Time series data with a datetime index.
- period (str, default: 'h') – Period to resample to. Defaults to "h", which is hourly.
- upsample_first (bool, default: True) – Perform minute-level upsampling prior to calculating the period average.
Returns:
Examples:
>>> import pandas as pd
>>> t = pd.Series(
... [1, 2, 10],
... index=pd.date_range(
... "2023-01-01 00:01",
... "2023-01-01 01:05",
... freq="32min",
... ),
... )
>>> t
2023-01-01 00:01:00 1
2023-01-01 00:33:00 2
2023-01-01 01:05:00 10
Freq: 32min, dtype: int64
By default, the data are upsampled to minute-level before aggregation.
>>> rollup_ending(t)
2023-01-01 01:00:00 3.3
2023-01-01 02:00:00 9.5
Freq: h, dtype: float64
The above is equivalent to upsampling and then aggregating with Pandas resample():
>>> x = upsample(t, period="min")
>>> x
2023-01-01 00:01:00 1.00000
2023-01-01 00:02:00 1.03125
2023-01-01 00:03:00 1.06250
2023-01-01 00:04:00 1.09375
2023-01-01 00:05:00 1.12500
...
2023-01-01 01:01:00 9.00000
2023-01-01 01:02:00 9.25000
2023-01-01 01:03:00 9.50000
2023-01-01 01:04:00 9.75000
2023-01-01 01:05:00 10.00000
Freq: min, Length: 65, dtype: float64
>>> x.resample("h", label="right", closed="right").mean()
2023-01-01 01:00:00 3.3
2023-01-01 02:00:00 9.5
Freq: h, dtype: float64
To skip upsampling and aggregate raw values only, use upsample_first=False.
>>> rollup_ending(t, upsample_first=False)
2023-01-01 01:00:00 1.5
2023-01-01 02:00:00 10.0
Freq: h, dtype: float64
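The label="right", closed="right" arguments in the equivalent resample() call change two things: each bin is named after its end, and a sample that falls exactly on the hour belongs to the bin that ends there rather than the one that starts there. As a rough illustration, the sketch below counts how many minute-level samples land in each bin under both conventions; the series of ones is a made-up stand-in used only for counting.

```python
import pandas as pd

# Minute-level index spanning the same range as the upsampled example (00:01 to 01:05).
idx = pd.date_range("2023-01-01 00:01", "2023-01-01 01:05", freq="min")
ones = pd.Series(1, index=idx)

# Left-closed, left-labelled bins (the pandas default for hourly resampling):
# [00:00, 01:00) and [01:00, 02:00)
starting = ones.resample("h").sum()

# Right-closed, right-labelled bins: (00:00, 01:00] and (01:00, 02:00]
ending = ones.resample("h", label="right", closed="right").sum()

print(starting.tolist())  # [59, 6]
print(ending.tolist())    # [60, 5]
print(ending.index[0])    # 2023-01-01 01:00:00
```

The 01:00 sample moves from the second bin to the first, which is why the period-ending averages (3.3 and 9.5) differ slightly from the period-starting ones.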
Source code in src/riweather/stations.py
riweather.rollup_midpoint
Roll up data, labelled with the period midpoint.
Parameters:
- data (Series | DataFrame) – Time series data with a datetime index.
- period (str, default: 'h') – Period to resample to. Defaults to "h", which is hourly.
- upsample_first (bool, default: True) – Perform minute-level upsampling prior to calculating the period average.
Returns:
Examples:
>>> import pandas as pd
>>> t = pd.Series(
... [1, 2, 10],
... index=pd.date_range(
... "2023-01-01 00:01",
... "2023-01-01 01:05",
... freq="32min",
... ),
... )
>>> t
2023-01-01 00:01:00 1
2023-01-01 00:33:00 2
2023-01-01 01:05:00 10
Freq: 32min, dtype: int64
By default, the data are upsampled to minute-level before aggregation.
>>> rollup_midpoint(t)
2023-01-01 00:00:00 1.437500
2023-01-01 01:00:00 5.661458
Freq: h, dtype: float64
The above is equivalent to upsampling, shifting the data forward by half of the period, and then aggregating with Pandas resample():
>>> x = upsample(t, period="min")
>>> x
2023-01-01 00:01:00 1.00000
2023-01-01 00:02:00 1.03125
2023-01-01 00:03:00 1.06250
2023-01-01 00:04:00 1.09375
2023-01-01 00:05:00 1.12500
...
2023-01-01 01:01:00 9.00000
2023-01-01 01:02:00 9.25000
2023-01-01 01:03:00 9.50000
2023-01-01 01:04:00 9.75000
2023-01-01 01:05:00 10.00000
Freq: min, Length: 65, dtype: float64
>>> x.shift(freq="30min").resample("h").mean()
2023-01-01 00:00:00 1.437500
2023-01-01 01:00:00 5.661458
Freq: h, dtype: float64
To skip upsampling and aggregate raw values only, use upsample_first=False.
>>> rollup_midpoint(t, upsample_first=False)
2023-01-01 00:00:00 1.0
2023-01-01 01:00:00 6.0
Freq: h, dtype: float64
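Shifting the data forward by 30 minutes before an hourly resample means that the value labelled 01:00 averages the original samples from 00:30 through 01:29, so the label sits at the centre of its window. The sketch below checks this by comparing the shift-then-resample result with a direct average over that window. It assumes that linear interpolation on a one-minute grid stands in for upsample, as the printed values suggest.

```python
import pandas as pd

t = pd.Series(
    [1, 2, 10],
    index=pd.date_range("2023-01-01 00:01", "2023-01-01 01:05", freq="32min"),
)

# Assumption: linear interpolation on a one-minute grid stands in for upsample(t).
x = t.resample("min").mean().interpolate()

# Shift-then-resample, as in the equivalent call shown above.
shifted = x.shift(freq="30min").resample("h").mean()

# Direct average over the window centred on 01:00 (00:30 to 01:29 inclusive).
window_start = pd.Timestamp("2023-01-01 00:30")
window_end = pd.Timestamp("2023-01-01 01:29")
centred = x.loc[window_start:window_end].mean()

print(round(shifted.iloc[1], 6))  # 5.661458
print(round(centred, 6))          # 5.661458
```

Only the samples up to 01:05 exist in this example, so the 01:00 window is averaged over the 36 minutes that are actually present.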
Source code in src/riweather/stations.py
riweather.rollup_instant
Roll up data, labelled with interpolated values.
Parameters:
- data (Series | DataFrame) – Time series data with a datetime index.
- period (str, default: 'h') – Period to resample to. Defaults to "h", which is hourly.
- upsample_first (bool, default: True) – Perform minute-level upsampling prior to returning a value.
Returns:
Examples:
>>> import pandas as pd
>>> t = pd.Series(
... [1, 2, 10],
... index=pd.date_range(
... "2023-01-01 00:01",
... "2023-01-01 01:05",
... freq="32min",
... ),
... )
>>> t
2023-01-01 00:01:00 1
2023-01-01 00:33:00 2
2023-01-01 01:05:00 10
Freq: 32min, dtype: int64
By default, the data are upsampled to minute-level before aggregation.
>>> rollup_instant(t)
2023-01-01 00:00:00 1.00
2023-01-01 01:00:00 8.75
Freq: h, dtype: float64
The above is equivalent to upsampling and then aggregating with Pandas resample(), but instead of taking the mean over each time period, taking the first value:
>>> x = upsample(t, period="min")
>>> x
2023-01-01 00:01:00 1.00000
2023-01-01 00:02:00 1.03125
2023-01-01 00:03:00 1.06250
2023-01-01 00:04:00 1.09375
2023-01-01 00:05:00 1.12500
...
2023-01-01 01:01:00 9.00000
2023-01-01 01:02:00 9.25000
2023-01-01 01:03:00 9.50000
2023-01-01 01:04:00 9.75000
2023-01-01 01:05:00 10.00000
Freq: min, Length: 65, dtype: float64
>>> x.resample("h").first()
2023-01-01 00:00:00 1.00
2023-01-01 01:00:00 8.75
Freq: h, dtype: float64
To skip upsampling and aggregate raw values only, use upsample_first=False. Notice that this is simply the first value in every time period (hour by default).
>>> rollup_instant(t, upsample_first=False)
2023-01-01 00:00:00 1
2023-01-01 01:00:00 10
Freq: h, dtype: int64
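With upsampling enabled, the 01:00 value of 8.75 is not an observation; it is the point on the straight line between the neighbouring observations at 00:33 and 01:05. A minimal hand calculation of that value, assuming plain linear interpolation in time, looks like this:

```python
from datetime import datetime

# Known observations on either side of 01:00, taken from the example series.
t0, v0 = datetime(2023, 1, 1, 0, 33), 2.0
t1, v1 = datetime(2023, 1, 1, 1, 5), 10.0

# Timestamp at which an instantaneous value is wanted.
target = datetime(2023, 1, 1, 1, 0)

# Linear interpolation: fraction of the way from t0 to t1, applied to the value gap.
fraction = (target - t0) / (t1 - t0)  # 27 minutes out of 32
value = v0 + fraction * (v1 - v0)

print(value)  # 8.75
```

The 00:00 label is different: no data exist at or before 00:00, so first() reports the earliest sample in that hour, which is the 00:01 observation.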
Source code in src/riweather/stations.py
riweather.upsample
Upsample and interpolate time series data.
Parameters:
- data (Series | DataFrame) – Time series data with a datetime index.
- period (str, default: 'min') – Period to upsample to. Defaults to "min", which is minute-level.
Returns:
Examples:
>>> import pandas as pd
>>> t = pd.Series(
... [1, 2, 10],
... index=pd.date_range(
... "2023-01-01 00:01",
... "2023-01-01 01:05",
... freq="32min",
... ),
... )
>>> upsample(t)
2023-01-01 00:01:00 1.00000
2023-01-01 00:02:00 1.03125
2023-01-01 00:03:00 1.06250
2023-01-01 00:04:00 1.09375
2023-01-01 00:05:00 1.12500
...
2023-01-01 01:01:00 9.00000
2023-01-01 01:02:00 9.25000
2023-01-01 01:03:00 9.50000
2023-01-01 01:04:00 9.75000
2023-01-01 01:05:00 10.00000
Freq: min, Length: 65, dtype: float64
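To upsample to a different frequency, set period. The output index is aligned to the new frequency, so in this example it starts at 00:00 rather than 00:01.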
>>> upsample(t, period="5min")
2023-01-01 00:00:00 1.000000
2023-01-01 00:05:00 1.166667
2023-01-01 00:10:00 1.333333
2023-01-01 00:15:00 1.500000
2023-01-01 00:20:00 1.666667
2023-01-01 00:25:00 1.833333
2023-01-01 00:30:00 2.000000
2023-01-01 00:35:00 3.142857
2023-01-01 00:40:00 4.285714
2023-01-01 00:45:00 5.428571
2023-01-01 00:50:00 6.571429
2023-01-01 00:55:00 7.714286
2023-01-01 01:00:00 8.857143
2023-01-01 01:05:00 10.000000
Freq: 5min, dtype: float64
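For readers who want to see how such output can arise, here is one pandas-only pipeline that reproduces both results above: place the observations on the target grid, then fill the empty slots by linear interpolation. The function name upsample_sketch is hypothetical, and whether riweather implements upsample this way is an assumption, not something this page confirms.

```python
import pandas as pd


def upsample_sketch(data, period="min"):
    """Hypothetical stand-in for upsample: regrid to `period`, then fill gaps linearly."""
    # Bin the observations onto the new grid; empty bins become NaN.
    gridded = data.resample(period).mean()
    # Fill the NaN bins by linear interpolation between neighbouring filled bins.
    return gridded.interpolate(method="linear")


t = pd.Series(
    [1, 2, 10],
    index=pd.date_range("2023-01-01 00:01", "2023-01-01 01:05", freq="32min"),
)

five = upsample_sketch(t, period="5min")
print(len(five))               # 14
print(round(five.iloc[7], 6))  # 3.142857
```

In the 5-minute case the observation at 00:33 lands in the 00:30 bin, so the interpolated line runs from 2 at 00:30 to 10 at 01:05 in seven equal steps, matching the 3.142857 shown at 00:35.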