Rollup functions

riweather.rollup_starting

rollup_starting(data, period='h', *, upsample_first=True)

Roll up data, labelled with the period start.

Parameters:

data (Series | DataFrame) –

Time series data with a datetime index
period (str, default: 'h' ) –

Period to resample to. Defaults to "h", which is hourly.
upsample_first (bool, default: True ) –

Perform minute-level upsampling prior to calculating the period average.

Returns:

Series | DataFrame –

The time series rolled up to the specified period.

Examples:

>>> import pandas as pd
>>> t = pd.Series(
...     [1, 2, 10],
...     index=pd.date_range(
...         "2023-01-01 00:01",
...         "2023-01-01 01:05",
...         freq="32min",
...     ),
... )
>>> t
2023-01-01 00:01:00     1
2023-01-01 00:33:00     2
2023-01-01 01:05:00    10
Freq: 32min, dtype: int64

By default, the data are upsampled to minute-level before aggregation.

>>> rollup_starting(t)
2023-01-01 00:00:00    3.207627
2023-01-01 01:00:00    9.375000
Freq: h, dtype: float64

The above is equivalent to upsampling and then aggregating with Pandas resample():

>>> x = upsample(t, period="min")
>>> x
2023-01-01 00:01:00     1.00000
2023-01-01 00:02:00     1.03125
2023-01-01 00:03:00     1.06250
2023-01-01 00:04:00     1.09375
2023-01-01 00:05:00     1.12500
                         ...
2023-01-01 01:01:00     9.00000
2023-01-01 01:02:00     9.25000
2023-01-01 01:03:00     9.50000
2023-01-01 01:04:00     9.75000
2023-01-01 01:05:00    10.00000
Freq: min, Length: 65, dtype: float64
>>> x.resample("h").mean()
2023-01-01 00:00:00    3.207627
2023-01-01 01:00:00    9.375000
Freq: h, dtype: float64

To skip upsampling and aggregate raw values only, use upsample_first=False.

>>> rollup_starting(t, upsample_first=False)
2023-01-01 00:00:00     1.5
2023-01-01 01:00:00    10.0
Freq: h, dtype: float64
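The left-closed, left-labelled binning can be checked by hand: the row labelled 00:00 is the mean of the minute values in [00:00, 01:00). The sketch below is a minimal pandas-only re-implementation, not the riweather code itself. It inlines the upsample() body shown in its source further down this page, and the variable names are illustrative.

```python
import pandas as pd

# The example series used throughout this page: three readings 32 minutes apart.
t = pd.Series(
    [1, 2, 10],
    index=pd.date_range("2023-01-01 00:01", "2023-01-01 01:05", freq="32min"),
)

# Minute-level upsampling, mirroring the body of upsample().
x = t.resample("min").mean().interpolate(method="linear", limit=60, limit_direction="both")

# label="left", closed="left": the row labelled 00:00 averages [00:00, 01:00).
starting = x.resample("h", label="left", closed="left").mean()

# Select the same half-open window by hand and average it.
start = pd.Timestamp("2023-01-01 00:00")
end = pd.Timestamp("2023-01-01 01:00")
window = x[(x.index >= start) & (x.index < end)]

print(len(window))                    # 59 minutes: 00:01 through 00:59
print(round(window.mean(), 6))        # 3.207627
print(round(starting.loc[start], 6))  # 3.207627, the same value
```

The 01:00 reading (8.75) is excluded from this window because the bin is open on the right. It is counted in the row labelled 01:00 instead.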

Source code in src/riweather/stations.py

def rollup_starting(
    data: pd.Series | pd.DataFrame, period: str = "h", *, upsample_first: bool = True
) -> pd.Series | pd.DataFrame:
    """Roll up data, labelled with the period start.

    Args:
        data: Time series data with a datetime index
        period: Period to resample to. Defaults to `"h"`, which is hourly.
        upsample_first: Perform minute-level upsampling prior to calculating
            the period average.

    Returns:
        The time series rolled up to the specified period.

    Examples:
        >>> import pandas as pd
        >>> t = pd.Series(
        ...     [1, 2, 10],
        ...     index=pd.date_range(
        ...         "2023-01-01 00:01",
        ...         "2023-01-01 01:05",
        ...         freq="32min",
        ...     ),
        ... )
        >>> t
        2023-01-01 00:01:00     1
        2023-01-01 00:33:00     2
        2023-01-01 01:05:00    10
        Freq: 32min, dtype: int64

        By default, the data are [upsampled][riweather.upsample] to minute-level before aggregation.

        >>> rollup_starting(t)
        2023-01-01 00:00:00    3.207627
        2023-01-01 01:00:00    9.375000
        Freq: h, dtype: float64

        The above is equivalent to upsampling and then aggregating with [Pandas `resample()`][pandas.Series.resample]:

        >>> x = upsample(t, period="min")
        >>> x
        2023-01-01 00:01:00     1.00000
        2023-01-01 00:02:00     1.03125
        2023-01-01 00:03:00     1.06250
        2023-01-01 00:04:00     1.09375
        2023-01-01 00:05:00     1.12500
                                 ...
        2023-01-01 01:01:00     9.00000
        2023-01-01 01:02:00     9.25000
        2023-01-01 01:03:00     9.50000
        2023-01-01 01:04:00     9.75000
        2023-01-01 01:05:00    10.00000
        Freq: min, Length: 65, dtype: float64
        >>> x.resample("h").mean()
        2023-01-01 00:00:00    3.207627
        2023-01-01 01:00:00    9.375000
        Freq: h, dtype: float64

        To skip upsampling and aggregate raw values only, use `upsample_first=False`.

        >>> rollup_starting(t, upsample_first=False)
        2023-01-01 00:00:00     1.5
        2023-01-01 01:00:00    10.0
        Freq: h, dtype: float64
    """
    if upsample_first:
        data = upsample(data, period="min")
    return data.resample(period, label="left", closed="left").mean()

riweather.rollup_ending

rollup_ending(data, period='h', *, upsample_first=True)

Roll up data, labelled with the period end.

Parameters:

data (Series | DataFrame) –

Time series data with a datetime index
period (str, default: 'h' ) –

Period to resample to. Defaults to "h", which is hourly.
upsample_first (bool, default: True ) –

Perform minute-level upsampling prior to calculating the period average.

Returns:

Series | DataFrame –

The time series rolled up to the specified period.

Examples:

>>> import pandas as pd
>>> t = pd.Series(
...     [1, 2, 10],
...     index=pd.date_range(
...         "2023-01-01 00:01",
...         "2023-01-01 01:05",
...         freq="32min",
...     ),
... )
>>> t
2023-01-01 00:01:00     1
2023-01-01 00:33:00     2
2023-01-01 01:05:00    10
Freq: 32min, dtype: int64

By default, the data are upsampled to minute-level before aggregation.

>>> rollup_ending(t)
2023-01-01 01:00:00    3.3
2023-01-01 02:00:00    9.5
Freq: h, dtype: float64

The above is equivalent to upsampling and then aggregating with Pandas resample():

>>> x = upsample(t, period="min")
>>> x
2023-01-01 00:01:00     1.00000
2023-01-01 00:02:00     1.03125
2023-01-01 00:03:00     1.06250
2023-01-01 00:04:00     1.09375
2023-01-01 00:05:00     1.12500
                         ...
2023-01-01 01:01:00     9.00000
2023-01-01 01:02:00     9.25000
2023-01-01 01:03:00     9.50000
2023-01-01 01:04:00     9.75000
2023-01-01 01:05:00    10.00000
Freq: min, Length: 65, dtype: float64
>>> x.resample("h", label="right", closed="right").mean()
2023-01-01 01:00:00    3.3
2023-01-01 02:00:00    9.5
Freq: h, dtype: float64

To skip upsampling and aggregate raw values only, use upsample_first=False.

>>> rollup_ending(t, upsample_first=False)
2023-01-01 01:00:00     1.5
2023-01-01 02:00:00    10.0
Freq: h, dtype: float64
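Closing the bins on the right moves the 01:00 reading into the earlier bin. The row labelled 01:00 covers (00:00, 01:00], which is the 60 minutes from 00:01 through 01:00. Their interpolated values sum to 198, so the mean is 3.3. The following is a minimal pandas-only sketch, not the riweather implementation, that checks this:

```python
import pandas as pd

t = pd.Series(
    [1, 2, 10],
    index=pd.date_range("2023-01-01 00:01", "2023-01-01 01:05", freq="32min"),
)
# Minute-level upsampling, mirroring upsample().
x = t.resample("min").mean().interpolate(method="linear", limit=60, limit_direction="both")

# label="right", closed="right": the row labelled 01:00 averages (00:00, 01:00].
ending = x.resample("h", label="right", closed="right").mean()

start = pd.Timestamp("2023-01-01 00:00")
end = pd.Timestamp("2023-01-01 01:00")
window = x[(x.index > start) & (x.index <= end)]

print(len(window))      # 60 minutes: 00:01 through 01:00 inclusive
print(window.sum())     # 198.0
print(ending.loc[end])  # 3.3
```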

Source code in src/riweather/stations.py

def rollup_ending(
    data: pd.Series | pd.DataFrame, period: str = "h", *, upsample_first: bool = True
) -> pd.Series | pd.DataFrame:
    """Roll up data, labelled with the period end.

    Args:
        data: Time series data with a datetime index
        period: Period to resample to. Defaults to `"h"`, which is hourly.
        upsample_first: Perform minute-level upsampling prior to calculating
            the period average.

    Returns:
        The time series rolled up to the specified period.

    Examples:
        >>> import pandas as pd
        >>> t = pd.Series(
        ...     [1, 2, 10],
        ...     index=pd.date_range(
        ...         "2023-01-01 00:01",
        ...         "2023-01-01 01:05",
        ...         freq="32min",
        ...     ),
        ... )
        >>> t
        2023-01-01 00:01:00     1
        2023-01-01 00:33:00     2
        2023-01-01 01:05:00    10
        Freq: 32min, dtype: int64

        By default, the data are [upsampled][riweather.upsample] to minute-level before aggregation.

        >>> rollup_ending(t)
        2023-01-01 01:00:00    3.3
        2023-01-01 02:00:00    9.5
        Freq: h, dtype: float64

        The above is equivalent to upsampling and then aggregating with [Pandas `resample()`][pandas.Series.resample]:

        >>> x = upsample(t, period="min")
        >>> x
        2023-01-01 00:01:00     1.00000
        2023-01-01 00:02:00     1.03125
        2023-01-01 00:03:00     1.06250
        2023-01-01 00:04:00     1.09375
        2023-01-01 00:05:00     1.12500
                                 ...
        2023-01-01 01:01:00     9.00000
        2023-01-01 01:02:00     9.25000
        2023-01-01 01:03:00     9.50000
        2023-01-01 01:04:00     9.75000
        2023-01-01 01:05:00    10.00000
        Freq: min, Length: 65, dtype: float64
        >>> x.resample("h", label="right", closed="right").mean()
        2023-01-01 01:00:00    3.3
        2023-01-01 02:00:00    9.5
        Freq: h, dtype: float64

        To skip upsampling and aggregate raw values only, use `upsample_first=False`.

        >>> rollup_ending(t, upsample_first=False)
        2023-01-01 01:00:00     1.5
        2023-01-01 02:00:00    10.0
        Freq: h, dtype: float64
    """
    if upsample_first:
        data = upsample(data, period="min")
    return data.resample(period, label="right", closed="right").mean()

riweather.rollup_midpoint

rollup_midpoint(data, period='h', *, upsample_first=True)

Roll up data, labelled with the period midpoint.

Parameters:

data (Series | DataFrame) –

Time series data with a datetime index
period (str, default: 'h' ) –

Period to resample to. Defaults to "h", which is hourly.
upsample_first (bool, default: True ) –

Perform minute-level upsampling prior to calculating the period average.

Returns:

Series | DataFrame –

The time series rolled up to the specified period.

Examples:

>>> import pandas as pd
>>> t = pd.Series(
...     [1, 2, 10],
...     index=pd.date_range(
...         "2023-01-01 00:01",
...         "2023-01-01 01:05",
...         freq="32min",
...     ),
... )
>>> t
2023-01-01 00:01:00     1
2023-01-01 00:33:00     2
2023-01-01 01:05:00    10
Freq: 32min, dtype: int64

By default, the data are upsampled to minute-level before aggregation.

>>> rollup_midpoint(t)
2023-01-01 00:00:00    1.437500
2023-01-01 01:00:00    5.661458
Freq: h, dtype: float64

The above is equivalent to upsampling, shifting the data forward by half of the period, and then aggregating with Pandas resample():

>>> x = upsample(t, period="min")
>>> x
2023-01-01 00:01:00     1.00000
2023-01-01 00:02:00     1.03125
2023-01-01 00:03:00     1.06250
2023-01-01 00:04:00     1.09375
2023-01-01 00:05:00     1.12500
                         ...
2023-01-01 01:01:00     9.00000
2023-01-01 01:02:00     9.25000
2023-01-01 01:03:00     9.50000
2023-01-01 01:04:00     9.75000
2023-01-01 01:05:00    10.00000
Freq: min, Length: 65, dtype: float64
>>> x.shift(freq="30min").resample("h").mean()
2023-01-01 00:00:00    1.437500
2023-01-01 01:00:00    5.661458
Freq: h, dtype: float64

To skip upsampling and aggregate raw values only, use upsample_first=False.

>>> rollup_midpoint(t, upsample_first=False)
2023-01-01 00:00:00    1.0
2023-01-01 01:00:00    6.0
Freq: h, dtype: float64
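Shifting the data forward by half a period means that the row labelled 01:00 averages the original minutes in [00:30, 01:30). This window is centred on 01:00. The sketch below checks this using pandas only. Like the source, it computes the half period with to_offset, which assumes a fixed-length period such as "h". Calendar offsets such as month starts may not support division by 2.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

t = pd.Series(
    [1, 2, 10],
    index=pd.date_range("2023-01-01 00:01", "2023-01-01 01:05", freq="32min"),
)
# Minute-level upsampling, mirroring upsample().
x = t.resample("min").mean().interpolate(method="linear", limit=60, limit_direction="both")

# Half of a fixed-length period: 30 minutes for "h".
half = to_offset("h") / 2
midpoint = x.shift(freq=half).resample("h", label="left", closed="left").mean()

# In original (unshifted) time, the 01:00 row covers [00:30, 01:30).
# The data end at 01:05, so only 00:30 through 01:05 contribute.
label = pd.Timestamp("2023-01-01 01:00")
lo = label - pd.Timedelta("30min")
hi = label + pd.Timedelta("30min")
window = x[(x.index >= lo) & (x.index < hi)]

print(len(window))                    # 36 minutes
print(round(window.mean(), 6))        # 5.661458
print(round(midpoint.loc[label], 6))  # 5.661458
```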

Source code in src/riweather/stations.py

def rollup_midpoint(
    data: pd.Series | pd.DataFrame, period: str = "h", *, upsample_first: bool = True
) -> pd.Series | pd.DataFrame:
    """Roll up data, labelled with the period midpoint.

    Args:
        data: Time series data with a datetime index
        period: Period to resample to. Defaults to `"h"`, which is hourly.
        upsample_first: Perform minute-level upsampling prior to calculating
            the period average.

    Returns:
        The time series rolled up to the specified period.

    Examples:
        >>> import pandas as pd
        >>> t = pd.Series(
        ...     [1, 2, 10],
        ...     index=pd.date_range(
        ...         "2023-01-01 00:01",
        ...         "2023-01-01 01:05",
        ...         freq="32min",
        ...     ),
        ... )
        >>> t
        2023-01-01 00:01:00     1
        2023-01-01 00:33:00     2
        2023-01-01 01:05:00    10
        Freq: 32min, dtype: int64

        By default, the data are [upsampled][riweather.upsample] to minute-level before aggregation.

        >>> rollup_midpoint(t)
        2023-01-01 00:00:00    1.437500
        2023-01-01 01:00:00    5.661458
        Freq: h, dtype: float64

        The above is equivalent to upsampling, [shifting][pandas.Series.shift] the data forward
        by half of the period, and then aggregating with [Pandas `resample()`][pandas.Series.resample]:

        >>> x = upsample(t, period="min")
        >>> x
        2023-01-01 00:01:00     1.00000
        2023-01-01 00:02:00     1.03125
        2023-01-01 00:03:00     1.06250
        2023-01-01 00:04:00     1.09375
        2023-01-01 00:05:00     1.12500
                                 ...
        2023-01-01 01:01:00     9.00000
        2023-01-01 01:02:00     9.25000
        2023-01-01 01:03:00     9.50000
        2023-01-01 01:04:00     9.75000
        2023-01-01 01:05:00    10.00000
        Freq: min, Length: 65, dtype: float64
        >>> x.shift(freq="30min").resample("h").mean()
        2023-01-01 00:00:00    1.437500
        2023-01-01 01:00:00    5.661458
        Freq: h, dtype: float64

        To skip upsampling and aggregate raw values only, use `upsample_first=False`.

        >>> rollup_midpoint(t, upsample_first=False)
        2023-01-01 00:00:00    1.0
        2023-01-01 01:00:00    6.0
        Freq: h, dtype: float64
    """
    if upsample_first:
        data = upsample(data, period="min")
    half_period = to_offset(period) / 2
    return data.shift(freq=half_period).resample(period, label="left", closed="left").mean()

riweather.rollup_instant

rollup_instant(data, period='h', *, upsample_first=True)

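Roll up data by taking the first value in each period, labelled with the period start. By default, values are taken from the interpolated minute-level series.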

Parameters:

data (Series | DataFrame) –

Time series data with a datetime index
period (str, default: 'h' ) –

Period to resample to. Defaults to "h", which is hourly.
upsample_first (bool, default: True ) –

Perform minute-level upsampling prior to returning a value.

Returns:

Series | DataFrame –

The time series aligned to the specified period.

Examples:

>>> import pandas as pd
>>> t = pd.Series(
...     [1, 2, 10],
...     index=pd.date_range(
...         "2023-01-01 00:01",
...         "2023-01-01 01:05",
...         freq="32min",
...     ),
... )
>>> t
2023-01-01 00:01:00     1
2023-01-01 00:33:00     2
2023-01-01 01:05:00    10
Freq: 32min, dtype: int64

By default, the data are upsampled to minute-level before aggregation.

>>> rollup_instant(t)
2023-01-01 00:00:00    1.00
2023-01-01 01:00:00    8.75
Freq: h, dtype: float64

The above is equivalent to upsampling and then aggregating with Pandas resample(), taking the first value in each period instead of the mean:

>>> x = upsample(t, period="min")
>>> x
2023-01-01 00:01:00     1.00000
2023-01-01 00:02:00     1.03125
2023-01-01 00:03:00     1.06250
2023-01-01 00:04:00     1.09375
2023-01-01 00:05:00     1.12500
                         ...
2023-01-01 01:01:00     9.00000
2023-01-01 01:02:00     9.25000
2023-01-01 01:03:00     9.50000
2023-01-01 01:04:00     9.75000
2023-01-01 01:05:00    10.00000
Freq: min, Length: 65, dtype: float64
>>> x.resample("h").first()
2023-01-01 00:00:00    1.00
2023-01-01 01:00:00    8.75
Freq: h, dtype: float64

To skip upsampling and aggregate raw values only, use upsample_first=False. Notice that this is simply the first value in every time period (hour by default).

>>> rollup_instant(t, upsample_first=False)
2023-01-01 00:00:00     1
2023-01-01 01:00:00    10
Freq: h, dtype: int64
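The 01:00 value of 8.75 is the straight-line value between the readings at 00:33 (2) and 01:05 (10), 27 of 32 minutes along: 2 + 8 * 27/32 = 8.75. The 00:00 row, by contrast, is 1.00 because the first minute available in that hour is 00:01. Nothing is extrapolated back to 00:00. The following standard-library sketch reproduces that arithmetic. interp_at is a hypothetical helper for illustration and is not part of riweather.

```python
from bisect import bisect_right

# Observation times in minutes after 2023-01-01 00:00, and their values.
times = [1, 33, 65]
values = [1.0, 2.0, 10.0]


def interp_at(minute: float) -> float:
    """Hypothetical helper: linearly interpolate the readings at `minute`."""
    if not times[0] <= minute <= times[-1]:
        raise ValueError("minute lies outside the observed range")
    i = bisect_right(times, minute)
    if i == len(times):  # exactly at the last observation
        return values[-1]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (minute - t0) / (t1 - t0)


print(interp_at(60))  # 01:00 -> 8.75
print(interp_at(1))   # 00:01 -> 1.0, the first minute available in the 00:00 hour
```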

Source code in src/riweather/stations.py

def rollup_instant(
    data: pd.Series | pd.DataFrame, period: str = "h", *, upsample_first: bool = True
) -> pd.Series | pd.DataFrame:
    """Roll up data, labelled with interpolated values.

    Args:
        data: Time series data with a datetime index
        period: Period to resample to. Defaults to `"h"`, which is hourly.
        upsample_first: Perform minute-level upsampling prior to returning
            a value.

    Returns:
        The time series aligned to the specified period.

    Examples:
        >>> import pandas as pd
        >>> t = pd.Series(
        ...     [1, 2, 10],
        ...     index=pd.date_range(
        ...         "2023-01-01 00:01",
        ...         "2023-01-01 01:05",
        ...         freq="32min",
        ...     ),
        ... )
        >>> t
        2023-01-01 00:01:00     1
        2023-01-01 00:33:00     2
        2023-01-01 01:05:00    10
        Freq: 32min, dtype: int64

        By default, the data are [upsampled][riweather.upsample] to minute-level before aggregation.

        >>> rollup_instant(t)
        2023-01-01 00:00:00    1.00
        2023-01-01 01:00:00    8.75
        Freq: h, dtype: float64

        The above is equivalent to upsampling and then aggregating with [Pandas `resample()`][pandas.Series.resample],
        taking the first value in each period instead of the mean:

        >>> x = upsample(t, period="min")
        >>> x
        2023-01-01 00:01:00     1.00000
        2023-01-01 00:02:00     1.03125
        2023-01-01 00:03:00     1.06250
        2023-01-01 00:04:00     1.09375
        2023-01-01 00:05:00     1.12500
                                 ...
        2023-01-01 01:01:00     9.00000
        2023-01-01 01:02:00     9.25000
        2023-01-01 01:03:00     9.50000
        2023-01-01 01:04:00     9.75000
        2023-01-01 01:05:00    10.00000
        Freq: min, Length: 65, dtype: float64
        >>> x.resample("h").first()
        2023-01-01 00:00:00    1.00
        2023-01-01 01:00:00    8.75
        Freq: h, dtype: float64

        To skip upsampling and aggregate raw values only, use `upsample_first=False`. Notice that
        this is simply the first value in every time period (hour by default).

        >>> rollup_instant(t, upsample_first=False)
        2023-01-01 00:00:00     1
        2023-01-01 01:00:00    10
        Freq: h, dtype: int64
    """
    if upsample_first:
        data = upsample(data, period="min")
    return data.resample(period, label="left", closed="left").first()

riweather.upsample

upsample(data, period='min')

Upsample and interpolate time series data.

Parameters:

data (Series | DataFrame) –

Time series data with a datetime index
period (str, default: 'min' ) –

Period to upsample to. Defaults to "min", which is minute-level.

Returns:

Series | DataFrame –

Upsampled data

Examples:

>>> import pandas as pd
>>> t = pd.Series(
...     [1, 2, 10],
...     index=pd.date_range(
...         "2023-01-01 00:01",
...         "2023-01-01 01:05",
...         freq="32min",
...     ),
... )
>>> upsample(t)
2023-01-01 00:01:00     1.00000
2023-01-01 00:02:00     1.03125
2023-01-01 00:03:00     1.06250
2023-01-01 00:04:00     1.09375
2023-01-01 00:05:00     1.12500
                         ...
2023-01-01 01:01:00     9.00000
2023-01-01 01:02:00     9.25000
2023-01-01 01:03:00     9.50000
2023-01-01 01:04:00     9.75000
2023-01-01 01:05:00    10.00000
Freq: min, Length: 65, dtype: float64

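You can also upsample to a different frequency. Observations are first averaged into the bins that contain them, so the readings at 00:01 and 00:33 land in the bins labelled 00:00 and 00:30. The empty bins in between are then filled by linear interpolation.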

>>> upsample(t, period="5min")
2023-01-01 00:00:00     1.000000
2023-01-01 00:05:00     1.166667
2023-01-01 00:10:00     1.333333
2023-01-01 00:15:00     1.500000
2023-01-01 00:20:00     1.666667
2023-01-01 00:25:00     1.833333
2023-01-01 00:30:00     2.000000
2023-01-01 00:35:00     3.142857
2023-01-01 00:40:00     4.285714
2023-01-01 00:45:00     5.428571
2023-01-01 00:50:00     6.571429
2023-01-01 00:55:00     7.714286
2023-01-01 01:00:00     8.857143
2023-01-01 01:05:00    10.000000
Freq: 5min, dtype: float64
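The 5-minute result comes from the two steps in the source. First, each reading is averaged into the bin that contains it. Second, the empty bins are linearly interpolated. The following is a pandas-only sketch of those two steps, separated so that the intermediate result is visible. The variable names are illustrative.

```python
import pandas as pd

t = pd.Series(
    [1, 2, 10],
    index=pd.date_range("2023-01-01 00:01", "2023-01-01 01:05", freq="32min"),
)

# Step 1: average each reading into the 5-minute bin containing it; empty bins are NaN.
binned = t.resample("5min").mean()

# Step 2: fill the gaps by linear interpolation between the non-empty bins.
filled = binned.interpolate(method="linear", limit=60, limit_direction="both")

print(binned.dropna())  # 00:00 -> 1.0, 00:30 -> 2.0, 01:05 -> 10.0
print(len(filled))      # 14 rows, 00:00 through 01:05
```

Because interpolation runs between bin labels rather than the original timestamps, the 00:05 value is 1 + 1/6, not the value the straight line through 00:01 and 00:33 would give at 00:05.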

Source code in src/riweather/stations.py

def upsample(data: pd.Series | pd.DataFrame, period: str = "min") -> pd.Series | pd.DataFrame:
    """Upsample and interpolate time series data.

    Args:
        data: Time series data with a datetime index
        period: Period to upsample to. Defaults to `"min"`, which is minute-level.

    Returns:
        Upsampled data

    Examples:
        >>> import pandas as pd
        >>> t = pd.Series(
        ...     [1, 2, 10],
        ...     index=pd.date_range(
        ...         "2023-01-01 00:01",
        ...         "2023-01-01 01:05",
        ...         freq="32min",
        ...     ),
        ... )
        >>> upsample(t)
        2023-01-01 00:01:00     1.00000
        2023-01-01 00:02:00     1.03125
        2023-01-01 00:03:00     1.06250
        2023-01-01 00:04:00     1.09375
        2023-01-01 00:05:00     1.12500
                                 ...
        2023-01-01 01:01:00     9.00000
        2023-01-01 01:02:00     9.25000
        2023-01-01 01:03:00     9.50000
        2023-01-01 01:04:00     9.75000
        2023-01-01 01:05:00    10.00000
        Freq: min, Length: 65, dtype: float64

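        You can also upsample to a different frequency. Observations are first averaged into
        the bins that contain them, so the readings at 00:01 and 00:33 land in the bins
        labelled 00:00 and 00:30. The empty bins in between are then filled by linear
        interpolation.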

        >>> upsample(t, period="5min")
        2023-01-01 00:00:00     1.000000
        2023-01-01 00:05:00     1.166667
        2023-01-01 00:10:00     1.333333
        2023-01-01 00:15:00     1.500000
        2023-01-01 00:20:00     1.666667
        2023-01-01 00:25:00     1.833333
        2023-01-01 00:30:00     2.000000
        2023-01-01 00:35:00     3.142857
        2023-01-01 00:40:00     4.285714
        2023-01-01 00:45:00     5.428571
        2023-01-01 00:50:00     6.571429
        2023-01-01 00:55:00     7.714286
        2023-01-01 01:00:00     8.857143
        2023-01-01 01:05:00    10.000000
        Freq: 5min, dtype: float64
    """
    return data.resample(period).mean().interpolate(method="linear", limit=60, limit_direction="both")