Structure of riweather
Data¶
The primary way to retrieve data with riweather is the riweather.Station.fetch_data method. It returns data as a pandas DataFrame for easy integration with other datasets and processing pipelines.
Column names in the output¶
The columns of the retrieved data match the structure returned by the riweather parser, which in turn follows the layout in the ISD Data Documentation. Each column is named like <observation type>.<attribute name>, where <observation type> is one of the fields of the mandatory data section, such as wind, air_temperature, or dew_point, and <attribute name> is one of its child attributes, like speed_rate or temperature_c. So, if we fetch air temperature data from a station:
>>> s = riweather.Station("720534")
>>> df = s.fetch_data(2024, "air_temperature")
That DataFrame will contain three columns, one for each of the attributes of AirTemperatureObservation.
>>> df.columns.to_list()
['air_temperature.temperature_c',
'air_temperature.quality_code',
'air_temperature.temperature_f']
>>> df["air_temperature.temperature_f"].head()
2024-01-01 00:15:00+00:00    34.16
2024-01-01 00:35:00+00:00    33.62
2024-01-01 00:55:00+00:00    32.00
2024-01-01 01:15:00+00:00    30.74
2024-01-01 01:35:00+00:00    29.48
Name: air_temperature.temperature_f, dtype: float64
The long column names are a lot of typing, but they ensure there are no conflicts between similar types of observations. For example, 'air_temperature' and 'dew_point' are both AirTemperatureObservations, so they both have temperature_c, quality_code, and temperature_f as attributes.
>>> df = s.fetch_data(2024, ["dew_point", "air_temperature"])
>>> df.columns.to_list()
['air_temperature.temperature_c',
'air_temperature.quality_code',
'air_temperature.temperature_f',
'dew_point.temperature_c',
'dew_point.quality_code',
'dew_point.temperature_f']
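Because every column follows the <observation type>.<attribute name> pattern, the names can be split apart programmatically. The following is a minimal sketch using only the standard library; the helper name group_columns is hypothetical and not part of riweather.

```python
from collections import defaultdict


def group_columns(columns):
    """Group '<observation type>.<attribute name>' names by observation type.

    Hypothetical helper for illustration; not part of riweather.
    """
    grouped = defaultdict(list)
    for name in columns:
        # Split on the first dot only: the left side is the observation
        # type, the right side is the attribute name.
        obs_type, _, attribute = name.partition(".")
        grouped[obs_type].append(attribute)
    return dict(grouped)


# Column names copied from the example output above.
columns = [
    "air_temperature.temperature_c",
    "air_temperature.quality_code",
    "air_temperature.temperature_f",
    "dew_point.temperature_c",
    "dew_point.quality_code",
    "dew_point.temperature_f",
]

grouped = group_columns(columns)
for obs_type, attributes in grouped.items():
    print(obs_type, attributes)
```

Both observation types end up with the same list of attributes, which is why the prefix is needed to tell the columns apart.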
We recommend renaming columns to something shorter once you have retrieved the data.
>>> df = df.rename(columns={"air_temperature.temperature_f": "tempF",
... "dew_point.temperature_f": "dewF"})
>>> df[["tempF", "dewF"]].head()
                           tempF   dewF
2024-01-01 00:15:00+00:00  34.16  20.48
2024-01-01 00:35:00+00:00  33.62  20.48
2024-01-01 00:55:00+00:00  32.00  19.40
2024-01-01 01:15:00+00:00  30.74  19.22
2024-01-01 01:35:00+00:00  29.48  19.22
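When many observation types are fetched at once, writing the rename mapping by hand gets tedious. One possible approach, sketched here on a small synthetic DataFrame rather than real fetched data, is to pass a function to DataFrame.rename. The short_names mapping and the shorten helper are assumptions made for this illustration.

```python
import pandas as pd

# Synthetic stand-in for fetched data, using the same column naming scheme.
index = pd.date_range("2024-01-01 00:15", periods=3, freq="20min", tz="UTC")
df = pd.DataFrame(
    {
        "air_temperature.temperature_c": [1.2, 0.9, 0.0],
        "air_temperature.temperature_f": [34.16, 33.62, 32.00],
        "dew_point.temperature_f": [20.48, 20.48, 19.40],
    },
    index=index,
)

# Hypothetical short prefixes for each observation type.
short_names = {"air_temperature": "temp", "dew_point": "dew"}


def shorten(column):
    """Shorten Fahrenheit temperature columns; leave all others unchanged."""
    obs_type, _, attribute = column.partition(".")
    if attribute == "temperature_f" and obs_type in short_names:
        return short_names[obs_type] + "F"
    return column


# DataFrame.rename accepts a function that is applied to each column name.
renamed = df.rename(columns=shorten)
print(renamed.columns.to_list())
```

Columns that the function does not recognize keep their full names, so nothing is silently dropped or overwritten.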
Time zones¶
Danger
All timestamps are reported in UTC by default. If you are aligning weather data to another data set, make sure you convert the weather data to the proper time zone first!
Timestamps in the Integrated Surface Dataset are always stored in UTC. This is great news for consumers of the data, because it eliminates any ambiguity around daylight saving time transitions. However, it does mean that the weather observations need to be converted to local time before they can be aligned with other datasets that use local time.
You can do this very easily with pandas after you have retrieved the data.
>>> s = riweather.Station("720534")
>>> df = s.fetch_data(2024, "air_temperature", temp_scale="F", include_quality_codes=False)
>>> df.head() # note the timestamps ending in +00:00, indicating UTC
                           air_temperature.temperature_f
2024-01-01 00:15:00+00:00                          34.16
2024-01-01 00:35:00+00:00                          33.62
2024-01-01 00:55:00+00:00                          32.00
2024-01-01 01:15:00+00:00                          30.74
2024-01-01 01:35:00+00:00                          29.48
>>> df = df.tz_convert("US/Mountain")
>>> df.head() # the same observations, now with a -07:00 offset
                           air_temperature.temperature_f
2023-12-31 17:15:00-07:00                          34.16
2023-12-31 17:35:00-07:00                          33.62
2023-12-31 17:55:00-07:00                          32.00
2023-12-31 18:15:00-07:00                          30.74
2023-12-31 18:35:00-07:00                          29.48
Or, even easier, riweather.Station.fetch_data will do it for you if you pass the tz parameter.
>>> s = riweather.Station("720534")
>>> df = s.fetch_data(2024, "air_temperature", tz="US/Mountain",
... temp_scale="F", include_quality_codes=False)
>>> df.head()
                           air_temperature.temperature_f
2023-12-31 17:15:00-07:00                          34.16
2023-12-31 17:35:00-07:00                          33.62
2023-12-31 17:55:00-07:00                          32.00
2023-12-31 18:15:00-07:00                          30.74
2023-12-31 18:35:00-07:00                          29.48
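To illustrate why the conversion matters when aligning datasets, here is a minimal sketch that joins UTC weather data to a second dataset recorded in local time. Both DataFrames are synthetic, and the meter readings (including the kwh column) are a hypothetical example of "another dataset", not something riweather provides.

```python
import pandas as pd

# Synthetic hourly weather data with a UTC index, like fetched ISD data.
weather = pd.DataFrame(
    {"tempF": [34.0, 33.0, 31.0]},
    index=pd.date_range("2024-01-01 00:00", periods=3, freq="60min", tz="UTC"),
)

# Hypothetical meter readings stamped in naive US/Mountain local time.
meter = pd.DataFrame(
    {"kwh": [1.2, 1.1, 0.9]},
    index=pd.date_range("2023-12-31 17:00", periods=3, freq="60min"),
)

# Put both datasets on the same time zone-aware footing before joining.
weather_local = weather.tz_convert("US/Mountain")
meter_local = meter.tz_localize("US/Mountain")

aligned = weather_local.join(meter_local, how="inner")
print(aligned)
```

Joining the naive meter index directly against the UTC weather index would either fail or pair up the wrong hours, which is the mistake the warning above is about.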
Quality codes¶
ISD data has lots of shorthand codes indicating the source(s) of the weather observations, the method by which they were collected, and the quality or reliability of the data. These codes make the data more complicated to look at and increase the number of columns by quite a bit, but they can nevertheless be helpful when determining how trustworthy the observations are. See the Shorthand Codes page for informative descriptions of what these codes mean.
riweather returns these codes so that you can inspect them, but it also provides some ways to suppress them if you’d rather not clutter up the results. For example, specify include_quality_codes=False to exclude any column ending in quality_code from the output.
>>> s = riweather.Station("720534")
>>> df = s.fetch_data(2024, "air_temperature")
>>> df.columns.to_list()
['air_temperature.temperature_c',
'air_temperature.quality_code',
'air_temperature.temperature_f']
>>> df = s.fetch_data(2024, "air_temperature", include_quality_codes=False)
>>> df.columns.to_list()
['air_temperature.temperature_c',
'air_temperature.temperature_f']
Of course, you could always drop the quality code columns yourself.
>>> s = riweather.Station("720534")
>>> df = s.fetch_data(2024, "air_temperature")
>>> df.columns.to_list()
['air_temperature.temperature_c',
'air_temperature.quality_code',
'air_temperature.temperature_f']
>>> df = df.drop(columns=["air_temperature.quality_code"])
>>> df.columns.to_list()
['air_temperature.temperature_c',
'air_temperature.temperature_f']
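With several observation types, listing every quality code column by name becomes repetitive. A possible alternative, sketched here on a synthetic one-row DataFrame, is to select columns by suffix. The quality code values shown are placeholders chosen for illustration only.

```python
import pandas as pd

# Synthetic one-row frame; the quality code values are placeholders.
df = pd.DataFrame(
    {
        "air_temperature.temperature_c": [1.2],
        "air_temperature.quality_code": ["1"],
        "air_temperature.temperature_f": [34.16],
        "dew_point.temperature_c": [-6.4],
        "dew_point.quality_code": ["1"],
        "dew_point.temperature_f": [20.48],
    }
)

# Boolean mask marking every column whose name ends in ".quality_code".
is_quality_code = df.columns.str.endswith(".quality_code")

# Keep only the columns that are not quality codes.
trimmed = df.loc[:, ~is_quality_code]
print(trimmed.columns.to_list())
```

The same mask can be used the other way around (df.loc[:, is_quality_code]) to look at only the quality codes before deciding whether to discard them.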