API
Documentation of the core API of pyaro.
Pyaro
- pyaro.list_timeseries_engines() dict[str, Engine][source]
Return a dictionary of available timeseries_readers and their objects.
- Return type:
dictionary
Notes
This function lives in the backends namespace (engs = pyaro.list_timeseries_engines()). More information about each reader is available via the url and description properties of the corresponding Engine object.
The engine lookup uses the selection mechanism introduced with Python 3.10. See GH6514.
- pyaro.open_timeseries(name, *args, **kwargs) Reader[source]
Open a timeseries reader directly, passing args and kwargs on to the TimeseriesReader.open_reader() function.
- Parameters:
name – the name of the entrypoint, as a key in the dictionary returned by list_timeseries_engines
- Returns:
an implementation-object of a TimeseriesReader opened to a location
pyaro.timeseries - User API
- class pyaro.timeseries.Reader(filename_or_obj_or_url, *, filters=None)[source]
Baseclass for timeseries. This can be used with a context manager
- abstract close() None[source]
Cleanup code for the reader.
This method will automatically be called when going out of context. Implement as dummy (pass) if no cleanup needed.
- abstract data(varname: str) Data[source]
Return all data for a variable
- Parameters:
varname – variable name as returned from variables
- Returns:
a data object
- metadata() dict[str, str][source]
Metadata set by the datasource.
The reader-implementation might add metadata depending on the data-source to this method.
- Returns:
dictionary with different metadata
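The context-manager behaviour described for Reader can be sketched as follows. This is a minimal illustration, not pyaro's actual code: SketchReader and DummyReader are hypothetical names, and wiring close() into __exit__ is an assumption about how the cleanup is triggered when the with-block is left.

```python
from abc import ABC, abstractmethod


class SketchReader(ABC):
    """Hypothetical stand-in for a reader base class (not pyaro's code)."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Assumption: leaving the context always triggers close().
        self.close()
        return False  # do not swallow exceptions

    @abstractmethod
    def close(self) -> None:
        """Cleanup code for the reader."""


class DummyReader(SketchReader):
    def __init__(self):
        self.closed = False

    def close(self) -> None:
        self.closed = True


with DummyReader() as reader:
    inside = reader.closed  # still open inside the block

after = reader.closed  # closed once the block has been left
```

A reader without resources to release would implement close() with a plain pass, as the description above suggests.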
- class pyaro.timeseries.Data[source]
Baseclass for data returned from a pyaro.timeseries.Reader.
This is the minimum set of columns required for a reader to return. A reader is welcome to return a self-implemented subclass of Data.
- abstract property altitudes: ndarray
A 1-dimensional array of altitudes (float)
- Returns:
1dim array of floats
- abstract property end_times: ndarray
A 1-dimensional array of int64 datetimes indicating the end of the measurement
- Returns:
1dim array of datetime64
- abstract property flags: ndarray
A 1-dimensional array of flags as defined in pyaro
- Returns:
1dim array of ints
- abstract keys()[source]
all available data-fields, excluding variable and units which are considered metadata
- abstract property latitudes: ndarray
A 1-dimensional array of latitudes (float)
- Returns:
1dim array of floats
- abstract property longitudes: ndarray
A 1-dimensional array of longitudes (float)
- Returns:
1dim array of floats
- abstract slice(index)[source]
Get a copy of this dataset as a slice.
- Parameters:
index – A boolean index array of the size of the data, or an integer array
- Returns:
a new Data object
- abstract property standard_deviations: ndarray
A 1-dimensional array of stdevs. NaNs describe not available stdev per measurement
- Returns:
1dim array of floats
- abstract property start_times: ndarray
A 1-dimensional array of int64 datetimes indicating the start of the measurement
- Returns:
1dim array of datetime64
- abstract property stations: ndarray
A 1-dimensional array of station identifiers (strings, usually name)
- Returns:
1dim array of strings, max-length 64-chars
- abstract property units: str
Units in CF-notation, the same unit applies to all values
- Returns:
Units in CF-notation
- class pyaro.timeseries.Station(fields: dict | None = None, metadata: dict | None = None)[source]
Baseclass for a station returned from a pyaro.timeseries.Reader.
This is the minimum set of columns required for a reader to return. A reader is welcome to return a self-implemented subclass of Station.
All Station fields are accessible as a dict or as property, e.g.
`td = Station()`
`print(td.station)`
`print(td["station"])`
- init_kwargs() dict[str, dict][source]
Return a dict representation of this class to make JSON serialization easier. Station(**another_station.init_kwargs()) should make a copy of the station.
- Returns:
a dict representation.
- keys()[source]
all available data-fields, excluding variable and units which are considered metadata
- set_fields(fields: dict)[source]
Initialization code for this station. Only known data-fields will be read from data, i.e. it is not possible to extend TimeseriesData without subclassing.
- Parameters:
fields – dict with the required fields: station, latitude, longitude, altitude, long_name, country, url
- Raises:
KeyError – on missing field
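The dict/property access and the init_kwargs() round trip can be illustrated with a small stand-in class. SketchStation is hypothetical and only mirrors the behaviour described above (required fields, KeyError on a missing field, copy via init_kwargs()); the real Station may be implemented differently.

```python
class SketchStation:
    """Hypothetical minimal station; field names follow the list above."""

    FIELDS = ("station", "latitude", "longitude", "altitude",
              "long_name", "country", "url")

    def __init__(self, fields=None, metadata=None):
        self._fields = {}
        self._metadata = dict(metadata or {})
        if fields is not None:
            self.set_fields(fields)

    def set_fields(self, fields):
        # A missing required field raises KeyError; unknown keys are ignored.
        self._fields = {key: fields[key] for key in self.FIELDS}

    def __getitem__(self, key):
        return self._fields[key]

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self.__dict__["_fields"][name]
        except KeyError:
            raise AttributeError(name) from None

    def init_kwargs(self):
        return {"fields": dict(self._fields),
                "metadata": dict(self._metadata)}


fields = {"station": "stat1", "latitude": 60.0, "longitude": 10.0,
          "altitude": 120.0, "long_name": "Station One",
          "country": "NO", "url": ""}
original = SketchStation(fields, {"network": "demo"})
clone = SketchStation(**original.init_kwargs())
```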
pyaro.timeseries.filters - Filters
- class pyaro.timeseries.Filter.FilterCollection(filterlist=[])[source]
Bases:
object
A collection of DataIndexFilters which can be applied together.
- Parameters:
filterlist – list of filters to initialize the collection with, defaults to []
- Returns:
the filter collection
- add(difilter: DataIndexFilter)[source]
- filter(ts_reader, variable: str) Data[source]
Filter the data for a variable of a reader with all filters in this collection.
- Parameters:
ts_reader – a timeseries-reader instance
variable – a valid variable-name
- Returns:
filtered data
- filter_data(data: Data, stations: dict[str, Station], variables: str) Data[source]
Filter data with all filters in this collection.
- Parameters:
data – Data from a timeseries-reader, i.e. retrieved by ts.data(varname)
stations – stations-dict of a reader, i.e. retrieved by ts.stations()
variables – variables of a reader, i.e. retrieved by ts.variables()
- Returns:
filtered data
- class pyaro.timeseries.Filter.FilterFactory[source]
Bases:
object
- get(name, **kwargs)[source]
Get a filter by name. If kwargs are given, they will be sent to the filter's new method.
- Parameters:
name – a filter-name
- Returns:
a filter, optionally initialized
- instance = <pyaro.timeseries.Filter.FilterFactory object>
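A name-based factory of this kind can be sketched as a small registry. The class SketchFactory, its register() helper and DemoCountryFilter are all hypothetical names used for illustration; how pyaro actually registers filters is not shown in this documentation.

```python
class SketchFactory:
    """Hypothetical registry mapping filter names to filter classes."""

    def __init__(self):
        self._filters = {}

    def register(self, name, filter_class):
        # Hypothetical helper, not part of the documented API.
        self._filters[name] = filter_class

    def get(self, name, **kwargs):
        # kwargs are forwarded to the constructor of the named filter.
        return self._filters[name](**kwargs)


class DemoCountryFilter:
    def __init__(self, include=(), exclude=()):
        self.include = list(include)
        self.exclude = list(exclude)


factory = SketchFactory()
factory.register("countries", DemoCountryFilter)
flt = factory.get("countries", include=["NO"])
```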
- class pyaro.timeseries.Filter.AltitudeFilter(min_altitude: float | None = None, max_altitude: float | None = None)[source]
Bases:
StationReductionFilter
Filter which filters stations based on their altitude. Can be used to filter for a minimum and/or maximum altitude.
- Parameters:
min_altitude – float of minimum altitude in meters required to keep the station (inclusive).
max_altitude – float of maximum altitude in meters required to keep the station (inclusive).
If the station elevation is NaN, the station is always excluded.
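The rule stated above (inclusive bounds, NaN always excluded) can be written as a single predicate. The function name keep_by_altitude is hypothetical; this is a sketch of the documented semantics, not the filter's source.

```python
import math


def keep_by_altitude(altitude, min_altitude=None, max_altitude=None):
    """Return True if a station at this altitude (meters) would be kept."""
    if math.isnan(altitude):
        return False  # NaN elevation is always excluded
    if min_altitude is not None and altitude < min_altitude:
        return False
    if max_altitude is not None and altitude > max_altitude:
        return False
    return True
```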
- class pyaro.timeseries.Filter.BoundingBoxFilter(include: list[tuple[float, float, float, float]] = [], exclude: list[tuple[float, float, float, float]] = [])[source]
Bases:
StationReductionFilter
Filter using geographical bounding-boxes. Coordinates should be given in the range [-180,180] (degrees_east) for longitude and [-90,90] (degrees_north) for latitude. Order of coordinates is clockwise starting with north, i.e.: (north, east, south, west) = NESW
- Parameters:
include – bounding boxes to include. Each bounding box is a tuple of four floats in (NESW) order, defaults to [] meaning no restrictions
exclude – bounding boxes to exclude. Defaults to []
- Raises:
BoundingBoxException – on any errors of the bounding boxes
- filter_stations(stations: dict[str, Station]) dict[str, Station][source]
Filtering of stations list
- Parameters:
stations – List of stations, e.g. from a Reader.stations() call
- Returns:
dict of filtered stations
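The NESW convention is easy to get wrong, so here is a short sketch of a point-in-box test with include/exclude lists. The helper names are hypothetical, and two points are assumptions rather than documented behaviour: box edges are treated as inclusive, and boxes are assumed not to cross the antimeridian (west <= east).

```python
def in_box(lat, lon, box):
    """box is (north, east, south, west), i.e. NESW order."""
    north, east, south, west = box
    return south <= lat <= north and west <= lon <= east


def keep_station(lat, lon, include=(), exclude=()):
    # An empty include list means no restriction.
    if include and not any(in_box(lat, lon, box) for box in include):
        return False
    # Assumption: an exclude box wins over an include box.
    return not any(in_box(lat, lon, box) for box in exclude)


nordic = (72.0, 32.0, 54.0, 4.0)  # north, east, south, west
```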
- class pyaro.timeseries.Filter.CountryFilter(include: list[str] = [], exclude: list[str] = [])[source]
Bases:
StationReductionFilter
Filter countries by ISO2 names (capitals!)
- Parameters:
include – countries to include, defaults to [], meaning all countries
exclude – countries to exclude, defaults to [], meaning none
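The include/exclude defaults can be sketched on a plain station dictionary. The function and the sample data are hypothetical; letting exclude take precedence over include is an assumption.

```python
def filter_by_country(stations, include=(), exclude=()):
    """stations maps station-id to an ISO2 country code (capitals)."""
    kept = {}
    for station_id, country in stations.items():
        if include and country not in include:
            continue  # empty include means all countries
        if country in exclude:
            continue  # empty exclude means none
        kept[station_id] = country
    return kept


demo_stations = {"a": "NO", "b": "SE", "c": "DE"}
```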
- class pyaro.timeseries.Filter.DuplicateFilter(duplicate_keys: list[str] | None = None)[source]
Bases:
DataIndexFilter
Remove duplicates from the data. By default, data with common station, start_time, end_time are considered duplicates. Only one of the duplicates is kept.
- Parameters:
duplicate_keys – list of data-fields/columns, defaults to None, being the same as ["stations", "start_times", "end_times"]
- default_keys = ['stations', 'start_times', 'end_times']
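Duplicate detection by key columns can be sketched with plain dictionaries standing in for data rows. The function name is hypothetical, and keeping the first occurrence is an assumption: the documentation only states that one of the duplicates is kept.

```python
DEFAULT_KEYS = ("stations", "start_times", "end_times")


def unique_row_indices(rows, keys=DEFAULT_KEYS):
    """Return the indices of rows to keep, one per distinct key tuple."""
    seen = set()
    keep = []
    for idx, row in enumerate(rows):
        ident = tuple(row[key] for key in keys)
        if ident not in seen:
            seen.add(ident)
            keep.append(idx)  # assumption: the first duplicate wins
    return keep


demo_rows = [
    {"stations": "s1", "start_times": 0, "end_times": 3600, "values": 1.0},
    {"stations": "s1", "start_times": 0, "end_times": 3600, "values": 1.1},
    {"stations": "s2", "start_times": 0, "end_times": 3600, "values": 2.0},
]
```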
- class pyaro.timeseries.Filter.FlagFilter(include: list[Flag] = [], exclude: list[Flag] = [])[source]
Bases:
DataIndexFilter
Filter data by Flags
- Parameters:
include – flags to include, defaults to [], meaning all flags
exclude – flags to exclude, defaults to [], meaning none
- class pyaro.timeseries.Filter.RelativeAltitudeFilter(topo_file: str | None = None, topo_var: str = 'topography', rdiff: float = 0)[source]
Bases:
StationFilter
Filter class which filters stations based on the relative difference between the station altitude and the gridded topography altitude.
- Parameters:
topo_file – A .nc file from which to read gridded topography data.
topo_var – Name of variable that stores altitude.
rdiff – Relative difference (in meters).
Note:
Stations will be kept if abs(altobs-altmod) <= rdiff.
Stations will not be kept if station altitude is NaN.
Note:
This filter requires additional dependencies (xarray, netcdf4, cf-units) to function. These can be installed with `pip install .[optional]`.
- class pyaro.timeseries.Filter.StationFilter(include: list[str] = [], exclude: list[str] = [])[source]
Bases:
StationReductionFilter
- class pyaro.timeseries.Filter.TimeBoundsFilter(start_include: list[tuple[str | datetime64 | datetime, str | datetime64 | datetime]] = [], start_exclude: list[tuple[str | datetime64 | datetime, str | datetime64 | datetime]] = [], startend_include: list[tuple[str | datetime64 | datetime, str | datetime64 | datetime]] = [], startend_exclude: list[tuple[str | datetime64 | datetime, str | datetime64 | datetime]] = [], end_include: list[tuple[str | datetime64 | datetime, str | datetime64 | datetime]] = [], end_exclude: list[tuple[str | datetime64 | datetime, str | datetime64 | datetime]] = [])[source]
Bases:
DataIndexFilter
Filter data by start and/or end-times of the measurements. Each timebound consists of a bound-start and bound-end (both included). Timestamps are given as YYYY-MM-DD HH:MM:SS in UTC.
- Parameters:
start_include – list of tuples of start-times, defaults to [], meaning all
start_exclude – list of tuples of start-times, defaults to []
startend_include – list of tuples of start and end-times, defaults to [], meaning all
startend_exclude – list of tuples of start and end-times, defaults to []
end_include – list of tuples of end-times, defaults to [], meaning all
end_exclude – list of tuples of end-times, defaults to []
- Raises:
TimeBoundsException – on any errors with the time-bounds
Examples:
end_include: [("2023-01-01 10:00:00", "2024-01-01 07:00:00")] will only include observations where the end time of each observation is within the interval specified (i.e. end >= "2023-01-01 10:00:00" and end <= "2024-01-01 07:00:00").
Including multiple bounds acts as an OR, allowing multiple selections. To select every observation in January of 2021, 2022, 2023, and 2024, the filter could be written as:
startend_include: [
    ("2021-01-01 00:00:00", "2021-02-01 00:00:00"),
    ("2022-01-01 00:00:00", "2022-02-01 00:00:00"),
    ("2023-01-01 00:00:00", "2023-02-01 00:00:00"),
    ("2024-01-01 00:00:00", "2024-02-01 00:00:00"),
]
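The OR semantics of several bounds can be sketched for the end_include case, whose meaning is spelled out above (both bounds inclusive). The helper names are hypothetical and the sketch uses only the standard library; the real filter works on numpy datetime arrays.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # same format as documented for filters


def parse(timestamp):
    return datetime.strptime(timestamp, TIME_FORMAT)


def end_included(end_time, end_include):
    """True if end_time lies in at least one (inclusive) bound."""
    if not end_include:
        return True  # empty list means all
    end = parse(end_time)
    return any(parse(low) <= end <= parse(high) for low, high in end_include)


januaries = [
    ("2021-01-01 00:00:00", "2021-02-01 00:00:00"),
    ("2022-01-01 00:00:00", "2022-02-01 00:00:00"),
]
```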
- contains(dt_start: ndarray[tuple[int, ...], dtype[datetime64]], dt_end: ndarray[tuple[int, ...], dtype[datetime64]]) ndarray[tuple[int, ...], dtype[bool]][source]
Test if datetimes in dt_start, dt_end belong to this filter
- Parameters:
dt_start – start of each observation as a numpy array of datetimes
dt_end – end of each observation as a numpy array of datetimes
- Returns:
numpy boolean array with True/False values
- envelope() tuple[datetime, datetime][source]
Get the earliest and latest time possible for this filter.
- Returns:
earliest start and end-time (approximately)
- Raises:
TimeBoundsException – if has_envelope() is False, or internal errors
- class pyaro.timeseries.Filter.TimeResolutionFilter(resolutions: list[str] = [])[source]
Bases:
DataIndexFilter
The time-resolution filter restricts the observation data to certain time-resolutions. Time-resolutions are not exact, and might be interpreted slightly differently by different observation networks.
- Default named time-resolutions are
minute: 59 to 61 s (+-1sec)
hour: 59*60 s to 61*60 s (+-1min)
day: 22:59:00 to 25:01:00, to allow for leap-days and an extra minute
week: 6 to 8 days (+-1 day)
month: 27-33 days (30 +- 3 days)
year: 360-370 days (+- 5days)
- Parameters:
resolutions – a list of wanted time resolutions. A resolution consists of an integer number and a time-resolution name, e.g. 3 hour (no plural).
- filter_data_idx(data: Data, stations: dict[str, Station], variables: list[str])[source]
Filter data to an index which can be applied to Data.slice(idx) later
- Returns:
an index for Data.slice(idx)
- named_resolutions = {'day': (82740, 90060), 'hour': (3540, 3660), 'minute': (59, 61), 'month': (2332800, 2851200), 'week': (518400, 691200), 'year': (31104000, 31968000)}
- pattern = re.compile('\\s*(\\d+)\\s*(\\w+)\\s*')
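How a resolution string such as "3 hour" could translate into an accepted duration range is sketched below, using the pattern and named_resolutions values listed above. Scaling both bounds by the integer count is an assumption about the implementation, and the function names are hypothetical.

```python
import re

NAMED_RESOLUTIONS = {
    "minute": (59, 61),
    "hour": (3540, 3660),
    "day": (82740, 90060),
    "week": (518400, 691200),
    "month": (2332800, 2851200),
    "year": (31104000, 31968000),
}
PATTERN = re.compile(r"\s*(\d+)\s*(\w+)\s*")


def resolution_bounds(resolution):
    """Return (min_seconds, max_seconds) for e.g. '3 hour'."""
    match = PATTERN.fullmatch(resolution)
    if match is None:
        raise ValueError(f"cannot parse resolution: {resolution!r}")
    count, name = int(match.group(1)), match.group(2)
    low, high = NAMED_RESOLUTIONS[name]
    # Assumption: the tolerance scales with the count.
    return count * low, count * high


def duration_matches(seconds, resolutions):
    """True if a measurement duration fits any requested resolution."""
    return any(low <= seconds <= high
               for low, high in map(resolution_bounds, resolutions))
```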
- class pyaro.timeseries.Filter.TimeVariableStationFilter(exclude=[], exclude_from_csvfile='')[source]
Bases:
DataIndexFilter
Exclude combinations of variable, station and time from the data.
This filter is really a cleanup of the database, but sometimes it is not possible to modify the original database and the cleanup needs to be done on a filter basis.
- Parameters:
exclude – tuple of 4 elements: start-time, end-time, variable, station
exclude_from_csvfile –
this is a helper option to enable a large list of excludes to be read from a “ “ separated file with columns
start end variable station
where start and end are timestamps of format YYYY-MM-DD HH:MM:SS in UTC, e.g. the year 2020 is:
2020-01-01 00:00:00 2020-12-31 23:59:59 …
- class pyaro.timeseries.Filter.ValleyFloorRelativeAltitudeFilter(topo: str | None = None, *, radius: float = 5000, topo_var: str = 'Band1', lower: float | None = None, upper: float | None = None, keep_nan: bool = True)[source]
Bases:
StationFilter
Filter for filtering stations based on the difference between the station altitude and valley floor altitude (defined as the lowest altitude within a radius around the station). This ensures that plateau sites are treated like "surface" sites, while sites in hilly or mountainous areas (e.g. Schauinsland) are considered mountain sites. This approach has been used by several papers (e.g. Fowler et al., Lloibl et al. 1994).
- Parameters:
topo – Topography file path (either a file or a directory). Must be a dataset openable by xarray, with latitude and longitude stored as “lat” and “lon” respectively. The variable that contains elevation data is assumed to be in meters. If topo is a directory, a metadata.json file containing the geographic bounds of each file must be present (see below for example).
radius – Radius (in meters)
topo_var – Variable name to use in topography dataset
lower – Optional lower bound needed for relative altitude for station to be kept (in meters)
upper – Optional upper bound needed for relative altitude for station to be kept (in meters)
keep_nan – Whether to keep values where relative altitude is calculated as nan. Defaults to True. Note: Since the topography does not contain values for oceans this may happen for small islands and coastal stations.
- Raises:
ModuleNotFoundError – If the required additional dependencies (cf_units, xarray) are not available.
Note
This implementation has only been tested with the GTOPO30 dataset so far.
Available versions of gtopo30 can be found here: /lustre/storeB/project/aerocom/aerocom1/AEROCOM_OBSDATA/GTOPO30/
Note
metadata.json should contain a mapping from each nc file to its geographic latitude/longitude bounds.
For example:
```
{
    "N.nc": {
        "w": -180, "e": 180, "n": 90, "s": -10
    },
    "S.nc": {
        "w": -180, "e": 180, "n": -10, "s": -90
    }
}
```
- class pyaro.timeseries.Filter.VariableNameFilter(reader_to_new: dict[str, str] = {}, include: list[str] = [], exclude: list[str] = [])[source]
Bases:
Filter
Filter to change variable-names and/or include/exclude variables
- Parameters:
reader_to_new – dictionary from readers-variable names to new variable-names, e.g. used in your project, defaults to {}
include – list of variables to include only (new names if changed), defaults to [] meaning keep all variables unless excluded.
exclude – list of variables to exclude (new names if changed), defaults to []
- filter_variables(variables: list[str]) list[str][source]
change variable name and reduce variables applying include and exclude parameters
- Parameters:
variables – variable names as in the reader
- Returns:
valid variable names in translated nomenclature
- has_reader_variable(variable) bool[source]
Check if variable-name is in the list of variables applying include and exclude
- Parameters:
variable – variable as returned from the reader
- Returns:
True or False
- has_variable(variable) bool[source]
check if a variable-name is in the list of variables applying include and exclude
- Parameters:
variable – variable name in translated, i.e. new scheme
- Returns:
True or False
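The interplay of renaming with include and exclude can be sketched as one function. The function below is a hypothetical stand-in for filter_variables; it follows the parameter descriptions above, where include and exclude refer to the new names.

```python
def translate_variables(variables, reader_to_new=None, include=(), exclude=()):
    """Rename reader variables, then apply include/exclude on new names."""
    reader_to_new = reader_to_new or {}
    result = []
    for name in variables:
        new_name = reader_to_new.get(name, name)
        if include and new_name not in include:
            continue  # empty include keeps everything not excluded
        if new_name in exclude:
            continue
        result.append(new_name)
    return result


reader_vars = ["SOx", "NOx", "PM10"]
mapping = {"SOx": "oxidised_sulphur"}
```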
pyaro.timeseries - Dev API
- class pyaro.timeseries.Engine[source]
The engine is the 'singleton' generator object for databases of the engine's type.
- abstract property args: list[str]
return a tuple of parameters to be passed to open_timeseries, including the mandatory filename_or_obj_or_url parameter.
- abstract property description
Get a descriptive string about this pyaro implementation.
- abstract open(filename_or_obj_or_url, *, filters=None)[source]
Open-function of the timeseries, initializing the reader-object, i.e. equivalent to Reader(filename_or_obj_or_url, *, filters)
- Returns:
pyaro.timeseries.Reader
- Raises:
UnknownFilterException
- abstract property supported_filters: list[str]
The class-names of the supported filters by this reader.
If the reader is called with a filter which is not an instance of one of these classes, it is supposed to raise an UnknownFilterException. Using a subclass of a filter is not allowed unless explicitly listed here.
- Returns:
list of classnames
- abstract property url
Get a URL with more information or docs about the datasource-engine.
This should be the github-url or similar of the implementation.
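The supported_filters rule (exact class names only, subclasses rejected unless listed) can be sketched as a check by class name. Everything here is a stand-in: the exception is defined locally for the sketch, and check_filters, DemoFilter and DemoSubFilter are hypothetical names.

```python
class UnknownFilterException(Exception):
    """Local stand-in for the exception named in the documentation."""


def check_filters(filters, supported_names):
    for flt in filters:
        # Compare the exact class name, so subclasses do not pass
        # unless their own name is listed.
        name = type(flt).__name__
        if name not in supported_names:
            raise UnknownFilterException(f"unsupported filter: {name}")


class DemoFilter:
    pass


class DemoSubFilter(DemoFilter):
    pass


check_filters([DemoFilter()], ["DemoFilter"])  # passes silently

try:
    check_filters([DemoSubFilter()], ["DemoFilter"])
    subclass_rejected = False
except UnknownFilterException:
    subclass_rejected = True
```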
- class pyaro.timeseries.NpStructuredData(variable: str = '', units: str = '')[source]
An implementation of Data using numpy Structured Arrays.
This is the minimum set of columns required for a reader to return. A reader is welcome to return a self-implemented subclass of Data.
Data can be added by rows with the append method, or a completed numpy.StructuredArray can be submitted using set_data.
- property altitudes: ndarray
A 1-dimensional array of altitudes (float)
- Returns:
1dim array of floats
- append(value, station, latitude, longitude, altitude, start_time, end_time, flag=Flag.VALID, standard_deviation=nan)[source]
Append a new data-row, or numpy arrays.
- Parameters:
value
station
latitude
longitude
altitude
start_time
end_time
flag – defaults to Flag.VALID
standard_deviation – defaults to np.nan
- property end_times: ndarray
A 1-dimensional array of int64 datetimes indicating the end of the measurement
- Returns:
1dim array of datetime64
- property flags: ndarray
A 1-dimensional array of flags as defined in pyaro
- Returns:
1dim array of ints
- keys()[source]
all available data-fields, excluding variable and units which are considered metadata
- property latitudes: ndarray
A 1-dimensional array of latitudes (float)
- Returns:
1dim array of floats
- property longitudes: ndarray
A 1-dimensional array of longitudes (float)
- Returns:
1dim array of floats
- set_data(variable: str, units: str, data: array)[source]
Initialization code for the data. Only known data-fields will be read from data, i.e. it is not possible to extend TimeseriesData without subclassing.
- slice(index)[source]
Get a copy of this dataset as a slice.
- Parameters:
index – A boolean index array of the size of the data, or an integer array
- Returns:
a new Data object
- property standard_deviations: ndarray
A 1-dimensional array of stdevs. NaNs describe not available stdev per measurement
- Returns:
1dim array of floats
- property start_times: ndarray
A 1-dimensional array of int64 datetimes indicating the start of the measurement
- Returns:
1dim array of datetime64
- property stations: ndarray
A 1-dimensional array of station identifiers (strings, usually name)
- Returns:
1dim array of strings, max-length 64-chars
- class pyaro.timeseries.AutoFilterReaderEngine.AutoFilterEngine[source]
The AutoFilterEngine class implements the supported_filters and args method using introspection from the corresponding reader-class. The reader_class method needs therefore to be implemented by this class.
- args()[source]
return a tuple of parameters to be passed to open_timeseries, including the mandatory filename_or_obj_or_url parameter.
- open(filename, *args, **kwargs) Reader[source]
Open-function of the timeseries, initializing the reader-object, i.e. equivalent to Reader(filename_or_obj_or_url, *, filters)
- Returns:
pyaro.timeseries.Reader
- Raises:
UnknownFilterException
- abstract reader_class() AutoFilterReader[source]
return the class of the corresponding reader
- Returns:
the class returned from open
- class pyaro.timeseries.AutoFilterReaderEngine.AutoFilterReader(filename_or_obj_or_url, *, filters=None)[source]
This helper class applies automatically all filters on the Reader methods Reader.data, Reader.stations and Reader.variables. For this to work, the reader needs to implement _unfiltered_data, _unfiltered_stations and _unfiltered_variables.
It also adds an overridable classmethod supported_filters() listing all possible filters. This is used both by the AutoFilterEngine and by the check_filters method, which should be used during initialization when filters are given.
The implementation must also use _set_filters() to add the filters from __init__.
- _get_filters() list[Filter][source]
Get a list of filters actually set during initialization of this object.
- Returns:
list of filters
- data(varname) Data[source]
Return all data for a variable
- Parameters:
varname – variable name as returned from variables
- Returns:
a data object
- stations() dict[str, Station][source]
Dictionary of all stations available for this reader.
- Returns:
dictionary with station-id as returned from data to Station metadata.
- class pyaro.timeseries.Filter.DataIndexFilter(**kwargs)[source]
Bases:
Filter
An abstract baseclass implementing filter_data by means of an abstract method filter_data_idx
- filter_data(data: Data, stations: dict[str, Station], variables: list[str]) Data[source]
Filtering of data
- Parameters:
data – Data of e.g. a Reader.data(varname) call
stations – List of stations, e.g. from a Reader.stations() call
variables – variables, i.e. from a Reader.variables() call
- Returns:
an updated Data-object with this filter applied
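The division of labour between filter_data_idx and Data.slice can be sketched with toy classes. ToyData, ToyIndexFilter and ToyFlagFilter are hypothetical and list-based; the real classes operate on numpy arrays, and only the pattern (compute an index, then slice) is taken from the description above.

```python
from abc import ABC, abstractmethod


class ToyData:
    """List-based stand-in for a Data object."""

    def __init__(self, values, flags):
        self.values = list(values)
        self.flags = list(flags)

    def slice(self, index):
        # index is a list of integer positions to keep
        return ToyData([self.values[i] for i in index],
                       [self.flags[i] for i in index])


class ToyIndexFilter(ABC):
    @abstractmethod
    def filter_data_idx(self, data, stations, variables):
        """Return an index usable with data.slice(index)."""

    def filter_data(self, data, stations, variables):
        # filter_data is implemented once, on top of the index method.
        return data.slice(self.filter_data_idx(data, stations, variables))


class ToyFlagFilter(ToyIndexFilter):
    def __init__(self, include):
        self.include = set(include)

    def filter_data_idx(self, data, stations, variables):
        return [i for i, flag in enumerate(data.flags) if flag in self.include]


toy = ToyData([1.0, 2.0, 3.0], [0, 1, 0])
filtered = ToyFlagFilter([0]).filter_data(toy, {}, [])
```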
- class pyaro.timeseries.Filter.Filter(**kwargs)[source]
Bases:
ABC
Base-class for all filters used by pyaro-Readers
- args() dict[str, Any][source]
Retrieve the kwargs needed to create a new object of this filter with the same filter restrictions.
- Returns:
a dictionary usable as kwargs for the new method
- filter_data(data: Data, stations: dict[str, Station], variables: list[str]) Data[source]
Filtering of data
- Parameters:
data – Data of e.g. a Reader.data(varname) call
stations – List of stations, e.g. from a Reader.stations() call
variables – variables, i.e. from a Reader.variables() call
- Returns:
an updated Data-object with this filter applied
- filter_stations(stations: dict[str, Station]) dict[str, Station][source]
Filtering of stations list
- Parameters:
stations – List of stations, e.g. from a Reader.stations() call
- Returns:
dict of filtered stations
- filter_variables(variables: list[str]) list[str][source]
Filtering of variables
- Parameters:
variables – List of variables, e.g. from a Reader.variables() call
- Returns:
List of filtered variables.
- abstract name() str[source]
Return a unique name for this filter
- Returns:
a string to be used by FilterFactory
- time_format = '%Y-%m-%d %H:%M:%S'
csvreader for timeseries
A simple implementation of a timeseries reader based on csv-files, usually accessed as pyaro.open_timeseries('csv_timeseries', ...)
- class pyaro.csvreader.CSVTimeseriesReader(filename, columns={'altitude': '0', 'country': 'NO', 'end_time': 7, 'flag': '0', 'latitude': 3, 'longitude': 2, 'standard_deviation': 'NaN', 'start_time': 6, 'station': 1, 'units': 5, 'value': 4, 'variable': 0}, variable_units: dict[str, str] = {}, country_lookup=False, csvreader_kwargs={'delimiter': ','}, skip_header_rows: int = 0, filters=[])[source]
- close()[source]
Cleanup code for the reader.
This method will automatically be called when going out of context. Implement as dummy (pass) if no cleanup needed.
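The default columns argument mixes integers and strings. The sketch below reads it under the assumption that an integer is a column index in the csv row while a string is a constant applied to every row; this interpretation is inferred from the defaults shown in the signature and is not confirmed here. The helper map_row and the sample line are hypothetical.

```python
import csv
import io

DEFAULT_COLUMNS = {
    "variable": 0, "station": 1, "longitude": 2, "latitude": 3,
    "value": 4, "units": 5, "start_time": 6, "end_time": 7,
    "altitude": "0", "country": "NO", "flag": "0",
    "standard_deviation": "NaN",
}


def map_row(row, columns=DEFAULT_COLUMNS):
    """Map one csv row to named fields (values stay strings)."""
    return {
        # Assumption: int means column index, str means constant value.
        key: (row[col] if isinstance(col, int) else col)
        for key, col in columns.items()
    }


SAMPLE = "SOx,station1,10.7,59.9,12.5,ug m-3,2021-01-01 00:00:00,2021-01-02 00:00:00\n"
rows = [map_row(row) for row in csv.reader(io.StringIO(SAMPLE), delimiter=",")]
```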