How to add a new reader
Adding a new reader to pyaerocom does not require integrating any code into PyAerocom itself; all you need to do is:
- Create classes that inherit from Reader and Engine and implement their methods, see TimeseriesEngine/Reader subclassing.
- Declare the Engine class as an external plugin in your setup.py or equivalent, see How to register a reader (backend).
You can see what backends are currently available in your working environment
with list_timeseries_readers().
TimeseriesEngine/Reader subclassing
It is strongly advised to use the helper classes AutoFilterEngine
and AutoFilterReader to implement an Engine and a Reader, since filters will
then be handled automatically.
Subclassing of Engine/Reader using AutoFilterReaderEngine
Your YourEngine should extend AutoFilterEngine
and it must implement the following methods:
- reader_class(): the class implementing AutoFilterReader, corresponding to this AutoFilterReaderEngine, i.e. def reader_class(self) -> AutoFilterReader: return YourReader
- description(): a one-line description of this engine
- url(): the link to the implementation source
YourReader should extend AutoFilterReader and implement:
- the __init__ method of Reader with two fixed args (self and filename_or_obj_or_url) and several kwargs, one of which should be filters; it must store the filters by calling self._set_filters(filters)
- the _unfiltered_data() method
- the _unfiltered_stations() method
- the _unfiltered_variables() method
- the close() method (might be pass, but Readers are also context managers and will call close())
An example of an implementation can be found in the CSVTimeseriesReader.
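The shape of such a reader can be sketched as follows. To keep the sketch runnable without pyaro installed, `StandInAutoFilterReader` is a hypothetical stand-in that only mimics the contract described above (`_set_filters` plus context-manager support); it is not pyaro's implementation. The `varname` argument, the `delimiter` keyword, and the placeholder return values are likewise assumptions for illustration. In real code you would inherit from AutoFilterReader instead.

```python
class StandInAutoFilterReader:
    """Hypothetical stand-in for AutoFilterReader (NOT pyaro's code)."""

    def _set_filters(self, filters):
        # Remember the filters so a base class could apply them later.
        self._filters = list(filters)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the with-block always calls close().
        self.close()
        return False


class YourReader(StandInAutoFilterReader):
    def __init__(self, filename_or_obj_or_url, *, filters=(), delimiter=","):
        # Mandatory step: hand the filters to the base class.
        self._set_filters(filters)
        self._source = filename_or_obj_or_url
        self._delimiter = delimiter  # hypothetical backend-specific kwarg
        self._closed = False

    def _unfiltered_data(self, varname):
        # Placeholder: a real reader returns the data for `varname`.
        return []

    def _unfiltered_stations(self):
        # Placeholder: a real reader returns its stations.
        return {}

    def _unfiltered_variables(self):
        # Placeholder: a real reader returns the variable names it provides.
        return []

    def close(self):
        # Release file handles or connections here; may also be a no-op.
        self._closed = True


# Used as a context manager, the reader is closed automatically.
with YourReader("observations.csv", filters=[]) as reader:
    variables = reader._unfiltered_variables()
```

Because the reader is used in a with-block, close() runs even if reading raises an exception, which is why close() has to exist even when it does nothing.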
Direct subclassing of Engine/Reader
This section explains the basic usage when extending an Engine/Reader without the AutoFilter helper classes. Filter handling is left to the developer here.
Your TimeseriesReader sub-class is the primary interface with PyAerocom, and
it should implement the following attributes and methods:
- the __init__ method (mandatory)
- the data method (mandatory)
- the stations method (mandatory)
- the variables method (mandatory)
- the close method (optional, if needed)
The entry point to your Reader is an Engine, which also needs to be implemented:
- the open method, instantiating the Reader (mandatory)
- the args readonly attribute (mandatory, a list of arguments which can be given to open)
- the supported_filters readonly attribute (mandatory, a list of filters)
- the description readonly attribute (optional)
- the url readonly attribute (optional) (reference to repository)
This is what a TimeseriesReader subclass should look like:
from pyaro.timeseries import Data, Reader, Station, Engine


class MyTimeseriesReader(Reader):
    def __init__(
        self,
        filename_or_obj_or_url,
        *,
        filters=[],
        # other backend specific keyword arguments
        # `chunks` and `cache` DO NOT go here, they are handled by xarray
    ):
        ...

    def data(self, varname):
        ...

    def stations(self):
        ...

    def variables(self):
        ...


class MyTimeseriesEngine(Engine):
    def open(self, filename_or_obj_or_url, *args, **kwargs):
        # the entry point: instantiate and return the reader
        return MyTimeseriesReader(filename_or_obj_or_url, *args, **kwargs)

    def args(self):
        open_parameters = ["filename_or_obj_or_url", "filters"]
        return open_parameters

    def supported_filters(self):
        return ["CountryFilter", "FlagFilter"]

    def description(self):
        return "Engine for MyTimeseries files."

    def url(self):
        return "https://link_to/your_backend/documentation"
Reader subclass methods and attributes are detailed in the following.
The Engine's open method shall read from the given location, decode the
variables, and instantiate the output PyAerocom class Data.
The following is an example of the high level processing steps:
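def open(
    self,
    filename_or_obj_or_url,
    *,
    filters
):
    # instantiate the reader defined in the example above, passing the filters through
    tsr = MyTimeseriesReader(filename_or_obj_or_url, filters=filters)
    return tsr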
The open method takes one positional argument
(filename_or_obj_or_url) and one keyword argument (filters):
- filename_or_obj_or_url: can be any object, but usually it is a string containing a path, an instance of pathlib.Path, or a URL.
- filters: an iterable containing filters to be (optionally) applied when reading the data.
Your reader can also take as input a set of backend-specific keyword
arguments. All these keyword arguments can be passed to
open() grouped either via the backend_kwargs
parameter or explicitly using the syntax **kwargs.
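Forwarding backend-specific keyword arguments explicitly via **kwargs could look like the sketch below. `SketchReader`, `SketchEngine`, and the `delimiter` and `encoding` keywords are hypothetical names chosen for illustration; they stand in for your own Reader and Engine subclasses and are not part of the library.

```python
class SketchReader:
    """Hypothetical reader with two backend-specific keyword arguments."""

    def __init__(self, filename_or_obj_or_url, *, filters=(),
                 delimiter=",", encoding="utf-8"):
        self.source = filename_or_obj_or_url
        self.filters = list(filters)
        self.delimiter = delimiter
        self.encoding = encoding


class SketchEngine:
    """Hypothetical engine whose open() forwards all keywords unchanged."""

    def open(self, filename_or_obj_or_url, **kwargs):
        # Unknown keywords fail inside the reader with a TypeError,
        # so typos in option names are reported instead of ignored.
        return SketchReader(filename_or_obj_or_url, **kwargs)


reader = SketchEngine().open("observations.csv", filters=[], delimiter=";")
print(reader.delimiter, reader.encoding)
```

Making the backend-specific options keyword-only (after the `*`) keeps the positional part of the signature identical across readers.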
Engine.args
Engine.args is the list of backend open arguments.
Engine.description and Engine.url
description is used to provide a short text description of the backend.
url is used to include a link to the backend’s documentation or code.
These attributes are surfaced when a user prints list_timeseries_readers().
If description or url are not defined, an empty string is returned.
How to register a reader (backend)
Define a new entrypoint in your setup.py (or setup.cfg or pyproject.toml) with:
- group: pyaro.timeseries
- name: the name to be passed to timeseries() as engine
- object reference: the reference of the Engine-class that you have implemented.
You can declare the entrypoint in setup.py using the following syntax:
import setuptools

setuptools.setup(
    entry_points={
        "pyaro.timeseries": ["my_timeseries_reader=my_package.my_module:MyTimeseriesEngine"],
    },
)
The equivalent in setup.cfg is:
[options.entry_points]
pyaro.timeseries =
    my_timeseries_reader = my_package.my_module:MyTimeseriesEngine
See https://packaging.python.org/specifications/entry-points/#data-model for more information
If you are using Poetry for your build system, you can accomplish the same thing using “plugins”.
In this case you would need to add the following to your pyproject.toml file:
[tool.poetry.plugins."pyaro.timeseries"]
"my_timeseries_reader" = "my_package.my_module:MyTimeseriesEngine"
See https://python-poetry.org/docs/pyproject/#plugins for more information on Poetry plugins.
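After installing your package, one way to check that the entry point is visible is to query the installed entry points with the standard library's importlib.metadata. The helper name `registered_timeseries_engines` is hypothetical; the sketch only lists what is registered under the pyaro.timeseries group and does not load or call any engine.

```python
from importlib.metadata import entry_points


def registered_timeseries_engines(group="pyaro.timeseries"):
    """Return the entry points installed under `group` as a list."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and newer: selectable entry points.
        return list(eps.select(group=group))
    # Python 3.8/3.9: a plain dict mapping group name to entry points.
    return list(eps.get(group, []))


for ep in registered_timeseries_engines():
    # ep.name is the engine name, ep.value the "module:Class" reference.
    print(f"{ep.name} -> {ep.value}")
```

If nothing is printed for your reader, the package is probably not installed in the active environment, or the group name in the entry point declaration does not match pyaro.timeseries.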