How to add a new reader

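Adding a new reader to pyaerocom does not require integrating any code into PyAerocom itself; all you need to do is:

  • implement an Engine and a Reader subclass, as described in the sections below

  • register the Engine as an entry point of your own package (see "How to register a reader (backend)")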

You can see what backends are currently available in your working environment with list_timeseries_readers().

TimeseriesEngine/Reader subclassing

It is strongly advised to use the helper classes AutoFilterEngine and AutoFilterReader to implement an Engine and a Reader, since filters are then handled automatically.

Subclassing of Engine/Reader using AutoFilterReaderEngine

YourEngine should extend AutoFilterEngine and must implement the following methods:

  • reader_class(): the class implementing AutoFilterReader, corresponding to this AutoFilterReaderEngine, i.e. def reader_class(self) -> AutoFilterReader: return YourReader

  • description(): a one-line description of this engine

  • url(): the link to the implementation source

YourReader should extend AutoFilterReader and must implement:

  • the __init__ method with two fixed arguments (self and filename_or_obj_or_url) and several keyword arguments, one of which should be filters

    • it must store the filters calling self._set_filters(filters)

  • the _unfiltered_data() method

  • the _unfiltered_stations() method

  • the _unfiltered_variables() method

  • the close() method (its body may be just pass, but Readers are also context managers and will call close() on exit)

An example of an implementation can be found in the CSVTimeseriesReader.
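
The method list above can be sketched as follows. This is only an illustration under loud assumptions: StandInAutoFilterReader is a hypothetical, simplified stand-in written here so that the example runs without any installed package; it is not the real AutoFilterReader. Filters are modelled as plain callables, and the record layout, the scale keyword, the class names and the URL are all invented for the example.

```python
class StandInAutoFilterReader:
    """Hypothetical stand-in for AutoFilterReader (NOT the real class)."""

    def _set_filters(self, filters):
        self._filters = list(filters)

    def data(self, varname):
        # The helper applies the stored filters on top of the unfiltered data.
        rows = self._unfiltered_data(varname)
        for keep in self._filters:
            rows = [row for row in rows if keep(row)]
        return rows

    def stations(self):
        return self._unfiltered_stations()

    def variables(self):
        return self._unfiltered_variables()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Readers are context managers, so close() is called on exit.
        self.close()


class InMemoryReader(StandInAutoFilterReader):
    def __init__(self, filename_or_obj_or_url, *, filters=(), scale=1.0):
        # Store the filters through the helper, as the list above requires.
        self._set_filters(filters)
        # Here the "object" is an iterable of (station, variable, value).
        self._rows = [
            {"station": s, "variable": v, "value": float(x) * scale}
            for s, v, x in filename_or_obj_or_url
        ]
        self.closed = False

    def _unfiltered_data(self, varname):
        return [row for row in self._rows if row["variable"] == varname]

    def _unfiltered_stations(self):
        return sorted({row["station"] for row in self._rows})

    def _unfiltered_variables(self):
        return sorted({row["variable"] for row in self._rows})

    def close(self):
        # Nothing to release for in-memory data; just record the call.
        self.closed = True


class InMemoryEngine:
    """Engine side: only the three methods named in the text."""

    def reader_class(self):
        return InMemoryReader

    def description(self):
        return "Illustrative engine for in-memory records"

    def url(self):
        return "https://example.org/in-memory-engine"  # placeholder URL


records = [("A", "PM10", 12.0), ("B", "PM10", 30.0), ("A", "O3", 50.0)]
reader_cls = InMemoryEngine().reader_class()
with reader_cls(records, filters=[lambda row: row["value"] < 20]) as reader:
    pm10 = reader.data("PM10")
```

The subclass only ever provides unfiltered results; applying the filters is left entirely to the base class, which is the point of using the helper classes.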

Direct subclassing of Engine/Reader

This section explains the basic usage when extending an Engine/Reader without the AutoFilter helper classes. Filter handling is left to the developer in this case.

Your TimeseriesReader sub-class is the primary interface with PyAerocom, and it should implement the following attributes and methods:

  • the __init__ method (mandatory)

  • the data method (mandatory)

  • the stations method (mandatory)

  • the variables method (mandatory)

  • the close method (optional, if needed)

The entry point to your Reader is an Engine, which also needs to be implemented:

  • the open method, instantiating the Reader (mandatory)

  • the args readonly attribute (mandatory, a list of arguments which can be given to open)

  • the supported_filters readonly attribute (mandatory, a list of filters)

  • the description readonly attribute (optional)

  • the url readonly attribute (optional) (reference to repository)

This is what a TimeseriesReader subclass should look like:

from pyaro.timeseries import Data, Reader, Station, Engine


class MyTimeseriesReader(Reader):
    def __init__(
        self,
        filename_or_obj_or_url,
        *,
        filters=[],
        # other backend specific keyword arguments
        # `chunks` and `cache` DO NOT go here, they are handled by xarray
    ):
        ...


    def data(self, varname):
        ...

    def stations(self):
        ...

    def variables(self):
        ...

class MyTimeseriesEngine(Engine):
    def open(self, filename_or_obj_or_url, *args, **kwargs):
        return MyTimeseriesReader(filename_or_obj_or_url, *args, **kwargs)

    def args(self):
        open_parameters = ["filename_or_obj_or_url", "filters"]
        return open_parameters

    def supported_filters(self):
        return ["CountryFilter", "FlagFilter"]

    def description(self):
        return "Engine for MyTimeseries files."

    def url(self):
        return "https://link_to/your_backend/documentation"

The Engine subclass methods and attributes are detailed in the following.

Engine.open

The open method of the Engine shall read from the given location, decode the variables, and instantiate and return the Reader, which delivers the output as instances of the PyAerocom class Data.

The following is an example of the high level processing steps:

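def open(
    self,
    filename_or_obj_or_url,
    *,
    filters,
):
    # instantiate the Reader for the given location, passing the filters on
    tsr = MyTimeseriesReader(filename_or_obj_or_url, filters=filters)
    return tsr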

The inputs of the open method are one positional argument (filename_or_obj_or_url) and one keyword argument (filters):

  • filename_or_obj_or_url: can be any object, but usually it is a string containing a path, an instance of pathlib.Path, or a URL.

  • filters: is an iterable containing filters to be (optionally) applied when reading the data.

Your reader can also take as input a set of backend-specific keyword arguments. All these keyword arguments can be passed to open() grouped either via the backend_kwargs parameter or explicitly using the syntax **kwargs.

Engine.args

Engine.args is the list of backend open arguments.
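
One possible way to keep this list in sync with the Reader is to derive it from the signature of the Reader's __init__ method. The sketch below uses only the standard library; ExampleReader and open_parameters are hypothetical names introduced for illustration, and nothing in the text above requires building the list this way.

```python
import inspect


class ExampleReader:
    """Hypothetical reader; only its __init__ signature matters here."""

    def __init__(self, filename_or_obj_or_url, *, filters=(), encoding="utf-8"):
        self.source = filename_or_obj_or_url
        self.filters = list(filters)
        self.encoding = encoding


def open_parameters(reader_cls):
    """Return the parameter names of reader_cls.__init__, without self."""
    params = inspect.signature(reader_cls.__init__).parameters
    return [name for name in params if name != "self"]


print(open_parameters(ExampleReader))
# prints ['filename_or_obj_or_url', 'filters', 'encoding']
```

An Engine could return such a list from args instead of a hand-written one, at the cost of exposing every keyword of the Reader.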

Engine.description and Engine.url

description is used to provide a short text description of the backend. url is used to include a link to the backend’s documentation or code.

These attributes are surfaced when a user prints list_timeseries_readers(). If description or url is not defined, an empty string is returned.

How to register a reader (backend)

Define a new entrypoint in your setup.py (or setup.cfg or pyproject.toml) with:

  • group: pyaro.timeseries

  • name: the name to be passed to timeseries() as engine

  • object reference: the reference of the Engine-class that you have implemented.

You can declare the entrypoint in setup.py using the following syntax:

setuptools.setup(
    entry_points={
        "pyaro.timeseries": ["my_timeseries_reader=my_package.my_module:MyTimeseriesEngine"],
    },
)

In setup.cfg:

[options.entry_points]
pyaro.timeseries =
    my_timeseries_reader = my_package.my_module:MyTimeseriesEngine

See https://packaging.python.org/specifications/entry-points/#data-model for more information.
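
For a project that keeps its metadata in the standard [project] table of pyproject.toml, the equivalent declaration would look roughly like this. The entry-point name and the package and module names are the same placeholders used in the setup.py and setup.cfg examples:

```toml
[project.entry-points."pyaro.timeseries"]
my_timeseries_reader = "my_package.my_module:MyTimeseriesEngine"
```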

If you are using Poetry for your build system, you can accomplish the same thing using “plugins”. In this case you would need to add the following to your pyproject.toml file:

[tool.poetry.plugins."pyaro.timeseries"]
"my_timeseries_reader" = "my_package.my_module:MyTimeseriesEngine"

See https://python-poetry.org/docs/pyproject/#plugins for more information on Poetry plugins.
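
After installing your package, you can check that the entry point was registered by inspecting the installed metadata with the standard library. The sketch below does not use the PyAerocom API at all; registered_engines is a hypothetical helper name, and the result is simply empty when no package declares the group.

```python
from importlib.metadata import entry_points


def registered_engines(group="pyaro.timeseries"):
    """Map entry-point names in the given group to their object references."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later: selectable entry points
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a plain dict of group name -> entry points
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


for name, target in sorted(registered_engines().items()):
    print(f"{name} -> {target}")
```

If the declaration from the examples above is picked up, the output should contain a line pairing my_timeseries_reader with my_package.my_module:MyTimeseriesEngine.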