Working with Pyaro filters

Filters in Pyaro reduce or modify the data that a database delivers.

Pyaro has a set of built-in filters under pyaro.timeseries.filters. In addition, each engine can provide filters of its own.

Listing the default filters

[15]:
import pyaro

pyaro.timeseries.filters.list()
[15]:
mappingproxy({'variables': VariableNameFilter(**{'reader_to_new': {}, 'include': [], 'exclude': []}),
              'stations': StationFilter(**{'include': [], 'exclude': []}),
              'countries': CountryFilter(**{'include': [], 'exclude': []}),
              'bounding_boxes': BoundingBoxFilter(**{'include': [], 'exclude': []}),
              'flags': FlagFilter(**{'include': [], 'exclude': []}),
              'time_bounds': TimeBoundsFilter(**{'start_include': [], 'start_exclude': [], 'startend_include': [], 'startend_exclude': [], 'end_include': [], 'end_exclude': []})})

The keys of the returned dictionary (variables, stations, …) are used to get an initialized filter, e.g. a country filter selecting only Norway:

[16]:
norway_filter = pyaro.timeseries.filters.get('countries', **{'include': ['NO']})
print(norway_filter.name())
print(norway_filter)
countries
CountryFilter(**{'include': ['NO'], 'exclude': []})

Listing the filters of an engine

[17]:
pr_csv = pyaro.list_timeseries_engines()['csv_timeseries']
pr_csv.supported_filters()
[17]:
[VariableNameFilter(**{'reader_to_new': {}, 'include': [], 'exclude': []}),
 StationFilter(**{'include': [], 'exclude': []}),
 CountryFilter(**{'include': [], 'exclude': []}),
 BoundingBoxFilter(**{'include': [], 'exclude': []}),
 TimeBoundsFilter(**{'start_include': [], 'start_exclude': [], 'startend_include': [], 'startend_exclude': [], 'end_include': [], 'end_exclude': []}),
 FlagFilter(**{'include': [], 'exclude': []})]

Programmatic and Declarative usage of filters

When opening the data source (the database), these filters can be given as a list or as a dictionary. The following two open calls are equivalent: the first is programmatic and passes a list of filter objects, while the second is declarative and passes a dictionary of filter names and their arguments. The declarative version is often preferred in larger programs such as pyaerocom.

If multiple filters are given, a data point must pass all of them; in other words, filters are implicitly connected by an AND operator.

[18]:
ts = pyaro.open_timeseries('csv_timeseries', filename="../../tests/testdata/csvReader_testdata.csv", filters=[norway_filter])
print([str(stat) for stat in ts.stations().values()])
ts = pyaro.open_timeseries('csv_timeseries', filename="../../tests/testdata/csvReader_testdata.csv", filters={'countries': {'include': ['NO']}})
print([str(stat) for stat in ts.stations().values()])

["{'station': 'station1', 'latitude': 10.5, 'longitude': 172.5, 'altitude': 0.0, 'long_name': 'station1', 'country': 'NO', 'url': ''}", "{'station': 'station2', 'latitude': 45.5, 'longitude': -103.2, 'altitude': 0.0, 'long_name': 'station2', 'country': 'NO', 'url': ''}"]
["{'station': 'station1', 'latitude': 10.5, 'longitude': 172.5, 'altitude': 0.0, 'long_name': 'station1', 'country': 'NO', 'url': ''}", "{'station': 'station2', 'latitude': 45.5, 'longitude': -103.2, 'altitude': 0.0, 'long_name': 'station2', 'country': 'NO', 'url': ''}"]
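One reason the declarative form suits larger programs is that the filter settings are plain data and can be kept outside the code. The following is a minimal sketch under that assumption: the configuration text and the choice of JSON are hypothetical, only the filter names and the include/exclude keys are taken from the listings above. The call to pyaro is left as a comment so that the block runs without any data file.

```python
import json

# Hypothetical configuration text; in practice this might live in a file.
config_text = """
{
    "countries": {"include": ["NO"]},
    "stations": {"exclude": ["station2"]}
}
"""

# Parse the text into a plain dictionary of filter names and arguments.
filters = json.loads(config_text)
print(filters)

# The dictionary has the same shape as the declarative argument shown above
# and could be passed on unchanged, e.g.:
# ts = pyaro.open_timeseries('csv_timeseries', filename=..., filters=filters)
```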

Filter-Usage outside of a Reader

Sometimes users want to work with an existing reader using different sets of filters. Many filters (all those that inherit from DataIndexFilter) can be applied to an existing reader. The FilterCollection helps to bundle these filters.

[19]:
ts = pyaro.open_timeseries('csv_timeseries', filename="../../tests/testdata/csvReader_testdata.csv")
fc = pyaro.timeseries.FilterCollection({
                    "countries": {"include": ["NO"]},
                    "stations": {"include": ["station1"]},
                })
print("all SOx data:", len(ts.data("SOx"))) # 104
print("filtered SOx data:", len(fc.filter(ts, "SOx"))) # 52
all SOx data: 104
filtered SOx data: 52
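The collection above bundles a country filter and a station filter, and only data points that pass both are kept. The sketch below illustrates this AND combination in plain Python. It is not Pyaro's implementation: the helper include_filter and the record values are made up for illustration, and only the station and country names come from the test data.

```python
# Illustrative records only; station and country names mirror the test data,
# the values are invented.
records = [
    {"station": "station1", "country": "NO", "value": 1.0},
    {"station": "station2", "country": "NO", "value": 2.0},
    {"station": "station1", "country": "NO", "value": 3.0},
]


def include_filter(field, allowed):
    """Return a predicate that passes records whose field is in allowed."""
    allowed = set(allowed)
    return lambda rec: rec[field] in allowed


# Hypothetical stand-ins for a country filter and a station filter.
predicates = [
    include_filter("country", ["NO"]),
    include_filter("station", ["station1"]),
]

# A record is kept only if every predicate accepts it (logical AND).
selected = [rec for rec in records if all(p(rec) for p in predicates)]
print(len(records), len(selected))
```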

If you want to apply several filter collections to the same data without re-reading it from the reader, you can use FilterCollection.filter_data, which makes the filtering step more explicit:

[20]:
# store some information the filters might need
stations = ts.stations()
variables = ts.variables()
all_data = ts.data("SOx")
for station in stations.keys():
    fc = pyaro.timeseries.FilterCollection({
                    "countries": {"include": ["NO"]},
                    "stations": {"include": [station]},
                })
    data = fc.filter_data(all_data, stations, variables)
    print(f"Data-points for {data.variable} and {data.stations[0]}: {len(data)}")
Data-points for SOx and station1: 52
Data-points for SOx and station2: 52

Filtering data without using a Filter

The Data returned from a reader can also be sliced with a numpy index array (a boolean array with the same size as the data). The following example keeps only data points at low latitudes, between 20° south and 20° north (i.e. only station1 at 10.5°, see the station listing above).

[21]:
low_lat_data = all_data.slice((all_data.latitudes < 20) & (all_data.latitudes > -20))
print("low latitude data-points: ", len(low_lat_data))
low latitude data-points:  52
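The mask used above is an ordinary numpy boolean array, so it can be built and inspected on its own. The sketch below uses plain numpy arrays as a stand-in for the Data object: the latitudes are those of station1 and station2 from the test data, while the values are invented. Note that each comparison needs its own parentheses, because & binds more tightly than < and >.

```python
import numpy as np

# Stand-in arrays; a real Data object carries these as fields.
latitudes = np.array([10.5, 45.5, 10.5, 45.5])
values = np.array([1.0, 2.0, 3.0, 4.0])  # invented measurements

# Element-wise AND of two comparisons gives one boolean per data point.
mask = (latitudes < 20) & (latitudes > -20)

# Indexing with the mask keeps only the entries where it is True.
low_lat_values = values[mask]
print(mask.tolist())
print(low_lat_values.tolist())
```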