Pyaro basic example

Install pyaro and check that the installed version is recent enough:

[18]:

import pyaro
pyaro.__version__

[18]:

'0.0.5'

List the installed engines. The most basic installation provides only the csv_timeseries engine. Install, for example, https://github.com/metno/pyaro-readers for many more engines.

[19]:

pyaro.list_timeseries_engines()

[19]:

{'csv_timeseries': <pyaro.csvreader.CSVTimeseriesReader.CSVTimeseriesEngine at 0x7ff77705f250>}
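Because the set of engines depends on what is installed, it can help to fail early with a readable message when an expected engine is missing. The following is a minimal sketch; require_engine is a hypothetical helper and not part of pyaro. It only assumes that the engines are given as a dictionary keyed by name, as in the output above.

```python
def require_engine(engines, name):
    """Return the engine registered under `name`, or raise a readable error.

    `engines` is any mapping of engine name to engine object, for example
    the dictionary returned by pyaro.list_timeseries_engines().
    This helper is hypothetical and not part of pyaro.
    """
    try:
        return engines[name]
    except KeyError:
        available = ", ".join(sorted(engines)) or "none"
        raise LookupError(
            f"engine {name!r} is not installed; available: {available}"
        ) from None


# Stand-in dictionary so the sketch runs without pyaro installed.
demo_engines = {"csv_timeseries": object()}
engine = require_engine(demo_engines, "csv_timeseries")
```

With pyaro installed, the call would be require_engine(pyaro.list_timeseries_engines(), 'csv_timeseries').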

Learn a bit about the engine.

[20]:

pr_csv = pyaro.list_timeseries_engines()['csv_timeseries']
help(pr_csv)

Help on CSVTimeseriesEngine in module pyaro.csvreader.CSVTimeseriesReader object:

class CSVTimeseriesEngine(pyaro.timeseries.AutoFilterReaderEngine.AutoFilterEngine)
 |  Method resolution order:
 |      CSVTimeseriesEngine
 |      pyaro.timeseries.AutoFilterReaderEngine.AutoFilterEngine
 |      pyaro.timeseries.Engine.Engine
 |      abc.ABC
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  description(self)
 |      Get a descriptive string about this pyaro implementation.
 |
 |  open(self, filename, *args, **kwargs) -> pyaro.csvreader.CSVTimeseriesReader.CSVTimeseriesReader
 |      open-function of the timeseries, initializing the reader-object, i.e.
 |      equivalent to Reader(filename_or_object_or_url, *, filters)
 |
 |      :return pyaro.timeseries.Reader
 |      :raises UnknownFilterException
 |
 |  reader_class(self)
 |      return the class of the corresponding reader
 |
 |      :return: the class returned from open
 |
 |  url(self)
 |      Get a url about more information, docs of the datasource-engine.
 |
 |      This should be the github-url or similar of the implementation.
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  __abstractmethods__ = frozenset()
 |
 |  ----------------------------------------------------------------------
 |  Methods inherited from pyaro.timeseries.AutoFilterReaderEngine.AutoFilterEngine:
 |
 |  args(self)
 |      return a tuple of parameters to be passed to open_timeseries, including
 |      the mandatory filename_or_obj_or_url parameter.
 |
 |  supported_filters(self) -> [<class 'pyaro.timeseries.Filter.Filter'>]
 |      The supported filters by this Engine. Maps to the Readers supported_filters.
 |
 |      :return: a list of filters
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from pyaro.timeseries.Engine.Engine:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)

Check the engine's description and the arguments needed to open a data source with it:

[21]:

print(pr_csv.description())
print(pr_csv.args())

Simple reader of csv-files using python csv-reader
('filename', 'columns', 'variable_units', 'country_lookup', 'csvreader_kwargs', 'filters')

Opening a datasource with an engine

Now open the timeseries ts from a CSV table. In larger code you could do this in a with clause, but for simplicity we don't do that here. columns maps the file's columns to the data fields. Column numbers start at 0, and column 0 contains the variable name in our example file.

The test file is read using the Python csv module. csvreader_kwargs configures that module; here it sets the delimiter to a comma.
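To make the roles of columns and csvreader_kwargs more concrete, here is a small sketch using only the standard library. It is not pyaro's implementation. It assumes that an integer in columns is a zero-based column index and that a string is a constant applied to every row, which is how the example mapping appears to be used. The function read_rows and the sample text are invented for illustration.

```python
import csv
import io


def read_rows(text, columns, csvreader_kwargs):
    """Yield one dict per CSV row.

    Assumption for this sketch: an int in `columns` is a zero-based column
    index, while a str is a constant value used for every row.
    """
    # The keyword arguments are handed straight to csv.reader.
    reader = csv.reader(io.StringIO(text), **csvreader_kwargs)
    for row in reader:
        yield {
            key: row[col] if isinstance(col, int) else col
            for key, col in columns.items()
        }


# Invented sample data, semicolon-separated to show the delimiter setting.
sample = "SOx;station1;172.5;10.5\nNOx;station2;-103.2;45.5\n"
sample_columns = {
    "variable": 0,
    "station": 1,
    "longitude": 2,
    "latitude": 3,
    "country": "NO",  # constant, not read from the file
}
rows = list(read_rows(sample, sample_columns, {"delimiter": ";"}))
print(rows[0])
```

Every value stays a string in this sketch; a real reader would also convert numbers and dates.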

[22]:

file = "../../tests/testdata/csvReader_testdata.csv"
columns = {
    "variable": 0,
    "station": 1,
    "longitude": 2,
    "latitude": 3,
    "value": 4,
    "units": 5,
    "start_time": 6,
    "end_time": 7,
    "altitude": "0",
    "country": "NO",
    "standard_deviation": "NaN",
    "flag": "0",
}
csvreader_kwargs = {"delimiter": ","}

ts = pyaro.open_timeseries('csv_timeseries',
                           filename=file,
                           columns=columns,
                           csvreader_kwargs=csvreader_kwargs,
                           filters=[])

ts is now the handle to the data-source.
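In larger code, the with clause mentioned above ensures the reader is closed once the block is left. The following is a minimal sketch, assuming the reader returned by pyaro.open_timeseries can be used as a context manager; list_variables is a hypothetical helper name, and pyaro is imported inside the function so the definition itself does not require pyaro.

```python
def list_variables(filename, columns, csvreader_kwargs=None):
    """Return the variable names found in a CSV file.

    Hypothetical helper: opens the file with the csv_timeseries engine
    inside a with clause, so the reader is closed on exit.
    """
    import pyaro  # imported lazily; requires pyaro to be installed when called

    with pyaro.open_timeseries(
        "csv_timeseries",
        filename=filename,
        columns=columns,
        csvreader_kwargs=csvreader_kwargs or {"delimiter": ","},
        filters=[],
    ) as reader:
        # Copy the names into a list before the reader is closed.
        return list(reader.variables())
```

Called with the file and columns defined above, this would return the variable names without leaving a reader open.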

Accessing metadata in the datasource, like available variables and stations

[23]:

print(ts.variables())
print(ts.stations())

dict_keys(['SOx', 'NOx'])
{'station1': <pyaro.timeseries.Station.Station object at 0x7ff776cc9d20>, 'station2': <pyaro.timeseries.Station.Station object at 0x7ff776cca6e0>}

Timeseries data must be accessed one variable at a time and is returned for all stations. The names of the data columns are available through keys():

[24]:

var = 'SOx'
ts_data = ts.data(var)
print(ts_data.keys())
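# Each data column is also available as an attribute.
# In a notebook, only the last expression (values) is displayed.
ts_data.stations
ts_data.start_times
ts_data.end_times
ts_data.latitudes
ts_data.longitudes
ts_data.altitudes
ts_data.flags
ts_data.values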

('values', 'stations', 'latitudes', 'longitudes', 'altitudes', 'start_times', 'end_times', 'flags', 'standard_deviations')

[24]:

array([44.377964 , 73.23672  , 66.83997  , 75.973015 , 54.252964 ,
       95.51215  , 43.424374 , 14.8503275, 39.78734  , 84.14651  ,
        2.3796806, 56.030033 , 90.70785  , 53.49256  , 33.27008  ,
       19.200666 , 16.61291  , 95.239876 , 58.38857  , 25.010443 ,
       49.31731  , 95.74444  , 35.146294 , 31.468204 , 70.109985 ,
       46.82392  , 44.06993  , 15.679094 , 54.04226  , 42.6484   ,
       21.370073 , 37.34375  , 14.086469 , 31.23552  , 12.328813 ,
       85.39133  , 96.85262  , 68.06294  , 67.1648   , 27.18295  ,
       28.523333 ,  1.4397316, 74.56935  , 50.91362  , 34.764988 ,
        4.5323606, 29.767143 , 16.157143 , 61.595753 , 57.319874 ,
       63.740353 ,  4.939785 ,  5.5386314, 73.256615 , 18.165173 ,
       96.29508  , 20.86049  , 60.049885 , 36.644806 , 70.943375 ,
        9.295645 ,  1.7138128, 56.983192 , 89.55616  , 13.375153 ,
       49.939552 , 31.528936 , 78.00686  , 28.33076  , 16.8259   ,
       73.02892  , 96.075714 , 19.514969 , 68.14331  , 21.966438 ,
       62.26828  , 82.37647  , 26.558168 , 58.01865  , 56.723133 ,
       10.252709 ,  7.623141 , 33.05347  , 26.62592  , 41.58915  ,
       27.843248 , 85.996025 , 74.1133   , 42.667347 , 43.756298 ,
       10.930091 , 15.341663 , 44.52167  ,  3.720179 , 88.960014 ,
       61.212017 , 93.44711  , 19.978394 , 61.643723 , 85.183685 ,
       93.348305 , 97.57919  , 19.217777 , 11.676097 ], dtype=float32)
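Since the data for one variable covers all stations, a common next step is to group the values by station. The following sketch uses only the standard library and works on any two parallel sequences, such as ts_data.stations and ts_data.values. mean_by_station is a hypothetical helper, and the numbers in the demo call are invented.

```python
from collections import defaultdict


def mean_by_station(stations, values):
    """Return the mean value per station for two parallel sequences."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for station, value in zip(stations, values):
        sums[station] += float(value)
        counts[station] += 1
    return {station: sums[station] / counts[station] for station in sums}


# Invented demo data; with pyaro this would be
# mean_by_station(ts_data.stations, ts_data.values).
means = mean_by_station(["station1", "station1", "station2"], [1.0, 3.0, 10.0])
print(means)  # {'station1': 2.0, 'station2': 10.0}
```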

Conversion to pandas

For pandas users, the timeseries data can be converted to a dataframe:

[25]:

pyaro.timeseries_data_to_pd(ts_data)

[25]:

	values	stations	latitudes	longitudes	altitudes	start_times	end_times	flags	standard_deviations
0	44.377964	station1	10.5	172.500000	0.0	1997-01-01	1997-01-02	0	NaN
1	73.236717	station1	10.5	172.500000	0.0	1997-01-02	1997-01-03	0	NaN
2	66.839973	station1	10.5	172.500000	0.0	1997-01-03	1997-01-04	0	NaN
3	75.973015	station1	10.5	172.500000	0.0	1997-01-04	1997-01-05	0	NaN
4	54.252964	station1	10.5	172.500000	0.0	1997-01-05	1997-01-06	0	NaN
...	...	...	...	...	...	...	...	...	...
99	85.183685	station2	45.5	-103.199997	0.0	1997-02-17	1997-02-18	0	NaN
100	93.348305	station2	45.5	-103.199997	0.0	1997-02-18	1997-02-19	0	NaN
101	97.579193	station2	45.5	-103.199997	0.0	1997-02-19	1997-02-20	0	NaN
102	19.217777	station2	45.5	-103.199997	0.0	1997-02-20	1997-02-21	0	NaN
103	11.676097	station2	45.5	-103.199997	0.0	1997-02-21	1997-02-22	0	NaN

104 rows × 9 columns