Pyaro basic example

  • Install pyaro and check that the installed version is recent enough:

[18]:
import pyaro
pyaro.__version__
[18]:
'0.0.5'
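
The same check can be done in code rather than by eye. The sketch below is only an illustration: the helper is_new_enough and the minimum version 0.0.5 are hypothetical (the notebook only shows that 0.0.5 was installed), and the comparison handles plain dotted numeric versions only.

```python
from importlib.metadata import PackageNotFoundError, version


def is_new_enough(installed: str, minimum: str) -> bool:
    """Compare plain dotted numeric versions such as '0.0.5' (hypothetical helper)."""
    def as_tuple(text):
        return tuple(int(part) for part in text.split("."))
    return as_tuple(installed) >= as_tuple(minimum)


MINIMUM = "0.0.5"  # assumed threshold, taken from the version shown above

try:
    installed = version("pyaro")  # reads the installed package metadata
except PackageNotFoundError:
    installed = None

if installed is None:
    print("pyaro is not installed")
else:
    try:
        status = "ok" if is_new_enough(installed, MINIMUM) else "too old"
    except ValueError:  # e.g. development versions with non-numeric parts
        status = "not a plain numeric version, check manually"
    print(f"pyaro {installed}: {status}")
```

Comparing integer tuples instead of strings matters once a component reaches two digits: as strings, '0.0.10' would sort before '0.0.5'.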
  • List the installed engines. A basic installation provides only the csv_timeseries engine; installing additional packages, e.g. https://github.com/metno/pyaro-readers, adds many more engines.

[19]:
pyaro.list_timeseries_engines()
[19]:
{'csv_timeseries': <pyaro.csvreader.CSVTimeseriesReader.CSVTimeseriesEngine at 0x7ff77705f250>}
  • Learn a bit about the engine.

[20]:
pr_csv = pyaro.list_timeseries_engines()['csv_timeseries']
help(pr_csv)
Help on CSVTimeseriesEngine in module pyaro.csvreader.CSVTimeseriesReader object:

class CSVTimeseriesEngine(pyaro.timeseries.AutoFilterReaderEngine.AutoFilterEngine)
 |  Method resolution order:
 |      CSVTimeseriesEngine
 |      pyaro.timeseries.AutoFilterReaderEngine.AutoFilterEngine
 |      pyaro.timeseries.Engine.Engine
 |      abc.ABC
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  description(self)
 |      Get a descriptive string about this pyaro implementation.
 |
 |  open(self, filename, *args, **kwargs) -> pyaro.csvreader.CSVTimeseriesReader.CSVTimeseriesReader
 |      open-function of the timeseries, initializing the reader-object, i.e.
 |      equivalent to Reader(filename_or_object_or_url, *, filters)
 |
 |      :return pyaro.timeseries.Reader
 |      :raises UnknownFilterException
 |
 |  reader_class(self)
 |      return the class of the corresponding reader
 |
 |      :return: the class returned from open
 |
 |  url(self)
 |      Get a url about more information, docs of the datasource-engine.
 |
 |      This should be the github-url or similar of the implementation.
 |
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |
 |  __abstractmethods__ = frozenset()
 |
 |  ----------------------------------------------------------------------
 |  Methods inherited from pyaro.timeseries.AutoFilterReaderEngine.AutoFilterEngine:
 |
 |  args(self)
 |      return a tuple of parameters to be passed to open_timeseries, including
 |      the mandatory filename_or_obj_or_url parameter.
 |
 |  supported_filters(self) -> [<class 'pyaro.timeseries.Filter.Filter'>]
 |      The supported filters by this Engine. Maps to the Readers supported_filters.
 |
 |      :return: a list of filters
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from pyaro.timeseries.Engine.Engine:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)
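
The description() and url() methods listed in the help output can be used to get a quick overview of every installed engine, not just one. A minimal sketch: summarize_engines is a hypothetical helper, and the import is guarded so the snippet still runs where pyaro is missing.

```python
def summarize_engines(engines):
    """Return one 'name: description (url)' line per engine (hypothetical helper)."""
    lines = []
    for name, engine in sorted(engines.items()):
        lines.append(f"{name}: {engine.description()} ({engine.url()})")
    return lines


try:
    import pyaro
    engines = pyaro.list_timeseries_engines()
except ImportError:
    engines = {}  # pyaro not available; nothing to list

for line in summarize_engines(engines):
    print(line)
```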

  • Check the engine's description and the arguments it accepts when opening a datasource:

[21]:
print(pr_csv.description())
print(pr_csv.args())
Simple reader of csv-files using python csv-reader
('filename', 'columns', 'variable_units', 'country_lookup', 'csvreader_kwargs', 'filters')
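
The tuple returned by args() can also serve as a guard against misspelled keyword arguments before a datasource is opened. This is a sketch only: unknown_arguments is a hypothetical helper, the file name is a placeholder, and the tuple is copied from the output above rather than fetched from a live engine.

```python
def unknown_arguments(allowed, kwargs):
    """Return the keyword names in kwargs that the engine does not list."""
    return sorted(set(kwargs) - set(allowed))


# Tuple printed by pr_csv.args() above; real code would call pr_csv.args().
allowed = ("filename", "columns", "variable_units", "country_lookup",
           "csvreader_kwargs", "filters")

kwargs = {
    "filename": "data.csv",                  # placeholder path
    "csvreader_kwarg": {"delimiter": ","},   # typo on purpose: missing 's'
}
print(unknown_arguments(allowed, kwargs))  # prints ['csvreader_kwarg']
```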

Opening a datasource with an engine

Now open the timeseries ts from a table. In larger code you would typically do this in a with clause, but for simplicity we don't do that here. columns maps the file's columns to the data fields; column numbering starts at 0, and in our example file the first column contains the variable name.

The test file is read using the Python csv module. csvreader_kwargs configures that module; here it sets the delimiter to a comma.
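
To make these two settings concrete, the sketch below mimics the idea with the standard library only; it is not pyaro's implementation. It assumes that integer entries in columns are zero-based column indices and that string entries are constants applied to every row, which is how the mapping in the next cell reads. The two data rows and the reduced mapping are made up for illustration.

```python
import csv
import io

# Made-up sample with three columns: variable, station, value.
sample = "SOx,station1,1.5\nNOx,station2,2.5\n"

columns = {"variable": 0, "station": 1, "value": 2, "country": "NO"}
csvreader_kwargs = {"delimiter": ","}


def map_row(row, columns):
    """Pick indexed fields from row; pass string entries through as constants."""
    return {
        key: row[spec] if isinstance(spec, int) else spec
        for key, spec in columns.items()
    }


# csvreader_kwargs is forwarded unchanged to csv.reader.
reader = csv.reader(io.StringIO(sample), **csvreader_kwargs)
records = [map_row(row, columns) for row in reader]
for record in records:
    print(record)
```

Changing the delimiter in csvreader_kwargs (for example to ";") is all that would be needed to read a differently separated file with the same mapping.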

[22]:
file = "../../tests/testdata/csvReader_testdata.csv"
columns = {
    # integers are column indices in the csv file, counted from 0
    "variable": 0,
    "station": 1,
    "longitude": 2,
    "latitude": 3,
    "value": 4,
    "units": 5,
    "start_time": 6,
    "end_time": 7,
    "altitude": "0",
    "country": "NO",
    "standard_deviation": "NaN",
    "flag": "0",
}
csvreader_kwargs = {"delimiter": ","}

ts = pyaro.open_timeseries('csv_timeseries',
                           filename=file,
                           columns=columns,
                           csvreader_kwargs=csvreader_kwargs,
                           filters=[])

ts is now the handle to the data-source.
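
The with clause mentioned above closes the datasource automatically when the block ends. A minimal sketch, assuming pyaro is installed; list_variables is a hypothetical wrapper, not part of pyaro, and it expects a file name and a columns mapping like the ones defined in the cell above.

```python
def list_variables(filename, columns, csvreader_kwargs=None):
    """Open a CSV datasource, read its variable names, and close it again."""
    import pyaro  # imported here so the function can be defined without pyaro

    with pyaro.open_timeseries(
        "csv_timeseries",
        filename=filename,
        columns=columns,
        csvreader_kwargs=csvreader_kwargs or {"delimiter": ","},
        filters=[],
    ) as ts:
        # Copy the names into a list before the reader is closed.
        return list(ts.variables())


# Usage with the notebook's objects:
# print(list_variables(file, columns, csvreader_kwargs))
```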

  • Accessing metadata in the datasource, like available variables and stations

[23]:
print(ts.variables())
print(ts.stations())
dict_keys(['SOx', 'NOx'])
{'station1': <pyaro.timeseries.Station.Station object at 0x7ff776cc9d20>, 'station2': <pyaro.timeseries.Station.Station object at 0x7ff776cca6e0>}
  • The timeseries must be accessed per variable and is returned for all stations. The names of the data columns are listed by keys(), and each column is available as an attribute:

[24]:
var = 'SOx'
ts_data = ts.data(var)
print(ts_data.keys())
# every column is an attribute; the notebook displays only the last expression
ts_data.stations
ts_data.start_times
ts_data.end_times
ts_data.latitudes
ts_data.longitudes
ts_data.altitudes
ts_data.flags
ts_data.values

('values', 'stations', 'latitudes', 'longitudes', 'altitudes', 'start_times', 'end_times', 'flags', 'standard_deviations')
[24]:
array([44.377964 , 73.23672  , 66.83997  , 75.973015 , 54.252964 ,
       95.51215  , 43.424374 , 14.8503275, 39.78734  , 84.14651  ,
        2.3796806, 56.030033 , 90.70785  , 53.49256  , 33.27008  ,
       19.200666 , 16.61291  , 95.239876 , 58.38857  , 25.010443 ,
       49.31731  , 95.74444  , 35.146294 , 31.468204 , 70.109985 ,
       46.82392  , 44.06993  , 15.679094 , 54.04226  , 42.6484   ,
       21.370073 , 37.34375  , 14.086469 , 31.23552  , 12.328813 ,
       85.39133  , 96.85262  , 68.06294  , 67.1648   , 27.18295  ,
       28.523333 ,  1.4397316, 74.56935  , 50.91362  , 34.764988 ,
        4.5323606, 29.767143 , 16.157143 , 61.595753 , 57.319874 ,
       63.740353 ,  4.939785 ,  5.5386314, 73.256615 , 18.165173 ,
       96.29508  , 20.86049  , 60.049885 , 36.644806 , 70.943375 ,
        9.295645 ,  1.7138128, 56.983192 , 89.55616  , 13.375153 ,
       49.939552 , 31.528936 , 78.00686  , 28.33076  , 16.8259   ,
       73.02892  , 96.075714 , 19.514969 , 68.14331  , 21.966438 ,
       62.26828  , 82.37647  , 26.558168 , 58.01865  , 56.723133 ,
       10.252709 ,  7.623141 , 33.05347  , 26.62592  , 41.58915  ,
       27.843248 , 85.996025 , 74.1133   , 42.667347 , 43.756298 ,
       10.930091 , 15.341663 , 44.52167  ,  3.720179 , 88.960014 ,
       61.212017 , 93.44711  , 19.978394 , 61.643723 , 85.183685 ,
       93.348305 , 97.57919  , 19.217777 , 11.676097 ], dtype=float32)
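
Because the columns are arrays of equal length, a boolean mask built from one column selects the matching entries of another. The sketch below uses short stand-in arrays (the first and last five SOx values shown in this notebook) instead of a live ts_data; with real data, ts_data.stations and ts_data.values would take their place. station_means is a hypothetical helper.

```python
import numpy as np

# Stand-ins for ts_data.stations and ts_data.values.
stations = np.array(["station1"] * 5 + ["station2"] * 5)
values = np.array(
    [44.377964, 73.23672, 66.83997, 75.973015, 54.252964,
     85.183685, 93.348305, 97.57919, 19.217777, 11.676097],
    dtype=np.float32,
)


def station_means(stations, values):
    """Mean value per station, computed with boolean masks."""
    return {
        str(name): float(values[stations == name].mean())
        for name in np.unique(stations)
    }


means = station_means(stations, values)
for name, mean in means.items():
    print(f"{name}: {mean:.2f}")
```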

Conversion to pandas

For pandas users, the timeseries data can be converted to a dataframe:

[25]:
pyaro.timeseries_data_to_pd(ts_data)
[25]:
        values  stations  latitudes   longitudes  altitudes  start_times   end_times  flags  standard_deviations
  0  44.377964  station1       10.5   172.500000        0.0   1997-01-01  1997-01-02      0                  NaN
  1  73.236717  station1       10.5   172.500000        0.0   1997-01-02  1997-01-03      0                  NaN
  2  66.839973  station1       10.5   172.500000        0.0   1997-01-03  1997-01-04      0                  NaN
  3  75.973015  station1       10.5   172.500000        0.0   1997-01-04  1997-01-05      0                  NaN
  4  54.252964  station1       10.5   172.500000        0.0   1997-01-05  1997-01-06      0                  NaN
...        ...       ...        ...          ...        ...          ...         ...    ...                  ...
 99  85.183685  station2       45.5  -103.199997        0.0   1997-02-17  1997-02-18      0                  NaN
100  93.348305  station2       45.5  -103.199997        0.0   1997-02-18  1997-02-19      0                  NaN
101  97.579193  station2       45.5  -103.199997        0.0   1997-02-19  1997-02-20      0                  NaN
102  19.217777  station2       45.5  -103.199997        0.0   1997-02-20  1997-02-21      0                  NaN
103  11.676097  station2       45.5  -103.199997        0.0   1997-02-21  1997-02-22      0                  NaN

104 rows × 9 columns
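
Once the data is in a dataframe, the usual pandas tools apply. A small sketch, assuming pandas is installed: it rebuilds four of the rows shown above by hand (so it runs without pyaro) and summarizes the values per station. With real data, the frame returned by pyaro.timeseries_data_to_pd(ts_data) would be used instead.

```python
import pandas as pd

# Hand-copied excerpt of the dataframe above (rows 0, 1, 102 and 103).
df = pd.DataFrame(
    {
        "values": [44.377964, 73.236717, 19.217777, 11.676097],
        "stations": ["station1", "station1", "station2", "station2"],
        "start_times": pd.to_datetime(
            ["1997-01-01", "1997-01-02", "1997-02-20", "1997-02-21"]
        ),
    }
)

# Count, mean and earliest start time per station.
summary = df.groupby("stations").agg(
    count=("values", "count"),
    mean=("values", "mean"),
    first_start=("start_times", "min"),
)
print(summary)
```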