
Input

Functions to read time series from file into totoframe.

The input functions abstract away the format in which the data are stored on disk and load them into a standard pandas DataFrame object. The methods add attributes to the DataFrame such as units, latitude, and longitude.

Reading functions are defined in modules within the toto.inputs subpackage. The functions can be accessed as:

from toto.inputs.nc import NCfile
dset = NCfile('myfile.nc')._toDataFrame()

The following convention is expected for defining reading functions:

  • Functions for different file types are defined in different modules within the toto.inputs subpackage.

  • Modules are named as filetype.py, e.g., nc.py.

  • Classes are named as FILETYPEfile, e.g., NCfile.

  • Each class must have a _toDataFrame() method.
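The naming convention above can be turned into a small lookup helper. This is only a sketch: reader_names and load_reader are hypothetical names and not part of toto, and load_reader assumes toto is installed and that the requested reader follows the convention.

```python
import importlib


def reader_names(filetype):
    """Return the (module path, class name) implied by the naming convention."""
    key = filetype.lower()
    # Modules are named filetype.py; classes are the upper-cased type plus "file".
    return f"toto.inputs.{key}", f"{key.upper()}file"


def load_reader(filetype):
    """Import the reader class for a file type (requires toto to be installed)."""
    module_path, class_name = reader_names(filetype)
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


print(reader_names("nc"))
```

With toto installed, load_reader('nc')('myfile.nc')._toDataFrame() would then be equivalent to the explicit import shown earlier.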

The following input functions are currently available:

Generic NetCDF:

Read a generic NetCDF file. This import function works well with NetCDF or Zarr files created by xarray. This class returns a pandas DataFrame with some extra attributes such as latitude, longitude, and units.

Parameters

filename: (files,) str or list_like

A filename or list of filenames to process.

Examples

>>> from toto.inputs.nc import NCfile
>>> nc=NCfile('filename.nc')._toDataFrame()

MSL NetCDF:

Read an MSL NetCDF file. This import function works with NetCDF files created by MetOcean Solutions Ltd. These NetCDF files have been extracted by the UDS. This class returns a pandas DataFrame with some extra attributes such as latitude, longitude, and units.

Parameters

filename: (files,) str or list_like

A filename or list of filenames to process.

Examples

>>> from toto.inputs.msl import MSLfile
>>> nc=MSLfile('filename.nc')._toDataFrame()

LINZ NetCDF:

Read a LINZ NetCDF file. This import function works with NetCDF files created from LINZ tide gauges. It reads both sensors as well as the README file, which should be in the same directory. This class returns a pandas DataFrame with some extra attributes such as latitude, longitude, and units.

Parameters

filename: (files,) str or list_like

A filename or list of filenames to process. Each can be either a NetCDF file made by linz.downdload or a CSV file downloaded directly from the LINZ website.

Examples

>>> from toto.inputs.linz import LINZfile
>>> nc=LINZfile('filename.nc')._toDataFrame()

MOET NetCDF:

Read a MOET NetCDF file. This import function works with NetCDF files created by MetOcean Solutions Ltd. These NetCDF files have a special format so that they can be read by the MOET software. This class returns a pandas DataFrame with some extra attributes such as latitude, longitude, and units.

Parameters

filename: (files,) str or list_like

A filename or list of filenames to process.

Examples

>>> from toto.inputs.moet import MOETfile
>>> nc=MOETfile('filename.nc')._toDataFrame()

MATLAB

Read a MATLAB file. This function imports a .mat file. This class returns a pandas DataFrame with some extra attributes such as latitude, longitude, and units.

Parameters

filename: (files,) str or list_like

A filename or list of filenames to process.

Notes

The file MUST contain a variable called time, t, or timestamp holding MATLAB datenum time steps.
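For reference, a MATLAB datenum is a count of days (with the time of day as a fraction) whose origin lies 366 days before Python's proleptic ordinal day 1. The sketch below shows the usual conversion in plain Python. The name datenum_to_datetime is hypothetical, and this is not necessarily how MATfile performs the conversion internally.

```python
from datetime import datetime, timedelta


def datenum_to_datetime(datenum):
    """Convert a MATLAB datenum (days, fractional part = time of day)."""
    whole_days = int(datenum)
    fraction = datenum - whole_days
    # MATLAB datenum = Python ordinal + 366, so shift the whole days back.
    return datetime.fromordinal(whole_days - 366) + timedelta(days=fraction)


print(datenum_to_datetime(737791.5))
```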

Examples

>>> from toto.inputs.mat import MATfile
>>> nc=MATfile('filename.mat')._toDataFrame()

TRYAXIS

Read a TRYAXIS file. This function imports a raw file from a TRYAXIS wave buoy. This class returns a pandas DataFrame with some extra attributes such as latitude, longitude, and units.

Parameters

filename: (files,) str or list_like

A filename or list of filenames to process.

Notes

The function only works with the NONDIRSPEC and DIRSPEC files.

Examples

>>> from toto.inputs.tryaxis import TRYAXISfile
>>> nc=TRYAXISfile('filename.NONDIRSPEC')._toDataFrame()

TEXT

Read a txt or csv file. This function imports a text file. It uses the read_csv function from pandas (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html). This class returns a pandas DataFrame with some extra attributes such as latitude, longitude, and units.

Parameters

filename: (files,) str or list_like

A filename or list of filenames to process.

sep: str, default {_default_sep}

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and will automatically detect the separator with Python's builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.
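To see what the sniffer-based detection mentioned above does, csv.Sniffer from the standard library can be tried directly on a few lines of text. This is a minimal sketch: detect_separator and the sample data are made up for illustration, and restricting the candidate delimiters is a choice made here, not something the description above requires.

```python
import csv


def detect_separator(sample, candidates=",;\t|"):
    """Guess the delimiter of a text sample with csv.Sniffer."""
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


# Hypothetical file content: a header line followed by two records.
sample = (
    "time;elev;temp\n"
    "2020-01-01;1.2;15.0\n"
    "2020-01-02;1.3;15.1\n"
)
print(repr(detect_separator(sample)))
```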

skiprows: list-like, int or callable, optional

Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].
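The three accepted forms of skiprows can be summarised with a plain-Python illustration of the rule. It mirrors the wording above and is not the pandas implementation; rows_to_keep is a hypothetical helper.

```python
def rows_to_keep(n_rows, skiprows=None):
    """Return the 0-indexed rows that survive the skiprows rule."""
    if skiprows is None:
        return list(range(n_rows))
    if callable(skiprows):
        # The callable receives the row index and returns True to skip it.
        return [i for i in range(n_rows) if not skiprows(i)]
    if isinstance(skiprows, int):
        # An int skips that many lines at the start of the file.
        return list(range(skiprows, n_rows))
    skipped = set(skiprows)  # list-like of explicit line numbers
    return [i for i in range(n_rows) if i not in skipped]


print(rows_to_keep(5, 2))
print(rows_to_keep(5, [0, 2]))
print(rows_to_keep(5, lambda x: x in [0, 2]))
```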

skipfooter: int, default 0

Number of lines at bottom of file to skip (unsupported with engine='c').

miss_val: scalar, str, list-like, or dict, optional

Additional strings to recognize as NA/NaN. If a dict is passed, specific per-column NA values.

colNamesLine: int, default 1

Line number where the header is defined.

unitNamesLine: int, default 1

Line number where the units are defined.

single_column: bool, default False

Whether the time is represented in a single column.

customUnit: str, default '%d-%m-%Y %H:%M:%S'

String representing the time format.

unit: str, default 's'

Unit of the single-column time. Can be 'auto', 'custom', 'matlab', 's' or 'D'. Only matters if single_column is True.
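As an illustration of how a single time column might be interpreted, the sketch below parses one value for the 'custom', 's' and 'D' cases. It rests on assumptions that the text above does not confirm: that 's' and 'D' mean seconds and days since the Unix epoch, and that 'custom' applies customUnit as a strptime format. parse_time is a hypothetical helper, and 'auto' and 'matlab' are left out.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumption: numeric times count from the Unix epoch


def parse_time(value, unit="s", custom_unit="%d-%m-%Y %H:%M:%S"):
    """Convert one single-column time value to a datetime."""
    if unit == "custom":
        return datetime.strptime(value, custom_unit)
    if unit == "s":
        return EPOCH + timedelta(seconds=float(value))
    if unit == "D":
        return EPOCH + timedelta(days=float(value))
    raise ValueError(f"unit {unit!r} is not covered by this sketch")


print(parse_time("25-12-2020 13:45:00", unit="custom"))
print(parse_time(86400, unit="s"))
```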

time_col_name: dict, default {'Year':'year','Month':'month','Day':'day','Hour':'hour','Min':'Minute','Sec':'Second'}

Dictionary for renaming each time column so that pandas can interpret the time. Only matters if single_column is False.
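The idea behind the renaming can be shown without pandas: once the file's time columns are mapped to standard component names, a timestamp can be assembled from them. The sketch below does this for a single row. row_to_datetime and the example row are hypothetical, and lower-casing the target names is a choice made here so that they match the datetime() keywords.

```python
from datetime import datetime

# Default mapping quoted in the parameter description above.
TIME_COL_NAME = {"Year": "year", "Month": "month", "Day": "day",
                 "Hour": "hour", "Min": "Minute", "Sec": "Second"}


def row_to_datetime(row, time_col_name=TIME_COL_NAME):
    """Rename the time columns of one row and build a datetime from them."""
    parts = {}
    for column, value in row.items():
        if column in time_col_name:
            # Lower-case the target name so it matches datetime() keywords.
            parts[time_col_name[column].lower()] = int(value)
    return datetime(**parts)


# Hypothetical row with six time columns and one data column.
row = {"Year": 2020, "Month": 1, "Day": 2, "Hour": 3, "Min": 4, "Sec": 5, "elev": 1.2}
print(row_to_datetime(row))
```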

colNames: List, default []

List of column names to use.

unitNames: List, default []

List of units to use.

Notes

When opening the TOTOVIEW GUI, this function will be called with totoview.inputs.txtGUI.

Examples

>>> from toto.inputs.txt import TXTfile
>>> tx = TXTfile([filename], colNamesLine=1, miss_val='NaN', sep=',',
...              skiprows=1, unit='custom', time_col_name='time',
...              unitNamesLine=0, single_column=True,
...              customUnit='%d/%m/%Y %H:%M')
>>> tx.reads()
>>> tx.read_time()
>>> df=tx._toDataFrame()

CONSTITUENTS FILE

Read a constituents file. This function imports a file containing the amplitude and phase of each tidal constituent. It uses the read_csv function from pandas (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) to read three columns:

  • Constituent name

  • Constituent phase

  • Constituent amplitude

This class returns a pandas DataFrame with some extra attributes such as latitude, longitude, and units.

This uses the UTide module (https://github.com/wesleybowman/UTide).

Parameters

filename: (files,) str or list_like

A filename or list of filenames to process.

sep: str, default {_default_sep}

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used and will automatically detect the separator with Python's builtin sniffer tool, csv.Sniffer. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'.

skiprows: list-like, int or callable, optional

Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].

skipfooter: int, default 0

Number of lines at bottom of file to skip (unsupported with engine='c').

colNamesLine: int, default 1

Line number where the header is defined.

unit: str, default 'degrees'

Unit of the phases. Can be 'degrees' or 'radians'.

min_date: datetime, default datetime.datetime(2020,1,1)

Start time of the time series.

max_date: datetime, default datetime.datetime(2020,1,1)

End time of the time series.

dt: int, default 3600

Time step in seconds to use when creating the time series.

latitude: int, default -40

Latitude used to calculate the time series.
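To make the roles of min_date, max_date, dt and the phase unit concrete, the sketch below builds a time axis and evaluates a plain harmonic sum A*cos(speed*t - phase). This is a deliberately simplified stand-in and not what UTide does: there are no nodal corrections, the phase is taken relative to the first time step, and latitude is not used. time_axis and harmonic_sum are hypothetical helpers, and the single M2 constituent with unit amplitude is made-up input.

```python
import math
from datetime import datetime, timedelta


def time_axis(min_date, max_date, dt=3600):
    """Time steps from min_date to max_date (inclusive), dt seconds apart."""
    n_steps = int((max_date - min_date).total_seconds() // dt)
    return [min_date + timedelta(seconds=i * dt) for i in range(n_steps + 1)]


def harmonic_sum(hours, constituents, unit="degrees"):
    """Sum A*cos(speed*t - phase) over (amplitude, phase, speed) triples."""
    total = 0.0
    for amplitude, phase, speed_deg_per_hour in constituents:
        phase_rad = math.radians(phase) if unit == "degrees" else phase
        angle = math.radians(speed_deg_per_hour) * hours - phase_rad
        total += amplitude * math.cos(angle)
    return total


M2_SPEED = 28.9841042  # angular speed of the M2 constituent, degrees per hour
times = time_axis(datetime(2020, 1, 1), datetime(2020, 1, 2), dt=3600)
elevation = [
    harmonic_sum((t - times[0]).total_seconds() / 3600.0, [(1.0, 0.0, M2_SPEED)])
    for t in times
]
print(len(times), elevation[0])
```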

Notes

When opening the TOTOVIEW GUI, this function will be called with totoview.inputs.consGUI.

Examples

>>> from toto.inputs.cons import CONSfile
>>> nc = CONSfile(['cons_list.csv'], sep=',',
...               colNames=[],
...               unit='degrees',
...               miss_val='NaN',
...               colNamesLine=1,
...               skiprows=1,
...               skipfooter=0,
...               col_name={'cons': 'Cons', 'amp': 'Amplitude', 'pha': 'Phase'})
>>> nc.reads()
>>> nc.read_cons()
>>> df=nc._toDataFrame()

EXCEL FILE

Read an xls or xlsx file. This function imports an Excel file. It uses the read_excel function from pandas (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html). This class returns a pandas DataFrame with some extra attributes such as latitude, longitude, and units.

Parameters

filename: (files,) str or list_like

A filename or list of filenames to process.

sheet_name: str, int, list, or None, default 0

Strings are used for sheet names. Integers are used in zero-indexed sheet positions. Lists of strings/integers are used to request multiple sheets. Specify None to get all sheets.

Available cases:

  • Defaults to 0: 1st sheet as a DataFrame

  • 1: 2nd sheet as a DataFrame

  • "Sheet1": Load sheet with name “Sheet1”

  • [0, 1, "Sheet5"]: Load first, second and sheet named “Sheet5” as a dict of DataFrame

  • None: All sheets.

colNames: List, default []

List of column names to use.

unitNames: List, default []

List of units to use.

miss_val: scalar, str, list-like, or dict, optional

Additional strings to recognize as NA/NaN. If a dict is passed, specific per-column NA values.

colNamesLine: int, default 1

Line number where the header is defined.

skiprows: list-like, int or callable, optional

Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file. If callable, the callable function will be evaluated against the row indices, returning True if the row should be skipped and False otherwise. An example of a valid callable argument would be lambda x: x in [0, 2].

skipfooter: int, default 0

Number of lines at bottom of file to skip (unsupported with engine='c').

unitNamesLine: int, default 1

Line number where the units are defined.

single_column: bool, default False

Whether the time is represented in a single column.

customUnit: str, default '%d-%m-%Y %H:%M:%S'

String representing the time format.

unit: str, default 's'

Unit of the single-column time. Can be 'auto', 'custom', 'matlab', 's' or 'D'. Only matters if single_column is True.

time_col_name: dict, default {'Year':'year','Month':'month','Day':'day','Hour':'hour','Min':'Minute','Sec':'Second'}

Dictionary for renaming each time column so that pandas can interpret the time. Only matters if single_column is False.

Examples

>>> from toto.inputs.xls import XLSfile
>>> tx = XLSfile([filename], sheetnames='test3', colNames=[], unitNames=[],
...              miss_val='NaN', colNamesLine=1, skiprows=2, unitNamesLine=0,
...              skipfooter=0, single_column=True, unit='s',
...              customUnit='%d-%m-%Y %H:%M:%S', time_col_name={})
>>> tx.reads()
>>> tx.read_time()
>>> df=tx._toDataFrame()

RSK FILE

Read an RSK file from RBR Ltd. This function imports a raw file from an RBR pressure sensor. This class returns a pandas DataFrame.

Parameters

filename: (files,) str or list_like

A filename or list of filenames to process.

Examples

>>> from toto.inputs.rsk import RSKfile
>>> nc=RSKfile('filename.rsk')._toDataFrame()