API Documentation

pangaeapy is a package for downloading and analysing metadata and data from tabular PANGAEA (https://www.pangaea.de) datasets.

class pangaeapy.PanDataSet(id=None, paramlist=None, deleteFlag='', enable_cache=False, cachedir=None, include_data=True, expand_terms=[], auth_token=None, cache_expiry_days=1)[source]

PANGAEA DataSet

The PanDataSet class creates objects which hold the information, data as well as metadata, needed to analyse a given PANGAEA dataset.

Parameters:
  • id (str) – The identifier of a PANGAEA dataset. An integer number or a DOI is accepted here

  • deleteFlag (str) – in case quality flags are available, this parameter defines a flag for which data should not be included in the data DataFrame. Possible values are listed here: https://wiki.pangaea.de/wiki/Quality_flag

  • enable_cache (boolean) – If set to True, PanDataSet objects are cached as pickle files either on the local home directory within a directory called ‘.pangaeapy_cache’ or in cachedir given by the user in order to avoid unnecessary downloads.

  • include_data (boolean) – determines whether the data table is downloaded and added to the self.data DataFrame. If you are interested in metadata only, set this to False

  • expand_terms (list or int) – indicates whether ontology terms found for parameters shall be expanded for the given list of terminology ids, in particular by adding their hierarchy terms / classification
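As a minimal sketch of how these constructor parameters combine, the helper below builds keyword arguments for PanDataSet. The names dataset_kwargs and open_dataset are hypothetical and not part of pangaeapy; only the keyword names come from the signature above. The import is deferred so that the argument logic can be checked without pangaeapy or network access.

```python
def dataset_kwargs(metadata_only=False, cachedir=None, auth_token=None):
    """Translate a few choices into documented PanDataSet keyword arguments."""
    kwargs = {"include_data": not metadata_only}
    if cachedir is not None:
        # Caching is opt-in; a custom directory only matters when it is enabled.
        kwargs["enable_cache"] = True
        kwargs["cachedir"] = cachedir
    if auth_token is not None:
        kwargs["auth_token"] = auth_token
    return kwargs


def open_dataset(dataset_id, **choices):
    """Create a PanDataSet (needs pangaeapy installed and network access)."""
    from pangaeapy import PanDataSet  # deferred import

    return PanDataSet(dataset_id, **dataset_kwargs(**choices))
```

Calling open_dataset(some_id, metadata_only=True), where some_id is an integer ID or a DOI, would then request the metadata without the data table.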

id

The identifier of a PANGAEA dataset. An integer number or a DOI is accepted here

Type:

int

uri

The PANGAEA DOI (alternative label)

Type:

str

doi

The PANGAEA DOI

Type:

str

title

The title of the dataset

Type:

str

abstract

the abstract or summary of the dataset

Type:

str

year

The publication year of the dataset

Type:

int

authors

a list containing the PanAuthor objects (author info) of the dataset

Type:

list of PanAuthor

citation

the full citation of the dataset including e.g. author, year, title etc.

Type:

str

params

a list of all PanParam objects (the parameters) used in this dataset

Type:

list of PanParam

parameters

synonym for self.params

Type:

list of PanParam

events

a list of all PanEvent objects (the events) used in this dataset

Type:

list of PanEvent

projects

a list containing the PanProject objects referenced by this dataset

Type:

list of PanProject

mintimeextent

a string containing the min time of data set extent

Type:

str

maxtimeextent

a string containing the max time of data set extent

Type:

str

data

a pandas dataframe holding all the data

Type:

pandas.DataFrame

loginstatus

a label which indicates whether the data set is protected or not; default value: ‘unrestricted’

Type:

str

isCollection

indicates whether this dataset is a collection data set, i.e. one that groups a set of child data sets

Type:

boolean

collection_members

a list of DOIs of all child data sets in case the data set is a collection data set

Type:

list

moratorium

a label which provides the date until the dataset is under moratorium

Type:

str

datastatus

a label which provides the detail about the status of the dataset whether it is published or in review or deleted

Type:

str

registrystatus

a string which indicates the registration status of a dataset

Type:

str

licence

a licence object, usually creative commons

Type:

PanLicence

auth_token

the PANGAEA authentication token; you can find it at https://www.pangaea.de/user/

Type:

str

cache_expiry_days

the number of days for which a cached pickle is accepted; after this, pangaeapy will load the dataset again and ignore the cache

Type:

int

cachedir

the full path to the cache directory, will be created if it doesn’t exist

Type:

str

keywords

A list of keyword names. Only actual keywords are included; technical and auto-generated ones are ignored right now.

Type:

list[str]
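The attributes listed above can be gathered into a compact overview. The following is only a sketch: summarize is a hypothetical helper, and the stand-in object with made-up values replaces a real PanDataSet so that the example runs without a download.

```python
from types import SimpleNamespace


def summarize(ds):
    """Collect a few documented PanDataSet attributes into a plain dict."""
    return {
        "title": ds.title,
        "year": ds.year,
        "doi": ds.doi,
        "n_authors": len(ds.authors),
        "n_params": len(ds.params),
        "is_collection": ds.isCollection,
        "time_extent": (ds.mintimeextent, ds.maxtimeextent),
    }


# Stand-in object with made-up values, used here instead of a real dataset.
fake = SimpleNamespace(
    title="Example title", year=2020, doi="doi:placeholder",
    authors=["a1", "a2"], params=["p1"], isCollection=False,
    mintimeextent=None, maxtimeextent=None,
)
summary = summarize(fake)
```

With a real object, summarize(ds) would read the same attribute names from the PanDataSet instance.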

check_pickle()[source]

Verifies whether a cached pickle file needs to be refreshed (reloaded). Files are checked after 24 hours at the earliest, but are only updated if the metadata indicates that changes occurred.

Parameters:
  • expirydays

Return type:

bool
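The age part of such a check can be illustrated with the file modification time. This is a sketch of the general idea only, not pangaeapy's implementation: is_cache_expired is a hypothetical function, the file name is made up, and the additional comparison against metadata changes described above is not covered.

```python
import os
import tempfile
import time


def is_cache_expired(path, expiry_days=1, now=None):
    """Return True if the file is missing or older than expiry_days."""
    if not os.path.exists(path):
        return True
    now = time.time() if now is None else now
    age_seconds = now - os.path.getmtime(path)
    return age_seconds > expiry_days * 24 * 60 * 60


with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "dataset.pik")  # made-up file name
    missing = is_cache_expired(cache_file)  # no file written yet
    open(cache_file, "w").close()
    fresh = is_cache_expired(cache_file, expiry_days=1)
    # Pretend two days have passed instead of waiting.
    two_days_later = time.time() + 2 * 24 * 60 * 60
    stale = is_cache_expired(cache_file, expiry_days=1, now=two_days_later)
```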

download(indices: list = None, columns: list[str] = None)[source]

Download binary data if available; otherwise, save dataframe as CSV.

Downloads can be very large. Consider explicitly defining the pangaeapy cache when calling PanDataSet.

Parameters:
  • indices (list) – Row indices of the data to download (e.g. [1, 2, 6]).

  • columns (list of strings) – Column names of the data to download (e.g. ["Binary", "netCDF"]).

Return type:

List of downloaded or saved filenames
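A call restricted to a few rows and columns might look like the sketch below. download_selection is a hypothetical wrapper, and FakeDataSet is a stand-in that mimics only the documented download() signature so the example runs offline; the file names it returns are invented.

```python
def download_selection(ds, indices=None, columns=None):
    """Call ds.download() for a subset of rows/columns and return the filenames."""
    if indices is not None:
        indices = sorted(set(indices))  # drop duplicate row indices
    files = ds.download(indices=indices, columns=columns)
    return list(files)


class FakeDataSet:
    """Stand-in that mimics only the documented download() signature."""

    def download(self, indices=None, columns=None):
        return [f"row{i}_{c}.bin" for i in (indices or []) for c in (columns or [])]


files = download_selection(FakeDataSet(), indices=[2, 1, 2], columns=["Binary"])
```

Because downloads can be large, it may be worth creating the real PanDataSet with an explicit cachedir before calling download().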

from_pickle()[source]

Reads a PanDataSet object from a pickle file

getEventsAsFrame()[source]

For more convenient handling of event info, this method returns a dataframe containing all events with their attributes as columns. Please note that this version only takes campaign names, not other campaign attributes.

getGeometry()[source]

Sometimes the topotype attribute has not been set correctly during the curation process. This method returns the real geometry (topographic type) of the dataset based on the x, y, z and t information of the data frame content. Still a bit experimental.

getParamDict()[source]

The method translates the parameter object list into a dictionary

info()[source]

The method returns a set of basic information about the PANGAEA dataset

setData(addEventColumns=True)[source]

This method populates the data DataFrame with data from a PANGAEA dataset. In addition to the data given in the tabular ASCII file delivered by PANGAEA, event columns can be added (see addEventColumns).

Parameters:
  • addEventColumns (boolean) – In case Latitude, Longitude, Elevation, Date/Time and Event are not given in the ASCII matrix, which is sometimes possible in single Event datasets, setData can add these columns to the dataframe using the information given in the metadata for Event. Default is ‘True’

setID(id)[source]

Initialize the ID of a data set in case it was not defined in the constructor.

Parameters:
  • id (str) – The identifier of a PANGAEA dataset. An integer number or a DOI is accepted here

setMetadata()[source]

The method initializes the metadata of the PanDataSet object using the information of a PANGAEA metadata XML file.

to_dwca(save=True)[source]

This method creates a Darwin Core Archive file using PANGAEA metadata and data. The package is saved as a directory named as follows: [PANGAEA ID]_dwca

Parameters:
  • filelocation (str) – Indicates the location (directory) where the DwC-A file will be saved

  • save (boolean) – If the file shall be saved on disk (filelocation or home directory/pan_export by default)

to_frictionless(filelocation=None, save=True)[source]

This method creates a frictionless data package (https://specs.frictionlessdata.io/data-package) file using PANGAEA metadata and data. The package is saved as a directory named as follows: [PANGAEA ID]_frictionless

Parameters:
  • filelocation (str) – Indicates the location (directory) where the frictionless file will be saved

  • save (boolean) – If the file shall be saved on disk (filelocation or home directory/pan_export by default)

to_netcdf(filelocation=None, save=True, type='sdn')[source]

This method creates a NetCDF file using PANGAEA data. It offers two different flavors: SeaDataNet NetCDF and an experimental internal format using NetCDF 4 groups. Currently the method only supports simple types such as timeseries and profiles. Files created by this method are named as follows: [PANGAEA ID]_[type].cf

Parameters:
  • filelocation (str) – Indicates the location (directory) where the NetCDF file will be saved

  • type (str) – This parameter sets the NetCDF profile type. Allowed values are ‘sdn’ (SeaDataNet) and ‘pan’ (PANGAEA style)

  • save (boolean) – If the file shall be saved on disk (filelocation or home directory/pan_export by default)
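Since only two values are listed for type, a caller may want to reject anything else before starting an export. The sketch below assumes exactly that: export_netcdf and ALLOWED_NETCDF_TYPES are hypothetical names, and Recorder is a stand-in that only records the arguments it receives instead of writing a file.

```python
ALLOWED_NETCDF_TYPES = ("sdn", "pan")  # values listed in the documentation


def export_netcdf(ds, filelocation=None, flavor="sdn"):
    """Validate the flavor, then delegate to ds.to_netcdf()."""
    if flavor not in ALLOWED_NETCDF_TYPES:
        raise ValueError(f"type must be one of {ALLOWED_NETCDF_TYPES}, got {flavor!r}")
    return ds.to_netcdf(filelocation=filelocation, save=True, type=flavor)


class Recorder:
    """Stand-in that records the arguments passed to to_netcdf()."""

    def to_netcdf(self, filelocation=None, save=True, type="sdn"):
        self.call = {"filelocation": filelocation, "save": save, "type": type}


rec = Recorder()
export_netcdf(rec, filelocation="out", flavor="pan")
```

With a real PanDataSet in place of Recorder, the same call would hand the validated arguments to to_netcdf().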

to_pickle()[source]

Writes a PanDataSet object to a pickle file

class pangaeapy.PanQuery(query, bbox=None, limit=10, offset=0)[source]

Run and analyze results of PANGAEA search queries.

Parameters:
  • query (str) – The query string following the specs at www.pangaea.de.

  • bbox (tuple of floats, optional) – The bounding box to define geographical search constraints following the GeoJSON specs – (minlon, minlat, maxlon, maxlat).

  • limit (int, default 10) – The maximum number of results returned (cannot be higher than 500).

  • offset (int, default 0) – The offset of the search results.
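The documented constraints on these arguments can be checked before a query is sent. In the sketch below, query_kwargs is a hypothetical helper; the upper limit of 500 and the bbox order come from the parameter list above, while the lower bound of 1 for limit and the latitude check are assumptions made for illustration.

```python
def query_kwargs(bbox=None, limit=10, offset=0):
    """Check PanQuery arguments against the documented constraints."""
    if not 1 <= limit <= 500:
        raise ValueError("limit must be between 1 and 500")
    if offset < 0:
        raise ValueError("offset must not be negative")
    kwargs = {"limit": limit, "offset": offset}
    if bbox is not None:
        minlon, minlat, maxlon, maxlat = bbox  # documented order
        # Longitudes are not compared: a box may cross the antimeridian.
        if minlat > maxlat:
            raise ValueError("minlat must not exceed maxlat")
        kwargs["bbox"] = (float(minlon), float(minlat), float(maxlon), float(maxlat))
    return kwargs
```

The result could then be passed on as PanQuery(query_string, **query_kwargs(...)).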

totalcount

The number of total search results.

Type:

int

error

In case an error occurs this attribute holds the latest one.

Type:

str

query

The query provided by the user.

Type:

str

result

A list of retrieved search results.

Type:

list of dictionaries

get_dois()[source]

Get the list of DOIs contained in the search result.

Returns:

A list of DOIs.

Return type:

list of str
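Because a single query returns at most limit results, collecting every DOI means stepping through the result set with offset. The following sketch relies only on the documented totalcount attribute and get_dois() method; collect_dois is a hypothetical helper, and FakeQuery is a stand-in with placeholder DOIs so the loop can run offline.

```python
def collect_dois(make_query, page_size=100, max_results=None):
    """Page through search results and gather DOIs.

    make_query(limit=..., offset=...) must return an object with the documented
    totalcount attribute and get_dois() method, e.g. a PanQuery.
    """
    dois, offset = [], 0
    while True:
        page = make_query(limit=page_size, offset=offset)
        batch = page.get_dois()
        dois.extend(batch)
        offset += page_size
        wanted = page.totalcount if max_results is None else min(max_results, page.totalcount)
        if not batch or offset >= wanted:
            break
    return dois if max_results is None else dois[:max_results]


class FakeQuery:
    """Stand-in for PanQuery serving seven placeholder DOIs."""

    ALL = [f"doi:placeholder-{i}" for i in range(7)]

    def __init__(self, limit, offset):
        self.totalcount = len(self.ALL)
        self._page = self.ALL[offset:offset + limit]

    def get_dois(self):
        return self._page


dois = collect_dois(FakeQuery, page_size=3)
```

With the real class, make_query could be a small function that returns PanQuery(query_string, limit=limit, offset=offset) for a fixed query string.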