metaspace.sm_annotation_utils

class metaspace.sm_annotation_utils.SMInstance(host=None, verify_certificate=True, email=None, password=None, api_key=None, config_path=None)[source]

Client class for communication with the Metaspace API.

save_login(overwrite=False)[source]

Saves login credentials to the config file so that they will be automatically loaded in future uses of SMInstance()

property projects

Sub-object containing methods for interacting with projects.

Return type:

metaspace.projects_client.ProjectsClient

dataset(name=None, id=None)[source]

Retrieve a dataset by id (preferred) or name.

You can get a dataset’s ID by viewing its annotations online and looking at the URL, e.g. in this URL: metaspace2020.eu/annotations?ds=2016-09-22_11h16m17s the dataset ID is 2016-09-22_11h16m17s

Return type:

SMDataset
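
As a small illustration of the URL convention described above, the following sketch pulls the dataset ID out of an annotations URL using only the standard library. The helper name `dataset_id_from_url` is hypothetical and not part of the metaspace package.

```python
from typing import Optional
from urllib.parse import urlparse, parse_qs


def dataset_id_from_url(url: str) -> Optional[str]:
    """Return the value of the ``ds`` query parameter, or None if absent."""
    # urlparse needs a scheme to separate host from path; add one if missing.
    if "://" not in url:
        url = "https://" + url
    query = parse_qs(urlparse(url).query)
    values = query.get("ds")
    return values[0] if values else None


print(dataset_id_from_url("metaspace2020.eu/annotations?ds=2016-09-22_11h16m17s"))
# 2016-09-22_11h16m17s
```

The resulting string can then be passed as the id argument, e.g. sm.dataset(id=...).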

datasets(nameMask=None, idMask=None, *, submitter_id=None, group_id=None, project_id=None, polarity=None, ionisation_source=None, analyzer_type=None, maldi_matrix=None, organism=None, **kwargs)[source]

Search for datasets that match the given criteria. If no criteria are given, it will return all accessible datasets on METASPACE.

Parameters:
  • nameMask (Optional[str]) – Search string to be applied to the dataset name

  • idMask (Union[str, List[str], None]) – Dataset ID or list of IDs

  • submitter_id (Optional[str]) – User ID of the submitter

  • group_id (Optional[str]) –

  • project_id (Optional[str]) –

  • polarity (Optional[Literal[‘Positive’, ‘Negative’]]) – ‘Positive’ or ‘Negative’

  • ionisation_source (Optional[str]) –

  • analyzer_type (Optional[str]) –

  • maldi_matrix (Optional[str]) –

  • organism (Optional[str]) –

Return type:

List[SMDataset]
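
A minimal sketch of how a search might be wrapped so that unset criteria are not passed at all. It assumes `sm` is an already constructed SMInstance; the wrapper `search_datasets` is a hypothetical helper, and only the keyword names come from the signature above.

```python
from typing import Any, Dict, List, Optional


def search_datasets(sm: Any, name_mask: Optional[str] = None, **filters: Any) -> List[Any]:
    """Call sm.datasets() with only the filters that were actually set."""
    # Drop None values so that unset criteria are simply not sent.
    active: Dict[str, Any] = {k: v for k, v in filters.items() if v is not None}
    return sm.datasets(nameMask=name_mask, **active)


# Example call (needs a real SMInstance, so it is left commented out):
# results = search_datasets(sm, "brain", polarity="Positive", organism=None)
```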

submit_dataset(imzml_fn, ibd_fn, name, metadata, is_public, databases=[('HMDB', 'v4')], *, project_ids=None, adducts=None, neutral_losses=None, chem_mods=None, ppm=None, num_isotopic_peaks=None, decoy_sample_size=None, analysis_version=None, input_path=None, description=None, perform_enrichment=False)[source]

Submit a dataset for processing in METASPACE.

Parameters:
  • imzml_fn (Optional[str]) – Path to the imzML file to upload

  • ibd_fn (Optional[str]) – Path to the ibd file to upload

  • name (str) – New dataset name

  • metadata (Union[str, dict]) – A JSON string or Python dict containing metadata. This must exactly follow the expected format - see the submit dataset example notebook.

  • is_public (bool) – If True, the dataset will be publicly visible. If False, it will only be visible to yourself, other members of your Group, METASPACE administrators, and members of any Projects you add it to

  • databases (List[Union[int, str, Tuple[str, str]]]) – List of databases to process with, either as IDs or (name, version) tuples, e.g. [22, (‘LipidMaps’, ‘2017-12-12’)]

  • project_ids (Optional[List[str]]) – A list of project IDs to add this dataset to.

  • adducts (Optional[List[str]]) – List of adducts. e.g. [‘-H’, ‘+Cl’] Normal adducts should be plus or minus followed by an element. For radical ions/cations, use the special strings ‘[M]+’ or ‘[M]-‘.

  • neutral_losses (Optional[List[str]]) – List of neutral losses, e.g. [‘-H2O’, ‘-CO2’]

  • chem_mods (Optional[List[str]]) –

  • ppm (Optional[float]) – m/z tolerance (in ppm) for generating ion images (default 3.0)

  • num_isotopic_peaks (Optional[int]) – Number of isotopic peaks to search for (default 4)

  • decoy_sample_size (Optional[int]) – Number of implausible adducts to use for generating the decoy search database (default 20)

  • analysis_version (Optional[int]) –

  • input_path (Optional[str]) – To clone an existing dataset, specify input_path using the value of the existing dataset’s “s3dir”. When input_path is supplied, imzml_fn and ibd_fn can be set to None.

  • description (Optional[str]) – Optional text to describe the dataset

  • perform_enrichment (Optional[bool]) – If True, enable LION enrichment for the dataset.

Return type:

str

Returns:

The newly created dataset ID
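
The sketch below shows one way to sanity-check adduct strings against the format described above before submitting. The regular expression is a loose client-side assumption and is not METASPACE’s own validation; `check_adducts` and `submit_private_dataset` are hypothetical helpers, and `sm` is assumed to be an SMInstance.

```python
import re
from typing import Any, List

# Loose pattern (an assumption): a sign followed by something that starts
# like an element symbol, e.g. '-H' or '+Cl'.
_ADDUCT_RE = re.compile(r"^[+-][A-Z][A-Za-z0-9]*$")
# Special strings for radical ions/cations, as listed in the parameter docs.
_RADICAL_ADDUCTS = {"[M]+", "[M]-"}


def check_adducts(adducts: List[str]) -> List[str]:
    """Return the adducts unchanged, or raise ValueError on an odd-looking one."""
    for adduct in adducts:
        if adduct not in _RADICAL_ADDUCTS and not _ADDUCT_RE.match(adduct):
            raise ValueError(f"Unexpected adduct format: {adduct!r}")
    return adducts


def submit_private_dataset(sm: Any, imzml_fn: str, ibd_fn: str, name: str,
                           metadata: dict, adducts: List[str]) -> str:
    """Submit a non-public dataset and return the new dataset ID."""
    return sm.submit_dataset(
        imzml_fn, ibd_fn, name, metadata,
        is_public=False,
        databases=[("HMDB", "v4")],
        adducts=check_adducts(adducts),
    )
```

Checking locally first means a malformed adduct fails before any files are uploaded.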

update_dataset_dbs(dataset_id, molDBs=None, adducts=None)[source]
reprocess_dataset(dataset_id, force=False)[source]
update_dataset(id, *, name=None, metadata=None, databases=None, adducts=None, neutral_losses=None, chem_mods=None, is_public=None, ppm=None, num_isotopic_peaks=None, decoy_sample_size=None, analysis_version=None, reprocess=None, force=False, perform_enrichment=False)[source]

Updates a dataset’s metadata and/or processing settings. Only specify the fields that should change. All arguments should be specified as keyword arguments, e.g. to update a dataset’s adducts:

>>> sm.update_dataset(
...     id='2018-11-07_14h15m28s',
...     adducts=['[M]+', '+H', '+K', '+Na'],
... )
Parameters:
  • id (str) – (Required) ID of an existing dataset

  • name (Optional[str]) – New dataset name

  • metadata (Optional[Any]) – A JSON string or Python dict containing updated metadata

  • databases – List of databases to process with, either as IDs or (name, version) tuples, e.g. [22, (‘LipidMaps’, ‘2017-12-12’)]

  • adducts (Optional[List[str]]) – List of adducts. e.g. [‘-H’, ‘+Cl’] Normal adducts should be plus or minus followed by an element. For radical ions/cations, use the special strings ‘[M]+’ or ‘[M]-‘.

  • neutral_losses (Optional[List[str]]) – List of neutral losses, e.g. [‘-H2O’, ‘-CO2’]

  • chem_mods (Optional[List[str]]) –

  • is_public (Optional[bool]) – If True, the dataset will be publicly visible. If False, it will only be visible to yourself, other members of your Group, METASPACE administrators, and members of any Projects you add it to

  • ppm (Optional[float]) – m/z tolerance (in ppm) for generating ion images (default 3.0)

  • num_isotopic_peaks (Optional[int]) – Number of isotopic peaks to search for (default 4)

  • decoy_sample_size (Optional[int]) – Number of implausible adducts to use for generating the decoy search database (default 20)

  • analysis_version (Optional[int]) –

  • reprocess (Optional[bool]) – None (default): reprocess if needed. True: force reprocessing, even if not needed. False: raise an error if the changes would require reprocessing.

  • force (bool) – True: Allow changes to datasets that are already being processed. This should be used with caution, as it can cause errors or inconsistent results.

  • perform_enrichment (Optional[bool]) – If True, enable LION enrichment for the dataset.
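
To illustrate the reprocess options, here is a minimal sketch with two hypothetical wrappers: one for a change that should never trigger reprocessing, and one that leaves the decision to the client. `sm` is assumed to be an SMInstance; the function names are illustrative only.

```python
from typing import Any, List


def rename_dataset(sm: Any, dataset_id: str, new_name: str) -> None:
    """Rename a dataset, refusing any change that would require reprocessing."""
    # reprocess=False: an error is raised instead of reprocessing.
    sm.update_dataset(id=dataset_id, name=new_name, reprocess=False)


def change_adducts(sm: Any, dataset_id: str, adducts: List[str]) -> None:
    """Change the adducts and let the client reprocess if that is needed."""
    # reprocess=None is the documented default: reprocess only if needed.
    sm.update_dataset(id=dataset_id, adducts=adducts, reprocess=None)
```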

delete_dataset(ds_id, **kwargs)[source]
database(name=None, version=None, id=None)[source]

Fetch a molecular database by id, or by name and version.

Return type:

Optional[MolecularDB]
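
Because the return type is Optional, calling code may want to handle the “not found” case explicitly. A minimal sketch, assuming `sm` is an SMInstance and that a missing database is reported as None; `require_database` is a hypothetical helper.

```python
from typing import Any


def require_database(sm: Any, name: str, version: str) -> Any:
    """Return the matching database, or raise LookupError if none is found."""
    db = sm.database(name=name, version=version)
    if db is None:
        # Turn the Optional result into an explicit error for the caller.
        raise LookupError(f"No database named {name!r} with version {version!r}")
    return db


# Example call (needs a real SMInstance, so it is left commented out):
# hmdb = require_database(sm, "HMDB", "v4")
```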

databases()[source]
Return type:

List[MolecularDB]

create_database(local_path, name, version, is_public=False)[source]
Return type:

dict

update_database(id, is_public=None, archived=None)[source]
Return type:

dict

delete_database(id)[source]
Return type:

bool

current_user_id()[source]

Note that the current user must be the submitter of the dataset being edited.

Parameters:
  • dataset_id (str) –

  • provider (str) – Must be a known 3rd party link provider name. Contact us if you’re interested in integrating with METASPACE.

  • link (str) –

  • replace_existing – pass True to overwrite existing links from the same provider

Returns:

The updated list of external links

upload_raw_opt_image_to_s3(local_path, dataset_id)[source]

Upload a raw optical image from a local file to the S3 bucket.

>>> sm.upload_raw_opt_image_to_s3(
...     local_path='/tmp/image.png',
...     dataset_id='2018-11-07_14h15m28s',
... )
Parameters:
  • local_path (Union[str, Path]) –

  • dataset_id (str) –

Return type:

str

Returns:

The S3 key of the uploaded file

upload_optical_image(local_path, dataset_id, transformation_matrix=array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))[source]

Upload optical image from local file

>>> import numpy as np
>>> sm.upload_optical_image(
...     local_path='/tmp/image.png',
...     dataset_id='2018-11-07_14h15m28s',
...     transformation_matrix=np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]),
... )
Parameters:
  • local_path (Union[str, Path]) –

  • dataset_id (str) –

  • transformation_matrix (ndarray) –

Return type:

str

Returns:

The S3 key of the uploaded file

get_optical_image_transform(dataset_id)[source]

Get optical image transform matrix

matrix3d(${t[0][0]}, ${t[1][0]}, 0, ${t[2][0]},
         ${t[0][1]}, ${t[1][1]}, 0, ${t[2][1]},
         0, 0, 1, 0,
         ${t[0][2]}, ${t[1][2]}, 0, ${t[2][2]})

>>> sm.get_optical_image_transform(
...     dataset_id='2018-11-07_14h15m28s',
... )
Parameters:

dataset_id (str) –

Returns:

Returns matrix 3d values
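
The matrix3d layout shown above can be reproduced in a few lines. This sketch assumes `t` is a 3×3 matrix given as nested sequences; whether the method returns such a matrix or the 16 values directly is not stated here, so treat the input shape as an assumption. `to_css_matrix3d` is a hypothetical helper.

```python
from typing import Sequence


def to_css_matrix3d(t: Sequence[Sequence[float]]) -> str:
    """Format a 3x3 matrix ``t`` using the matrix3d layout shown above."""
    values = [
        t[0][0], t[1][0], 0, t[2][0],
        t[0][1], t[1][1], 0, t[2][1],
        0, 0, 1, 0,
        t[0][2], t[1][2], 0, t[2][2],
    ]
    return "matrix3d(" + ", ".join(str(v) for v in values) + ")"


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(to_css_matrix3d(identity))
# matrix3d(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1)
```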

get_optical_image_path(dataset_id)[source]

Get optical image file path

>>> sm.get_optical_image_path(
...     dataset_id='2018-11-07_14h15m28s',
... )
Parameters:

dataset_id (str) –

Returns:

Returns src path

copy_optical_image(origin_dataset_id, destiny_dataset_id)[source]

Copies an optical image from one dataset to another.

>>> sm.copy_optical_image(
...     origin_dataset_id='2018-11-07_14h15m28s',
...     destiny_dataset_id='2018-11-07_14h15m30s',
... )
Parameters:
  • origin_dataset_id (str) – The dataset ID of the origin dataset with the optical image to be copied

  • destiny_dataset_id (str) – The ID of the dataset to which the optical image will be copied

Returns:

The updated list of external links

Note that the current user must be the submitter of the dataset being edited.

Parameters:
  • dataset_id (str) –

  • provider (str) –

  • link (Optional[str]) – If None, all links from the provider will be removed

Returns:

The updated list of external links

class metaspace.sm_annotation_utils.SMDataset(_info, gqclient)[source]
property id
property name
property s3dir

The location of the uploaded imzML file. Not publicly accessible, but this can be used in the input_path parameter to SMInstance.submit_dataset to clone a dataset.

annotations(fdr=0.1, database=('HMDB', 'v4'), return_vals=('sumFormula', 'adduct'), **annotation_filter)[source]

Fetch dataset annotations.

Parameters:
  • fdr (float) – Max FDR level.

  • database (Union[int, str, Tuple[str, str]]) – Database name or id.

  • return_vals (Iterable) – Tuple of fields to return.

Return type:

List[list]

Returns:

List of annotations with requested fields.
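
A short sketch of working with the returned list, here counting annotations per adduct. It assumes `ds` is an SMDataset and that each row holds the requested fields in the order given in return_vals; that ordering is an assumption. `adduct_counts` is a hypothetical helper.

```python
from collections import Counter
from typing import Any


def adduct_counts(ds: Any, fdr: float = 0.1) -> Counter:
    """Count annotations per adduct at the given FDR threshold."""
    rows = ds.annotations(fdr=fdr, return_vals=("sumFormula", "adduct"))
    # Assumed row layout: [sumFormula, adduct], matching return_vals.
    return Counter(adduct for _formula, adduct in rows)
```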

results(database=('HMDB', 'v4'), fdr=None, coloc_with=None, include_chem_mods=False, include_neutral_losses=False, **annotation_filter)[source]

Fetch all dataset annotations as dataframe.

Parameters:
  • database (Union[int, str, Tuple[str, str]]) – Molecular database name or id.

  • fdr (Optional[float]) – Max FDR level.

  • coloc_with (Optional[str]) – Fetch only results colocalized with formula.

  • include_chem_mods (bool) – Include results with chemical modifications.

  • include_neutral_losses (bool) – Include results with neutral losses.

Return type:

DataFrame

Returns:

List of annotations with requested fields.

property metadata
Return type:

Metadata

property config
Return type:

DSConfig

property adducts
Return type:

List[str]

property polarity
Return type:

Literal[‘Positive’, ‘Negative’]

property database_details

A list of all databases that have been used to annotate this dataset

Return type:

List[MolecularDB]

property status

‘QUEUED’, ‘ANNOTATING’, ‘FINISHED’, or ‘FAILED’
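
Since a dataset moves through these states, a caller may want to wait for a terminal one. The sketch below polls a caller-supplied function until the status is ‘FINISHED’ or ‘FAILED’. It assumes that re-fetching the dataset (for example with sm.dataset(id=...)) yields an up-to-date status; `wait_for_dataset` and its parameters are hypothetical.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"FINISHED", "FAILED"}


def wait_for_dataset(fetch_status: Callable[[], str],
                     poll_seconds: float = 30.0,
                     max_polls: int = 120,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the status is FINISHED or FAILED, then return it."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)
    raise TimeoutError("Dataset did not reach a terminal status in time")


# Example call (needs a real SMInstance, so it is left commented out):
# final = wait_for_dataset(lambda: sm.dataset(id="2018-11-07_14h15m28s").status)
```

Passing the sleep function as a parameter keeps the helper easy to test without real delays.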

property submitter

Details about the submitter of the dataset

Return type:

DatasetUser

property group

The group (lab/institute/team/etc.) that this dataset belongs to

Return type:

Optional[DatasetGroup]

property projects

The list of projects that include this dataset

Return type:

List[DatasetProject]

property principal_investigator

This field is usually only used for attributing the submitter’s PI when the submitter is not associated with any group

Return type:

Optional[str]

property image_size

Image size in pixels along the X and Y axes if this data exists, otherwise an empty dict

isotope_images(sf, adduct, only_first_isotope=False, scale_intensity=True, hotspot_clipping=False, neutral_loss='', chem_mod='', image_metadata=[])[source]

Retrieve ion images for a specific sf and adduct.

Parameters:
  • sf (str) –

  • adduct (str) –

  • only_first_isotope (bool) – Only retrieve the first (most abundant) isotopic ion image. Typically this is all you need for data analysis, as the less abundant isotopes are usually lower quality copies of the first isotopic ion image.

  • scale_intensity (bool) – When True, the output values will be scaled to the intensity range of the original data. When False, the output values will be in the 0.0 to 1.0 range. When ‘TIC’, the output values will be scaled by the TIC and will be in the 0.0 to 1.0 range.

  • hotspot_clipping (bool) – When True, apply hotspot clipping. Recommended if the images will be used for visualisation. This is required to get ion images that match the METASPACE website

  • neutral_loss (str) –

  • chem_mod (str) –

  • image_metadata (list) –

Return type:

IsotopeImages

all_annotation_images(fdr=0.1, database=('HMDB', 'v4'), only_first_isotope=False, scale_intensity=True, hotspot_clipping=False, **annotation_filter)[source]

Retrieve all ion images for the dataset and given annotation filters.

Parameters:
  • fdr (float) – Maximum FDR level of annotations.

  • database (Union[int, str, Tuple[str, str]]) – Molecular database name or id.

  • only_first_isotope (bool) – Only retrieve the first (most abundant) isotopic ion image for each annotation. Typically this is all you need for data analysis, as the less abundant isotopes are usually lower quality copies of the first isotopic ion image.

  • scale_intensity (Union[bool, str, ndarray]) – When True, the output values will be scaled to the intensity range of the original data. When False, the output values will be in the 0.0 to 1.0 range. When ‘TIC’, the output values will be scaled by the TIC and will be in the 0.0 to 1.0 range.

  • hotspot_clipping (bool) – When True, apply hotspot clipping. Recommended if the images will be used for visualisation. This is required to get ion images that match the METASPACE website

  • annotation_filter – Additional filters passed to SMDataset.annotations.

Return type:

List[IsotopeImages]

Returns:

list of isotope images

optical_images()[source]

Returns a data structure containing links to download the dataset’s input files

Return type:

Optional[DatasetDownload]

download_to_dir(path, base_name=None)[source]

Downloads the dataset’s input files to the specified directory.

Parameters:
  • path – Destination directory

  • base_name – If specified, overrides the base name (excluding extension) of each file. e.g. base_name=’foo’ will name the files as ‘foo.imzML’ and ‘foo.ibd’
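
To make the base_name behaviour concrete, this sketch computes the file paths implied by the description above. It only mirrors the documented naming (‘foo.imzML’ and ‘foo.ibd’) and does not download anything; `expected_download_paths` is a hypothetical helper.

```python
from pathlib import Path
from typing import Tuple


def expected_download_paths(path: str, base_name: str) -> Tuple[Path, Path]:
    """Return the (imzML, ibd) paths implied by ``path`` and ``base_name``."""
    directory = Path(path)
    return directory / f"{base_name}.imzML", directory / f"{base_name}.ibd"


imzml_path, ibd_path = expected_download_paths("/tmp/out", "foo")
print(imzml_path.name, ibd_path.name)
# foo.imzML foo.ibd
```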

diagnostics(include_images=True)[source]

Retrieves all diagnostic information and additional metadata for the dataset.

Parameters:

include_images – (default True) whether to download and include images in the results

Return type:

List[DatasetDiagnostic]

diagnostic(type, database=None, include_images=True)[source]

Retrieves a specific item from the dataset’s diagnostic information / additional metadata, or raises an exception if it wasn’t found.

Parameters:
  • type (str) – The type of diagnostic/metadata. Valid values:

    type=’TIC’: data contains information about the Total Ion Current across the dataset; images contains an image with the TIC for each spectrum.

    type=’IMZML_METADATA’: data contains a summary of metadata from the ImzML file header; images contains a boolean image of which pixels had spectra in the input data. Useful for non-square acquisition areas.

  • database – The ID or (name, version) of the database. Needed for database-specific metadata types (currently not used)

  • include_images – (default True) whether to download and include images in the results

Return type:

DatasetDiagnostic

tic_image()[source]

Returns a numpy array with the TIC value for each spectrum

Return type:

ndarray

class metaspace.sm_annotation_utils.MolecularDB(info)[source]
property id
Return type:

int

property name
Return type:

str

property version
Return type:

str

property is_public
Return type:

bool

property archived
Return type:

bool

class metaspace.sm_annotation_utils.IsotopeImages(images, sf, chem_mod, neutral_loss, adduct, centroids, urls)[source]
peak(index)[source]
plot(n_images=-1)[source]
class metaspace.sm_annotation_utils.OpticalImage(image, registered_image)[source]
to_ion_image(index, ion_image_shape)[source]
ion_image_to_optical(ion_image, index=0)[source]
class metaspace.sm_annotation_utils.GraphQLClient(config)[source]

Client for low-level access to the METASPACE API, for advanced operations that aren’t supported by metaspace.sm_annotation_utils.SMInstance.

Use query for calling GraphQL directly. An editor for composing GraphQL API queries can be found at https://metaspace2020.eu/graphql

query(query, variables={})[source]
get_jwt()[source]
get_primary_group_id()[source]
iterQuery(query, variables={}, batch_size=50000)[source]

Assumes query has $offset and $limit parameters, and yields query results with these set to (k*batch_size, batch_size)

listQuery(field_name, query, variables={}, batch_size=50000, limit=None)[source]

Gets all results of an iterQuery as a list. Field name must be provided in addition to the query (e.g. ‘allDatasets’)
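
The offset/limit batching described for iterQuery and listQuery can be sketched independently of GraphQL. Here `run_query` is a hypothetical callable standing in for a single paged request, and stopping on a short or empty page is an assumption of this sketch, not a statement about the client’s internals.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def iter_pages(run_query: Callable[[Dict[str, Any]], List[Any]],
               variables: Optional[Dict[str, Any]] = None,
               batch_size: int = 50000) -> Iterator[List[Any]]:
    """Yield pages, calling run_query with offset = k * batch_size."""
    k = 0
    while True:
        page_vars = dict(variables or {}, offset=k * batch_size, limit=batch_size)
        page = run_query(page_vars)
        if page:
            yield page
        # Assumed stop rule: a short or empty page means nothing is left.
        if len(page) < batch_size:
            return
        k += 1


def list_all(run_query: Callable[[Dict[str, Any]], List[Any]],
             variables: Optional[Dict[str, Any]] = None,
             batch_size: int = 50000,
             limit: Optional[int] = None) -> List[Any]:
    """Collect every page into one list, optionally truncated to ``limit``."""
    results: List[Any] = []
    for page in iter_pages(run_query, variables, batch_size):
        results.extend(page)
        if limit is not None and len(results) >= limit:
            return results[:limit]
    return results
```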

MOLECULAR_DB_FIELDS = 'id name version isPublic archived default'
getDataset(datasetId)[source]
getDatasetByName(datasetName)[source]
getAnnotations(annotationFilter=None, datasetFilter=None, colocFilter=None, limit=None)[source]
countAnnotations(annotationFilter=None, datasetFilter=None)[source]
getDatasets(datasetFilter=None)[source]
getRawOpticalImage(dsid)[source]
getRegisteredImage(dsid, zoom_level=8)[source]
get_visible_databases()[source]
static map_database_name_to_name_version(name)[source]
Return type:

Tuple[str, str]

map_database_to_id(database)[source]
create_dataset(input_params, perform_enrichment=False, ds_id=None)[source]
delete_dataset(ds_id, force=False)[source]
update_dataset(ds_id, input={}, reprocess=False, force=False, perform_enrichment=False, priority=1)[source]
create_database(local_path, name, version, is_public=False)[source]
Return type:

dict

update_database(id, is_public=None, archived=None)[source]
Return type:

dict

delete_database(id)[source]
Return type:

bool

get_dataset_diagnostics(ds_id)[source]
exception metaspace.sm_annotation_utils.MetaspaceException[source]
exception metaspace.sm_annotation_utils.DatasetNotFound[source]
exception metaspace.sm_annotation_utils.GraphQLException(json, message, type=None)[source]
exception metaspace.sm_annotation_utils.BadRequestException(json, message, type=None)[source]
exception metaspace.sm_annotation_utils.InvalidResponseException(json, http_response)[source]