metaspace.sm_annotation_utils¶
-
class
metaspace.sm_annotation_utils.
SMInstance
(host=None, verify_certificate=True, email=None, password=None, api_key=None, config_path=None)[source]¶ Client class for communication with the Metaspace API.
-
save_login
(overwrite=False)[source]¶ Saves login credentials to the config file so that they will be automatically loaded in future uses of SMInstance()
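A minimal sketch of building a client from the constructor arguments listed above. The environment variable names METASPACE_HOST and METASPACE_API_KEY and both helper names are assumptions made for this example and are not part of the library. The import is done lazily so the first helper can be used without the package installed.

```python
import os
from typing import Dict, Mapping, Optional


def sm_kwargs_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect SMInstance keyword arguments from environment variables.

    METASPACE_HOST and METASPACE_API_KEY are hypothetical names chosen for
    this sketch. Unset or empty variables are omitted from the result.
    """
    env = os.environ if env is None else env
    kwargs: Dict[str, str] = {}
    if env.get("METASPACE_HOST"):
        kwargs["host"] = env["METASPACE_HOST"]
    if env.get("METASPACE_API_KEY"):
        kwargs["api_key"] = env["METASPACE_API_KEY"]
    return kwargs


def connect(save: bool = False):
    """Create a client; needs the metaspace package and network access."""
    from metaspace.sm_annotation_utils import SMInstance  # lazy import

    sm = SMInstance(**sm_kwargs_from_env())
    if save:
        # Write the credentials to the config file for later SMInstance() calls.
        sm.save_login(overwrite=True)
    return sm
```

Calling connect(save=True) once should make a later bare SMInstance() pick up the stored credentials, as described for save_login.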
-
property
projects
¶ Sub-object containing methods for interacting with projects.
- Return type:
-
dataset
(name=None, id=None)[source]¶ Retrieve a dataset by id (preferred) or name.
You can get a dataset’s ID by viewing its annotations online and looking at the URL, e.g. in this URL:
metaspace2020.eu/annotations?ds=2016-09-22_11h16m17s
the dataset ID is 2016-09-22_11h16m17s
- Return type:
SMDataset
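The URL rule described for dataset() can be sketched with the standard library. The helper name dataset_id_from_url is hypothetical; it only reads the ds query parameter and does not contact METASPACE.

```python
from urllib.parse import parse_qs, urlparse


def dataset_id_from_url(url: str) -> str:
    """Return the value of the `ds` query parameter of an annotations URL."""
    # The query string is split off whether or not the URL carries a scheme.
    values = parse_qs(urlparse(url).query).get("ds")
    if not values:
        raise ValueError(f"no 'ds' parameter found in {url!r}")
    return values[0]
```

The result can then be passed on as sm.dataset(id=dataset_id_from_url(url)).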
-
datasets
(nameMask=None, idMask=None, *, submitter_id=None, group_id=None, project_id=None, polarity=None, ionisation_source=None, analyzer_type=None, maldi_matrix=None, organism=None, **kwargs)[source]¶ Search for datasets that match the given criteria. If no criteria are given, it will return all accessible datasets on METASPACE.
- Parameters:
nameMask (Optional[str]) – Search string to be applied to the dataset name
idMask (Union[str, List[str], None]) – Dataset ID or list of IDs
submitter_id (Optional[str]) – User ID of the submitter
group_id (Optional[str]) –
project_id (Optional[str]) –
polarity (Optional[Literal['Positive', 'Negative']]) – 'Positive' or 'Negative'
ionisation_source (Optional[str]) –
analyzer_type (Optional[str]) –
maldi_matrix (Optional[str]) –
organism (Optional[str]) –
- Return type:
List[SMDataset]
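As an illustration of combining the search criteria, the sketch below filters by name and polarity and optionally by project. The helper name positive_mode_datasets is hypothetical; sm is assumed to be an SMInstance, and only keyword arguments listed in the signature above are used.

```python
from typing import Any, Dict, List, Optional


def positive_mode_datasets(sm: Any, name_fragment: str,
                           project_id: Optional[str] = None) -> List[Any]:
    """Search for positive-polarity datasets whose name matches name_fragment."""
    filters: Dict[str, str] = {"nameMask": name_fragment, "polarity": "Positive"}
    if project_id is not None:
        # Narrow the search to a single project only when one is given.
        filters["project_id"] = project_id
    return sm.datasets(**filters)
```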
-
submit_dataset
(imzml_fn, ibd_fn, name, metadata, is_public, databases=[('HMDB', 'v4')], *, project_ids=None, adducts=None, neutral_losses=None, chem_mods=None, ppm=None, num_isotopic_peaks=None, decoy_sample_size=None, analysis_version=None, input_path=None, description=None, perform_enrichment=False)[source]¶ Submit a dataset for processing in METASPACE.
- Parameters:
imzml_fn (Optional[str]) – Path to the imzML file to upload
ibd_fn (Optional[str]) – Path to the ibd file to upload
name (str) – New dataset name
metadata (Union[str, dict]) – A JSON string or Python dict containing metadata. This must exactly follow the expected format - see the submit dataset example notebook.
is_public (bool) – If True, the dataset will be publicly visible. If False, it will only be visible to yourself, other members of your Group, METASPACE administrators, and members of any Projects you add it to
databases (List[Union[int, str, Tuple[str, str]]]) – List of databases to process with, either as IDs or (name, version) tuples, e.g. [22, ('LipidMaps', '2017-12-12')]
project_ids (Optional[List[str]]) – A list of project IDs to add this dataset to.
adducts (Optional[List[str]]) – List of adducts, e.g. ['-H', '+Cl']. Normal adducts should be plus or minus followed by an element. For radical ions/cations, use the special strings '[M]+' or '[M]-'.
neutral_losses (Optional[List[str]]) – List of neutral losses, e.g. ['-H2O', '-CO2']
chem_mods (Optional[List[str]]) –
ppm (Optional[float]) – m/z tolerance (in ppm) for generating ion images (default 3.0)
num_isotopic_peaks (Optional[int]) – Number of isotopic peaks to search for (default 4)
decoy_sample_size (Optional[int]) – Number of implausible adducts to use for generating the decoy search database (default 20)
analysis_version (Optional[int]) –
input_path (Optional[str]) – To clone an existing dataset, specify input_path using the value of the existing dataset’s “s3dir”. When input_path is supplied, imzml_fn and ibd_fn can be set to None.
description (Optional[str]) – Optional text to describe the dataset
perform_enrichment (Optional[bool]) – If True, enable LION enrichment for the dataset.
- Return type:
str
- Returns:
The newly created dataset ID
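The cloning route described for input_path can be sketched as follows. The helper name clone_dataset is hypothetical, sm is assumed to be an SMInstance, and the caller still has to supply metadata in the expected format, since that format is not reproduced here.

```python
from typing import Any, Dict, Union


def clone_dataset(sm: Any, source_id: str, new_name: str,
                  metadata: Union[str, Dict], is_public: bool = False) -> str:
    """Submit a copy of an existing dataset and return the new dataset ID."""
    source = sm.dataset(id=source_id)
    # With input_path set to the source's s3dir, no local files are needed,
    # so imzml_fn and ibd_fn are passed as None.
    return sm.submit_dataset(
        None, None, new_name, metadata, is_public,
        input_path=source.s3dir,
    )
```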
-
update_dataset
(id, *, name=None, metadata=None, databases=None, adducts=None, neutral_losses=None, chem_mods=None, is_public=None, ppm=None, num_isotopic_peaks=None, decoy_sample_size=None, analysis_version=None, reprocess=None, force=False, perform_enrichment=False)[source]¶ Updates a dataset’s metadata and/or processing settings. Only specify the fields that should change. All arguments should be specified as keyword arguments, e.g. to update a dataset’s adducts:
>>> sm.update_dataset(
...     id='2018-11-07_14h15m28s',
...     adducts=['[M]+', '+H', '+K', '+Na'],
... )
- Parameters:
id (str) – (Required) ID of an existing dataset
name (Optional[str]) – New dataset name
metadata (Optional[Any]) – A JSON string or Python dict containing updated metadata
databases – List of databases to process with, either as IDs or (name, version) tuples, e.g. [22, ('LipidMaps', '2017-12-12')]
adducts (Optional[List[str]]) – List of adducts, e.g. ['-H', '+Cl']. Normal adducts should be plus or minus followed by an element. For radical ions/cations, use the special strings '[M]+' or '[M]-'.
neutral_losses (Optional[List[str]]) – List of neutral losses, e.g. ['-H2O', '-CO2']
chem_mods (Optional[List[str]]) –
is_public (Optional[bool]) – If True, the dataset will be publicly visible. If False, it will only be visible to yourself, other members of your Group, METASPACE administrators, and members of any Projects you add it to
ppm (Optional[float]) – m/z tolerance (in ppm) for generating ion images (default 3.0)
num_isotopic_peaks (Optional[int]) – Number of isotopic peaks to search for (default 4)
decoy_sample_size (Optional[int]) – Number of implausible adducts to use for generating the decoy search database (default 20)
analysis_version (Optional[int]) –
reprocess (Optional[bool]) – None (default): reprocess if needed. True: force reprocessing, even if not needed. False: raise an error if the changes would require reprocessing.
force (bool) – True: allow changes to datasets that are already being processed. This should be used with caution, as it can cause errors or inconsistent results.
perform_enrichment (Optional[bool]) – If True, enable LION enrichment for the dataset.
-
database
(name=None, version=None, id=None)[source]¶ Fetch a molecular database by id, or by name and version.
- Return type:
Optional[MolecularDB]
-
databases
()[source]¶
- Return type:
List[MolecularDB]
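A small sketch of picking databases out of the list returned by databases(), using only the MolecularDB properties documented on this page (name, version, archived). The helper name find_databases is hypothetical and sm is assumed to be an SMInstance.

```python
from typing import Any, List, Optional


def find_databases(sm: Any, name: str, version: Optional[str] = None,
                   include_archived: bool = False) -> List[Any]:
    """Return the databases matching name (and version, when given)."""
    return [
        db for db in sm.databases()
        if db.name == name
        and (version is None or db.version == version)
        # Archived databases are skipped unless explicitly requested.
        and (include_archived or not db.archived)
    ]
```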
-
add_dataset_external_link
(dataset_id, provider, link, replace_existing=False)[source]¶ Note that the current user must be the submitter of the dataset being edited.
- Parameters:
dataset_id (str) –
provider (str) – Must be a known 3rd party link provider name. Contact us if you’re interested in integrating with METASPACE.
link (str) –
replace_existing – Pass True to overwrite existing links from the same provider
- Returns:
The updated list of external links
-
upload_raw_opt_image_to_s3
(local_path, dataset_id)[source]¶ Upload a raw optical image from a local file to the S3 bucket
>>> sm.upload_raw_opt_image_to_s3(
...     local_path='/tmp/image.png',
...     dataset_id='2018-11-07_14h15m28s',
... )
- Parameters:
local_path (Union[str, Path]) –
dataset_id (str) –
- Return type:
str
- Returns:
Returns file s3 key
-
upload_optical_image
(local_path, dataset_id, transformation_matrix=array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))[source]¶ Upload optical image from local file
>>> sm.upload_optical_image(
...     local_path='/tmp/image.png',
...     dataset_id='2018-11-07_14h15m28s',
...     transformation_matrix=np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]]),
... )
- Parameters:
local_path (Union[str, Path]) –
dataset_id (str) –
transformation_matrix (ndarray) –
- Return type:
str
- Returns:
Returns file s3 key
-
get_optical_image_transform
(dataset_id)[source]¶ Get optical image transform matrix
matrix3d(${t[0][0]}, ${t[1][0]}, 0, ${t[2][0]},
         ${t[0][1]}, ${t[1][1]}, 0, ${t[2][1]},
         0, 0, 1, 0,
         ${t[0][2]}, ${t[1][2]}, 0, ${t[2][2]})
>>> sm.get_optical_image_transform(
...     dataset_id='2018-11-07_14h15m28s',
... )
- Parameters:
dataset_id (str) –
- Returns:
The matrix3d values
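The matrix3d template shown for get_optical_image_transform can be read as a mapping from a 3x3 matrix t to 16 values. The sketch below reproduces that layout term by term. It assumes t is a 3x3 nested sequence (such as the transformation_matrix passed to upload_optical_image); the function names are hypothetical and this is not the library's own code.

```python
from typing import List, Sequence


def matrix3d_values(t: Sequence[Sequence[float]]) -> List[float]:
    """Expand a 3x3 matrix into the 16 values of the matrix3d template."""
    return [
        t[0][0], t[1][0], 0, t[2][0],
        t[0][1], t[1][1], 0, t[2][1],
        0, 0, 1, 0,
        t[0][2], t[1][2], 0, t[2][2],
    ]


def matrix3d_string(t: Sequence[Sequence[float]]) -> str:
    """Format the 16 values as a matrix3d(...) string."""
    return "matrix3d(" + ", ".join(str(v) for v in matrix3d_values(t)) + ")"
```

Each group of four values takes one column of t and pads it with a fixed third entry, so the identity matrix maps to the 4x4 identity.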
-
get_optical_image_path
(dataset_id)[source]¶ Get optical image file path
>>> sm.get_optical_image_path(
...     dataset_id='2018-11-07_14h15m28s',
... )
- Parameters:
dataset_id (str) –
- Returns:
The src path
-
copy_optical_image
(origin_dataset_id, destiny_dataset_id)[source]¶ Copies an optical image from one dataset to another
>>> sm.copy_optical_image(
...     origin_dataset_id='2018-11-07_14h15m28s',
...     destiny_dataset_id='2018-11-07_14h15m30s',
... )
- Parameters:
origin_dataset_id (str) – ID of the dataset whose optical image will be copied
destiny_dataset_id (str) – ID of the dataset to which the optical image will be copied
- Returns:
The updated list of external links
-
remove_dataset_external_link
(dataset_id, provider, link=None)[source]¶ Note that the current user must be the submitter of the dataset being edited.
- Parameters:
dataset_id (str) –
provider (str) –
link (Optional[str]) – If None, all links from the provider will be removed
- Returns:
The updated list of external links
-
-
class
metaspace.sm_annotation_utils.
SMDataset
(_info, gqclient)[source]¶
-
property
id
¶
-
property
name
¶
-
property
s3dir
¶ The location of the uploaded imzML file. Not publicly accessible, but this can be used in the input_path parameter to SMInstance.submit_dataset to clone a dataset.
-
annotations
(fdr=0.1, database=('HMDB', 'v4'), return_vals=('sumFormula', 'adduct'), **annotation_filter)[source]¶ Fetch dataset annotations.
- Parameters:
fdr (float) – Max FDR level.
database (Union[int, str, Tuple[str, str]]) – Database name or id.
return_vals (Iterable) – Tuple of fields to return.
- Return type:
List[list]
- Returns:
List of annotations with requested fields.
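Since annotations() returns a list of lists, it can be convenient to pair each row with the requested field names. The sketch below assumes that the values in each row come back in the same order as return_vals, which is not stated on this page. The helper name annotations_as_dicts is hypothetical and ds is assumed to be an SMDataset.

```python
from typing import Any, Dict, List, Sequence, Tuple, Union


def annotations_as_dicts(
    ds: Any,
    fdr: float = 0.1,
    database: Union[int, str, Tuple[str, str]] = ("HMDB", "v4"),
    fields: Sequence[str] = ("sumFormula", "adduct"),
) -> List[Dict[str, Any]]:
    """Fetch annotations and label each value with its field name."""
    rows = ds.annotations(fdr=fdr, database=database, return_vals=fields)
    # Assumption: row values are ordered like `fields`.
    return [dict(zip(fields, row)) for row in rows]
```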
-
results
(database=('HMDB', 'v4'), fdr=None, coloc_with=None, include_chem_mods=False, include_neutral_losses=False, **annotation_filter)[source]¶ Fetch all dataset annotations as dataframe.
- Parameters:
database (Union[int, str, Tuple[str, str]]) – Molecular database name or id.
fdr (Optional[float]) – Max FDR level.
coloc_with (Optional[str]) – Fetch only results colocalized with formula.
include_chem_mods (bool) – Include results with chemical modifications.
include_neutral_losses (bool) – Include results with neutral losses.
- Return type:
DataFrame
- Returns:
DataFrame of annotations.
-
property
adducts
¶
- Return type:
List[str]
-
property
polarity
¶
- Return type:
Literal['Positive', 'Negative']
-
property
database_details
¶ A list of all databases that have been used to annotate this dataset
- Return type:
List[MolecularDB]
-
property
status
¶ ‘QUEUED’, ‘ANNOTATING’, ‘FINISHED’, or ‘FAILED’
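The four status values suggest a simple polling loop after submitting a dataset. The sketch below re-fetches the dataset on every check, on the assumption that an existing SMDataset object does not refresh its own status. The helper name, the polling interval and the timeout are all hypothetical choices, and sm is assumed to be an SMInstance.

```python
import time
from typing import Any, Callable

TERMINAL_STATUSES = ("FINISHED", "FAILED")


def wait_for_dataset(sm: Any, dataset_id: str, poll_seconds: float = 30.0,
                     timeout: float = 3600.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the dataset is FINISHED or FAILED and return that status."""
    deadline = time.monotonic() + timeout
    while True:
        # Fetch a fresh copy of the dataset so the status is up to date.
        status = sm.dataset(id=dataset_id).status
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"dataset {dataset_id} still {status} after {timeout}s")
        sleep(poll_seconds)
```

The sleep argument is injectable so the loop can be exercised in tests without actually waiting.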
-
property
submitter
¶ Details about the submitter of the dataset
- Return type:
-
property
group
¶ The group (lab/institute/team/etc.) that this dataset belongs to
- Return type:
Optional[DatasetGroup]
-
property
projects
¶ The list of projects that include this dataset
- Return type:
List[DatasetProject]
-
property
principal_investigator
¶ This field is usually only used for attributing the submitter’s PI when the submitter is not associated with any group
- Return type:
Optional[str]
-
property
image_size
¶ Image size in pixels along the X and Y axes if this data exists, otherwise an empty dict
-
isotope_images
(sf, adduct, only_first_isotope=False, scale_intensity=True, hotspot_clipping=False, neutral_loss='', chem_mod='', image_metadata=[])[source]¶ Retrieve ion images for a specific sum formula (sf) and adduct.
- Parameters:
sf (str) –
adduct (str) –
only_first_isotope (bool) – Only retrieve the first (most abundant) isotopic ion image. Typically this is all you need for data analysis, as the less abundant isotopes are usually lower quality copies of the first isotopic ion image.
scale_intensity (bool) – When True, the output values will be scaled to the intensity range of the original data. When False, the output values will be in the 0.0 to 1.0 range. When ‘TIC’, the output values will be scaled by the TIC and will be in the 0.0 to 1.0 range.
hotspot_clipping (bool) – When True, apply hotspot clipping. Recommended if the images will be used for visualisation. This is required to get ion images that match the METASPACE website
neutral_loss (str) –
chem_mod (str) –
image_metadata (list) –
- Return type:
IsotopeImages
-
all_annotation_images
(fdr=0.1, database=('HMDB', 'v4'), only_first_isotope=False, scale_intensity=True, hotspot_clipping=False, **annotation_filter)[source]¶ Retrieve all ion images for the dataset and given annotation filters.
- Parameters:
fdr (float) – Maximum FDR level of annotations.
database (Union[int, str, Tuple[str, str]]) – Molecular database name or id.
only_first_isotope (bool) – Only retrieve the first (most abundant) isotopic ion image for each annotation. Typically this is all you need for data analysis, as the less abundant isotopes are usually lower quality copies of the first isotopic ion image.
scale_intensity (Union[bool, str, ndarray]) – When True, the output values will be scaled to the intensity range of the original data. When False, the output values will be in the 0.0 to 1.0 range. When 'TIC', the output values will be scaled by the TIC and will be in the 0.0 to 1.0 range.
hotspot_clipping (bool) – When True, apply hotspot clipping. Recommended if the images will be used for visualisation. This is required to get ion images that match the METASPACE website
annotation_filter – Additional filters passed to SMDataset.annotations.
- Return type:
List[IsotopeImages]
- Returns:
List of isotope images
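Following the parameter notes above, a call aimed at visualisation might combine first-isotope-only retrieval, 0.0 to 1.0 scaling and hotspot clipping. The helper name images_for_display is hypothetical and ds is assumed to be an SMDataset; this is a sketch of one reasonable combination, not a recommended default.

```python
from typing import Any, List, Tuple, Union


def images_for_display(
    ds: Any,
    fdr: float = 0.1,
    database: Union[int, str, Tuple[str, str]] = ("HMDB", "v4"),
) -> List[Any]:
    """Fetch one display-ready ion image set per annotation."""
    return ds.all_annotation_images(
        fdr=fdr,
        database=database,
        only_first_isotope=True,   # the most abundant isotope is usually enough
        scale_intensity=False,     # keep values in the 0.0 to 1.0 range
        hotspot_clipping=True,     # needed to match the METASPACE website
    )
```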
-
download_links
()[source]¶ Returns a data structure containing links to download the dataset’s input files
- Return type:
Optional[DatasetDownload]
-
download_to_dir
(path, base_name=None)[source]¶ Downloads the dataset’s input files to the specified directory.
- Parameters:
path – Destination directory
base_name – If specified, overrides the base name (excluding extension) of each file. e.g. base_name=’foo’ will name the files as ‘foo.imzML’ and ‘foo.ibd’
- Returns:
-
diagnostics
(include_images=True)[source]¶ Retrieves all diagnostic information and additional metadata for the dataset.
- Parameters:
include_images – (default True) whether to download and include images in the results
- Return type:
List[DatasetDiagnostic]
-
diagnostic
(type, database=None, include_images=True)[source]¶ Retrieves a specific item from the dataset’s diagnostic information / additional metadata, or raises an exception if it wasn’t found.
- Parameters:
type (str) – The type of diagnostic/metadata. Valid values:
type='TIC': data contains information about the Total Ion Current across the dataset; images contains an image with the TIC for each spectrum.
type='IMZML_METADATA': data contains a summary of metadata from the ImzML file header; images contains a boolean image of which pixels had spectra in the input data. Useful for non-square acquisition areas.
database – The ID or (name, version) of the database. Needed for database-specific metadata types (currently not used)
include_images – (default True) whether to download and include images in the results
- Return type:
DatasetDiagnostic
-
class
metaspace.sm_annotation_utils.
MolecularDB
(info)[source]¶
-
property
id
¶
- Return type:
int
-
property
name
¶
- Return type:
str
-
property
version
¶
- Return type:
str
-
property
is_public
¶
- Return type:
bool
-
property
archived
¶
- Return type:
bool
-
class
metaspace.sm_annotation_utils.
IsotopeImages
(images, sf, chem_mod, neutral_loss, adduct, centroids, urls)[source]¶
-
class
metaspace.sm_annotation_utils.
GraphQLClient
(config)[source]¶ Client for low-level access to the METASPACE API, for advanced operations that aren’t supported by metaspace.sm_annotation_utils.SMInstance.
Use query for calling GraphQL directly. An editor for composing GraphQL API queries can be found at https://metaspace2020.eu/graphql
-
iterQuery
(query, variables={}, batch_size=50000)[source]¶ Assumes query has $offset and $limit parameters, and yields query results with these set to (k*batch_size, batch_size)
-
listQuery
(field_name, query, variables={}, batch_size=50000, limit=None)[source]¶ Gets all results of an iterQuery as a list. Field name must be provided in addition to the query (e.g. ‘allDatasets’)
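The batching scheme described for iterQuery and listQuery, where $offset and $limit are set to (k*batch_size, batch_size), can be illustrated without any network access. The sketch below is not the library's implementation. The function names, the fetch callback and the stopping rule (stop on an empty or short batch) are assumptions made for the example.

```python
from typing import Any, Callable, Iterator, List, Optional

Fetch = Callable[[int, int], List[Any]]  # fetch(offset, limit) -> results


def iter_batches(fetch: Fetch, batch_size: int = 50000) -> Iterator[List[Any]]:
    """Yield result batches, calling fetch(k * batch_size, batch_size)."""
    k = 0
    while True:
        batch = fetch(k * batch_size, batch_size)
        if not batch:
            return
        yield batch
        if len(batch) < batch_size:
            return  # a short batch means there is nothing left to fetch
        k += 1


def list_all(fetch: Fetch, batch_size: int = 50000,
             limit: Optional[int] = None) -> List[Any]:
    """Collect every batch into one list, optionally truncated to limit."""
    results: List[Any] = []
    for batch in iter_batches(fetch, batch_size):
        results.extend(batch)
        if limit is not None and len(results) >= limit:
            return results[:limit]
    return results
```

In a real call, fetch would wrap a GraphQL query that declares $offset and $limit variables and return the list stored under the relevant field name.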
-
MOLECULAR_DB_FIELDS
= 'id name version isPublic archived default'¶
-