ImzMLParser

pyimzml.ImzMLParser module

class pyimzml.ImzMLParser.ImzMLParser(filename, parse_lib=None, ibd_file=<object object>, include_spectra_metadata=None)

Parser for imzML 1.1.0 files (see specification here: https://ms-imaging.org/wp-content/uploads/2009/08/specifications_imzML1.1.0_RC1.pdf ).

Iteratively reads the .imzML file into memory while pruning the per-spectrum metadata (everything in <spectrumList> elements) during initialization. Returns a spectrum upon calling getspectrum(i). The binary file is read in every call of getspectrum(i). Use enumerate(parser.coordinates) to get all coordinates with their respective index. Coordinates are always 3-dimensional. If the third spatial dimension is not present in the data, it will be set to zero.

The global metadata fields in the imzML file are stored in parser.metadata. Spectrum-specific metadata fields are not stored by default, to avoid memory issues. Use the include_spectra_metadata parameter if spectrum-specific metadata is needed.

get_physical_coordinates(i)

For a pixel index i, return the real-world coordinates in nanometers.

This is equivalent to multiplying the image coordinates of the given pixel with the pixel size.

Parameters

i – the pixel index

Returns

a tuple of x and y coordinates.

Return type

Tuple[float]

Raises

KeyError – if the .imzML file does not specify the attributes “pixel size x” and “pixel size y”
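
The multiplication described for get_physical_coordinates can be sketched as follows. This is only an illustration, not the library's implementation: the helper name to_physical and its arguments are hypothetical, and the pixel sizes are passed in explicitly instead of being read from the file's metadata.

```python
def to_physical(coordinate, pixel_size_x, pixel_size_y):
    """Scale the x and y parts of an (x, y, z) image coordinate by the pixel size.

    Hypothetical helper for illustration; not part of pyimzml.
    """
    x, y = coordinate[0], coordinate[1]
    return (x * pixel_size_x, y * pixel_size_y)


# Made-up pixel at image position (3, 4) with a 50 x 25 pixel size.
print(to_physical((3, 4, 1), 50.0, 25.0))  # (150.0, 100.0)
```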

get_spectrum_as_string(index)

Reads the m/z array and the intensity array of the spectrum at the specified index from the binary file as byte strings. The strings can be unpacked with the struct module. To get the arrays as numbers, use getspectrum.

Parameters

index – Index of the desired spectrum in the .imzML file

Return type

Tuple[str, str]

Output:

mz_string:

string where each character represents a byte of the mz array of the spectrum

intensity_string:

string where each character represents a byte of the intensity array of the spectrum
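
As a rough sketch of the unpacking step mentioned for get_spectrum_as_string, the standard struct module can turn such a byte string into numbers. The helper name unpack_values is hypothetical, and the little-endian layout and the type codes are assumptions; the actual precision depends on what the .imzML file declares. The sample bytes are generated in place so the sketch runs without a data file.

```python
import struct


def unpack_values(raw, type_code="f"):
    """Unpack a byte string into a tuple of numbers.

    type_code is a struct format character ("f" = 32-bit float,
    "d" = 64-bit float). Little-endian byte order is an assumption here.
    struct.unpack raises struct.error if len(raw) is not a whole
    multiple of the item size.
    """
    item_size = struct.calcsize("<" + type_code)
    count = len(raw) // item_size
    return struct.unpack("<%d%s" % (count, type_code), raw)


# Stand-in for one of the strings returned by get_spectrum_as_string.
sample = struct.pack("<3f", 100.0, 200.5, 300.25)
print(unpack_values(sample))  # (100.0, 200.5, 300.25)
```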

getspectrum(index)

Reads the spectrum at specified index from the .ibd file.

Parameters

index – Index of the desired spectrum in the .imzML file

Output:

mz_array: numpy.ndarray

Sequence of m/z values representing the horizontal axis of the desired mass spectrum

intensity_array: numpy.ndarray

Sequence of intensity values corresponding to mz_array
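
A minimal sketch of walking over all spectra, combining enumerate(parser.coordinates) with getspectrum(index) as described for the parser class. The generator name iter_spectra is hypothetical, and FakeParser is a made-up stand-in exposing only the two members used here, so the sketch runs without a data file.

```python
def iter_spectra(parser):
    """Yield (x, y, z, mz_array, intensity_array) for every pixel, in index order."""
    for index, (x, y, z) in enumerate(parser.coordinates):
        mzs, intensities = parser.getspectrum(index)
        yield x, y, z, mzs, intensities


class FakeParser:
    """Stand-in with made-up data; mimics only .coordinates and .getspectrum()."""

    coordinates = [(1, 1, 1), (2, 1, 1)]

    def getspectrum(self, index):
        return [100.0, 101.0], [float(index), 5.0]


# With the real library, the parser would instead be created along the lines of
#   from pyimzml.ImzMLParser import ImzMLParser
#   parser = ImzMLParser("example.imzML")   # hypothetical file name
for x, y, z, mzs, intensities in iter_spectra(FakeParser()):
    print((x, y, z), max(intensities))
```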

portable_spectrum_reader()

Builds a PortableSpectrumReader that holds the coordinates list and spectrum offsets in the .ibd file so that the .ibd file can be read without opening the .imzML file again.

The PortableSpectrumReader can be safely pickled and unpickled, making it useful for reading the spectra in a distributed environment such as PySpark or PyWren.

class pyimzml.ImzMLParser.PortableSpectrumReader(coordinates, mzPrecision, mzOffsets, mzLengths, intensityPrecision, intensityOffsets, intensityLengths)

A pickle-able class for holding the minimal set of data required for reading, without holding any references to open files that wouldn’t survive pickling.

read_spectrum_from_file(file, index)

Reads the spectrum at specified index from the .ibd file.

Parameters
  • file – File or file-like object for the .ibd file

  • index – Index of the desired spectrum in the .imzML file

Output:

mz_array: numpy.ndarray

Sequence of m/z values representing the horizontal axis of the desired mass spectrum

intensity_array: numpy.ndarray

Sequence of intensity values corresponding to mz_array
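
A hedged sketch of how a worker might use such a reader: it opens the .ibd file itself and hands the file object to read_spectrum_from_file. The function name read_one_spectrum and the file names in the comments are hypothetical. Opening the file in binary mode is an assumption that follows from the .ibd file holding raw binary data.

```python
def read_one_spectrum(reader, ibd_path, index):
    """Open the .ibd file in binary mode and read one spectrum through the reader."""
    with open(ibd_path, "rb") as ibd_file:
        return reader.read_spectrum_from_file(ibd_file, index)


# Possible use across processes (not executed here; file names are made up):
#   import pickle
#   from pyimzml.ImzMLParser import ImzMLParser
#   reader = ImzMLParser("example.imzML").portable_spectrum_reader()
#   payload = pickle.dumps(reader)             # send this to the worker
#   ...
#   reader = pickle.loads(payload)             # on the worker
#   mzs, intensities = read_one_spectrum(reader, "example.ibd", 0)
```

Only the pickled reader and the path to the .ibd file need to reach the worker; the .imzML file does not have to be parsed again there.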

pyimzml.ImzMLParser.browse(p)

Create a per-spectrum metadata browser for the parser. Usage:

# get a list of the instrument configurations used in the first pixel
instrument_configurations = browse(p).for_spectrum(0).get_ids("instrumentConfiguration")

Currently, instrumentConfiguration, dataProcessing and referenceableParamGroup are supported.

When browsing all spectra iteratively, visit them in ascending index order. Any other order can result in quadratic runtime. The following example shows how to retrieve all unique instrumentConfigurations used:

browser = browse(p)
all_config_ids = set()
for i, _ in enumerate(p.coordinates):
    all_config_ids.update(browser.for_spectrum(i).get_ids("instrumentConfiguration"))

The result is a collection of ids with which you can find the corresponding <instrumentConfiguration> tags in the XML tree.

Parameters

p – the parser

Returns

the browser

pyimzml.ImzMLParser.choose_iterparse(parse_lib=None)

pyimzml.ImzMLParser.getionimage(p, mz_value, tol=0.1, z=1, reduce_func=<built-in function sum>)

Get an image representation of the intensity distribution of the ion with specified m/z value.

By default, the intensity values within the tolerance region are summed.

Parameters
  • p – the ImzMLParser (or anything else with similar attributes) for the desired dataset

  • mz_value – m/z value for which the ion image shall be returned

  • tol – Absolute tolerance for the m/z value, such that all ions with values mz_value-|tol| <= x <= mz_value+|tol| are included. Defaults to 0.1

  • z – z value of the plane to use if the dataset is 3-dimensional. Defaults to 1

  • reduce_func – the behaviour for reducing the intensities between mz_value-|tol| and mz_value+|tol| to a single value. Must be a function that takes a sequence as input and outputs a number. By default, the values are summed.

Returns

numpy matrix with each element representing the ion intensity in this pixel. Can be easily plotted with matplotlib
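
The per-pixel reduction described by tol and reduce_func can be sketched in plain Python. This is not the library's implementation, which may locate the window differently: the helper name reduce_window is hypothetical and the spectrum below is made up.

```python
def reduce_window(mzs, intensities, mz_value, tol=0.1, reduce_func=sum):
    """Reduce the intensities whose m/z lies in [mz_value-|tol|, mz_value+|tol|].

    Hypothetical helper for illustration; not part of pyimzml.
    """
    lower, upper = mz_value - abs(tol), mz_value + abs(tol)
    selected = [i for m, i in zip(mzs, intensities) if lower <= m <= upper]
    return reduce_func(selected)


# Made-up spectrum for a single pixel.
mzs = [99.5, 100.0, 100.05, 100.5]
intensities = [1.0, 2.0, 4.0, 8.0]

print(reduce_window(mzs, intensities, 100.0))                            # 6.0
print(reduce_window(mzs, intensities, 100.0, tol=0.6, reduce_func=max))  # 8.0
```

Applying such a reduction to every spectrum and writing the result at that pixel's coordinates gives an image of the kind getionimage returns. Note that a reduce_func such as max raises an error on an empty window, whereas sum returns 0.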

pyimzml.metadata module

This module contains the data structures used for the pyimzml.ImzMLParser.ImzMLParser.metadata and pyimzml.ImzMLParser.ImzMLParser.full_spectrum_metadata fields.

class pyimzml.metadata.Metadata(root)

pretty()

Returns a nested dict summarizing all contained sections, intended to help human inspection.

class pyimzml.metadata.ParamGroup(elem, **extra_data)

This class exposes a group of imzML parameters at two layers of abstraction:

High-level examples:

  • param_group['MS:0000000'] – Access a controlled vocabulary parameter by accession ID or name, or a user-defined parameter by name. Controlled vocabulary parameters will take priority. This also inherits values from referenced referenceable param groups.

  • 'particle beam' in param_group – Check if a parameter exists by name / accession ID.

  • param_group.targets – Access a subelement directly by name.

Low-level examples:

  • param_group.cv_params – A list of all cvParams defined in this group. Includes raw values, units, and multiple items if one accession is used multiple times. Does not include values inherited from referenceable param groups.

  • param_group.user_params – A list of all userParams.

  • param_group.attrs – A dict of all XML attributes.

  • param_group.subelements – A dict of all subelements.

apply_referenceable_param_groups(rpgs)

pretty()

Flattens attributes, params and extra fields into a single dict keyed by name. This function is intended to help human inspection. For programmatic access to specific fields, always use the attrs, param_by_name, param_by_accession, etc. instance attributes instead.

class pyimzml.metadata.SpectrumData(root, referenceable_param_groups)

pyimzml.ontology module

This module contains exports of the controlled vocabulary ontologies used by the ImzML format, used for ensuring that ImzML metadata items can always be accessed by their canonical names or accessions.

pyimzml.ontology.ontology.convert_cv_param(accession, value)

Looks up a term by accession number, and converts the provided value to the expected type.

pyimzml.ontology.ontology.convert_term_name(accession)

pyimzml.ontology.ontology.convert_xml_value(dtype, value)

pyimzml.ontology.ontology.lookup_and_convert_cv_param(accession, raw_name, value, unit_accession=None)

Looks up a term by accession number, and returns the term name, its value converted into the expected datatype, and the unit name (if a unit accession number is also given).