ImzMLParser
pyimzml.ImzMLParser module
- class pyimzml.ImzMLParser.ImzMLParser(filename, parse_lib=None, ibd_file=<object object>, include_spectra_metadata=None)
Parser for imzML 1.1.0 files (see the specification: https://ms-imaging.org/wp-content/uploads/2009/08/specifications_imzML1.1.0_RC1.pdf).
Iteratively reads the .imzML file into memory while pruning the per-spectrum metadata (everything in <spectrumList> elements) during initialization. Returns a spectrum upon calling getspectrum(i). The binary file is read in every call of getspectrum(i). Use enumerate(parser.coordinates) to get all coordinates with their respective index. Coordinates are always 3-dimensional. If the third spatial dimension is not present in the data, it will be set to zero.
The global metadata fields in the imzML file are stored in parser.metadata. Spectrum-specific metadata fields are not stored by default, to avoid memory issues; use the include_spectra_metadata parameter if spectrum-specific metadata is needed.
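The access pattern described above can be sketched as follows. This is a minimal illustration and not part of pyimzml: collect_spectra is a hypothetical helper, and StandInParser is a made-up stand-in that only mimics the coordinates attribute and getspectrum(i), so the snippet runs without a data file. With a real dataset, an ImzMLParser instance would be passed instead.

```python
def collect_spectra(parser):
    """Map each (x, y, z) coordinate to its (mz_array, intensity_array) pair.

    Hypothetical helper: relies only on parser.coordinates and
    parser.getspectrum(i), as described in the text above.
    """
    spectra = {}
    for i, (x, y, z) in enumerate(parser.coordinates):
        spectra[(x, y, z)] = parser.getspectrum(i)
    return spectra


class StandInParser:
    """Made-up stand-in for ImzMLParser, used only to make the sketch runnable."""

    def __init__(self):
        # 2D data: the third coordinate is zero, as described above.
        self.coordinates = [(1, 1, 0), (2, 1, 0)]
        self._spectra = [([100.0, 200.0], [5.0, 7.0]),
                         ([100.0, 200.0], [1.0, 3.0])]

    def getspectrum(self, index):
        return self._spectra[index]


spectra = collect_spectra(StandInParser())
print(spectra[(2, 1, 0)])
```

Because the binary file is read on every getspectrum(i) call, collecting all spectra at once trades memory for fewer repeated reads; for large datasets, processing one spectrum at a time inside the loop is likely the safer choice.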
- get_physical_coordinates(i)
For a pixel index i, return the real-world coordinates in nanometers.
This is equivalent to multiplying the image coordinates of the given pixel with the pixel size.
- Parameters
i – the pixel index
- Returns
a tuple of x and y coordinates.
- Return type
Tuple[float]
- Raises
KeyError – if the .imzML file does not specify the attributes “pixel size x” and “pixel size y”
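The multiplication described for get_physical_coordinates can be illustrated with a small stand-alone function. physical_coordinates is a hypothetical helper, not the library method, and the pixel sizes are assumed example values; the real method takes them from the file's metadata and raises KeyError when they are missing.

```python
def physical_coordinates(coordinate, pixel_size_x, pixel_size_y):
    """Scale the x and y image coordinates by the pixel size.

    Hypothetical helper mirroring the description above; the unit of the
    result is whatever unit the pixel sizes are given in.
    """
    x, y = coordinate[:2]  # the z component is ignored
    return x * pixel_size_x, y * pixel_size_y


# Assumed example values: pixel (3, 4, 1) with a pixel size of 50 x 25.
print(physical_coordinates((3, 4, 1), 50.0, 25.0))  # prints (150.0, 100.0)
```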
- get_spectrum_as_string(index)
Reads the m/z array and the intensity array of the spectrum at the specified index from the binary file as byte strings. The strings can be unpacked with the struct module. To get the arrays as numbers, use getspectrum.
- Parameters
index – Index of the desired spectrum in the .imzML file
- Return type
Tuple[str, str]
Output:
- mz_string:
string where each character represents a byte of the mz array of the spectrum
- intensity_string:
string where each character represents a byte of the intensity array of the spectrum
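Unpacking such a byte string with the struct module might look like the sketch below. The byte string here is built by hand as a stand-in for one returned by get_spectrum_as_string(index), and the format is an assumption: little-endian 32-bit floats. The actual precision depends on what the .imzML file declares for each array.

```python
import struct

# Stand-in for the byte string of one array; a real one would come from
# get_spectrum_as_string(index). Assumption: little-endian 32-bit floats.
mz_bytes = struct.pack("<3f", 100.5, 200.25, 300.0)

count = len(mz_bytes) // struct.calcsize("<f")  # number of 4-byte values
mz_values = struct.unpack("<%df" % count, mz_bytes)
print(mz_values)
```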
- getspectrum(index)
Reads the spectrum at specified index from the .ibd file.
- Parameters
index – Index of the desired spectrum in the .imzML file
Output:
- mz_array: numpy.ndarray
Sequence of m/z values representing the horizontal axis of the desired mass spectrum
- intensity_array: numpy.ndarray
Sequence of intensity values corresponding to mz_array
- portable_spectrum_reader()
Builds a PortableSpectrumReader that holds the coordinates list and spectrum offsets in the .ibd file so that the .ibd file can be read without opening the .imzML file again.
The PortableSpectrumReader can be safely pickled and unpickled, making it useful for reading the spectra in a distributed environment such as PySpark or PyWren.
- class pyimzml.ImzMLParser.PortableSpectrumReader(coordinates, mzPrecision, mzOffsets, mzLengths, intensityPrecision, intensityOffsets, intensityLengths)
A pickle-able class for holding the minimal set of data required for reading, without holding any references to open files that wouldn’t survive pickling.
- read_spectrum_from_file(file, index)
Reads the spectrum at specified index from the .ibd file.
- Parameters
file – File or file-like object for the .ibd file
index – Index of the desired spectrum in the .imzML file
Output:
- mz_array: numpy.ndarray
Sequence of m/z values representing the horizontal axis of the desired mass spectrum
- intensity_array: numpy.ndarray
Sequence of intensity values corresponding to mz_array
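The idea behind a pickle-able reader, keeping only offsets and lengths and receiving the file separately, can be sketched with the standard library alone. Everything below is hypothetical and much simpler than PortableSpectrumReader: the state is a plain dict, read_spectrum and read_floats are made-up helpers, the binary data is an in-memory stand-in, and both arrays are assumed to be little-endian 32-bit floats.

```python
import io
import pickle
import struct


def read_floats(file, offset, length):
    """Read `length` little-endian 32-bit floats starting at byte `offset`."""
    file.seek(offset)
    return list(struct.unpack("<%df" % length, file.read(4 * length)))


def read_spectrum(state, file, index):
    """Hypothetical counterpart of read_spectrum_from_file for a plain dict."""
    mzs = read_floats(file, state["mz_offsets"][index], state["mz_lengths"][index])
    ints = read_floats(file, state["int_offsets"][index], state["int_lengths"][index])
    return mzs, ints


# Only plain data: no open file handles, so pickling is safe.
state = {"mz_offsets": [0], "mz_lengths": [2],
         "int_offsets": [8], "int_lengths": [2]}

# In-memory stand-in for a binary file holding one spectrum with two peaks.
binary = io.BytesIO(struct.pack("<4f", 100.0, 200.0, 5.0, 7.0))

# Round-trip through pickle, as when shipping the state to a worker process.
restored = pickle.loads(pickle.dumps(state))
print(read_spectrum(restored, binary, 0))
```

Passing the file object in at read time, rather than storing it, is what keeps the state serializable: each worker can open its own handle to the binary file.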
- pyimzml.ImzMLParser.browse(p)
Create a per-spectrum metadata browser for the parser. Usage:
# get a list of the instrument configurations used in the first pixel
instrument_configurations = browse(p).for_spectrum(0).get_ids("instrumentConfiguration")
Currently, instrumentConfiguration, dataProcessing and referenceableParamGroup are supported.
For browsing all spectra iteratively, you should by all means use ascending indices. Doing otherwise can result in quadratic runtime. The following example shows how to retrieve all unique instrumentConfigurations used:
browser = browse(p)
all_config_ids = set()
for i, _ in enumerate(p.coordinates):
    all_config_ids.update(browser.for_spectrum(i).get_ids("instrumentConfiguration"))
This is a list of ids with which you can find the corresponding <instrumentConfiguration> tag in the XML tree.
- Parameters
p – the parser
- Returns
the browser
- pyimzml.ImzMLParser.getionimage(p, mz_value, tol=0.1, z=1, reduce_func=<built-in function sum>)
Get an image representation of the intensity distribution of the ion with specified m/z value.
By default, the intensity values within the tolerance region are summed.
- Parameters
p – the ImzMLParser (or anything else with similar attributes) for the desired dataset
mz_value – m/z value for which the ion image shall be returned
tol – Absolute tolerance for the m/z value, such that all ions with values mz_value-|tol| <= x <= mz_value+|tol| are included. Defaults to 0.1
z – the z value of the plane to use if the dataset is 3-dimensional. Defaults to 1
reduce_func – the behaviour for reducing the intensities between mz_value-|tol| and mz_value+|tol| to a single value. Must be a function that takes a sequence as input and outputs a number. By default, the values are summed.
- Returns
numpy matrix in which each element represents the ion intensity at the corresponding pixel. Can be easily plotted with matplotlib.
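The reduction described above can be sketched in pure Python. ion_image is a hypothetical re-implementation for illustration, not the library function: it uses nested lists instead of a numpy matrix, assumes 1-based (x, y, z) coordinates and ascending m/z arrays, and sizes the image from the largest coordinate, which is a simplification. StandInParser is a made-up object exposing only coordinates and getspectrum(i).

```python
from bisect import bisect_left, bisect_right


def ion_image(p, mz_value, tol=0.1, z=1, reduce_func=sum):
    """Hypothetical pure-Python sketch of building an ion image.

    Assumptions: coordinates are 1-based (x, y, z) tuples, each m/z array is
    sorted in ascending order, and nested lists stand in for a numpy matrix.
    """
    tol = abs(tol)
    # Keep only the pixels that lie in the requested z plane.
    layer = [(i, x, y) for i, (x, y, z_) in enumerate(p.coordinates) if z_ == z]
    width = max((x for _, x, _ in layer), default=0)
    height = max((y for _, _, y in layer), default=0)
    image = [[0.0] * width for _ in range(height)]
    for i, x, y in layer:
        mzs, intensities = p.getspectrum(i)
        lo = bisect_left(mzs, mz_value - tol)   # first index with m/z >= lower bound
        hi = bisect_right(mzs, mz_value + tol)  # one past the last index with m/z <= upper bound
        image[y - 1][x - 1] = reduce_func(intensities[lo:hi])
    return image


class StandInParser:
    """Made-up stand-in exposing only what the sketch needs."""
    coordinates = [(1, 1, 1), (2, 1, 1)]
    _spectra = [([99.95, 100.0, 100.5], [1.0, 2.0, 4.0]),
                ([100.0, 101.0], [8.0, 16.0])]

    def getspectrum(self, index):
        return self._spectra[index]


print(ion_image(StandInParser(), 100.0, tol=0.1))
```

Passing reduce_func=max would keep the strongest peak inside the tolerance window instead of the sum. Note that max fails on an empty window, whereas sum returns 0, so a custom reduce_func should handle empty input.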
pyimzml.metadata module
This module contains the data structures used for the pyimzml.ImzMLParser.ImzMLParser.metadata and pyimzml.ImzMLParser.ImzMLParser.full_spectrum_metadata fields.
- class pyimzml.metadata.ParamGroup(elem, **extra_data)
This class exposes a group of imzML parameters at two layers of abstraction:
High-level examples:
- param_group['MS:0000000']
Access a controlled vocabulary parameter by accession ID or name, or a user-defined parameter by name. Controlled vocabulary parameters will take priority. This also inherits values from referenced referenceable param groups.
- 'particle beam' in param_group
Check if a parameter exists by name / accession ID.
- param_group.targets
Access a subelement directly by name.
Low-level examples:
- param_group.cv_params
A list of all cvParams defined in this group. Includes raw values, units, and multiple items if one accession is used multiple times. Does not include values inherited from referenceable param groups.
- param_group.user_params
A list of all userParams.
- param_group.attrs
A dict of all XML attributes.
- param_group.subelements
A dict of all subelements.
pyimzml.ontology module
This module contains exports of the controlled vocabulary ontologies used by the ImzML format, used for ensuring that ImzML metadata items can always be accessed by their canonical names or accessions.