Welcome to pyimzML documentation!

This package provides a parser for the imzML format as well as a simple imzML writer.

Typical usage pattern is as follows:

from pyimzml.ImzMLParser import ImzMLParser

p = ImzMLParser('Example.imzML')
for idx, (x,y,z) in enumerate(p.coordinates):
    mzs, intensities = p.getspectrum(idx)
    # ...

from pyimzml.ImzMLWriter import ImzMLWriter

with ImzMLWriter('output.imzML', polarity='positive') as w:
    for mzs, intensities, coords in my_spectra:
        # writes data to the .ibd file
        w.addSpectrum(mzs, intensities, coords)
# at this point imzML file is written and files are closed

API Reference

ImzMLParser

pyimzml.ImzMLParser module

class pyimzml.ImzMLParser.ImzMLParser(filename, parse_lib=None, ibd_file=<object object>, include_spectra_metadata=None)[source]

Parser for imzML 1.1.0 files (see specification here: http://imzml.org/download/imzml/specifications_imzML1.1.0_RC1.pdf).

Iteratively reads the .imzML file into memory while pruning the per-spectrum metadata (everything in <spectrumList> elements) during initialization. Returns a spectrum upon calling getspectrum(i). The binary file is read in every call of getspectrum(i). Use enumerate(parser.coordinates) to get all coordinates with their respective index. Coordinates are always 3-dimensional. If the third spatial dimension is not present in the data, it will be set to zero.

The global metadata fields in the imzML file are stored in parser.metadata. Spectrum-specific metadata fields are not stored by default, to avoid memory issues. Use the include_spectra_metadata parameter if spectrum-specific metadata is needed.

get_physical_coordinates(i)[source]

For a pixel index i, return the real-world coordinates in nanometers.

This is equivalent to multiplying the image coordinates of the given pixel with the pixel size.

Parameters

i – the pixel index

Returns

a tuple of x and y coordinates.

Return type

Tuple[float]

Raises

KeyError – if the .imzML file does not specify the attributes “pixel size x” and “pixel size y”
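As a rough illustration of the multiplication described above, the following standalone sketch mimics the calculation without using pyimzML. The function name physical_coordinates and the plain dict of pixel sizes are hypothetical stand-ins for the parser's internals; only the keys "pixel size x" and "pixel size y" are taken from the text above.

```python
def physical_coordinates(coordinate, pixel_sizes):
    """Scale the x and y parts of an (x, y, z) image coordinate.

    `pixel_sizes` is a hypothetical stand-in for the parsed metadata.
    A missing key raises KeyError, mirroring the documented behaviour.
    """
    size_x = pixel_sizes["pixel size x"]
    size_y = pixel_sizes["pixel size y"]
    x, y = coordinate[:2]
    return x * size_x, y * size_y


# Made-up coordinates and pixel sizes, for illustration only.
coords = [(1, 1, 1), (2, 1, 1), (3, 2, 1)]
sizes = {"pixel size x": 100.0, "pixel size y": 50.0}
physical = [physical_coordinates(c, sizes) for c in coords]
```

With a real parser p, the equivalent call would be p.get_physical_coordinates(i) for each index i.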

get_spectrum_as_string(index)[source]

Reads the m/z array and intensity array of the spectrum at the specified location from the binary file as byte strings. The strings can be unpacked with the struct module. To get the arrays as numbers, use getspectrum.

Parameters

index – Index of the desired spectrum in the .imzML file

Return type

Tuple[str, str]

Output:

mz_string:

string where each character represents a byte of the mz array of the spectrum

intensity_string:

string where each character represents a byte of the intensity array of the spectrum
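The sketch below shows how such byte strings might be unpacked with the struct module. It does not call pyimzML: the two byte strings are built by hand as stand-ins for the return values, and the helper unpack_array is hypothetical. It assumes little-endian data with 64-bit floats for m/z and 32-bit floats for intensities; the actual precision is declared in each .imzML file and may differ.

```python
import struct

# Stand-ins for the two byte strings returned for one spectrum.
mz_bytes = struct.pack("<3d", 100.0, 200.5, 300.25)
intensity_bytes = struct.pack("<3f", 1.0, 2.5, 0.0)


def unpack_array(raw, fmt_char):
    """Unpack a little-endian byte string of `fmt_char` items into a list."""
    item_size = struct.calcsize("<" + fmt_char)
    count = len(raw) // item_size
    return list(struct.unpack("<%d%s" % (count, fmt_char), raw))


mzs = unpack_array(mz_bytes, "d")                  # assumed 64-bit floats
intensities = unpack_array(intensity_bytes, "f")   # assumed 32-bit floats
```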

getspectrum(index)[source]

Reads the spectrum at specified index from the .ibd file.

Parameters

index – Index of the desired spectrum in the .imzML file

Output:

mz_array: numpy.ndarray

Sequence of m/z values representing the horizontal axis of the desired mass spectrum

intensity_array: numpy.ndarray

Sequence of intensity values corresponding to mz_array

portable_spectrum_reader()[source]

Builds a PortableSpectrumReader that holds the coordinates list and spectrum offsets in the .ibd file so that the .ibd file can be read without opening the .imzML file again.

The PortableSpectrumReader can be safely pickled and unpickled, making it useful for reading the spectra in a distributed environment such as PySpark or PyWren.

class pyimzml.ImzMLParser.PortableSpectrumReader(coordinates, mzPrecision, mzOffsets, mzLengths, intensityPrecision, intensityOffsets, intensityLengths)[source]

A pickle-able class for holding the minimal set of data required for reading, without holding any references to open files that wouldn’t survive pickling.

read_spectrum_from_file(file, index)[source]

Reads the spectrum at specified index from the .ibd file.

Parameters
  • file – File or file-like object for the .ibd file

  • index – Index of the desired spectrum in the .imzML file

Output:

mz_array: numpy.ndarray

Sequence of m/z values representing the horizontal axis of the desired mass spectrum

intensity_array: numpy.ndarray

Sequence of intensity values corresponding to mz_array
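The idea behind the portable reader can be sketched without pyimzML: only plain data (offsets and lengths) is serialized, and the file object is supplied again at read time. Everything below is a hypothetical stand-in, including the state dict, the read_spectrum function and the hand-built bytes playing the role of an .ibd file. It assumes little-endian 64-bit m/z values and 32-bit intensities.

```python
import io
import pickle
import struct


def read_spectrum(state, file, index):
    """Read one spectrum from `file` using only offsets and lengths."""
    def read_array(offset, count, fmt_char):
        file.seek(offset)
        n_bytes = count * struct.calcsize("<" + fmt_char)
        return list(struct.unpack("<%d%s" % (count, fmt_char), file.read(n_bytes)))

    mzs = read_array(state["mz_offsets"][index], state["mz_lengths"][index], "d")
    intensities = read_array(
        state["intensity_offsets"][index], state["intensity_lengths"][index], "f"
    )
    return mzs, intensities


# Fake binary file: one spectrum with two m/z values and two intensities.
ibd_bytes = struct.pack("<2d", 100.0, 200.0) + struct.pack("<2f", 5.0, 7.0)
state = {
    "mz_offsets": [0], "mz_lengths": [2],
    "intensity_offsets": [16], "intensity_lengths": [2],
}

# The state holds no open file, so it survives a pickle round trip.
restored = pickle.loads(pickle.dumps(state))
mzs, intensities = read_spectrum(restored, io.BytesIO(ibd_bytes), 0)
```

With the library itself, the corresponding steps would be to pickle the object returned by portable_spectrum_reader() and call read_spectrum_from_file(file, index) on the unpickled copy.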

pyimzml.ImzMLParser.browse(p)[source]

Create a per-spectrum metadata browser for the parser. Usage:

# get a list of the instrument configurations used in the first pixel
instrument_configurations = browse(p).for_spectrum(0).get_ids("instrumentConfiguration")

Currently, instrumentConfiguration, dataProcessing and referenceableParamGroup are supported.

When browsing all spectra iteratively, always use ascending indices; any other order can result in quadratic runtime. The following example shows how to retrieve all unique instrumentConfigurations used:

browser = browse(p)
all_config_ids = set()
for i, _ in enumerate(p.coordinates):
    all_config_ids.update(browser.for_spectrum(i).get_ids("instrumentConfiguration"))

This is a list of ids with which you can find the corresponding <instrumentConfiguration> tag in the xml tree.

Parameters

p – the parser

Returns

the browser

pyimzml.ImzMLParser.choose_iterparse(parse_lib=None)[source]
pyimzml.ImzMLParser.getionimage(p, mz_value, tol=0.1, z=1, reduce_func=<built-in function sum>)[source]

Get an image representation of the intensity distribution of the ion with specified m/z value.

By default, the intensity values within the tolerance region are summed.

Parameters
  • p – the ImzMLParser (or anything else with similar attributes) for the desired dataset

  • mz_value – m/z value for which the ion image shall be returned

  • tol – Absolute tolerance for the m/z value, such that all ions with values mz_value-|tol| <= x <= mz_value+|tol| are included. Defaults to 0.1

  • z – z value of the plane to use if the data is 3-dimensional. Defaults to 1

  • reduce_func – the behaviour for reducing the intensities between mz_value-|tol| and mz_value+|tol| to a single value. Must be a function that takes a sequence as input and outputs a number. By default, the values are summed.

Returns

numpy matrix in which each element represents the ion intensity at the corresponding pixel. It can be plotted directly with matplotlib.
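The tolerance window and the reduction step can be sketched in plain Python. This is a simplified illustration, not the library's implementation: ion_image is a hypothetical function, it takes the spectra as a list instead of a parser, it returns nested lists rather than a numpy matrix, and it assumes 1-based (x, y, z) coordinates.

```python
def ion_image(spectra, coordinates, mz_value, tol=0.1, z=1, reduce_func=sum):
    """Build a 2D image from (mzs, intensities) pairs and 1-based coordinates."""
    width = max(x for x, _, _ in coordinates)
    height = max(y for _, y, _ in coordinates)
    image = [[0.0] * width for _ in range(height)]
    low, high = mz_value - abs(tol), mz_value + abs(tol)
    for (mzs, intensities), (x, y, z_coord) in zip(spectra, coordinates):
        if z_coord != z:
            continue  # pixel belongs to another z plane
        # Keep only intensities whose m/z falls inside the tolerance window.
        selected = [i for m, i in zip(mzs, intensities) if low <= m <= high]
        image[y - 1][x - 1] = reduce_func(selected)
    return image


# Two made-up spectra at pixels (1, 1) and (2, 1).
spectra = [
    ([100.0, 150.0, 150.05], [1.0, 2.0, 3.0]),
    ([149.95, 300.0], [4.0, 9.0]),
]
coordinates = [(1, 1, 1), (2, 1, 1)]
summed = ion_image(spectra, coordinates, 150.0, tol=0.1)
peak = ion_image(spectra, coordinates, 150.0, tol=0.1, reduce_func=max)
```

Note that in this sketch a reduce_func such as max raises an error for a pixel with no values inside the window, whereas sum yields 0.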

pyimzml.metadata module

This module contains the data structures used for the pyimzml.ImzMLParser.ImzMLParser.metadata and pyimzml.ImzMLParser.ImzMLParser.full_spectrum_metadata fields.

class pyimzml.metadata.Metadata(root)[source]
pretty()[source]

Returns a nested dict summarizing all contained sections, intended to help human inspection.

class pyimzml.metadata.ParamGroup(elem, **extra_data)[source]

This class exposes a group of imzML parameters at two layers of abstraction:

High-level examples:

param_group['MS:0000000']

Access a controlled vocabulary parameter by accession ID or name, or a user-defined parameter by name. Controlled vocabulary parameters will take priority. This also inherits values from referenced referenceable param groups.

'particle beam' in param_group

Check if a parameter exists by name / accession ID.

param_group.targets

Access a subelement directly by name.

Low-level examples:

param_group.cv_params

A list of all cvParams defined in this group. Includes raw values, units, and multiple items if one accession is used multiple times. Does not include values inherited from referenceable param groups.

param_group.user_params

A list of all userParams.

param_group.attrs

A dict of all XML attributes.

param_group.subelements

A dict of all subelements.

apply_referenceable_param_groups(rpgs)[source]
pretty()[source]

Flattens attributes, params and extra fields into a single dict keyed by name. This function is intended to help human inspection. For programmatic access to specific fields, always use the attrs, param_by_name, param_by_accession, etc. instance attributes instead.

class pyimzml.metadata.SpectrumData(root, referenceable_param_groups)[source]

pyimzml.ontology module

This module contains exports of the controlled vocabulary ontologies used by the ImzML format, used for ensuring that ImzML metadata items can always be accessed by their canonical names or accessions.

pyimzml.ontology.ontology.convert_cv_param(accession, value)[source]

Looks up a term by accession number, and converts the provided value to the expected type.

pyimzml.ontology.ontology.convert_term_name(accession)[source]
pyimzml.ontology.ontology.convert_xml_value(dtype, value)[source]
pyimzml.ontology.ontology.lookup_and_convert_cv_param(accession, raw_name, value, unit_accession=None)[source]

Looks up a term by accession number, and returns the term name, its value converted into the expected datatype, and the unit name (if a unit accession number is also given).

ImzMLWriter

pyimzml.ImzMLWriter module

class pyimzml.ImzMLWriter.ImzMLWriter(output_filename, mz_dtype=<class 'numpy.float64'>, intensity_dtype=<class 'numpy.float32'>, mode='auto', spec_type='centroid', scan_direction='top_down', line_scan_direction='line_left_right', scan_pattern='one_way', scan_type='horizontal_line', mz_compression=<pyimzml.compression.NoCompression object>, intensity_compression=<pyimzml.compression.NoCompression object>, polarity=None)[source]

Create an imzML+ibd file.

Parameters
  • output_filename – used to derive the base name by removing the extension (if any). Two files will be created by adding “.ibd” and “.imzML” to the base name

  • intensity_dtype – The numpy data type to use for saving intensity values

  • mz_dtype – The numpy data type to use for saving mz array values

  • mode

    • “continuous” mode will save the first mz array only

    • “processed” mode saves every mz array separately

    • “auto” mode writes only mz arrays that have not already been written

  • intensity_compression – How to compress the intensity data before saving. Must be an instance of NoCompression or ZlibCompression

  • mz_compression – How to compress the mz array data before saving
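One way to picture the “auto” mode is as de-duplication of m/z arrays: an array identical to one already written is not written again, and the earlier location is reused. The sketch below illustrates that idea only and is not the writer's actual implementation. The function write_mz_array, the seen dict and the in-memory buffer standing in for the .ibd file are all hypothetical.

```python
import io
import struct


def write_mz_array(ibd, seen, mzs):
    """Append `mzs` to `ibd` unless identical bytes were written before.

    Returns the byte offset at which the array can be found.
    """
    raw = struct.pack("<%dd" % len(mzs), *mzs)
    if raw in seen:
        return seen[raw]  # reuse the earlier copy
    offset = ibd.tell()
    ibd.write(raw)
    seen[raw] = offset
    return offset


ibd = io.BytesIO()  # stand-in for the binary output file
seen = {}           # maps packed array bytes to their offset
arrays = ([100.0, 200.0], [100.0, 200.0], [100.0, 250.0])
offsets = [write_mz_array(ibd, seen, m) for m in arrays]
```

Here the second array shares the offset of the first, so only two of the three arrays take up space in the buffer.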

addSpectrum(mzs, intensities, coords, userParams=[])[source]

Add a mass spectrum to the file.

Parameters
  • mzs – mz array

  • intensities – intensity array

  • coords

    • 2-tuple of x and y position OR

    • 3-tuple of x, y, and z position

    Note: some applications expect coords to be 1-indexed.

close()[source]

Writes the XML file and closes all files. Called automatically if the writer is used in a with statement.

finish()[source]

alias of close()

pyimzml.compression module

This module holds adapters for compressing an ImzML file’s binary data, currently only usable with ImzMLWriter.

class pyimzml.compression.NoCompression[source]

No compression.

compress(bytes)[source]
decompress(bytes)[source]
name = 'no compression'
rounding(data)[source]
class pyimzml.compression.ZlibCompression(round_amt=None)[source]

Zlib compression with optional rounding of values. Rounding helps the compression, but is lossy.

Parameters

round_amt – Number of digits after the decimal point. None means no rounding.

compress(bytes)[source]
decompress(bytes)[source]
name = 'zlib compression'
rounding(data)[source]
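The effect of rounding before compression can be sketched with the standard zlib module. The functions compress_values and decompress_values below are hypothetical and only illustrate the idea; they are not the ZlibCompression implementation, and they assume values stored as little-endian 64-bit floats.

```python
import struct
import zlib


def compress_values(values, round_amt=None):
    """Optionally round, then pack as 64-bit floats and zlib-compress."""
    if round_amt is not None:
        values = [round(v, round_amt) for v in values]  # lossy step
    raw = struct.pack("<%dd" % len(values), *values)
    return zlib.compress(raw)


def decompress_values(blob):
    """Inverse of compress_values, apart from any rounding applied."""
    raw = zlib.decompress(blob)
    return list(struct.unpack("<%dd" % (len(raw) // 8), raw))


values = [100.123456, 200.654321, 300.111111]
lossless = decompress_values(compress_values(values))
rounded = decompress_values(compress_values(values, round_amt=2))
```

Without rounding the round trip returns the original values; with round_amt=2 it returns the values rounded to two decimal places, which is the lossy trade-off mentioned above.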
