version: 1.0.0
mzmatch version: 1.0.2
author: RA Scheltema (

Combines the contents of a set of PeakML files, each containing either mass chromatograms or backgroundions at the lowest level (their signal is needed in order to make a correct assessment of similarity). The approach starts from the most intense unprocessed peak in the complete set of signals, covering all the measurements, and attempts to find all signals from the other measurements that fall within the mass window (option 'ppm') and the retention time window (option 'rtwindow'). The correct match from each measurement to the currently most intense signal is then identified by optimizing on the difference in area under the curve: signals caused by the same analyte are expected to have roughly the same shape and retention time. All matched signals are then clustered for the output and marked as processed, after which a new iteration begins.
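
The iteration described above can be sketched roughly as follows. This is an illustrative Python sketch, not the mzmatch implementation: the Peak class, its fields and the combine function are hypothetical names, the area comparison is simplified to the absolute difference of two precomputed scalar areas, and 'rtwindow' is assumed to be the maximum allowed retention time difference in seconds.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    # Hypothetical record for one signal; not the PeakML data model.
    measurement: int        # index of the input file the signal came from
    mass: float             # m/z of the signal
    rt: float               # retention time in seconds
    intensity: float        # maximum intensity of the signal
    area: float             # precomputed area under the curve
    processed: bool = False


def combine(peaks, ppm, rtwindow):
    """Greedily cluster peaks; returns a list of clusters (lists of Peak)."""
    clusters = []
    # Visit the signals from most to least intense.
    for seed in sorted(peaks, key=lambda p: p.intensity, reverse=True):
        if seed.processed:
            continue
        seed.processed = True
        delta = seed.mass * ppm / 1e6   # mass tolerance in m/z units
        best = {}                       # measurement -> best candidate so far
        for cand in peaks:
            if cand.processed or cand.measurement == seed.measurement:
                continue
            if abs(cand.mass - seed.mass) > delta:
                continue                # outside the mass window
            if abs(cand.rt - seed.rt) > rtwindow:
                continue                # outside the retention time window
            current = best.get(cand.measurement)
            # Assumption: the smallest area difference marks the best match.
            if current is None or abs(cand.area - seed.area) < abs(current.area - seed.area):
                best[cand.measurement] = cand
        cluster = [seed]
        for match in best.values():
            match.processed = True      # matched signals are not visited again
            cluster.append(match)
        clusters.append(cluster)
    return clusters
```

In this sketch at most one signal per measurement joins each cluster, and a signal that loses the comparison stays unprocessed so that it can seed or join a later cluster.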

This tool can be used to combine files iteratively. For example, when analyzing a time series experiment, the biological replicates of each timepoint can first be combined into a single file, the set of which can be labeled as biological replicates with the option 'combination'. Additional filtering operations can then be applied to the timepoint combinations before proceeding to combine all the timepoints in a final set.


Windows batch-file:
SET JAVA=java -cp mzmatch.jar -da -dsa -Xmn1g -Xms1425m -Xmx1425m -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=10

REM extract all the mass chromatograms
%JAVA% mzmatch.ipeak.ExtractMassChromatograms -v -i raw\*.mzXML -o peaks\ -ppm 3

REM combine the individual timepoints
%JAVA% mzmatch.ipeak.Combine -v -i peaks\24hr_*.peakml -o 24hr.peakml -ppm 3 -rtwindow 30 -combination biological
%JAVA% mzmatch.ipeak.Combine -v -i peaks\28hr_*.peakml -o 28hr.peakml -ppm 3 -rtwindow 30 -combination biological
%JAVA% mzmatch.ipeak.Combine -v -i peaks\32hr_*.peakml -o 32hr.peakml -ppm 3 -rtwindow 30 -combination biological

REM combine all timepoints in a single file
%JAVA% mzmatch.ipeak.Combine -v -i *hr.peakml -o timeseries.peakml -ppm 3 -rtwindow 30 -combination set


Commandline options*
-i [filename] Option for the input files. Multiple files can be passed by separating them with a comma (ie ,) or the use of a name with a wildcard (eg samples_*hrs.xml). The only allowed file format is PeakML containing either mass chromatograms or backgroundions at the lowest level (ie the result of another Combine can be used).
-o <filename> Option for the output file. The resulting matches are written to this file in the PeakML file format.
When this option has not been set the output is written to the standard output. Be sure to unset the verbose option when setting up a pipeline reading and writing from the standard in- and outputs.
-label <filename> Optional label for the set being made. The labels are stored in the header of the resulting file and used for display purposes.
-labels [filename] Optional labels for the input files. When these are used make sure to give as many labels as there are input files. The labels are stored in the header of the resulting file and used for display purposes.
-ppm <double> The accuracy of the measurement in parts-per-million. This value is used for the matching of mass chromatogram (collections) and needs to be reasonable for the equipment used to make the measurement (the LTQ-Orbitrap manages approximately 3 ppm).
-rtwindow <double> The retention time window in seconds, defining the range where to look for matches.
-combination <see description>
- set
The files are to be combined as a true set.
- technical
The files are to be combined as technical replicates.
- biological
The files are to be combined as biological replicates.
-h   When this is set, the help is shown.
-v   When this is set, the progress is shown on the standard output.
* per option: [] denotes multiple input values; <> denotes a single input value
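
To give a feel for the size of the 'ppm' option, the tolerance can be converted into an absolute mass window. The Python sketch below is illustrative only: mass_window and within_window are hypothetical helper names, and the window is assumed to be symmetric around the reference mass.

```python
def mass_window(mass, ppm):
    """Return the (low, high) m/z bounds for a tolerance given in ppm."""
    delta = mass * ppm / 1e6    # tolerance converted to m/z units
    return mass - delta, mass + delta


def within_window(mass, candidate, ppm):
    """True when candidate lies within ppm of the reference mass."""
    return abs(candidate - mass) <= mass * ppm / 1e6


# At m/z 400 a tolerance of 3 ppm corresponds to about 0.0012 m/z on each side.
low, high = mass_window(400.0, 3.0)
print(low, high)
```

Because the tolerance scales with the mass, the same ppm value gives a wider absolute window for heavier ions.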