mzmatch version: 1.0.2
author: RA Scheltema (firstname.lastname@example.org)
Identifies the contents of the given PeakML file against the given databases. The databases are expected to have the format of the example below (a standard file format for this would be preferable) and to contain all of the compounds to be tested for. Within the tool-chain, in the package 'mzmatch.ipeak.db', several tools are provided for converting downloadable files (usually hosted on an FTP server) from the major metabolite databases. Before identifying your files, please make sure you have the most recent version of each database.
Matching is performed only on mass, which is taken from the topmost structure. In other words, if the PeakML file contains a list of mass chromatograms, the mass of each individual mass chromatogram is matched to the database. However, if the PeakML file contains a list of sets of matched mass chromatograms, the mean mass of each set is used for identification.
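The mass-only matching described above can be sketched as follows. This is a minimal illustration in Python, not the tool's actual implementation: the function names, the dictionary-based database and the compound IDs are hypothetical, and it assumes that the ppm tolerance is computed relative to the measured mass.

```python
from statistics import mean


def ppm_window(mass, ppm):
    """Return the (low, high) mass bounds for a tolerance given in ppm.

    Assumption: the tolerance is taken relative to the measured mass.
    """
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta


def query_mass(masses):
    """Mass used for matching.

    A single mass chromatogram contributes its own mass; a set of matched
    mass chromatograms contributes the mean of its members' masses.
    """
    return mean(masses)


def match_by_mass(masses, database, ppm):
    """Return the IDs of all database entries inside the ppm window.

    `database` is a hypothetical mapping of compound ID -> mass.
    """
    low, high = ppm_window(query_mass(masses), ppm)
    return [cid for cid, mass in database.items() if low <= mass <= high]


# Illustrative database with made-up IDs.
database = {"cmpd_1": 180.06339, "cmpd_2": 194.08038}

# Two matched mass chromatograms: their mean mass is compared at 3 ppm.
hits = match_by_mass([180.0633, 180.0636], database, ppm=3)
```

Here `hits` contains only `cmpd_1`, because the mean mass of the two chromatograms lies within 3 ppm of that entry and far outside the window for `cmpd_2`.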
When a peak is positively identified, the annotation 'identification' is extended with the unique database ID corresponding to the match. This keeps the clutter in the PeakML file to a minimum and keeps the information associated with the tag up to date when the database is updated with a new version. Additionally, the tag provides a convenient way of removing false identifications from the PeakML file from a UI environment.
SET JAVA=java -cp mzmatch.jar -da -dsa -Xmn1g -Xms1425m -Xmx1425m -Xss128k -XX:+UseParallelGC -XX:ParallelGCThreads=10
REM identify the data in the PeakML file
%JAVA% mzmatch.ipeak.util.Identify -v -ppm 3 -i data.peakml -o data_identified.peakml -databases "hmdb.xml,lipidmaps.xml"
Database xml example:
<?xml version="1.0" encoding="UTF-8"?>
* per option:  denotes multiple input values; <> denotes a single input value
||Option for the input file. The only allowed file format is PeakML, and no limitations are set on its contents. When this is not set, the input is read from the standard input.
||Option for the output file. This file is written in the same PeakML file format as the input file, with the addition of identification annotations (tag: 'identification', containing the database IDs).
||The accuracy of the measurement in parts-per-million. This value is used for matching the masses to those found in the supplied databases. This value is obligatory.
||Option for the molecule databases to match the contents of the input file to. These files should adhere to the compound-xml format.
||Optional minimum retention time for excluding signals from the input file. This is, for example, convenient for excluding lipids on an LC/MS setup with a HILIC column.
||Optional maximum retention time for excluding signals from the input file. This is, for example, convenient for including lipids on an LC/MS setup with a HILIC column.
||Optional retention time window for finding matches from the databases. If this is set and the database contains entries with previously registered retention times for molecules, the stored retention times are taken into account when matching. When this value is not set, the stored retention times are ignored.
||When this is set, the help is shown.
||When this is set, the progress is shown on the standard output.
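The retention time options above can be pictured with the following sketch. It is an illustration under stated assumptions, not the tool's code: the function names are hypothetical, the bounds are treated as inclusive, the window is assumed to be symmetric around the stored retention time, and database entries without a stored retention time are assumed to be matched on mass alone.

```python
def within_rt_bounds(rt, min_rt=None, max_rt=None):
    """Keep a signal only if its retention time lies inside the optional bounds.

    Assumption: both bounds are inclusive; an unset bound (None) is ignored.
    """
    if min_rt is not None and rt < min_rt:
        return False
    if max_rt is not None and rt > max_rt:
        return False
    return True


def rt_matches(peak_rt, stored_rt, rt_window=None):
    """Check a peak against a retention time stored in the database.

    When no window is set, stored retention times are ignored. Entries
    without a stored retention time are assumed to pass (hypothetical choice).
    """
    if rt_window is None or stored_rt is None:
        return True
    return abs(peak_rt - stored_rt) <= rt_window


# A peak at 420 s, with signals before 300 s excluded and a 30 s window.
keep = within_rt_bounds(420.0, min_rt=300.0)
match = rt_matches(420.0, stored_rt=400.0, rt_window=30.0)
```

In this example the peak is kept, because it elutes after the minimum retention time, and it is accepted as a candidate match, because it lies within 30 s of the stored value.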