
Interface

MatthewThe edited this page Jun 13, 2023 · 22 revisions

Percolator can handle two types of input files: the tab-delimited PIN.tsv format (recommended) and PIN.xml format. Input files can be generated from search engine outputs using our converters.

Getting started

To run Percolator on a tab-delimited input, use the following options:


$ percolator input.tsv -X output.xml

where input.tsv is a valid tab-delimited file.

To run Percolator on an XML file in PIN format, use the -k flag:


$ percolator -k pin.xml -X output.xml
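
When Percolator is called from a script, the two invocations above can be assembled programmatically. The following is a minimal sketch, not part of Percolator itself: the helper names `percolator_command` and `run_percolator` are hypothetical, and it assumes the `percolator` executable is on the PATH. Only the `-k` and `-X` options shown above are used.

```python
import subprocess


def percolator_command(input_path, output_xml, xml_input=False):
    """Build the argument list for one of the two invocations shown above.

    xml_input=True selects the PIN.xml form, which passes the input via -k.
    """
    if xml_input:
        return ["percolator", "-k", input_path, "-X", output_xml]
    return ["percolator", input_path, "-X", output_xml]


def run_percolator(input_path, output_xml, xml_input=False):
    """Run Percolator; assumes the executable is installed and on the PATH."""
    # check=True raises CalledProcessError if Percolator exits with an error.
    subprocess.run(percolator_command(input_path, output_xml, xml_input), check=True)


# Building the commands does not require Percolator to be installed.
tsv_cmd = percolator_command("input.tsv", "output.xml")
xml_cmd = percolator_command("pin.xml", "output.xml", xml_input=True)
print(" ".join(tsv_cmd))
print(" ".join(xml_cmd))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.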

PIN.tsv tab-delimited file format

Percolator accepts input in a simple tab-delimited format where each row contains features associated with a single PSM:


id <tab> label <tab> scannr <tab> feature1 <tab> ... <tab> featureN <tab> peptide <tab> proteinId1 <tab> .. <tab> proteinIdM

label is a flag set to 1 for target PSMs and -1 for decoys, and scannr is an integer value.

These rows must be preceded by a single header line that includes a column with the exact name ScanNr, followed by the names of the individual features, separated by tabs.

Optionally, the spectrum filename can be specified in a column directly after the ScanNr column, with the header filename or spectrafile; it will be propagated to the result file(s).

An optional second line specifying the default scoring vector should contain the string DefaultDirection in its first column, e.g.:


PSMId <tab> Label <tab> ScanNr <tab> feature1name <tab> ... <tab> featureNname <tab> Peptide <tab> Proteins
DefaultDirection <tab> - <tab> - <tab> feature1weight <tab> ... <tab> featureNweight [optional]
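
The layout above can be produced with a few lines of Python. This is a minimal sketch under stated assumptions: the function `write_pin`, the dictionary keys used for each PSM, and the feature names `score` and `deltaScore` are hypothetical and chosen only for illustration. The column order and the DefaultDirection line follow the description above.

```python
import io


def write_pin(handle, feature_names, psms, default_direction=None):
    """Write PSMs in the tab-delimited layout described above.

    psms is an iterable of dicts with the (hypothetical) keys
    id, label, scannr, features, peptide and proteins.
    """
    header = ["PSMId", "Label", "ScanNr", *feature_names, "Peptide", "Proteins"]
    handle.write("\t".join(header) + "\n")
    if default_direction is not None:
        # Optional second line: one weight per feature.
        row = ["DefaultDirection", "-", "-", *map(str, default_direction)]
        handle.write("\t".join(row) + "\n")
    for psm in psms:
        if psm["label"] not in (1, -1):
            raise ValueError("label must be 1 (target) or -1 (decoy)")
        if len(psm["features"]) != len(feature_names):
            raise ValueError("each PSM needs one value per feature")
        row = [psm["id"], psm["label"], psm["scannr"], *psm["features"],
               psm["peptide"], *psm["proteins"]]
        handle.write("\t".join(map(str, row)) + "\n")


buf = io.StringIO()
write_pin(
    buf,
    ["score", "deltaScore"],
    [
        {"id": "run1_101_2_1", "label": 1, "scannr": 101,
         "features": [3.2, 0.4], "peptide": "K.PEPTIDER.A",
         "proteins": ["protA", "protB"]},
        {"id": "run1_102_2_1", "label": -1, "scannr": 102,
         "features": [1.1, 0.1], "peptide": "R.EDITPEPK.L",
         "proteins": ["decoy_protC"]},
    ],
    default_direction=[1, 0],
)
pin_text = buf.getvalue()
print(pin_text)
```

Each protein identifier is written to its own tab-separated column at the end of the row, so rows may have different lengths.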

If pin.xml is a valid XML file, it is possible to use Percolator as a converter and generate tab-delimited files from XML files by using the following options:


$ percolator -k pin.xml -J pin.tsv

After successful termination, pin.tsv will contain a tab-delimited file that can be fed to Percolator as described above; the file will be overwritten, or created if it does not already exist.
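
A quick sanity check on a generated tab-delimited file is to count its target and decoy PSMs. The sketch below assumes the column order given in the format description (label in the second column) and skips the optional DefaultDirection line; the function name `count_targets_and_decoys` and the example rows are hypothetical.

```python
def count_targets_and_decoys(lines):
    """Count target (1) and decoy (-1) rows in tab-delimited PIN content."""
    rows = iter(lines)
    next(rows)  # skip the header line
    counts = {1: 0, -1: 0}
    for line in rows:
        fields = line.rstrip("\r\n").split("\t")
        if fields == [""]:
            continue  # ignore empty lines
        if fields[0] == "DefaultDirection":
            continue  # optional scoring-vector line, not a PSM
        label = int(fields[1])
        if label not in counts:
            raise ValueError(f"unexpected label {label!r} for PSM {fields[0]}")
        counts[label] += 1
    return counts


example = [
    "PSMId\tLabel\tScanNr\tscore\tPeptide\tProteins\n",
    "DefaultDirection\t-\t-\t1\n",
    "run1_101_2_1\t1\t101\t3.2\tK.PEPTIDER.A\tprotA\n",
    "run1_102_2_1\t-1\t102\t1.1\tR.EDITPEPK.L\tdecoy_protB\n",
    "run1_103_3_1\t1\t103\t2.7\tK.ANOTHERK.G\tprotC\tprotD\n",
]
counts = count_targets_and_decoys(example)
print(counts)
```

To check a file on disk, pass an open file handle instead of the list, for example `count_targets_and_decoys(open("pin.tsv"))`.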

Converters

The percolator-converters package contains a set of converters that turn the output of SEQUEST/Crux (sqt2pin), X!Tandem (tandem2pin) and MS-GF+ (msgf2pin) into the tab-delimited file format.

Usage:
   sqt2pin [options] -o output.tsv target.sqt decoy.sqt 

Here, output.tsv is the file to which the Percolator input will be written (make sure you have read and write access to it), target.sqt is the target sqt file, and decoy.sqt is the decoy sqt file. Small data sets can be merged by replacing the sqt files with meta files. Meta files are text files containing the paths of sqt files, one path per line. For good results, the different runs should be generated under similar conditions.
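
Meta files are simple enough to generate from a script. The following sketch writes one meta file for targets and one for decoys, then assembles an sqt2pin command in the form given in the usage line above. The helper names, the meta file names and the sqt file names are hypothetical; the command is only built, not executed.

```python
import tempfile
from pathlib import Path


def write_meta_file(meta_path, sqt_paths):
    """Write a meta file: the paths of the sqt files, one path per line."""
    Path(meta_path).write_text("".join(f"{path}\n" for path in sqt_paths))


def sqt2pin_command(output_tsv, target, decoy):
    """Argument list following: sqt2pin [options] -o output.tsv target decoy."""
    return ["sqt2pin", "-o", output_tsv, str(target), str(decoy)]


with tempfile.TemporaryDirectory() as tmp:
    target_meta = Path(tmp) / "targets.meta"
    decoy_meta = Path(tmp) / "decoys.meta"
    write_meta_file(target_meta, ["run1_target.sqt", "run2_target.sqt"])
    write_meta_file(decoy_meta, ["run1_decoy.sqt", "run2_decoy.sqt"])
    target_meta_text = target_meta.read_text()
    # The meta files take the place of target.sqt and decoy.sqt.
    command = sqt2pin_command("output.tsv", target_meta, decoy_meta)

print(target_meta_text)
print(" ".join(command))
```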

The same applies to msgf2pin and tandem2pin, with mzid files or X!Tandem files, respectively, instead of sqt files.

It is also still possible to output XML files in PIN format by using the -k flag instead of the -o flag used for the tab-delimited file format.

The converters create an identifier for each PSM of the form <file_identifier>_<scan_number>_<charge>_<rank>, e.g. my_interesting_raw_file_24326_2_1.
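
Because the file identifier can itself contain underscores, such an identifier is most safely split from the right. The sketch below is an illustration only, not the converters' own code; the names `PsmId`, `format_psm_id` and `parse_psm_id` are hypothetical.

```python
from typing import NamedTuple


class PsmId(NamedTuple):
    file_identifier: str
    scan_number: int
    charge: int
    rank: int


def format_psm_id(psm):
    """Join the four parts into <file_identifier>_<scan_number>_<charge>_<rank>."""
    return f"{psm.file_identifier}_{psm.scan_number}_{psm.charge}_{psm.rank}"


def parse_psm_id(psm_id):
    """Split an identifier from the right so underscores in the file part survive."""
    file_identifier, scan, charge, rank = psm_id.rsplit("_", 3)
    return PsmId(file_identifier, int(scan), int(charge), int(rank))


parsed = parse_psm_id("my_interesting_raw_file_24326_2_1")
print(parsed)
print(format_psm_id(parsed))
```

For the example identifier above, this yields the file identifier my_interesting_raw_file, scan number 24326, charge 2 and rank 1.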

PIN and POUT file formats

Since version 1.15, Percolator has had its own XML input format, whose structure is defined by the schema percolator_in.xml.

Similarly, Percolator’s output (called POUT for Percolator-OUT) is defined by the schema percolator_out.xml.

If pin.xml is a valid Percolator XML file, Percolator can be run using the following options:


$ percolator [options] -k pin.xml -X output.xml

After a successful termination, output.xml will contain Percolator’s output formatted in POUT format; the file will be overwritten, or created if it does not already exist.
