psm_utils.io
Parsers for proteomics search results from various search engines.
This module provides a unified interface for reading and writing peptide-spectrum match (PSM) files from various proteomics search engines and analysis tools. It supports automatic file type detection and conversion between different formats.
The module includes:
Reader and writer classes for various PSM file formats
Automatic file type inference from filename patterns
File conversion utilities
Progress tracking for long operations
Type-safe interfaces with comprehensive error handling
Supported file formats include MaxQuant, MS²PIP, Percolator, mzIdentML, pepXML, and many others. See the documentation for a complete list of supported formats.
Examples
Read a PSM file with automatic format detection:
>>> from psm_utils.io import read_file
>>> psm_list = read_file("results.tsv")
Convert between file formats:
>>> from psm_utils.io import convert
>>> convert("input.msms", "output.mzid")
Write a PSMList to file:
>>> from psm_utils.io import write_file
>>> write_file(psm_list, "output.tsv")
- class psm_utils.io.FileType
Type definition for filetype properties.
- psm_utils.io.read_file(filename: str | Path, *args, filetype: str = 'infer', **kwargs) → PSMList
Read PSM file into PSMList.
- Parameters:
filename – Path to the PSM file to read.
filetype – File type specification. Can be any PSM file type with read support or “infer” to automatically detect from filename pattern. See documentation for supported file formats.
*args – Additional positional arguments passed to the PSM file reader.
**kwargs – Additional keyword arguments passed to the PSM file reader.
- Returns:
List of PSM objects parsed from the input file.
- Return type:
PSMList
- Raises:
PSMUtilsIOException – If filetype cannot be inferred or if the specified filetype is unknown or not supported for reading.
- psm_utils.io.write_file(psm_list: PSMList, filename: str | Path, *args, filetype: str = 'infer', show_progressbar: bool = False, **kwargs) → None
Write PSMList to PSM file.
- Parameters:
psm_list – List of PSM objects to be written to file.
filename – Path to the output file.
filetype – File type specification. Can be any PSM file type with write support or “infer” to automatically detect from filename pattern. See documentation for supported file formats.
show_progressbar – Whether to display a progress bar during the writing process.
*args – Additional positional arguments passed to the PSM file writer.
**kwargs – Additional keyword arguments passed to the PSM file writer.
- Raises:
PSMUtilsIOException – If filetype cannot be inferred or if the specified filetype is unknown or not supported for writing.
IndexError – If psm_list is empty and cannot provide an example PSM.
- psm_utils.io.convert(input_filename: str | Path, output_filename: str | Path, input_filetype: str = 'infer', output_filetype: str = 'infer', show_progressbar: bool = False) → None
Convert a PSM file from one format into another.
- Parameters:
input_filename – Path to the input PSM file.
output_filename – Path to the output PSM file.
input_filetype – Input file type specification. Can be any PSM file type with read support or “infer” to automatically detect from filename pattern. See documentation for supported file formats.
output_filetype – Output file type specification. Can be any PSM file type with write support or “infer” to automatically detect from filename pattern. See documentation for supported file formats.
show_progressbar – Whether to display a progress bar during the conversion process.
- Raises:
PSMUtilsIOException – If input or output filetypes cannot be inferred, if the specified filetypes are unknown or not supported, or if the input file is empty.
KeyError – If the specified filetype is not found in READERS or WRITERS dictionaries.
Examples
Convert a MaxQuant msms.txt file to an MS²PIP peprec file, while inferring the applicable file types from the file names:
>>> from psm_utils.io import convert
>>> convert("msms.txt", "filename_out.peprec")
Convert a MaxQuant msms.txt file to an MS²PIP peprec file, while explicitly specifying both file types:
>>> convert(
...     "filename_in.msms",
...     "filename_out.peprec",
...     input_filetype="msms",
...     output_filetype="peprec",
... )
Notes
Filetypes can only be inferred for specific file names and/or extensions, such as msms.txt or *.peprec.
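As a rough illustration of filename-based inference, the sketch below matches the lowercased base name against a table of glob patterns. FILENAME_PATTERNS and infer_filetype are hypothetical names, the patterns are examples rather than the library's actual table, and the sketch raises ValueError where psm_utils raises PSMUtilsIOException.

```python
from __future__ import annotations

from fnmatch import fnmatch
from pathlib import Path

# Hypothetical pattern table for illustration only; the real psm_utils
# mapping covers more formats and may use different patterns.
FILENAME_PATTERNS = {
    "msms": ["msms.txt", "*.msms"],
    "peprec": ["*.peprec"],
    "mzid": ["*.mzid"],
    "idxml": ["*.idxml"],
}


def infer_filetype(filename: str | Path) -> str:
    """Return the first filetype whose pattern matches the file name."""
    name = Path(filename).name.lower()  # match on the base name, case-insensitively
    for filetype, patterns in FILENAME_PATTERNS.items():
        if any(fnmatch(name, pattern) for pattern in patterns):
            return filetype
    raise ValueError(f"Could not infer filetype from {filename!r}; specify it explicitly.")
```

Passing an explicit filetype, as in the second convert() example, sidesteps inference entirely when a file name does not follow a recognized pattern.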
psm_utils.io.alphadia
Reader for PSM files from the AlphaDIA search engine.
- class psm_utils.io.alphadia.AlphaDIAReader(filename: str | Path, *args: Any, **kwargs: Any)
Reader for AlphaDIA precursor.tsv file.
- Parameters:
filename (pathlib.Path) – Path to PSM file.
*args – Additional positional arguments for parent class.
**kwargs – Additional keyword arguments for parent class.
psm_utils.io.cbor
Reader and writer for a simple, lossless psm_utils CBOR format.
Similar to the psm_utils.io.json module, this module provides a reader and
writer for PSMList objects in a lossless manner using
CBOR (Concise Binary Object Representation) format. CBOR provides better performance
and smaller file sizes compared to JSON while maintaining similar data structures.
The CBOR format stores PSMs as an array of objects, where each object represents a PSM with its attributes. Peptidoforms are written in the HUPO-PSI ProForma 2.0 notation. Fields that are not set (i.e., have a value of None) are omitted from the CBOR output to reduce file size.
Note: This module requires the cbor2 package to be installed.
- class psm_utils.io.cbor.CBORReader(filename: str | Path, *args, **kwargs)
Reader for psm_utils CBOR format.
- Parameters:
filename (str, pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to the base class.
**kwargs – Additional keyword arguments passed to the base class.
- class psm_utils.io.cbor.CBORWriter(filename: str | Path, *args, **kwargs)
Writer for psm_utils CBOR format.
- Parameters:
filename (str, pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to the base class.
**kwargs – Additional keyword arguments passed to the base class.
psm_utils.io.diann
Reader for PSM files from DIA-NN.
Reads the ‘.tsv’ file as defined on the DIA-NN documentation page.
Notes
DIA-NN calculates q-values at both the run and library level. The run-level q-value is used as the PSM q-value.
DIA-NN currently does not return precursor m/z values.
DIA-NN currently does not support C-terminal modifications in its searches.
- class psm_utils.io.diann.DIANNTSVReader(filename: str | Path, *args: Any, **kwargs: Any)
Reader for DIA-NN ‘.tsv’ file.
- Parameters:
filename (str or Path) – Path to PSM file.
*args – Additional positional arguments passed to the base class.
**kwargs – Additional keyword arguments passed to the base class.
psm_utils.io.flashlfq
Reader and writer for the FlashLFQ generic TSV format.
See the FlashLFQ documentation for more information on the format.
Notes
The FlashLFQ format does not contain the actual spectrum identifier. When reading a FlashLFQ file, the spectrum identifier is set to the row number in the file.
The FlashLFQ format does not contain the precursor m/z, but the theoretical monoisotopic mass. This value is not read into the PSM object, but can be calculated from the peptidoform.
To read from a FlashLFQ file, the Full Sequence column is expected to contain a ProForma v2-compatible peptidoform notation.
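Where a precursor m/z is needed, it can be derived from a neutral monoisotopic mass M and charge z with the standard relation m/z = (M + z · m_p) / z, where m_p is the proton mass. The helper below is a hypothetical sketch of that arithmetic, not a psm_utils function.

```python
# Proton mass in daltons (CODATA 2018 value)
PROTON_MASS = 1.007276466621


def mass_to_mz(monoisotopic_mass: float, charge: int) -> float:
    """Convert a neutral monoisotopic mass to the m/z of the [M+zH]z+ ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (monoisotopic_mass + charge * PROTON_MASS) / charge
```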
- class psm_utils.io.flashlfq.FlashLFQReader(filename: str | Path, *args, **kwargs)
Reader for FlashLFQ generic TSV files.
- Parameters:
filename (str or pathlib.Path) – Path to PSM file.
*args – Additional positional arguments for subclasses.
**kwargs – Additional keyword arguments for subclasses.
- class psm_utils.io.flashlfq.FlashLFQWriter(filename: str | Path, *args: Any, fdr_threshold: float = 0.01, only_targets: bool = True, **kwargs: Any)
Writer for FlashLFQ generic TSV format.
- Parameters:
filename (pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to the base class.
fdr_threshold – FDR threshold for filtering PSMs.
only_targets – If True, only target PSMs are written to file. If False, both target and decoy PSMs are written.
**kwargs – Additional keyword arguments passed to the base class.
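The combined effect of fdr_threshold and only_targets can be sketched as a simple filter. SimplePSM and select_for_flashlfq are hypothetical; the sketch assumes PSMs are kept when their q-value is at or below the threshold and that PSMs without a q-value are dropped, which may differ from the exact behavior of FlashLFQWriter.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SimplePSM:
    """Hypothetical stand-in for a PSM with only the fields needed here."""
    peptide: str
    qvalue: float | None
    is_decoy: bool


def select_for_flashlfq(psms, fdr_threshold=0.01, only_targets=True):
    """Keep PSMs at or below the q-value threshold, optionally targets only."""
    selected = []
    for psm in psms:
        if psm.qvalue is None or psm.qvalue > fdr_threshold:
            continue  # assumption: unscored PSMs and PSMs above the threshold are skipped
        if only_targets and psm.is_decoy:
            continue  # drop decoys when only targets are requested
        selected.append(psm)
    return selected
```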
psm_utils.io.fragpipe
Reader for PSM files from the FragPipe platform.
Reads the Philosopher psm.tsv file as defined on the FragPipe documentation page.
Notes
Decoy PSMs and q-values are not returned by FragPipe.
- class psm_utils.io.fragpipe.FragPipeReader(filename: str | Path, use_calibrated_mz: bool = True, *args: Any, **kwargs: Any)
Reader for MSFragger psm.tsv file.
- Parameters:
filename (pathlib.Path) – Path to PSM file.
use_calibrated_mz (bool) – Whether to use Calibrated Observed M/Z (True) or non-calibrated Observed m/z (False). Default is True.
*args – Additional positional arguments passed to the base class.
**kwargs – Additional keyword arguments passed to the base class.
psm_utils.io.idxml
Interface with OpenMS idXML PSM files.
Notes
idXML supports multiple peptide hits (identifications) per spectrum. Each peptide hit is parsed as an individual PSM object.
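To picture how several hits per spectrum become separate PSMs, the sketch below flattens a hypothetical nested structure (plain dictionaries standing in for pyopenms objects) into one record per hit, repeating the spectrum-level fields on each record. The rank field is simply the position of the hit in its spectrum's list, an illustrative assumption.

```python
# Hypothetical nested structure loosely mirroring idXML: each spectrum
# identification holds a list of peptide hits.
identifications = [
    {"spectrum_id": "scan=100", "hits": [
        {"sequence": "ACDEK", "score": 42.0},
        {"sequence": "ACDEQ", "score": 17.5},
    ]},
    {"spectrum_id": "scan=101", "hits": [{"sequence": "PEPTIDE", "score": 30.1}]},
]


def flatten_hits(identifications):
    """Yield one flat record per peptide hit, repeating the spectrum fields."""
    for identification in identifications:
        # rank = position of the hit within its spectrum's list of hits
        for rank, hit in enumerate(identification["hits"], start=1):
            yield {"spectrum_id": identification["spectrum_id"], "rank": rank, **hit}


records = list(flatten_hits(identifications))
```

Two hits on scan=100 therefore yield two records that share the same spectrum_id.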
- class psm_utils.io.idxml.IdXMLReader(filename: Path | str, *args: Any, **kwargs: Any)
Reader for idXML files.
- Parameters:
filename (str, pathlib.Path) – Path to idXML file.
*args – Additional positional arguments passed to the base class.
**kwargs – Additional keyword arguments passed to the base class.
Examples
>>> from psm_utils.io.idxml import IdXMLReader
>>> reader = IdXMLReader("example.idXML")
>>> psm_list = [psm for psm in reader]
- class psm_utils.io.idxml.IdXMLWriter(filename: str | Path, *args: Any, protein_ids: Any | None = None, peptide_ids: Any | None = None, **kwargs: Any)
Writer for idXML files.
- Parameters:
filename (pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to the base class.
protein_ids (Any | None) – Optional list of ProteinIdentification objects to be written to the idXML file.
peptide_ids (Any | None) – Optional list of PeptideIdentification objects to be written to the idXML file.
**kwargs – Additional keyword arguments passed to the base class.
Notes
Unlike other psm_utils.io writer classes, IdXMLWriter does not support writing a single PSM to a file with the write_psm() method. Only writing a full PSMList to a file at once with the write_file() method is currently supported.
If protein_ids and peptide_ids are provided, each PeptideIdentification object in the list peptide_ids will be updated with new rescoring_features from the PSMList. Otherwise, new pyopenms objects will be created, filled with information from the PSMList, and written to the idXML file.
Examples
Example with pyopenms objects:
>>> from psm_utils.io.idxml import IdXMLReader, IdXMLWriter
>>> reader = IdXMLReader("psm_utils/tests/test_data/test_in.idXML")
>>> psm_list = reader.read_file()
>>> for psm in psm_list:
...     psm.rescoring_features = {**psm.rescoring_features, "feature": 1}
>>> writer = IdXMLWriter(
...     "psm_utils/tests/test_data/test_out.idXML",
...     protein_ids=reader.protein_ids,
...     peptide_ids=reader.peptide_ids,
... )
>>> writer.write_file(psm_list)
Example without pyopenms objects:
>>> from psm_utils.io.idxml import IdXMLWriter
>>> from psm_utils.psm import PSM
>>> from psm_utils.psm_list import PSMList
>>> psm_list = PSMList(psm_list=[PSM(peptidoform="ACDK", spectrum_id="1", score=140.2, retention_time=600.2)])
>>> writer = IdXMLWriter("psm_utils/tests/test_data/test_out.idXML")
>>> writer.write_file(psm_list)
- write_psm(psm: PSM)
Write a single PSM to the PSM file.
This method is currently not supported (see Notes).
- Parameters:
psm – PSM object to write
- Raises:
NotImplementedError – IdXMLWriter currently does not support write_psm.
psm_utils.io.ionbot
Interface with ionbot PSM files.
Currently only supports the ionbot.first.csv files.
- class psm_utils.io.ionbot.IonbotReader(filename: str | Path, *args, **kwargs)
Reader for ionbot.first.csv PSM files.
- Parameters:
filename (pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to parent class.
**kwargs – Additional keyword arguments passed to parent class.
Examples
IonbotReader supports iteration:
>>> from psm_utils.io.ionbot import IonbotReader
>>> for psm in IonbotReader("ionbot.first.csv"):
...     print(psm.peptidoform.proforma)
ACDEK
AC[Carbamidomethyl]DEFGR
[Acetyl]-AC[Carbamidomethyl]DEFGHIK
Or a full file can be read at once into a psm_utils.psm_list.PSMList object:
>>> ionbot_reader = IonbotReader("ionbot.first.csv")
>>> psm_list = ionbot_reader.read_file()
- exception psm_utils.io.ionbot.InvalidIonbotModificationError
Exception raised when ionbot modification parsing fails.
This exception is raised when:
- the modification format is invalid,
- position values are out of range, or
- the modification string structure is malformed.
psm_utils.io.json
Reader and writer for a simple, lossless psm_utils JSON format.
Similar to the psm_utils.io.tsv and psm_utils.io.parquet modules,
this module provides a reader and writer for PSMList
objects in a lossless manner using JSON format. JSON provides human-readable output
and is widely compatible across platforms and programming languages.
The JSON format stores PSMs as an array of objects, where each object represents a PSM with its attributes. Peptidoforms are written in the HUPO-PSI ProForma 2.0 notation. Fields that are not set (i.e., have a value of None) are omitted from the JSON output to reduce file size.
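The None-omission rule can be illustrated with the standard json module. The field names and the psm_to_json_record helper below are illustrative assumptions rather than the exact psm_utils schema; note that only None is dropped, so False and 0 are still written.

```python
import json


def psm_to_json_record(psm_fields: dict) -> dict:
    """Drop fields whose value is None before serialization."""
    return {key: value for key, value in psm_fields.items() if value is not None}


# Hypothetical PSM fields; the peptidoform is a ProForma 2.0 string with charge
psms = [
    {"peptidoform": "AC[Carbamidomethyl]DEFGR/3", "spectrum_id": "peptide2",
     "score": 12.3, "retention_time": None, "is_decoy": False},
]
serialized = json.dumps([psm_to_json_record(p) for p in psms])
```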
- class psm_utils.io.json.JSONReader(filename: str | Path, *args, **kwargs)
Reader for psm_utils JSON format.
- Parameters:
filename (str, pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to the base class.
**kwargs – Additional keyword arguments passed to the base class.
psm_utils.io.maxquant
Interface to MaxQuant msms.txt PSM files.
- class psm_utils.io.maxquant.MSMSReader(filename: str | Path, *args, **kwargs)
Initialize reader for MaxQuant msms.txt PSM files.
- Parameters:
filename (pathlib.Path) – Path to the MaxQuant msms.txt PSM file.
*args – Additional positional arguments passed to parent class.
**kwargs – Additional keyword arguments passed to parent class.
Examples
MSMSReader supports iteration:
>>> from psm_utils.io.maxquant import MSMSReader
>>> for psm in MSMSReader("msms.txt"):
...     print(psm.peptidoform.proforma)
WFEELSK
NDVPLVGGK
GANLGEMTNAGIPVPPGFC[+57.022]VTAEAYK
...
Or a full file can be read at once into a PSMList object:
>>> reader = MSMSReader("msms.txt")
>>> psm_list = reader.read_file()
psm_utils.io.msamanda
Interface to MS Amanda CSV result files.
- class psm_utils.io.msamanda.MSAmandaReader(filename: str | Path, *args, **kwargs)
Initialize reader for MS Amanda CSV result files.
- Parameters:
filename (pathlib.Path) – Path to the MS Amanda CSV file.
*args – Additional positional arguments passed to parent class.
**kwargs – Additional keyword arguments passed to parent class.
psm_utils.io.mzid
Reader and writer for HUPO-PSI mzIdentML format PSM files.
See psidev.info/mzidentml for more info on the format.
- class psm_utils.io.mzid.MzidReader(filename: str | Path, *args: Any, score_key: str | None = None, **kwargs: Any)
Reader for HUPO-PSI mzIdentML format PSM files.
- Parameters:
filename (pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to parent class.
score_key – Name of the score metric to use as PSM score. If not provided, the score metric is inferred from the file if one of the child parameters of MS:1001143 is present.
**kwargs – Additional keyword arguments passed to parent class.
Examples
MzidReader supports iteration:
>>> from psm_utils.io.mzid import MzidReader
>>> for psm in MzidReader("peptides_1_1_0.mzid"):
...     print(psm.peptidoform.proforma)
ACDEK
AC[Carbamidomethyl]DEFGR
[Acetyl]-AC[Carbamidomethyl]DEFGHIK
Or a full file can be read at once into a psm_utils.psm_list.PSMList object:
>>> mzid_reader = MzidReader("peptides_1_1_0.mzid")
>>> psm_list = mzid_reader.read_file()
Notes
MzidReader looks for the retention time or scan start time cvParams at both the SpectrumIdentificationResult and SpectrumIdentificationItem levels. Note that, according to the mzIdentML specification document (v1.1.1), neither cvParam is expected at either level.
For the PSM.spectrum_id property, the spectrum title cvParam is preferred over the spectrumID attribute, as these titles always match the titles in the peak list files. spectrumID is then saved in PSM.metadata["mzid_spectrum_id"]. If spectrum title is absent, spectrumID is saved to PSM.spectrum_id.
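The spectrum identifier preference in these notes can be summarized as a small function. resolve_spectrum_id is hypothetical, and returning empty metadata when no title is present is an assumption.

```python
from __future__ import annotations


def resolve_spectrum_id(spectrum_title: str | None, spectrum_id_attr: str) -> tuple[str, dict]:
    """Return (spectrum_id, metadata), preferring the spectrum title cvParam."""
    if spectrum_title is not None:
        # Title wins; keep the mzIdentML spectrumID attribute in metadata
        return spectrum_title, {"mzid_spectrum_id": spectrum_id_attr}
    # No title: fall back to the spectrumID attribute (assumed: no extra metadata)
    return spectrum_id_attr, {}
```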
- class psm_utils.io.mzid.MzidWriter(filename: str | Path, *args: Any, show_progressbar: bool = False, **kwargs: Any)
Writer for mzIdentML PSM files.
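- Parameters:
filename (pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to parent class.
show_progressbar – Whether to display a progress bar while writing.
**kwargs – Additional keyword arguments passed to parent class.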
Notes
Unlike other psm_utils.io writer classes, MzidWriter does not support writing a single PSM to a file with the write_psm() method. Only writing a full PSMList to a file at once with the write_file() method is currently supported.
While not required according to the mzIdentML specification document (v1.1.1), the retention time is written as cvParam retention time to the SpectrumIdentificationItem element. As the actual unit is not known in psm_utils, the unit is written as seconds.
As the actual PSM score type is not known in psm_utils, the score is written as cvParam MS:1001153 to the SpectrumIdentificationItem element.
- write_psm(psm: PSM)
Write a single PSM to the PSM file.
This method is currently not supported (see Notes).
- Raises:
NotImplementedError – MzidWriter currently does not support write_psm.
psm_utils.io.parquet
Reader and writer for a simple, lossless psm_utils Parquet format.
Similar to the psm_utils.io.tsv module, this module provides a reader and writer
for PSMList objects in a lossless manner. However, Parquet provides
better performance and storage efficiency compared to TSV, and is recommended for large datasets.
- class psm_utils.io.parquet.ParquetReader(filename: str | Path, *args, **kwargs)
Reader for Parquet files.
- Parameters:
filename (pathlib.Path) – Path to the Parquet file.
*args – Additional positional arguments passed to the base class.
**kwargs – Additional keyword arguments passed to the base class.
- class psm_utils.io.parquet.ParquetWriter(filename: str | Path, *args, chunk_size: int = 1000000, **kwargs)
Writer for Parquet files.
- Parameters:
filename (pathlib.Path) – Path to the Parquet file.
*args – Additional positional arguments passed to the base class.
chunk_size – Number of PSMs to write in a single batch. Default is 1e6.
**kwargs – Additional keyword arguments passed to the base class.
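The chunk_size parameter bounds how many PSMs are buffered per write. The generator below is a minimal sketch of that batching, not the implementation of ParquetWriter; in a real writer each chunk would presumably be converted to an Arrow table and appended to the file (for example with pyarrow.parquet.ParquetWriter.write_table).

```python
from itertools import islice


def iter_chunks(items, chunk_size=1_000_000):
    """Yield lists of at most chunk_size items, consuming the iterable lazily."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    iterator = iter(items)
    # islice pulls the next chunk_size items; an empty list ends the loop
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk
```

Smaller chunks lower peak memory at the cost of more, smaller row groups.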
psm_utils.io.peptide_record
Interface with Peptide Record PSM files.
Peptide Record (or PEPREC) is a legacy PSM file type developed at CompOmics as input format for MS²PIP. It is a simple and flexible delimited text file where each row represents a single PSM. Required columns are:
spec_id: Spectrum identifier; usually the identifier used in the spectrum file.
peptide: Simple, stripped peptide sequence (e.g., ACDE).
modifications: Amino acid modifications in a custom format (see below).
Depending on the use case, more columns can be required or optional:
charge: Peptide precursor charge.
observed_retention_time: Observed retention time.
predicted_retention_time: Predicted retention time.
label: Target/decoy: 1 for target PSMs, -1 for decoy PSMs.
score: Primary search engine score (e.g., the score used for q-value calculation).
Peptide modifications are denoted as a pipe-separated list of pipe-separated
location → label pairs for each modification. The location is an integer counted
starting at 1 for the first amino acid. 0 is reserved for N-terminal modifications
and -1 for C-terminal modifications. Unmodified peptides can be marked with a hyphen
(-). For example:
| PEPREC modification(s) | Explanation |
|---|---|
| - | Unmodified |
Full PEPREC example:
spec_id,modifications,peptide,charge
peptide1,-,ACDEK,2
peptide2,2|Carbamidomethyl,ACDEFGR,3
peptide3,0|Acetyl|2|Carbamidomethyl,ACDEFGHIK,2
Attention
Labile, unlocalized, and fixed modifications are not encoded in the Peptide Record
notation. To encode fixed modifications, use
apply_fixed_modifications() before writing to
Peptide Record.
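The location → label rules can be turned into a ProForma string with a few lines of Python. This is a simplified sketch under stated assumptions: it handles only named N-terminal (0), C-terminal (-1), and residue modifications, keeps at most one modification per site, and returns a string instead of the Peptidoform object that peprec_to_proforma() returns. Both function names are hypothetical.

```python
from __future__ import annotations


def parse_peprec_modifications(modifications: str) -> list[tuple[int, str]]:
    """Split a PEPREC modification string into (location, label) pairs."""
    if modifications == "-":
        return []  # hyphen marks an unmodified peptide
    parts = modifications.split("|")
    if len(parts) % 2:
        raise ValueError(f"Unpaired location/label in {modifications!r}")
    return [(int(loc), label) for loc, label in zip(parts[0::2], parts[1::2])]


def peprec_to_proforma_string(peptide: str, modifications: str) -> str:
    """Build a simple ProForma string from PEPREC peptide and modifications."""
    residue_mods: dict[int, str] = {}
    n_term = c_term = ""
    for location, label in parse_peprec_modifications(modifications):
        if location == 0:
            n_term = f"[{label}]-"
        elif location == -1:
            c_term = f"-[{label}]"
        elif 1 <= location <= len(peptide):
            residue_mods[location] = f"[{label}]"  # a later label on the same site overwrites
        else:
            raise ValueError(f"Location {location} outside peptide {peptide!r}")
    # Locations are 1-based, so enumerate from 1
    body = "".join(aa + residue_mods.get(i, "") for i, aa in enumerate(peptide, start=1))
    return n_term + body + c_term
```

Applied to the three rows of the full PEPREC example, this yields the same strings that the PeptideRecordReader iteration example prints.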
- class psm_utils.io.peptide_record.PeptideRecordReader(filename: str | Path, *args, **kwargs)
Reader for Peptide Record PSM files.
- Parameters:
filename (pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to parent class.
**kwargs – Additional keyword arguments passed to parent class.
Examples
PeptideRecordReader supports iteration:
>>> from psm_utils.io.peptide_record import PeptideRecordReader
>>> for psm in PeptideRecordReader("peprec.txt"):
...     print(psm.peptidoform.proforma)
ACDEK
AC[Carbamidomethyl]DEFGR
[Acetyl]-AC[Carbamidomethyl]DEFGHIK
Or a full file can be read at once into a PSMList object:
>>> peprec_reader = PeptideRecordReader("peprec.txt")
>>> psm_list = peprec_reader.read_file()
- class psm_utils.io.peptide_record.PeptideRecordWriter(filename: str | Path, *args, **kwargs)
Writer for Peptide Record PSM files.
- Parameters:
filename (pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to parent class.
**kwargs – Additional keyword arguments passed to parent class.
- write_psm(psm: PSM) → None
Write a single PSM to new or existing Peptide Record PSM file.
- Parameters:
psm – PSM object to write.
Examples
To write single PSMs to a file, PeptideRecordWriter must be opened as a context manager. Then, within the context, write_psm() can be called:
>>> with PeptideRecordWriter("peprec.txt") as writer:
...     writer.write_psm(psm)
- psm_utils.io.peptide_record.peprec_to_proforma(peptide: str, modifications: str, charge: int | None = None) → Peptidoform
Convert Peptide Record notation to Peptidoform.
- Parameters:
peptide – Stripped peptide sequence.
modifications – Modifications in Peptide Record notation (e.g., 4|Oxidation).
charge – Precursor charge state.
- Returns:
peptidoform – Peptidoform object.
- Return type:
Peptidoform
- Raises:
InvalidPeprecModificationError – If a PEPREC modification cannot be parsed.
- psm_utils.io.peptide_record.proforma_to_peprec(peptidoform: Peptidoform) → tuple[str, str, int | None]
Convert Peptidoform to Peptide Record notation.
- Parameters:
peptidoform – Input peptidoform object.
- Returns:
peptide – Stripped peptide sequence.
modifications – Modifications in Peptide Record notation.
charge – Precursor charge state, if available, else None.
Notes
Labile, unlocalized, and fixed modifications are not encoded in the Peptide Record notation. To encode fixed modifications, use apply_fixed_modifications() before writing to Peptide Record.
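Going in the other direction, a simple ProForma string can be split back into PEPREC fields. The sketch below is hypothetical and deliberately limited: it accepts only bracketed labels on residues and termini plus an optional /charge suffix, and rejects any other ProForma syntax (such as labile, unlocalized, or fixed modifications) with ValueError. Unlike proforma_to_peprec(), it takes a string rather than a Peptidoform object.

```python
from __future__ import annotations

import re

_BODY = re.compile(r"(?:[A-Z](?:\[[^\]]+\])?)+")  # residues with optional [label]
_RESIDUE = re.compile(r"([A-Z])(?:\[([^\]]+)\])?")


def proforma_string_to_peprec(proforma: str) -> tuple[str, str, int | None]:
    """Convert a simple ProForma string to (peptide, modifications, charge)."""
    charge_match = re.search(r"/(\d+)$", proforma)
    charge = int(charge_match.group(1)) if charge_match else None
    if charge_match:
        proforma = proforma[: charge_match.start()]
    mods: list[tuple[int, str]] = []
    n_term = re.match(r"\[([^\]]+)\]-", proforma)  # "[label]-" prefix -> location 0
    if n_term:
        mods.append((0, n_term.group(1)))
        proforma = proforma[n_term.end():]
    c_term = re.search(r"-\[([^\]]+)\]$", proforma)  # "-[label]" suffix -> location -1
    if c_term:
        proforma = proforma[: c_term.start()]
    if not _BODY.fullmatch(proforma):
        raise ValueError(f"Unsupported ProForma syntax: {proforma!r}")
    peptide = []
    for position, match in enumerate(_RESIDUE.finditer(proforma), start=1):
        peptide.append(match.group(1))
        if match.group(2):
            mods.append((position, match.group(2)))
    if c_term:
        mods.append((-1, c_term.group(1)))
    modifications = "|".join(f"{loc}|{label}" for loc, label in mods) or "-"
    return "".join(peptide), modifications, charge
```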
- psm_utils.io.peptide_record.from_dataframe(peprec_df: DataFrame) → PSMList
Convert Peptide Record Pandas DataFrame into PSMList.
- Parameters:
peprec_df – Peptide Record DataFrame.
- Returns:
psm_list – PSMList object.
- Return type:
PSMList
- psm_utils.io.peptide_record.to_dataframe(psm_list: PSMList) → DataFrame
Convert PSMList object into Peptide Record Pandas DataFrame.
- Parameters:
psm_list – Input PSMList object.
- Returns:
Peptide Record DataFrame.
- Return type:
pd.DataFrame
Examples
>>> from psm_utils.io.peptide_record import PeptideRecordReader, to_dataframe
>>> psm_list = PeptideRecordReader("peprec.csv").read_file()
>>> to_dataframe(psm_list)
    spec_id    peptide               modifications  charge  label  ...
0  peptide1      ACDEK                           -       2      1  ...
1  peptide2    ACDEFGR           2|Carbamidomethyl       3      1  ...
2  peptide3  ACDEFGHIK  0|Acetyl|2|Carbamidomethyl       2      1  ...
psm_utils.io.pepxml
Interface with TPP pepXML PSM files.
- class psm_utils.io.pepxml.PepXMLReader(filename: str | Path, *args: Any, score_key: str | None = None, **kwargs: Any)
Reader for pepXML PSM files.
- Parameters:
filename (pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to parent class.
score_key – Name of the score metric to use as PSM score. If not provided, the score metric is inferred from a list of known search engine scores.
**kwargs – Additional keyword arguments passed to parent class.
psm_utils.io.percolator
Reader and writer for Percolator Tab PIN/POUT PSM files.
The tab-delimited input and output format for Percolator are defined on the Percolator GitHub Wiki pages.
Notes
While PercolatorTabReader supports reading the peptide notation with preceding and following amino acids (e.g., R.ACDEK.F), these amino acids are not stored and are not written by PercolatorTabWriter.
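Stripping the flanking residues can be sketched as follows, assuming exactly one flanking character (an amino acid, or - for a protein terminus) on each side of the dots. The helper name is hypothetical and this is not the reader's actual code.

```python
def strip_flanking_residues(peptide: str) -> str:
    """Remove Percolator-style flanking residues, e.g. 'R.ACDEK.F' -> 'ACDEK'."""
    # Assumed layout: one flanking character plus a dot on each side of the core
    if len(peptide) >= 5 and peptide[1] == "." and peptide[-2] == ".":
        return peptide[2:-2]
    return peptide  # no flanking notation detected; return unchanged
```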
- class psm_utils.io.percolator.PercolatorTabReader(filename: str | Path, *args: Any, score_column: str | None = None, retention_time_column: str | None = None, mz_column: str | None = None, **kwargs: Any)
Reader for Percolator Tab PIN/POUT PSM file.
As the score, retention time, and precursor m/z are often embedded as feature columns, but not with a fixed column name, their respective column names need to be provided as parameters to the class. If not provided, these properties will not be added to the resulting PSM. Nevertheless, they will still be added to its rescoring_features property dictionary, along with the other features.
- Parameters:
filename (pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to parent class.
score_column – Name of the column that holds the primary PSM score.
retention_time_column – Name of the column that holds the retention time.
mz_column – Name of the column that holds the precursor m/z.
**kwargs – Additional keyword arguments passed to parent class.
- class psm_utils.io.percolator.PercolatorTabWriter(filename: str | Path, *args: Any, style: str | None = None, feature_names: list[str] | None = None, add_basic_features: bool = False, **kwargs: Any)
Writer for Percolator TSV “PIN” and “POUT” PSM files.
- Parameters:
filename (pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to parent class.
style – Percolator Tab style. One of {pin, pout}. If pin, the columns SpecId, Label, ScanNr, ChargeN, PSMScore, Peptide, and Proteins are written alongside the requested feature names (see feature_names). If pout, the columns PSMId, Label, score, q-value, posterior_error_prob, peptide, and proteinIds are written. By default, the style is inferred from the file name extension.
feature_names – List of feature names to extract from PSMs and write to file. List values should correspond to keys in the rescoring_features property. If None, no rescoring features will be written to the file. If appending to an existing file, the existing header will be used to determine the feature names. Only has effect with pin style.
add_basic_features – If True, add PSMScore and ChargeN features to the file. Only has effect with pin style.
**kwargs – Additional keyword arguments passed to parent class.
- psm_utils.io.percolator.join_pout_files(target_filename: str | Path, decoy_filename: str | Path, output_filename: str | Path) None
Join target and decoy Percolator Out (POUT) files into a single PercolatorTab file.
- Parameters:
target_filename – Path to target POUT file.
decoy_filename – Path to decoy POUT file.
output_filename – Path to output combined POUT file.
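A possible shape for the join step is sketched below. It assumes, based on the pout column list of PercolatorTabWriter, that the combined file carries a Label column with 1 for target and -1 for decoy PSMs. join_pout is hypothetical, works on in-memory text instead of file paths, and keeps any trailing proteinIds columns as-is.

```python
import csv
import io


def join_pout(target_text: str, decoy_text: str) -> str:
    """Concatenate target and decoy POUT tables, inserting a Label column."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    header_written = False
    for text, label in ((target_text, 1), (decoy_text, -1)):
        rows = csv.reader(io.StringIO(text), delimiter="\t")
        header = next(rows)
        if not header_written:
            # Insert Label right after PSMId, matching the pout column order
            writer.writerow([header[0], "Label", *header[1:]])
            header_written = True
        for row in rows:
            if not row:
                continue  # skip blank lines
            writer.writerow([row[0], label, *row[1:]])  # extra protein columns kept
    return out.getvalue()
```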
psm_utils.io.proteome_discoverer
Reader for Proteome Discoverer MSF PSM files.
This module provides functionality to read PSM data from Proteome Discoverer MSF SQLite database files.
The reader supports both target and decoy peptides, handles various modification types (amino acid and terminal modifications), and extracts complete scoring information from the MSF database structure.
Examples
>>> from psm_utils.io.proteome_discoverer import MSFReader
>>> reader = MSFReader("results.msf")
>>> psm_list = reader.read_file()
>>> for psm in reader:
... print(f"{psm.peptidoform} - Score: {psm.score}")
Notes
MSF file versions 79, 53, and 8 are currently supported.
- class psm_utils.io.proteome_discoverer.MSFReader(filename: str | Path, *args, **kwargs)
Initialize MSF reader with database connection and version validation.
- Parameters:
filename (pathlib.Path) – Path to Proteome Discoverer MSF file.
*args – Additional positional arguments passed to parent class.
**kwargs – Additional keyword arguments passed to parent class.
psm_utils.io.proteoscape
Reader for ProteoScape Parquet files.
- class psm_utils.io.proteoscape.ProteoScapeReader(filename: str | Path, *args: Any, **kwargs: Any)
Reader for ProteoScape Parquet files.
- Parameters:
filename (pathlib.Path) – Path to ProteoScape Parquet file.
*args – Additional positional arguments passed to the base class.
**kwargs – Additional keyword arguments passed to the base class.
psm_utils.io.sage
Reader for PSM files from the Sage search engine.
Reads the results.sage.tsv file as defined on the
Sage documentation page.
- class psm_utils.io.sage.SageTSVReader(filename: str | Path, *args: Any, score_column: str = 'sage_discriminant_score', **kwargs: Any)
Reader for Sage results file.
- Parameters:
filename (pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to parent class.
score_column – Name of the column that holds the primary PSM score. Default is sage_discriminant_score; hyperscore could also be used.
**kwargs – Additional keyword arguments passed to parent class.
- psm_utils.io.sage.SageReader
alias of SageTSVReader
- class psm_utils.io.sage.SageParquetReader(filename: str | Path, *args: Any, score_column: str = 'sage_discriminant_score', **kwargs: Any)
Reader for Sage results file in Parquet format.
- Parameters:
filename (pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to parent class.
score_column – Name of the column that holds the primary PSM score. Default is sage_discriminant_score; hyperscore could also be used.
**kwargs – Additional keyword arguments passed to parent class.
psm_utils.io.tsv
Reader and writer for a simple, lossless psm_utils TSV format.
Most PSM file formats will introduce a loss of some information when reading,
writing, or converting with psm_utils.io due to differences between file
formats. In contrast, PSMList objects can be written
to, or read from, this simple TSV format without any information loss (with the exception
of the free-form spectrum attribute).
The format follows basic TSV rules, using tab as delimiter, and supports quoting when a field contains the delimiter. Peptidoforms are written in the HUPO-PSI ProForma 2.0 notation.
Required and optional columns equate to the required and optional attributes of
PSM. Dictionary items in
provenance_data, metadata, and rescoring_features
are flattened to separate columns, each with their column names prefixed with
provenance:, meta:, and rescoring:, respectively.
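The column-prefixing rule can be sketched with plain dictionaries. The mapping follows the prefixes named in this section; flatten_psm is a hypothetical helper and the example values are taken from the second TSV example in this section.

```python
# Attribute name -> column prefix, as described for the psm_utils TSV format
PREFIXES = {"provenance_data": "provenance", "metadata": "meta", "rescoring_features": "rescoring"}


def flatten_psm(psm: dict) -> dict:
    """Flatten dict-valued PSM attributes into prefixed column names."""
    row = {}
    for key, value in psm.items():
        if key in PREFIXES:
            # Empty or missing dictionaries contribute no columns
            for subkey, subvalue in (value or {}).items():
                row[f"{PREFIXES[key]}:{subkey}"] = subvalue
        else:
            row[key] = value
    return row
```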
Examples
psm_utils TSV file, compatible with HUPO-PSI Universal Spectrum Identifier:
peptidoform spectrum_id run collection
VLHPLEGAVVIIFK/2 17555 Adult_Frontalcortex_bRP_Elite_85_f09 PXD000561
...
peptidoform spectrum_id run collection spectrum is_decoy score precursor_mz retention_time protein_list source provenance:filename rescoring:ExpMass rescoring:CalcMass rescoring:hyperscore rescoring:deltaScore rescoring:frac_ion_b rescoring:frac_ion_y rescoring:Mass rescoring:dM rescoring:absdM rescoring:PepLen rescoring:Charge2 rescoring:Charge3 rescoring:Charge4 rescoring:enzN rescoring:enzC rescoring:enzInt
RNVIDKVAK/2 _3_2_1 False 20.3 1042.64 ['DECOY_sp|Q8U0H4_REVERSED|RTCB_PYRFU-tRNA-splicing-ligase-RtcB-OS=Pyrococcus-furiosus...'] percolator pyro.t.xml.pin 1042.64 1042.64 20.3 6.6 0.444444 0.333333 1042.64 0.0003 0.0003 9 1 0 0 1 0 1
KHLEQHPK/2 _4_2_1 False 26.5 1016.56 ['sp|Q8TZD9|RS15_PYRFU-30S-ribosomal-protein-S15-OS=Pyrococcus-furiosus-(strain-ATCC...'] percolator pyro.t.xml.pin 1016.56 1016.56 26.5 18.5 0.375 0.75 1016.56 0.001 0.001 8 1 0 0 1 0 0
...
- class psm_utils.io.tsv.TSVReader(filename: str | Path, *args, **kwargs)
Reader for psm_utils TSV format.
- Parameters:
filename (str or pathlib.Path) – Path to PSM file.
*args – Additional positional arguments for subclasses.
**kwargs – Additional keyword arguments for subclasses.
- class psm_utils.io.tsv.TSVWriter(filename: str | Path, *args: Any, example_psm: PSM | None = None, **kwargs: Any)
Writer for psm_utils TSV format.
- Parameters:
filename (pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to the base class.
example_psm – Example PSM, required to extract the column names when writing to a new file. Should contain all fields that are to be written to the PSM file, i.e., all items in the provenance_data, metadata, and rescoring_features attributes. In other words, items that are not present in the example PSM will not be written to the file, even though they are present in other PSMs passed to write_psm() or write_file().
**kwargs – Additional keyword arguments passed to the base class.
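The consequence of example_psm can be reproduced with csv.DictWriter: the header is fixed from one example row and keys missing from it are silently dropped. This sketch works on already-flattened rows and only mirrors the behavior of TSVWriter conceptually; write_rows is hypothetical.

```python
import csv
import io


def write_rows(rows: list, example_row: dict) -> str:
    """Write rows as TSV using only the columns present in example_row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(example_row),  # header taken from the example only
        delimiter="\t",
        extrasaction="ignore",  # silently drop keys absent from the example
        restval="",  # leave missing values empty
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Here a rescoring:extra key present only in the second row is not written, because the example row lacks it.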
psm_utils.io.xtandem
Interface with X!Tandem XML PSM files.
Notes
In X!Tandem XML, N/C-terminal modifications are encoded as normal modifications and are therefore parsed accordingly. Any information on which modifications are N/C-terminal is therefore lost.
N-terminal modification in X!Tandem XML:
<aa type="M" at="1" modified="42.01057" />
Consecutive modifications, i.e., a modified residue that is modified further, are encoded in X!Tandem XML as two distinct modifications on the same site. However, psm_utils does not support multiple modifications on the same site. While parsing X!Tandem XML PSMs, the mass shifts of these two modifications are therefore summed into a single modification.
For example, carbamidomethylation of cysteine (57.02200) plus ammonia loss (-17.02655) is parsed as one modification with mass shift +39.99545, which corresponds to the combined modification Pyro-carbamidomethyl (39.994915):
<aa type="C" at="189" modified="57.02200" />
<aa type="C" at="189" modified="-17.02655" />
[+39.99545]
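The summing step can be sketched as grouping mass shifts by residue and position. merge_site_modifications is a hypothetical helper; formatting to five decimals with an explicit sign is an assumption chosen to produce a ProForma-style mass label.

```python
from collections import defaultdict


def merge_site_modifications(modifications):
    """Sum mass shifts that X!Tandem reports separately for the same residue."""
    merged = defaultdict(float)
    for residue, position, mass_shift in modifications:
        merged[(residue, position)] += mass_shift
    # Format each summed shift as a signed ProForma mass label
    return {key: f"[{shift:+.5f}]" for key, shift in merged.items()}
```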
- class psm_utils.io.xtandem.XTandemReader(filename: str | Path, *args: Any, decoy_prefix: str = 'DECOY_', score_key: str = 'expect', **kwargs: Any)
Reader for X!Tandem XML PSM files.
- Parameters:
filename (pathlib.Path) – Path to PSM file.
*args – Additional positional arguments passed to parent class.
decoy_prefix – Protein name prefix used to denote decoy protein entries. Default: "DECOY_".
score_key – Key of score to use as PSM score. One of "expect", "hyperscore", "delta", or "nextscore". Default: "expect". The "expect" score (e-value) is converted to its negative natural logarithm to facilitate downstream analysis.
**kwargs – Additional keyword arguments passed to parent class.
Examples
XTandemReader supports iteration:
>>> from psm_utils.io.xtandem import XTandemReader
>>> for psm in XTandemReader("pyro.t.xml"):
...     print(psm.peptidoform.proforma)
WFEELSK
NDVPLVGGK
GANLGEMTNAGIPVPPGFC[+57.022]VTAEAYK
...
Or a full file can be read at once into a PSMList object:
>>> reader = XTandemReader("pyro.t.xml")
>>> psm_list = reader.read_file()
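The transformation applied to the expect score is a negative natural logarithm, which maps smaller (better) e-values to larger scores. A minimal sketch, with a hypothetical function name:

```python
import math


def expect_to_score(expect: float) -> float:
    """Transform an X!Tandem e-value so that higher scores mean better matches."""
    if expect <= 0:
        raise ValueError("expect must be positive")
    return -math.log(expect)  # natural logarithm, as described for score_key="expect"
```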