BeStSel Tutorial
TUTORIAL

latest update: 27-03-2022

Circular Dichroism (CD) spectroscopy is a widely used technique for the study of protein structure.
Numerous algorithms have been developed for the estimation of the secondary structure composition
from CD spectra. These methods often fail to provide acceptable results on α/β mixed or β-structure-
rich proteins. The problem arises from the spectral diversity of β-structures. In Micsonai et al., (2015)
Proc. Natl. Acad. Sci. USA 112, E3095-E3103, we have shown that the parallel/antiparallel orientation
and the twisting of β-sheets account for the observed spectral diversity. We developed the Beta
Structure Selection (BeStSel) method for the secondary structure estimation that takes the twist of β-
structures into account. This method can reliably distinguish parallel and antiparallel β-sheets and
provides an improved secondary structure estimation for a broad range of proteins. Moreover, the
secondary structure components used by the method are characteristic of the protein fold, and thus
the fold can be predicted down to the topology level of the CATH classification (Orengo et al., (1997)
Structure 5(8):1093-1108) from a single CD spectrum.

In publications that use the BeStSel method for secondary structure analysis, please cite Micsonai et
al., (2015) Proc. Natl. Acad. Sci. USA 112, E3095-E3103 and Micsonai et al., (2018) Nucleic Acids Res.
46, W315–W322.

Here, we provide a brief introduction to the use of the BeStSel web server [Link]. The
server is under development. Although we make every effort to keep it functioning correctly, we take
no responsibility for prediction errors or software problems. We highly appreciate any questions
or suggestions on the use of the server, as well as bug reports. Please feel free to send us a message
through the homepage (Contact page) or by email to kardos@[Link] or micsonai@[Link].

For all details on the BeStSel method beyond this tutorial, please see the Information provided on
the web server pages and refer to the original publications of Micsonai et al.

Table of contents
Introduction
Single spectrum analysis
    Input units
    Results format
    Data in text
    Wavelength range, scale factor, best factor
    Fold recognition
Multiple spectra analysis
Fold recognition
Guide to CD and data analysis
Secondary structure from PDB files
Extinction coefficient calculator
Disordered-ordered classification
Cited by…

Introduction
First, choose one of the 8 modules of the server, which are listed on the left side of the starting page:
Single spectrum analysis, Multiple spectra analysis, Fold recognition, Guide to CD and data analysis,
Secondary structure from PDB files, Extinction coefficient calculator, Disordered-ordered classification
and Cited by.

A language selector located in the top left corner is currently under development. In the future, the
user will be able to select from English, Hungarian, French, German, Japanese, Korean or Chinese.
Currently the web page is only available in English.

Single spectrum analysis
In Single spectrum analysis, a single CD spectrum can be analyzed for the secondary structure
composition and the protein fold can be predicted.

Data can be uploaded from a text file or pasted into the window as two data columns. The separator
can be a space, tab, comma or semicolon. Please use a dot as the decimal point. When a data file in text
format is uploaded, the system automatically recognizes the header and the data columns.
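
Before uploading, it can help to check locally that a file follows this two-column layout. The following is a minimal sketch only, not the server's parser: the function name `parse_two_column` is hypothetical, and the rule that any line without exactly two numeric fields counts as header is an assumption.

```python
import re


def parse_two_column(text):
    """Return (wavelengths, values) read from two-column text.

    Fields may be separated by spaces, tabs, commas or semicolons.
    Lines that do not hold exactly two numbers (for example a header)
    are skipped. This skipping rule is an assumption of this sketch.
    """
    wavelengths, values = [], []
    for line in text.splitlines():
        fields = [f for f in re.split(r"[ \t,;]+", line.strip()) if f]
        if len(fields) != 2:
            continue  # not a data row
        try:
            wavelength, value = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # header or comment line
        wavelengths.append(wavelength)
        values.append(value)
    return wavelengths, values
```

Because the comma is treated as a column separator, a decimal comma would be misread, which is why the dot must be used as the decimal point.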

Input units

You can choose the appropriate Input units from the pop-up menu:

Delta epsilon (M⁻¹ cm⁻¹)

Mean residue molar ellipticity (deg cm² dmol⁻¹): [θ]MRW = θ/(10 × cr × l), where θ is the measured
ellipticity in mdeg, cr is the molar concentration per residue in mol/L and l is the pathlength in cm.

Measured ellipticity data can also be uploaded directly. In that case, the protein concentration in μM, the
number of residues per protein molecule and the pathlength in cm must be provided by the user.
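
The unit conversion can be reproduced by hand to cross-check the values shown by the server. The sketch below assumes that the measured ellipticity is given in mdeg and uses the standard relation [θ] = 3298.2 × Δε. The function name is hypothetical, and the tutorial does not describe how the server performs the conversion internally.

```python
def ellipticity_to_delta_epsilon(theta_mdeg, conc_uM, n_residues, path_cm):
    """Convert measured ellipticity (mdeg, assumed) to delta epsilon (M-1 cm-1)."""
    # Molar concentration of residues: protein concentration times chain length.
    c_r = conc_uM * 1e-6 * n_residues
    # Mean residue molar ellipticity in deg cm2 dmol-1.
    theta_mrw = theta_mdeg / (10.0 * c_r * path_cm)
    # Standard relation between mean residue ellipticity and delta epsilon.
    return theta_mrw / 3298.2
```

For example, -10 mdeg measured on a 10 μM solution of a 100-residue protein in a 0.1 cm cell corresponds to a mean residue ellipticity of -10000 deg cm² dmol⁻¹, or about -3.03 M⁻¹ cm⁻¹.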

At the bottom, please enter the captcha or use a password to submit your data. This only serves to block
automated robots; no registration is needed to use the server.

A Data examination window will appear so that you can check whether the data were uploaded properly.
The data are converted to delta epsilon.

Please check carefully the wavelength range and amplitude of the CD data.

The secondary structure calculation can be started by clicking on the “Calculate the secondary structure”
button. The system examines the data automatically and displays a message in case of unexpected
CD amplitudes (calculation is still possible).

In the results window, the results appear as a graphical image together with all the useful information
(including the wavelength range and the user-provided information). At first, the data are analyzed in the
widest possible wavelength range of the uploaded data. However, we strongly recommend choosing a
wavelength range in which the PMT voltage stayed below the instrument limit (e.g., 600 volts)
during the measurement.
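
If the instrument exports the PMT (HT) voltage together with the spectrum, a suitable range can be found with a few lines of code. This is a rough sketch under stated assumptions: the function name `usable_range` is hypothetical, the 600 V default is only the example value mentioned above (the real limit depends on the instrument), and the scan starts at the long-wavelength end and stops at the first point that reaches the limit.

```python
def usable_range(wavelengths, ht_volts, limit=600.0):
    """Return (lowest, highest) wavelength of the region with HT below the limit.

    The region is taken from the long-wavelength end downwards and ends at
    the first point whose HT voltage reaches the limit. Returns None when
    even the highest wavelength is at or above the limit.
    """
    pairs = sorted(zip(wavelengths, ht_volts), reverse=True)
    kept = []
    for wavelength, ht in pairs:
        if ht >= limit:
            break
        kept.append(wavelength)
    if not kept:
        return None
    return kept[-1], kept[0]
```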

BeStSel uses 8 precalculated, fixed sets of basis spectra, optimized for the chosen
wavelength range, to analyze the submitted spectrum and estimate the secondary structure content.
For all details on the optimization and fitting processes, refer to the original publications of Micsonai
et al.

Results format

Below the results (please scroll down if they are not visible on the screen), the output format can be
changed for the convenience of the user.

Choosing “Show!” reformats the Results page. “Save image” opens the results in a
separate browser window, from which they can be saved as an image.

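RMSD: root mean square deviation.

$$\mathrm{RMSD} = \sqrt{\frac{1}{w}\sum_{i=1}^{w}\left(CD_{exp,i} - CD_{fit,i}\right)^{2}}$$

NRMSD: normalized root mean square deviation.

$$\mathrm{NRMSD} = \frac{1}{\max(CD_{exp}) - \min(CD_{exp})}\sqrt{\frac{1}{w}\sum_{i=1}^{w}\left(CD_{exp,i} - CD_{fit,i}\right)^{2}}$$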
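
Both quantities are easy to recompute from the experimental and fitted columns of the text output. The sketch below follows the two definitions above and reads w as the number of data points in the spectrum, which is our interpretation of the summation index. The function names are chosen here for illustration.

```python
import math


def rmsd(cd_exp, cd_fit):
    """Root mean square deviation between experimental and fitted CD values."""
    w = len(cd_exp)  # number of data points
    return math.sqrt(sum((e - f) ** 2 for e, f in zip(cd_exp, cd_fit)) / w)


def nrmsd(cd_exp, cd_fit):
    """RMSD divided by the amplitude range of the experimental spectrum."""
    return rmsd(cd_exp, cd_fit) / (max(cd_exp) - min(cd_exp))
```

Since NRMSD is normalized by the amplitude range of the experimental spectrum, it allows the fit quality of spectra with different amplitudes to be compared.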

Data in text

For further data processing, the results can be shown in text format, with the predicted results
at the top and the experimental, fitted and residual data in columns below. The data can be copied
into any data processing software to make your own plots, etc.

At the bottom of the Results page, brief information on the BeStSel fitting and some advice to consider
are provided.

Wavelength range, scale factor, best factor

On the left side of the Results page, the wavelength range can be chosen and the analysis can be
recalculated. A scale factor can also be chosen for the recalculation; the CD amplitude is multiplied
by this factor.

The “Best factor” function carries out a series of analyses by automatically changing the current
scaling factor in the range of 0.5-2.

The factor that gives the lowest NRMSD is highlighted. The dependence of the individual secondary
structure components on the CD amplitude is plotted. This can be informative when the protein
concentration or pathlength is uncertain. For CD data covering a wide wavelength range (down to at
least 180 nm), a lowest-NRMSD factor that differs from 1 is an indicator of incorrect concentration
or pathlength values.

Please note that the automatic scaling calculation of Best factor shows the dependence of the
secondary structure estimation and NRMSD on the amplitude of your spectrum. The factor with the
lowest NRMSD should not be taken as a correction for your normalized spectrum when the 190-250 or
200-250 nm range is used. Correct concentration determination is essential for accurate analysis.
When the 175-250 or 180-250 nm range is used and the Best factor is significantly different from 1.0,
it indicates possible normalization problems, and the factor can be taken as a suggestion.
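
The idea behind the scan can be illustrated as follows. This is only a sketch of the procedure described above, not the server's implementation: the step of 0.05 between factors is an assumption (the tutorial gives only the 0.5-2 range), and `fit` is a hypothetical user-supplied callable that returns the fitted spectrum for a given scaled spectrum, standing in for the BeStSel fit itself.

```python
import math


def nrmsd(cd_exp, cd_fit):
    """Normalized root mean square deviation, as defined on the Results page."""
    w = len(cd_exp)
    rms = math.sqrt(sum((e - f) ** 2 for e, f in zip(cd_exp, cd_fit)) / w)
    return rms / (max(cd_exp) - min(cd_exp))


def best_factor_scan(cd_exp, fit, factors=None):
    """Scale the spectrum by each factor, refit, and report the NRMSD values.

    `fit` is a hypothetical callable: scaled spectrum -> fitted spectrum.
    Returns (factor with the lowest NRMSD, list of (factor, NRMSD) pairs).
    """
    if factors is None:
        # Assumed grid: 0.50, 0.55, ..., 2.00
        factors = [round(0.5 + 0.05 * k, 2) for k in range(31)]
    results = []
    for factor in factors:
        scaled = [factor * value for value in cd_exp]
        results.append((factor, nrmsd(scaled, fit(scaled))))
    best = min(results, key=lambda item: item[1])[0]
    return best, results
```

Plotting the NRMSD values in `results` against the factor shows how sharply the minimum is defined, which is the same kind of information the server displays in its Best factor plot.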

The “Best factor” results can be saved as an image or in text format by selecting the output format
at the bottom of the page.

Fold recognition

The protein fold can be predicted from the results of the CD spectrum analysis. For information on this
method, please see the “Fold recognition” module in this tutorial or the Information on the main
BeStSel page.

Multiple spectra analysis
A series of spectra can be uploaded from a file or copied into the window from a worksheet. The first
row should contain the values of the parameter that was varied in the measurements (e.g.,
temperature values, if the spectra were recorded at different temperatures). The rows below form the
data columns: the first column contains the wavelength values and the other columns contain the
corresponding spectral data. Therefore, the total number of columns should be equal to the number
of values in the first row plus one. The data separator can be a tab, comma, semicolon or space.
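
The layout described above can be checked before upload with a short script. This is a minimal sketch, not the server's parser: the function name `parse_spectra_series` is hypothetical, and treating consecutive separators as one is an assumption.

```python
import re


def parse_spectra_series(text):
    """Parse a series of spectra: parameter row first, then wavelength rows.

    Returns (parameters, wavelengths, spectra), where spectra[i] is the
    spectrum recorded at parameters[i]. Raises ValueError when a data row
    does not have one column more than the parameter row.
    """
    rows = [
        [field for field in re.split(r"[ \t,;]+", line.strip()) if field]
        for line in text.splitlines()
        if line.strip()
    ]
    parameters = [float(value) for value in rows[0]]
    wavelengths = []
    spectra = [[] for _ in parameters]
    for row in rows[1:]:
        if len(row) != len(parameters) + 1:
            raise ValueError(
                f"expected {len(parameters) + 1} columns, got {len(row)}"
            )
        wavelengths.append(float(row[0]))
        for index, value in enumerate(row[1:]):
            spectra[index].append(float(value))
    return parameters, wavelengths, spectra
```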

First, a data examination page comes up to check if the upload was correct. Then, all the spectra are
evaluated at the same time and shown as a function of the chosen parameter.

After clicking on the “Calculate the secondary structure” button, the result window will appear.

At the bottom, the results can be saved as an image, which opens in a separate window. Results
in text format can also be selected for further data processing by the user.

On the left side, the wavelength range can be changed or a scaling factor can be set and the data can
be re-analyzed.

Fold recognition

The “Fold recognition” module of the server is used to predict the fold of a protein structure from the
secondary structure contents. The calculation can be initiated if the eight secondary structure
components sum up to 100.0 % and the chain length is provided. These data may come from a previous
BeStSel analysis of a CD spectrum (see the “Single spectrum analysis” module) or from the analysis of a
PDB structure (see the “Secondary structure from PDB files” module).
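
The two preconditions can be verified before the values are entered. The following is a small sketch: the function name is hypothetical, and the tolerance of 0.05 (values rounded to one decimal place) is an assumption, as the tutorial states only that the sum must be 100.0 %.

```python
def check_fold_input(components, chain_length, tol=0.05):
    """Check the input required by the Fold recognition module.

    `components` holds the eight secondary structure contents in percent.
    Returns True when the input is acceptable, raises ValueError otherwise.
    """
    if len(components) != 8:
        raise ValueError("exactly eight secondary structure components are required")
    if any(value < 0 for value in components):
        raise ValueError("secondary structure contents cannot be negative")
    if chain_length is None or chain_length <= 0:
        raise ValueError("the chain length must be a positive number")
    total = sum(components)
    if abs(total - 100.0) > tol:  # tolerance is an assumption of this sketch
        raise ValueError(f"components sum to {total:.1f} %, not 100.0 %")
    return True
```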

Four different analyses are provided: (1) a search for similar structures in the entire PDB, (2) a fold search
among the closest structures in a non-redundant single-domain PDB subset, (3) a search among single domains
whose secondary structure composition lies within the expected error of the CD secondary structure analysis
and (4) a weighted K-nearest neighbors search method.

For the weighted K-nearest neighbors method, the number of residues is required for the analysis.

The weighted K-nearest neighbors (WKNN) method predicts the Class, Architecture, Topology, and Homology
of the protein using the single-domain subset of CATH 4.3 (see the number of domains and categories
in the table below). In each layer (Class, Architecture, Topology, Homology), the predicted categories
are ordered by their calculated WKNN scores, excluding every structure that belongs to an already
predicted category (lower numbered hits). The WKNN score of a category is defined as the sum of the
weighted distances (from the query point) of every structure among the K nearest neighbors that belongs to
that particular category.

CATH 4.3 single-domain subset    Number
Domains                          61932
Class                            5
Architecture                     43
Topology                         1467
Homology                         6540
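
The ranking described above can be illustrated with a toy example. This is a sketch under several assumptions, not the server's algorithm: the weight is taken as the inverse Euclidean distance, K and the number of hits are arbitrary, and the function name `wknn_rank` is hypothetical. The tutorial does not give the actual weighting function or the value of K; see the original publications for those.

```python
import math
from collections import defaultdict


def wknn_rank(query, references, k=5, n_hits=3):
    """Rank categories by a weighted K-nearest neighbors score.

    `references` is a list of (vector, category) pairs. After each hit,
    every structure of the predicted category is excluded and the K
    nearest neighbors are determined again among the remaining ones.
    Weights are inverse distances (an assumption of this sketch).
    """
    remaining = list(references)
    hits = []
    while remaining and len(hits) < n_hits:
        neighbours = sorted(remaining, key=lambda ref: math.dist(query, ref[0]))[:k]
        scores = defaultdict(float)
        for vector, category in neighbours:
            scores[category] += 1.0 / (math.dist(query, vector) + 1e-9)
        best = max(scores, key=scores.get)
        hits.append((best, scores[best]))
        remaining = [ref for ref in remaining if ref[1] != best]
    return hits
```

In a real application, each vector would hold the eight secondary structure contents of a reference domain and each category would be a CATH label at the layer being predicted.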

At the bottom of the Fold recognition results, information on the analysis methods is provided.

Guide to CD and data analysis
This module comprises a practical guide to performing CD measurements and data analysis
correctly. The guide provides useful information on sample preparation, cuvettes, instrument status,
measurement parameters and data analysis.

Secondary structure from PDB files

The “Secondary structure and beta-sheet decomposition for PDB structure” module is used for the
calculation of the secondary structure composition of protein structures on the basis of the eight
structural elements of BeStSel. For comparison, the DSSP [Kabsch and Sander, Biopolymers, 22:2577
(1983)] and Selcon3 [Sreerama et al., Protein Sci., 8:370 (1999)] compositions are also calculated. Either
structures deposited in the PDB can be submitted or PDB files can be uploaded. When submitting a
PDB ID, the ID should be given as a four-character code (case-insensitive).
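
A quick format check can catch typing errors before submission. The sketch below assumes the classic PDB identifier format (a digit from 1 to 9 followed by three letters or digits). The function name is hypothetical, and the server's own validation is not described in this tutorial.

```python
import re


def normalize_pdb_id(pdb_id):
    """Return the PDB ID in upper case, or raise ValueError if malformed.

    Assumes the classic four-character format: one digit (1-9) followed
    by three alphanumeric characters. Input is treated case-insensitively.
    """
    candidate = pdb_id.strip()
    if not re.fullmatch(r"[1-9][A-Za-z0-9]{3}", candidate):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    return candidate.upper()
```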

At first, results are provided for the entire structure on the Result page of the “Secondary structure
from PDB files” module. At the bottom of the page, the labeled polypeptide chains in the structure are
listed and can be selected for display (see below). For the selected individual chains, the CATH classification
(Orengo et al., (1997) Structure 5(8):1093-1108) will also be provided (if any).

At the bottom of the page, the secondary structure decomposition methods can be selected
independently for display in a downloadable image or in text format (Data in text).

The secondary structure composition of the entire structure or of the selected chains (see below) is
displayed separately. Detailed descriptions of the structural elements are given in the original
papers, and a brief summary can also be found in the “Information” part of the page.

Extinction coefficient calculator

The “Extinction coefficient calculator” enables the user to determine protein concentrations using
absorbance at 205 or 214 nm. This is especially useful when absorbance at 280 nm cannot be used due
to the lack of Trp and the small number of Tyr residues.

The absorbance of the CD samples can be measured directly at these wavelengths by the
spectropolarimeter. If the instrument is capable of converting HT values to absorbance values, the
protein concentration of the sample can be determined from the CD measurement after subtracting
the baseline absorbance.

Extinction coefficients at 205 and 214 nm are calculated from the amino acid sequence (see the
references for more information) and the number of disulfide bonds. Results are provided at the
bottom of the page and also include the number of residues and the molecular weight.
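
Once the extinction coefficient is known, the concentration follows from the Beer-Lambert law, A = ε × c × l. The sketch below is a generic illustration: the function name is hypothetical, and the extinction coefficient must be the value calculated for your own sequence at the wavelength used.

```python
def concentration_uM(absorbance, baseline, epsilon, path_cm):
    """Protein concentration in μM from a baseline-corrected absorbance.

    `epsilon` is the molar extinction coefficient in M-1 cm-1 at the
    measurement wavelength (205 or 214 nm) and `path_cm` the pathlength in cm.
    """
    corrected = absorbance - baseline  # subtract the buffer baseline
    return corrected / (epsilon * path_cm) * 1e6
```

With a hypothetical ε of 400000 M⁻¹ cm⁻¹, a sample absorbance of 0.5 over a baseline of 0.1 in a 0.1 cm cell corresponds to 10 μM.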

Disordered-ordered classification
The “Disordered-ordered classification” module categorizes proteins as ordered or disordered based
on their CD spectra. This feature is particularly useful to differentiate between disordered and right-
hand twisted antiparallel β proteins that have quite similar spectra.

CD data can be provided for a single protein or for multiple proteins. The first column should contain
wavelength values and each following column should comprise CD data for a particular protein.
Wavelength values must include either 197, 206 and 233 nm or 212, 217 and 225 nm for the
predictions.
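
Whether a data set contains the wavelengths needed for the classification can be checked as follows. This is a minimal sketch with a hypothetical function name. It only tests for the presence of the listed wavelengths and says nothing about how the server uses them.

```python
REQUIRED_SETS = ((197, 206, 233), (212, 217, 225))


def usable_wavelength_sets(wavelengths, tol=1e-6):
    """Return the required wavelength sets that are fully present in the data."""
    found = []
    for required in REQUIRED_SETS:
        if all(any(abs(w - r) <= tol for w in wavelengths) for r in required):
            found.append(required)
    return found
```

An empty result means that neither set is covered, for example when the data were recorded with a step size that skips the required values.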

The results of the classification are provided in a table along with the data used for the predictions.

Cited by…
On the “Cited by” page, the user will find a collection of the publications that cited any one of the
publications about BeStSel (Micsonai et al. PNAS (2015), Micsonai et al. Nucleic Acids Res. (2018),
Micsonai et al. Methods Mol Biol. (2021)). This page features a search engine to allow users to browse
amongst ~1000 articles to find examples and useful information on the applications of CD spectroscopy
and BeStSel.
