
Glycobiology, 2025, 35, cwaf048

https://doi.org/10.1093/glycob/cwaf048
Advance access publication date 28 August 2025
Glyco-Informatics

Modeling glycans with AlphaFold 3: capabilities, caveats, and limitations

Downloaded from https://academic.oup.com/glycob/article/35/10/cwaf048/8242499 by Grand Canyon University user on 25 September 2025


Chin Huang¹,²,*, Natarajan Kannan¹,³, Kelley W. Moremen¹,²,*
¹Department of Biochemistry and Molecular Biology, University of Georgia, 120 Green Street, Athens, GA 30602, United States, ²Complex
Carbohydrate Research Center, University of Georgia, 315 Riverbend Road, Athens, GA 30602, United States, ³Institute of Bioinformatics,
University of Georgia, 120 Green Street, Athens, GA 30602, United States
*Corresponding authors: Kelley W. Moremen, [email protected]; Chin Huang, [email protected]

Abstract
Glycans are complex carbohydrates that exhibit extraordinary structural complexity and stereochemical diversity while playing
essential roles in many biological processes, including immune regulation, pathogen recognition, and cell communication. In humans,
more than half of all proteins are glycosylated, particularly those in secretory and membrane-associated pathways, highlighting the
importance of glycans in health and disease. The recent release of the AlphaFold 3 source code enables customizable modeling not
only of proteins but also glycan-containing biomolecular complexes. We assessed the capacity of AlphaFold 3 to model glycans using
several input formats and identified a hybrid syntax employing Chemical Component Dictionary (CCD)-based molecular building blocks
linked by “bondedAtomPairs” (BAP) as most effective in generating stereochemically valid glycan models. This workflow was used to
create a library of AlphaFold 3 input templates and corresponding structural models for various glycan classes. We further explored
capabilities, limitations, and remediation strategies for modeling problematic structures. Glycan interactions were also modeled with
glycosylation enzymes and lectins with benchmarking and validation against known crystal structures. This protocol-driven approach
is valuable for generating stereochemically valid, static models of glycan-protein interactions to support hypothesis development
and subsequent structural and functional validation. However, caution should be observed in overinterpretation of the static models
since glycans are known to exhibit considerable conformational dynamics that can be further captured by equilibrium sampling using
molecular dynamics-based approaches. By sharing benchmarked examples using the BAP syntax we aim to support broader evaluation
of AlphaFold 3 in studying glycan-related mechanisms in biosynthesis, signaling, infection, and disease.
Keywords: AlphaFold 3; glycan modeling; glycoprotein modeling; glycan stereochemistry; glycan-protein interaction.

Introduction

Glycans are ubiquitous in all domains of life (Varki 2017) and play essential roles in processes such as immune modulation (Pinho et al. 2023), cancer progression (Smith and Bertozzi 2021), and neural development (Schnaar et al. 2014). Glycan structures are assemblies of monosaccharide building blocks linked through the C1 or C2 anomeric carbon in glycosidic linkages to hydroxyl groups of other monosaccharides in linear or branched configurations or to functionalities in other molecule types (Moremen and Haltiwanger 2019). These structures can be found free in solution (e.g. human milk oligosaccharides (HMOs) and hyaluronic acid (HA)), covalently linked to proteins (N- and O-linked glycans), lipids (glycosphingolipids (GSLs), dolichol phosphate (Dol-P)-linked oligosaccharides), in bridging linkages between lipids and proteins (GPI anchors) (Moremen et al. 2012), to other macromolecules (xenobiotics) (Allain et al. 2020), and numerous other classes of glycoconjugates. Glycan structures are assembled by glycosyltransferases (Moremen and Haltiwanger 2019) and deconstructed by glycoside hydrolases (Moremen et al. 1994) and lyases (Garron and Cygler 2010), and play numerous roles in modulation of function including recognition by glycan-binding proteins (lectins) (Stowell et al. 2010).

The structural complexities of glycans arise from the diversity of glycan linkages between monosaccharides, the numerous possibilities for glycan branching, two possible anomeric linkages, multiple sugar ring configurations and puckering, and diverse additional modifications (Woods 2018). Rotational flexibility of glycans at the anomeric linkage also leads to conformational dynamics that present challenges in structure determination. This glycan structural diversity, complexity, and dynamics remain a formidable challenge in glycobiology research. Computational approaches, including molecular docking (Grant and Woods 2014), molecular dynamics (MD) (Case et al. 2005; Kirschner et al. 2008; Fadda 2022), and quantum mechanics/molecular mechanics (QM/MM) simulations (Mendoza and Masgrau 2021; Perez and Makshakova 2022), can generate predictive models that complement empirical structural studies. Recent developments in AI approaches, including RoseTTAFold All-Atom (Krishna et al. 2024), AlphaFold 3 (Abramson et al. 2024), DeepGlycanSite (He et al. 2024), Chai-1 (Chai Discovery et al. 2024), CLIMBS (Luo and Parmeggiani 2025), and Boltz-2 (Passaro et al. 2025) now present an additional and complementary set of in silico tools for modeling glycan-protein interaction.

In 2018, AlphaFold initially demonstrated highly accurate predictive modeling capabilities for proteins, particularly when homologous protein templates were available in PDB (Senior et al. 2020). AlphaFold 2 expanded its

Received: May 09, 2025. Revised: July 28, 2025. Accepted: August 17, 2025
© The Author(s) 2025. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/),
which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

capabilities to novel proteins and dimers (Jumper et al. 2021; Evans et al. 2022). AlphaFold 3 (AF3), with an updated diffusion-based generative framework, enabled the modeling of diverse biological molecules, including a growing number of bound ligands. Initially released as a web server (https://alphafoldserver.com/), AF3 provides support for modeling multimers, a limited library of ligand structures (e.g. nucleic acids, selected metal ions, and cofactors), and post-translational modifications (PTMs) (Abramson et al. 2024). While protein glycosylation was included among PTMs, the ambiguity in specifying linkages and higher-order stereochemistry presented limitations in evaluating the application of AF3 in modeling protein-glycan interactions. AF3 was also released as a command-line interface, standalone version in late 2024 (https://github.com/google-deepmind/alphafold3) that presented opportunities for universal ligand input, but the syntax for specifying glycan stereochemistry remained a challenge. Concurrently, the Protein Data Bank (PDB) underwent an effort in carbohydrate remediation (Shao et al. 2021) (7.7% of entries contain glycan structures (Dashti et al. 2020; Prestegard 2021)). Monosaccharides have been standardized, accompanied by the removal of obsolete polysaccharide entries, implementation of consistent linkage annotations, and the integration of cross-references to external glycoinformatics resources. Although not originally intended for deep learning applications, this curated dataset now serves as a valuable resource for model training and evaluation of AF3 predictions.

Here, we investigate the input syntax for specifying glycan structures using standalone AF3 and its impact in modeling key glycan conformational features, including anomeric configurations, epimeric orientations (axial versus equatorial), and ring puckering. We compare modeling outcomes across multiple input formats and identify the bondedAtomPairs (BAP) syntax that yields the most consistent and stereochemically plausible glycan models. Fidelity of modeling is assessed through comparison with empirical structures. The BAP syntax framework supports the generation of starting glycan structures for downstream analysis and expands the applicability of AF3 in modeling complex macromolecules beyond glycoconjugates.

Results

Enhanced structural fidelity through bondedAtomPairs (BAP)-defined glycosidic linkages

Monosaccharides possess distinct structural features, including absolute configurations (D/L nomenclature), ring forms (linear, pyranose, or furanose), and anomeric centers (α/β), which pose substantial challenges in cheminformatics (Seeberger 2022). Linkage of monosaccharide building blocks into larger glycan structures presents further complications, since multiple hydroxyl positions are available for glycosidic linkages, and combined modification of multiple hydroxyls on a single monosaccharide can lead to branched structures (Lebrilla et al. 2022). AF3 generally allows ligand definition through three input formats: Simplified Molecular Input Line Entry System (SMILES) (Weininger 1988), Chemical Component Dictionary (CCD) codes (Feng et al. 2004), and user-defined CCDs (userCCD). Although these formats are intended to enable universal ligand input, their performance in handling stereochemically complex glycans has yet to be systematically evaluated.

We first assessed the modeling quality in AF3 using a simple linear HMO, lacto-N-neotetraose (LNnT) (Fig. 1a). Given the simplicity of the SMILES format, we initially tested its feasibility. AF3 correctly represented the absolute configurations, ring forms, linkage order, and anomeric centers of LNnT. However, a Gal residue (residue 2) was incorrectly modeled as Glc, due to misassignment of the C4 hydroxyl from axial to equatorial (Fig. 1b). Furthermore, the SMILES input format does not support atom indexing, thereby limiting its applicability for specifying covalent linkages in multicomponent assemblies. To circumvent this, the AF3 development team introduced rdkit_utils (https://github.com/google-deepmind/alphafold3/blob/main/src/alphafold3/data/tools/rdkit_utils.py), which converts SMILES into a CCD-like format compatible with userCCD input. This tool generates idealized coordinates designed to enhance model performance. However, models generated with this approach still exhibited stereochemical problems. Specifically, the LNnT Gal residue 2 retained the erroneous C4 hydroxyl orientation, and the glycosidic bond between Glc and Gal (residues 1 and 2) was incorrectly rendered as an α- rather than β-linkage (Fig. 1c). Additional attempts to convert an energy-minimized PDB structure of LNnT (computed by GLYCAM (Grant et al. 2025); Fig. 1e) into userCCD format using rdkit_utils introduced further errors (data not shown). Alternative programs for CCD generation (Agirre 2017; Steiner and Tucker 2017) such as AceDRG (Long et al. 2017) yielded similarly incorrect configurations in AF3 (data not shown).

An alternative strategy for glycan structure input in AF3 is to specify individual monosaccharide building blocks in a glycan structure using their unique CCD identifiers. By leveraging the curated monosaccharide CCD library, larger glycan assemblies can be generated by explicitly defining glycosidic linkages between the building blocks using the bondedAtomPairs (BAP) syntax in the input JavaScript Object Notation (JSON) file (described in detail below). Using this approach, we were able to model the correct stereochemical structure of LNnT, including all anomeric configurations and axial/equatorial orientations (Fig. 1d).

To evaluate whether this approach generalizes to more complicated branched structures, we modeled a complex biantennary N-glycan G2 (Fig. 1f). SMILES input resulted in multiple structural errors: GlcNAc (residue 1) C4 hydroxyl was incorrectly modeled as axial; C2 hydroxyls of Man residues (residues 3–5) and C4 hydroxyls of Gal residues (residues 8 and 9) exhibited erroneous equatorial orientations; residues 3, 4, 8 and 9 were depicted in opposite anomeric configurations (α instead of β, or vice versa) (Fig. 1g). Models generated from userCCD converted through rdkit_utils were improved, though not flawless (Fig. 1h); of note, the C2 hydroxyls on Man residues 3–5 were still incorrect. Due to limitations in rdkit_utils, default parameters failed to generate correct conformers for G2; setting useRandomCoords = True allowed generation of idealized coordinates. Ultimately, using CCD input linked with bondedAtomPairs (BAP) enabled effective modeling of G2, recapitulating all stereochemical features correctly (Fig. 1i). These predictions were in strong agreement with energy-minimized structures simulated by GLYCAM (Grant et al. 2025) (Fig. 1j).
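As an illustration of the CCD-plus-BAP strategy, the following Python sketch assembles a ligand-only AF3 input for LNnT (Galβ1-4GlcNAcβ1-3Galβ1-4Glc) with residue numbering as in Fig. 1a. It is a minimal sketch under stated assumptions, not the authors' template: the helper name `build_glycan_input`, the ligand id "LG", the seed value, and the choice of BGC (β-D-glucopyranose) for the reducing-end Glc are all illustrative, and the JSON field names follow our reading of the AF3 input documentation.

```python
import json

# Residue order follows Fig. 1a: 1 = Glc (reducing end), 2 = Gal, 3 = GlcNAc, 4 = Gal.
# Assumption: BGC (beta-D-glucopyranose) is an arbitrary choice for the reducing-end anomer.
LNNT_CCD_CODES = ["BGC", "GAL", "NAG", "GAL"]

# Each linkage: (donor residue, donor atom, acceptor residue, acceptor atom), 1-based residues.
LNNT_LINKAGES = [
    (2, "C1", 1, "O4"),  # Gal beta1-4 Glc
    (3, "C1", 2, "O3"),  # GlcNAc beta1-3 Gal
    (4, "C1", 3, "O4"),  # Gal beta1-4 GlcNAc
]


def build_glycan_input(name, ligand_id, ccd_codes, linkages, seeds=(1,)):
    """Hypothetical helper: build an AF3-style input dict for one multi-CCD glycan ligand."""
    bonded = [
        # Acceptor hydroxyl oxygen first, donor anomeric carbon second.
        [[ligand_id, acc_res, acc_atom], [ligand_id, don_res, don_atom]]
        for don_res, don_atom, acc_res, acc_atom in linkages
    ]
    return {
        "name": name,
        "modelSeeds": list(seeds),
        "sequences": [{"ligand": {"id": ligand_id, "ccdCodes": list(ccd_codes)}}],
        "bondedAtomPairs": bonded,
        "dialect": "alphafold3",
        "version": 1,
    }


lnnt_input = build_glycan_input("LNnT", "LG", LNNT_CCD_CODES, LNNT_LINKAGES)
print(json.dumps(lnnt_input, indent=2))
```

Because the anomeric configuration is carried by the CCD code of each donor residue (GAL and NAG are β-anomers), the bond list itself only names the two atoms to connect.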

Fig. 1. Comparison of ligand input formats for glycan modeling in AlphaFold 3. a) Symbol nomenclature for glycans (SNFG) cartoon of
lacto-N-neotetraose (LNnT) with annotated residue numbers. b) AF3-modeled LNnT using simplified molecular input line entry system (SMILES) input,
c) user-defined chemical component dictionary (userCCD) converted by rdkit_utils, d) CCD with bondedAtomPairs, e) energy minimized LNnT simulated
by GLYCAM. f) SNFG cartoon of biantennary G2 N-linked glycan. g) AF3-modeled G2 using SMILES, h) userCCD, i) CCD with bondedAtomPairs, j)
GLYCAM-simulated energy-minimized G2. k) SNFG legend used in this article. Gray and pink circles, or purple square indicate wrong stereochemistry as
labeled.

Formalization of bondedAtomPairs (BAP) syntax for glycan topology specification

AF3 accepts protein sequence input in the form of a JSON file. This file also allows specification of additional inputs, including modeling random seeds, DNA, RNA, modified amino acid residues, and small-molecule ligands using SMILES, CCD, or userCCD formats (https://github.com/google-deepmind/alphafold3/blob/main/docs/input.md). In particular, the bondedAtomPairs (BAP) syntax can be used to define covalent linkages between distinct molecular entities; either between proteins and ligands or between individual ligand components. This syntax offers greater detail and flexibility compared to the simplified format used in the AF3 Server interface. Surprisingly, AF3 can also process input consisting solely of ligands without a protein context, which offers a convenient way to validate ligand syntax and bonding logic prior to full complex modeling.

Specifying glycans through bondedAtomPairs (BAP) syntax requires the use of monosaccharide CCD codes with appropriate anomeric specification. For simplicity, a partial list of human-relevant monosaccharide CCD codes is presented in Fig. 2a. A comprehensive CCD code table, expanded to include L/D configurations, linear and cyclic (pyranose/furanose) forms, and α/β anomeric designations, was compiled based on the SNFG (Varki et al. 2015) and is provided in Table S1. An example of bondedAtomPairs (BAP) syntax for the G2 N-glycan is illustrated in Fig. 2b. The order of the residues in the ccdCodes section of the JSON file corresponds directly to the residue numbering used in the bondedAtomPairs (BAP) field, as visualized in Fig. 2c. To support multiple glycan ligands or multiple glycosylation sites, unique identifiers can be defined in the id field; in this example, a single copy of G2 was assigned as “NG.”

To demonstrate bonding logic, we showcased the β1,4 linkage between chitobiose and β-Man. The CCD codes for 2-acetamido-2-deoxy-β-D-glucopyranose (NAG) and β-D-mannopyranose (BMA) were assigned as residues 2 and 3, respectively, in the ccdCodes section (Fig. 2b, red). A β1,4 linkage was defined by connecting the O4 atom of NAG (residue 2) with the C1 atom of BMA (residue 3) in the bondedAtomPairs (BAP) field (Fig. 2b, blue). In this case, the β-linkage is being specified by the CCD code for the β-anomer of mannose (BMA). Atom numbering for NAG and BMA is provided in Fig. 2d. By default, AF3 removes the O1 atom of the donor monosaccharide during glycosidic bond formation, as specified by the .pdbx_leaving_atom_flag in the official macromolecular Crystallographic Information File (mmCIF).

Extending our investigation to additional monosaccharides, we observed that not all leaving groups are automatically removed as expected. Sialic acid, specifically N-acetyl-α-neuraminic acid (CCD code SIA), a ketose unlike most aldose monosaccharides, links through its C2 position (Meng et al. 2013). In this case, the O2 atom of SIA is not removed by default (Fig. S1a), even when the .pdbx_leaving_atom_flag is correctly set. We also tested an alternative linkage by connecting the C6 atom of GAL to the O2 atom of SIA. In this configuration, both the O6 atom of GAL and the O2 atom of SIA were retained, generating unrealistic valence states and unfavorable torsion angles at the glycosidic junction (Fig. S1b). Manual deletion of leaving oxygens in the monosaccharide CCD entry, O2 in SIA (Fig. S1c and d), is necessary to maintain proper valency. The items removed were highlighted in the full standard SIA mmCIF (Supplementary Document 1). Similarly, CCDs converted into userCCD format without manual edits retain O1 atoms, leading to duplicate oxygen atoms at glycosidic junctions.

Fig. 2. Glycan specification using chemical component dictionary (CCD) and bondedAtomPairs (BAP) syntax in AlphaFold 3. a) Representative
CCD codes for common human monosaccharides. b) Partial JSON script used to generate G2 N-linked glycan. c) SNFG representation of G2 N-linked
glycan with residue numbering equivalent to what is used in the JSON modeling input file. The order of the monosaccharides in the JSON file ccdCodes
list corresponds with the residue numbering in the SNFG representation. d) Demonstration of a β1,4 linkage between GlcNAc and Man, specified by the
second and third CCD entries (NAG and BMA shown in bold red font in panel b). The glycosidic bond is defined by the bondedAtomPairs (BAP) field
(shown in bold blue font in panel b), specifying a bond between the O4 atom of NAG and the C1 atom of BMA (oval in panel d). The entry in the
bondedAtomPairs (BAP) list leads to covalent bond formation between the respective hydroxyl oxygen (O4 of NAG) and the anomeric C1 (C1 of BMA)
with the removal of the β-linked O1 (leaving atom) on the C1 of BMA. The resulting glycosidic linkage retains the anomeric configuration at C1 of the
original CCD entry (β-linkage for BMA).
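Because residue numbers in the bondedAtomPairs (BAP) field are positions in the ccdCodes list, an off-by-one index silently bonds the wrong residues. The following Python sketch shows one way such input could be sanity-checked before a run, using the chitobiose-Man linkage of panel d as the test case. The function names are hypothetical and not part of AF3, and the check covers only index ranges and self-bonds; it does not verify atom names or chemistry.

```python
def entity_sizes(sequences):
    """Map every entity id to its number of residues (or CCD components)."""
    sizes = {}
    for entry in sequences:
        for body in entry.values():
            # An id may be a single string or a list of ids for multiple copies.
            ids = body["id"] if isinstance(body["id"], list) else [body["id"]]
            if "ccdCodes" in body:
                count = len(body["ccdCodes"])
            elif "sequence" in body:
                count = len(body["sequence"])
            else:
                count = 1  # e.g. a single SMILES ligand
            for entity_id in ids:
                sizes[entity_id] = count
    return sizes


def check_bonded_atom_pairs(sequences, bonded_atom_pairs):
    """Return a list of problems; an empty list means no issue was detected."""
    sizes = entity_sizes(sequences)
    problems = []
    for pair in bonded_atom_pairs:
        if len(pair) != 2:
            problems.append(f"{pair}: a bond needs exactly two atoms")
            continue
        if pair[0] == pair[1]:
            problems.append(f"{pair}: atom bonded to itself")
        for entity_id, residue, _atom_name in pair:
            if entity_id not in sizes:
                problems.append(f"{entity_id}: unknown entity id")
            elif not 1 <= residue <= sizes[entity_id]:
                problems.append(
                    f"{entity_id} residue {residue}: outside 1..{sizes[entity_id]}"
                )
    return problems


# Core trisaccharide from the text: NAG (1), NAG (2), BMA (3) under the id "NG".
sequences = [{"ligand": {"id": "NG", "ccdCodes": ["NAG", "NAG", "BMA"]}}]
good_bonds = [
    [["NG", 1, "O4"], ["NG", 2, "C1"]],  # GlcNAc beta1-4 GlcNAc (chitobiose)
    [["NG", 2, "O4"], ["NG", 3, "C1"]],  # Man beta1-4 GlcNAc, as in panel d
]
bad_bonds = [[["NG", 3, "O4"], ["NG", 4, "C1"]]]  # residue 4 does not exist

print(check_bonded_atom_pairs(sequences, good_bonds))
print(check_bonded_atom_pairs(sequences, bad_bonds))
```

A check of this kind complements, rather than replaces, the ligand-only test runs described in the text for validating bonding logic.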

The leaving-atom behavior described above also applies to non-carbohydrate ligands. Glycan modifications such as phosphorylation (Li et al. 2022), sulfation (Wu et al. 2023), methylation (Urbanowicz et al. 2012), and acetylation (Wang et al. 2021) follow the same principle: manual removal of leaving oxygen atoms is required to ensure chemically plausible connectivity. In light of the error-prone nature of glycan linkage specification and the advanced domain knowledge it demands, we have curated a collection of validated JSON input files encompassing all common N-glycans (Stanley et al. 2022) and other glycans modeled herein as reference templates for reproducible implementation (Supplementary Document 2).

Benchmarking AlphaFold 3 for high-complexity glycan modeling

After establishing the syntax for consistent glycan modeling, we benchmarked the capacity of AF3 to model glycans within glycoprotein complexes. We began with the highly branched M9 N-glycan (Fig. 3a). As a free-reducing end ligand, M9 was appropriately modeled with expected conformations both as a free glycan (Fig. 3b) and when bound as a ligand in the active site of an α-mannosidase involved in trimming M9 to M5 during N-glycan maturation (MAN1A1 (Lal et al. 1994), Fig. 3c). The α1,3-Man branch of M9 (3-arm) extended deep into the catalytic pocket, with the terminal α1,2-Man glycone residue (residue 9) coordinated to an enzyme-bound Ca²⁺ ion. Notably, residue 9 adopted a distorted ³,ᴼB boat conformation (Fig. 3i) rather than its solution-phase ⁴C₁ chair ground state (Fig. 3h). Compared with the crystallographic structure (Xiang et al. 2016) (Fig. 3d), the 3-arm adopted a similar binding pose (Fig. 3e) (root-mean-square deviation (RMSD) 0.284 Å), whereas the core GlcNAc (residue 2) and α1,6-Man branch (6-arm) deviated. These latter differences from the empirical crystal structure are likely due to their solvent exposure and increased conformational flexibility. The distorted ³,ᴼB boat conformation of the terminal Man (residue 9) in the AF3 model contrasts with an unfavorable, high-energy ¹C₄ chair conformation for the glycone in the crystal structure (Fig. 3j). Although this may initially appear inconsistent, the AF3 prediction aligns with the Cremer-Pople ring puckering itinerary (Cremer and Pople 1975) proposed for a homologous GH47 α-mannosidase (Thompson et al. 2012) where glycone conformation transitions during catalysis from ³,ᴼB/³S₁ through a ³H₄ intermediate en route to a ¹C₄ enzymatic product.

It is noteworthy that the crystal structure of MAN1A1 was solved in complex with a lanthanum ion (La³⁺) as an inhibitor (Xiang et al. 2016). However, AF3 models incorporating either Ca²⁺ or La³⁺ generated the same ³,ᴼB structure for the glycone residue rather than the ¹C₄ conformation (data not shown). This observation suggests that even if a MAN1A1 structure was included in the AF3 training set, its internal representation does not rigidly duplicate the training input. Instead, AF3 accommodates alternative, yet chemically plausible, conformations including more flexible branches like the 6-arm and distinct ring-puckering states, presumably due to the diffusion-based architecture and learned representations of relevant complexes from the PDB such as the co-crystal structure of MAN1A1 with M9 and La³⁺ (Xiang et al. 2016) and glycoside hydrolase family 47 (GH47) members co-crystallized with a non-hydrolyzed thiomannobiose (Karaveg et al. 2005; Thompson et al. 2012) or glycone mimic inhibitors, 1-deoxymannojirimycin or kifunensine (Vallee et al. 2000; Lobsanov et al. 2002).
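The RMSD values quoted in this section compare matched atoms of the modeled glycan with their crystallographic counterparts after structural alignment. As a reminder of what the number measures, for N matched atom pairs the RMSD is the square root of the mean squared inter-atomic distance. The Python sketch below implements only that final step and assumes the two coordinate sets are already superposed and listed in the same atom order; the coordinates are toy values chosen for illustration, not data from this study, and the superposition procedure used by the authors is not specified here.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the input units between two pre-aligned, equally ordered atom lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Toy example: two atoms, each displaced by 0.3 along z in opposite directions.
model_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference_atoms = [(0.0, 0.0, 0.3), (1.0, 0.0, -0.3)]
print(round(rmsd(model_atoms, reference_atoms), 3))
```

Restricting the atom lists to a subset of residues, as done for the 3-arm in Fig. 3e, reports how well that region overlays while leaving flexible, solvent-exposed residues out of the average.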

Fig. 3. Benchmarking AlphaFold 3 models with MAN1A1/M9/Ca²⁺ Michaelis complex. a) SNFG cartoon representation of M9 N-glycan; residue
numbers are consistently annotated across panels. b) AF3-modeled free-reducing end M9 glycan. c) Model of the Mus musculus Golgi mannosidase
MAN1A1 bound to M9 and Ca²⁺, representing a Michaelis complex; the active site is highlighted, showing the glycan binding pose and interacting
residues. d) Crystallographic structure of MAN1A1 in complex with M8 and La³⁺ (PDB: 5KKB). e) Structural alignment of the AF3 MAN1A1/M9/Ca²⁺
model (colored) with the crystallographic structure (tan); mannose residues on the 3-arm (residues 4, 6, and 9) overlay well in the catalytic pocket, while
solvent-exposed residues display greater divergence from the complex in the crystal structure. The root-mean-square deviation (RMSD) of residues 3, 4,
6, and 9 (colored in red) between the AF3 model and 5KKB is annotated. f) M9 glycan modeled as an N-glycan attached to H. sapiens erythropoietin
(EPO). g) MAN1A1 (green) modeled in complex with Ca²⁺ and M9 carried on EPO (cyan). h-j) Cremer-Pople ring puckering conformations of
α1,2-mannose (residue 9) in different contexts: h) free-reducing end M9 (panel b) exhibiting a stable ⁴C₁ chair; i) MAN1A1-M9-Ca²⁺ model (panel c)
showing a distorted ³,ᴼB boat; j) crystallographic 5KKB structure (panel d) adopting an unfavorable ¹C₄ chair.

To evaluate generalizability, we modeled M9 as a PTM on erythropoietin (EPO), which features solvent-exposed N-glycosylation sites with minimal steric hindrance (Adams et al. 2022) (Fig. 3f). When co-modeled with MAN1A1, the 3-arm adopted a binding pose (Fig. 3g) nearly identical to the complex that was observed using the free-reducing M9 (RMSD 0.264 Å). These results highlight the capacity of AF3 to effectively model enzyme-glycoprotein complexes and recapitulate physiologically relevant intermediates.

We next evaluated a more complex case involving a glycosyltransferase bound to a divalent cation, a sugar nucleotide donor, and a glycan acceptor. The A1 N-glycan (Fig. S2a and b) was modeled in complex with MGAT2 (Kadirvelraj et al. 2018) (Fig. S2c), the GlcNAc transferase responsible

for converting hybrid to complex N-glycans. The 3-arm GlcNAc (residue 6) oriented away from the catalytic pocket toward an “exosite” identified in the MGAT2 crystal structure, while the 6-arm mannose (residue 5) extended into the active site and aligned with UDP-GlcNAc, which was coordinated with a manganese ion (Mn²⁺). The modeled complex closely matched the crystallographic pose (Kadirvelraj et al. 2018), including the glycan (RMSD 0.241 Å), nucleotide, and metal ion arrangement (Fig. S2d and e). This model was also recapitulated on EPO as a glycoprotein scaffold (Fig. S2f) and in the ternary complex with MGAT2 (Fig. S2g).

To further test the performance of AF3, we benchmarked G2F, a fully extended, biantennary, core-fucosylated N-glycan (Fig. S3a and b), as the ligand for ST6GAL1 (Fig. S3c), an α2,6-sialyltransferase. The 3-arm of G2F showed extensive interactions within the ST6GAL1 catalytic pocket, consistent with the known glycan branch preference for the enzyme (Barb et al. 2009), while the 6-arm remained more disordered due to solvent exposure and minimal interaction with the protein surface. The modeled conformation corresponded closely to the crystallographic structure (RMSD 0.236 Å) (Kuhn et al. 2013) (Fig. S3d and e), where the G2F glycan originated from the N-glycosylation of a crystallographic symmetry mate. Surprisingly, AF3 was able to model this substrate interaction as a ligand despite the complex not being explicitly present in the PDB file used for training (Kuhn et al. 2013). We also placed G2F on EPO as a PTM (Fig. S3f) and modeled an equivalent glycan-protein complex with ST6GAL1 (Fig. S3g).

Across all three high-complexity scenarios, M9 with MAN1A1, A1 with MGAT2, and G2F with ST6GAL1, AF3 generated stereochemically and conformationally plausible models that were in close agreement with crystallographic data. In several cases, it also captured subtleties such as ring puckering and flexible branching, offering valuable structural insight into glycan-protein interactions.

Modeling lipid-linked glycans for covalent glycan-lipid-protein assemblies

Building upon the optimized bondedAtomPairs (BAP) syntax and effective benchmarking of protein-glycan complexes, we next extended our evaluation to lipid-linked glycoconjugates. Glycosphingolipids (GSLs), composed of glycans linked to ceramide, represent a structurally diverse and biologically significant class of glycolipids (Schnaar et al. 2022). As the CCD database lacks a direct entry for ceramide, we modeled it by covalently linking sphingosine and stearic acid (CCD codes: SPH and STE). Complex GSLs such as the fully extended, sialylated ganglioside GP1c (Fig. 4a and 4f) and the highly fucosylated Lewis b antigen lactoside (Fig. 4b and 4g) were correctly modeled without notable conformational distortions. To assess receptor interactions, we modeled the globoside Globo H (Fig. 4c) in complex with Bc2L-C, a homotrimeric lectin from the opportunistic pathogen Burkholderia cenocepacia (Bermeo et al. 2020) (Fig. 4h). To reduce lipid tail disorder, ceramide-containing glycolipids were positioned on a mock lipid layer, resulting in more extended and physiologically plausible lipid tail conformations. Despite the inclusion of additional protein domains and full GSL components in the AF3 model, the binding pose remains consistent (RMSD 0.137 Å) with the known protein structure (PDB: 6TIG) (Bermeo et al. 2020).

We also modeled dolichol pyrophosphate (Dol-PP)-linked chitobiose, a key intermediate in N-glycan biosynthesis (Ramírez and Locher 2023) (Fig. 4d). Octaprenyl pyrophosphate (CCD code: OTP) was used as a Dol-PP analog. Unlike the β-linked chitobiose commonly found as a protein PTM, AF3 correctly modeled the α-linked Dol-PP glycan (Ramírez et al. 2019) (Fig. 4i). The model generated in complex with ALG1, Mn²⁺, and GDP-mannose exhibited a plausible geometry for each ligand component, with the C4 hydroxyl of the terminal GlcNAc (residue 2) properly oriented toward the donor GDP-Man.

As the most structurally complex example, we modeled a glycosylphosphatidylinositol (GPI)-anchored protein (Kinoshita 2020) using glypican-1 (GPC1) as the scaffold (Fig. 4e and 4j). The phosphatidylinositol (PI) anchor was assembled from glycerol (GOL), palmityl alcohol (PL3), stearic acid (STE), and inositol-1-phosphate (IPD) (Kinoshita and Fujita 2016). The ethanolamine phosphate moiety was built using ETA and PO4. To establish a realistic covalent linkage between the GPI anchor and the C-terminal serine (Xu et al. 2023) of GPC1, we employed the ptmType function to define a more plausible amide bond length. While many top-ranked models exhibited conformational artifacts, reasonable models were obtained by increasing the number of random seeds. This challenging case illustrates the capacity of AF3 to accommodate a wide range of covalent bonding topologies across lipid, glycan, and protein domains.

Structural modeling of O-linked glycans in protein contexts

O-linked glycans attached to disordered or flexible regions of proteins pose significant obstacles for structural elucidation. Consequently, these glycoprotein architectures are underrepresented in experimental datasets and, by extension, likely underrepresented in the AF3 training set. To assess whether AF3 can generate feasible glycan conformations in such contexts, we systematically modeled several representative O-glycan structures.

Four common O-GalNAc-linked core structures (Wandall et al. 2021), including the branched core 2 and core 4 structures, were modeled on EPO and yielded stereochemically correct conformations (Fig. 5a-d). We further extended core 1 into a common disialylated configuration (Fig. 5e) and core 2 into a glycosaminoglycan (GAG) keratan sulfate (KS) (Wu et al. 2024) (Fig. 5f). Both extensions were modeled and exhibited chemically and conformationally plausible geometries.

We next examined the O-Xyl linker motif that initiates heparan sulfate (HS) and chondroitin sulfate (CS) biosynthesis (Lin et al. 2024). AF3 can model HS-modified GPC1 (Fig. 5g) and CS-modified bikunin (Fig. 5h), albeit with credible conformers emerging from lower-ranked predictions, similar to the behavior observed for GPI-anchored models. Another structurally complex example was the extended O-Man core M3 glycan (Fig. 5i), which serves as the glycan core for matriglycan biosynthesis (Sheikh et al. 2022). This structure incorporates a non-canonical, non-cyclic moiety, ribitol phosphate, within the glycan backbone (Praissman et al. 2016). AF3 was able to resolve both the linkages and overall topology of this structure.

Collectively, these findings suggest that AF3 can be a valuable tool in generating plausible conformations of O-linked glycans, despite the limited representation of such structures

Fig. 4. Gallery of lipid-linked glycans modeled by AlphaFold 3. a-e) SNFG cartoon representations of lipid-linked glycans. f) Ganglio-series
glycosphingolipid (GSL) GP1c (panel a) featuring a β-linkage between Glc and ceramide (Cer). g) Lacto-series GSL modified with a Lewis b (Leb) blood
group antigen (panel b). h) Globo-series GSL Globo H (panel c) modeled in complex with B. cenocepacia lectin Bc2L-C; Bc2L-C homotrimer binds to three
Globo H molecules. A mock lipid layer was modeled to orient the lipid tail positioning. The root-mean-square deviation (RMSD) of residues 4–6 (colored
in red) between the AF3 model and 6TIG is annotated. i) Dolichol pyrophosphate (dol-PP)-linked chitobiose (panel d) forming Michaelis complex with
GDP-Man, Mn2+, and H. sapiens mannosyltransferase ALG1. Notably, chitobiose is linked to dol-PP with an α anomeric configuration. j) Extended
glycosylphosphatidylinositol (GPI) anchor (panel e) linked to H. sapiens glypican-1 (GPC1) through phosphoethanolamine (PEtN) and the C-terminal
serine (S530). The sialyl linkage has not been determined; an α2,3-sialic acid was modeled provisionally.
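
The caption above notes an α anomeric configuration for the dol-PP linkage. One simple way a reader might check that a modeled stereocenter has the intended handedness is to compare the sign of the signed volume spanned by three substituents around that center with the same quantity computed from a trusted reference, such as idealized CCD coordinates. The sketch below is illustrative only and is not the procedure used in this study; the choice of substituent atoms (for example O5, C2, and the glycosidic oxygen around C1) and all coordinates are assumptions.

```python
def signed_volume(center, a, b, c):
    """Scalar triple product (a-center) . ((b-center) x (c-center)).

    The sign flips when the arrangement of a, b, c around the center is
    mirrored, so it can serve as a handedness indicator.
    """
    ax, ay, az = (a[i] - center[i] for i in range(3))
    bx, by, bz = (b[i] - center[i] for i in range(3))
    cx, cy, cz = (c[i] - center[i] for i in range(3))
    return (ax * (by * cz - bz * cy)
            - ay * (bx * cz - bz * cx)
            + az * (bx * cy - by * cx))


def same_handedness(model, reference, tol=1e-3):
    """Compare handedness of two (center, a, b, c) coordinate tuples.

    Substituents must be listed in the same order in both tuples,
    e.g. (C1, O5, C2, glycosidic O); these atom choices are assumptions.
    """
    v_model = signed_volume(*model)
    v_ref = signed_volume(*reference)
    if abs(v_model) < tol or abs(v_ref) < tol:
        raise ValueError("near-planar center; handedness is ill-defined")
    return (v_model > 0) == (v_ref > 0)


# Hypothetical coordinates: a reference center and its mirror image.
reference = ((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
mirrored = ((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, -1))
print(same_handedness(reference, reference), same_handedness(mirrored, reference))
```

A sign mismatch only flags that the center is inverted relative to the reference; it does not by itself say which anomer or epimer was produced.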

Fig. 5. AlphaFold 3 modeling of diverse O-linked glycans. a-d) Core 1–4 O-GalNAc glycans modeled on Homo sapiens erythropoietin (EPO); only the
loop regions are displayed to highlight the glycans. e) Extended disialyl core 1 structure on EPO. f) Extended core 2 glycan featuring keratan sulfate (KS)
on EPO, with 6-O-sulfation on GlcNAc and Gal. g) Heparan sulfate (HS) linker modeled on H. sapiens glypican-1 (GPC1), including 2-O-phosphorylation
on Xyl. h) Chondroitin sulfate (CS) linker modeled on H. sapiens bikunin, also containing 2-O-phosphorylation on Xyl. i) Extended O-Man Core M3 glycan
modeled on H. sapiens α-dystroglycan with 6-O-phosphorylation on Man.

in the training set, particularly when supplemented by input optimization and increased sampling.

Accurate syntax enables modeling of biologically complex glycoprotein architectures
AF3 can model protein-glycan complexes with low computational cost and rapid turnaround, making it feasible to investigate larger and more intricate glycoprotein assemblies. To demonstrate its broader applicability, we modeled CD22, also known as sialic acid-binding immunoglobulin-like lectin 2 (Siglec-2), a transmembrane lectin comprising six immunoglobulin constant (IgC)-like domains and a membrane-distal immunoglobulin variable (IgV)-like lectin domain that projects beyond the cell-surface glycocalyx (Macauley et al. 2014; Duan and Paulson 2020) (Fig. 6d). Siglec-2 contains four N-glycosylation sites on the IgV domain and seven additional sites distributed across the IgC domains (Wasim et al. 2019). While the IgV domain mediates recognition of α2,6-sialylated glycans to activate downstream intracellular signaling (Duan and Paulson 2020), its binding pocket is frequently masked by cis-interactions with α2,6-sialylated glycans on adjacent Siglec-2 molecules (Razi and Varki 1998; Collins et al. 2004; Macauley et al. 2015; Wasim et al. 2019). This masking can be disrupted upon encountering high-avidity or high-affinity ligands, such as 6-O-sulfo-6′-sialyl LacNAc (Collins et al. 2006; Macauley et al. 2015; Wu et al. 2023; Ma et al. 2024; Jung et al. 2025).

To investigate this behavior, we modeled two Siglec-2 molecules, each fully modified with eleven G2S2 N-glycans (Fig. 6a). The resulting structure (Fig. 6c) recapitulated features consistent with prior negative-stain electron microscopy (EM) and small-angle X-ray scattering (SAXS) studies (Ereño-Orbea et al. 2017), including a ∼120° angular displacement between the IgV-IgC2 and IgC3-IgC6 domains. The C-terminal intracellular tails were erroneously folded back toward IgC6, an artifact resulting from the lack of an explicit membrane boundary for the model.

The two IgV lectin domains displayed cis-interactions in which G2S2 glycans at Asn67 of one Siglec-2 interacted with the opposing IgV domain (Fig. 6e). The glycan-lectin binding pose aligned closely with the crystallographic structure (RMSD 0.292 Å, PDB: 5VKM); notably, the crystal structure includes only a terminal disaccharide and comprises just the IgV-IgC2 domains, with five N-glycosylation

Fig. 6. Cis- and trans-interactions between fully N-glycosylated Siglec-2 (CD22) molecules revealed by AlphaFold 3. a) SNFG cartoon
representations of G2S2 N-glycan and b) G2S2 carrying a 6-O-sulfated GlcNAc (GlcNAc6S) modification. c) AF3 model of two fully N-glycosylated H.
sapiens Siglec-2 molecules; one chain is colored green and the second cyan. Domains on the green chain are annotated. d) Domain architecture of
Siglec-2, comprising an N-terminal immunoglobulin variable (IgV)-like lectin domain, six immunoglobulin constant (IgC)-like domains, a transmembrane
region (TM), and intracellular signaling motifs. N-glycosylation sites on each domain are indicated. e) Close-up of the cis-interaction captured by AF3: The
IgV domain of one Siglec-2 chain (green, surface representation) engages the G2S2 N-glycan from the adjoining Siglec-2 chain (cyan, cartoon
representation), with interacting amino acids shown in line and the glycan shown as sticks with SNFG coloring for the respective monosaccharides. The
root-mean-square deviation (RMSD) of residues 9 and 11 (colored in red) between the AF3 model and 5VKM is annotated. f) AF3 model of two Siglec-2
molecules in the presence of a free-reducing end G2S2 (GlcNAc6S) glycan mimicking trans-ligand binding. The original cis-interaction is disrupted and
replaced by binding to the free G2S2 (GlcNAc6S) glycan. An additional electrostatic interaction between lysine (Lys66) and the sulfate group suggests
enhanced affinity relative to the cis-bound G2S2.
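
The figure captions report RMSD values between selected residues of the AF3 model and the crystal structure. As a minimal sketch of the underlying arithmetic, the function below computes the RMSD over paired atoms, assuming the two structures have already been superposed and the atoms are listed in the same order. The text does not state the exact superposition or atom-selection protocol, so those choices and the coordinates used here are assumptions.

```python
import math


def rmsd(coords_model, coords_reference):
    """Root-mean-square deviation over paired 3D coordinates (in Å).

    Assumes both lists are already superposed in the same frame and that
    the i-th entry of each list refers to the same atom.
    """
    if len(coords_model) != len(coords_reference) or not coords_model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xm, ym, zm), (xr, yr, zr) in zip(coords_model, coords_reference):
        squared += (xm - xr) ** 2 + (ym - yr) ** 2 + (zm - zr) ** 2
    return math.sqrt(squared / len(coords_model))


# Hypothetical coordinates: every atom displaced by 1 Å along z.
model_atoms = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
crystal_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
print(rmsd(model_atoms, crystal_atoms))
```

Because the value depends on which atoms are included and how the structures are superposed, an RMSD is only comparable across models when the same selection is used.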

sites mutated to alanine, thereby precluding cis-interaction (Ereño-Orbea et al. 2017). To probe trans-interactions, we introduced a free-reducing G2S2 glycan carrying GlcNAc-6-O-sulfate (Fig. 6b), which mimics a high-affinity trans-ligand. In this context, the cis-interaction was disrupted, and Lys66 was positioned in proximity to the 6-O-sulfate group on GlcNAc, suggesting an electrostatic interaction for the free G2S2 ligand (Fig. 6f). Although further empirical validation is required to confirm the binding mechanism, it is remarkable that AF3 captured such a nuanced glycan-glycoprotein interaction within a highly complex system. This example highlights the potential of AF3 to explore structural hypotheses involving post-translational modifications and dynamic receptor-ligand switching.

AlphaFold 3 models glycan complexes not present in the training data set
The example complexes above were all compared with protein crystal structures that were deposited in PDB prior to the cutoff date used in AF3 training and could potentially be viewed as directly replicating data used in its training. To determine if AF3 could effectively model glycan-protein complexes not present in its training data set, we examined protein-glycan complex structures in PDB that were released after 2023 January 13, since the cutoff date for the PDB training dataset of AF3 was 2021 September 30, with the additional validation dataset extending to 2023 January 12 (Abramson et al. 2024). To evaluate if glycan models guided by the bondedAtomPairs (BAP) syntax could recapitulate the respective crystal structures of protein-glycan complexes, we used three criteria to identify potential candidates for testing our modeling strategy. The protein structure should contain a bound disaccharide or larger, the protein structure should not contain a close homolog containing a bound ligand deposited before 2023 January 13, and the modeled glycan should have appropriate stereochemistry for all carbohydrate structural components. Four example complexes were chosen: Homo sapiens B3GALT5 (Lo et al. 2025), Pasteurella multocida heparosan synthase 2 (PmHS2) (Stancanelli et al. 2024), H. sapiens aggrecan (Otsuka et al. 2025), and Cucumis sativus phloem protein 2 (PP2) (Bobbili et al. 2023).

The human β-galactosyltransferase, B3GALT5, involved in core 3 O-glycan synthesis and β1,3-galactosylation of glycolipids, was co-crystallized with the GlcNAc-β1,3-GalNAc-Thr Core 3 O-GalNAc disaccharide, UDP, and Mn2+ to obtain the ligand-bound crystal structure (8ZX3 (Lo et al. 2025), PDB release: 2025 April 9; resolution 2.09 Å, Fig. S4a and b). The corresponding AF3 Michaelis complex model (Fig. S4c) recapitulated the binding pose of the terminal GlcNAc with close agreement to the crystal structure (RMSD: 0.096 Å; Fig. S4d).

PmHS2 is a dual-domain glycosyltransferase that forms a homodimer, catalyzing the transfer of GlcNAc and GlcA to form a (-4-GlcA-β1,4-GlcNAc-α-)n heparosan polymer. The

structure of PmHS2 was solved by cryo-EM with bound 2-O-sulfated HS 5-mer, UDP, and Mn2+ (8VIW (Stancanelli et al. 2024), PDB release 2024 July 24, resolution 3.30 Å, Fig. S5a and b). AF3 modeled the dimeric complex with UDP-GlcNAc, Mn2+, and HS as a Michaelis complex (Fig. S5c) showing good agreement with the empirical structure (RMSD: 0.360 Å; Fig. S5d).

The G1 domain of aggrecan shows hyaluronan (HA) binding activity and the crystal structure of the protein complex with bound HA 10-mer was determined (9DFF (Otsuka et al. 2025), PDB release 2025 April 30, resolution 2.59 Å, Fig. S6a and b). The non-reducing end unsaturated GlcA, a lyase-derived product, was excluded from the comparison. The AF3 model recapitulated the observed binding pose for the remainder of the bound HA product with good agreement (RMSD: 0.700 Å; Fig. S6c and d).

PP2 is a plant lectin recognizing a GlcNAc moiety on a glycan chain. The structure of chitotriose-bound PP2 was characterized (7W4B (Bobbili et al. 2023), PDB release 2023 March 8, resolution 2.50 Å, Fig. S7a and b). The AF3 model replicated the binding pose with high accuracy (RMSD: 0.252 Å; Fig. S7c and d).

We further broadened our filtering criteria to include newly released protein structures that contain homologous sequences or are complexed with ligands not represented in the AlphaFold 3 training set. Examples include Bacteroides thetaiotaomicron endoglycosidase BT1285 (Sastre et al. 2024), Bacteroides ovatus polysaccharide lyase family 38 (BoPL38) (Tandrup et al. 2025), Pedobacter terrae glucanase (PDB: 7WWC), Bacillus circulans xylanase (PDB: 8QXY), Escherichia coli fimbrial adhesin FimH (Krammer et al. 2023), and botulinum neurotoxin A (PDB: 8RVG).

The endoglycosidase BT1285 was crystallized with an M9 N-glycan (8U48 (Sastre et al. 2024), PDB release 2024 May 29, resolution 1.90 Å, Fig. S8a and b), showing extensive interaction with glycan residues 1 through 7, where the cleavage site is between residues 1 and 2. AF3 reproduced the binding pose with high accuracy (RMSD: 0.324 Å; Fig. S8c and d).

The crystal structure of BoPL38 was determined with a bound alginate substrate (-4-D-mannuronic acid (ManA)-β-)4 (9FHU (Tandrup et al. 2025), PDB release 2025 July 9, resolution 2.09 Å, Fig. S9a and b). We modeled BoPL38 with the same alginate 4-mer (Fig. S9b) and the AF3 model (Fig. S9c) aligns well with the binding pose of residues 2 to 4 within the active site, where the cleavage occurs between residues 3 and 4 (RMSD: 0.261 Å; Fig. S9d).

P. terrae glucanase was crystallized with laminarin (7WWC, PDB release 2023 February 15, resolution 2.20 Å, Fig. S10a), revealing a cleavage site between glycan residues 2 and 3 (Fig. S10b). As no accompanying publication was available, we highlighted side chains for potential protein-glycan interactions within 4 Å. The AF3 model demonstrated good agreement with the crystal structure (RMSD 0.214 Å; Fig. S10c and d).

The catalytically inactive B. circulans xylanase was co-crystallized with xylotriose (8QXY, PDB release 2024 August 21, resolution 1.41 Å, Fig. S11a and b). Extensive contacts were observed between the enzyme and all three glycan residues. AF3 successfully reproduced the binding configuration (RMSD 0.386 Å; Fig. S11c and d).

The crystal structure of E. coli fimbrial adhesin FimH-M3F N-glycan complex was characterized (7BHD (Krammer et al. 2023), PDB release 2022 July 20, resolution 1.40 Å, Fig. S12a and b) showing specific interactions with the 3-arm of the glycan, while the solvent-exposed core and fucose residues remained flexible. The AF3 model reflected this binding pose well (RMSD 0.189 Å; Fig. S12c and d).

A botulinum neurotoxin A (BoNT/A) mutant was crystallized in complex with GM1b ganglioside (8RVG, PDB release 2025 February 12, resolution 1.9 Å, Fig. S13a and b), highlighting interactions with residues near the reducing end. Amino acid residues within 4 Å of the glycan were used for comparison. The AF3 model reproduced the binding pose closely (RMSD 0.223 Å; Fig. S13c and d).

Together, these examples span diverse protein classes, including glycosyltransferases, glycoside hydrolases, lectins, and polysaccharide lyases. In each case, the corresponding modeled complex closely matches the bound pose of the glycan ligand observed in the crystal structure, despite these specific complexes not being present in the AF3 training set. These results suggest a broader applicability of AF3 in modeling glycan-protein complexes than previously anticipated. Notably, the modeling of complex polysaccharides such as heparan sulfate (HS), hyaluronic acid (HA), and alginate highlights the capability of AF3 to model such challenging polymer ligands when bound to enzymes and lectins.

However, we emphasize that glycan structure modeling in AF3 is highly context dependent. We observed numerous instances where modeled glycan structures did not maintain appropriate glycan stereochemistry, or the modeled complexes failed to replicate interactions present in the corresponding crystal structures. Therefore, while this approach offers a valuable tool for hypothesis generation in biologically relevant glycan-protein interactions, each model must be carefully evaluated, and appropriate structural and functional validation remains essential.

Discussion
AF3 presents new opportunities for ligand modeling, and our evaluation across multiple glycan classes demonstrates its potential as a powerful tool for developing structural hypotheses that can be further tested empirically. While our study focused on human glycans, the modeling strategies, input syntax, and curated templates provided here can be generalized to glycan systems across all domains of life. Notably, we present pre-built input files for structurally complex ligands, including GPI anchors, featuring covalent linkages between glycan, lipid, and protein components.

Among the available ligand input formats, SMILES is the most straightforward, requiring minimal pre- or post-processing. However, when applied to stereochemically complex molecules such as glycans, SMILES frequently fails to reproduce correct conformations, even for relatively simple linear oligosaccharides like LNnT. Converting SMILES or structural files into userCCD using tools such as rdkit_utils offers more control over conformer generation and optimization, but this approach also lacks reliability in glycan modeling. This may be attributed to the reliance of AF3 on RDKit (https://www.rdkit.org) for internal ligand handling, which struggles with the conformational diversity and stereochemical precision required for glycans.

By contrast, idealized coordinates (Westbrook et al. 2015) generated through proprietary cheminformatics engines such as Corina (Molecular Networks) (Gasteiger et al. 1990;

Sadowski et al. 1994) or OMEGA (OpenEye) (Hawkins et al. 2010; Hawkins and Nicholls 2012), are routinely used in novel PDB ligand deposition and exhibit higher structural fidelity. These coordinates are incorporated into CCD entries that undergo rigorous biocuration under the Worldwide Protein Data Bank (wwPDB) (Young et al. 2018), ensuring consistency and accuracy. Accordingly, idealized CCD input for monosaccharide building blocks with appropriate anomeric configurations coupled with bondedAtomPairs (BAP) syntax yielded the most reliable and stereochemically valid glycan conformations in our evaluation of structural models.

Caveats and limitations
The use of bondedAtomPairs (BAP) syntax requires both glycobiological expertise to define the correct glycan types and linkages and a strong foundation in glycochemistry to interpret conformational plausibility. We addressed the former by compiling input templates for common glycan motifs, and the latter by systematically evaluating the structural plausibility of modeled glycans presented in this study. Analytic tools such as GlyProbity (Woods 2019) and Privateer (Dialpuri et al. 2023) are available for assessing ring puckering and glycosidic torsions based on CCD identifiers, which may increase the throughput of model quality assessment. However, these tools were originally developed for validating models derived from X-ray crystallography or cryo-electron microscopy, and their applicability to predictions from AF3 remains to be fully validated. A more thorough evaluation across diverse monosaccharides and glycan types is warranted, especially with an emphasis on comparison with empirically derived structures to validate the computational models. We anticipate that AF3 will expand access to modeling of glycan-protein complexes and generation of structural hypotheses for biological function. However, as with any modeling approach, extreme care must be taken to avoid overinterpreting modeling results, and all models should be further tested and validated by direct empirical experimentation.

If a glycan is modeled into an incorrect conformation, even a single anomeric or epimeric misassignment can significantly impact the interpretation of structural and biological function. However, AF3 currently lacks discernible metrics to penalize such conformational issues in glycan models. The predicted local-distance difference test (pLDDT) (Mariani et al. 2013), commonly used to assess protein confidence, is unreliable for ligands. We observed instances in which glycans displayed incorrect linkages, epimers, or conformations while retaining high pLDDT scores, particularly when located in catalytic pockets. Conversely, when glycans extend into solvent as post-translational modifications, pLDDT scores often decrease with distance from the protein core, even when conformations are plausible (Fig. S14). Other AF3 scoring metrics such as predicted aligned error (PAE) (Evans et al. 2022), predicted template modeling (pTM) (Zhang and Skolnick 2004), interface pTM (ipTM) (Evans et al. 2022), and the sample ranking score (Abramson et al. 2024) also fail to capture glycan-specific conformational errors. RMSD comparisons with crystallographic or energy-minimized structures are similarly insufficient, as glycan flexibility can obscure subtle but critical deviations. Although glycosidic torsion angles offer more direct insight, comprehensive reference data exist primarily for N-glycans (e.g. GlyProbity and Privateer), limiting broader evaluation of glycan model accuracy.

We also observed that certain glycan linkages, particularly those underrepresented in the PDB or the AF3 training set, required higher sampling (i.e. more seeds) to recover conformationally plausible models. Interestingly, false conformations were often found among the models with the top sample ranking scores, whereas effective geometries frequently appeared in lower-ranked outputs. It is unclear how chirality penalties are enforced in AF3, especially for stereochemically complicated glycans. As such, we recommend evaluating both high- and low-scoring models to identify structurally plausible predictions when encountering difficulties. This approach proved useful in modeling GPI anchors and HS/CS linkers. However, when models were extended for HS or CS polymers beyond the linker region, no models with correct configurations were recovered. Common artifacts included incorrect anomeric configurations, misassigned epimers, and high-energy ring conformations (data not shown). In many cases, only the regions of HS/CS chains directly interacting with target proteins yielded plausible structures. Modeling specific degrees of polymerization may mitigate this issue.

In the case of matriglycan, which consists of [Xyl-α1,3-GlcA-β1,3] disaccharide repeats (Sheikh et al. 2022), AF3 consistently misassigns α-xylose as β, even in contexts where structures of matriglycan-bound proteins exist in the PDB (e.g. laminin (Briggs et al. 2016), Lassa virus glycoprotein (Katz et al. 2022)). Given the limited natural occurrence and greater conformational diversity of α-xylose compared to β-xylose, it may be beneficial to test SMILES or userCCD input formats when ambiguity in stereochemical encoding exists. While idealized coordinates in the CCD offer necessary flexibility to accommodate electron density maps, this same flexibility can be a double-edged sword for AF3. A notable advantage of this adaptability was observed when modeling glycone interactions of an M9 glycan substrate with MAN1A1, where the bound α-Man residue appropriately adopted a high-energy transition state conformation. Such conformational plasticity offers opportunities for mechanistic insights into enzyme catalysis. Nonetheless, this conformational freedom can also obscure critical features, such as incorrect assignment of anomeric configurations and epimeric orientations, thereby complicating the accurate prediction of glycan structures.

We also attempted to model a more complex case of glycoenzyme interactions with the conserved Fc N-glycan on immunoglobulin G (IgG). There are numerous structures in PDB of IgG or Fc domains containing complex type biantennary N-glycan structures tightly packed between Fc chains, effectively inaccessible to glycan-processing enzymes. Attempts to model glycan processing enzymes such as MGAT1, MAN2A1, MGAT2, FUT8, and B4GALT1 in complex with the Fc domain N-glycan structure were unsuccessful. The positions of the glycan structures remained consistent with crystallographic observations within the interior of the Fc homodimer core. In each case, AF3 prioritized interactions between the glycan processing enzymes and the Fc polypeptide rather than the N-glycan structures (data not shown). This suggests that the AF3 modeling was strongly biased by the abundant Fc domain crystal structures in the PDB training set. As a result, AF3 presently lacks the capacity to capture the dynamic and transient positioning of glycans and glycan-bearing C’E loops in the Fc domain (Borrok et al. 2012), a feature that has been experimentally validated by nuclear magnetic resonance (NMR) (Barb and

Prestegard 2011) and in IgG Fc-endoglycosidase S (EndoS) co-crystal structures (Trastoy et al. 2018). Efforts to model an IgG Fc-endoglycosidase S (EndoS) complex with the glycan positioned in the active site were also unsuccessful, despite this complex presumably being present in the training dataset for modeling. In contrast, AF3 was able to recapitulate key aspects of the interaction between glycosylated IgG Fc and glycosylated FcγRIIIa (CD16a) (Falconer et al. 2018) (data not shown), consistent with known binding modes. These observations underscore the importance of critically evaluating AF3-generated models in a context-dependent manner.

AF3 does not incorporate an explicit energy function; therefore, the generated models do not represent thermodynamic equilibria nor do they explicitly represent energetically favored conformations. Furthermore, the predicted structures should be interpreted only as static snapshots, similar to those obtained from empirical methods such as X-ray crystallography, which contribute substantially to the PDB dataset used in AF3 training. The AF3 models of bound glycans commonly deviate from the equivalent crystal structure complexes when the glycan termini extend into solvent. These differences in modeled conformation correlate with lower pLDDT scores for the modeled ligands and higher B-factors for the same regions of the ligand in the crystal structures, indicating greater disorder for solvent-oriented ligand subregions (Fig. S14). Users interested in glycan dynamics or equilibrium conformational sampling are advised to employ complementary computational methods, such as molecular dynamics simulations or ensemble-based tools like GlycoShield (Tsai et al. 2024) or GlycoShape (Ives et al. 2024), which provide structural representations grounded in exhaustive conformational sampling.

Unanticipated and versatile applications in glycoscience and beyond
The relative scarcity of glycan structural data in the PDB, compared to protein structural data, would seemingly pose a challenge when training AF3 for glycan structural modeling. Thus, it would be reasonable to assume that effective glycan modeling with AF3 would be limited to “pre-trained” structures or simple models constructed from the fragmentary glycan information available in the PDB. Surprisingly, this assumption appears to be incorrect. AF3 is capable of modeling a wide range of glycan-binding proteins, including glycosyltransferases, glycoside hydrolases, lectins, and polysaccharide lyases, containing bound glycan structures that closely resemble crystallographic complexes not included in its training set. AF3 has clearly “learned” to generalize its predictions and generate plausible glycan models that extend well beyond the scope of its original training data. It remains unclear whether this capability stems from the structural constraints embedded in the monosaccharide building block CCD files, the partial glycan structures in the training data, or from patterns derived from non-carbohydrate chemical moieties included in its datasets. While this ability may seem counterintuitive given the limited scale of glycan training data, the resulting models offer a powerful tool for hypothesis generation.

Our investigations underscore the broad applicability of AF3 in modeling glycans, both as post-translational modifications and as free ligands. This capability opens new pathways for translational glycobiology applications. For instance, Bc2L-C, a dual-domain lectin from B. cenocepacia, can be modeled in complex with both glycolipid and glycoprotein ligands to better understand its multivalent binding and role in infection (Sulák et al. 2011). Similarly, AF3-generated models of cis-interactions between CD22 molecules may offer insights into dynamic regulatory mechanisms that are difficult to capture through crystallography alone.

Although further improvements in accuracy are needed for antibody–antigen modeling, AF3 shows promise in the structural prediction of anti-glycan antibodies and glycan-containing epitopes. Given that most biologics are glycoproteins targeting glycoprotein antigens in vivo, AF3 offers a valuable framework for evaluating both the structural consequences of glycoform variation and the functional effects of glycosylation on therapeutic efficacy.

We have also noted that the AF3 modeling approach could be extended beyond glycoconjugates to other complex molecule types. This is made possible by the use of other molecular building blocks encoded with idealized CCD codes and linked via bondedAtomPairs (BAP) syntax. The simplicity of the JSON scripting strategy for input into AF3 enables an accessible, flexible, and universal workflow for assembling stereochemically complex macromolecules from the molecular elements found in the CCD library. Consequently, this approach significantly broadens future capabilities for modeling biologically important and structurally complex molecular interactions.

In conclusion, although structural validation using orthogonal methods remains essential, AF3 presents a transformative platform for modeling glycan-mediated interactions across proteins, lipids, and carbohydrates. With thoughtful syntax planning and critical evaluation, AF3 has the potential to dramatically streamline and accelerate research in the field of glycoscience.

Methods
AlphaFold 3 version
AlphaFold 3 Version 3.0.1 (available at https://github.com/google-deepmind/alphafold3) was run as a Singularity container on the Sapelo2 high-performance computing cluster at the Georgia Advanced Computing Resource Center (GACRC), University of Georgia. Modeling was performed using NVIDIA A100 or H100 GPUs. The AlphaFold 3 model parameter permission file was granted by the DeepMind team.

AlphaFold 3 input JSON file
A full description of the input JSON file structure is available on the AlphaFold 3 GitHub repository (https://github.com/google-deepmind/alphafold3/blob/main/docs/input.md). The initial JSON template used in this study was adapted from AFusion (https://github.com/Hanziwww/AlphaFold3-GUI). All modeling was conducted using version 2 of the JSON input format.

SMILES input and glycan energy minimization
Energy minimized glycan PDBs were generated using the GLYCAM Carbohydrate Builder web tool (https://glycam.org/) (Grant et al. 2025) and subsequently converted to SMILES format through the Open Babel web-based converter (https://www.cheminfo.org/Chemistry/Cheminformatics/FormatConverter/index.html) (O’Boyle et al. 2011).
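
As a rough illustration of the input strategy described in the Methods (CCD-coded monosaccharides joined through bondedAtomPairs in a version 2 JSON file), the sketch below assembles a minimal input for a chitobiose unit N-linked to an asparagine. It is not one of the input files from this study: the job name, chain IDs, toy protein sequence, residue number, and seed values are all hypothetical, and the handling of leaving-group atoms is not addressed here. Field names follow the AlphaFold 3 input documentation cited above.

```python
import json


def build_af3_input(name, protein_sequence, asn_position, seeds):
    """Build a minimal AlphaFold 3 input dict for an N-linked chitobiose.

    asn_position is the 1-based index of the glycosylated Asn.
    Chain IDs "A" (protein) and "G" (glycan) are arbitrary choices.
    """
    if protein_sequence[asn_position - 1] != "N":
        raise ValueError("asn_position does not point at an asparagine")
    glycan_id = "G"
    return {
        "name": name,
        # More seeds means more sampling of candidate models.
        "modelSeeds": list(seeds),
        "sequences": [
            {"protein": {"id": "A", "sequence": protein_sequence}},
            # Two GlcNAc residues (CCD code NAG) in one ligand entity.
            {"ligand": {"id": glycan_id, "ccdCodes": ["NAG", "NAG"]}},
        ],
        # Each bond is a pair of [entity ID, 1-based residue index, atom name].
        "bondedAtomPairs": [
            [["A", asn_position, "ND2"], [glycan_id, 1, "C1"]],  # Asn-GlcNAc
            [[glycan_id, 1, "O4"], [glycan_id, 2, "C1"]],        # 1-4 linkage
        ],
        "dialect": "alphafold3",
        "version": 2,
    }


# Hypothetical toy sequence containing an N-G-T sequon with Asn at position 3.
job = build_af3_input("toy_chitobiose", "GSNGTAA", 3, range(1, 6))
print(json.dumps(job, indent=2))
```

Note that the bond list fixes connectivity only; the anomeric configuration of each residue comes from the CCD component chosen, so the resulting models still need to be inspected for stereochemical correctness.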

Conversion of SMILES and PDB to userCCD format
The rdkit_utils Python script from the AlphaFold 3 repository (https://github.com/google-deepmind/alphafold3/blob/main/src/alphafold3/data/tools/rdkit_utils.py) was used to convert SMILES strings or PDB files into CCD-like mmCIF files for use as userCCD inputs. rdkit_utils depends on RDKit (RDKit: Open-source cheminformatics. https://www.rdkit.org. DOI: 10.5281/zenodo.15286010).

CCD library for AlphaFold 3 input
Chemical Component Dictionary (CCD) codes and associated mmCIF files were retrieved and manually curated from PDBeChem database (https://www.ebi.ac.uk/pdbe-srv/pdbechem/) for compatibility with AlphaFold 3 input requirements.

Funding
This work was supported by U.S. National Science Foundation BioFoundry: Glycoscience Research, Education and Training [BioF:GREAT NSF: 2400220] and National Institutes of Health [GM154846 to K.W.M.].

Conflict of interest
None declared.

Data availability
All JSON input files and userCCD examples used in this study are provided in the Supplementary Document 2. All AF3 models and metadata are also available from ModelArchive using the following link: https://www.modelarchive.org/doi/10.5452/ma-af3glycan. Availability of predicted structural models is subject to the AlphaFold 3 Output Terms of Use (https://github.com/google-deepmind/alphafold3/blob/main/OUTPUT_TERMS_OF_USE.md).

Protein sequences for AlphaFold 3 input
Protein sequences were obtained from UniProt (https://www.uniprot.org/) using the following accession numbers: M.
musculus MAN1A1 (P45700), H. sapiens MGAT2 (Q10469), References
H. sapiens ST6GAL1 (P15907), H. sapiens EPO (P01588), B.
cenocepacia Bc2L-C (B4EH86), H. sapiens ALG1 (Q9BT22), Abramson J et al. 2024. Accurate structure prediction of biomolecular
interactions with AlphaFold 3. Nature. 630:493–500. https://doi.o
H. sapiens GPC1 (P35052), H. sapiens Bikunin (P02760),
rg/10.1038/s41586-024-07487-w.
H. sapiens α-dystroglycan (Q14118), H. sapiens Siglec-2
Adams TM, Zhao P, Chapla D, Moremen KW, Wells L. 2022. Sequen-
(P20273), H. sapiens B3GALT5 (Q9Y2C3), P. multocida hep- tial in vitro enzymatic N-glycoprotein modification reveals site-
arosan synthase 2 (Q5SGE1), H. sapiens aggrecan (P16112), specific rates of glycoenzyme processing. J Biol Chem. 298:102474.
C. sativus phloem protein 2 (Q8LK69), B. thetaiotamicron https://doi.org/10.1016/j.jbc.2022.102474.
BT1258 (Q8A889), B. ovatus polysaccharide lyase family 38 Agirre J. 2017. Strategies for carbohydrate model building, refinement
(A0A5M5BWR5), P. terrae glucanase (A0A1G7XNR7), B. and validation. Acta Crystallogr D Struct Biol. 73:171–186. https://
circulans xylanase (P09850), E. coli fimbrial adhesin FimH doi.org/10.1107/S2059798316016910.
(P08191), and C. botulinum neurotoxin A (P0DPI0). Allain EP, Rouleau M, Lévesque E, Guillemette C. 2020. Emerging roles
for UDP-glucuronosyltransferases in drug resistance and cancer
progression. Br J Cancer. 122:1277–1287. https://doi.org/10.1038/
Glycan and protein visualization
s41416-019-0722-0.
Glycan cartoon representations were generated using Gly- Barb AW, Prestegard JH. 2011. NMR analysis demonstrates
coWorkbench 2.1 (stable build 146) (Ceroni et al. 2008), immunoglobulin G N-glycans are accessible and dynamic. Nat
with custom modifications. Protein structures were visualized, Chem Biol. 7:147–153. https://doi.org/10.1038/nchembio.511.
aligned and rendered using PyMOL version 3.1.3.1. Root- Barb AW, Brady EK, Prestegard JH. 2009. Branch-specific sialylation of
mean-square deviation (RMSD) was calculated by PyMol IgG-fc glycans by ST6Gal-I. Biochemistry. 48:9705–9707. https://
align function. doi.org/10.1021/bi901430h.
Bermeo R, Bernardi A, Varrot A. 2020. BC2L-C N-terminal lectin
domain complexed with Histo blood group oligosaccharides pro-
vides new structural information. Molecules. 25:248. https://doi.o
Acknowledgments
rg/10.3390/molecules25020248.
We thank Shan-Ho Tsai and Jordan Utley at the Georgia Advanced Bobbili KB, Sivaji N, Priya B, Suguna K, Surolia A. 2023. Struc-
Computing Resource Center (GACRC), University of Georgia, for their ture and interactions of the phloem lectin (phloem protein 2)
assistance with the installation and maintenance of AlphaFold 3. Cus17 from Cucumis sativus. Structure. 31:464–479.e5. https://
doi.org/10.1016/j.str.2023.02.008.
Borrok MJ, Jung ST, Kang TH, Monzingo AF, Georgiou G. 2012.
Author contributions Revisiting the role of glycosylation in the structure of human IgG fc.
Chin Huang (Conceptualization [equal], Data curation [equal], Formal ACS Chem Biol. 7:1596–1602. https://doi.org/10.1021/cb300130k.
analysis [equal], Investigation [equal], Methodology [equal], Resources Briggs DC et al. 2016. Structural basis of laminin binding to the LARGE
[equal], Software [equal], Validation [equal], Visualization [equal], glycans on dystroglycan. Nat Chem Biol. 12:810–814. https://doi.o
Writing—original draft [equal], Writing—review & editing [equal]), rg/10.1038/nchembio.2146.
Natarajan Kannan (Writing—review & editing [equal]), Kelley W Case DA et al. 2005. The amber biomolecular simulation programs. J
Moremen (Conceptualization [equal], Formal analysis [equal], Funding Comput Chem. 26:1668–1688. https://doi.org/10.1002/jcc.20290.
acquisition [equal], Investigation [equal], Project administration Ceroni A et al. 2008. GlycoWorkbench: a tool for the computer-
[equal], Supervision [equal], Writing—original draft [equal], Writing— assisted annotation of mass spectra of glycans. J Proteome Res. 7:
review & editing [equal]). 1650–1659. https://doi.org/10.1021/pr7008252.
Collins BE et al. 2004. Masking of CD22 by cis ligands does not prevent
redistribution of CD22 to sites of cell contact. Proc Natl Acad Sci
USA. 101:6104–6109. https://doi.org/10.1073/pnas.0400851101.
Supplementary material Collins BE et al. 2006. High-affinity ligand probes of CD22 overcome
Supplementary material is available at Glycobiology Journal online. the threshold set by cis ligands to allow for binding, endocytosis,

and killing of B cells. J Immunol. 177:2994–3003. https://doi.org/10.4049/jimmunol.177.5.2994.
Cremer D, Pople JA. 1975. General definition of ring puckering coordinates. J Am Chem Soc. 97:1354–1358. https://doi.org/10.1021/ja00839a011.
Dashti H et al. 2020. Probabilistic identification of saccharide moieties in biomolecules and their protein complexes. Sci Data. 7:210. https://doi.org/10.1038/s41597-020-0547-y.
Dialpuri JS et al. 2023. Analysis and validation of overall N-glycan conformation in privateer. Acta Crystallogr D Struct Biol. 79:462–472. https://doi.org/10.1107/S2059798323003510.
Discovery C et al. 2024. Chai-1: decoding the molecular interactions of life. bioRxiv:2024.2010.2010.615955.
Duan S, Paulson JC. 2020. Siglecs as immune cell checkpoints in disease. Annu Rev Immunol. 38:365–395. https://doi.org/10.1146/annurev-immunol-102419-035900.
Ereño-Orbea J et al. 2017. Molecular basis of human CD22 function and therapeutic targeting. Nat Commun. 8:764. https://doi.org/10.1038/s41467-017-00836-6.
Evans R et al. 2022. Protein complex prediction with AlphaFold-multimer. bioRxiv:2021.2010.2004.463034.
Fadda E. 2022. Molecular simulations of complex carbohydrates and glycoconjugates. Curr Opin Chem Biol. 69:102175. https://doi.org/10.1016/j.cbpa.2022.102175.
Falconer DJ, Subedi GP, Marcella AM, Barb AW. 2018. Antibody Fucosylation lowers the FcγRIIIa/CD16a affinity by limiting the conformations sampled by the N162-glycan. ACS Chem Biol. 13:2179–2189. https://doi.org/10.1021/acschembio.8b00342.
Feng Z et al. 2004. Ligand depot: a data warehouse for ligands bound to macromolecules. Bioinformatics. 20:2153–2155. https://doi.org/10.1093/bioinformatics/bth214.
Garron ML, Cygler M. 2010. Structural and mechanistic classification of uronic acid-containing polysaccharide lyases. Glycobiology. 20:1547–1573. https://doi.org/10.1093/glycob/cwq122.
Gasteiger J, Rudolph C, Sadowski J. 1990. Automatic generation of 3D-atomic coordinates for organic molecules. Tetrahedron Computer Methodology. 3:537–547. https://doi.org/10.1016/0898-5529(90)90156-3.
Grant OC, Woods RJ. 2014. Recent advances in employing molecular modelling to determine the specificity of glycan-binding proteins. Curr Opin Struct Biol. 28:47–55. https://doi.org/10.1016/j.sbi.2014.07.001.
Grant OC et al. 2025. Generating 3D models of carbohydrates with GLYCAM-web. bioRxiv:2025.2005.2008.652828.
Hawkins PC, Nicholls A. 2012. Conformer generation with OMEGA: learning from the data set and the analysis of failures. J Chem Inf Model. 52:2919–2936. https://doi.org/10.1021/ci300314k.
Hawkins PC, Skillman AG, Warren GL, Ellingson BA, Stahl MT. 2010. Conformer generation with OMEGA: algorithm and validation using high quality structures from the protein databank and Cambridge structural database. J Chem Inf Model. 50:572–584. https://doi.org/10.1021/ci100031x.
He X et al. 2024. Highly accurate carbohydrate-binding site prediction with DeepGlycanSite. Nat Commun. 15:5163. https://doi.org/10.1038/s41467-024-49516-2.
Ives CM et al. 2024. Restoring protein glycosylation with GlycoShape. Nat Methods. 21:2117–2127. https://doi.org/10.1038/s41592-024-02464-7.
Jumper J et al. 2021. Highly accurate protein structure prediction with AlphaFold. Nature. 596:583–589. https://doi.org/10.1038/s41586-021-03819-2.
Jung J et al. 2025. Understanding the glycosylation pathways involved in the biosynthesis of the Sulfated glycan ligands for Siglecs. ACS Chem Biol. 20:386–400. https://doi.org/10.1021/acschembio.4c00677.
Kadirvelraj R et al. 2018. Human N-acetylglucosaminyltransferase II substrate recognition uses a modular architecture that includes a convergent exosite. Proc Natl Acad Sci USA. 115:4637–4642. https://doi.org/10.1073/pnas.1716988115.
Karaveg K et al. 2005. Mechanism of class 1 (glycosylhydrolase family 47) α-mannosidases involved in N-glycan processing and endoplasmic reticulum quality control. J Biol Chem. 280:16197–16207.
Katz M et al. 2022. Structure and receptor recognition by the Lassa virus spike complex. Nature. 603:174–179. https://doi.org/10.1038/s41586-022-04429-2.
Kinoshita T. 2020. Biosynthesis and biology of mammalian GPI-anchored proteins. Open Biol. 10:190290. https://doi.org/10.1098/rsob.190290.
Kinoshita T, Fujita M. 2016. Biosynthesis of GPI-anchored proteins: special emphasis on GPI lipid remodeling. J Lipid Res. 57:6–24. https://doi.org/10.1194/jlr.R063313.
Kirschner KN et al. 2008. GLYCAM06: a generalizable biomolecular force field. Carbohydrates. J Comput Chem. 29:622–655.
Krammer EM et al. 2023. Structural insights into a cooperative switch between one and two FimH bacterial adhesins binding pauci- and high-mannose type N-glycan receptors. J Biol Chem. 299:104627. https://doi.org/10.1016/j.jbc.2023.104627.
Krishna R et al. 2024. Generalized biomolecular modeling and design with RoseTTAFold all-atom. Science. 384:eadl2528.
Kuhn B et al. 2013. The structure of human α-2,6-sialyltransferase reveals the binding mode of complex glycans. Acta Crystallogr D Biol Crystallogr. 69:1826–1838. https://doi.org/10.1107/S0907444913015412.
Lal A, Schutzbach JS, Forsee WT, Neame PJ, Moremen KW. 1994. Isolation and expression of murine and rabbit cDNAs encoding an alpha 1,2-mannosidase involved in the processing of asparagine-linked oligosaccharides. J Biol Chem. 269:9872–9881. https://doi.org/10.1016/S0021-9258(17)36964-8.
Lebrilla CB, Liu J, Widmalm G, Prestegard JH. 2022. Oligosaccharides and polysaccharides. In: Varki A, et al., editors. Essentials of glycobiology. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press. pp. 33–42.
Li H et al. 2022. Structure of the human GlcNAc-1-phosphotransferase αβ subunits reveals regulatory mechanism for lysosomal enzyme glycan phosphorylation. Nat Struct Mol Biol. 29:348–356. https://doi.org/10.1038/s41594-022-00748-0.
Lin PH et al. 2024. Solid-phase-supported chemoenzymatic synthesis and analysis of chondroitin Sulfate proteoglycan glycopeptides. Angew Chem Int Ed Engl. 63:e202405671. https://doi.org/10.1002/anie.202405671.
Lo JM, Kung CC, Cheng TR, Wong CH, Ma C. 2025. Structure-based mechanism and specificity of human galactosyltransferase β3GalT5. J Am Chem Soc. 147:10875–10885. https://doi.org/10.1021/jacs.4c11724.
Lobsanov YD et al. 2002. Structure of penicillium citrinum alpha 1,2-mannosidase reveals the basis for differences in specificity of the endoplasmic reticulum and Golgi class I enzymes. J Biol Chem. 277:5620–5630. https://doi.org/10.1074/jbc.M110243200.
Long F et al. 2017. AceDRG: a stereochemical description generator for ligands. Acta Crystallogr D Struct Biol. 73:112–122. https://doi.org/10.1107/S2059798317000067.
Luo Y, Parmeggiani F. 2025. CLIMBS: assessing carbohydrate-protein interactions through a graph neural network classifier using synthetic negative data. bioRxiv:2025.2002.2027.640667.
Ma S et al. 2024. Enzyme-sialylation-controlled chemical sulfation of glycan epitopes for decoding the binding of Siglec ligands. J Am Chem Soc. 146:29469–29480. https://doi.org/10.1021/jacs.4c08817.
Macauley MS, Crocker PR, Paulson JC. 2014. Siglec-mediated regulation of immune cell function in disease. Nat Rev Immunol. 14:653–666. https://doi.org/10.1038/nri3737.
Macauley MS et al. 2015. Unmasking of CD22 Co-receptor on germinal Center B-cells occurs by alternative mechanisms in mouse and man. J Biol Chem. 290:30066–30077. https://doi.org/10.1074/jbc.M115.691337.
Mariani V, Biasini M, Barbato A, Schwede T. 2013. lDDT: a local superposition-free score for comparing protein structures and

models using distance difference tests. Bioinformatics. 29:2722–2728. https://doi.org/10.1093/bioinformatics/btt473.
Mendoza F, Masgrau L. 2021. Computational modeling of carbohydrate processing enzymes reactions. Curr Opin Chem Biol. 61:203–213. https://doi.org/10.1016/j.cbpa.2021.02.012.
Meng L et al. 2013. Enzymatic basis for N-glycan sialylation: structure of rat α2,6-sialyltransferase (ST6GAL1) reveals conserved and unique features for glycan sialylation. J Biol Chem. 288:34680–34698. https://doi.org/10.1074/jbc.M113.519041.
Moremen KW, Haltiwanger RS. 2019. Emerging structural insights into glycosyltransferase-mediated synthesis of glycans. Nat Chem Biol. 15:853–864. https://doi.org/10.1038/s41589-019-0350-2.
Moremen KW, Trimble RB, Herscovics A. 1994. Glycosidases of the asparagine-linked oligosaccharide processing pathway. Glycobiology. 4:113–125. https://doi.org/10.1093/glycob/4.2.113.
Moremen KW, Tiemeyer M, Nairn AV. 2012. Vertebrate protein glycosylation: diversity, synthesis and function. Nat Rev Mol Cell Biol. 13:448–462. https://doi.org/10.1038/nrm3383.
O’Boyle NM et al. 2011. Open babel: an open chemical toolbox. J Cheminform. 3:33.
Otsuka MY et al. 2025. Aggrecan immobilizes to perineuronal nets through hyaluronan-dependent and hyaluronan-independent binding activities. J Biol Chem. 301:108525. https://doi.org/10.1016/j.jbc.2025.108525.
Passaro S et al. 2025. Boltz-2: towards accurate and efficient binding affinity prediction. bioRxiv:2025.2006.2014.659707.
Perez S, Makshakova O. 2022. Multifaceted computational Modeling in Glycoscience. Chem Rev. 122:15914–15970. https://doi.org/10.1021/acs.chemrev.2c00060.
Pinho SS, Alves I, Gaifem J, Rabinovich GA. 2023. Immune regulatory networks coordinated by glycans and glycan-binding proteins in autoimmunity and infection. Cell Mol Immunol. 20:1101–1113.
Praissman JL et al. 2016. The functional O-mannose glycan on α-dystroglycan contains a phospho-ribitol primed for matriglycan addition. Elife. 5:e14473. https://doi.org/10.7554/eLife.14473.
Prestegard JH. 2021. A perspective on the PDB’s impact on the field of glycobiology. J Biol Chem. 296:100556. https://doi.org/10.1016/j.jbc.2021.100556.
Ramírez AS, Locher KP. 2023. Structural and mechanistic studies of the N-glycosylation machinery: from lipid-linked oligosaccharide biosynthesis to glycan transfer. Glycobiology. 33:861–872. https://doi.org/10.1093/glycob/cwad053.
Ramírez AS, Kowal J, Locher KP. 2019. Cryo-electron microscopy structures of human oligosaccharyltransferase complexes OST-A and OST-B. Science. 366:1372–1375. https://doi.org/10.1126/science.aaz3505.
Razi N, Varki A. 1998. Masking and unmasking of the sialic acid-binding lectin activity of CD22 (Siglec-2) on B lymphocytes. Proc Natl Acad Sci USA. 95:7469–7474. https://doi.org/10.1073/pnas.95.13.7469.
Sadowski J, Gasteiger J, Klebe G. 1994. Comparison of automatic three-dimensional model builders using 639 X-ray structures. J Chem Inf Comput Sci. 34:1000–1008. https://doi.org/10.1021/ci00020a039.
Sastre DE et al. 2024. Human gut microbes express functionally distinct endoglycosidases to metabolize the same N-glycan substrate. Nat Commun. 15:5123. https://doi.org/10.1038/s41467-024-48802-3.
Schnaar RL, Gerardy-Schahn R, Hildebrandt H. 2014. Sialic acids in the brain: gangliosides and polysialic acid in nervous system development, stability, disease, and regeneration. Physiol Rev. 94:461–518. https://doi.org/10.1152/physrev.00033.2013.
Schnaar RL, Sandhoff R, Tiemeyer M, Kinoshita T. 2022. Glycosphingolipids. In: Varki A, et al., editors. Essentials of glycobiology. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press. pp. 129–140.
Seeberger PH. 2022. Monosaccharide diversity. In: Varki A, et al., editors. Essentials of glycobiology. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press. pp. 21–32.
Senior AW et al. 2020. Improved protein structure prediction using potentials from deep learning. Nature. 577:706–710. https://doi.org/10.1038/s41586-019-1923-7.
Shao C et al. 2021. Modernized uniform representation of carbohydrate molecules in the protein data Bank. Glycobiology. 31:1204–1218.
Sheikh MO et al. 2022. Cell surface glycan engineering reveals that matriglycan alone can recapitulate dystroglycan binding and function. Nat Commun. 13:3617.
Smith BAH, Bertozzi CR. 2021. The clinical impact of glycobiology: targeting selectins, Siglecs and mammalian glycans. Nat Rev Drug Discov. 20:217–243. https://doi.org/10.1038/s41573-020-00093-1.
Stancanelli E et al. 2024. Structural and functional analysis of Heparosan synthase 2 from Pasteurella multocida (PmHS2) to improve the synthesis of heparin. ACS Catal. 14:6577–6588. https://doi.org/10.1021/acscatal.4c00677.
Stanley P, Moremen KW, Lewis NE, Taniguchi N, Aebi M. 2022. N-glycans. In: Varki A, et al., editors. Essentials of glycobiology. Cold Spring Harbor (NY): Cold Spring Harbor Laboratory Press. pp. 103–116.
Steiner RA, Tucker JA. 2017. Keep it together: restraints in crystallographic refinement of macromolecule-ligand complexes. Acta Crystallogr D Struct Biol. 73:93–102. https://doi.org/10.1107/S2059798316017964.
Stowell SR et al. 2010. Innate immune lectins kill bacteria expressing blood group antigen. Nat Med. 16:295–301. https://doi.org/10.1038/nm.2103.
Sulák O et al. 2011. Burkholderia cenocepacia BC2L-C is a super lectin with dual specificity and proinflammatory activity. PLoS Pathog. 7:e1002238. https://doi.org/10.1371/journal.ppat.1002238.
Tandrup T et al. 2025. The Swiss Army knife of alginate metabolism: mechanistic analysis of a mixed-function polysaccharide lyase/epimerase of the human gut microbiota. J Am Chem Soc. 147:23594–23607. https://doi.org/10.1021/jacs.5c03557.
Thompson AJ et al. 2012. The reaction coordinate of a bacterial GH47 α-mannosidase: a combined quantum mechanical and structural approach. Angew Chem Int Ed Engl. 51:10997–11001. https://doi.org/10.1002/anie.201205338.
Trastoy B et al. 2018. Structural basis for the recognition of complex-type N-glycans by endoglycosidase S. Nat Commun. 9:1874.
Tsai YX et al. 2024. Rapid simulation of glycoprotein structures by grafting and steric exclusion of glycan conformer libraries. Cell. 187:1296–1311.e26. https://doi.org/10.1016/j.cell.2024.01.034.
Urbanowicz BR et al. 2012. 4-O-methylation of glucuronic acid in Arabidopsis glucuronoxylan is catalyzed by a domain of unknown function family 579 protein. Proc Natl Acad Sci USA. 109:14253–14258. https://doi.org/10.1073/pnas.1208097109.
Vallee F, Karaveg K, Herscovics A, Moremen KW, Howell PL. 2000. Structural basis for catalysis and inhibition of N-glycan processing class I alpha 1,2-mannosidases. J Biol Chem. 275:41287–41298. https://doi.org/10.1074/jbc.M006927200.
Varki A. 2017. Biological roles of glycans. Glycobiology. 27:3–49. https://doi.org/10.1093/glycob/cww086.
Varki A et al. 2015. Symbol nomenclature for graphical representations of glycans. Glycobiology. 25:1323–1324. https://doi.org/10.1093/glycob/cwv091.
Wandall HH, Nielsen MAI, King-Smith S, de Haan N, Bagdonaite I. 2021. Global functions of O-glycosylation: promises and challenges in O-glycobiology. FEBS J. 288:7183–7212. https://doi.org/10.1111/febs.16148.
Wang HT et al. 2021. Rational enzyme design for controlled functionalization of acetylated xylan for cell-free polymer biosynthesis. Carbohydr Polym. 273:118564. https://doi.org/10.1016/j.carbpol.2021.118564.
Wasim L et al. 2019. N-linked glycosylation regulates CD22 organization and function. Front Immunol. 10:699. https://doi.org/10.3389/fimmu.2019.00699.
Weininger D. 1988. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci. 28:31–36. https://doi.org/10.1021/ci00057a005.
Westbrook JD et al. 2015. The chemical component dictionary: complete descriptions of constituent molecules in experimentally

determined 3D macromolecules in the protein data Bank. Bioinformatics. 31:1274–1278. https://doi.org/10.1093/bioinformatics/btu789.
Woods RJ. 2018. Predicting the structures of glycans, glycoproteins, and their complexes. Chem Rev. 118:8005–8024. https://doi.org/10.1021/acs.chemrev.8b00032.
Woods RJ. 2019. Glyfinder and Glyprobity: New online tools for locating and curating carbohydrate structures in Wwpdb. In: Time-proof perspectives on Glycoscience – Beilstein Glyco-bioinformatics symposium. Limburg, Germany. pp. 82–83.
Wu Y et al. 2023. Exploiting substrate specificities of 6-O-sulfotransferases to enzymatically synthesize keratan Sulfate oligosaccharides. JACS Au. 3:3155–3164. https://doi.org/10.1021/jacsau.3c00488.
Wu Y et al. 2024. A biomimetic synthetic strategy can provide keratan Sulfate I and II oligosaccharides with diverse Fucosylation and sulfation patterns. J Am Chem Soc. 146:9230–9240. https://doi.org/10.1021/jacs.4c00363.
Xiang Y, Karaveg K, Moremen KW. 2016. Substrate recognition and catalysis by GH47 α-mannosidases involved in Asn-linked glycan maturation in the mammalian secretory pathway. Proc Natl Acad Sci USA. 113:E7890–e7899.
Xu Y et al. 2023. Structures of liganded glycosylphosphatidylinositol transamidase illuminate GPI-AP biogenesis. Nat Commun. 14:5520. https://doi.org/10.1038/s41467-023-41281-y.
Young JY et al. 2018. Worldwide protein data Bank biocuration supporting open access to high-quality 3D structural biology data. Database (Oxford). 2018:bay002. https://doi.org/10.1093/database/bay002.
Zhang Y, Skolnick J. 2004. Scoring function for automated assessment of protein structure template quality. Proteins. 57:702–710. https://doi.org/10.1002/prot.20264.

© The Author(s) 2025. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium,
provided the original work is properly cited.