Nucleic Acids Research, 2023, Vol. 51, Database issue, D1188–D1195
Published online 24 November 2022



The UCSC Genome Browser database: 2023 update


Luis R. Nassar1,*, Galt P. Barber1, Anna Benet-Pagès2,3, Jonathan Casper1, Hiram Clawson1, Mark Diekhans1, Clay Fischer1, Jairo Navarro Gonzalez1, Angie S. Hinrichs1, Brian T. Lee1, Christopher M. Lee1, Pranav Muthuraman1, Beagan Nguy1, Tiana Pereira1, Parisa Nejad1, Gerardo Perez1, Brian J. Raney1, Daniel Schmelter1, Matthew L. Speir1, Brittney D. Wick1, Ann S. Zweig1, David Haussler1, Robert M. Kuhn1, Maximilian Haeussler1 and W. James Kent1

1 Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA, 2 Institute of Neurogenomics, Helmholtz Zentrum München GmbH - German Research Center for Environmental Health, 85764 Neuherberg, Germany and 3 Medical Genetics Center (Medizinisch Genetisches Zentrum), Munich 80335, Germany

Received September 15, 2022; Revised October 14, 2022; Editorial Decision October 17, 2022; Accepted October 25, 2022

* To whom correspondence should be addressed. Tel: +1 305 205 9160; Email: lrnassar@[Link]
Present address: Luis Nassar, Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA.

© The Author(s) 2022. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License ([Link] which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

ABSTRACT

The UCSC Genome Browser ([Link] edu) is an omics data consolidator, graphical viewer, and general bioinformatics resource that continues to serve the community as it enters its 23rd year. This year has seen an emphasis on clinical data, with new tracks and an expanded Recommended Track Sets feature on hg38 as well as the addition of a single cell track group. SARS-CoV-2 remains a focus, with regular annotation updates to the browser and continued curation of our phylogenetic sequence placing tool, hgPhyloPlace, whose tree has now reached over 12M sequences. Our GenArk resource has also grown, offering over 2500 hubs and a system for users to request any absent assemblies. We have expanded our bigBarChart display type and created new ways to visualize data via the bigRmsk and dynseq displays. Displaying custom annotations is now easier due to our chromAlias system, which eliminates the requirement for renaming sequence names to the UCSC standard. Users involved in data generation may also be interested in our new tools and trackDb settings, which facilitate the creation and display of their custom annotations.

INTRODUCTION

The University of California Santa Cruz (UCSC) Genome Browser (1) is an online resource for the genomics community providing data access and visualization, collaboration and support resources, and a suite of tools that are now standard in the field. With the ever-increasing amounts of data being generated every year, tools like the UCSC Genome Browser and other browsers (2–6) are playing an increasingly important part in analysis and interpretation. Our resource serves over 1.4 million users per year across its primary site as well as its European and Asian based mirrors. We also maintain nearly 100% uptime and continually update our software on a three-week cycle.

With regard to data access and visualization, we offer over 6000 tracks on the two latest human GRCh assemblies alone, GRCh38/hg38 and GRCh37/hg19. There are also over 200 assemblies available on the Genome Browser and over 2000 if GenArk (7) is included. We support over 30 data formats such as bed/bigBed, wig/bigWig (8), VCF (9) and GTF/GFF. This not only allows users to display their own annotations, but also to visualize data from a large number of sources in a single location. Nearly all data is available for extraction via bulk download, public MySQL server, RESTful API (10) or the Table Browser (11).

We also provide tools to facilitate scientific collaboration as well as support for the community. Immutable snapshots of annotations and locations can be shared via the sessions feature (My Data → My Sessions), custom data can be shared as custom tracks (My Data → Custom Tracks) and hubs (My Data → Track hubs), and user-generated hubs can be shared with the wider community by means of the Public Hub list. We also respond to over 600 mailing list questions per year, assisting users with topics such as how best to display their data, troubleshooting our tools, and generating chain files for lifting between assemblies.

Lastly, we provide and support many other tools and utilities. Some of the most popular tools not yet mentioned are BLAT (12) for placing sequences, In-Silico PCR for identifying PCR primers, and LiftOver, which provides a web interface for converting genomic coordinates between assemblies. Our hundreds of utilities ([Link]
[Link]/[Link]#utilities downloads) can also be downloaded. These include file format creation tools such as bedToBigBed, command line versions of our web tools such as liftOver, and other resources. For users who may have sensitive data or poor connections, we offer various ways to mirror our software locally (Mirrors → Mirroring Instructions). For more information on what the Genome Browser has to offer, visit our training page ([Link] edu/training).

NEW AND UPDATED ANNOTATIONS

Over the last year we have added and updated over 50 annotation tracks on existing assemblies, added a new single-cell annotation group to hg38, made seven new or updated Public Hubs available, and added over 900 new assembly hubs via GenArk. We have also created hundreds of liftOver files, all of which are available on our download server, which allow coordinate lifting between assemblies. This includes 36 files directly requested by users on our mailing list.

New clinical data

Twelve new tracks have been added to human assemblies in support of variant interpretation and clinical genomics. Some notable examples include DECIPHER (DatabasE of genomiC varIation and Phenotype in Humans using Ensembl Resources) (13), which aggregates variant information from various sources, added to hg38; Orphanet (14), which provides comprehensive datasets related to rare diseases and orphan drugs from the Orphanet knowledge base; GenCC (The Gene Curation Coalition) (15), which aims to collect and standardize gene-disease validity annotations across various submitters; and dbSNP155 (16), which is the latest NCBI dbSNP release with over one billion variants. We have also continued to update our Microarray Probesets tracks, which now contain the positions of probes and targets of over 50 NGS arrays. A new Constraint Scores track is also available, which hosts various mutation constraint annotations from different data providers. For a complete list of new and updated clinical tracks see Supplementary Table S1.

In order to better introduce the clinical resources available on hg38, we have expanded our Recommended Track Sets (17) ([Link] html#022222) feature to hg38 (Figure 1). As on hg19, this feature contains four sets of curated track configurations for different clinical applications.

Figure 1. Recommended Track Sets available for hg38 in the Genome Browser menu (Genome Browser → Recommended Track Sets).

New single-cell track group on hg38

We have added a new single-cell RNA-seq (scRNA-seq) track group to hg38 (Figure 2). It currently contains 14 scRNA-seq tracks, originally wrangled into our Cell Browser ([Link] (18), covering major organs of the body, with each track comprising 2–19 individual mRNA expression tracks in barChart format. There are also two aggregate tracks: Tabula Sapiens (19), which contains data from the Tabula Sapiens Consortium providing an atlas of nearly 500 000 cells from 24 organs of 15 normal humans, and a Merged Cells track, which is an aggregate track created by the Genome Browser containing data from 12 papers covering 14 organs. A complete list of new single-cell tracks is available in Supplementary Table S2.

Figure 2. Single-cell RNA-seq track group now available on hg38.

Gene set updates

This year we have added or updated 15 gene annotations for human and mouse. We continue to provide the latest GENCODE gene models (20), currently v41, which are always available on hg38, hg19 and mm39. We also archive these releases for reproducibility, having added releases 38–41 during this period. The NCBI RefSeq gene models (21) on hg38 and hg19 have also been updated to NCBI releases 109.20211119 and 105.20220307, respectively. We have also added the 1.0 release of the Matched Annotation from NCBI and EMBL-EBI (MANE) project (22), which provides a set of high-confidence transcripts that are identically annotated between RefSeq and Ensembl/GENCODE. Lastly, we have updated select tables (kgXref, kgAlias) and our search files for the default hg19 gene annotation track UCSC Genes (knownGene) (23) so that new and updated gene symbols can be found.

Other new tracks

In addition to clinical, single-cell and gene tracks, we have added 8 new tracks to our vertebrate assemblies. These include a European Variation Archive (EVA) (24) track corresponding to EVA release 3 on 14 assemblies including mm39, providing novel variant data on these browsers. There is also now a 241-way Cactus (25,26) comparative genomics alignment track on hg38 generated by the Zoonomia Project (27), which is the largest conservation track in the Genome Browser. We have also added various regulatory tracks and additional annotations from the GTEx Consortium (28). See Supplementary Tables S2–S4 for a full list of tracks and assemblies. We also continue to run our pipelines, which automatically update annotations on 13 tracks, listed in Supplementary Table S5.

SARS-CoV-2 genome browser updates

We continue to regularly update our SARS-CoV-2 assembly data, adding or updating 14 tracks over the last year. Among these tracks is our Variants of Concern (VOC) track, which we continue to update with the latest WHO-designated variants of concern. For a full list of updated tracks see Supplementary Table S4.

Curation has also been ongoing on the growing phylogenetic tree which supports our tool for placing SARS-CoV-2 sequences using UShER (29), hgPhyloPlace ([Link] The tree now contains over 12 million sequences, with updates occurring daily. A minimized version of the tree is included in the pangolin tool (30), used by public health departments worldwide to assign lineages to new sequences. The full tree including GISAID sequences cannot be redistributed due to GISAID restrictions ([Link]), but we offer download files for a public sequence tree with over 6 million sequences ([Link] wuhCor1/UShER SARS-CoV-2/) (31). We have also recently expanded hgPhyloPlace for use with monkeypox (RefSeq NC_063383.1).

New hubs

This year we have added 7 new ‘Public hubs’, which are externally hosted and maintained annotations available to our users via the Track Data Hubs page ([Link] edu/cgi-bin/hgHubConnect). We continue to accept submissions from users looking to promote and share their data. These new hubs include the 2022 update of the popular ReMap Regulatory Atlas hub (32), which contains transcriptional regulator annotations on 6 model genomes, and a 605 species Mammal and Bird alignment using the Cactus aligner. For a full list see Supplementary Table S6.

NEW ASSEMBLY DATA

Over the last year we have updated the official patch sequences from the Genome Reference Consortium (GRC) for hg38 and mm10. The GRCh38/hg38 assembly has been updated to patch 13, and GRCm38/mm10 has been updated to patch 6. These updates contain both fix sequences and alternate haplotypes.
Figure 3. Advanced options in Track Search now include the option to search Public Hub tracks.

Genome Archive (GenArk)

With the continued drop in sequencing cost and increase in assembly quality, we have expanded the resources spent on rapid creation of browsers via assembly hubs based on GenBank (33) assembly accessions. This collection of in-house generated hubs, referred to as Genome Archive (GenArk - [Link] currently contains 2589 hubs. Over the last year alone we have added 904 new NCBI/VGP assemblies. There is now also a viral genomes category ([Link] viral/[Link]) containing 257 viral assemblies ready for display. In response to user demand, we have created an assembly request page ([Link] [Link]). This page allows users to search for most GenBank assemblies, currently containing 15 018 eligible browser candidates, and request a browser be created if one does not already exist. New browsers are typically ready in less than a week. For more information on GenArk, see our detailed four-part blog series on the topic ([Link] 11/23/genark-hubs-part-1/).

T2T CHM13 v2.0 assembly (hs1)

Soon after the T2T consortium published their T2T CHM13 v2.0 assembly (34), we created a GenArk browser to display the sequence alongside various annotation tracks, which were a combination of consortium-generated and in-house data. These include various gene annotations, lifted clinical data, and comparative genomics tracks focused on the new sequence added in T2T CHM13 v2.0. We expanded our hgConvert ([Link] bin/hgConvert?db=hg38) and hgLiftOver ([Link] [Link]/cgi-bin/hgLiftOver?db=hg38) tools to support GenArk assemblies in order to facilitate data conversion between hg19/hg38 and T2T CHM13 v2.0.

In anticipation of many high-quality genomes becoming available in the near future, T2T CHM13 v2.0 was the first human assembly to be elevated from hub to curated hub. Curated hubs, while still hubs, have all the support of native assemblies such as easier discovery and track search, API support, and the ability to add custom annotations without first having to connect to the hub. With this change to curated hub the assembly name was changed to Homo sapiens 1 (hs1). T2T CHM13 v2.0 (hs1) can be accessed directly from the Genomes dropdown menu. It is worth noting that to users curated hubs are functionally identical to native assemblies (e.g. hg19, hg38), and that in line with our reproducibility practice, all previous hubs and hub data will continue to exist.

NEW GENOME BROWSER SOFTWARE

Over the last year we have expanded functionality of the Genome Browser with small additions as well as new and updated displays and settings. By user request, we have added a comma separated values option to the Table Browser output. The new setting can be toggled on the ‘output field separator’ tab and facilitates data download for use in other software such as Excel. It is now also possible to include Public Hub tracks in the Track Search ([Link] db=hg38&hgt tSearch=track+search) results by toggling on the feature in the advanced options (Figure 3).

New displays

bigBarChart. Two new settings have been added to the bigBarChart ([Link] [Link]) format to allow for additional customization of how the bars display: barChartBarMinWidth and barChartBarMinPadding. There is also a new feature for bigBarChart tracks that enables a facet display (Figure 4) in the item details page and track configuration page ([Link] hg38&c=chrX&g=tabulaSapiensTissueCellType). These facets allow for visualization and grouping of complex and expansive data, such as single cell data, into various categories and granularities based on associated metadata. The facets are enabled by adding the new trackDb settings barChartFacets and barChartStatsUrl ([Link] [Link]/goldenPath/help/[Link]#example6).

bigRmsk. The bigRmsk track type ([Link] edu/goldenPath/help/[Link]) has been added for displaying repeat annotations generated by the RepeatMasker program. The setting is optimized for displaying repeat types, automatically changing its display based on the window size. The track includes item coloring based on the classification of the repeat, and the Full mode includes additional details such as length of unaligned repeat model sequence and context for where a repeat fragment originates (Figure 5A).
Figure 4. Search facets for bigBarChart track Tabula Sapiens: Tabula Tissue Cell.

Figure 5. (A) Full display mode in bigRmsk track. (B) dynseq display in full mode.

dynseq display. We have added support for the dynseq display (35) developed by the Kundaje lab ([Link] [Link]/dynseq-pages/). This display scales the height of each nucleotide letter based on the signal value within a bigWig track ([Link] [Link]#Ex4; Figure 5B).

New hub features and TrackDb statements

In order to facilitate custom annotations and user content, we have expanded custom track as well as hub support and added 15 new trackDb settings with various functions (Table 1). When creating hub tracks for a genome that is included in GenArk, you can now designate the GCA/GCF identifier and the Genome Browser will automatically attach the matching GenArk assembly hub genome and display the data on it ([Link] html#genArkTrackHub). This harmonizes the system to function like native assemblies, such as hg19 and hg38, and removes the requirement of a multi-line genome stanza. Another new feature that builds upon hub annotations which are designated by the bigDataUrl setting is access
Table 1. List of new trackDb settings added to the Hub Track Database Definition document ([Link] [Link]) over the last year

Setting name             Description
otherTwoBitUrl           For pairwise alignment tracks (chain, PSL), used to specify the location of the query sequence.
logo                     Enables the dynseq display feature on wiggle tracks.
speciesLabels            Allows one to specify new labels that map to sequence names in bigMaf tracks.
hicDistanceMax           Controls the maximum interaction distance in nucleotides for the heatmap in Hi-C tracks.
hicDistanceMin           Controls the minimum interaction distance in nucleotides for the heatmap in Hi-C tracks.
barChartFacets           Enables the facets feature in bigBarChart track description and item details pages.
barChartStatsUrl         Associates a table in tab-separated-values format with the bigBarChart track, with one line per bar. Currently used in coordination with the barChartFacets tag to specify metadata such as cell types or tissue of origin.
barChartBarMinPadding    Sets the minimum pixel width between bars for bigBarChart tracks.
barChartBarMinWidth      Sets the minimum pixel width of the bars in bigBarChart tracks.
barChartStretchToItem    Extends the barCharts to cover the entire horizontal space available in the graph. Useful for bigBarChart tracks with many bars.
pslSequence              Specifies display configuration options for PSL tracks that also have sequence loaded.
showCdsAllScales         Shows CDS for PSL tracks at all zoom levels.
showCdsMaxZoom           Specifies (bases/pixel) the maximum zoom-out allowed for displaying the CDS for PSL tracks.
showDiffBasesMaxZoom     Shows annotations highlighting base or codon differences only if the current zoom level does not exceed the value (bases/pixel) in PSL tracks.
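
As a rough illustration of how some of the settings in Table 1 might sit together in a single hub track stanza, the following Python sketch renders a stanza from a dictionary. Only the setting names are taken from Table 1 and the surrounding text; the track name, labels, file names and every value shown are hypothetical placeholders, and the accepted value syntax for each setting should be checked against the Hub Track Database Definition document.

```python
def render_trackdb_stanza(settings):
    """Render a dict of trackDb settings as 'key value' lines.

    The 'track' line is written first, since it conventionally opens a
    stanza; the remaining settings follow in insertion order.
    """
    if "track" not in settings:
        raise ValueError("a trackDb stanza needs a 'track' line")
    lines = [f"track {settings['track']}"]
    for key, value in settings.items():
        if key == "track":
            continue
        lines.append(f"{key} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical values throughout; only the setting names come from the paper.
example = {
    "track": "myCellTypes",
    "type": "bigBarChart",
    "shortLabel": "My cell types",
    "longLabel": "Hypothetical single-cell expression by cell type",
    "bigDataUrl": "myCellTypes.bb",
    "barChartBarMinWidth": 2,        # minimum bar width in pixels (assumed value)
    "barChartBarMinPadding": 1,      # minimum gap between bars in pixels (assumed value)
    "barChartStatsUrl": "myCellTypes.stats.tsv",
    "barChartFacets": "tissue,cell_type",  # value syntax is an assumption
}

stanza = render_trackdb_stanza(example)
print(stanza)
```

Keeping the settings in a dictionary makes it easy to generate many similar stanzas from a metadata table, which is one plausible way to manage hubs with a large number of bar chart tracks.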

to the extended case/color options. This means that when browsing the tracks display while displaying hub data, you can go to View → DNA in the top blue bar menu and select the ‘extended case/color options’ button. On that page you will be able to modify the DNA sequence in the window in various ways depending on the data tracks which are currently being displayed, such as adding a specific color for any part of the sequence covered by the annotations.

chromAlias. The chromAlias system provides an index of corresponding sequence names across different groups and consortiums. An example would be how UCSC names chromosomes with the ‘chr’ prefix while other groups such as Ensembl (36) list only the number: ‘chr2’ in UCSC corresponds to ‘2’ in Ensembl. In the past users would have to modify the sequence names if they did not adhere to the UCSC convention, but that is no longer the case. chromAlias associations have been built for all native Genome Browsers as well as GenArk assemblies, looking for corresponding matches in GenBank, Ensembl, and RefSeq when available. Now, when custom annotations are attached and the sequence names do not match UCSC’s, the chromAlias table is consulted and the annotations are displayed if a match is found. This support has also been extended to the bedToBigBed utility, which now optionally accepts a chromAlias file instead of a [Link] file and will build the bigBed without any need for renaming sequences. These chromAlias files can be found on our download server, e.g. hg38 ([Link] edu/goldenPath/hg38/bigZips/[Link]).

TrackDb settings

We added 15 new trackDb settings to our Hub Track Database Definition document ([Link] edu/goldenPath/help/trackDb/[Link]). These include additional configurations for Hi-C track display, bigBarChart display and PSL display among others. See Table 1 for a full list and short description.

New and updated tools

We continue to add to and maintain our suite of over 300 command line tools ([Link] [Link]#utilities downloads). Of notable mention is our chromToUcsc tool, as well as four new tools which help with data analysis and track creation. Many were developed to assist in the creation of the new bigBarChart single cell tracks. See our bigBarChart documentation for an example ([Link] help/[Link]#example7).

chromToUcsc. Can be used to convert most standard annotation file formats (e.g. BED, wiggle, GTF, VCF, etc.) that contain sequence names different from those expected by UCSC to the UCSC convention.

tabToTabDir. Takes a single large table and converts it to a directory full of smaller tables. This is useful for exploring and curating large metadata tables. It can be particularly helpful in reducing a table with many fields into a few normalized (in the relational sense) tables with fewer fields.

matrixClusterColumns. Converts a single cell gene expression matrix to a cell-type gene expression matrix. It takes a cell-by-cell metadata matrix that refers to the same cells as a gene expression matrix and combines the gene expression values for all cells of a given type into a single value representing the cell type. It can also be used on other metadata fields to produce matrices that show mean or average gene expression levels for a donor, an organ, or any other metadata field or combination of fields.

gencodeVersionForGenes. Takes a list of gene symbols or gene accessions and searches for the version of GENCODE or RefSeq that matches the most genes in the list. Optionally produces a bed file containing the gene structures for the genes in the list.

matrixToBarChartBed. Combines an expression matrix and a bed file with gene structures to make a bed file with a bar chart showing gene expression on the Genome Browser.
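
The kind of aggregation described for matrixClusterColumns can be sketched in a few lines of Python. This is a minimal illustration and not the tool's actual implementation: the in-memory data structures, the function name `cluster_columns`, the example gene and cell labels, and the choice of the arithmetic mean as the combining statistic are all assumptions made for the example.

```python
from collections import defaultdict


def cluster_columns(matrix, cell_ids, cell_to_group):
    """Average per-cell expression values into per-group values.

    matrix:        {gene: [value for each cell, in cell_ids order]}
    cell_ids:      list of cell identifiers (the matrix columns)
    cell_to_group: {cell_id: group label, e.g. a cell type}
    Returns (groups, {gene: [mean value per group, in groups order]}).
    """
    # Collect the column indexes that belong to each group.
    columns = defaultdict(list)
    for index, cell in enumerate(cell_ids):
        columns[cell_to_group[cell]].append(index)
    groups = sorted(columns)

    # Replace each gene's per-cell row with one mean value per group.
    clustered = {}
    for gene, values in matrix.items():
        clustered[gene] = [
            sum(values[i] for i in columns[g]) / len(columns[g]) for g in groups
        ]
    return groups, clustered


# Toy input: four cells of two hypothetical cell types, two hypothetical genes.
cell_ids = ["c1", "c2", "c3", "c4"]
cell_to_group = {"c1": "T cell", "c2": "B cell", "c3": "T cell", "c4": "B cell"}
matrix = {"GENE_A": [2.0, 0.0, 4.0, 2.0], "GENE_B": [1.0, 5.0, 1.0, 3.0]}

groups, clustered = cluster_columns(matrix, cell_ids, cell_to_group)
print(groups)
print(clustered)
```

Passing a different metadata mapping (for example cell to donor, or cell to organ) in place of `cell_to_group` gives the per-donor or per-organ matrices mentioned above without changing the function.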
OUTREACH AND CONTACT INFORMATION

The Genome Browser supports users in a variety of ways including a blog ([Link] videos ([Link] and both virtual and in-person trainings ([Link] In the year since the last NAR update, we have conducted 25 workshops and courses, including several at international meetings. Our training page ([Link] provides access to these resources as well as an index to user guides and help pages for all the Genome Browser tools. Three new videos have also been added to the YouTube channel ([Link] featuring the use of the SARS-CoV-2 browser.

We provide email support through a public and a private mailing list where users can avail themselves of our expert and responsive staff. Access to the mailing lists can be found at [Link] where there is also a link to an archive of previously answered questions from the public list.

In response to inquiries from our users, we released a module of content designed for use in the undergraduate classroom. This content features vignettes written by undergraduates to illustrate, using the Genome Browser, a variety of lessons in Molecular Biology, Genetics, Medicine, Population Biology and Evolution. This can be found at [Link]

FUTURE PLANS

This coming year represents the first in our new 5-year planning cycle. A major goal during this time is evaluation and adoption of a pangenome graph data format. We will also be releasing a new site-wide search function and a track duplication feature. Work continues to expand hub support. Most new data will be created in big formats and new assemblies will be implemented as hubs instead of SQL databases (e.g. hs1). Along those lines, work will begin on a tool to facilitate hub development. Lastly, an emphasis on clinical genomics and single cell data will continue, with features such as Recommended Track Sets and the new single cell track group seeing updates throughout the year.

DATA AVAILABILITY

The UCSC Genome Browser ([Link] is freely available to all users. The only exceptions are the source code for the Genome Browser, Blat utility, liftOver utility and other utilities which are free for non-profit academic research and for personal use. A license is required for commercial use of these utilities or the source code.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

The authors thank the users and data providers for their continued use and support of the Genome Browser. They would also like to thank Robert Hubley for his work on the bigRmsk format as well as Jean-Madeleine de Sainte Agathe for lending their expertise in the creation of the Constraints score container track. Lastly, the authors acknowledge Greta Martin and the rest of their grants team that keep the figurative lights on, their system administrators Jorge Garcia, Haifang Telc, and Erich Weiler that keep the literal lights on, and the rest of the support staff whose work allows them to focus on creating the best tool they can.

FUNDING

National Human Genome Research Institute [2U24HG002371 to L.R.N., G.P.B., J.C., H.C., C.F., J.N.G., A.S.H., B.T.L., C.M.L., P.N., G.P., B.J.R., D.S., M.L.S., B.D.W., A.S.Z., R.M.K., M.H., W.J.K., 5U01HG010971 to M.D., 5R01HG010329 to M.D., M.H., 2U24HG007234 to M.D., 5U41HG010972 to B.J.R., M.H.]; National Institute of Allergy and Infectious Disease [75N93019C00076 to H.C., M.L.S.]; Howard Hughes Medical Institute [090100 to D.H.]; Silicon Valley Community Foundation [2017-171531(5022) to G.P.B., J.C., C.F., P.M., B.N., T.P., P.N., M.L.S., B.D.W., W.J.K.]; University of California Office of the President [R01RG3764 to L.R.N.]; California Department of Public Health [20-11088 to A.S.H., P.N., M.H.]; Centers for Disease Control [75D30121C11554 to A.S.H., M.H.]; Burroughs Wellcome Fund [1021635 to C.F.]; A.B.P. is supported by the DFG, German Research Foundation [NFDI 1/1] ‘GHGA––German Human Genome-Phenome Archive’. Funding for open access charge: National Human Genome Research Institute [5U41HG002371].

Conflict of interest statement. L.R.N., G.P.B., J.C., H.C., C.F., J.N.G., A.S.H., B.T.L., C.M.L., P.N., G.P., B.J.R., D.S., M.L.S., B.D.W., A.S.Z., R.M.K., M.H., W.J.K. receive royalties from the sale of UCSC Genome Browser source code, LiftOver, GBiB, and GBiC licenses to commercial entities. W.J.K. owns Kent Informatics.

REFERENCES

1. Kent,W.J., Sugnet,C.W., Furey,T.S., Roskin,K.M., Pringle,T.H., Zahler,A.M. and Haussler,D. (2002) The human genome browser at UCSC. Genome Res., 12, 996–1006.
2. Cunningham,F., Allen,J.E., Allen,J., Alvarez-Jarreta,J., Amode,M.R., Armean,I.M., Austine-Orimoloye,O., Azov,A.G., Barnes,I., Bennett,R. et al. (2022) Ensembl 2022. Nucleic Acids Res., 50, D988–D995.
3. Thorvaldsdóttir,H., Robinson,J.T. and Mesirov,J.P. (2013) Integrative genomics viewer (IGV): high-performance genomics data visualization and exploration. Brief. Bioinform., 14, 178–192.
4. Li,D., Purushotham,D., Harrison,J.K., Hsu,S., Zhuo,X., Fan,C., Liu,S., Xu,V., Chen,S., Xu,J. et al. (2022) WashU epigenome browser update 2022. Nucleic Acids Res., 50, W774.
5. Buels,R., Yao,E., Diesh,C.M., Hayes,R.D., Munoz-Torres,M., Helt,G., Goodstein,D.M., Elsik,C.G., Lewis,S.E., Stein,L. et al. (2016) JBrowse: a dynamic web platform for genome visualization and analysis. Genome Biol., 17, 66.
6. Rangwala,S.H., Kuznetsov,A., Ananiev,V., Asztalos,A., Borodin,E., Evgeniev,V., Joukov,V., Lotov,V., Pannu,R., Rudnev,D. et al. (2021) Accessing NCBI data using the NCBI sequence viewer and genome data viewer (GDV). Genome Res., 31, 159–169.
7. Lee,B.T., Barber,G.P., Benet-Pagès,A., Casper,J., Clawson,H., Diekhans,M., Fischer,C., Gonzalez,J.N., Hinrichs,A.S., Lee,C.M. et al. (2021) The UCSC genome browser database: 2022 update. Nucleic Acids Res., 50, D1115–D1122.
8. Kent,W.J., Zweig,A.S., Barber,G., Hinrichs,A.S. and Karolchik,D. (2010) BigWig and bigbed: enabling browsing of large distributed datasets. Bioinformatics, 26, 2204–2207.
9. Danecek,P., Auton,A., Abecasis,G., Albers,C.A., Banks,E., DePristo,M.A., Handsaker,R.E., Lunter,G., Marth,G.T., Sherry,S.T. et al. (2011) The variant call format and VCFtools. Bioinforma. Oxf. Engl., 27, 2156–2158.
10. Lee,C.M., Barber,G.P., Casper,J., Clawson,H., Diekhans,M., Gonzalez,J.N., Hinrichs,A.S., Lee,B.T., Nassar,L.R., Powell,C.C. et al. (2020) UCSC genome browser enters 20th year. Nucleic Acids Res., 48, D756–D761.
11. Karolchik,D., Hinrichs,A.S., Furey,T.S., Roskin,K.M., Sugnet,C.W., Haussler,D. and Kent,W.J. (2004) The UCSC table browser data retrieval tool. Nucleic Acids Res., 32, D493–D496.
12. Kent,W.J. (2002) BLAT––The BLAST-Like alignment tool. Genome Res., 12, 656–664.
13. Firth,H.V., Richards,S.M., Bevan,A.P., Clayton,S., Corpas,M., Rajan,D., Vooren,S.V., Moreau,Y., Pettett,R.M. and Carter,N.P. (2009) DECIPHER: database of chromosomal imbalance and phenotype in humans using ensembl resources. Am. J. Hum. Genet., 84, 524–533.
14. Pavan,S., Rommel,K., Mateo Marquina,M.E., Höhn,S., Lanneau,V. and Rath,A. (2017) Clinical practice guidelines for rare diseases: the orphanet database. PLoS One, 12, e0170365.
15. DiStefano,M.T., Goehringer,S., Babb,L., Alkuraya,F.S., Amberger,J., Amin,M., Austin-Tse,C., Balzotti,M., Berg,J.S., Birney,E. et al. (2022) The gene curation coalition: a global effort to harmonize gene–disease evidence resources. Genet. Med., 24, 1732–1742.
16. Sherry,S.T., Ward,M.-H., Kholodov,M., Baker,J., Phan,L., Smigielski,E.M. and Sirotkin,K. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311.
17. Benet-Pagès,A., Rosenbloom,K.R., Nassar,L.R., Lee,C.M., Raney,B.J., Clawson,H., Schmelter,D., Casper,J., Gonzalez,J.N., Perez,G. et al. (2022) Variant interpretation: UCSC genome browser recommended track sets. Hum. Mutat., 43, 998–1011.
18. Speir,M.L., Bhaduri,A., Markov,N.S., Moreno,P., Nowakowski,T.J., Papatheodorou,I., Pollen,A.A., Raney,B.J., Seninge,L., Kent,W.J. et al. (2021) UCSC cell browser: visualize your single-cell data. Bioinformatics, 37, 4578–4580.
19. Schaum,N., Karkanias,J., Neff,N.F., May,A.P., Quake,S.R., Wyss-Coray,T., Darmanis,S., Batson,J., Botvinnik,O., Chen,M.B. et al. (2018) Single-cell transcriptomics of 20 mouse organs creates a tabula muris. Nature, 562, 367–372.
20. Frankish,A., Diekhans,M., Jungreis,I., Lagarde,J., Loveland,J.E., Mudge,J.M., Sisu,C., Wright,J.C., Armstrong,J., Barnes,I. et al. (2021) gencode 2021. Nucleic Acids Res., 49, D916–D923.
21. O’Leary,N.A., Wright,M.W., Brister,J.R., Ciufo,S., Haddad,D., McVeigh,R., Rajput,B., Robbertse,B., Smith-White,B., Ako-Adjei,D. et al. (2016) Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res., 44, D733–D745.
22. Morales,J., Pujar,S., Loveland,J.E., Astashyn,A., Bennett,R., Berry,A., Cox,E., Davidson,C., Ermolaeva,O., Farrell,C.M. et al. (2022) A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature, 604, 310–315.
23. Hsu,F., Kent,W.J., Clawson,H., Kuhn,R.M., Diekhans,M. and Haussler,D. (2006) The UCSC known genes. Bioinformatics, 22, 1036–1046.
24. Cezard,T., Cunningham,F., Hunt,S.E., Koylass,B., Kumar,N., Saunders,G., Shen,A., Silva,A.F., Tsukanov,K., Venkataraman,S. et al. (2021) The european variation archive: a FAIR resource of genomic variation for all species. Nucleic Acids Res., 50, D1216–D1220.
25. Armstrong,J., Hickey,G., Diekhans,M., Fiddes,I.T., Novak,A.M., Deran,A., Fang,Q., Xie,D., Feng,S., Stiller,J. et al. (2020) Progressive cactus is a multiple-genome aligner for the thousand-genome era. Nature, 587, 246–251.
26. Paten,B., Earl,D., Nguyen,N., Diekhans,M., Zerbino,D. and Haussler,D. (2011) Cactus: algorithms for genome multiple sequence alignment. Genome Res., 21, 1512–1528.
27. Zoonomia Consortium (2020) A comparative genomics multitool for scientific discovery and conservation. Nature, 587, 240–245.
28. The GTEx Consortium (2020) The GTEx consortium atlas of genetic regulatory effects across human tissues. Science, 369, 1318–1330.
29. Turakhia,Y., Thornlow,B., Hinrichs,A.S., De Maio,N., Gozashti,L., Lanfear,R., Haussler,D. and Corbett-Detig,R. (2021) Ultrafast sample placement on existing tRees (UShER) enables real-time phylogenetics for the SARS-CoV-2 pandemic. Nat. Genet., 53, 809–816.
30. O’Toole,Á., Scher,E., Underwood,A., Jackson,B., Hill,V., McCrone,J.T., Colquhoun,R., Ruis,C., Abu-Dahab,K., Taylor,B. et al. (2021) Assignment of epidemiological lineages in an emerging pandemic using the pangolin tool. Virus Evol., 7, veab064.
31. McBroome,J., Thornlow,B., Hinrichs,A.S., Kramer,A., De Maio,N., Goldman,N., Haussler,D., Corbett-Detig,R. and Turakhia,Y. (2021) A daily-updated database and tools for comprehensive SARS-CoV-2 mutation-annotated trees. Mol. Biol. Evol., 38, 5819–5824.
32. Hammal,F., de Langen,P., Bergon,A., Lopez,F. and Ballester,B. (2021) ReMap 2022: a database of human, mouse, drosophila and arabidopsis regulatory regions from an integrative analysis of DNA-binding sequencing experiments. Nucleic Acids Res., 50, D316–D325.
33. Benson,D.A., Cavanaugh,M., Clark,K., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J. and Sayers,E.W. (2013) GenBank. Nucleic Acids Res., 41, D36–D42.
34. Nurk,S., Koren,S., Rhie,A., Rautiainen,M., Bzikadze,A.V., Mikheenko,A., Vollger,M.R., Altemose,N., Uralsky,L., Gershman,A. et al. (2022) The complete sequence of a human genome. Science, 376, 44–53.
35. Nair,S., Barrett,A., Li,D., Raney,B.J., Lee,B.T., Kerpedjiev,P., Ramalingam,V., Pampari,A., Lekschas,F., Wang,T. et al. (2022) The dynseq genome browser track enables visualization of context-specific, dynamic DNA sequence features at single nucleotide resolution genomics. bioRxiv doi: [Link] 31 May 2022, preprint: not peer reviewed.
36. Yates,A.D., Achuthan,P., Akanni,W., Allen,J., Allen,J., Alvarez-Jarreta,J., Amode,M.R., Armean,I.M., Azov,A.G., Bennett,R. et al. (2020) Ensembl 2020. Nucleic Acids Res., 48, D682–D688.
