Do you need to compare and combine data based on NCBI RefSeq and UniProt datasets, and aren’t sure which proteins are comparable? For many years, NCBI Gene has provided information about the relationships between RefSeq and UniProt accessions courtesy of data imported from UniProt, but the tremendous growth of both datasets has led to large gaps in the data. We have developed a new process to compare the two datasets, first looking for 100% identical proteins and then checking the remaining sequences for similar matches in related taxa. The result is mapping information now covering over 170 million RefSeq proteins across the tree of life.
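To make the two-pass idea concrete, here is a minimal sketch in Python. It is not NCBI's actual pipeline, which this post does not describe in detail. The function name, the input layout, the `related_taxa` lookup, and the 0.9 similarity cutoff are all assumptions made for illustration. The standard-library `difflib.SequenceMatcher` stands in for a proper protein alignment tool, and the accessions and sequences are made up.

```python
from difflib import SequenceMatcher


def map_refseq_to_uniprot(refseq, uniprot, related_taxa, min_similarity=0.9):
    """Two-pass mapping sketch (hypothetical, for illustration only).

    refseq / uniprot: dict of accession -> (taxon_id, protein_sequence)
    related_taxa:     dict of taxon_id -> set of taxon_ids treated as related
    Returns a dict of RefSeq accession -> (UniProt accession, method).
    """
    # Pass 1: index UniProt proteins by their exact sequence.
    by_sequence = {}
    for u_acc, (_, u_seq) in uniprot.items():
        by_sequence.setdefault(u_seq, []).append(u_acc)

    mapping = {}
    unmatched = []
    for r_acc, (_, r_seq) in refseq.items():
        hits = by_sequence.get(r_seq)
        if hits:
            # 100% identical sequence: take the first accession alphabetically.
            mapping[r_acc] = (sorted(hits)[0], "identical")
        else:
            unmatched.append(r_acc)

    # Pass 2: for the remainder, look for a similar protein in the same
    # or a related taxon and keep the best hit above the cutoff.
    for r_acc in unmatched:
        r_taxon, r_seq = refseq[r_acc]
        allowed = related_taxa.get(r_taxon, set()) | {r_taxon}
        best_acc, best_score = None, 0.0
        for u_acc, (u_taxon, u_seq) in uniprot.items():
            if u_taxon not in allowed:
                continue
            score = SequenceMatcher(None, r_seq, u_seq, autojunk=False).ratio()
            if score > best_score:
                best_acc, best_score = u_acc, score
        if best_acc is not None and best_score >= min_similarity:
            mapping[r_acc] = (best_acc, "similar")

    return mapping


# Made-up accessions and sequences, purely for demonstration.
refseq = {
    "NP_000001": (9606, "MKTAYIAKQR"),
    "XP_000002": (9598, "MSTNPKPQRKTKRNTNRRPQDVR"),
    "XP_000003": (10090, "WWWWWWWW"),
}
uniprot = {
    "P00001": (9606, "MKTAYIAKQR"),
    "Q00002": (9598, "MSTNPKPQRKTKRNTNRRPQDVK"),
}
related_taxa = {9598: {9606}, 9606: {9598}}

result = map_refseq_to_uniprot(refseq, uniprot, related_taxa)
for refseq_acc, (uniprot_acc, method) in sorted(result.items()):
    print(refseq_acc, uniprot_acc, method)
```

In this toy input, the first protein is paired by exact sequence identity, the second by similarity within its own taxon, and the third is left unmapped because no candidate exists in an allowed taxon. A real comparison at this scale would need an alignment-based search rather than pairwise string comparison.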
You can find links to related UniProt accessions on individual NCBI Gene records. The entire dataset is available on our FTP site.
Stay up to date
This update was done as part of the NIH Comparative Genomics Resource (CGR). CGR facilitates reliable comparative genomics analyses for all eukaryotic organisms through an NCBI Toolkit and community collaboration.
This data was produced to maximize the reach of genomic data and enable better data exchange between community resources. We’ll be exploring ways to leverage these relationships, as well as similar mappings between NCBI RefSeq and Ensembl. These efforts will help interconnect data across the global research community.
Join our mailing list to keep up to date with RefSeq and other CGR news.
If you have questions or would like to provide feedback, please write to our help desk.