
Figure 4: A. Here we compare the three major representation methods for protein sequences in this work, namely the amino acid sequence (green edge), the latent encoding from an unsupervised autoencoder (blue edge), and a jointly-trained autoencoder (orange edge). B. Grids of the representations of sequences from four datasets (left to right: GIFFORD, GB1, GFP, TAPE), visualized by PCA and colored by their respective fitness values. C. The same representations visualized using PHATE Moon et al. [2017], which captures the multi-scale organization of the data. D. We use a smoothness metric that measures neighborhood-level variation of fitness values in latent space and compare smoothness across the various protein representation approaches in Table 2.
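The caption does not spell out how the smoothness metric in panel D is computed. The sketch below shows one standard way to quantify neighborhood-level variation of a signal over a point cloud. It is an assumption, not necessarily the metric used for Table 2. It builds a symmetric k-nearest-neighbor graph over the latent codes `z` and evaluates the Dirichlet energy $y^\top L y$ of the fitness values `y`, where $L = D - W$ is the graph Laplacian. It then divides by the number of edges, which yields the mean squared fitness difference between neighboring points, so lower values indicate a smoother landscape. The function names `knn_adjacency` and `latent_roughness`, and the default `k=5`, are hypothetical.

```python
import numpy as np

# Hypothetical sketch: kNN-graph Dirichlet energy as a latent-space roughness
# score. This is an assumed definition, not necessarily the one behind Table 2.

def knn_adjacency(z, k):
    """Symmetric, unweighted k-nearest-neighbour adjacency matrix (brute force)."""
    z = np.asarray(z, dtype=float)
    n = z.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    # Pairwise squared Euclidean distances via broadcasting: shape (n, n).
    d2 = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]  # indices of the k closest points
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    return np.maximum(w, w.T)  # keep an edge if either endpoint selected it

def latent_roughness(z, y, k=5):
    """Mean squared fitness difference across kNN edges (lower = smoother).

    Computed as y^T L y / |E|, where L = D - W, because
    y^T L y = sum over edges (i, j) of (y_i - y_j)^2.
    """
    y = np.asarray(y, dtype=float)
    w = knn_adjacency(z, k)
    lap = np.diag(w.sum(axis=1)) - w  # combinatorial graph Laplacian
    n_edges = w.sum() / 2.0  # each undirected edge is counted twice in W
    return float(y @ lap @ y / n_edges)

if __name__ == "__main__":
    z = np.arange(10, dtype=float).reshape(-1, 1)  # toy 1-D "latent space"
    smooth = z[:, 0]                      # fitness varies gradually
    rough = np.tile([0.0, 10.0], 5)       # fitness alternates between neighbours
    print(latent_roughness(z, smooth, k=2), latent_roughness(z, rough, k=2))
```

In this toy case, the gradually varying fitness scores lower (smoother) than the alternating one, which is the kind of contrast panel D compares across representations. The brute-force distance matrix uses O(n²) memory, so for full datasets one would substitute a dedicated nearest-neighbor search.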
