Fig. 1 | BMC Evolutionary Biology

Fig. 1

From: Reduced alphabet of prebiotic amino acids optimally encodes the conformational space of diverse extant protein folds

Workflow of the virtual mutagenesis procedure and mutual information optimization. A large data set of protein sequences whose structures are known is rewritten using a given reduced alphabet \( {\mathfrak{R}}_i^{10} \), a 10-member subset of the 20 genetically coded amino acids, together with a substitution rule \( {\mathcal{S}}_j \) that dictates how the remaining 10 amino acids are to be mutated virtually. For every combination of \( {\mathfrak{R}}_i^{10} \) and \( {\mathcal{S}}_j \), a mutual information value can be computed to assess how effectively that combination preserves structural information in the data set of more than 2000 single-domain proteins. Because there are more than \( 10^{15} \) different combinations of \( {\mathfrak{R}}_i^{10} \) and \( {\mathcal{S}}_j \), a Monte Carlo procedure is implemented to search efficiently across the different \( {\mathcal{S}}_j \) for each of the 184,756 ways to configure \( {\mathfrak{R}}_i^{10} \). In the end, the percentile rank of the prebiotic set \( {\mathfrak{R}}_{\mathrm{prebiotic}}^{10} \) is computed from the spectrum of mutual information values given by all other alternative 10-letter alphabets.
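The counting and search steps described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: the function names are hypothetical, mutual information is taken between the rewritten residue letter and a per-residue structural state label (the caption does not define the structural variable), the estimator is a simple plug-in estimate, and the search is a basic accept-if-not-worse stochastic search standing in for the Monte Carlo procedure. The count of substitution rules assumes that each of the 10 removed amino acids may be mapped to any of the 10 retained ones, which is consistent with the stated total of more than \( 10^{15} \) combinations.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 genetically coded amino acids

# Number of 10-letter alphabets: C(20, 10) = 184,756 (matches the caption).
N_ALPHABETS = math.comb(20, 10)
# Assumption: each of the 10 removed letters maps to any of the 10 kept letters.
N_RULES_PER_ALPHABET = 10 ** 10
N_COMBINATIONS = N_ALPHABETS * N_RULES_PER_ALPHABET  # about 1.8e15


def rewrite(seq, rule):
    """Virtually mutate a sequence: removed letters are replaced per `rule`."""
    return "".join(rule.get(aa, aa) for aa in seq)


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from two aligned symbol sequences."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def random_rule(alphabet, rng):
    """Map every amino acid outside `alphabet` to a random retained letter."""
    removed = [aa for aa in AMINO_ACIDS if aa not in alphabet]
    return {aa: rng.choice(alphabet) for aa in removed}


def stochastic_search(seq, states, alphabet, steps=200, seed=0):
    """Hypothetical stand-in for the Monte Carlo search over substitution rules.

    Changes one removed letter's target at a time and keeps the change
    whenever the mutual information does not decrease.
    """
    rng = random.Random(seed)
    rule = random_rule(alphabet, rng)
    best = mutual_information(rewrite(seq, rule), states)
    removed = sorted(rule)
    for _ in range(steps):
        trial = dict(rule)
        trial[rng.choice(removed)] = rng.choice(alphabet)
        score = mutual_information(rewrite(seq, trial), states)
        if score >= best:
            rule, best = trial, score
    return rule, best
```

Calling `stochastic_search(seq, states, "ACDEFGHIKL")` with an arbitrary example alphabet returns one substitution rule and its score. Repeating this for every candidate alphabet and ranking the resulting scores would give the kind of percentile comparison the caption describes.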
