Vladimir I. shCherbak (a) and Maxim A. Makukov (b)*
(a) Department of Mathematics, al-Farabi Kazakh National University, Almaty, Republic of Kazakhstan
ABSTRACT
It has been repeatedly proposed to expand the scope of SETI, and one of the suggested alternatives to radio is the biological medium. Genomic DNA is already used on Earth to store nonbiological information. Smaller in capacity, but stronger in noise immunity, is the genetic code. The code is a flexible mapping between codons and amino acids, and this flexibility allows the code to be modified artificially. But once fixed, the code might stay unchanged over cosmological timescales; in fact, it is the most durable construct known. It therefore represents an exceptionally reliable storage for an intelligent signature, provided that the signature conforms to biological and thermodynamic requirements. As the actual scenario for the origin of terrestrial life is far from settled, the proposal that it might have been seeded intentionally cannot be ruled out. A statistically strong intelligent-like “signal” in the genetic code is then a testable consequence of such a scenario. Here we show that the terrestrial code displays a thorough precision-type orderliness matching the criteria to be considered an informational signal. Simple arrangements of the code reveal an ensemble of arithmetical and ideographical patterns of the same symbolic language. Accurate and systematic, these underlying patterns appear to be a product of precision logic and nontrivial computing rather than of stochastic processes (the null hypothesis that they are due to chance coupled with presumable evolutionary pathways is rejected with P-value < 10⁻¹³). The patterns are profound to the extent that the code mapping itself is uniquely deduced from their algebraic representation. The signal displays readily recognizable hallmarks of artificiality, among which are the symbol of zero, the privileged decimal syntax and semantical symmetries. Moreover, extraction of the signal involves logically straightforward but abstract operations, making the patterns essentially irreducible to any natural origin.
A plausible way of embedding the signal into the code and a possible interpretation of its content are discussed. Overall, while the code is nearly optimized biologically, its limited capacity is used extremely efficiently to store non-biological information.
Recent biotech achievements make it possible to employ genomic DNA as a data storage medium more durable than any currently in use (Bancroft et al., 2001; Yachie et al., 2008; Ailenberg & Rotstein, 2009). Perhaps the most direct application of this idea was proposed even before the advent of synthetic biology. Considering alternative informational channels for SETI, Marx (1979) noted that the genomes of living cells may provide a good instance of such a channel. He also noted that the genetic code is even more durable. Exposed to strong negative selection, the code has stayed unchanged for billions of years, except for rare cases of minor variations (Knight et al., 2001) and context-dependent expansions (Yuan et al., 2010). And yet the mapping between codons and amino acids is malleable, as they interact via modifiable tRNA molecules and aminoacyl-tRNA synthetases (Giegé et al., 1998; Ibba & Söll, 2000; see also Appendix A). This ability to reassign codons, thought to underlie the evolution of the code toward multilevel optimization (Bollenbach et al., 2007), also allows the code to be modified artificially (McClain & Foss, 1988; Budisa, 2006; Chin, 2012). It is possible, at least in principle, to arrange a mapping that both conforms to functional requirements and harbors a small message or signature within the 384-bit informational capacity of the code. Once a genome is appropriately rewritten (Gibson et al., 2010), the new code with its signature will stay frozen in the cell and its progeny, which might then be delivered through space and time to putative recipients. Being energy-efficient (Rose & Wright, 2004) and self-replicating, the biological channel is also free from problems peculiar to radio signals: there is no need to rely on time of arrival, frequency and direction. It is precisely because of these constraints that the origin of the famous “Wow!” signal, received in 1977, remains uncertain (Ehman, 2011). The biological channel has been given serious consideration for its merits in SETI, though with a focus on genomes (Yokoo & Oshima, 1979; Freitas, 1983; Nakamura, 1986; Davies, 2010; Davies, 2012). Meanwhile, it has been proposed to secure terrestrial life by seeding exoplanets with living cells (Mautner, 2000; Tepfer, 2008), and this seems to be only a matter of time. The biological channel suggests itself for this enterprise. To avoid anthropocentric bias, it might be admitted that terrestrial life is not the starting point in the series of cosmic colonization (Crick & Orgel, 1973; Crick, 1981). If so, it is natural to expect a statistically strong intelligent-like “signal” in the terrestrial genetic code (Marx, 1979). This possibility is further encouraged by the fact that how the code came to be apparently non-random and nearly optimized remains disputed and highly speculative (for reviews of traditional models of code evolution see Knight et al., 1999; Gusev & Schulze-Makuch, 2004; Di Giulio, 2005; Koonin & Novozhilov, 2009). The only way to extract a signal, if any, from the code is to arrange its elements (codons, amino acids and syntactic signs) by their parameters using some straightforward logic. These arrangements are then analyzed for patterns or grammar-like structures of some sort. The choice of arrangements and parameters should exclude arbitrariness. For example, only parameters that do not depend on systems of physical units should be considered. However, even in this case it is unknown a priori exactly what kind of patterns one might expect.
So there is a risk of false positives: in a data set like the genetic code, it is easy to find patterns of one kind or another. Nonetheless, the task can be somewhat eased. First, it is possible to predict some general aspects of a putative signal and its “language”, especially if one takes advantage of active SETI experience. For example, it is generally accepted that the numerical language of arithmetic is the same for the entire universe (Freudenthal, 1960; Minsky, 1985). Moreover, the symbols and grammar of this language, such as positional numeral systems with the concept of zero, are hallmarks of intelligence. Thus, interstellar messages sent from the Earth usually began with the natural sequence of numbers in binary or decimal notation. To reinforce the artificiality, a symbol of zero was placed in the abstract position preceding the sequence. Those messages also included symbols of arithmetical operations, the Egyptian triangle, DNA and other notions of human consciousness (Sagan et al., 1972; The Staff at the NAIC, 1975; Sagan et al., 1978; Dumas & Dutil, 2004). Second, to minimize the risk of false positives one can impose requirements on a putative signal that are as restrictive as possible. For example, it is reasonable to expect that a genuinely intelligent message would represent not just a collection of patterns of various sorts, but patterns of the same “linguistic style”. In this case, if a potential pattern is noticed, further search might be narrowed down to the same sort of patterns. Another stringent requirement might be that patterns should involve each element of the code in each arrangement, and that the entire signal should occupy most, if not all, of the code’s informational capacity. By and large, given the nature of the task, the specifics of the strategy have to be defined en route. Following these lines, we show that the terrestrial code harbors an ensemble of precision-type patterns matching the requirements mentioned above.
Simple systematization of the code reveals a strong informational signal comprising arithmetical and ideographical components. Remarkably, the independent patterns of the signal are all expressed in a common symbolic language. We show that the signal is statistically significant, uses the entire informational capacity of the code, and is untraceable to a natural origin. Models for the emergence of primordial life with an original, signal-free genetic code are beyond the scope of this paper; whatever that earlier state of the code was, it has been erased by the palimpsest of the signal.
Should there be a signal in the code, it would likely have manifested itself somehow during the half-century history of traditional analysis of the code’s organization. So it is useful to summarize briefly what has been learned about that organization to date. Also, for simplicity of data presentation, we mention in advance some a posteriori information concerning the signal to be described, with fuller discussion in due course. We suggest that readers unfamiliar with the molecular mechanisms behind the genetic code first refer to Appendix A, which also explains why the code is amenable to intentional “modulation” (to use the language of radio-oriented SETI) and, at the same time, is highly protected from casual “modulation” (has strong noise immunity).
The code at a glance. As soon as the genetic code was biochemically cracked (Nirenberg et al., 1965), its non-random structure became evident (Woese, 1965; Crick, 1968). The most obvious pattern to emerge was its regular redundancy. The code comprises 16 codon families, each defined by a shared pair of first two bases, and these families generally consist of either one or two equal series of codons mapped to one amino acid or to Stop (Fig. 1a). In effect, the standard code is nearly symmetric in redundancy. Only two families are split unequally: those beginning with TG and AT. The minimal change that restores the symmetry is to match the TG-family against the AT-family by reassigning TGA from Stop to cysteine. Incidentally, this symmetrized version is not just a theoretical guess but is also found in nature as the nuclear code of euplotid ciliates (Meyer et al., 1991). While the standard code stores the arithmetical component of the signal, the symmetrized euplotid version keeps the ideographical one (the interrelation between these two code versions is discussed later). Regular redundancy also leads to the block structure of the genetic code.
This makes it possible to depict the code in a contracted form, where each amino acid corresponds to a single block, or contracted series (Fig. 1b). The three exceptions are Arg, Leu and Ser, which each have one IV-series and one II-series. Apart from regular redundancy, a wealth of other features was reported afterwards, among them robustness to errors (Alff-Steinberger, 1969), correlation between thermostability and redundancy of codon families (Lagerkvist, 1978), non-random distribution of amino acids among codons as judged by their polarity and bulkiness (Jungck, 1978), biosynthetic pathways (Taylor & Coates, 1989), reactivity (Siemion & Stefanowicz, 1992), and even taste (Zhuravlev, 2002). The code was also shown to be effective at handling additional information in DNA (Baisnée et al., 2001; Itzkovitz & Alon, 2007). Apparently, these features relate, if anything, to the direct biological function of the code. There are also a number of abstract approaches to the code, such as those based on topology (Karasev & Stefanov, 2001), information science (Alvager et al., 1989), and number theory (Dragovich, 2012). However, these approaches focus mainly on constructing theoretical descriptions of known features of the code rather than on uncovering new ones.
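Both redundancy observations in this section can be reproduced from the standard code table. These are the two families split unequally (TG and AT), and the three amino acids (Arg, Leu, Ser) that occupy two series in the contracted form. The Python sketch below assumes the NCBI translation table 1 ordering of codons. It treats a "series" simply as the codons within one two-base family that map to the same amino acid or to Stop ('*'). The function names are illustrative, not taken from the paper. The "symmetrized" variant changes only TGA to Cys, as described for the euplotid code.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
# '*' marks Stop.
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}
FAMILIES = ["".join(p) for p in product(BASES, repeat=2)]


def series_pattern(code, family):
    """Sizes of the codon series inside one two-base family, largest first."""
    counts = Counter(code[family + b] for b in BASES)
    return tuple(sorted(counts.values(), reverse=True))


def unequal_families(code):
    """Families that are neither one IV-series (4,) nor two equal II-series (2, 2)."""
    result = {}
    for family in FAMILIES:
        pattern = series_pattern(code, family)
        if pattern not in {(4,), (2, 2)}:
            result[family] = pattern
    return result


def multi_series_amino_acids(code):
    """Amino acids (Stop excluded) whose codons fall into more than one family."""
    families = {}
    for codon, aa in code.items():
        if aa != "*":
            families.setdefault(aa, Counter())[codon[:2]] += 1
    return {aa: dict(c) for aa, c in families.items() if len(c) > 1}


# Hypothetical symmetrized variant: only TGA is reassigned, from Stop to Cys.
SYMMETRIZED = dict(STANDARD, TGA="C")

print(unequal_families(STANDARD))     # {'TG': (2, 1, 1), 'AT': (3, 1)}
print(unequal_families(SYMMETRIZED))  # {'TG': (3, 1), 'AT': (3, 1)}
print(multi_series_amino_acids(STANDARD))
# {'L': {'TT': 2, 'CT': 4}, 'S': {'TC': 4, 'AG': 2}, 'R': {'CG': 4, 'AG': 2}}
```

After the TGA reassignment, TG and AT share the same 3:1 pattern. That is one way to read "matching" the two families.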
Artificiality. To be considered unambiguously as an intelligent signal, any patterns in the code must satisfy the following two criteria: (1) they must be highly significant statistically, and (2) not only must they possess intelligent-like features (Elliott, 2010), but they should also be inconsistent in principle with any natural process. Such processes include Darwinian (Freeland, 2002) or Lamarckian (Vetsigian et al., 2006) evolution, or evolution driven by amino acid biosynthesis (Wong, 2005), genomic changes (Sella & Ardell, 2006), affinities between (anti)codons and amino acids (Yarus et al., 2009), selection for increased protein diversity (Higgs, 2009), the energetics of codon-anticodon interactions (Klump, 2006; Travers, 2006), or various pre-translational mechanisms (Wolf & Koonin, 2007; Rodin et al., 2011). The statistical test for the first criterion is outlined in Appendix B, showing that the described patterns are highly significant. The second criterion might seem unverifiable, as the patterns may result from a natural process currently unknown. But this criterion is equivalent to asking whether it is possible at all to embed informational patterns into the code so that they could be unequivocally interpreted as an intelligent signature. The answer seems to be yes, and one way to do so is to make the patterns virtual, not actual. Exactly that is observed in the genetic code. Strict balances and their decimal syntax appear only with the application of the “activation key”. Physically, there are no strict balances in the code (e.g., in Fig. 5b one would have 1002 ≠ 999 instead of 999 = 999). Artificial transfer of a nucleon in proline turns the arithmetical patterns on and thereby makes them virtual. This is also why we interpret the distinctive notation as an indication of decimalism rather than of some (yet unknown) physical requirement for nucleon sums to be multiples of 037: physically, the code shows no such multiplicity in general.
In turn, a notationally preferred numeral system is itself a strong sign of artificiality. It is also worth noting that all nine three-digit repdigits (111, 222, 333, 444, 555, 666, 777, 888, 999), as well as zero (see below), are represented at least once in the signal, which also looks like an intentional feature. However, it might be hypothesized that selection (or any other natural process) drives amino acid mass to be distributed in the code in a particular way. This would lead to approximate mass equalities and thus make strict nucleon balances just a likely epiphenomenon. But it is hard to imagine how a natural process could drive mass distribution in abstract representations of the code where codons are decomposed into bases or contracted by redundancy. Besides, nucleon equalities hold for free amino acids, and yet in these free molecules side chains and standard blocks would have had to be treated by that process separately. Furthermore, no natural process can drive mass distribution to produce the balance in Fig. 10d: the amino acids and syntactic signs that make up this balance are entirely abstract, since they are produced by translating a string read across codons. Another way to make patterns irreducible to natural events is to involve semantics, since no natural process is capable of interpreting abstract symbols. It should be noted that notions of symbols and meanings are sometimes used in a natural sense (Eigen & Winkler, 1983), especially in the context of biosemiotics (Barbieri, 2008) and molecular codes (Tlusty, 2010). The genetic code itself is regarded there as a “natural convention” that relates symbols (codons) to their meanings (amino acids). However, these approaches distinguish between the organic semantics of molecular codes and the interpretive or linguistic semantics peculiar to intelligence (Barbieri, 2008). It is exactly the latter type of semantics that is revealed in the signal of the genetic code.
It is displayed there not only in the symmetry of antonymous syntactic signs (Fig. 10c), but also in the symbol of zero. For the genetic molecular machinery there is no zero; there are only nucleotide triplets recognized sterically by release factors at the ribosome. Zero, the supreme abstraction of arithmetic, is the interpretive meaning assigned to Stop codons. Its correctness is confirmed by the fact that, placed in its proper front position, zero maintains all ideogram symmetries. Thus zero, a trivial summand in balances, nevertheless appears as an ordinal number in the ideogram. In other words, besides being an integral part of the decimal system, zero also acts as an individual symbol in the code. In total, the signal itself reveals intelligent-like features: strict nucleon equalities, their distinctive decimal notation, logical transformations accompanying the equalities, the symbol of zero and semantical symmetries. Beyond that, the very method of its extraction involves abstract operations: consideration of idealized (free and unmodified) molecules, distinction between their blocks and chains, the activation key, and contraction and decomposition of codons. We find that, taken together, all these aspects point to the artificial nature of the patterns.
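The repdigits listed in the paper all share a simple arithmetic property that explains their link to the number 037. Because 111 = 3 × 37, every three-digit repdigit ddd = d × 111 = 3d × 37 is a multiple of 037. The short Python check below confirms only this arithmetic. It does not reproduce any of the nucleon sums in the paper's figures.

```python
# The nine three-digit repdigits 111, 222, ..., 999.
REPDIGITS = [int(str(d) * 3) for d in range(1, 10)]

# Each repdigit ddd = d * 111 = 3d * 37, so dividing by 37 leaves no remainder.
for n in REPDIGITS:
    k, remainder = divmod(n, 37)
    assert remainder == 0
    print(f"{n} = 037 x {k}")  # first line: 111 = 037 x 3, last line: 999 = 037 x 27
```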
THE answer to the origin of life could simply be the number 37, which could prove that our genetic code was created by ancient aliens, according to a new scientific theory.
The notion that life on Earth has alien origins is nothing new, but a pair of researchers believe that they have cracked an ancient code which proves that life was planted on Earth by extraterrestrial beings.
Maxim Makukov from the Fesenkov Astrophysical Institute in Almaty, Kazakhstan, claims to have discovered an “intelligent-like signal” that is encoded into our genetic material.
Although he admits that the theory is “out there”, he and his research mentor, mathematician Vladimir shCherbak, believe that they have conclusive evidence that a message, or a signature, is in our genetic code.
Panspermia is the process in which life is transferred from one planet to another. Some biologists believe that life on Earth began when an asteroid collided with Mars, causing its soil, supposedly once teeming with microbes, to be flung towards Earth.
The Kazakhstani duo have taken this one step further, and believe that life is the result of “directed panspermia”: something was intentionally sent towards Earth to kickstart life.
By analysing the genetic code – which is the set of rules which translate DNA into proteins and does not alter as it is passed down through generations – they note that the number 37 crops up several times.
One instance is that the mass of the molecular core shared by all 20 amino acids is 74 – which is 37 doubled.
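The figure of 74 is a nucleon count (protons plus neutrons). It can be checked by summing integer nucleon numbers of the most abundant isotopes (H 1, C 12, N 14, O 16) over the shared core, H2N-CH-COOH. Proline is the exception: its core lacks one hydrogen because its side chain closes a ring onto the nitrogen. The Python sketch below illustrates this under one assumption. It assumes the paper's "activation key" (the transfer of a nucleon in proline) moves one nucleon from proline's side chain to its core. The helper names are hypothetical.

```python
# Nucleon numbers (protons + neutrons) of the most abundant isotopes.
NUCLEONS = {"H": 1, "C": 12, "N": 14, "O": 16}


def nucleon_number(formula):
    """Total nucleon number of a formula given as {element: count}."""
    return sum(NUCLEONS[element] * count for element, count in formula.items())


CORE = {"C": 2, "H": 4, "N": 1, "O": 2}          # H2N-CH-COOH
PROLINE_CORE = {"C": 2, "H": 3, "N": 1, "O": 2}  # HN-CH-COOH (ring closes on N)
PROLINE_SIDE_CHAIN = {"C": 3, "H": 6}            # -CH2-CH2-CH2- bridge

core = nucleon_number(CORE)
pro_core = nucleon_number(PROLINE_CORE)
pro_side = nucleon_number(PROLINE_SIDE_CHAIN)


def apply_activation_key(core_n, side_n):
    """Hypothetical: shift one nucleon from proline's side chain to its core."""
    return core_n + 1, side_n - 1


print(core, core // 37)                          # 74 2
print(pro_core, pro_side, pro_core + pro_side)   # 73 42 115
print(apply_activation_key(pro_core, pro_side))  # (74, 41)
```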
Another is in ‘Rumer’s transformation’. Yuri Rumer first noted in 1966 that the genetic code can be divided equally in half. One half consists of the “whole” codon families, in which all four codons (a codon being a triplet of nucleotides in DNA) encode the same amino acid. The other half consists of the “split” families, whose codons encode different amino acids, or Stop, depending on the third nucleotide.
There are a total of 28 codons which have a total atomic mass of 1665 and a combined side chain atomic mass of 703 – both of which are multiples of 37.
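Rumer's halving of the code can be checked directly against the standard code table. The Python sketch below assumes the usual statement of Rumer's transformation as the base substitution T↔G, C↔A, applied to the two bases that define each codon family. It checks that this substitution maps the eight "whole" families onto the eight "split" ones and back. The table layout (NCBI translation table 1 ordering) and function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AA)}
FAMILIES = ["".join(p) for p in product(BASES, repeat=2)]


def is_whole(family):
    """True if all four codons of a two-base family encode the same amino acid."""
    return len({CODE[family + b] for b in BASES}) == 1


WHOLE = {f for f in FAMILIES if is_whole(f)}
SPLIT = set(FAMILIES) - WHOLE

# Rumer's transformation as assumed here: T<->G and C<->A.
RUMER = str.maketrans("TGCA", "GTAC")


def rumer(family):
    return family.translate(RUMER)


print(sorted(WHOLE))                                     # ['AC', 'CC', 'CG', 'CT', 'GC', 'GG', 'GT', 'TC']
print(sorted(rumer(f) for f in WHOLE) == sorted(SPLIT))  # True
```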
The scientists have a total of nine examples in their research paper published in Icarus, where they state that the probability of the number 37 appearing this many times at random in the genetic code is a staggering one in 10 trillion.
Prof Makukov told New Scientist: “It was clear right away that the code has a non-random structure.
“The patterns that we describe are not simply non-random.
“They have some features that, at least from our point of view, were very hard to ascribe to natural processes.”
As for what planted the message, the Kazakh scientist says: “Maybe they’re gone long ago. Maybe they’re still alive. I think these are questions for the future.
“For the patterns in the code, the explanation we give, we think is the most plausible.”
So you’re an alien seeding primordial Earth with life. Like any creator, you sign your work. Now we may have found that signature – in the genetic code
MAXIM MAKUKOV has an idea. It’s unorthodox; you might call it “out there”. Makukov understands that. He knew he’d have his critics the moment he began to develop it. But it’s there in the numbers, he says. And numbers don’t lie.
A cosmologist and astrobiologist at the Fesenkov Astrophysical Institute in Almaty, Kazakhstan, Makukov says the numbers reveal that all terrestrial life came from outer space. Not only that, it was planted on Earth by intelligent aliens. Billions of years ago, the planet was barren and lifeless. But then, at some distant and unknowable moment, it was seeded with what Makukov calls an “intelligent-like signal” – a signal that is too orderly and intricate to have occurred randomly.
This signal, he says, is in our genetic code. Highly preserved across cosmological timescales, it has been waiting there, like an encrypted message, for anyone qualified to read it. All of the teeming varieties of life on Earth – from kangaroos and daffodils to albatrosses and us – carry it within them. And now Makukov, along with his mentor, mathematician Vladimir shCherbak of the al-Farabi Kazakh National University in Almaty, claims to have cracked it. If they are right, the answer to life, the universe and everything is… 37.
“Most biologists will agree there is a contribution to the origin of life on Earth from cosmic sources”
The idea that terrestrial life has extraterrestrial origins has a long and sometimes distinguished history. The standard version goes something like this: a primitive alien life form, perhaps a bacterium, somehow hitches a ride through space aboard an object like a meteoroid, …