Genes are what life forms use to pass on traits to their offspring. They contain information on how to build organisms and their parts, and they are activated, or expressed, as needed. Genes can get modified (mutations), and they can be duplicated, rearranged, and recombined, producing a greater variety of gene combinations. Gene duplication enables the production of genes with additional functions, because the original can get specialized in one direction and a copy in another. Genes are often packaged in chromosomes, and chromosomes can split, join, and get duplicated. The total set of an organism's genetic information is its genome, and sometimes whole genomes get duplicated.

Genes are made of strands of nucleic acid, DNA and sometimes RNA, and these strands often reside in chromosomes. RNA is ribonucleic acid and DNA is deoxyribonucleic acid. They are composed of nucleotides assembled in linear sequence, and each nucleotide contains a phosphate group (the acid part), a 5-carbon sugar (ribose in RNA, deoxyribose in DNA), and one of four nucleobases: adenine, guanine, cytosine, and uracil or thymine. RNA has uracil while DNA has thymine. Each nucleobase can be matched with a complementary one: adenine with uracil/thymine and guanine with cytosine, so each strand of nucleic acid can serve as a template for a complementary strand. Both DNA and RNA can be copied onto either DNA or RNA. This template mechanism was first proposed by Watson and Crick in 1953, and it has since been abundantly verified.
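The pairing rules can be written down as a small lookup table. The following is only a minimal sketch in Python; the names DNA_PAIRS, RNA_PAIRS and complement are made up for illustration, and the sketch ignores strand direction and everything else about real replication chemistry.

```python
# Base-pairing rules from the paragraph above:
# adenine pairs with thymine (DNA) or uracil (RNA), guanine with cytosine.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def complement(strand, pairs):
    """Return the base-by-base complement of `strand` under a pairing table."""
    return "".join(pairs[base] for base in strand)


if __name__ == "__main__":
    original = "GATTACA"  # arbitrary example sequence
    copy = complement(original, DNA_PAIRS)
    print(copy)
    # Using the copy as a template gives back the original sequence,
    # which is why a single strand is enough to reproduce the information.
    print(complement(copy, DNA_PAIRS) == original)
```

Complementing twice returns the starting sequence, which is the point of the template mechanism: either strand carries enough information to rebuild the other.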

Genetic information is translated into protein sequences with the help of transfer RNA, which matches an amino acid (protein building block) to a triplet of nucleotides or codon. This assembly is assisted by ribosomes, structures of RNA and protein, which act as a workbench. Proteins are composed of linear sequences of amino acids, though some proteins contain several amino-acid chains, and some proteins contain additional sorts of molecules (coenzymes). Proteins serve a variety of functions, like enzymes (biochemical catalysts), structural material (cytoskeleton, collagen, keratin, etc.), mechanical systems (cell-membrane pumps, actin and myosin in cell division and muscles), and gene regulators, switching other genes on and off, controlling those genes' expression. Enzymes are important in biosynthesis (assembly of biological molecules), energy metabolism, and various other functions. RNA can act as an enzyme (ribozyme), and it also can be involved in gene regulation. Though DNA can also do these things, in the wild it is only known for being a master copy of genetic information.

To make a protein, a messenger-RNA strand is copied from the gene's DNA sequence, and that in turn gets operated on by the ribosomes and transfer RNAs to make the protein. This is the "central dogma" of molecular biology, first stated by Francis Crick in 1958.
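The two steps, DNA to messenger RNA and messenger RNA to protein, can be sketched as a pair of table lookups. This is an illustration under stated assumptions, not a model of the cellular machinery: the template strand is assumed to be written in the order that makes the messenger RNA come out reading left to right, CODON_TABLE is only a small excerpt of the standard genetic code (the full table has 64 entries), and the function names are made up.

```python
# DNA template base -> messenger-RNA base (RNA uses uracil, not thymine).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Small excerpt of the standard genetic code; None marks a stop codon.
# Codons missing from this excerpt will raise KeyError.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UGG": "Trp",
    "UAA": None, "UAG": None, "UGA": None,
}


def transcribe(template_dna):
    """Copy a DNA template strand into its complementary messenger RNA."""
    return "".join(DNA_TO_RNA[base] for base in template_dna)


def translate(mrna):
    """Read the messenger RNA three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon: the chain is finished
            break
        protein.append(amino_acid)
    return protein


if __name__ == "__main__":
    mrna = transcribe("TACAAACCGATT")  # arbitrary example template
    print(mrna)
    print(translate(mrna))
```

In the cell, the lookup in translate is done physically by transfer RNAs, each of which carries one amino acid and recognizes one codon.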

Cellular organisms all have DNA heredity and DNA-to-RNA-to-protein systems, and they are enough alike to indicate that they had a single origin. All cellular organisms have very similar nucleic-acid-to-amino-acid translation tables, for instance. Some organisms are not cellular, however: viruses. They depend on their host cells to replicate their genetic material.

Organisms' genomes contain not only genes and gene-regulation sequences, but often a lot of "junk DNA" without a clear function, if any function at all. Many organisms' genomes contain large quantities of junk DNA, though small, fast-reproducing ones like bacteria often have little junk DNA. Some kinds of junk DNA have recognizable origins, however, like pseudogenes and transposons ("jumping genes"). Pseudogenes are non-functional copies of genes, and transposons are bits of genetic material that copy themselves into a genome, sometimes a large number of times.

The DNA-RNA-protein system looks like a clear case of irreducible complexity, so biologists have proposed that one of the three came first and that the others evolved from it or emerged as add-ons. Proteins are very versatile, and the simpler amino acids can easily be made prebiotically, but it's hard to make a template reproduction mechanism work for proteins. This leaves RNA and DNA. Of the two, DNA is more stable, but its building blocks are made from RNA ones, by turning ribose into deoxyribose and uracil into thymine, and RNA has several functions, while DNA has only one. This has led to the "RNA world" hypothesis, in which RNA acted as both gene and enzyme. But though nucleobases can be made prebiotically without much trouble, it is much more difficult to make ribose, and there are serious proposals that RNA itself had a predecessor.


It has long been evident that organisms produce other organisms very similar to themselves. Dogs give birth to dogs, and similar-looking dogs at that, and likewise for cats, but dogs don't give birth to cats or cats to dogs. However, organisms do not exactly resemble their parents, and they often seemed to be blends of their parents. Beyond that, many people believed such notions as Lamarckian inheritance, like the Lamarckian genetic engineering of Genesis 30.

That was a major problem with Charles Darwin's idea of natural selection: it would not work very well with blending inheritance, since blending would dilute any new variation away. Darwin himself accepted that Lamarckian evolution could happen, and he proposed that inheritance happens by "pangenesis", every part of an organism contributing some tiny particle carrying genetic information.

One of his contemporaries, the monk Gregor Mendel, discovered discrete inheritance with his experiments in crossbreeding pea plants. He discovered that some alleles or versions of genes override other versions, thus discovering dominant and recessive gene versions. It was apparent from his work that genotype (what genes) and phenotype (outward features from gene expression) need not match.
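Mendel's result can be illustrated by enumerating the offspring of a cross. The sketch below assumes the textbook simplification of one gene with two alleles, written "A" (dominant) and "a" (recessive), with each parent passing on either of its two alleles with equal chance; the function names are made up for illustration.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Count the equally likely offspring genotypes of two parents.

    Each parent is a two-letter genotype such as "Aa"; the offspring
    gets one allele from each parent.
    """
    return Counter("".join(sorted(pair)) for pair in product(parent1, parent2))


def phenotype(genotype):
    """A single dominant (uppercase) allele is enough to show the dominant trait."""
    return "dominant" if any(allele.isupper() for allele in genotype) else "recessive"


if __name__ == "__main__":
    genotypes = cross("Aa", "Aa")
    print(genotypes)

    phenotypes = Counter()
    for genotype, count in genotypes.items():
        phenotypes[phenotype(genotype)] += count
    print(phenotypes)
```

Crossing two "Aa" parents gives genotypes AA, Aa and aa in a 1:2:1 ratio but phenotypes in a 3:1 ratio, because AA and Aa look alike. That is the mismatch between genotype and phenotype described above.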

His work did not get much interest, and it was buried in a rather obscure journal, until around 1900, when biologists Hugo de Vries, Carl Correns and Erich von Tschermak independently rediscovered discrete inheritance. But it was not until the 1930's that genetics and natural selection were successfully reconciled, resulting in the "Modern Synthesis" of evolutionary biology.

But around then, genetic research in the Soviet Union suffered a big setback at the hands of Trofim Denisovich Lysenko and his followers. Trofim Lysenko was a plant breeder and a quack geneticist who claimed that he could breed much higher-performing crop plants than mainstream biologists could. He claimed that genes do not exist, that heredity works by pangenesis, and that his experimental treatments could get inherited -- Lamarckian genetic engineering. Visiting biologists found him hopelessly ignorant and convinced that statistical testing is a waste of time.

His followers denounced mainstream genetics as Mendelist Weismannist Morganist bourgeois idealism, and Lysenko and his followers got the favor of important Soviet Communist Party officials, including Joseph Stalin himself. By 1948, the Lysenkoites' triumph was complete, and the remaining geneticists meekly recanted their "errors".

But while Lysenko was ridiculing the notion of a hereditary substance, Western biologists were closing in on its nature. In 1944, Oswald Avery discovered the nature of a substance that could transmit certain genetic information from one bacterium to another. Rather surprisingly, it wasn't protein, as everybody had expected. It was DNA, even though nucleic acids had seemed too monotonous for the job. Later biologists expanded on his work, clinching the case. In the early 1950's, Watson and Crick got to work on the question of the structure of DNA, using analysis techniques that had already been successful for proteins. After working through several possibilities, they discovered the famous double-helix structure, with complementary bases paired between the deoxyribose-phosphate backbones. In their 1953 paper, they coyly hinted at the template mechanism for reproduction, and that turned out to be correct.

Back then, biologists had lots of elegant ideas about how to map genetic information onto protein sequences, but when the actual translation was discovered, it was relatively dumb: three nucleotides map onto one amino acid.
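A bit of counting shows why three is the smallest codon length that works. The sketch below assumes the 20 standard amino acids (a number not stated above) and simply enumerates every possible codon of a given length; the names are made up for illustration.

```python
from itertools import product

BASES = "ACGU"       # the four RNA nucleobases
AMINO_ACIDS = 20     # assumption: the 20 standard amino acids


def codon_count(length):
    """Number of distinct codons of the given length (4 ** length)."""
    return len(list(product(BASES, repeat=length)))


if __name__ == "__main__":
    for length in (1, 2, 3):
        count = codon_count(length)
        verdict = "enough" if count >= AMINO_ACIDS else "too few"
        print(length, count, verdict)
```

One base gives 4 codons and two give 16, both too few for 20 amino acids, while three give 64. The surplus is why most amino acids are coded for by more than one triplet.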

In the 1960's, biologists started sequencing lots of proteins, and later, genes. They found that one could reconstruct organisms' family trees from their gene and protein sequences. A remarkable finding was that many genetic changes are only weakly selected or not selected at all, so that much molecular evolution is "neutral", driven by genetic drift rather than by selection.
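The idea behind building family trees from sequences is that closer relatives have had less time to accumulate differences. The sketch below is a deliberately crude illustration: the three sequences are made up, they are assumed to be already aligned and of equal length, and it only counts mismatches. Real phylogenetic methods use alignment and statistical models of sequence change.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Made-up, pre-aligned sequences for three hypothetical species.
SEQUENCES = {
    "species_1": "ACGTACGTAC",
    "species_2": "ACGTACGTTC",
    "species_3": "ACGAACCTTG",
}


def pairwise_distances(sequences):
    """Return the mismatch count for every pair of named sequences."""
    return {
        (m, n): hamming(sequences[m], sequences[n])
        for m, n in combinations(sequences, 2)
    }


if __name__ == "__main__":
    distances = pairwise_distances(SEQUENCES)
    for pair, distance in sorted(distances.items(), key=lambda item: item[1]):
        print(pair, distance)
```

Here species_1 and species_2 differ at only one position, so on this toy data they would be grouped together first, with species_3 branching off earlier.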

Around then, Lynn Margulis revived the endosymbiosis hypothesis of the origin of mitochondria and chloroplasts, the hypothesis that they were descended from separate free-living organisms. They were known to have genetic material, and a common hypothesis back then was that that was genetic material that had failed to get incorporated into the cell's nucleus. But as their genes got sequenced, it became evident that mitochondria and chloroplasts were genetically closer to various prokaryotes than to their host cell's nucleus. Mitochondria are closest to "alpha-proteobacteria", like root-nodule bacteria, and chloroplasts to cyanobacteria.

In the 1970's, Carl Woese and his colleagues proposed that there was a deep split in the prokaryotes, between Eubacteria (most "normal" prokaryotes) and Archaea (a motley collection of oddballs often found in extreme environments). The eukaryote information systems turned out to be closest to Archaea, while various metabolic genes turned out closest to Eubacteria. Aside from the origin of mitochondria and chloroplasts, the origin of eukaryotes continues to be a mystery.

If one can sequence genes, then why not sequence whole genomes? The first DNA genome was sequenced in 1977, that of bacteriophage (bacterium-infecting virus) phi-X-174, 5386 nucleotides coding for 11 proteins. The first genome of a cellular organism was sequenced in 1995, that of the bacterium Haemophilus influenzae, 1,830,137 nucleotides long. Craig Venter worked on that one, and he helped produce a draft version of the human genome by 2001. Sequencing technologies have continued to improve, with numerous species now having sequenced genomes.

Though it is well understood how to get from genes to proteins, going from genes to macroscopic body features has been a much more difficult problem. But here also, there has been some progress, such as the discovery of several families of development-control genes, like Hox genes in animals and MADS-box genes in plants. Hox genes are expressed in stripes from the nose end to the tail end, and their proteins in turn control the expression of other genes. First discovered in Drosophila fruit flies, Hox genes were found to be well-conserved across the animal kingdom, but with variations here and there.