Recently in Evolution Category

Phylogenomic Fallacies

| 73 Comments

This is the fourth in a series of articles for the general public focused on understanding how species are related and how genomic data is used in research. Today, we talk about some common fallacies in phylogenomics.

Where do humans fit on the evolutionary tree of life? This is an important topic in evolutionary biology. A lot of people believe humans are the most important and highly-evolved organisms, but in reality, all modern species are equally evolved. Our natural tendency to assume that humans are evolutionarily superior has led to a few misconceptions about phylogenetic trees.

plants.png

To understand the first misconception, let’s look at a phylogenetic tree of plants (from “The Amborella Genome and the Evolution of Flowering Plants”). Eudicots and monocots are two classes of flowering plants, or angiosperms, and the plants in black are non-flowering plants. The term “basal” refers to the base of a phylogenetic tree, and a basal group is one that branches off closer to that base. The authors chose to label the angiosperms that are not eudicots or monocots as “basal angiosperms.” But this label is arbitrary; all the angiosperms are equidistant from the common ancestor and thus equally evolved. We sometimes tend to give more weight to branches that contain the species of interest and call other branches basal, almost assigning them a lesser importance. In this case, the groups of interest include many of the plants that humans eat; a species is often deemed more important the more it matters to humans. But all modern species have been evolving for the same amount of time since their root common ancestor, regardless of when their branch diverged. To avoid confusion, it might be best to eliminate the “basal” term altogether.

This type of thinking also leads us to place humans at the end of phylogenetic trees. However, this placement is arbitrary and trees can be drawn in many equivalent ways. For example, compare two trees of primates that differ only in how the branches are rotated. The tree on the left, with humans at the top of the tree, is one you might see more often. But both of these trees are actually identical, and the relationships between species that can be inferred from the tree on the right are the same as the relationships in the tree on the left. Species at the tips of a tree are equidistant from the root common ancestor, so they can be considered evolutionarily equivalent.

primate tree 1.png

primate tree 2.png
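The claim that rotating branches leaves a tree unchanged can be checked mechanically. The sketch below is only an illustration, not the method used to draw the figures: it writes each tree as nested tuples (an assumed, simplified great-ape topology) and compares them after discarding the left/right order of the branches. The function name `canonical` is made up for this example.

```python
def canonical(tree):
    """Return a form of the tree that ignores the drawing order of branches."""
    if isinstance(tree, str):          # a leaf is just a species name
        return tree
    # An internal node is the unordered set of its (canonicalized) children.
    return frozenset(canonical(child) for child in tree)

# Same relationships, drawn with humans first and with humans last.
humans_on_top = ((("human", "chimpanzee"), "gorilla"), "orangutan")
rotated = ("orangutan", ("gorilla", ("chimpanzee", "human")))

# A genuinely different tree: here gorillas, not chimpanzees, sit next to humans.
different = ((("human", "gorilla"), "chimpanzee"), "orangutan")

print(canonical(humans_on_top) == canonical(rotated))    # True
print(canonical(humans_on_top) == canonical(different))  # False
```

Only the grouping matters: swapping which branch is drawn on top never changes the result, while moving a species to a different group does.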

Similarly, a common misconception is that humans evolved directly from monkeys. Monkeys, though, are modern species just like we are and have been evolving and changing over time. The common ancestor we share with monkeys may have looked much different than monkeys do now. This assumption that modern species represent an ancestral state of human evolution is what T. Ryan Gregory calls the platypus fallacy. Gregory uses the example that we can’t examine the traits of platypuses and think that humans at one point in their evolution possessed these same traits. We can no more infer the traits of human ancestor species from platypuses than platypuses can infer the traits of their ancestors from us.

Human-centered thinking is very prevalent in our society, affecting our laws, religions, and customs. While it probably influences all of us on a personal level, it can lead to false conclusions and misconceptions in science, like thinking that humans are the most highly evolved species. But all modern species are evolutionarily equivalent because they have been evolving for the same amount of time. Eliminating this fallacy will enable us to better understand the evolutionary process.

For more information on basal groups, check out: “Which side of the tree is more basal?”, Krell, Frank et al. Systematic Entomology (2004).

This series is supported by NSF Grant #DBI-1356548 to RA Cartwright.

Libellula pulchella

| 5 Comments
IMG_0982Libellula_forensis_600.JPG

Libellula pulchella – twelve-spotted skimmer, Elmer’s Two-Mile Creek, Boulder, Colorado. See here for eight-spotted skimmer, L. forensis.

The Hebrew Bible says that God made humans from dust,* but maybe it was a slurry of clay and water. That is a tentative conclusion you might draw from an experiment that used a (very) high-powered laser beam to zap a suspension of clay in an aqueous solution of formamide, a very simple organic compound. The result has been reported in the press, but there is a somewhat more-precise article in Science magazine. (You may find the abstract of the original article here and the supporting information here. I did not get access to the full article.)

In a nutshell, a team at the J. Heyrovský Institute of Physical Chemistry in Prague used a laser that can produce up to 1 kJ in a 300 ps pulse,** irradiated the suspension, and produced adenine, cytosine, guanine, and uracil, which are the bases of the RNA molecule. And apparently not a drop of thymine, one of the bases of DNA. The experiment is supposed to simulate the bombardment of the early Earth by comets and presumably supports the hypothesis that an RNA world came first.

_____
* Actually, Job, Isaiah, Psalms, and I imagine elsewhere say clay, as in, “We are the clay, and you are our potter.” (Don’t get excited; I consider the fact to have no significance whatsoever.)

** I am a laser physicist and wrote my thesis on laser-produced plasmas, so you must forgive me for somewhat stressing the laser, which to this day gives me a certain amount of pulse envy.

Coyote Buttes

| 31 Comments

Photograph by Vivian Dullien.

VivianDullienCoyoteButtesAriz_600.jpg

Coyote Buttes, Arizona.

Analyzing the Genome with Statistics

| 19 Comments

This is the third in a series of articles for the general public focused on understanding how species are related and how genomic data is used in research. Today, we talk about the challenges of using statistics to analyze phylogenomic data.

Suppose you were a door manufacturer trying to figure out the average height of a population living in a certain country. You might conduct an experiment where you ask a group of people to report their height. You would then assemble those measurements in a data set. But in order to study this data set and draw conclusions you would need to analyze it using statistics. For example, how tall should your door be in order to fit 95% of people in the country? How many people do you need to survey to accurately represent the total population? These questions can be answered with statistical analysis.
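As a rough sketch of the door question, the snippet below picks the door height that clears 95% of a surveyed sample. The survey itself is simulated, and the mean of 170 cm, the spread of 10 cm, and the sample size of 1000 are invented numbers for illustration, not real measurements.

```python
import math
import random

def door_height_for(heights, coverage=0.95):
    """Smallest surveyed height that is at least as tall as `coverage` of the sample."""
    ordered = sorted(heights)
    # Nearest-rank percentile: the person standing at the requested coverage level.
    rank = max(1, math.ceil(coverage * len(ordered)))
    return ordered[rank - 1]

# Hypothetical survey: 1000 self-reported heights in centimeters.
rng = random.Random(0)
survey_cm = [rng.gauss(170.0, 10.0) for _ in range(1000)]

door_cm = door_height_for(survey_cm)
share_that_fit = sum(h <= door_cm for h in survey_cm) / len(survey_cm)
print(f"door height: {door_cm:.1f} cm, fits {share_that_fit:.1%} of the sample")
```

Whether that door also fits 95% of the whole country depends on how well the sample represents the population, which is the question the rest of this article turns on.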

Because acquiring data from experiments can be costly and time-consuming, we often use small data sets to represent a larger population of interest. In our height experiment, we would not be able to ask every single person in the country his or her height. We would choose a group of people under the assumption that they accurately reflect the population as a whole. However, when we are trying to map out the evolutionary history of organisms using data from sequenced genomes (phylogenomics, which we talked about last time), we need to change our method of analysis.

Let’s look at the treeshrew, for instance. It looks like a rodent but actually shares some internal similarities with primates (studied by Sir Wilfrid Le Gros Clark in the 1920s), like brain anatomy and reproductive traits. To figure out if the treeshrew is more similar to rodents or primates, we could sequence its genome and, using statistics, compare its genes to those of rodents and primates. But typical statistical models are based on subsets of populations, while by definition, genomic sequencing gives us a complete data set - all of the treeshrew’s genes. These typical models may not be suitable for interpreting genomic data.

The treeshrew. Source: Wikipedia

Before reaching a conclusion about the treeshrew, or any set of data, scientists must consider precision and accuracy. Multiple measurements of the same quantity are precise if they are similar to each other. Another way of saying this is that their variance is small. On the other hand, measurements are accurate if they are close to the true value of what they are trying to measure. For genomic data, we need better statistical tools to ensure that the accuracy of our conclusions matches the precision characteristic of these huge data sets.
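To make the distinction concrete, here is a small sketch with two invented sets of measurements of the same quantity, whose true value we pretend to know is 170. Precision is summarized by the variance and accuracy by how far the average lands from the truth. All numbers are made up for the example.

```python
from statistics import mean, pvariance

TRUE_VALUE = 170.0  # assumed known only for the sake of the illustration

# Tightly clustered, but centered on the wrong value.
precise_but_inaccurate = [175.1, 174.9, 175.0, 175.2, 174.8]
# Widely scattered, but centered on the right value.
accurate_but_imprecise = [160.0, 180.0, 165.0, 175.0, 170.0]

def spread(measurements):
    """Variance of the measurements: small means precise."""
    return pvariance(measurements)

def offset(measurements):
    """Distance of the average from the true value: small means accurate."""
    return abs(mean(measurements) - TRUE_VALUE)

for name, data in [("precise but inaccurate", precise_but_inaccurate),
                   ("accurate but imprecise", accurate_but_imprecise)]:
    print(f"{name}: variance = {spread(data):.2f}, offset = {offset(data):.2f}")
```

The first set looks more trustworthy because its values agree with one another, yet it is the one that misses the true value.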

Larger data sets provide more precise conclusions than smaller ones. For example, when we ask more people to report their height, we are more confident that our sample represents the variability of the actual population. Similarly, we analyze more genes in the treeshrew’s genome to increase our confidence that our conclusion is precise. However, our results might not necessarily be accurate; big data sets may lead us to draw incorrect conclusions with high confidence. The treeshrew’s genome contains some genes that are more similar to rodents’ genes and some that are more similar to primates’ genes (Fan et al., Nie et al., and Xu et al.), and with so much data we could find that the treeshrew is most similar to either group with high confidence. We need analysis tools that will tell us which genes give the correct answer.

Why are conclusions from data sometimes inaccurate? Statistical biases are external factors that produce consistent error in our measurements. Biases have many sources, including faulty experimental design, violation of assumptions made in analyzing the data, and errors in the data collection process. Bias in our height experiment might arise if we unintentionally ask the height of more women than men, causing our estimate of the average height to be lower. But in the case of phylogenomics, we are likely to have biases because of our relative lack of knowledge about the genome: we don’t always know which genes to analyze or the correct way to model the data. For example, some models assume that evolution followed the same pattern throughout all time, but this most likely was not the case.
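The height example of bias can be worked through with a few lines of arithmetic. The group averages below (162 cm for women, 176 cm for men) and the 70/30 sample split are hypothetical values chosen only to show the direction of the error.

```python
def average_height(share_women, mean_women_cm, mean_men_cm):
    """Average height of a group made up of the given share of women."""
    return share_women * mean_women_cm + (1 - share_women) * mean_men_cm

# Hypothetical group averages, not real survey results.
MEAN_WOMEN_CM = 162.0
MEAN_MEN_CM = 176.0

population = average_height(0.5, MEAN_WOMEN_CM, MEAN_MEN_CM)  # evenly split country
skewed = average_height(0.7, MEAN_WOMEN_CM, MEAN_MEN_CM)      # sample with 70% women

print(f"population average: {population:.1f} cm")
print(f"skewed-sample average: {skewed:.1f} cm")
```

Surveying more people with the same 70/30 split would not fix this: the estimate would simply settle more firmly on the lower, wrong value.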

Furthermore, the process of genome sequencing and analysis itself may create error, especially in the reconstruction of the genome and the alignment of genes for comparison. If we are comparing the genome of the treeshrew to the genomes of primates and rodents, it is difficult for us to know which genes are correlated between species when we are looking at a data set of billions of points. We might use a probability model to determine correlated genes, but all models are at least somewhat incorrect and introduce bias. In smaller data sets, biases are offset by a low precision and relatively small confidence in reaching conclusions. However, in genomic-size data sets, even small biases can be amplified and lead to high confidence in the wrong answer and incorrect phylogenetic trees.
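A back-of-the-envelope sketch shows how a small bias turns into high confidence in the wrong answer as the data set grows. Suppose, purely as an assumption for illustration, that a slightly flawed analysis makes only 48% of genes support the true tree, so the wrong tree wins a narrow majority. The code measures how many standard errors separate that majority from a 50/50 tie.

```python
from math import sqrt

def z_score_for_wrong_answer(share_correct, n_genes):
    """Standard errors by which support for the true tree falls below a 50/50 split."""
    standard_error = sqrt(share_correct * (1 - share_correct) / n_genes)
    return (0.5 - share_correct) / standard_error

BIASED_SHARE = 0.48  # hypothetical: the bias costs the true tree 2 points of support

for n_genes in (25, 2_500, 25_000):
    z = z_score_for_wrong_answer(BIASED_SHARE, n_genes)
    print(f"{n_genes:>6} genes: wrong tree favored by {z:.1f} standard errors")
```

With 25 genes the wrong tree's lead is indistinguishable from noise, so we would stay cautious. With 25,000 genes the same 2-point bias looks like overwhelming evidence.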

When analyzing phylogenomic datasets, we need to use analyses that are appropriate for large data sets. This will unlock the potential of phylogenomic research to draw unbiased conclusions, like figuring out the correct phylogenetic classification of the treeshrew (still a topic of controversy among evolutionary biologists). However, phylogenomics is such a young field that these tools do not yet exist. When they are developed, we can increase our chances of correctly classifying species’ relationships and discovering the true history of evolution.

For more detail, check out: “Statistics and Truth in Phylogenomics”, Kumar, Sudhir et al. Molecular Biology and Evolution (2011).

References:

Fan, Yu, et al. “Genome of the Chinese tree shrew.” Nature communications 4 (2013): 1426.

Nie, Wenhui, et al. “Flying lemurs – The ‘flying tree shrews’? Molecular cytogenetic evidence for a Scandentia-Dermoptera sister clade.” BMC biology 6.1 (2008): 18.

Xu, Ling, et al. “Evaluating the Phylogenetic Position of Chinese Tree Shrew (Tupaia belangeri chinensis) Based on Complete Mitochondrial Genome: Implication for Using Tree Shrew as an Alternative Experimental Animal to Primates in Biomedical Research.” Journal of Genetics and Genomics 39.3 (2012): 131-137.

Our next installment will cover some misused terminology in phylogenomics. This series is supported by NSF Grant #DBI-1356548 to RA Cartwright.

Lenticular clouds

| 11 Comments
IMG_1154Cloud_600.JPG

Interesting cloud formation, Boulder, Colorado. The camera is facing south, and the wind is coming from the west, or right.

One hour later, in Golden,

Philae craft lands on comet

| 70 Comments

Rosetta headquarters announced a few moments ago that the Philae lander is now sitting on the surface of the comet and transmitting data. Unfortunately, the European Space Agency is not exactly releasing a trove of pictures. I know this is not biology, but where did you think those hydrocarbons came from in the first place?

Phylogenomics: Deciphering a Billion-Piece Puzzle

| 146 Comments

This is the second in a series of articles for the general public focused on understanding how species are related and how genomic data is used in research. Today, we talk about phylogenomics, the application of whole genome sequencing to understand evolutionary relationships among species.

DNA Chemical Structure. Source: Madeleine Price Ball

The haploid human genome is 3.2 billion DNA bases long, and each base can be one of four nucleotides: A, T, C, and G. Uncoiled, the DNA in a single human cell would be 2 meters long, and the DNA in a human body would stretch from the sun to Pluto multiple times. With 3.2 billion bases, each person’s genome is unique, and this plays an essential role in shaping our physical and mental individuality. However, despite being unique, all human genomes are very similar to one another, due to our shared ancestral heritage. Similarly, species that share a recent ancestral heritage also have similar genomes. Species that are distantly related are likely to demonstrate significant differences in their genomes. This is why, as we discussed last week, evolutionary biologists compare traits and genes to determine the relationships of different species.
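The length figures can be sanity-checked with simple arithmetic. The spacing of about 0.34 nanometers between neighboring base pairs is a standard textbook value. The number of DNA-containing cells in the body and the sun-to-Pluto distance used below are rough, order-of-magnitude assumptions that do not come from this article.

```python
BASES_PER_HAPLOID_GENOME = 3.2e9   # from the text
COPIES_PER_CELL = 2                # most body cells carry two copies of the genome
METERS_PER_BASE_PAIR = 0.34e-9     # spacing between stacked base pairs

dna_per_cell_m = BASES_PER_HAPLOID_GENOME * COPIES_PER_CELL * METERS_PER_BASE_PAIR

# Rough assumptions for illustration only.
NUCLEATED_CELLS_IN_BODY = 1e13     # order-of-magnitude guess
SUN_TO_PLUTO_M = 5.9e12            # Pluto's average distance, about 39.5 AU

total_dna_m = dna_per_cell_m * NUCLEATED_CELLS_IN_BODY
sun_to_pluto_trips = total_dna_m / SUN_TO_PLUTO_M

print(f"DNA per cell: about {dna_per_cell_m:.1f} m")
print(f"DNA per body: about {sun_to_pluto_trips:.1f} sun-to-Pluto distances")
```

Changing the assumed cell count by a factor of a few changes the number of trips by the same factor, but the result stays well above one.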

Unfortunately, some genes give us the wrong answer about how species are related. A section of a gene can be identical for two species due to independent mutations. After all, any given base can only mutate into one of three other bases, so the same mutation can easily happen twice, or different mutations can produce the same sequence. Consider two species that are distantly related; one contains an AGA fragment, while the corresponding fragment in the other species is TGT, i.e. they differ in 2 out of 3 positions. As these species evolve, by chance the first species may experience a change in the first position such that AGA → TGA, and the second species may experience a change in the third position such that TGT → TGA. Now, these two sequences look the same so you might think the species share a recent common ancestor; however, it is only an accident of biology that they appear closely related. Because some fragments may be identical due to independent mutations and not shared ancestry, estimating species relationships using whole genomes is better than using just a few genes. The more information we have, the more likely we are to figure out species’ relationships correctly.
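The AGA and TGT example can be replayed in a few lines. This sketch counts the positions at which two equal-length fragments differ (the Hamming distance) before and after the two independent mutations. The helper names are invented for the illustration.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))

def mutate(sequence, position, new_base):
    """Return a copy of the sequence with one base replaced (positions start at 0)."""
    return sequence[:position] + new_base + sequence[position + 1:]

species_1, species_2 = "AGA", "TGT"
print(hamming(species_1, species_2))          # 2: the fragments start out different

species_1_later = mutate(species_1, 0, "T")   # first position changes: AGA becomes TGA
species_2_later = mutate(species_2, 2, "A")   # third position changes: TGT becomes TGA
print(hamming(species_1_later, species_2_later))  # 0: identical by coincidence
```

A distance of zero here reflects two separate accidents, not a recent common ancestor, which is why a single short fragment is weak evidence on its own.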

The cost to sequence whole genomes has fallen from $100 million to $1000 in just the past twelve years. It now takes days to sequence a genome compared to the 13 years it took for the first human genome. The challenge now is not to obtain the data but to compare all the billions of base pairs in one genome to those in another. Current sequencing methods, while fast, can only read the genome by dividing it into millions of short fragments, which must be reassembled like an enormous puzzle. Researchers then have to figure out which genes correspond to one another in different species’ genomes. These comparisons are challenging because genes in one genome might be in a different order, on different chromosomes, or missing completely in another species’ genome.

Biologists are beginning to use genomic information to understand how species are related and measure how fast or slowly different genes evolve. This in turn allows us to understand how evolution happens. For example, using genomic information we can figure out how genes mutate, characterize and diagnose genetic diseases, and track harmful pathogens. But before that can happen, we need to address the difficulties of analyzing these large genomic datasets. You might think that more data is always better, but having a lot of data can lead us to have very high confidence in the wrong answer. In a pool of thousands of genes, we need to find the ones that tell us the right answer.

Next week, we’ll discuss statistical challenges associated with big data analysis, especially as it relates to phylogenomics. This series is supported by NSF Grant #DBI-1356548 to RA Cartwright.

I started this post thinking I’d write a review of Andreas Wagner’s recent book “Arrival of the Fittest: Solving Evolution’s Greatest Puzzle” (links below), an engrossing book about how biological innovation arises from the structure of metabolic, genotype, and protein networks, and how robustness–the stability of phenotypes in the face of underlying genetic variability–is critical in evolutionary innovations. But there are several excellent reviews already out there, so another would be redundant. I’ll mention only a couple of points I think worth emphasizing below the fold.

Phelsuma laticauda

| 5 Comments

Photograph by Tony Gamble.

Photography contest, Honorable Mention.

Gamble.Phelsuma_laticauda_dorsum.jpg

Phelsuma laticauda – gold dust day gecko.

The Family Tree of Life

| 92 Comments

In the next few weeks, we’ll be posting a series of articles for the general public focused on understanding how species are related and how genomic data is used in research. We start with a background on phylogenetic trees.

Imagine you could go back in time and meet your great grandmother or even your great-great-great-great-great grandmother, when they were your age. Would they look like you? Or would they look more like your siblings or cousins? Maybe you would all look a little different. Scientists try to figure out how the distant ancestors of apes, other animals, plants, and all organisms living today looked and behaved, much in the same way that people use a family tree to trace their ancestry.

primate-family-tree-780x520_0.gif

The common ancestor of great apes lived about 18 million years ago. Source: Smithsonian National Museum of Natural History http://humanorigins.si.edu/evidence/genetics

In evolutionary biology scientists use a type of tree called a “phylogenetic tree” to organize the history of how species descended from common ancestors. The closer two species are to a common ancestor on the phylogenetic tree, the more closely the two are related.

Take the phylogenetic tree of primates, for example. The common ancestor of apes lived about 18 million years ago. But over time, this one group branched off to form many different species, including humans, which have their own separate branch on this tree.
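One way to picture "closeness to a common ancestor" is to walk up the tree from two species until their paths meet. The sketch below uses a simplified great-ape tree with made-up labels for the ancestral nodes. It is an illustration of the idea, not a reconstruction of the Smithsonian figure.

```python
# Each entry points from a node to its parent; the root has no entry.
parent = {
    "human": "human+chimp ancestor",
    "chimpanzee": "human+chimp ancestor",
    "human+chimp ancestor": "african ape ancestor",
    "gorilla": "african ape ancestor",
    "african ape ancestor": "great ape ancestor",
    "orangutan": "great ape ancestor",
}

def ancestors(node):
    """List the ancestors of a node, nearest first."""
    path = []
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def most_recent_common_ancestor(a, b):
    """First ancestor of b that is also an ancestor of a."""
    seen = set(ancestors(a))
    for candidate in ancestors(b):
        if candidate in seen:
            return candidate
    return None

def steps_back(species, other):
    """How many branch points `species` must go back to meet `other`."""
    shared = most_recent_common_ancestor(species, other)
    return ancestors(species).index(shared) + 1

print(steps_back("human", "chimpanzee"))  # 1
print(steps_back("human", "gorilla"))     # 2
print(steps_back("human", "orangutan"))   # 3
```

Fewer steps back to the shared ancestor means a closer relationship, so in this toy tree humans are closest to chimpanzees and farthest from orangutans.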

How did so many unique species develop from one ancestor? New branches formed by a process known as divergence. When groups of ancient organisms became geographically isolated from one another, either through migration or geologic events like earthquakes, each group began to develop its own unique set of physical attributes. Sometimes, by chance, a change in a characteristic enabled an individual to survive better in its environment and produce more offspring.

Perhaps individuals in one group with larger arms were better able to break open the hard-shelled fruits that were common in one region, while some individuals in another group had the ability to travel more easily through tall trees that offered protection from predators. Whatever the reason may have been, selection favored genetic differences that improved survival. Over time, this gradual process of isolation and selection produced distinct species, which in turn branched into more species.

The end result of divergence is many species, related in a tree-like fashion, and we display these relationships using phylogenetic trees. Scientists now use increasingly sophisticated methods to determine how species are related and build phylogenetic trees. In the past, scientists built these trees simply by comparing physical traits, like how many limbs an organism has or whether it has a tail. But with the recent surge in fast and affordable gene sequencing technologies, researchers today can directly compare species’ DNA to determine how they are related.

But analyzing entire genomes, with billions of DNA base pairs, presents its own unique set of challenges, and researchers often struggle to determine if the DNA differences they find between species are truly significant or are simply due to common variability. As computer software and statistical analysis become more adept at handling these challenges, our understanding of species’ relationships could change — providing exciting new insights into our family tree of life.

Check back next week when we discuss the differences between studying small and large datasets, and the challenges associated with big data analysis. This series is supported by NSF Grant #DBI-1356548 to RA Cartwright.

IMG_4248Eclipse_600.JPG

Pinhole-camera images of solar eclipse formed by spaces between leaves in canopy. According to Jon Grepstad, this phenomenon was explained by Aristotle. The eclipse is just ending; the picture was as close to total as it got here (Boulder, Colorado).

Aeshna cyanea

| 3 Comments

Photograph by Marilyn Susek.

Photography contest, Honorable Mention.

Susek.Dragon_Fly.jpg

Aeshna cyanea – southern hawker.

Beginning this week, we will run photographs every other Monday, so no picture next week; we no longer have enough honorable mentions and other miscellaneous photographs to continue posting a photograph every week. But polish your lenses (very carefully) and keep an eye out for the contest in the summer.

Cupido comyntas

| 1 Comment

Photograph by Robin Lee-Thorp.

Photography contest, Honorable Mention.

Lee-Thorp.Eastern Blue.JPG

Cupido comyntas – eastern tailed-blue butterfly.

Larus delawarensis

| 8 Comments
IMG_1104Gull_600.JPG

Larus delawarensis – ring-billed gull, Boulder, Colorado. There is right now a fairly large flock at Walden Ponds east of Boulder. They are too far away to get a picture, unless you like snapshots of an array of gray-and-white ellipses. But this one very kindly landed in a parking lot and posed long enough to enable this portrait.

On August 14, William Dembski spoke at the Computations in Science Seminar at the University of Chicago. Was this a sign that Dembski’s arguments for intelligent design were being taken seriously by computational scientists? Did he present new evidence? There was no new evidence, and the invitation seems to have come from Dembski’s Ph.D. advisor Leo Kadanoff. I wasn’t present, and you probably weren’t either, but fortunately we can all view the seminar, as a video of it has been posted here on Youtube.

It turns out that Dembski’s current argument is based on two of his previous papers with Robert Marks (available here and here) so the arguments are not new. They involve considering a simple model of evolution in which we have all possible genotypes, each of which has a fitness. It’s a simple model of evolution moving uphill on a fitness surface. Dembski and Marks argue that substantial evolutionary progress can only be made if the fitness surface is smooth enough, and that setting up a smooth enough fitness surface requires a Designer.

Briefly, here’s why I find their argument unconvincing:

  1. They consider all possible ways that the set of fitnesses can be assigned to the set of genotypes. Almost all of these look like random assignments of fitnesses to genotypes.
  2. Given that there is a random association of genotypes and fitnesses, Dembski is right to assert that it is very hard to make much progress in evolution. The fitness surface is a “white noise” surface that has a vast number of very sharp peaks. Evolution will make progress only until it climbs the nearest peak, and then it will stall. But …
  3. That is a very bad model for real biology, because in that case one mutation is as bad for you as changing all sites in your genome at the same time!
  4. Also, in such a model all parts of the genome interact extremely strongly, much more than they do in real organisms.
  5. Dembski and Marks acknowledge that if the fitness surface is smoother than that, progress can be made.
  6. They then argue that choosing a smooth enough fitness surface out of all possible ways of associating the fitnesses with the genotypes requires a Designer.
  7. But I argue that the ordinary laws of physics actually imply a surface a lot smoother than a random map of sequences to fitnesses. In particular if gene expression is separated in time and space, the genes are much less likely to interact strongly, and the fitness surface will be much smoother than the “white noise” surface.
  8. Dembski and Marks implicitly acknowledge, though perhaps just for the sake of argument, that natural selection can create adaptation. Their argument does not require design to occur once the fitness surface is chosen. It is thus a Theistic Evolution argument rather than one that argues for Design Intervention.
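The contrast between a “white noise” surface and a smoother one can be seen in a toy simulation. To be clear about what is assumed: this is my own minimal sketch, not the model in the Dembski and Marks papers. Genotypes are strings of 10 sites that are each 0 or 1, the white-noise surface gives every genotype an independent random fitness, and the smooth surface is purely additive (fitness is the number of 1s, so sites do not interact at all).

```python
import random
from itertools import product

SITES = 10  # a deliberately tiny genome: 2**10 = 1024 genotypes
genotypes = list(product((0, 1), repeat=SITES))

def neighbors(genotype):
    """All genotypes that differ from this one at exactly one site."""
    return [genotype[:i] + (1 - genotype[i],) + genotype[i + 1:] for i in range(SITES)]

def count_peaks(fitness):
    """Count genotypes that are fitter than every one-mutation neighbor."""
    return sum(all(fitness[g] > fitness[n] for n in neighbors(g)) for g in genotypes)

def climb(fitness, start):
    """Step to the fittest neighbor until no neighbor is fitter; return the end point."""
    current = start
    while True:
        best = max(neighbors(current), key=fitness.__getitem__)
        if fitness[best] <= fitness[current]:
            return current
        current = best

rng = random.Random(1)
white_noise = {g: rng.random() for g in genotypes}  # fitness unrelated to neighbors
smooth = {g: sum(g) for g in genotypes}             # additive: no interaction among sites

start = (0,) * SITES
print("peaks on white-noise surface:", count_peaks(white_noise))
print("peaks on smooth surface:", count_peaks(smooth))
print("climb on smooth surface ends at:", climb(smooth, start))
print("fitness reached on white-noise surface:", round(white_noise[climb(white_noise, start)], 3))
```

On the additive surface there is a single peak and the climb always reaches it. On the white-noise surface the climb halts at whichever of the many local peaks happens to be nearby, which is the stalling behavior described in point 2.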

That’s a lot of argument to bite off in one chew. Let’s go into more detail below the fold …

Apis mellifera

| 8 Comments
IMG_4085_A_Mellifera_600.jpg

Apis mellifera – western or European honeybee, dining along with others on a milkweed flower. Apparently a melanic form, because Bugguide assures me that it is “just a dark one.”

Noctilucent clouds

| 5 Comments

Photograph by Kari Tikkanen.

Photography contest, Honorable Mention.

Tikkanen.Noctilucent_Clouds.jpg

Noctilucent clouds. Mr Tikkanen writes that these “are bluish clouds located in the mesosphere at altitudes of around 80 kilometers. Relative recent appearance and their gradual increase may be linked to climate change.”

Brachystola magna

| 20 Comments

Photograph by Ralph Arvesen.

Photography contest, Honorable Mention.

Ralph.Arvesen - Plains Lubber (Brachystola magna Girard).jpg

Brachystola magna – plains lubber, or western lubber.

Alluvial fan

| 11 Comments
IMG_4151AlluvialFan_600.JPG

Alluvial fan created by the torrential rainfall 1 year ago, as seen from the Visitor Center, Trail Ridge Road, Rocky Mountain National Park, Colorado, September, 2014. The meander at the bottom of the screen passes through the bed of Fan Lake, which was formed in 1982 when the Lawn Lake Dam burst and inundated the City of Estes Park.

About this Archive

This page is an archive of recent entries in the Evolution category.
