John S. Wilkins posted Entry 1518 on October 1, 2005 01:00 AM.

A study in Science has returned biological methods to linguistic evolution in a reversal of history, and concluded that one can, within limits, reconstruct the history of language.

Charles Darwin was not the first person to suppose that historical evolution could be recognised by homologies and represented by tree diagrams. That honour goes to Sir William Jones in 1786, although the tree diagram itself came later.

Jones argued that one could compare cognate terms and infer a historical relationship between languages, and this has become the foundation of modern philology. For example, words that are based on the idea of “knowing” (including, as it happens, “idea”) generate a tree of Indo-European languages. [And like biological evolution, there are “creationists” who think that all language was created in Sanskrit.]

Now, a study in Science has returned biological methods to linguistic evolution in a reversal of history, and come up with some interesting conclusions.

This paper is not the first to do this, of course. Merritt Ruhlen previously argued that all human languages could be traced back to a last common ancestral tongue, presumably the original language in Africa. However, many linguists think he not only chose cognate terms too broadly to bolster his reconstruction, but he also overlooked a singular problem for all historical sciences: time erases information. Still, the idea of constructing a phylogenetic tree for language is feasible, to some depth. What that depth is, and the rate at which words evolve, is the subject of the new paper.

The current study [available here but only by subscription to Science] is by Dutch and British researchers at the Max Planck Institute for Psycholinguistics, the Center for Language Studies, Radboud University, and the Leverhulme Centre for Human Evolutionary Studies, University of Cambridge, headed by Michael Dunn. They analysed over 125 terms with stricter criteria for homology than Ruhlen had (they used structural rather than cognate data to prevent bias [1]), from 16 Austronesian and 15 Papuan languages in the west Pacific and Papuan area.

This was a good choice, since this is an area where the dates of settlement and migration are known fairly well, and also a region of major linguistic complexity. And what they came up with was that one can only go back around 8000 ± 2000 years.

That’s not bad, though. Eight to ten thousand years ago was just after the late Pleistocene, the period that ended with the low point of the last ice age. The seas were as much as 320 feet lower, and many of the islands of Melanesia and Polynesia could be walked between.


Dunn and colleagues found that the evolutionary tree of these languages suggested that the two major groups of the study - the Island Melanesian language group and the Papuan group - evolved from a common ancestor about that time, when sea levels were lower. And an anomalous relationship of Bismarck Islands languages to the rest can be explained by the fact that Bougainville and the Solomon Islands were united into a single landmass when the sea level was lower, while the Bismarcks were always isolated by deep water.
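
To make the idea of comparing languages on structural data concrete, here is a minimal sketch in Python. It assumes each language is coded as a row of yes/no answers to grammatical questions. The language names and feature values are invented for illustration and are not taken from the paper, and the distance used (the share of features on which two languages disagree) is just one simple choice, not necessarily the authors' measure.

```python
from itertools import combinations

# Hypothetical presence/absence coding of four grammatical features
# for four made-up languages (not data from the study).
FEATURES = {
    "LangA": (1, 1, 0, 0),
    "LangB": (1, 1, 0, 1),
    "LangC": (0, 0, 1, 1),
    "LangD": (0, 1, 1, 1),
}


def structural_distance(a, b):
    """Fraction of features on which two languages disagree."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_table(features):
    """Pairwise distances for every unordered pair of languages."""
    return {
        (p, q): structural_distance(features[p], features[q])
        for p, q in combinations(sorted(features), 2)
    }


if __name__ == "__main__":
    for (p, q), d in distance_table(FEATURES).items():
        print(f"{p}-{q}: {d:.2f}")
```

A table of distances like this is the raw material for tree building: languages that disagree on few features are candidates for close relatives, although shared features can also arise by borrowing or chance.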

The rates at which new words are coined and old words change turn out to follow a power law distribution of sorts. This is what makes it hard to recover history past the depth of time Dunn and co. covered. It is also what makes it hard to work out long-distant sequences of biological evolution, although genes and molecules change far more slowly than languages. Russell Gray, in a commentary piece, offers this graph:


As time erases the homologies, history disappears. This means that there may be some things we will never know the course of. [That’s no reason to think they didn’t happen, though, even if we don’t know them. Reality is not constrained by what we can know.]
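
The way time erases homologies can be illustrated with a toy model. This is a sketch under loud assumptions, not the paper's analysis: each yes/no feature is assumed to flip at a constant rate in each lineage (the post above notes that real rates vary widely), and the flip probability of 0.1 per thousand years is an arbitrary, hypothetical number.

```python
def agreement_probability(flip_prob, millennia):
    """Chance that two sister languages still agree on one yes/no feature.

    Assumes (hypothetically) that in each lineage the feature flips with
    probability `flip_prob` per millennium, independently and at a constant
    rate. The two lineages are separated by 2 * millennia of change in total.
    """
    if not 0.0 <= flip_prob <= 0.5:
        raise ValueError("flip_prob must be between 0 and 0.5")
    if millennia < 0:
        raise ValueError("millennia must be non-negative")
    return 0.5 + 0.5 * (1.0 - 2.0 * flip_prob) ** (2 * millennia)


def signal(flip_prob, millennia):
    """Agreement in excess of the 0.5 expected by chance in this model."""
    return agreement_probability(flip_prob, millennia) - 0.5


if __name__ == "__main__":
    for t in (1, 2, 4, 8, 16):
        print(f"{t:>2} thousand years: signal {signal(0.1, t):.3f}")
```

In this model the excess agreement shrinks geometrically with time, so beyond some depth related languages look no more alike than unrelated ones, which is the sense in which history disappears.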

People often think evolution is all about natural selection. But that is only one small part of it. The really interesting thing, at least to me, about evolution, is the evolutionary tree - phylogeny. And the methods used by biologists are general tools for understanding the past.

One commonly claimed disanalogy between cultural and biological evolution is that culture is horizontally as well as vertically transmitted. In biological terms, there is a lot of cross-lineage borrowing. But as a recent item noted, this is increasingly recognised in biology, too, and it is noteworthy that this does not seem to have muddied the phylogenetic “signal” in this case. History is lost, but some is retained. We need not become pure pessimists about knowing the past, in biology or culture.

1. Dunn M, Reesink G, Terrill A. 2002. “The East Papuan languages: A preliminary typological appraisal.” Oceanic Linguistics 41(1): 28-62. Thanks to Gary Hurd for the ref.


Comment #50576

Posted by Heathen Dan on October 2, 2005 7:02 AM

[And like biological evolution, there are “creationists” who think that all language was created in Sanskrit.]

Why go to other cultures when creationists themselves have their pet theory called “the collapsing Babel tower linguistic diffusion theory”?

Comment #50579

Posted by Daniel Morgan on October 2, 2005 7:39 AM

The sad thing is that some of these unthinkers will see this as evidence substantiating Babel. They fail to see that absence of evidence (inability to reconstruct beyond time constraints) is not evidence of absence (no languages existed beyond our ability to reconstruct them).

Comment #50588

Posted by Michael Hopkins on October 2, 2005 8:48 AM

I noticed a rooted “cladogram” in the figures above. Did the authors use some other language as an outgroup?

Comment #50590

Posted by John Wilkins on October 2, 2005 9:21 AM

There’s no mention in the paper except that they already knew the languages were part of the western Austronesian clade, so I presume they picked a suitably distant outgroup.

Comment #50608

Posted by Gary Hurd on October 2, 2005 12:32 PM

And unlike in biology, while some languages may be structurally simpler than others, no family can be said to represent a more primitive type than others. This poses serious problems for outgroup definition. The alternative methods are to leave trees unrooted, or to use midpoint rooting. The languages currently attested in Island Melanesia are presumably a subset surviving from a possibly much larger set of lineages, and the degree of divergence in linguistic lineages over time is highly unpredictable. Midpoint rooting makes assumptions about constant rates of change which are unwarranted in the case of languages. To avoid the assumptions required by an outgroup, all analyses here use unrooted trees.

This is from their online supplemental data.

Comment #50609

Posted by Michael Hopkins on October 2, 2005 12:45 PM

They say they use unrooted trees and yet the left cladogram is most definitely rooted while the one on the right is unrooted, unless I am seriously missing the mark. I also really don’t like the idea that an outgroup is a “more primitive type.” My understanding of an outgroup is that it should not be descended from the common ancestors of the taxa one wishes to classify. Humans could be an outgroup for birds, for example. (Of course humans might not be the most useful outgroup, but that is a different issue.)

If I am misunderstanding the cladistic method then please someone correct me.

Comment #50658

Posted by Gary Hurd on October 2, 2005 7:31 PM

If you mean the left side of Fig. 3, it looks like a typical cluster analysis output. They are all “rooted” in the sense that there is always a distance/similarity cutoff at which all objects are joined. This bothers some people because it is then a subjective decision as to where to set the “real” cutoff. As they want to search for a descent tree in a rather “old-fashioned” sense, I think that they ‘dropped a stitch.’ My guess is that their unrooted tree is OK nonetheless.
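
The point about cutoffs can be sketched in a few lines of Python. This is a minimal illustration assuming average-linkage clustering on an invented distance table for four items named A to D. It is not the method or the data behind the figure.

```python
def average_linkage(dist):
    """Agglomerative clustering with average linkage.

    `dist` maps frozenset({a, b}) to a distance for every pair of items.
    Returns the merges as (cluster_1, cluster_2, height), in the order made.
    """
    items = set()
    for pair in dist:
        items |= pair
    clusters = [frozenset([i]) for i in sorted(items)]
    merges = []

    def between(c1, c2):
        # Mean distance over all cross-cluster pairs of original items.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of current clusters and join them.
        height, i, j = min(
            (between(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


def clusters_at(merges, n_items, cutoff):
    """Number of clusters left if only merges at or below `cutoff` count."""
    return n_items - sum(1 for _, _, h in merges if h <= cutoff)


# Hypothetical pairwise distances, invented for illustration.
DIST = {
    frozenset("AB"): 0.25, frozenset("AC"): 1.0, frozenset("AD"): 0.75,
    frozenset("BC"): 0.75, frozenset("BD"): 0.5, frozenset("CD"): 0.25,
}

if __name__ == "__main__":
    merges = average_linkage(DIST)
    for cutoff in (0.1, 0.3, 0.8):
        print(cutoff, clusters_at(merges, 4, cutoff))
```

The loop only stops when one cluster is left, so the output always has a single top join whatever the data. How many groups you report depends entirely on the cutoff you pick.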

Comment #50661

Posted by Gary Hurd on October 2, 2005 7:47 PM

The caption reads:

(Left) Reconstructed phylogeny of the languages of the Meso-Melanesian, Papuan Tip, and North New Guinea groups based on the linguistic comparative method (10, 27). (Right) Unrooted parsimony tree showing relationships among the Meso-Melanesian and Papuan Tip groups based on grammatical traits only (that is, discarding abundant lexical evidence) (the figure shows reweighted and raw bootstrap values).

Actually, it is rather interesting that the cluster analysis is so comparable to the cladogram. As they pointed out:

The two trees show a high degree of concordance, with monophyly in both major taxa and the similar geographical structuring of within-taxon diversity.

Which means the data are so highly structured that almost any analysis should capture it.

This is vaguely related to the paper I am supposedly finishing today. (Editors are such fools- BWWAAHHAHHAHHHHHHAH yerp! Hic. Actually, I am so screwed).

Comment #50873

Posted by John Wilkins on October 4, 2005 3:08 AM

Mike Hopkins wrote: I also really don’t like the idea that an outgroup is a “more primitive type.”

“Primitive” has a special meaning in phylogeny, and it doesn’t have to do with it being more or less complex, advanced or fit.

Any character or taxon that is prior in the evolutionary tree is considered “primitive”, or more technically, a plesiomorphy, the English of which is “underived”. Anything that is later in the tree and has changed is considered “evolved”, or technically an “apomorphy”, the English of which means “derived”.

Something can remain unchanged, in which case it remains underived (plesiomorphic), or it can change, in which case it is derived (apomorphic). On the basis of derived and underived characters, cladograms and evolutionary trees are constructed.

But the primitive condition might be a more complex trait - for example, the flight mechanisms of the ancestors of a secondarily flightless bird. In the case of the penguin family, they are “derived” as flightless, and the “primitive” condition is the ability to fly.

So it is not a valuative notion. Think of it etymologically - “primitive” just means “came first”. Only with the pre-Darwinian notion of evolution did that imply it was less “advanced”.
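
The derived/underived distinction can be put as a tiny Python sketch. It assumes the ancestral state of the character is already known (in practice it is usually inferred, for instance by comparison with an outgroup). The taxa and states are simplified for illustration, and the function name `polarise` is made up for this example.

```python
def polarise(states, ancestral_state):
    """Label each taxon's character state relative to the ancestral one."""
    return {
        taxon: "underived" if state == ancestral_state else "derived"
        for taxon, state in states.items()
    }


# Simplified, illustrative coding of one character: flight.
FLIGHT = {"albatross": "flies", "petrel": "flies", "penguin": "flightless"}

if __name__ == "__main__":
    for taxon, label in polarise(FLIGHT, ancestral_state="flies").items():
        print(f"{taxon}: {label}")
```

Nothing in the labelling refers to complexity or fitness. It only records whether a state is unchanged from what came first.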