Wednesday, April 24, 2013

Linguistics: Measuring the Age of Living Languages

(I found this one in my drafts folder, it's been sitting there for about four months or so, so I figured I may just go ahead and post it after some slight editing - including incomplete sections and all. I may edit it a bit in the future.)

It is fairly common to hear - even from fairly educated people - that this or that language is the oldest language. I have recently heard such claims made for various African languages (especially ones with clicks) such as !Kung, and for languages like Basque, Latin or Sanskrit. Sometimes, it is just a dialect of some language that is considered 'the oldest' - this or that dialect of English, Swedish or German is described as "the oldest dialect of X".

What does such a statement even try to say? What determines the age of a dialect that is spoken as a living language in the present? 

Of course, if we took an old text and read it out loud, the language encoded in that text is indeed centuries old - but aspects of our rendition would be newer: it is unlikely the reader would get the pronunciation identical to the way it was pronounced centuries ago, and similar problems go for intonation and possibly - even likely - for how the reader would understand the meaning contained therein. The text will probably be at least slightly misunderstood by the reader or listener: since the written form is a zombified instance of a linguistic utterance, the language we use to parse it is no longer exactly the same language as it was back in the day. Its tissue has decayed: secondary connotations of the words have been lost, as have a whole lot of other things, so we are probably at a loss in trying to understand certain implicit details in such a text - simple things like whether it is sarcastic, dead serious or something else along those lines.

The Old Testament in Masoretic or LXX form probably is in a language more than 2000 years old, and clay tablets from Mesopotamia are in languages even older than that. But this is essentially the only situation in which it makes sense to speak of the age of a language, and as explained above, we then have zombified languages, where information loss already has set in.

However, with a living, spoken dialect, what aspect of the language are we speaking of when we say it is older than another language? Is it something to do with how well it has conserved the meaning of words over time? Is it how well the grammar has been conserved? Is it the conservation of pronunciation? Is it the pragmatics - the ways we use the language to express things - which is an important aspect, but one of the hardest to pin down? How would we go about measuring any of these in an objective manner, even?

We could pick a sort of objective thing - the point in time when it diverged from another language or dialect. In that case, the language could have gone through great changes every generation since it split and still be the older of two closely related languages!

Scenario: on an island in the Pacific, Island A, people speak a language. We will call it Islandean. As the population grows, a group sets off to settle another island - Island B - far away, which they have spotted during their frequent fishing expeditions. Contact between the populations on Island A and Island B is infrequent after the initial settlement. They both start out speaking Islandean, but since contact has been reduced, Islandean at A and Islandean at B diverge, and a while down the line, the descendant versions of Islandean, Islandean A' and Islandean B' (where ' marks "new version"), have diverged enough not to be mutually intelligible. They are now two languages. Again, a time comes when Island A gets crowded, and part of its population sets off to colonize Island C. Linguistic divergence sets in again - both groups start out speaking Islandean A', but as time goes by, Island A has Islandean A'', and Island C has Islandean C (which too is a derivative of Islandean A'). Going by a family tree model, Islandean B' is the oldest of these languages - it split from its two relatives the earliest.

What happens if Islandean A'' or Islandean C goes extinct? The most recent split between Islandean B' and the surviving descendant of Islandean A' still sits unchanged at the root of the tree - yet we know a later split happened between Islandean A'' and Islandean C, a split one of whose branches just happened to terminate. Should we then claim Islandean B' is the older one, since it has been diverging for two generations, while the survivor has only been diverging for one since it split from its most recent sibling - regardless of this sibling having since gone extinct?
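To make the problem concrete, here is a rough sketch of the family tree reasoning in Python. The split dates (1000 and 500 years ago) are invented purely for illustration, and the names TREE, living_leaves and last_split are my own; this is just one way of modelling "age as time since the most recent split from a living relative", not any established method.

```python
# Hypothetical family tree for the Islandean scenario.
# Each node is (name, years since this node split up, children).
# The numbers 1000 and 500 are made-up illustration values.
TREE = ("Islandean", 1000, [
    ("Islandean B'", None, []),
    ("Islandean A'", 500, [
        ("Islandean A''", None, []),
        ("Islandean C", None, []),
    ]),
])


def living_leaves(node, living):
    """Return the names of the living languages at or below this node."""
    name, _, children = node
    if not children:
        return [name] if name in living else []
    return [leaf for child in children for leaf in living_leaves(child, living)]


def last_split(node, living, result=None):
    """Map each living language to the years since it last split
    from a relative that is still alive today."""
    if result is None:
        result = {}
    _, split_years_ago, children = node
    branches = [living_leaves(child, living) for child in children]
    surviving = [branch for branch in branches if branch]
    if len(surviving) >= 2:
        # Seen from today, this node is a real split: record it for every
        # living language below it. More recent splits further down the
        # tree overwrite the value when we recurse.
        for branch in surviving:
            for leaf in branch:
                result[leaf] = split_years_ago
    for child in children:
        last_split(child, living, result)
    return result


everyone = {"Islandean A''", "Islandean B'", "Islandean C"}
with_all_alive = last_split(TREE, everyone)
after_c_dies = last_split(TREE, everyone - {"Islandean C"})
```

With all three languages alive, Islandean B' gets 1000 while Islandean A'' and Islandean C get 500. Once Islandean C is removed from the set of living languages, Islandean A'' jumps to 1000 as well, even though nothing about Islandean A'' itself has changed. The "age" depends on which relatives happen to have survived.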

The time at which mutual intelligibility was lost - and distinct languagehood thus attained, if we go by some definitions - might not be entirely trivial to determine, as different speakers probably would differ in their ability to quickly adapt their linguistic skills in order to understand the other language - and it is possible one of the languages would be more difficult for speakers of the other to understand.

Let us ignore that kind of tricky question for now, and instead decide that the 'older' language is whichever one is the more 'conservative' among them. As I already pointed out, that's not trivial. Do we count the number of sound changes, and pick the language that has had the fewest of them? The number of semantic changes? The number of grammar changes? Should we assign different significance to different kinds of grammar/sound/semantic changes? 
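A small sketch in Python shows how much hinges on that last question. The change counts below are entirely made up, as are the names CHANGES, change_score and most_conservative; the point is only that the "most conservative" language can flip depending on how we weight the different kinds of change.

```python
# Invented counts of changes since the common ancestor, for illustration only.
CHANGES = {
    "Islandean A''": {"sound": 12, "semantic": 30, "grammar": 4},
    "Islandean B'": {"sound": 25, "semantic": 10, "grammar": 6},
}


def change_score(counts, weights):
    """Weighted total of changes; a lower score means 'more conservative'."""
    return sum(weights[kind] * n for kind, n in counts.items())


def most_conservative(changes, weights):
    """Return the language with the lowest weighted change score."""
    return min(changes, key=lambda lang: change_score(changes[lang], weights))


# Two equally arbitrary weightings.
equal_weights = {"sound": 1, "semantic": 1, "grammar": 1}
sound_heavy = {"sound": 3, "semantic": 1, "grammar": 1}

winner_equal = most_conservative(CHANGES, equal_weights)
winner_sound_heavy = most_conservative(CHANGES, sound_heavy)
```

With equal weights, Islandean B' comes out as the more conservative language (41 against 46). Count sound changes three times as heavily, and Islandean A'' wins instead (70 against 91). Neither weighting is any more objective than the other.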

Even if we can roughly guess what the ancestral language was like, we're still taking a stab in the dark when it comes to measuring these things. There may have been countless changes that haven't altered any structural features of the languages, and there may have been changes of which we cannot even be sure that they happened at all, since later changes may have eradicated their results or altered the structure further. Trying to measure the age of a living language is a meaningless task, and this is why real linguists do not talk about which dialect or language is older than the other.

