Friday, June 12, 2015

On Historical Linguistics: Part 2

The model I presented in the previous post only offers a way of structuring the findings once we have them. The next question, then, is which changes are the most useful ones to trace.

People like to think that similarities in vocabulary are a reliable indicator - this is probably why the belief that English descends from Latin is quite popular, even to the extent that ignorant teachers tell their pupils this as a fact. The problem with just looking at individual words is that words are borrowed quite freely from one language to another (or, well, at least this seems to be the case in Eurasia - South American native languages seem, for whatever reason, to have exchanged grammar more freely).

Another type of similarity is typological similarity. Typology is the study of the "properties" of languages, things like "what order do subject, verb and object come in", "does the language predominantly use suffixes, prefixes or neither", "does the language have prepositions or postpositions", and a lot of similar stuff. But here we have a rather interesting problem.

For some reason, various features tend to cluster together: it's not unusual for languages with this or that feature also to have these particular other features - and it seems this follows from some property of our brains or as some consequence of something more subtle about language itself.

Thus, even in languages that have recently changed in certain ways, and where we know that contact with a language that already had a certain property is not the cause of the change, we can see that the other features often tend to hobble along in the same direction. Of course, language contact can make this even more powerful - unrelated languages can acquire features from one another by influence. In fact, large belts of such influence have been identified, and are called 'sprachbunds' or 'convergence areas'. So, from the perspective of historical linguistics, the fact that some pair of languages does a lot of things in similar ways (suffixes, SOV, postpositions, ... or whatever other bundle of features you can imagine) does not necessarily tell us anything about whether they are related or not.

However, one type of change is very amenable to a hierarchical analysis - sound changes. Over time, languages undergo sound changes. These can more or less be expressed as "search-and-replace" operations on the language represented in textual form. (Note: we are of course rather used to this nowadays, given widespread literacy, but even a few centuries ago reading and writing were not very common skills. Language is primarily spoken, and that deserves mention when we represent it as text in order to deal with the history of spoken languages.) We can then, for instance, have a sound change along the lines of this:
t → th
d → t
A word such as tin would come out as thin, and din as tin (both rules apply at once, so the new t from d does not go on to become th). Now, in case th did not exist in the language previously, the language has not lost any phonological distinctions - all words that previously were distinct are still distinct. But let's imagine another change:
t → t
d → t
This change removes a distinction, and thus we lose some "knowledge" about how the language was previously - after this, we cannot tell by just looking at a word whether it previously had a t or a d where there now is a t.
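The two rule sets above can be tried out as literal search-and-replace. What follows is only a rough sketch: the helper name `apply_simultaneously` and the two-word "lexicon" are made up for illustration, and the rules are assumed to apply in a single pass, so that a t produced from d is not turned into th afterwards.

```python
def apply_simultaneously(word, rules):
    """Apply all rules in one left-to-right pass, so the output of one
    rule is never fed into another (hypothetical helper for this sketch)."""
    # Try longer source strings first, in case a rule targets a digraph.
    sources = sorted(rules, key=len, reverse=True)
    out = []
    i = 0
    while i < len(word):
        for src in sources:
            if word.startswith(src, i):
                out.append(rules[src])
                i += len(src)
                break
        else:
            # No rule matched at this position: keep the sound as it is.
            out.append(word[i])
            i += 1
    return "".join(out)


CHAIN_SHIFT = {"t": "th", "d": "t"}  # t → th, d → t
MERGER = {"d": "t"}                  # d → t, while t stays t

words = ["tin", "din"]  # toy lexicon, an assumption for illustration
shifted = [apply_simultaneously(w, CHAIN_SHIFT) for w in words]
merged = [apply_simultaneously(w, MERGER) for w in words]

print(shifted)  # the two words remain distinct
print(merged)   # the two words have fallen together
```

After the first rule set the two words are still different from each other; after the second they are identical, and nothing in the output tells us which one used to have d.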

There also are conditional changes. These are basically changes that consist of rules where one sound is changed depending on sounds nearby, e.g.
k → t͡ʃ, / _e, _i
This would replace k with t͡ʃ when followed by e or i, a sound change that has happened in many languages worldwide. Essentially, though, such changes can be written like this instead, to remove the need for the notation with / _e, _i:
ke → t͡ʃe
ki → t͡ʃi 
Notice, however, that the k → t͡ʃ, / _e, _i notation is more succinct, and we are also less likely to forget some particular instance by accident. In fact, the change there could possibly be expressed even more powerfully as k → t͡ʃ, / _V, where V is a front vowel.
Other contextual things that may be of relevance are whether a sound is in word-initial or word-final position, whether it's before or after or even in the same syllable as the word stress, whether it's in such a position with regards to some weaker stresses of the same word, etc. To make the notation able to deal with such things, symbols for stresses of different types are all it takes. Similar additions coding for whatever feature we need to trace should be easy to add as well. The notation that expresses contexts with / [surrounding sound]_[surrounding sound] is more compact than writing out every single substitution separately, but I am not going for a full Historical Linguistics 101 course here, so I will not regale you with such details.
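For the programmers in the audience, a conditional change maps fairly naturally onto a regular expression with a lookahead: the context after the underscore is checked but not consumed. This is just a sketch; the function name `palatalize` and the assumption that the front vowels are exactly e and i are mine, chosen to match the toy rule above.

```python
import re

TS = "t\u0361\u0283"   # the affricate t͡ʃ, spelled out to avoid encoding mishaps
FRONT_VOWELS = "ei"    # assumption: the toy inventory's front vowels


def palatalize(word):
    """k → t͡ʃ / _V, where V is a front vowel.

    The lookahead (?=...) tests the following vowel without replacing it,
    which is what the underscore notation expresses.
    """
    return re.sub(rf"k(?=[{FRONT_VOWELS}])", TS, word)


for w in ["kaki", "keki", "kaku"]:
    print(w, "→", palatalize(w))
```

Adding a new front vowel to the language then only means extending `FRONT_VOWELS`, rather than writing one more replacement by hand.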

A good principle:  
i) shared innovations indicate closer relationships
ii) shared retentions do not indicate anything very interesting with regards to distance
Why would this be the case? There are lots of possible changes - a shared change is thus somewhat a priori unlikely. Anytime some part of a language has not been hit by a change we will have a retention, though, so retentions by their nature will occur a lot more often than shared innovations.

Although I previously mentioned that the lexicon is somewhat unreliable, an analogy based on the lexicon might make this clearer. Let us imagine we have a small island on which there are two languages. We do not know whether these first entered the island and then diverged, or diverged and only then entered the island. We find that there's an animal on the island that does not exist elsewhere. It also turns out that the two languages have very similar words for it, words that do not exist in any of the related languages outside of the island. How likely is it that they both came up with the same word independently? Fairly unlikely.

If they have different words, this does not necessarily tell us anything at all. One - or even both - of the languages might have come up with new words more recently. If they have the same word, we need to account for that: either one of the languages has borrowed it from the other after arriving on the island separately from the other group (who borrowed from whom does not necessarily tell us who was there first, however!), or they arrived as one language that only more recently has differentiated into two.

However, one thing that would more clearly suggest that they did not arrive together is if one of the languages shared a lot of innovations with some language (or group of languages) outside of the island, and the other didn't - or even better, if the other shared innovations with another group of languages altogether. The likelihood that all of these groups started diverging from a shared origin at the same time, and that some of the groups in isolation from the others made the same innovations, is very low.

A language probably goes through far fewer sound changes than lexical changes - by an order of magnitude, at the very least - through any time span. Sound changes are further not really "loaned" after the fact - they tend to spread through a speaker community - and sometimes beyond it - but there isn't really any way in which they could be loaned. Words are loaned, not processes that happened in the past. A process that is going on can spill over, a process that has already happened is not relevant any longer.

Some changes seem to be fairly common cross-linguistically. We can observe, for instance, that historical *k has become t͡ʃ in certain very similar positions in English and in Swedish as spoken in Finland. We know, however, that Swedish as spoken in Finland is more closely related to Swedish as spoken in Sweden than it is to English. The same change has happened in a lot of languages, but Swedish and English are similar enough that one might find the shared change somewhat significant. It's not significant, really - Swedish as spoken in Finland and English don't have any particular affinities. Still, such common changes might seem to undermine our use of sound change and shared innovation. There is a solution, however!

The order in which changes have happened may leave traces that make it possible to resolve what the order was. Languages in which a series of early changes has happened in the same order from some ancestral form (after which more divergent changes have happened) are thus very likely to be more closely related than languages in which no such shared order exists.

An example might be helpful here. Let us imagine a language L and a sound change, let's call it A, where a final syllable having /i/ as its vowel causes the vowel of the previous syllable to become fronted, so e.g. /kaki/ → /keki/. (/a u o/ are 'back vowels' and /i e/ are 'front vowels'; why they're called that can be learned from books on phonology, and I will not get into it. Suffice it to say, this has to do with articulatory features of the vowels.) Another change, which I already mentioned, has k turn into t͡ʃ before front vowels; let's call it B.

If A happens before B, /kaki/ will end up as /t͡ʃet͡ʃi/, but if B happens before A, we get /ket͡ʃi/. We might find that some related language K has /kaki/ or /keki/ or /keke/ or whatever for a similar meaning, and we can posit with great likelihood that the original form was something like *kaki. However, our hunch would be better supported if we found many words where L's /t͡ʃ/ corresponds to K's /k/. If we were to find that a lot of words in L had t͡ʃ where K had k, and likewise that a lot of words have k in both of the languages, we need to account for that - and an ordered pair of sound changes in one of them is a realistic and simple way of doing that. The fewer sound changes we need to posit to explain it, however, the better we're doing (due to Occam's razor - don't posit a hundred changes when two suffice, etc).
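The effect of ordering is easy to check mechanically. Here is a small sketch under some loud assumptions: the function names `change_a` and `change_b` are mine, change A is only modelled for the a → e case from the /kaki/ example (what u and o would front to is not specified), and syllables are assumed to be simple enough that "previous syllable" means "the vowel before the final i".

```python
import re

TS = "t\u0361\u0283"  # the affricate t͡ʃ


def change_a(word):
    """A: a is fronted to e when the final syllable has i as its vowel.

    Assumption: only a → e is modelled, as in /kaki/ → /keki/.
    """
    # a, then any non-vowels, then i, then any non-vowels up to the word end
    return re.sub(r"a(?=[^aeiou]*i[^aeiou]*$)", "e", word)


def change_b(word):
    """B: k → t͡ʃ before the front vowels e and i."""
    return re.sub(r"k(?=[ei])", TS, word)


a_then_b = change_b(change_a("kaki"))
b_then_a = change_a(change_b("kaki"))

print("A then B:", a_then_b)
print("B then A:", b_then_a)
```

When A goes first, it creates a new front vowel for B to act on, so both k's are affected; when B goes first, the initial k has already "escaped" by the time its vowel is fronted. That surviving k is the kind of trace the ordering argument relies on.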

If ten sound changes have happened, there are factorial of ten (10!) orders they could have happened in. That's a whopping 3,628,800 different possible orders, each order roughly equally likely. (Well, some changes may depend on a previous change, reducing the number of possible orders a bit, but still.) The likelihood of two languages sharing ten sound changes in the same order without being in close contact is thus pretty low. (A further caveat, however: sometimes, sound changes do not have effects that make it possible to decide which out of two or even three changes happened first.)
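The arithmetic is quick to verify. The snippet below assumes, as a simplification, that the ten changes are independent of one another and that every ordering is equally likely; the variable names are just for illustration.

```python
import math

n_changes = 10

# Number of distinct orders in which n independent changes could occur: n!
orders = math.factorial(n_changes)

# Chance that a second language hits the exact same order by coincidence,
# under the simplifying assumption that all orders are equally likely.
chance = 1 / orders

print(f"{orders:,} possible orders")
print(f"chance of a coincidental match: {chance:.2e}")
```

Dependencies between changes shrink the number of possible orders, and changes whose relative order leaves no trace shrink the number we can actually tell apart, so this is an upper bound rather than a measurement.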

So, now that we have considered the benefits of the shared sound change as our measure of similarity, we will go on to try and see where this leads us.
