Wednesday, November 7, 2012

Linguistics: How to discuss sounds of languages

I fear this post grew a bit too long. I also notice I have not provided any sources for any of this, but ... the basic structure is: first a brief explanation of the sounds of speech, then how these relate to letters, then how we actually parse the sounds of speech as we hear them. It is a bit abstract, as I am not really that interested in the more psychological aspects of it. It is a bit difficult to know which bits need to be taken into account at this stage, as I am only trying to introduce the idea of phonology here.

It is difficult to explain why someone is wrong, when what they speak of are things for which the average person lacks a conceptual framework. Imagine talking about number theory to someone who does not know any algebra.

Linguistics is a huge topic. It is also a topic about which a lot of misconceptions are commonly held by the general public. I will not set out to correct all of these misconceptions, as I probably have some misconceptions myself. What is necessary, though, for discussing how Acharya misinterprets evidence, is a conceptual framework for languages. (I almost hear her fans chanting how this is subscribing to a "method", and how that is just brainwashing oneself, conforming to "academia" or whatever.) I will in a number of posts elaborate on the bits of linguistics necessary for understanding what modern linguists think when it comes to questions like which languages are related, what it means for languages to be related, and so on.
Ugghh... this means I have three separate series of posts in parallel: Higgins, Linguistics and The Christ Conspiracy.

For readers who already are well acquainted with concepts such as phonemes, allophones, graphemes, the fact that sounds and letters are not the same thing, and so on, this post will not be necessary.

Alas, this easily gets too specific or too abstract. It is difficult to maintain that balance. I will err on the side of too specific. In this post, all phonetic transcription is done using the International Phonetic Alphabet. Other phonetic alphabets exist, and may have advantages in specific fields - the Americanist Phonetic Notation or the Uralic Phonetic Alphabet are examples thereof.

Also, duly note that phonemic theory as presented here is, in fact, outdated. However, it is a good first approximation, and it is way better than thinking that letters and sounds are the same thing. Even now, many grammars and descriptions of languages use phonemes. The replacements for phonemic theory - optimality theory and generative theories of phonology - give benefits in deeper analyses of language. No linguist, however, denies that different languages distinguish different sounds in different ways.

Speech sounds

A speech sound is a sound produced by the speech organs (mouth, nose, larynx, pharynx, etc.) that also can be used as a part of a word in an utterance. Normally, coughs, hiccups and sneezes are not considered speech sounds although they are produced by the right set of organs (however, if a linguist were transcribing a language spoken by some kind of alien with a similar speech apparatus, he would do well not to exclude those a priori until he has ascertained that they are not speech sounds in the alien language). Which sounds are considered speech sounds varies from language to language. The tsk sound that anglophones use in some contexts is not considered a speech sound, as it cannot be part of any word in English. Such sounds that still have some use are said to be metalinguistic. In some African languages, such as !Xóõ, that particular sound is a speech sound.

Speech sounds are generated by modulating an air stream that goes out through the oral and nasal cavities. The most common source of such an air stream is normal exhalation from the lungs. Some alternative mechanisms do occur, though: in clicks, the air stream is inwards, caused by enclosing air between the tongue and the palate, increasing the size of this enclosure in order to decrease the air pressure, and opening the front closure slightly so that air quickly streams in. In glottalized sounds, the glottis is closed and either lowered or raised, such that air is either drawn in (implosives) or pressed out (ejectives) due to the change in pressure. Pulmonary inhalation could also be used for separate linguistic sounds, but it seems unusual - though apparently, !Xóõ has a click where, simultaneously, air is inhaled through the nose. Some languages do have metalinguistic uses for pulmonary inhalation, though, such as the northern Swedish pulmonary ingressive fricative (an inhaled "sh") that is used to mark agreement with what someone just has uttered. Finally, people with some disorders that make speech difficult can use a gastral air stream (essentially burping) as a source of phonation.

Coarsely, speech sounds are divided into two categories, consonants and vowels. The difference between these two can be found in how they are generated by the speech organs. Consonants have some obstruction somewhere in the mouth that creates some noise, or even a full stop to the air stream. Of the above air-stream types, vowels only really occur with pulmonic egressive airstreams - they are pretty much always the result of exhalation. 

If we start looking really closely at two speech sounds that we identify as the same sound, we will probably find some difference sooner or later at some level of analysis. At the very least, we would find such differences if we looked at a large-scale printout of the wave-form of the sound. This is not just the result of our mouths not being capable of the kind of precision it would take to perfectly reproduce a sound: in fact, two microphones recording the exact same sound in slightly different spots in the same room are likely to pick up really minute differences due to acoustic phenomena, as well as background noise. Through evolution, our hearing can account for this, and our brains hear things as similar if they are similar enough (where, of course, "similar enough" is not strictly well defined). However, our brains are surprisingly flexible things, and they can learn to spot differences they have not spotted before, and they can get better and better at this. We group together sounds that are similar, but different communities may have grouped together different sets of similarities.

Speech sounds are called phones, and any actual phone may very well be unique in its quality. At the very least, its wave form, if we made such an excessive analysis, would be unique (even though there are multiple such recorded wave forms we could look at for any single uttered sound, depending on where we put the microphone). The abstraction over very similar speech sounds is also called a phone, so we can speak of, e.g., the phone [ɯ̟͜ɨ̹˞̘], which would be a diphthong with uselessly many specific features, possibly too many for most phoneticians to bother to note. A line has to be drawn somewhere - map vs. terrain, really: the linguist attempts mapping out an utterance, not recreating it in every detail - and a reasonable place to draw that line is where a specifically trained professional can just about distinguish a difference. So some precision is lost at this point, and how much precision is permitted to be lost is really up to the requirements of the moment.*

We will return to speech sounds in a bit, but first, we will look at a thing speech sounds are not.

Letters

A letter is ink on a paper (or pixels on a screen) or indentations and protrusions on a surface, and thus of quite a different nature from the above speech sounds. Letters are conventional things - that is, there is a convention as to what sound is represented by what letter**. However, written language tends to be conservative compared to spoken language. Spoken language lives mainly in the present, although with the appearance of sound recordings, television and sound reproduction in general, spoken language has attained some kind of lasting existence as well.

A letter by itself need not actually signify a sound - in many languages, the orthography does not represent the sounds in a trivial manner. Let us say we take an anglophone newspaper, and cover the text entirely with something, such that we can only see one letter at a time. We pick a random letter. Let us say we come across an <s>. (Convention is, in some linguistics texts, that transcription in a native script is marked by angle brackets - less-than and greater-than signs - if there is need to mark it.) We do not actually know what sound this represents! What if the next letter is <h> or <c> or even, in some circumstances, <i>? What if we first find an <h> - we have to check whether the previous letter is <p>, <t>, <c>, <s>, or <r>, a word boundary or just a vowel. <c> likewise obtains different sounds depending on the following vowel or consonant. The vowels are even worse! Compare the pronunciation of bit and bite, rat and rate, sit and site, ...

There is no one-to-one correspondence from the letter on the paper - or even from sequences of letters on paper - to the sound we make when reading the text in English. This case also obtains in other languages; generally, the older the orthography is, the less the correlation. Is there perchance at least a one-to-one correspondence from the sound we make to the letter on the paper?

Let us consider the word filter. This is pronounced exactly identically to the word philter. (Minor nitpick: filter can be used as a verb as well, which as far as I can tell is not the case with philter. When used as a verb, intonation patterns are likely somewhat different, and thus the careful listener may spot tiny differences between filter as a verb and philter as a noun.) So we have two identical sounds in the onset of the word, one spelled <f>, the other <ph>. Hence, in English, there is neither a one-to-one correspondence from sound to writing nor from writing to sound.
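The filter/philter point in miniature: the mapping between spelling and sound fails to be one-to-one in both directions. A toy sketch, with invented variable names and just the <f>/<ph> case from the text:

```python
# One sound can have several spellings, and - as with <c> - one spelling
# can stand for several sounds. Toy data: the <f>/<ph> case only.

spelling_to_sound = {"f": "f", "ph": "f"}  # two spellings, one sound

# Invert the mapping to see the many-to-one relation explicitly:
sound_to_spelling = {}
for spelling, sound in spelling_to_sound.items():
    sound_to_spelling.setdefault(sound, []).append(spelling)

print(sound_to_spelling)  # -> {'f': ['f', 'ph']}
```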

Clearly, sounds and letters need not correspond very well. Some languages do have relatively straightforward orthographies, especially if few sound changes have occurred since the orthography was established, or if the language has few morphophonological*** processes. Now, the point I am making is definitely not that English orthography is bad or needs reforming or any of that; it is just a convenient illustration of the fact that written language is not the same as spoken language, although there clearly are strong affinities between them in some ways.

Mark Rosenfelder argues that English orthography mostly is fairly predictable and has a relatively clever system for encoding its many vowel qualities with a way too small set of orthographic vowels. His illustration of this principle using an algorithm to predict pronunciation is convincing.

Important point: the sounds we make are not letters, and the letters we write are not sounds. (However, in very many languages, there are relatively straightforward correspondences between the two.)

Phonemes

When speaking, we apparently make a lot of varying sounds, as already noted. We do not distinguish all of these noises; as I said, we can probably not even exactly reproduce the same noise twice. So, as noted, just as in phonetic transcription, when hearing and perceiving language we do something that maybe best can be compared to rounding off in maths.

If we imagine that the sounds we produce are on a three-dimensional grid, and are given coordinates such as [3.571, 2.718, 1.095], [8.103, 9.772, 6.135], we could make a somewhat bad analogy and say we actually perceive these as [3.6, 2.7, 1.1], [8.1, 9.8, 6.1]. However, we do not all round exactly the same, and it seems we do not even have the same scale - and furthermore, our scales are all warped in different spots. How we round - and where our scale is more sensitive and less sensitive - depends on the distinctions we need to spot to be able to keep up with people in our surroundings. Most Anglophones will have scales that are warped roughly the same, most speakers of Finnish will have scales warped roughly the same, etc. 
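The rounding analogy can be put in a few lines of code. This is a minimal sketch of the analogy only, with invented category values - nothing here models real perception:

```python
# Snap a continuous "acoustic" value to the nearest category a listener
# has learned. Differently warped scales yield different percepts for
# one and the same input. All numbers are made up for illustration.

def categorize(value, prototypes):
    """Return the learned category nearest to the incoming value."""
    return min(prototypes, key=lambda p: abs(p - value))

listener_a = [1.0, 2.5, 4.0]        # three learned categories
listener_b = [1.0, 1.8, 2.6, 4.0]   # finer distinctions in one region

sound = 2.1  # one and the same acoustic event
print(categorize(sound, listener_a))  # -> 2.5
print(categorize(sound, listener_b))  # -> 1.8
```

The same value 2.1 lands in different categories depending on which scale the listener has acquired - the code-level analogue of the warped scales described above.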

It seems as though acquiring new distinctions later in life may be a bit more difficult, but it is possible given time and effort. A person can have multiple differently warped scales! (The likelihood that the different scales affect each other is great, though, which is the reason that if bilinguals are common in a region, both languages in that region may approach each other as far as pronunciation goes.)

What is a phoneme? To be abstract, it's a set. It is a set of phones. Let us consider /k/. This phoneme appears in words such as cool, cat, kite, accrue, kit, uncle, ask, acute. You are very likely to agree that those k-like sounds all are the same sound. If we look and listen a bit closer, we will notice some differences though. In cool, cat and kit, the /k/ is followed by a slight puff of air. This is called aspiration, and is marked with a superscript h in phonetic transcription, [kh]. In some languages - Mandarin, Hindi, etc, that sound is perceived as distinct from k, and in fact, some dialects of English are pretty close to that situation as well.

The difference in English between hard /g/**** and /k/ is that in /g/, the vocal cords vibrate. This is called voicing. Try pronouncing good and could with your fingers on your throat, and you will probably notice the difference. A sequence of just /g g g g/ and /k k k k/ may also help illustrate the difference.
In some languages, g and k are not distinguished, but would be transcribed phonemically as /k/. Even further, in some languages /k/ and /kh/ contrast, but /k/ contains both the voiced [g] and voiceless [k]. Hence we can see that which differences we notice in language is a matter of what we have learned to notice, that which differences we make use of is a direct result of that, and that which differences we learn to notice is a result of which differences people in our environment utilize (or, for most people, which differences people in our environment utilized when we were kids).

Further, we can notice that the k-sound in cool has a slight lip rounding, which in IPA is marked as [kw]. Since this particular instance is also at the beginning of a word, we also pronounce it with aspiration, thus giving [khw]. This rounding is a result of coarticulation - in articulating the /k/, we are already preparing the articulation of the subsequent vowel, and thus round our lips a bit in anticipation of it. The more general idea of such coarticulation and assimilation is an important process that explains many sound changes over time as well, as features bleed from one sound to surrounding ones. In kit and acute, the k is slightly advanced along the palate, due to the subsequent sounds (the front vowel [i], or the semivowel [j], normally represented as <y> in English) being somewhat further towards the front of the mouth. In some languages this fronted k would be distinct from the non-fronted k. I will not tell you the diacritic to mark such fronting, but will let interested readers find out for themselves. (Any good book on phonetics should help, but there is a significant amount of that available online these days as well.)

These different sounds which we parse as being functionally the same are called allophones - the sound we perceive them as is the abstract phoneme, and the allophones are allophones of that phoneme. A common misunderstanding among people with a basic knowledge of this topic is that one specific sound is the phoneme, and the other sounds are the allophones, as if [k] were the phoneme and [kh], [khw] are allophones. This is not necessarily the case - all the sounds that actually occur are allophones, and they are thus realizations of phonemes, not alternatives that appear instead of the phoneme. English /k/ is {[k], [kh], [khw]...} to use a slightly maths-inspired notation. (It is possible that one allophone will be a primary allophone, though. Phoneticians and phonologists that subscribe to that idea may describe the primary allophone as the phoneme, or so I am told by friends who have read papers by such phonologists.)
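The set notation above can be made concrete. A sketch under heavy simplification - the contexts and the helper function are my own invention, using the post's plain [kh]/[khw] notation:

```python
# English /k/ modeled as a set of allophones, with a toy rule picking
# the realization from (grossly simplified) context. Every output is a
# member of the set; no single member "is" the phoneme.

K = {"[k]", "[kh]", "[khw]"}  # the phoneme /k/ as a set of realizations

def realize_k(word_initial, before_rounded_vowel):
    """Pick an allophone of /k/ from simplified context."""
    if word_initial and before_rounded_vowel:
        return "[khw]"  # as in "cool"
    if word_initial:
        return "[kh]"   # as in "cat"
    return "[k]"        # as in "ask"

# Whatever the context, the realization is always a member of the set /k/:
assert all(realize_k(wi, rv) in K
           for wi in (True, False) for rv in (True, False))
```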

How do we know what phonemes a language has? It is easy to be misled when trying to learn a new language without good teachers. Let us imagine a speaker of Ostrobothnian Swedish, having the following set of phonemes:
/i a e u o y ö ä i: a: e: u: ʉ ʉ: o: y: ö: ä: p t k b d g f v s ʃ (t͜ʃ) ʈ ɖ ɳ ʂ m n ŋ ɭ l r h/. Let us imagine this person sets out to learn Georgian, is not told anything about phonology, and has a highly naive view of language in general.

Georgian has the following phonemes, roughly:
/i a e u o b d g p t k p' t' k' q' ʃ t͜ʃ t͜ʃ' z s r l v ʒ ɣ x d͡ʒ h/
The Ostrobothnian will probably quickly learn to distinguish the vowels - although it is possible some of the Georgian vowels cover a slightly different bit of vowel space, and he may thus misassign some /e/ to /a/ or vice versa, and so on. The four consonants p' t' k' q' are likely to cause a major problem, though: audibly, they are not that distinct from Swedish p t k (except q', which is articulated further back towards the throat; some may still just hear this as another k), so he is by and large not likely to distinguish them. The voiced fricatives and affricate z, ʒ and d͡ʒ, as well as the velar fricatives x and ɣ, may also cause problems (unless he also knows French or English well enough to have acquired a passable version of their phonemic systems). So, it is quite probable that the Ostrobothnian naively would think of Georgian as having (sounds enclosed by parentheses are possibly identified by the Ostrobothnian, but not certainly so)
/i a e o u b d g p t k ʃ (t͜ʃ) (z) s r l v (ʒ) (ɣ) (x) (d͡ʒ) h/ - an interpretation of what he has heard which drops several distinctions. How do we know that this kind of mistake does not occur when linguists study languages for which the phonemic system is not previously known?
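The dropped distinctions can be sketched with plain set operations. The consonant inventories below are abbreviated to the sounds under discussion, in the post's transcription, and the variable names are mine:

```python
# Consonant inventories, abbreviated, in the post's transcription.
georgian = {"b", "d", "g", "p", "t", "k", "p'", "t'", "k'", "q'",
            "ʃ", "t͜ʃ", "t͜ʃ'", "z", "s", "r", "l", "v",
            "ʒ", "ɣ", "x", "d͡ʒ", "h"}
swedish = {"p", "t", "k", "b", "d", "g", "f", "v", "s", "ʃ", "t͜ʃ",
           "ʈ", "ɖ", "ɳ", "ʂ", "m", "n", "ŋ", "ɭ", "l", "r", "h"}

# Georgian sounds for which the learner has no ready-made category:
unfamiliar = georgian - swedish
print(sorted(unfamiliar))
# the ejectives p' t' k' q' and t͜ʃ', plus z, ʒ, ɣ, x and d͡ʒ
```

Until the learner builds new categories, each of these sounds is liable to be filed under the nearest Swedish phoneme instead.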

An important tool for finding out whether two sounds are contrastive is the minimal pair. We find two words that are almost identical. Say, "see" and "sea". We utter one of them in isolation and ask native speakers which word was said. If speakers can identify which one with greater than random chance, some sound in them differs. If not, they are homonyms, and thus not differentiated by any sound. We find that ran and ram are a minimal pair, differentiated by m vs. n. If we, however, produce a "mispronounced" word by replacing a sound with an allophone of the same phoneme - say, we pronounce cat as [kat] rather than [khat] - pretty much all speakers will identify it as the same word.
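The minimal-pair criterion lends itself to a small function: two transcriptions of equal length that differ in exactly one segment. A sketch with an invented helper name - real phonemic analysis is less mechanical than this:

```python
def is_minimal_pair(a, b):
    """True if the two transcriptions (tuples of phoneme symbols) have
    the same length and differ in exactly one segment."""
    if len(a) != len(b) or a == b:
        return False
    return sum(x != y for x, y in zip(a, b)) == 1

ran = ("r", "a", "n")
ram = ("r", "a", "m")

print(is_minimal_pair(ran, ram))  # -> True: n vs. m is contrastive
print(is_minimal_pair(ran, ran))  # -> False: the same word
```

Representing words as tuples of symbols (rather than strings) keeps multi-character transcriptions like "t͜ʃ" from being miscounted as several segments.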

Minimal pairs are not the only method, but it is a good start. I will not get into the nitty-gritty here of phonemic analysis, I am happy if the reader comes away with a basic understanding of what phonemes are. More concise descriptions can be found, less concise ones as well. Any introductory linguistics textbook should deal with this, if the reader is confused about something or is interested in finding out more or checking whether my description is roughly accurate.

Important point: the phonemes of one language and those of another are seldom the same. Even if we give an identical list of phonemes for two languages, it is possible the range of actual sounds made in uttering words using those sounds may vary, as the phonemes cover differently sized and shaped chunks of the possible phonetic space.

These things mean that English /k/ and Russian /k/ are not exactly the same sound, although some overlap may occur. They also mean that the English letters and the English sounds are not the same. These facts also give us a way of speaking of the sounds of a language independently from its letters - we label them using some phonetic alphabet and give each phoneme its own somewhat arbitrary label, often corresponding to a relatively simple***** symbol in the phonetic alphabet that is in the set of sounds the phoneme is realized as.


* In a paper on the syntax of some unwritten language of, say, Papua New Guinea, an exact transcription is not necessary, as the phonology is not a prime concern. In a paper on the allophony of the stops of that language, the transcription used would be a fair deal more precise. In a paper trying to figure out phonemes the language distinguishes and more general rules of allophony, a very precise transcription may be used for some parts, and once a phonemic inventory has been obtained a phonemic transcription would probably be used to test whether it suffices in conjunction with allophonic rules to predict the phones produced by speakers in various contexts. If you need crazy accurate records, transcription probably is not the main way to go, but actual audio recordings. In such cases, the transcription would still be helpful.

** Admittedly, words are also convention-based things: there's a convention between anglophones that the word 'cat' signifies a certain type of mammal, etc.

*** Morphophonology designates those processes where a sound change is part of the formation of morphological forms. Morphology itself pertains to the forms of a word, e.g. man-men, house-houses, rat-rats, swim-swam-swum-swimming, jump-jumped-jumped-jumping, ... An example of morphophonology at work in English is morphophonologically triggered voicing in words such as leaf-leaves, wife-wives, house-houses. In some languages, these processes can be complicated.

**** Note that having soft and hard g is not a universal; I specify "hard" g to mark that I am not speaking of the sound in germane or gin, but of that in good and gross.

***** By simple, I mean nothing more fancy than 'the most easily physically writeable or typeable symbol that is close enough'. 
