Friday 30 December 2011

The importance of apostrophe's

Yes - I know. It's a deliberate mistake.

Anyway, this is a screenshot from something on TV (via Tony Tooke):


As you can see, this shop offers rabbits eggs for 2/4 (that's two shillings and fourpence) a dozen.

I'm joking, of course: clearly there is no such thing. They can't possibly be selling them because if they were, it would read rabbits' eggs. Of course. I mean, the ability of grocers to use apostrophes appropriately is widely known and noted.

Wednesday 21 December 2011

Bad signage again

I was in Durham for a conference at the weekend and saw this sign on the room where some of the talks were held:


It advises that no food is to be consumed or taken into the lecture room. But if we followed the instructions it gives accurately, we'd starve to death.

Co-ordination does cause problems, and it's one of those things that people actually aren't very good at getting right. Ewa Dąbrowska, who gave a paper at the conference, has shown that adults, especially those with lower levels of education, can do very badly on tests involving passives and quantifiers. Given sentences like every dog is in a basket or every basket has a dog in it and pictures to match up, they get it wrong a lot of the time. Co-ordination is a different thing, as it's a purely grammatical task rather than a meaning comprehension task, but as I say, people get it wrong a lot.

So if you're trying to co-ordinate something, the rest of the sentence needs to be grammatical with respect to both parts. If you say Joseph and Mary travelled to Bethlehem, then it is grammatical to leave either co-ordinated part out: 
Joseph travelled to Bethlehem.
Mary travelled to Bethlehem.
Let's do the same with the sign. We need to identify our co-ordinated phrases first:
No food to be [consumed] or [taken into the lecture room].
Now the deletion:
No food to be consumed.
No food to be taken into the lecture room.
Oh dear. Now we have been instructed not to consume food. At all. Let's try re-bracketing our constituents. Perhaps what's co-ordinated is this:
No food to be [consumed] or [taken] into the lecture room.
Let's try deletion again:
No food to be taken into the lecture room.
*No food to be consumed into the lecture room.
Oh dear. Now it's not grammatical. We can't say consumed into the lecture room. Another co-ordination fail - add it to the list. I don't know why people don't just keep it simple and say Don't eat in the lecture room.
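For anyone who likes to see the deletion test spelled out mechanically, here is a minimal sketch in Python. The function name deletion_test and the way the sentence is split into pieces are my own assumptions for illustration; the code only builds the two leftover sentences for a given bracketing, and deciding whether each one is grammatical (and says what the sign-writer meant) is still a job for a human.

```python
def deletion_test(before, first, second, after=""):
    """Return the two sentences left when each conjunct is deleted in turn.

    `before` and `after` are the shared material around the co-ordination;
    `first` and `second` are the two co-ordinated phrases.
    """
    def join(*parts):
        # Skip empty parts so we don't end up with doubled spaces.
        return " ".join(p for p in parts if p) + "."

    return [join(before, first, after), join(before, second, after)]


# Bracketing 1: [consumed] or [taken into the lecture room]
bracketing_1 = deletion_test("No food to be", "consumed", "taken into the lecture room")

# Bracketing 2: [consumed] or [taken], sharing "into the lecture room"
bracketing_2 = deletion_test("No food to be", "consumed", "taken", "into the lecture room")

for sentence in bracketing_1 + bracketing_2:
    print(sentence)
```

The first bracketing gives us "No food to be consumed." (the starvation reading), and the second gives us "No food to be consumed into the lecture room." (the ungrammatical one), so neither way of carving up the sign survives the test.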

Monday 19 December 2011

Astonishingly bad scholarship from Creationists

Before I begin, this post is about bad scholarship. It's not an attack on Creationism (although I do disagree with every single thing about Creationism, that's not what this blog is about).

So another thing I was reading in that Dawkins book struck me. (It's been a mine of interesting facts - he writes interestingly but he doesn't always realise his aim, I don't think. He's looking to prove evolution to doubters, and so I've been reading it as if I were such a person, and there's not a hope in hell of me being convinced. He goes into great detail in some parts and then in others skips over vital facts. If I were an evolution-denier I'd be seizing those parts with glee.)

Anyway, he mentions that in The Blind Watchmaker he gives this line:
It is as though [the fossilised remains of lots of major animal phyla in the Cambrian era] were just planted there, without any evolutionary history.
Oh, Richard. Really? How naif of you not to realise how foolish that was. Anyway, so he says that Creationists have quoted this line many times, without quoting the following line:
Evolutionists of all stripes believe, however, that this really does represent a very large gap in the fossil record.
This is an example of how you must NOT do quotations. In my job as a writing tutor I see lots of students unsure of how to quote people properly. I might use this as a how-not-to-quote lesson. The absolute golden rule is that you must not misrepresent anyone's words. So, for instance, it's not OK to give this quotation:
Bailey (2010) argues that "Dairy Milk is... the best chocolate bar"
if this is the original:
"Dairy Milk is not the best chocolate bar"
And likewise, it is not OK to quote a sentence out of context if doing so would cause it to be interpreted to mean something other than what the author intended. It's quite clear here that Dawkins did not mean to say that the fossils have no evolutionary history, and so he shouldn't be quoted as appearing to say that. Not all Creationists would condone such behaviour, of course, and it's not only Creationists who do bad scholarship. But this is really an extreme example of staggeringly bad form.

Friday 16 December 2011

They almost literally never collide

Richard Dawkins said (in The greatest show on earth) of starlings,
They almost literally never collide. 
It struck me as an odd word order to use. Surely it should be
They literally almost never collide.
What's modifying what here? Well, in Dawkins' sentence (let's label it A), almost is modifying literally:
They [[almost literally] never] collide
He means that they rarely collide; he can't say that they literally never collide, but never is used in an almost literal sense here. OK, makes sense.

Or is it that almost modifies literally never?:
They [almost [literally never]] collide
That second one seems odd to me. I don't think it's possible for that to be the structure - I can't even quite see what it would mean. So it's the first then, with literally qualified by almost, and the whole constituent modifying never. The birds, in their massive flocks, are very good at not colliding. It's not quite true to say that they literally never collide, but they almost literally never do. Ignoring the fact that this is a slightly hyperbolic and quite superfluous use of literally, it's perfectly sound.

But if he had written my version instead (let's label it B), the modificational relations would be clear: as literally can't modify almost, literally modifies the constituent almost never.
They [literally [almost never]] collide. 
Literally is used, as it so often is, for emphasis: he means they almost never collide, and he means that literally. He's not exaggerating. They really do almost never collide.

Syntactically, both are OK. Stylistically, I know which I prefer.

Wednesday 14 December 2011

Lucky me

I'm back on facebook. I finally made the decision to go back on Tuesday, after a gruelling start to the week made me realise it was the right thing to do.

First, on Monday I had my first meeting with my supervisors in MONTHS. It was a scary prospect, actually, even though they are lovely people and not scary at all. I was suddenly forced to face up to the fact that I had NO work to show for the whole summer and no excuse for that. This led me to conclusion 1: Being off-facebook since June has not increased my productivity.

Then a series of nice things happened to lead me to conclusion 2: Social support will help me finish my PhD.

Before the meeting I took advantage of Friend M who very kindly allowed me to whinge to her and comforted me.

After the meeting I had lunch with a bunch of nice people for Friend A's birthday, after which Friend A sent me a really nice text message telling me to keep my chin up and all would be well.

The next day I had lunch with Friends K and D, during which they took my mind off work, and Friend K reassured me and Friend D made me laugh.

And I would just like to add that I have the best supervisors ever. My first supervisor sent me a nice email on Monday night to remind me that I'm totes the most awesome ever. Or words to that effect. I think it was actually that my research area is cutting edge, but whatever.

Saturday 10 December 2011

Chinese 'yellow, gambling, poison' on Language Log

What on earth is the meaning of this sign?


I had no idea, so I asked just that question of Victor Mair over at Language Log, and he obliged by answering it speedily and fully.

Friday 9 December 2011

Language trumps race

Young children see language as a more salient factor than race in determining identity, it seems.

A study from the University of Chicago asked groups of children aged 5-6 and 9-10 to do a matching task. They were shown images like the one below and asked which adult the child would grow up to be.
I think you and I would pick the man on the left, as we know that it's fairly likely that a white child will grow up to be a white man, and skin colour changes in later life are relatively rare. OK, he's speaking a different language, but maybe he moved to France or whatever. The 9-10-year-old kids agreed with us and matched the same pair.

But the younger children matched the boy with the black man, who speaks the same language as the child. How odd. Katherine Kinzler, the lead author of the study, says:
From a child’s perspective, language offers many of the characteristics of a biologically determined or inherited category. Children usually speak the same language as their families, and they likely do not remember the time as infants that they spent learning a native language.
They've found that accent and language are really important to kids (which I guess we knew) and this shows that a person's language is even more important than their race to a young child. So, would this be different for black Americans, to whom race is important because they early on have to learn that lots of people are racist bastards? Yes, it would. When groups of black children were given the same test, the younger ones matched according to race, not language, just like the older children. How sad. 

Wednesday 7 December 2011

The perils of international trade

I read a really nice piece in the Bangkok Post this week (yeah, I'm global). It was written by an Australian, as far as I can gather from the textual clues, who has been living in Thailand for 22 years. This, I think, makes him qualified to comment on Thai matters with an authoritative outsider's point of view. His article concerns how company names get pronounced when the companies open branches in Thailand. Ikea is the business he discusses, and in particular how many Thai people have begun saying it as 'ickier'. The writer suggests a possible reason for this:

The answer lies in the ''I'' at the beginning of the word. In Thai, a short sharp ''I'' is a derogatory way of describing somebody. If you didn't like me, for example, you can call me ''I-Andrew'' with a scowl, though not to my face because it's very rude. I am guessing Ikea sounds like a rude way of referring to a person by the name of ''Kea''.


But some companies have fared a lot worse in the name-mangling stakes. Volvo, for instance:
First, there is no ''v'' in Thai, so they replace it with a ''w''. Second, the final sound of a syllable in Thai cannot end in ''l'', so ''Vol'' becomes ''Wonn''. And so Volvo becomes Wonn-wor.
Thai is, actually, a language with a very different phonology from English so pronunciation is hard in both directions. My Thai friends are too polite to laugh at me when I try to do it, but they can't quite hide their amusement.

Tuesday 6 December 2011

Learnt vs learned

Mr T likes to get his grammar and spelling all correct when he writes facebook posts, so he checks things he's not sure about. Last night he suddenly realised he didn't know what the difference was between learnt and learned (as the past tense form of the verb learn). He googled it, and after a few dictionary entries, one of my posts was on the first page of hits. He was quite frankly astonished: of all the blogs in all the world, he had to google across mine.

I'm really pleased about that, but surprised. Does that mean that it's only me using the word learnt? I can't believe it's just me and the dictionaries. It looks like it's used more in the UK than in the US, which would explain a big lack of it on the internet. Still, there must be other British bloggers out there. For the record, either is completely acceptable and there is no preference for either in terms of formality or 'correctness'. I use learnt mostly just because that's what's in my idiolect. But if I were to justify it, well, I like it better. I like irregular verbs, I think they're pretty. And they often have something to reveal about language.

Steven Pinker, in his Words and Rules, notes the unusual fact that the 'default' (i.e. regular) plural suffix in German is -s, despite the fact that it's nowhere near the most common plural marker. He argues this on account of the fact that it's the suffix chosen for nonce words and proper names, which aren't normally pluralised. In English, although we spell our regular past tense suffixes -ed, there are actually three: /d/, /t/ and /ɪd/. I wonder if one could argue that the default one is /t/, based on the fact that it surfaces in such forms as learnt, dreamt and sent.

Furthermore, if we stopped using these irregular past tense forms, we'd lose interesting facts like this one: dreamt is the only word in English to end in the letter combination mt (apart from derived forms like daydreamt).
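If you fancy checking the mt claim for yourself, here is a rough sketch of how you might do it in Python. The helper name words_ending_in and the tiny sample list are my own inventions for illustration; to test the claim properly you would need to feed in a full dictionary word list, which I am not assuming you have to hand.

```python
def words_ending_in(words, ending):
    """Return the distinct words (lower-cased, sorted) that end in `ending`."""
    return sorted({w.lower() for w in words if w.lower().endswith(ending)})


# A made-up sample, NOT a real dictionary: swap in a proper word list
# to check the claim for real.
sample = ["learnt", "dreamt", "sent", "daydreamt", "burnt", "dreamed", "prompt"]

print(words_ending_in(sample, "mt"))
```

On this toy sample only dreamt and its derived form daydreamt come through, which is at least consistent with the claim.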

Monday 5 December 2011

Kitchen tells you how to cook in French

There's an article in the Guardian about a kitchen that talks to you in French and tells you how to make French dishes. I've actually seen this kitchen, as it's at my university. I haven't used it but I understand it's very impressive. It also has accelerometers in all the equipment (the things inside an iPhone) so it knows if you've stirred onions when it told you to chop carrots. Brilliant idea, and I hope it becomes more widespread.

Thursday 1 December 2011

Very bad Latin joke


A gentleman, having ordered a meal at a fine London restaurant, decided that he would like some wine to accompany his meal. So he summoned the wine steward and asked for a bottle of hock.


"Hock, sir?" asked the steward.


"Yes, hock, man. You know: hic, hunc, huius, huic, hoc."


"Hmm ... very good, sir."


The food duly arrived, but without the wine. This perturbed the gentleman slightly, as he was accustomed to a higher standard of service. He began to dine, and at the next opportunity he beckoned the steward again.


"Didn't I order a bottle of hock?"


"Yes, sir, you did - but then you declined it."


(From this extremely comprehensive and knowledgeable page of Latin phrases.)

Tuesday 29 November 2011

When does waste yeast become Marmite?

There's been a Marmite spill on the M1. This has led to many, many jokes about sending soldiers to clean it up, etc. The way that Independent article reported it though was a bit odd.

Now, for what follows you need to know that Marmite is a spread, and its generic name (i.e. what the supermarkets have to call their own brand) is 'yeast extract'. It's apparently similar to Australian Vegemite, but not to New Zealand Marmite, which has a different flavour, according to Wikipedia. It is a very dark brown, sticky goo and has a very strong flavour. I can't really describe the flavour but it's what they use to make roast beef flavour crisps. It's basically pure umami and I love it, but their slogan is 'Love it or hate it' because some people can't stand it. My mother won't have it in the house. At my grandma's 90th birthday party earlier this year, me and my dad and my uncles and cousins were all standing around eating Marmite on toast and I think my mother left the room and started thinking about disowning us.

So, what the Independent said was this:
A large-scale clean-up operation is under way after a tanker carrying more than 20 tonnes of yeast extract - believed to be Marmite - overturned on a busy motorway.
So it is probably Marmite, OK? But later in the article they quote the South Yorkshire police spokeswoman:
We were called at 10.15pm yesterday to reports of a tanker, which was carrying 23.5 tonnes of waste yeast, overturning.
Now it's waste yeast? Marmite was originally a by-product of the brewing industry. This means that technically, it's not much different from waste yeast. But it is different, in the crucial sense that you can't just eat waste yeast, and I imagine that there is a fairly complicated (but secret) manufacturing process to obtain one from t'other.


Newspaper articles often end up using a lot of synonyms or near-synonyms because they value variety of expression over clarity. This can lead to phrases like 'the busty blonde, 23' or 'the former banker' which, while adding extra information, are often superfluous. I think that's what's caused the problem here. But waste yeast and Marmite are NOT synonyms. As Lynneguist said over on Twitter:
Britain, a land where 'waste yeast' and 'food' can be considered synonyms
Anyway, to finish all this up, here's Nigella's Marmite spaghetti recipe, which is surprisingly delicious (though you do have to like Marmite; it's not going to convert any haters):

Cook spaghetti.
Melt 50g butter and Marmite (as much as you like - a big spoonful or so) and add the drained pasta. 
She suggests serving with parmesan - we didn't have any and it's fine without, but adding it would add to the immense umami overkill. 

Monday 28 November 2011

Fenton the dog. Or Benton the dog.

A video has gone viral of a dog chasing some deer in Richmond Park in London. It's quite funny, I suppose, though it's odd the things that catch people's attention.



Anyway, everyone thought the dog was called Benton, which is I suppose a reasonable enough name for a dog, to call him after Eriq LaSalle's character in ER. My goldfish is called Steve, named after Miles Davis. But it turns out that the dog is called Fenton, perhaps named after Alvin Stardust.*

That reminded me of the McGurk effect, which is an amazing illusion that illustrates just how bad we are at hearing stuff right when our eyes tell us different. Watch this clip from BBC's Horizon: Is Seeing Believing? (from earlier this year, I think). It illustrates it quite nicely.



Weird, isn't it? Maybe the dog didn't come back when he was called because he heard Benton as well.

*OK, that got a bit surreal there for a moment. So, my goldfish is called Steve. We did have two, called Miles and Davis. Miles went to the big fishbowl in the sky, and we felt we really couldn't be left with a fish called Davis. So we started calling him Steve, as in Steve Davis.

The Alvin Stardust/Fenton thing is an interesting story. The man later known as Alvin Stardust had hits in the 60s as Shane Fenton (and the Fentones). However, that's not his real name either. He was born Bernard Jewry, and was hanging about with a band, Shane Fenton and the Fentones, who were waiting to hear back about a demo tape they'd sent to the BBC. But then the original Shane Fenton (also not his real name) died, aged 17. The band were going to break up until suddenly they got the call from the BBC they'd been waiting for. Fenton's mother asked them to continue, in memory of her son, and Jewry became the new Shane Fenton. Apparently Fenton had chosen the name because it sounded 'American' and therefore (at that time) cool.

Saturday 26 November 2011

Posthumous reform

I got an email today telling me this:
On Saturday 26 November 2011, Seaham Harbour's A Woman of No Importance will reform to play a special show at Newcastle Arts Centre, in aid of Macmillan Cancer Support. The band will perform their posthomously released compilation album, 'AWoNI' (fakeindielabel, 2007) in it's entirety, in order.
I'm intrigued as to what tragedy could have wiped out the whole band, but what I really want to know is how a band can possibly reform and perform their posthumously-released compilation album.

Friday 25 November 2011

Who are 'Asians'?

So, there's this book. (You can't Click to look inside!, that's just part of the image. Go to Amazon if you want to do that.)

I'm not going to review it because I haven't read it (it's only on Kindle and I don't have one, as you know), and it's not fair to review a book based on the publisher's blurb. Let me just say that I'm deeply sceptical about its central claim, which is
that "human cognition is not everywhere the same"-that those brought up in Western and East Asian cultures think differently from one another in scientifically measurable ways. Such a contention pits his work squarely against evolutionary psychology (as articulated by Steven Pinker and others) and cognitive science, which assume all appreciable human characteristics are "hard wired." 
Of course there are cultural differences, and they may well be measurable, but I really don't think that this challenges the idea that all human cognition is basically the same.

[Update: I found this blog post with the text of an article submitted to Cognitive Linguistics. It reanalyses Nisbett's work and reckons it's due to linguistic differences rather than cultural differences, as Chinese people respond differently from Japanese people in tests. For instance, head-directionality, they claim, means that Japanese people mention context first, whereas American people mention salient information first.]

Anyway, like I say, I'm not reviewing it. I'm quibbling about a decision (by the publishers?) to keep the subtitle that was used in the US edition. It's this:
How Asians and Westerners think differently... and why.
I'm reasonably comfortable with the use of catch-all expressions for broad ethnic groupings; sometimes it's necessary. So 'Westerners', for me, means North Americans and West/Central Europeans, and probably Eastern Europeans as well nowadays. As far as I can tell from the blurb, that's pretty much what the author intends:
those brought up in Northern European and Anglo-Saxon-descended cultures
But 'Asian', to me and to most UK English speakers, means people who look a bit like this:

That's Nihal, a well-known DJ on BBC stations Radio 1 and Asian Network. This differs quite a lot from what most US speakers mean by 'Asian':

Presumably the publisher knows about this difference. And presumably they didn't put the subtitle on the front cover for this reason. And throughout the blurb, it refers to 'East Asians', which is presumably a concession to this difference in the meaning of the word. But still, it makes me think the whole book is equally careless (I'm sure it's not) and puts me right off reading it.

(Actually, we really could do with a better term than 'East Asian'. 'Oriental' doesn't seem to cut it these days, even though it just means 'eastern'. I'm fresh out of ideas though.)

Wednesday 23 November 2011

Measuring impact

These days researchers are asked to assess the impact of their research. It's not good enough to say that you're doing it out of a quest for ever greater knowledge, or because as humans we should strive to find out how our world works. It also isn't going to wash if you point out the great amount of mathematical work that seemed to have no practical application when it was first done but now underpins and is vital for absolutely everything in our day to day life (computers, for instance - the principles behind them preceded the technology by centuries).

So we need to know how to measure impact, or at least how to convince the funding bodies and the REF that we are doing stuff with impact. Even though we don't need to worry about the REF yet, as PhD students, it's wise to get to know about this stuff early.

With that in mind, here's a link to an LSE site with podcasts on just this, aimed at us folk. There are also other resources on the site, and they are holding an event on the 1st December for PhD students.

Tuesday 22 November 2011

Penelope Keith and the stress pattern of English

In this week's Radio Times, the actress Penelope Keith gets worked up about the pronunciation of certain words: 
'If I hear "lamentable",' she says with a shudder, 'or worse, "irrevocable", I want to get a brick and throw it at the wireless. We have to keep screaming [...] because if we don't, this kind of thing will become current.'
Disregarding (or 'irregardless', if you prefer - it would undoubtedly annoy Penelope) the fact that it's already current, or she wouldn't be hearing it on 'the wireless', what's her problem?


Well, I don't have access to the OED here but Dictionary.com tells me that lamentable is pronounced with the stress on the first syllable. In fact, Merriam-Webster gives it with the stress on the ment syllable, and stressed on the first as an alternative. Clearly, this is an old and well-established pronunciation. Still, Penelope Keith doesn't like it and others probably feel the same way. This is just yer basic peeving and not to be worried about.


But what interested me was that it actually seems odd to pronounce it the way she would like. English generally, in long words, puts the stress on the antepenultimate syllable:
an.te.pe.'nul.ti.mate
ex.tra.te.'rre.stri.al
'fru.mi.ous 'ban.der.snatch
Not always, of course, there are exceptions:
ar.che.'ty.pal
pho.to.'gra.phic
'ca.ter.pi.llar
I don't know enough about this kind of thing to know what causes these to be different, but I suspect it's something to do with the morphology and compounding involved in creating these words. But for lamentable and irrevocable, it seems absolutely natural to follow the pattern and stick the primary stress on the ment and voc bits, not least because we have la'ment and re'voke, although of course stress often changes when words are inflected (cf. pho.'to.gra.phy and pho.to.'gra.phic above). 
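Just to make the antepenultimate pattern concrete, here is a little Python sketch. It assumes the word has already been split into syllables with dots, as in my examples above, and the function name antepenultimate_stress is my own invention. It knows nothing about real English stress; it simply puts the stress mark on the third syllable from the end, so it will happily get the exceptions wrong.

```python
def antepenultimate_stress(syllabified):
    """Mark primary stress on the third-from-last syllable.

    `syllabified` is a dot-separated string such as "la.men.ta.ble".
    Words of fewer than three syllables are returned unmarked, since
    the pattern only concerns long words.
    """
    syllables = syllabified.split(".")
    if len(syllables) < 3:
        return syllabified
    # Prefix the antepenultimate syllable with the stress mark.
    syllables[-3] = "'" + syllables[-3]
    return ".".join(syllables)


for word in ["la.men.ta.ble", "i.rre.vo.ca.ble", "an.te.pe.nul.ti.mate"]:
    print(antepenultimate_stress(word))
```

For lamentable and irrevocable the sketch lands on ment and voc, the pronunciations that upset Penelope; give it ar.che.ty.pal and it will put the stress on che, which is exactly why that one counts as an exception.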


So why would we expect the pronunciation preferred by Penelope? I don't have an answer to this one; it's a genuine question. Answers on the back of a postcard (or in the comments). 

Saturday 19 November 2011

Rules of language vs prescriptivism

I blog this blog over at Tumblr as well, as there's a linguistic community over there and Blogger is so uselessly hopeless at having any kind of interaction between bloggers. There was an interesting post recently from Lesserjoke, which I replied to.

He was asked this question:


Question: Hey, I’m just wondering- and totally not in a sarcastic/condescending way- from a linguists perspective, if there’s no “wrong” usage of words or grammar, why have rules at all? Are there any that matter? Just wanted to get your views on it.

And this was his reply:

Answer: Linguists really do vary, and most are not as pigheadedly anti-prescriptivist as I am. =) But, from my perspective, we don’t need rules at all. English survived for quite a while before people started writing down the rules to it, and there are many societies around the world still today that don’t actively enforce linguistic rules.

There’s a huge pressure on people who directly interact to understand one another. If X and Y are going to communicate to each one’s benefit, they’re going to need to be able to successfully pass messages back and forth. And when you expand that to an entire society, the principle remains the same: the language of people who are forced to communicate naturally converges to the point of understandability, without the need for actively prescribing rules.

Due to that pressure, most variation within a language is just statistical noise: it’s interesting, it can teach us a lot about the principles of grammar, and I would even say it’s beautiful… but it’s so minor that it doesn’t get in the way of comprehension. It’s really rare for two speakers of the same language to truly not be able to understand each other.

And if that were to happen — if, without the active enforcement of grammatical rules, a formerly common language begins splitting apart… who cares? Historically, that’s happened plenty of times. The various Romance languages all descended from dialects of Latin, Old English branched away from Old Germanic, and so on and so forth. Languages split when that social pressure goes away: when one population of Old Germanic speakers no longer are interacting enough with the others to need to maintain cross-group intelligibility. It’s a perfectly natural linguistic process, and it almost doesn’t make sense to stand in its way. If we need to understand one another, we will, and if we don’t, what’s the point of making sure we can?

So that’s my answer! The explicit enforcement of grammatical rules is unnecessary and only serves to unfairly shame speakers of nonstandard variants. If we just let the invisible hand take care of it (the way many societies have done and continue to do today), an equilibrium of necessary intelligibility in language would soon be achieved.

I thought that was broadly right but overlooked a fundamental aspect of language, which is that it is strictly rule-governed. I agree that we don't need to explicitly enforce the rules, but speakers of language enforce them themselves without outside interference. I replied stating as much, as follows:

A crucial point is the difference between prescriptive rules and the principles of grammar that underlie language. I agree with the above regarding the explicit enforcement of ‘rules’ - but these are the little things, the things that vary in non-standard usage and so on. 
The fact there are asymmetries that hold across every language tells me that there are underlying principles (rules) that form the structure of language. Variation is on top of that and provides the different languages that we see. For example, there is no language, not a single one, which has the opposite of V2 (i.e. that places the verb in the penultimate position). There are loads of these facts and they tell us that there must be some kind of rules.
And furthermore, although speakers can communicate even when there’s a lot of variation, there are limits to what speakers will produce and judge grammatical. It’s basically the difference between saying that you can’t say ‘I done it already’ and saying that you can’t say ‘already it I’ - a speaker may well say the first and a prescriptivist would rule it ‘wrong’, but no speaker of English would produce the second. 
These are examples from syntax, but we can look at phonology too. We can say that it’s fine to pronounce the vowel in ‘grass’ (there’s a massive difference between the north and south of England on this one) in two wildly different ways, and both are fine and understandable. One might be judged wrong by certain people, but as long as both say ‘grass’, it’s not wrong from a linguistic point of view. But it’s simply not possible for an English speaker to pronounce a word [rgas]. It’s against the rules of the language - not the ones someone made up, but the real, natural rules that underpin the structure of the language. 
So I would say we DO need rules; we don’t need prescriptivists because speakers enforce the real rules themselves naturally. This doesn’t preclude language change, because the rules can change, but at any one time, the language is stable enough for us to communicate. 

Friday 18 November 2011

Linguistics in the news - really!

Normally when I write of linguistics being in the news, what I mean is that there is a news item with a linguistic angle that I can write about, or that some news item is about language and I can discuss the 'proper' linguistics behind it. This time, the BBC has attempted a genuine linguistics item.


Wednesday 16 November 2011

Kate Bush, Eskimos and snow words

It's a well-known linguistic myth that 'Eskimos have [insert high number here] words for snow'. This has been conclusively shown to be stupid*, and I think a lot of people now know this. But it's still a nice little 'factoid' and Kate Bush has made good use of it in her new album, 50 words for snow. Via Language Log, which documents this sort of thing, I found out about the album and now have a link to listen to the song online.

Ben Zimmer at LL has put the link, together with the lyrics, online in a nice blogpost on the topic. The title song features Stephen Fry speaking the titular 50 words (English words and phrases, not Eskimo - although there is a Klingon one) and the results are really quite beautiful. The words are a mix of nice-but-nonsense and witty, like blown from polar fur, spangladasha and icyskidski.

*For many reasons, argued persuasively by Laura Martin some years ago. For instance, what do you mean by Eskimo? It's kind of a blanket term for a group of languages. What do you mean by 'word'? That family of languages is polysynthetic, which means there's a heck of a lot of affixes and you can make many words from a single root. In fact, a single 'word' can actually be a whole sentence, making the number of 'words' presumably infinite.

Tuesday 15 November 2011

Chomsky by Chomsky

I was very pleased to hear a linguistics-Chomsky question on University Challenge last night, so in celebration I thought I'd blog this great infographic. It presents the generative ('Chomskyan') view of language acquisition in a nice smart postery-type way, as an imaginary class on the topic.

Here's the image, or click the link to see it at the original site (worth doing as there's more to look at there).


Friday 11 November 2011

An hypothesis or a hypothesis?

I stumbled across the phrase an hypothesis in a book yesterday and it gave me pause. Surely, I thought to myself, that can't be right? I never use an with words beginning with the [h] sound myself, but even if you do, I thought, you wouldn't use it with hypothesis. Here's approximately what I assumed was the rule, never having troubled to learn it:
If the word once had a silent h (because it was borrowed from French), use an. Otherwise (f'rinstance if it's borrowed from Greek), use a. Therefore it would be an hotel but a hypothesis.
Not so. I looked it up. Nowadays, of course, the rule is to use a wherever [h] is pronounced (a hotel), and an wherever h is silent (an honour), and very sensible the rule is too. But if one did want to use an, one should properly do so with words longer than 'about three syllables' and which have an unstressed initial syllable. Hypothesis, for instance.

And it turns out not to be a stupid left-over-from-history rule either: it really is easier to say an when the syllable is not stressed, because it takes too much effort to stop after a and start again on the relatively weak [h] sound in an unstressed syllable.

Wednesday 9 November 2011

Ambiguous signage

Faced with this sign, what would you do?



Today I saw a woman frankly baffled by it. I'm not picking on the civic centre, really I'm not, but I do happen to pass there often and their signs are just so very unconventional. 

To clarify, the building has two sets of doors on either side of this sort of corridor. On each side, one of the sets is out of order due to the weight of the glass. This sign, saying 'We are sorry - this door is not in use', is stuck on the glass next to one of the sets of doors, with an arrow pointing towards that set of doors. The question is, is that the door that's out of order, or is it the door you should use? 

The answer is (a), it's the door that is out of order. Of course it is: this door (points) is not in use. But it's really quite odd to point towards the thing that you're supposed to be getting people to not use. Normally an arrow would direct people towards the correct door to use. This leads to a conflict between apparent meaning and expected use of arrows, and people get baffled. 

(The sign is, of course, intended to be the precise bridge between these two conditions: it should be placed on the out-of-use door, with the arrow pointing towards the door that people should use instead. It's just been the unfortunate victim of poor sign-redeployment.)

Tuesday 8 November 2011

Handy IPA tool

While writing yesterday's post, I needed some IPA symbols, as I frequently do (far more often than I should, not being a phonologist). I stumbled across this handy website which allows you to type your text in a text box and then copy and paste into whatever you're writing. You can toggle between this simplified keyboard which gives you what you need for English, and a full IPA one with loads of symbols. There are also keyboards for other languages, such as an Italian one with accented vowels (not IPA) and an Icelandic one which gives you thorn and so on.

Monday 7 November 2011

Odd abbreviation reveals spelling over phonetics?

It was my friend's birthday the other day so we met in a pub for lunch, which is the accepted correct way to celebrate a birthday. Another friend hadn't been to the pub before and an interesting misunderstanding ensued.

We were going to a place called LYH. It's a nice pub, though you wouldn't know it to look at it from the outside. It does good grub and nice beer. This particular friend not only hadn't been there before but hadn't even heard of it before. However, we have been several times to another pub, called Mr Lynch. (Also a nice pub, though different - less about the beer, more about the partying, but still good grub.) This friend thought that in my text message 'LYH' was an abbreviation for 'Lynch'. Don't panic folks, she realised in time and made it to the correct venue. But the question is, how did this misunderstanding occur?

I wouldn't have abbreviated that word anyway, as it happens, but if I did, I think it'd be to 'Lch' or something similar. It would have included the important sounds of the word, the initial and final consonants in this case. The particularly odd thing about abbreviating it to 'Lyh' would be that the last letter, the 'h', doesn't even represent a sound of the word 'Lynch'. In broad phonological transcription, the word is [lɪntʃ]. That last sound, the 't' and the long 's', together make the sound we write as 'ch'. At no point in saying the word 'Lynch' do you make the sound [h], which is the sound you make if you say 'huh'. This is evident if you try to say 'Lyh' as a word - doesn't sound good, does it? 


So it would have been a strange choice for me to abbreviate to if I was going off the sounds in the word. But this highlights the fact that in written communication we can dissociate ourselves from the sounds of words and refer only to the spelling. Some people do this more than others, I think, though I don't know what makes the difference. Perhaps those who read more do it more. You know sometimes on 'Come dine with me', the participants read an unfamiliar item on the menu, say it's 'taramasalata', and instead of reading out what's there, they instead guess a word they know, like 'tiramisu'? I think it's the opposite of that. 

Wednesday 2 November 2011

Kindle is for lowbrow, hard copy for highbrow?

That's the main message of this Telegraph article, which did some kind of statistically dodgy survey and found that while 71% of the books (real ones) on respondents' shelves were
autobiographies, political memoirs and other weighty non-fiction titles,
the most popular books on Kindle are - surprise - the popular genres like mystery, thriller, romance and fantasy. It's supposedly because if you're using an ereader, people can't judge a book by its cover, as it were. It frees readers from the shackles of their intelligent, thoughtful public image and allows them to indulge their mucky desires for fluffy pink romance and heaving flesh. 

Tuesday 1 November 2011

Day of the Dead

Today (and tomorrow) is Day of the Dead, celebrated in Mexico and elsewhere. Sadly we don't go in for it much in the UK, because the artwork is really cool. It's not as creepy as it sounds: the idea is that you remember family and friends who have died. In many places it's traditional to visit graves and take offerings including flowers and the dead person's favourite food, and on the second day the spirits come to enjoy the gifts and festivities prepared for them. Sugar skulls are popular too, as gifts and tokens. 

As with all festivals in parts of the world that aren't the UK (and therefore aren't rainy), there are parades and street parties, and people gather to remember their dearly departed. The Day of the Dead is a national holiday in Mexico, so the whole day is one long celebration. And it is a celebration, not a sad day.

The name for the festival in Spanish is Dia de los Muertos (our name is a literal translation). Although it coincides with the Catholic festivals of All Saints' Day and All Souls' Day, which are also about honouring the dead, it's an indigenous festival with its origins in Aztec worship of the goddess Mictecacihuatl (says Wikipedia). She's queen of the underworld (possibly sacrificed as an infant) and these lovely ladies are catrinas, the modern representation of her. Apparently her jaws are often depicted agape because she swallows the stars during the day.


This post is not strictly about linguistics, I admit (and nor is tomorrow's, actually, though it's a good one) so let's try and drag it back on-topic:

That tl sound at the end of the goddess's name is pretty common in Central American languages (like the language Nahuatl). It's on the end of the word for chocolate too, chocolatl - which is what Lyra calls it in the His Dark Materials trilogy. And it's pronounced something like the ll sound in Welsh.

Monday 31 October 2011

14 punctuation marks that you never knew existed

I did know that some of them existed, actually. The full list is here. It's an annoyingly ad-full page but worth persevering with to learn that marks like these exist (comments are from the original site, not mine):


Because Sign
This one's so cool. It's like the "Therefore" sign, but upside-down, and it means because.

Exclamation Comma
Just because you're excited about something doesn't mean you have to end the sentence.

Hedera
Hedera is Latin for ivy. Why that is relevant here is not very clear at all, but this little glyph was used back in the day to mark paragraph breaks. Seems like it was probably really hard and annoying to draw, but it looks nice.

Sunday 30 October 2011

Setting exam questions

My job for today is to set exam questions for my portion of a first-year module. I'm in charge of the phonology bit (lord knows why) so I need ten reasonably easy exam questions on phonological transcription, stress, intonation, assimilation, etc. It's hard to know what they will find far too easy or far too hard, so I'm trying to vary the difficulty a bit. They're made easy by virtue of the fact that it's an online exam, so they're all either multiple choice or short answer. I won't put them on here, for obvious reasons (I don't think any of my students read my blog, but you never know, and that might cause a rather sticky situation...).

Saturday 29 October 2011

'All the protagonists'?

When I was young I liked to learn facts about language. I was also a smug little git so what I liked best of all was to learn a fact that would allow me to say to people 'You're using that word wrongly'.

I'm still a smug little git but now I know that this is prescriptivism and a Bad Thing. Linguists try to avoid it, and instead describe language usage As She Really Is. For this reason, I've decided to revisit a prescriptive usage that I remember learning as a small child. This was an extra-special one for me, because it involved knowing the etymology of a word and applying it to modern usage, and this is always a good way to make people feel stupid. (What, you don't know classical Greek? You utter ignoramus.) Yeah, I was insufferable.

Wednesday 26 October 2011

Really quite shocking grammar fails

Really, it's astonishing what errors you can find on printed material. In many cases it shows a real lack of any care at all. Take this photo:
It's a poster for the Xmas party nights on at Newcastle City Hall. This is a local government poster, then, which goes some way towards explaining it; it was almost certainly made by someone with inadequate training, low salary and even less interest in doing it well.

Let's look at what's wrong with it.

Monday 24 October 2011

Active voice allows speaker to hide agency


People often get over-excited about politicians using the 'passive voice' to hide their errors. There was an example of just this in last week's Fry's Planet Word, when Armando Iannucci (the famous linguist....oh no wait, he's a comedy writer) said that politicians say things like
Mistakes were made
and he, much amused and outraged, hooted
By who?
OK, that is a terrible way to apologise for anything, and it doesn't fool anyone. If people even say that kind of thing any more they're idiots. But the point is that it's not the passive that causes agency (the person who did the thing) to be hidden - you can easily say
The cakes for this week's charity sale were all eaten by me, and I'm very sorry.
Geoff Pullum has ranted about this much more extensively and accurately than I can, and he has myriad examples of stupidity. So on to the topical bit.

Saturday 22 October 2011

Learn to type

I don't know if they teach typing in school these days, in the same way we used to be taught handwriting (and hopefully children still are!). If they don't they definitely should, because it's not something you just pick up, as is evident from the zillions of two-finger typists.

As a PhD student I type a lot, either emails and web browsing or actual writing of text documents. And I spend a fair amount of my leisure time typing too, doing things like writing this blog. Furthermore, I (and many other people) need to type for my job - when I used to work for a talking newspaper, much of the day was spent typing articles to record later.

I can type quite adequately for my purposes. I just tested my speed and I got 59wpm, which is just in the range of 'average professional typist' (50-80wpm) according to Wikipedia. But this wasn't always the case: when I started working for the talking newspaper I was another two-finger typist. I was actually really quick, and Wikipedia tells me that such typists can reach 60-70wpm in bursts, but I was always aware that I could be better, and it would be faster and more comfortable.

One day, I can't really remember why, I googled free online typing course and just picked one, and have never looked back. It took me about two hours to complete the course, and it was probably the most useful two hours I've ever spent. I urge you to do the same if you're like me. Do the course as slowly as you like - I wanted to get maximum gains in as little time as possible, but I could have been more thorough. But do it, because the time it saves you and the difference it makes to your writing are astonishing.

It's not just the increase in speed and accuracy, either. Firstly, if you can touch-type, you don't have to be looking at the keyboard the whole time, so you can keep your eyes on what you're actually writing. And secondly, if you can type quickly and without real conscious effort, you can concentrate on the content instead of thinking about where the next key is.

Friday 21 October 2011

Please rejobulate me

This completely amazing letter comes to you from Letters of Note via Language Log.
This poor chap was violently dejobbed in a twinkling, and he is very bewifed and much childrenised! He hopes that he will be rejobulated with as much alacrity as may be compatible with [the recipient's] personal safety. How could you refuse?

Thursday 20 October 2011

Reanalysis of cheese as an adjective

My brother-in-law (sort of - my partner's brother) sent me a text recently that included the name of a type of squeezy cheese that comes in a tube, called Primula. It looks like this (this is one that - bizarrely - has prawns in it):

He was texting to tell me that his cat likes it, if you're interested. There's no accounting for taste, I suppose.

Anyway, point is, he spelt it Primular. (He has a non-rhotic accent, being from North-East England, so it sounds the same as the actual brand name.)

So is this spelling error due to the reanalysis of the name Primula as an adjective, primular? Would it be a type of cheese, primular cheese? I wonder what it means. It sounds a bit like rectangular, regular, circular; I wonder if it's to do with the shape?

Perhaps it's more linked to primary, and it is the supreme squeezy cheese, the squeezy cheese above all other squeezy cheeses. I wouldn't know, as there is no way in the world I'm going to eat that.

Monday 17 October 2011

Chinese show movement of travel right to left, and email is becoming unmarked

Just two things that I've noticed today, with no real comment added.

First, my attention has just been drawn by Valdemar over at The Door in the Wall to a blog called Ministry of Tofu. It covers all sorts of Chinese issues, and is written in an absolutely adorable style with ever-so-slightly misused metaphors and so on. For instance, in a story about a hitch-hiking student, a driver "let him into the truck with a grain of salt". Lovely.

Anyway, what struck me was this image of the student's journey. Simple enough graphic, but as the text explains, he started on the right and ended on the left. I find that to be the wrong way round, and presume that the difference is due to Chinese writing going the other way from English.

Secondly, I was looking for a contact address (postal) for a company. These things are hard to find on the company's website, so I Googled it. In a forum someone else was asking the same thing, for "an address" for the company, and they were given the postal address. They replied with thanks but said that they would prefer an email. So they meant email address when they said simply address. Normally I've found that to be the case only when the context of online communication is clear, not generally. So this is an interesting new development in what the unmarked sense of the word is. For me, it's still postal (I would use email if I wanted an email address) but for this person, email address has become the unmarked sense of address. The times they are a-changin', as Dylan is probably saying to himself as he reads this.

Saturday 15 October 2011

Texes? New plurals in English

I'm having some trouble with my mobile phone tariff at the moment, and I rang up the other day to ask how it's going. The woman I spoke to in the call centre had an accent from south-east England; let's say it was a London accent, though I can't really tell the difference between accents down there. She told me that she had added some texts on to my account, but she pronounced the word something like /'teksɪz/.

Thursday 13 October 2011

Word count shock - higher than expected

It's been a while since I've done a post about word counts. Since I moved here from Tumblr I've tried to be, you know, interesting, so there have been fewer posts about the trials of writing a PhD (which is what this blog began life as).

However, this new development is so shocking that it merits a post. I recently compiled all my comments on all my drafts, and taking a tip from Monica Macaulay's excellent Surviving Linguistics (for some reason not easy to get hold of anywhere but from the publisher), I printed them out and filed them. While doing this I added up all the word counts and it came to.... 37,774. This is astonishing, considering I would have said I had around 20,000, so I'm pretty chuffed with myself. However, I now have to write the other 40-odd thousand, so perhaps I should stop congratulating myself and get on with it.

Monday 10 October 2011

SFTY1ST in the N.T.


Superlinguo over at Tumblr has posted a list of number plates that police cars in Australia's Northern Territory will carry to promote road safety. Superlinguo says: 
You’ve gotta admit, these winners have used incredibly creative word combinations to create a message within a seven-character limit.
How many can you decipher? How does your brain deal with the lack of spaces, case markings and vowels?

•    SPDKILS
•    BUCKLUP
•    DRVSAFE
•    INDIC8
•    WATCHNU
•    PATROLN
•    NO2DUI
•    COPPA
•    N4SIR
•    BSAFE
•    YDUI
•    MYISONU
•    DNTSPD
•    RDSAFTY
•    BSAFEM8
•    KEEPLFT
•    SBRBOB*
•    NOFONE
•    NOSPEDN
•    CLKCLAK
•    NO*BUZE
•    YSPEED
•    TAKITEZ
•    BELTUP
•    DONTDUI
•    NOTXTN
•    SOBABOB
•    SLOWDWN
•    KPNSAFE
•    WATCHIN  
* We’ll give you a hint with this one, because we found it a bit obscure and had to look it up: Sober Bob is a long-running campaign by the Northern Territory police to discourage drink driving by urging people to organise their ride home before they go drinking, i.e. nominating a ‘Sober Bob’ option early, to make sure they get home safe. Thanks to Alice Springs resident Emily for forwarding us the list.
It's easy enough to work out what most of them mean, though doing it at speed with only a brief glance at the plate might prove harder (MYISONU would be particularly tough). Some of them are also more effective than others.

I especially like CLKCLAK, which I imagine is the equivalent of our 'clunk click' slogan, meant to represent the sound of a seatbelt buckle. That's a much better message than BUCKLUP or BELTUP, which are just simple instructions. CLKCLAK works because it's memorable, sticks in your head and you know immediately what it means.

Another personal favourite is BSAFEM8, perfectly reflecting the Australian dialect. Nowhere else in the world would you call anyone and everyone 'mate', especially from a policeman to a member of the public. We do have 'mate' as a generic term of address in the UK, though usually more among men, and in this case it's fine to use it with a stranger if you're both on an equal footing socially. It would be just about possible for a policeman to say it to someone they didn't know, but it would be for a reason, like if they wanted to calm them down and make themself more approachable. It can't be used as universally as it can in Australian English.

And finally, those SBRBOB and SOBABOB ones. As the note says, they're referring to a character called Sober Bob. What interests me is that there are two spellings, one reflecting a rhotic (r-pronouncing) pronunciation of 'sober' (SBR) and one not (SOBA). I think Australian English is generally non-rhotic, so I wonder if there are some varieties that are rhotic, or whether it's simply because of the spelling, and the R is not intended to be pronounced.

Friday 7 October 2011

Original OED

The Concise Oxford Dictionary celebrated its centenary a while ago, and to commemorate it the publishers issued this limited-edition print run of the first edition, from 1911.

The Concise Oxford Dictionary: The Classic First Edition

There's a nice blog post here about the differences between then and now, which as always make for a fun read. While it's quite nice to learn about the words that haven't made it this far (marconigram, kinematograph and so on), my favourite thing is the style of the older edition. It seems so quaint nowadays.

A terrier, for instance, is a
kind of active & hardy dog with digging propensity
And to greet is to
accost with salutation
And relatedly, the blog post notes that while the 1911 edition defined cancan simply as 'indecent dance', Percy Scholes, editor of the first Oxford Companion to Music (1938), wrote that
Its exact nature is unknown to anyone connected with this Companion.
Quite right too.

Tuesday 4 October 2011

Shallow Fry

Heh. The post title is a reference to Stephen Fry, so I will now probably receive hate mail for daring to criticise the sainted National Treasure. In his defence, before we start, he didn't write Fry's Planet Word, but he did put his name to it so he has to answer for it.

The programme mentioned above is a BBC2 series currently running, on language. The second episode is the one linked to above, and the first episode is here. I think the first episode was better, actually, but both have the same good and bad points. I'll begin with what's good about these programmes.

Saturday 1 October 2011

Elephants yell 'Bees!'

I don't have anything to add to this Language Log post on the news that elephants have an alarm call for bees. I just like the idea of elephants yelling 'BEES!' when they see bees. Apparently African bees are one of the few things that are dangerous to elephants (these are presumably the terrifying African killer bees that they every now and then tell us are headed for the UK). The bees can sting them round the trunk and eyes and as they travel and attack in swarms, can be very dangerous indeed. Elephants (African ones only, of course) have a specific alarm call that means 'BEES!'. Or possibly 'BEES!! Run away, run away!!'

Thursday 29 September 2011

Ban pens to improve language skills

Yes, I thought it was odd too. Apparently, according to this BBC article, lots of kiddiwinks (over 50% in socially deprived areas) start school without having developed the ability to speak in 'long sentences'. This is a slightly vague term, but the article claims that a class of 5- and 6-year-olds took six attempts to unjumble this sentence:
past the walked we shops
I wouldn't even call that a long sentence, so if this is true it's a bit worrying. The children, claims Wendy Lee from the Communication Trust, are only using short phrases and single words, and say things more typical of much younger children, such as:
went shops
The school in question is in Wythenshawe in South Manchester, but this applies to any area where there are high levels of social deprivation. Essentially, the kids aren't being talked to at home so they aren't developing language skills at the same rate as more well-off children.

So what about this no pens thing? The school, along with 99 others, is having a No Pens Day to try to encourage greater use of longer sentences. Sounds counter-intuitive, getting children not to write, but when you think about it, it makes sense. They're only young so when they're writing they're not using long sentences. And if they can't do it in speech, it's unreasonable to expect them to write lengthy accounts of a shopping trip. Instead, all the lessons on that day are discussion-based, encouraging the kids to talk more. The questions are open-ended rather than requiring single-word answers or short phrases. Who'd have thought it, wanting kids to talk more in lessons?

Tuesday 27 September 2011

This band are...

It's well known that some noun phrases that are grammatically singular but semantically plural (like the government, the staff, the band) can occur as the subject of either a plural or a singular verb form (with pronouns obligatorily matching the verb in number):
The government has said it will cut taxes.
The government have said they will cut taxes.
*The government has said they will cut taxes.
*The government have said it will cut taxes.
This is often said to be a US/UK thing, although you do hear both on both sides of the Atlantic. I noticed a restriction which I'm not sure I've seen discussed before, and that's how it works when you have a demonstrative (or a pseudo-demonstrative, a term I've just made up, for when a word such as this is used in a non-deictic way, as in "So I saw this band, called The Semantic Plurals, last night").

I think, and this is true for me though it may not be for everyone, that if you have a singular demonstrative determiner (this rather than plural these), you can't have a plural verb; it has to be singular to match the determiner:
??This band are going to be playing.
This band is going to be playing. 
But even more interestingly, you just can't have a plural demonstrative determiner - it's far worse:
*These band are going to be playing.
*These band is going to be playing.
So the semantic plurality of a noun can influence number on the verb, but not the determiner - the determiner has to match the grammatical number of the noun. This is presumably because the features percolate upwards and you'd have a clash at DP level if they didn't match.
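
For anyone who likes their generalisations spelled out, here's a minimal sketch of those judgements as a toy Python function. Everything in it is my own labelling, made up for illustration (the `COLLECTIVE_NOUNS` set, the `judge` function, the tiny word lists), and it only encodes my intuitions as described above, not a real grammar. It also leaves out the pronoun-matching part.

```python
# Toy model of the judgements above. All names here are hypothetical labels.
# Nouns that are grammatically singular but semantically plural.
COLLECTIVE_NOUNS = {"band", "government", "staff"}

DETERMINER_NUMBER = {"this": "sg", "these": "pl", "the": None}  # 'the' is unmarked
VERB_NUMBER = {"is": "sg", "has": "sg", "are": "pl", "have": "pl"}


def judge(determiner: str, noun: str, verb: str) -> str:
    """Return '' (fine), '??' (marginal for me) or '*' (ungrammatical)."""
    det_number = DETERMINER_NUMBER[determiner]
    verb_number = VERB_NUMBER[verb]
    noun_number = "sg"  # every noun in this sketch is grammatically singular

    # The determiner has to match the grammatical number of the noun.
    if det_number is not None and det_number != noun_number:
        return "*"

    # A plural verb is only available through semantic plurality...
    if verb_number == "pl":
        if noun not in COLLECTIVE_NOUNS:
            return "*"
        # ...and a singular demonstrative makes it marginal.
        return "??" if det_number == "sg" else ""

    return ""


for det, noun, verb in [
    ("the", "government", "have"),
    ("this", "band", "are"),
    ("this", "band", "is"),
    ("these", "band", "are"),
]:
    print(f"{judge(det, noun, verb)}{det.capitalize()} {noun} {verb} ...")
```

The point of writing it this way is that the determiner check comes first and never looks at semantics at all, whereas the verb check does.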

Monday 26 September 2011

Crowd-sourcing ancient transcription

There's a really great project going on which you can help with. Here's the text from the website telling you what it's about:

For classics scholars, the vast number of damaged and fragmentary texts from the waste dumps of Greco-Roman Egypt has resulted in a difficult and time-consuming endeavor, with each manuscript requiring a character-by-character transcription. Words are gradually identified based on the transcribed characters and the manuscripts' linguistic characteristics. Both the discovery of new literary texts and the identification of known ones are then based on this analysis in relation to the established canon of extant Greek literature and its lexicons. Documentary texts, letters, receipts, and private accounts, are similarly assessed and identified through key terms and names. Furthermore, an immense number of detached fragments still linger, waiting to be joined with others to form a once intact text of ancient thought, both known and unknown. The data not only continues to reevaluate and assess the literature and knowledge of ancient Greece, but also illuminates the lives and culture of the multi-ethnic society of Greco-Roman Egypt.
The data gathered by Ancient Lives will allow us to increase the momentum by which scholars have traditionally studied the collection. After transcriptions have been collected digitally, we can combine human and computer intelligence to identify known texts and documents faster than ever before. For unknown documents, we can isolate them and begin the long process of identification.
Like any other scientific project, the data will require a lengthy process of vetting and analysis. There are no quick answers or discoveries. We want to make sure our findings are accurate. However, instead of just a few scholars going through the collection one fragment at a time, users of Ancient Lives are allowing professionals to process large batches of data at any given time. These papyri, as owned and overseen by the Egypt Exploration Society, will then be published and numbered in the Society’s Greco-Roman Memoirs series in the volumes entitled THE OXYRHYNCHUS PAPYRI.
They're just getting lots and lots of people to transcribe the hard-to-read texts into digital text, so that they can read them much more quickly. And it doesn't matter if a few get it wrong, because there'll be enough that those are easily spotted and ignored. This is a brilliant use of crowd-sourcing for research purposes.
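
To see why a few wrong answers don't matter, here's a minimal sketch of the idea in Python. I should stress that I don't know how Ancient Lives actually combines its transcriptions; this is just a simple majority vote per character, the `consensus` function and the sample data are invented for illustration, and it assumes every volunteer typed the same number of characters.

```python
from collections import Counter


def consensus(transcriptions: list[str]) -> str:
    """Pick the most common character at each position across volunteers.

    Assumes all transcriptions have the same length (zip stops at the
    shortest one otherwise).
    """
    result = []
    for chars_at_position in zip(*transcriptions):
        most_common_char, _count = Counter(chars_at_position).most_common(1)[0]
        result.append(most_common_char)
    return "".join(result)


# Made-up example: four volunteers, one of whom misreads the third letter.
volunteers = ["λογος", "λογος", "λοyος", "λογος"]
print(consensus(volunteers))  # prints λογος
```

With only four volunteers one stray reading is outvoted three to one; with hundreds, the odd mistake barely registers.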

Thursday 22 September 2011

Words I have learnt

I always learn some new vocabulary when I'm at LAGB. Sometimes from the language tutorial (every year there's an in-depth look at an unfamiliar language), sometimes just from examples in the papers. Last year, for instance, I learnt that Swahili for lion is simba, presumably where the lovable yet headstrong (and somewhat dim) character in The Lion King gets his name.

This year I learnt that Turkish for man is adam. It's also the same in Hebrew, I think, as the name of Adam (the Biblical one) is supposed to be from Hebrew. People seem to disagree over what it means though - it also means red, like the earth (which Adam was supposedly made from). However, this new word is not surprising, only mildly interesting.

I also learnt the Tundra Nenets word for bread, which is na'an. Tundra Nenets is a Samoyedic Uralic language spoken in Russia. We get the word naan from Urdu (or Persian, according to the OED), which is a whole lot different from Nenets. I don't think there's been a lot of contact between northern Russian peoples and Urdu speakers, so what the heck is going on here? Could it be just coincidence?

Wednesday 21 September 2011

Is that a fish in your ear?


There's a new book out which I haven't read yet. However, that never stopped anyone posting an Amazon review, so I'll throw my thoughts into the pot. It's called Is that a fish in your ear?: Translation and the meaning of everything, by David Bellos. His son Alex wrote a book called Numberland, which I also haven't read but is always on Waterstone's featured displays.

I've got another book called The meaning of everything (which is excellent, by the way - by Simon Winchester, about the Oxford English Dictionary), so no points for the sub-title. Points for the title though, which references the Babelfish from Douglas Adams' Hitchhiker's guide to the galaxy.

There was an extract of this book featured in the Independent the other day, describing how Google Translate works. Google Translate is a much-mocked tool, and originally rightly so. It could be relied upon to give you absolute garbage, no matter what you put into it. Hours of fun could be had translating text from one language to another and back again, and sniggering at the Chinese whispers result. Even better fun if you put it through more than one language on the way. These days, however, Google Translate is disappointingly good. It gets translations pretty much completely accurate most of the time (NB It still should NOT be used to translate if you don't know the output language - you cannot guarantee it isn't utter nonsense).

The section featured in the Independent describes how it works. Here's an extract from the extract:
  
In fact, at bottom, it doesn't deal with meaning at all. Instead of taking a linguistic expression as something that requires decoding, Google Translate (GT) takes it as something that has probably been said before.
The corpus it can scan includes all the paper put out since 1957 by the EU in two dozen languages, everything the UN and its agencies have ever done in writing in six official languages, and huge amounts of other material, from the records of international tribunals to company reports and all the articles and books in bilingual form that have been put up on the web by individuals, libraries, booksellers, authors and academic departments.
It uses vast computing power to scour the internet in the blink of an eye, looking for the expression in some text that exists alongside its paired translation. Drawing on the already established patterns of matches between these millions of paired documents, Google Translate uses statistical methods to pick out the most probable acceptable version of what's been submitted to it.
This is fascinating, and obviously a good way to do it. After all, people do speak and write in fairly formulaic chunks a lot of the time. It's an efficiency device, so that we don't have to create new expressions from scratch all the time. This is why you get annoying cliches like at the end of the day and in any way, shape or form. It's also why you have standard greetings (how's it going) and ways of expressing yourself like I'm so sick of (X).

And as the author points out, human translators basically work this way too: they can often pre-empt the person they're translating and guess what will come next, based on frequently-used expressions. But this way of translating assumes that everything we say or write (or almost everything) has been said before. One of the first things we tell beginning linguistics students is that we can come up with a completely new sentence that's never been uttered before, and any speaker of English can understand it. The standard practice is then to come up with some ridiculous sentence, like All of my armadillos have been put through too hot a wash and have shrunk.

I suppose that, faced with this sentence, Translate would take its constituent parts and translate them. So, for instance, it might find the string too hot a wash, or even have been put through too hot a wash, paired with a translation, somewhere in its corpus.
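To make that guess concrete, here is a toy sketch of the idea: look for the longest chunk of the sentence that already has a paired translation, use it, and carry on from where it ended. This is only my simplification and certainly not how Google does it; a real statistical system weighs many candidate chunks by probability rather than grabbing the longest match. The function name `translate` and the little `phrase_table` are invented for illustration, and the French pairings are my own rough guesses, not Google Translate output.

```python
def translate(sentence, phrase_table):
    """Greedy longest-match lookup against a table of paired chunks.

    Hypothetical sketch: real systems score alternatives statistically;
    this version just takes the longest chunk it can find at each position.
    """
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest chunk starting at position i first, then shorter ones.
        for j in range(len(words), i, -1):
            chunk = " ".join(words[i:j])
            if chunk in phrase_table:
                output.append(phrase_table[chunk])
                i = j
                break
        else:
            # No chunk starting here is in the table: pass the word through as-is.
            output.append(words[i])
            i += 1
    return " ".join(output)


# Invented pairings, for illustration only.
phrase_table = {
    "all of my": "tous mes",
    "armadillos": "tatous",
    "too hot": "trop chaud",
    "too hot a wash": "un lavage trop chaud",
}
```

Given the string too hot a wash, the sketch prefers the four-word entry over the shorter too hot, and any word it has never seen paired with anything is simply left in English.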

In fact, I just tried it and it didn't fare so well. I put it through an English-French-English process and it came back with this translation:
All my tattoos have been too hot to wash one and have narrowed
If you fiddle with the alternate translations you get there eventually, though I'm not sure how idiomatic it is. Ah well. There's jobs for human translators yet.

If you're waiting for the paperback edition of this book, in the meantime I highly recommend Mouse or rat?: Translation as negotiation by Umberto Eco. I have read that one, and it's utterly engrossing.