Monday, March 27, 2017

Half of the world's languages are not Indo-European (but almost half of the population speak an Indo-European language)

Big Think recently published an article titled "Half of All Languages Come from One Root Language. How it Spread Is Something of Debate" by Philip Perry. This is a nice article that introduces the public to the stormy debate on the origins of the Indo-European language family. It is always good to see current research debates covered in the popular press!

The article informs the reader that "half of the languages spoken today by some 3 billion people come from a single root language". This is unfortunately misleading, so I'll provide some context here.

TL;DR Only 6% of the 7,000-ish languages alive today are Indo-European. However, 46% of people speak an Indo-European language. Languages are not evenly distributed across the population: half of the world's languages are not Indo-European, even though almost half of the world's people speak an Indo-European language.

The writer is referring to the Indo-European language family, the most studied and well-known language family in the world. The Indo-European family includes English, Hindi, Sanskrit, Greek, Swedish, Russian, Nepali and many more well-known languages. (It does not, however, contain the European languages Hungarian, Basque, Finnish or Maltese.)

While it is true that slightly less than half (46%) of earth's population speak an Indo-European language as their first language, it is not true that half of the world's languages are Indo-European. How can this be? Let's learn about the distribution of languages across the earth's population!

A language family is a group of languages that are hypothesized to share a common ancestor. Similarly to how groups of people can be traced back via DNA to common ancestors, linguists use vocabulary, sounds and other features of language to trace the history of languages. This is not exactly comparable to genetics, but the two approaches to learning about human history have their similarities.

Now, here are languages of the Indo-European family:
Capture of map of the distribution of Indo-European languages at
(NB that they include contact languages, like Nigerian Pidgin.) The colours represent sub-groupings.
And here's the world:
Map of languages of the world by Ethnologue, one dot per language.

Let's tease this out! How come the language family that is spoken by almost half the population does not represent half of the languages? Well, it all comes down to the fact that languages are not evenly distributed across the population. Most people speak one of a small set of very large languages, and a small group of people speak a lot of different languages!

The majority of the people of the world speak one of 9 languages: "Chinese"/Mandarin, English, Spanish, Russian, Hindi, Japanese, Portuguese, Bengali or "Arabic" (click here to understand more).

We have a great diversity of languages alive today, roughly 7,000 languages and 140-260 language families (depending on which historical linguist you trust). Most of the world's languages (3,517) are however spoken by fewer than 1,000 people each.

What about the families then? We can learn from the Ethnologue catalogue of languages that almost 89% of the world's population (5.9 billion people) speak a language from one of only 6 language families. Linguists think that there are between 141 and 260 language families in the world, so this is just a small subset of the total diversity of families (read more here). Below are numbers taken from the latest edition of Ethnologue*.

The 6 language families with the most speakers

Language family   Living languages              Number of speakers
                  Count   % of all languages    Total           % of all speakers
Indo-European       440    6.2%                 3,077,112,005   46.32%
Sino-Tibetan        452    6.37%                1,355,708,295   20.41%
Niger-Congo       1,526   21.5%                   458,899,441    6.91%
Afro-Asiatic        366    5.16%                  444,845,814    6.7%
Austronesian      1,224   17.24%                  324,883,805    4.89%
Dravidian            85    1.2%                   228,108,690    3.43%
Total             4,093   57.67%                5,889,558,050   88.66%

(It's worth noting that there is a language family that has more languages than Dravidian, but fewer speakers: the Trans-New Guinean language family has 478 languages, but "only" 3,553,780 speakers.)

While Indo-European is the language family with the most speakers, the reader will notice that it is not the family with the most languages(!). Only 6% of the living languages of the world are Indo-European (440/7,099). 6% is not half. 46% is almost half though!
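The comparison can be checked with a few lines of Python. This is only a sketch: the figures are copied from the table in this post, and the helper names are made up for illustration.

```python
# Figures copied from the table in this post (Ethnologue, as quoted here).
TOTAL_LIVING_LANGUAGES = 7_099

families = {
    # family: (living languages, speakers)
    "Indo-European": (440, 3_077_112_005),
    "Sino-Tibetan": (452, 1_355_708_295),
    "Niger-Congo": (1_526, 458_899_441),
    "Afro-Asiatic": (366, 444_845_814),
    "Austronesian": (1_224, 324_883_805),
    "Dravidian": (85, 228_108_690),
}


def language_share(count, total=TOTAL_LIVING_LANGUAGES):
    """Percentage of all living languages that `count` represents."""
    return 100 * count / total


def mean_speakers_per_language(count, speakers):
    """Average number of speakers per language within a family."""
    return speakers / count


for name, (count, speakers) in families.items():
    print(
        f"{name:14s} {language_share(count):5.2f}% of languages, "
        f"{mean_speakers_per_language(count, speakers):>12,.0f} speakers per language"
    )
```

The average hides a lot of variation within each family, but it makes the imbalance visible: Indo-European has few languages with very many speakers each, while Niger-Congo has many languages with far fewer speakers each.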

Of the people who speak an Indo-European language (46% of the world's population), most of them (54%) speak one of only 6 languages. The table below gives the 6 most populous Indo-European languages today and their speaker numbers in millions. 

Language Speaker population in millions
Spanish 437
English 372
Hindi 260
Bengali 242
Portuguese 219
Russian 154
Total 1,684
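As a quick check of that share, here is a small Python sketch. It divides the six-language total by the Indo-European speaker count from the family table earlier in this post; the variable names are only illustrative.

```python
# Speaker populations in millions, copied from the table above.
big_six_millions = {
    "Spanish": 437,
    "English": 372,
    "Hindi": 260,
    "Bengali": 242,
    "Portuguese": 219,
    "Russian": 154,
}

# Indo-European speakers from the family table earlier in the post, in millions.
INDO_EUROPEAN_SPEAKERS_MILLIONS = 3_077_112_005 / 1_000_000

big_six_total = sum(big_six_millions.values())
share = 100 * big_six_total / INDO_EUROPEAN_SPEAKERS_MILLIONS

print(f"{big_six_total:,} million speakers, {share:.1f}% of Indo-European speakers")
```

The result lands between 54% and 55%, in line with the figure quoted above.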

It's not only the six languages above that are very big: "Chinese"/Mandarin, Japanese, "Arabic" and Lahnda etc. are also massive (see more here). 80% of the earth's total population speak one of only 100 languages.

To learn more about how linguists classify languages and families, please see this previous post on practicalities of counting in the two catalogues of languages, Glottolog and Ethnologue.

Remember, there are roughly 7,000 languages out there. What about the others? Well, 6,660 languages in the world are in fact spoken by a total population of only 1.3 billion people. Most of them (3,517) are spoken by fewer than 1,000 people each. Most research is carried out on these massive families, but in order to understand human history, we're going to have to dig into the other languages and families as well!

Linguists do not rank languages by importance depending on how many speakers they have. Each language carries a full system of expression and can reveal the history of humankind, so they all need to be studied. A more diverse sample gives us a more accurate picture.

These languages with fewer than 1,000 speakers are increasingly losing ground, and we're losing the heritage of thousands of years of humankind's history at a rapid rate. Within 100 years, most of these languages will be gone. Let that sink in. Welcome to Monday.

Also, thanks Philip Perry for writing this article and giving us a chance to enlighten everyone's day with some language stats!


Here are two bonus maps that illustrate the above discussion that languages are not evenly distributed over the world's population. The first map shows languages per country, the second shows population per country. As you can see, the two maps are not the same.

Map from Worldmapper, where each country is scaled relative to how many languages it has.
© Copyright Sasi Group (University of Sheffield) and Mark Newman (University of Michigan).

Map from Worldmapper, where each country is scaled relative to how many people live there.
© Copyright Sasi Group (University of Sheffield) and Mark Newman (University of Michigan).
Simons, Gary F. and Charles D. Fennig (eds.). 2017. Ethnologue: Languages of the World, Twentieth edition. Dallas, Texas: SIL International. Online version:

* Ethnologue is not infallible and has its problems, but it is the most comprehensive freely available source of population statistics on languages that I know of. If you have any other suggestions, please do contact us.

Thursday, March 16, 2017

Neat map illustrations from new internet friend

We've made a new friend on the Internets (yes, the internet can be a great place to make new friends, it's not just arguing with Drunk Uncles and cats). The new friend is Stephan Steinbach from the blog Alternative Transport. His blog does very interesting work on data visualisations, in particular maps. Go check it out.

He recently made a post in relation to our old post about illustrating questions about linguistic diversity. For other posts from us with data visualisation/illustration, go here.

While we're on the topic of visualising geographic data (i.e. data somehow tied to a place, such as number of languages etc), go check out Speckman's lab in Eindhoven. They've made a neat kind of map visualization called "necklace maps". If you're really into map visualisation you can even play their game on making cartograms!

Friday, March 10, 2017

Podcasts of linguistic seminars from CoEDL

Happy news: it is possible to access lectures from the Centre of Excellence for the Dynamics of Language (CoEDL) on any podcasting app! Now you can also look this smart while listening to lectures about linguistic diversity on your normal podcast app!
For more on this map, go here.
The centre has long had lectures up at iTunes U, but it wasn't until now that I figured out how to get them out of the Apple bubble and into podcasting apps for Android etc. (i.e. the RSS URL)*. I tried it out yesterday and ended up listening to Russell Gray's talk on grand challenges of linguistics while grocery shopping at Aldi - a thoroughly pleasant experience that I wish you all! (I'm a podcast freak: every alone moment spent not working is spent listening to podcasts.)

The centre has several different programmes and projects, and the lectures are organised into different "courses" or "podcasts" accordingly. Below, I've included links to them all and some information about each podcast. 

In order to subscribe to these with your podcasting app, you need to enter the RSS URLs below under the options for "adding new podcasts". It's not very difficult, I promise. One of the most popular podcast apps for Android phones, "Podcast Addict", lets you do this easily (just look for the "add RSS feed" option under adding). Non-smartphone users, don't despair: it is also possible to listen to podcasts via desktop apps. If you need further assistance, comment here and we'll help.
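For the curious, here is a rough idea of what a podcast app does with an RSS URL: it downloads an XML document and reads out the episode titles and media links. The sketch below parses a tiny made-up feed with Python's standard library. The feed contents and the example.org addresses are placeholders, not the actual CoEDL feeds.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal RSS 2.0 feed used only for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example seminar series</title>
    <item>
      <title>Lecture one</title>
      <enclosure url="https://example.org/lecture1.mp3" type="audio/mpeg" length="0"/>
    </item>
    <item>
      <title>Lecture two</title>
      <enclosure url="https://example.org/lecture2.mp4" type="video/mp4" length="0"/>
    </item>
  </channel>
</rss>"""


def list_episodes(feed_xml):
    """Return (title, media URL) pairs for every item in an RSS feed string."""
    channel = ET.fromstring(feed_xml).find("channel")
    episodes = []
    for item in channel.findall("item"):
        enclosure = item.find("enclosure")
        # Items without an attached media file get None as their URL.
        url = enclosure.get("url") if enclosure is not None else None
        episodes.append((item.findtext("title"), url))
    return episodes


for title, url in list_episodes(SAMPLE_FEED):
    print(title, "->", url)
```

The `type` attribute on each enclosure is what tells a player whether an episode is audio or video.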

The lectures are from the public seminar series of the centre. Please note that when they are video feeds, you might not be able to play only audio as background (depends on your player).

Naturally, there are other institutions that offer lectures freely online via iTunes U, podcasts etc. Before I moved to Australia, I listened to the Linguistic Diversity-podcast/iTunesU by La Trobe for example (iTunes link & RSS link). This post is about the podcasts of CoEDL, since it's where I'm at currently and I just found this out, but I'd be happy to make posts about other seminar series in future. Leave recommendations in the comments! 


Language and society

Any analysis of language as a dynamic and evolving system must confront a self-evident truth: languages are spoken with intention by living, breathing human beings. The linguistic choices of individuals and groups point to the diverse ways that we understand our world and the complex meanings that we exchange with one another.



Language evolution

Just as Darwin showed that species are not fundamentally immutable, it is now known that languages continue to evolve over time, adapting to societies and their environments. In recent years a wide range of disciplines have contributed new ideas towards understanding the evolution of language. We are uniquely poised to build on these new initiatives to develop a general theory of language evolution. 

Language evolution operates over many levels and time-spans: from evolution of language as a communicative system, which took place over tens or hundreds of millennia, to evolution of specific languages across generations and within speech communities. Our research aims to link the cognitive capacity of individuals and how they process language to the use of language as a public and social product in a specific cultural and ecological context. This will therefore integrate our understanding with how language works at the level of the individual to the level of the community or nation.


This podcast is from the evolution program of the centre, more about that here.


Language learning

When children and adults learn a language, they engage with its internal complexity and varietal characteristics (see Shape), use human cognitive abilities for processing which change over their lifespans (Processing), inspire computational models of potential technological application to building adaptive systems that know how to learn, and contribute over generations to language change (Evolution).

The learning program integrates all these elements, but with a twist: it is putting a spotlight on how children and adults learn languages in contexts that are acutely under-researched, but which are of social, educational, and economic importance for Australia and its place in our region.


This podcast is from the learning program of the centre, more about that here.

Language processing

How does our language processing ability enable us to rapidly perceive, produce and understand language given the massive diversity observed across both speakers and languages?

We examine language processing in a breadth and depth previously unmatched in any one project. We will map processing at multiple levels of description and observation, in monolingual and multilingual individuals, and in typical and impaired populations. All these investigations will take place across a range of languages and dialects representing the unrivalled diversity in the Indo-Pacific region.


This podcast is from the processing program of the centre, more about that here.

Language shape (aka diversity, description and documentation)

How widely do languages differ, why do they differ, and what do these differences tell us about people and their diverse communicative needs? Currently only around 10-15% of the world’s 7,000 languages are well described, and many of the remaining 85-90% are highly endangered, including almost all of the languages of our region.

The Shape program is exploring the design space of language by investigating a strategic selection of little-known languages of our region. We will push forward efforts to document this language heritage by a broad range of methods, drawing on innovative approaches and technologies; building the first large corpora of Indigenous Australian and Papuan languages; and initiating new research on how intergenerational variation can reveal different design solutions evolving in languages to solve similar social communicative problems.


Language, technology & archiving

Research technologies in the language sciences are in a period of unprecedented development, and the judicious use of new technologies can result in rapid advances, even paradigm shifts in the nature and scope of research. Big data is being collected by citizens (crowdsourcing), while corpus visualisation techniques facilitate the modelling of language as it evolves. Technology has also become vital for the assessment of language and hearing, using eye tracking, ultrasound and/or iPad-based interactive activities. Perhaps the most common application of digital technology in language research is in archiving linguistic data in the form of acoustic recordings together with script-based lexical and interlinear analyses. Due to advances in computation, archives that were once regarded as simple repositories have now been repurposed as powerful tools for corpus analysis.



* It turns out that if you subscribe to an iTunes U channel in the iTunes app and you view it in your library and right-click the item in the list, you get the option "copy iTunes U URL" which is all you need to then feed into your podcasting app. For some weird reason, iTunes does not really explain that this is how you do this, and their Applecare customer support didn't even know this. But yeah, that's one way of getting the RSS URL. I've copy-pasted them in here for these podcasts for your convenience. 

Thursday, March 9, 2017

International Women's Day 2017

Last year, Hedvig wrote a post about the extremely prolific Joan Bresnan, co-founder of the formal grammatical framework Lexical Functional Grammar. Since the 8th of March is over in Hedvig's time zone, I thought I would make a small post with a few grammar goodies about mothers. Just because I happen to be one and being a mother is one of the important roles women take on.

from Bàsàá
híɣìí  m-ùràá      ɲ´ɛn               à-ŋ-gwés              wèè       mán.
every 1-woman 1EMPH.PRO 1.AGR-PRS-love 1.POSS 1-child
'Every woman loves her child.'

Hamlaoui, Fatima, and Emmanuel-Moselly Makasso. (2015). Focus marking and the unavailability of inversion structures in the Bantu language Bàsàá (A43). Lingua 1:35-64. p. 50.

from Tadaksahak
barr-én     i=yyasáf     s(a)        i=tǝ-keen(í)
child-PL  3p=prefer   COMP   3p=FUT-sleep
i=n           nan-én         ǝn       áaṣi-tan           ka. 
3p=GEN  mother-PL  GEN   belly.side-PL   LOC
'children prefer to sleep against the belly of their mothers.'

Christiansen-Bolli, Regula. (2010). A Grammar of Tadaksahak, a Northern Songhay Language of Mali. PhD dissertation, Leiden University. p. 210

from Siwu
Mmā ɔso ne, la losarɛ.
Because of my mother I am looking well.

Atsu, John. (2006) Siwu language learning course. Hohoe, Ghana: Volta Region Multi-Project. p. 53

from Tuwuli
a-ma                        lɛ-l-aa-do                                                     nɔ   fɔtɔ        o-sĩ
2SG.POSS-mother  NP.SUBJ.FOC-NEG.FUT-FUT-put.inside you message 2SG-refuse
'if your mother gave you an instruction, you wouldn't refuse (to do it)'

Harley, Matthew W. (2005). A descriptive grammar of Tuwuli, a Kwa language of Ghana. PhD dissertation, SOAS, University of London, p. 452.

from Mesqan
ɑj      dɑkko   tə-nə-tʃɛɲɲ               dɑkko-ɛɲɲɑ  ɑn-tə-tʃ’tʃ’ɑwɛt-Ø
CON mother  SUB-1P-come.JUS  mother-my      NEG-2-play.IPV-SM
ɑf-ɑhɛ                   jə-mɛst                    jɛ-bɑr-ɛt-e                 səlɛ-hɛnɛ 
mouth-your.2SM  3-horrible.IPV-SM  DAT-say.PV.3SF-1S  reach.PV.3S-be
ɑn-tətʃ’tʃ’ɑwɛt-Ø       jə-bbən-ɑ
NEG-2-play.IPV.SM  3-say.IPV.SM-3SF
‘He told her about what his mom told him about his rude language and that he was told not to speak.’

Getachew, Alemayehu. (2011). Mesqan folktales: A contribution to the documentation of the Mesqan language. MA thesis, Addis Ababa University. p. 48

from Lele (Chad)
na      me   lee          yé          dí-nì                           ná         me  jè        má          lay
HYP  2F   eat:FUT  mother  GEN:PL-1PL.EXCL  ASSC  2F  IMPF  die:FUT  also
'If you eat our mother, you will also die.'

Frajzyngier, Zygmunt. (2001). A grammar of Lele. Stanford: CSLI Publications. p. 186

See Hedvig's post again on why celebrating International Women's Day is important. Random lessons we can learn from these random grammar goodies: 1) it's OK to co-sleep, you are just listening to your child; 2) thank your mother for being a good person; 3) don't touch our mothers, you will suffer. Love and freedom to all!

Wednesday, March 8, 2017

Spurious correlations

*apologies for pay-walled links ahead*

I was first confronted by spurious correlations in language and culture during the EVOLANG 10 conference in Vienna in 2014, where I think I saw a poster on the relationship between tense marking and economic behaviour. If I remember correctly, this poster built on the famous findings of Keith Chen, who published a paper in 2013 on the relationship between obligatory future tense marking and various types of social and economic decisions people make.

The paper was very controversial before it was even published, with posts on Language Log (see links at bottom of the page for more posts), a reply on Language Log by Keith Chen, and a variety of media coverage that can be found here and here and here and here.

Chen found that speakers of languages lacking a distinct future tense "save more, retire with more wealth, smoke less, practice safer sex, and are less obese" (abstract). He explains his findings as follows: "[...] being required to speak in a distinct way about future events leads speakers to take fewer future-oriented actions. This hypothesis arises naturally if grammatically separating the future and the present leads speakers to disassociate the future from the present. This would make the future feel more distant, and since saving involves current costs for future rewards, would make saving harder. On the other hand, some languages grammatically equate the present and future. Those speakers would be more willing to save for a future which appears closer. Put another way, I ask whether a habit of speech which disassociates the future from the present, can cause people to devalue future rewards." (section 1).

At EVOLANG, I felt completely flabbergasted at these findings. How could such complex individual decisions, including on how much to save for later life, dietary habits, sex habits, and smoking habits, be related to a tiny aspect of the language that one speaks? It seemed completely unrealistic to me, although Chen (2013) goes through some efforts to explain the mechanisms through which this connection would work.

Save, but only if you speak German or another future-less language

Then in 2015, the Chen (2013) study was partly refuted by a follow-up study by Seán Roberts, James Winters, and Keith Chen. Lead author Seán Roberts wrote two blog posts about their study, found here and here. They raise several criticisms of Chen's (2013) paper, but focus on whether the correlation between lower savings and a distinct future tense remained when controlling for the historical relatedness of the languages included in the study. As it turned out, the correlation was no longer significant when the control was included. Their point is that both language and culture have to be considered in light of history: languages are likely to inherit a particular way of marking future tense from their ancestors, and populations are likely to have economic and dietary habits similar to the populations from which they descend. Once this genealogical signal was taken into account, the relationship between the predisposition to save less money and having obligatory future marking was no longer statistically significant.

(BTW, the person who is aiming to shed more light on this is Cole Robertson, who wrote a follow-up post on a new study on the topic of future tense and economic decision making.)

The problem I talk about here is not new. Seán Roberts and James Winters wrote an article in 2013 warning against spurious correlations between cultural traits - correlations between traits that are very likely accidents of history as there is no functional mechanism that could explain the association between the two types of behaviour. They illustrate the existence of spurious correlations by showing that these exist between morphological complexity and having a siesta or not, and between the presence of acacia trees and tone languages within countries. The problems they identify include the following:

1. Galton's problem: the need to control for historical relatedness and diffusional associations in order not to overestimate the number of independent datapoints;
2. Distance from data: In many cases, the data on a language has been collected by one individual and this data is subsequently categorised into (sometimes coarse) variables, creating a distance between the dataset and reality;
3. Inverse sample size problem: given that culture data is often incomplete, complex, and based on inconsistent data, the noise-to-signal ratio increases rather than decreases in larger datasets.

They warn that correlational studies should bear in mind a realistic hypothesised mechanism for the correlation, and should attempt to control for alternative explanations, especially those relating to diffusion and historical descent. Here are some cool pictures bringing across this point, and there is far more on spurious correlations on Replicated Typo.
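To get a feel for why Galton's problem matters, here is a small simulation sketch in Python. It is not the method used by Roberts & Winters or by any of the studies discussed here. It invents two traits that have nothing to do with each other but are inherited within language families, and then counts how often a naive test that treats every language as independent flags a "significant" correlation. Comparing family means is used as a deliberately crude stand-in for a proper genealogical control. The number of families, the amount of drift and the critical values are all assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_languages(rng, n_families=5, langs_per_family=40, drift=0.1):
    """Two mutually independent traits, each inherited from a family ancestor."""
    x, y, family = [], [], []
    for f in range(n_families):
        ancestor_x, ancestor_y = rng.gauss(0, 1), rng.gauss(0, 1)
        for _ in range(langs_per_family):
            x.append(ancestor_x + rng.gauss(0, drift))
            y.append(ancestor_y + rng.gauss(0, drift))
            family.append(f)
    return x, y, family


def family_means(values, family):
    """Collapse languages to one data point per family."""
    groups = {}
    for v, f in zip(values, family):
        groups.setdefault(f, []).append(v)
    return [sum(g) / len(g) for _, g in sorted(groups.items())]


# Approximate two-tailed 5% critical values of |r| for n = 200 and n = 5.
R_CRIT_N200 = 0.139
R_CRIT_N5 = 0.878


def false_positive_rates(n_runs=500, seed=1):
    """Share of runs flagged as significant, naive vs. one point per family."""
    rng = random.Random(seed)
    naive_hits = controlled_hits = 0
    for _ in range(n_runs):
        x, y, family = simulate_languages(rng)
        if abs(pearson(x, y)) > R_CRIT_N200:
            naive_hits += 1
        if abs(pearson(family_means(x, family), family_means(y, family))) > R_CRIT_N5:
            controlled_hits += 1
    return naive_hits / n_runs, controlled_hits / n_runs


naive, controlled = false_positive_rates()
print(f"naive false positives: {naive:.0%}, one point per family: {controlled:.0%}")
```

With 200 languages that really amount to only 5 independent histories, the naive test flags a correlation in most runs even though none exists by construction. Treating each family as a single data point brings the rate back to roughly the nominal 5%, at the cost of throwing away a lot of data, which is why more refined controls are preferable in practice.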

Use trees!

However, the story does not end here. To my surprise, there are A LOT of papers that report correlations between aspects of language and sociological aspects of culture that seem far-fetched, and clearly their authors have not read Roberts & Winters (2013). I am not talking about the overview provided by Ladd et al. (2014), blogpost here, who present a review of studies that look at correlations between languages and non-linguistic forces acting on language, including variables such as the number of second-language speakers and the type of climate. The kind of demographics that they cover in their review make much more sense to me: population size, for instance, should be expected to have an effect on certain aspects of language as it makes a huge difference whether you speak your language within a village of 100 people, or within a state with 100 million people.

I am talking about studies like the following. In 2013, Santacreu-Vasut and colleagues published a paper entitled "Do female/male distinctions in language matter? Evidence from gender political quotas" in Applied Economics Letters. This paper investigates the correlation between an index of gender variables from the World Atlas of Language Structures and the presence of a legislated quota of female members in the lower house of parliament. They find that such a correlation exists, in their words: "Countries with a higher emphasis of female/male distinctions in their dominant language (higher GII) are therefore more likely to regulate women's political participation." (p. 497). I am not even going to get started on the gender index they used, which does not adequately measure whether languages make a male/female distinction in their grammar - I am just going to say that the claim is very likely to be a spurious one. The data on gender quotas indicates that most European countries as well as many sub-Saharan African countries have gender quotas - both areas are 'gender hotbeds' (Nichols 1992: 132), as gender is both areally and phylogenetically stable in these places. So the national languages of these countries are probably driving the effect found by Santacreu-Vasut et al., making the finding an historical accident rather than a true link between gender and legalised political engagement of women.

Unlike the data on propensity to save money used by Chen (2013), the data is collected on a country level, taking the 'most spoken' language as the appropriate variable to associate with country-level variables on political participation, the Human Development Index, and the number of years since women were first allowed to run for election. I have a huge problem with this. Most countries harbor speakers of tens or hundreds of different languages (remember the map on number of languages per country?), although most have a limited set of national languages. It is completely inappropriate to set any national language to be THE language associated with country-level demographics and to present the resulting correlations as being in line with theory on language shaping cognition.

Non-gray countries have some form of legislated candidate quotas, from

Things would not be so bad were it not that, as Roberts & Winters (2013) say in their paper, "Since some of these studies are receiving media attention without a widespread understanding of the complexities of the issue, there is a risk that poorly controlled studies could affect policy." Santacreu-Vasut et al. (2013) has been cited over 20 times so far. I simply do not have the guts to look in detail at all of these, but some of them are:

Lucas van der Velde et al. (2015) 'Language and (the estimates of) the gender wage gap', from the conclusion: "We hypothesized that in countries where language has a more marked distinction between genders, differences in labor market outcomes will be larger. [...] The results robustly confirm the hypothesis." and explicitly on policy: "From a policy perspective, the major message of our study is that gender wage gap may be driven by some deep societal features stemming from such basic social codes as language. This suggests that if reducing GWG [gender wage gap, AV] was a policy objective, education on gender equality is needed already at early stages of education, when language characteristics are absorbed by children and translated into societal norms."

Hicks et al. (2015) 'Does mother tongue make for women's work? Linguistics, household labor, and gender identity'; from the abstract: "We use a novel approach relying on linguistic variation and document that households with individuals whose native language emphasizes gender in its grammatical structure are significantly more likely to allocate household tasks on the basis of sex and to do so more intensively."

Davis and Reynolds (2016) 'Gendered language and the educational gender gap'

Shoham et al. (2017) 'Encouraging environmental sustainability through gender: A micro-foundational approach using linguistic gender marking'

Malul et al. (2016) 'Linguistic gender marking gap and female staffing at MNC’s'

Seán Roberts also commented on an earlier paper by some of the authors featured above here. At the bottom of this post he says: "To put it cynically, it’s as if gender inequality is only due to humans being slaves to their language, rather than centuries of active patriarchal societies. The hypothesis doesn’t seem to have a good reason why distinctions in gender should disfavour women over men. Perhaps most disturbing is the authors’ clear appeal for these findings to be used in policy"

Indeed, this is scary stuff.

A more popular paper, albeit with perhaps less dire consequences for policy, is Kashima and Kashima (1998), who find a positive correlation between pronoun drop, a grammatical phenomenon where pronouns can be left out of otherwise well-formed sentences, and collectivism, a tendency for society to look after in-group members. As was the case for Santacreu-Vasut et al. (2013), the unit of analysis is the country, with the national language taken to supply information about pronoun drop. This paper has been cited over 300 times and seems to be in high standing in cross-cultural psychological research, as it is being cited in handbooks and otherwise mostly in papers that do not pertain to this exact hypothesis, suggesting its results are taken as a given. There is also a follow-up by the same authors, Kashima and Kashima (2003), including a larger set of variables on economy, climate, and geography, and an erratum to the 1998 article from 2005.

This map was made by Gert Jan Hofstede, son of Geert Hofstede, both famous social scientists focusing on cross-cultural differences (see

The established nature of these types of papers is underlined by the fact that potentially spurious correlations are not confined to specialist journals such as Journal of Cross-Cultural Psychology and Applied Economics Letters. Science published a paper in 2014 by Talhelm et al. entitled 'Large-Scale psychological differences within China explained by rice versus wheat agriculture'. From the abstract: "We tested 1162 Han Chinese participants in six sites and found that rice-growing southern China is more interdependent and holistic-thinking than the wheat-growing north." See a popular write-up of the paper here and here.

Luckily, we can count on the absolute hero Seán Roberts to provide a commentary. See here for a blog write-up. Roberts shows that it is very likely that at least part of the correlation reported by Talhelm et al. can be explained by linguistic history, again showing the need for a control for cultural contact and genealogy.

I am sure there are more papers in this vein - please comment if you know of any! I am of half a mind to track them all down, get their data, and show that most if not all of these relations disappear when controlling for genealogical descent and/or diffusion. On the other hand, clearly this would be a waste of time as the premise of most of these papers is that the national or 'most used' language of a country can be of influence on population-level findings on social and economic behaviour, a premise which I think is inherently flawed. The mechanism behind such relationships is simply unfathomable, despite the efforts taken by the authors of these papers to demonstrate the contrary. I am thinking of the level of multilingualism in most countries, let alone the number of different communities with different psychological and economic behaviours.

Turns out that the paper on rice and wheat agriculture by Talhelm et al. (2014) is in fact an improvement on at least this issue, as it looks in detail at agricultural practices and psychological measures within a set of provinces in one country, China. However, it still doesn't account for the contingencies that arise when comparing communities that are both closely related and in close contact.

There really is no longer any excuse for not incorporating information on geographical distance and genealogical descent in cross-linguistic and cross-cultural analysis. There are reference trees available for all language families in several typological and reference databases (Glottolog, Ethnologue, AUTOTYP, WALS), which the awesome Dan Dediu has made extremely easy to use, as explained here, and there are also Bayesian posterior samples of phylogenetic trees for over 10 language families. On Glottolog, the latitude and longitude of speaker populations are freely available. D-PLACE has cultural variables for many societies around the world linked to language and various phylogenetic trees. All the people that have worked on potentially spurious correlations in language and culture know how to use statistics. Well, great! You can use multilevel models and regressions that correct for genealogical relatedness and geographic distance and do things the way you should be doing things.
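To make the two controls a bit more concrete, here is a minimal sketch in Python of how one might prepare them before fitting any model: for every pair of languages, is it in the same family, and how far apart are the speaker populations? The language names, family labels and coordinates below are made-up placeholders, not real Glottolog entries, and the function names are my own. This is only the data preparation step; it is not the multilevel model itself.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical records: (name, family, latitude, longitude).
# These values are invented for illustration only.
languages = [
    ("lang_a", "family_1", 10.0, 20.0),
    ("lang_b", "family_1", 12.0, 21.0),
    ("lang_c", "family_2", -5.0, 140.0),
]


def pairwise_controls(records):
    """For each pair of languages, record shared family and distance in km."""
    rows = []
    for (n1, f1, la1, lo1), (n2, f2, la2, lo2) in combinations(records, 2):
        rows.append({
            "pair": (n1, n2),
            "same_family": f1 == f2,
            "distance_km": haversine_km(la1, lo1, la2, lo2),
        })
    return rows


for row in pairwise_controls(languages):
    print(row["pair"], row["same_family"], round(row["distance_km"]))
```

In a real analysis the family label would typically become a grouping factor (for example a random effect) and the distances would feed into a spatial term, but how exactly to do that depends on the model and the software you use.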

Vortices, Principia Philosophiae, René Descartes, 1644.

EDIT: Seán Roberts alerted me to the newly published The Palgrave Handbook of Economics and Language, with a paper by Nigel Fabb that is critical of these studies entitled "Linguistic Theory, Linguistic Diversity and Whorfian Economics". His main critiques are 1) the linguistic data is simplified to such an extent that it may no longer represent linguistic facts; and 2) the studies do not present a demonstration of causation despite their claims, and need to present a far more rigorous case both in theory and in experimentation. I am really happy this critical paper was published in a venue which will surely be read by economists, yay!

Selected references

Chen, M. K. (2013). The Effect of Language on Economic Behavior: Evidence from Savings Rates, Health Behaviors, and Retirement Assets. American Economic Review 103.690–731.

Kashima, E. S. & Kashima, Y. (1998). Culture and language: The case of cultural dimensions and personal pronoun use. Journal of Cross-Cultural Psychology 29.461-486.

Kashima, Y. & Kashima, E. S. (2003). Individualism, GNP, climate, and pronoun drop: Is individualism determined by affluence and climate, or does language play a role? Journal of Cross-Cultural Psychology 34.125-134.

Ladd, D. R., Roberts, S. G. & Dediu, D. (2014). Correlational studies in typological and historical linguistics. Annual Review of Linguistics 1.221-41.

Nichols, J. (1992). Linguistic diversity in space and time. Chicago: University of Chicago Press.

Roberts, S. G. & Winters, J. (2013). Linguistic diversity and traffic accidents: Lessons from statistical studies of cultural traits. PloS One 8.e70902.

Roberts, S. G., Winters, J. & Chen, K. (2015). Future tense and economic decisions: Controlling for cultural evolution. PloS One 10.e0132145.

Santacreu-Vasut, E., Shoham, A. & Gay, V. (2013). Do female/male distinctions in language matter? Evidence from gender political quotas. Applied Economics Letters 20.495-98.

Tuesday, March 7, 2017

Listen to the world's languages pt 4!

There are a lot of websites around where you can literally listen to the diversity of the world's spoken languages. We've covered a few of them before under the tag Listen To The World's Languages.

Since we last made a post, we've found some more sites for you to enjoy. Just to reiterate here's a list of the sites we already mentioned before:
Now, to new additions in this category. We've got six for you this time, all quite spectacular! 

Radio Garden

Radio Garden is a Transnational Radio Knowledge Platform traveling online exhibition designed by Amsterdam-based Studio Moniker and developed by the Netherlands Institute for Sound and Vision. It’s a project that aims to connect people worldwide through shared experiences.

This is a really cool thing, I'm thoroughly enjoying this. You can literally listen to radio stations from all over the world, both speech and music. Very cool, very good art project.

Interactive map of languages of the International Phonetic Association

This might not sound as cool, but trust me it is. The journal of the International Phonetic Association has a standard format for people to submit phonological descriptions of languages in. It requires recordings of certain words, minimal pairs and a certain story (the north wind & sun). These language illustrations are mapped out here, so you can click and listen to them! Marija Tabain from La Trobe University in Australia and colleagues developed this map. It's neat because it features many lesser known languages and the actual scientific publication associated with the phonological description. I took particular pleasure in finding Nen, one of the languages people here at ANU work on (in particular my PhD supervisor Nick Evans).


Localingual

This site is quite similar to Language Landscape: it lets you upload audio samples that are geo-tagged, and then you can browse the world map and listen to other clips. It's very nice, but maybe not so new. Sorry. It's got more than 18,000 clips, which is more than Language Landscape's 708 recordings. I must confess though, besides the size I don't really understand what the big differences are between these two sites. Language Landscape has you log in, and Localingual does not require that. That has advantages (more people submit) and disadvantages (more crap submitted).

Both are cool, go to both.

**EDIT** I hadn't noticed the up and down voting at Localingual until after writing this. That is very good, and it gets rid of a lot of the troubles people have been raising with the site. Thumbs up!!

These next three are databases of languages of a different kind: they contain material collected in various research projects, primarily on endangered languages. It's not at all as easy to just click and listen, unfortunately, but they do contain more material and more rigorous scientific description of the languages and material!

Endangered Languages Archive (ELAR)

ELAR is an archive for endangered languages hosted at the School of Oriental and African Studies in London. It currently contains 517 languages and is a very important institution.

Pacific and Regional Archive for Digital Sources In Endangered Cultures (PARADISEC)

PARADISEC is a database of material on endangered languages, with primary focus on the Asia-Pacific region. It features 800 languages currently, both text and audio material. It is the #1 site for language material in this region.

Dokumentation bedrohter Sprachen (DOBES)

DOBES contains 68 languages and is a project for documenting endangered languages sponsored by the Volkswagen Foundation and hosted by MPI-Nijmegen.


We hope you enjoyed that. If you've got suggestions, be sure to comment or tweet at us.