If it doesn’t make sense in light of all the facts, it doesn’t make sense

Bayesian statistics has made The New York Times: The Odds, Continually Updated. One illustration of the utility of Bayesian methods left out of the piece is phylogenetics; MrBayes is an example. Just to see how far we’ve come, I like to retell a story from a professor of mine. When he was in grad school ~15 years ago, it was assumed that they’d never be able to implement even the far less exhaustive maximum likelihood method due to limitations of computational power.

But the reason I want to highlight this article, aside from that it is a good article overall, is this section:

Take, for instance, a study concluding that single women who were ovulating were 20 percent more likely to vote for President Obama in 2012 than those who were not. (In married women, the effect was reversed.)

Dr. Gelman re-evaluated the study using Bayesian statistics. That allowed him to look at probability not simply as a matter of results and sample sizes, but in the light of other information that could affect those results.

He factored in data showing that people rarely change their voting preference over an election cycle, let alone a menstrual cycle. When he did, the study’s statistical significance evaporated. (The paper’s lead author, Kristina M. Durante of the University of Texas, San Antonio, said she stood by the finding.)
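To make the logic of that re-evaluation concrete, here is a minimal sketch of a Bayesian update with a normal prior and a normal estimate. Every number in it is hypothetical: the standard error and the prior width are assumptions chosen for illustration, not figures from the study or from Dr. Gelman's analysis. It only shows how a skeptical prior ("people rarely change their vote") pulls a noisy estimate toward zero.

```python
import math


def normal_posterior(prior_mean, prior_sd, estimate, std_error):
    """Conjugate normal-normal update: returns (posterior mean, posterior sd)."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / std_error ** 2
    post_precision = prior_precision + data_precision
    # The posterior mean is a precision-weighted average of prior and data.
    post_mean = (prior_precision * prior_mean + data_precision * estimate) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


# Hypothetical inputs, all in percentage points (assumptions, not study values).
STUDY_ESTIMATE = 20.0   # the headline effect
STUDY_STD_ERROR = 8.0   # assumed noise of a small sample
PRIOR_MEAN = 0.0        # prior belief: vote preference barely moves
PRIOR_SD = 2.0          # assumed plausible size of any real shift

post_mean, post_sd = normal_posterior(PRIOR_MEAN, PRIOR_SD, STUDY_ESTIMATE, STUDY_STD_ERROR)
lower = post_mean - 1.96 * post_sd
upper = post_mean + 1.96 * post_sd
print(f"posterior mean: {post_mean:.1f} points, 95% interval: ({lower:.1f}, {upper:.1f})")
```

With these made-up inputs the estimate is "significant" on its own (20 is 2.5 standard errors from zero), but once the prior is included the posterior mean shrinks to about one point and the interval spans zero.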

Cairo University class of 1978, no headscarves

There are plenty of sexy papers published on ovulation and menstrual cycles and how they correlate with a particular outcome. If you look for enough correlations, some will come up significant by chance alone (assuming you use p = 0.05). That’s common sense. But another issue to consider here is that you have a model, and the predictions that the model makes don’t hold in a rather simple case where the cause and effect seem obvious (i.e., voting patterns should change over time since ovulation changes over time). This sort of sanity check is important when you go dredging through statistics, but also when you are tackling complex phenomena at a high level.
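The point about looking for correlations at p = 0.05 can be put in numbers. This is a minimal sketch, assuming the tests are independent and that every null hypothesis is actually true; real analyses are messier, and the function name is mine.

```python
def chance_of_false_positive(num_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of num_tests independent tests of a
    true null hypothesis comes out 'significant' at threshold alpha."""
    return 1.0 - (1.0 - alpha) ** num_tests


# How quickly spurious "findings" become likely as you test more outcomes.
for k in (1, 5, 20, 60):
    print(f"{k:>2} tests -> {chance_of_false_positive(k):.1%} chance of a false positive")
```

Under these assumptions, twenty looks at pure noise give roughly a 64 percent chance of at least one publishable-looking result.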

Whenever I talk about Islam, people offer up opinions about how the Koran serves as a sort of template or guidebook in terms of behavior, and that this explains Islamic civilization’s real pathologies. This is not an implausible model, and I held to it myself when I was younger.

Cairo University class of 2004, lots of headscarves

There are two problems. First, most of the people making this assertion don’t know enough history or religion to even plausibly evaluate the model in their own head. That’s just a fact, and why I’m so dismissive of so many people. The limits of your knowledge are the limits of your model building. Second, there’s a deeper issue which I first encountered in the mid-aughts: there’s a good deal of evidence from cognitive psychology that people barely understand the ‘messages’ of their scriptural texts in a coherent manner. This is outlined in books like In Gods We Trust: The Evolutionary Landscape of Religion. One also has to remember that for almost all of human history, and to a great extent today, most humans were either illiterate or functionally so. More importantly, psychological experiments which attempt to ferret out exactly how scriptural texts would impact people’s beliefs show that there’s no real ratiocination going on. Rather, it seems that reasoning in a religious context is a process of collective rationalization.

Classical evolutionary biology

When it comes to a field like genomics there’s really no point in reading a textbook beyond the elementary level because it’s moving so quickly that things get out of date within the year. But that’s not always the case. I quite like Dover Books for their math section. Math is true, even if it was written in 1920 (the prose is often a bit stilted, but that’s not what you’re focusing on in any case). Evolutionary biology is somewhere between genomics and math. There is much that gets out of date rapidly, as science proceeds, but there is a broader scaffold which remains true no matter the passing decades. For intelligent lay persons who are interested in evolution my own suggestion is to just read The Origin of Species, and then R. A. Fisher’s The Genetical Theory of Natural Selection. After that popular works by Richard Dawkins make a lot more sense.

Only text on evolutionary biology I've ever read (as opposed to evolutionary genetics)

But why stop there? On Twitter there is a hashtag, #EvoBioClassics, which will be of interest to anyone who wants to understand evolutionary biology. These are papers which are highly cited and referred to, but often not read as much as they should be. One of the great virtues of them being classics is that you can usually find ungated versions of the paper somewhere.

Here are my suggestions:

Fisher, R A. 1918. “The Correlation Between Relatives on the Supposition of Mendelian Inheritance”. It reconciled genetics and evolutionary biology.

Price, George 1970. “Selection and Covariance”. It’s a paper that W. D. Hamilton would have wanted to write. It condenses a lot of concepts into a very elegant form.

Trivers, R. L. 1971. “The Evolution of Reciprocal Altruism”. This is a term that’s used a lot. So it benefits one to go back to the source.

In any case, check out the hashtag. Also, this page has a great list.

George F. Will, American atheist

Several years ago George Will declared that he was an agnostic on the Colbert Report. Last week he pulled no punches:

RCR: Do you believe in God?

GW: No. I’m an atheist. An agnostic is someone who is not sure; I’m pretty sure. I see no evidence of God. The basic question in life is not, “Is there a God,” but “Why does anything exist?” St. Thomas Aquinas said that there must be a first cause for everything, and we call the first cause God. Fine, but it just has no hold on me.

RCR: Were you raised with any religion?

GW: My father was the son of a Lutheran minster, and therefore he was an atheist. What I mean by that is — he went to so many church services, his father preached in many churches up near Antetum, eastern Ohio, Pennsylvania — my father had had his full of religion. He used to sit outside his father’s study and listen to him wrestle with members of the church over reconciling grace and free will. I think that’s where my father got his interest in philosophy.

I majored in religion in college. I was very interested, but I just came to a different conclusion. I’m married to a fierce Presbyterian and she raised our kids fierce Presbyterians.

I’m an amiable, low-voltage atheist.

RCR: Does that present a problem for you as a conservative?

GW: No. The Republican Party’s base is largely religious. It would be impossible for me to run for high office as a Republican. Since I have no desire to run for office, it’s a minor inconvenience! I think William F. Buckley put it well when he said that a conservative need not be religious, but he cannot despise religion. Russell Kirk never quite fathomed this, which is one of the reasons why I’m not a big fan of The Conservative Mind. For him, conservatism without religion is meaningless.

RCR: Your friend Charles Krauthammer likes to say he’s an agnostic.

GW: I think he’s an atheist. He flinches from saying it. I saw what he said: “I don’t believe in God, but I fear him greatly.” Oh, please!

Reality has a bias against your bias

It has come to my attention that Bill Maher has been making some pronouncements against Islam, and this has resulted in fierce blowback from the likes of Reza Aslan. Normally I don’t follow Maher too closely. I used to watch his show, Politically Incorrect, back in the 1990s, and though he had his moments of wit, humor, and insight, by and large his stock in trade was superficial buffoonery. So I generally do not pay attention to him. More recently he’s been espousing views which make him a fellow traveller with New Atheists. As a disagreeable person who enjoys some biting polemic I do appreciate the New Atheists for the role they play in the ecology of ideas. They do not hide behind the post-modern fixation on “tolerance” and “diversity.” But my ultimate judgement about them is that their foundational propositions about human nature are wrong. In other words, I stand with cognitive anthropologists such as Scott Atran as to the roots of religion. Though in The God Delusion Richard Dawkins exhibits some familiarity with this literature, in the end his rhetoric and central thesis seem to take it for granted that religion is a contingent cultural invention, and adherence is a feature of improper implementation of the principles of rationality. My own position, in line with cognitive anthropologists, is that supernatural ideas are relatively inevitable human intuitions given the architecture of our minds, which are far less dominated by the ability to reflexively reason than 18th century rationalists would have believed. The more elaborate specific institutional aspects of religion are also probably rather inevitable given the needs of mass society after the Neolithic Revolution. In other words, telling people to stop being stupid probably won’t have the effect that the New Atheists think it should. People are just…well, stupid.
I do have to admit that there seems to be a bit of irony in this, insofar as the New Atheists promulgate a world-view predicated on adherence to the empirical facts, but have the normal human bias to discount those data which conflict with their prior model.

But this is not just an issue with New Atheists. Many who disagree with the New Atheists on the cultural Left seem averse to grappling with the empirical facts when it comes to Islam, where because of the New Atheists’ lack of interest in social conformity they express truths as if they’re the child who sees the naked emperor. Richard Dawkins regularly makes bold and laughable assertions, outrunning his own knowledge base whenever he talks about things not biological. But sometimes those who rebut his claims also outrun the facts in their eagerness to “debunk” his unpalatable views. About a year ago I got into a Twitter conversation with financial journalist Heidi Moore, who basically decided that she had to correct my misguided views about Islam. Though I agreed that Dawkins’ contentions were rather excessively general and deterministic, I believed her own apologia for Islam was based on just as rickety a factual foundation. Somehow in the wake of 9/11 American liberals, and to a lesser extent the mainstream more generally, have transformed themselves into Hujjat al-Islam, or “Proof of Islam,” whenever confronted with “ignorance.” The curiosity here is that yes, their interlocutors are expressing ignorance. But in their rebuttals there is also a great deal of ignorance.

In the exchange above Bill Maher, in contrast, has clearly done his homework. The majority of the world’s Muslims hold quite illiberal views. Not all Muslims. And there are regions where Muslims hold views in line with Christian societies which have undergone secularization. But overall Pakistan is closer to the central tendency than Bosnia, not least because there are nearly 200 million Pakistanis today. You can read the Pew survey which Maher referenced yourself; it’s been out for years.* He’s clearly conversant with the details. The usual rejoinder from liberals out to the mainstream is “but Christians too….” Maher points out that this sort of equivalence is just not plausible. Rather, it’s a ploy. No ex-Christian atheist fears for their life, though they may experience social ostracism.

The flip side of this of course is that some Christian conservatives and New Atheists argue for a Platonic and fixed character for Islam. For the New Atheists this follows from their thin and spare model of religious belief, which derives from elementary axiomatic errors. For many Christian conservatives it is derived from their religious beliefs, which they assume to be true. Islam, being false, is always going to be false. But taking a step back from the perspective of someone who believes all religions are fictions, and accepts a model of more cognitive and cultural complexity, it seems striking exactly how pliable religion itself is. If you read The Northern Crusades (against the pagan Balts) you may be struck by the similarities to the behavior of the Islamic State. And you don’t need to go back nearly 1,000 years, the Thirty Years War is more than sufficient in terms of barbarity. Religions are not special creations of god, they evolved from the history and minds of men.

It is true that not all Muslims present views which make one recoil. The problem is that in places like Pakistan enough do that if you violate the blasphemy law you may be killed rather quickly by those who have a less broad perspective. Even in Turkey, which is on the more liberal side in regards to religion, the ascendant Islamists have conservative views which lead them to chide women laughing in public. Depending on your views of the term “bigot” it is or isn’t bigotry to assert that the majority of the world’s Muslims are deeply illiberal, so it is not entirely surprising that atavistic neo-medieval violence periodically explodes out of the nether regions of the faith. But, it is also critical to question whether Islam is constitutionally so. Being that it is made up, like all other religions, I am quite skeptical of that. So there is hope if one keeps the faith that what goes down must eventually come up.

Does, on the whole, Bill Maher express obnoxious and superficial opinions? Probably, from what I’ve seen and heard. But the evidence above suggests that he’s not constitutionally incapable of honest insight.

* By and large Iraqi Shia are actually rather conservative in the broader Muslim world. I wonder if the low support (relatively) for the death penalty for apostates is a function of the rise of sectarian violence in the mid-2000s, where they saw exactly where a proliferation of takfiris leads.

DNA, history, and genealogy

In about a week my friend Christine Kenneally will have a new book out, The Invisible History of the Human Race: How DNA and History Shape Our Identities and Our Futures. The scope of the work is pretty diverse, from individual personal stories, to the sorts of grand historical narratives which the Reich lab is spinning from their numerous publications. I had the pleasure of reading early drafts of the work, and what struck me is that Christine does a very good job of making the case for why genealogy is not silly, a common problem that people in the field encounter. Honestly I didn’t give much thought to genealogy until recently, but then I’m one of the people who is rather certain of the near-term genealogy of my family. When your past is more clouded these issues can loom much larger. It’s only silly when you’re confident of your background. The role that DNA can play in constructing the larger portrait is pretty straightforward.

Aside from the human element threaded through the science, hardcore DNA junkies won’t find much to surprise them. Christine touches base with the usual suspects in personal genomics, as well as those who work in an academic setting. But if you are a more general layperson who is sometimes befuddled by the jargon in my posts, this would be a pretty good taste of the field, and where the “post-genomic era” is leading us all.

We’re at peak haplotype

Credit: Razib Khan

Sometimes when I see treatments of the history and development of evolutionary genetics from outsiders I notice how jargon creeps into their descriptions in a way that’s not adding much value. For example, several times over the past year I’ve seen people refer to how one can construct genetic clusters using “haplotypes.” The fact is that haplotypes are not necessary for the construction of genetic clusters; any form of genetic variation will do. Haplotypes can add something of value, but they’re not necessary. Terms such as “haplotypes” or “SNPs” might percolate into broader public discussion, but too often it seems that they’re used like the term Abracadabra!, an incantation.

But it did get me thinking: how common has utilization of the term haplotype become in the scientific literature? When asking a question like this I did what I usually do: go to Google Scholar and see how many hits I get for a term by year. As you can see, the use of the term levelled off in the second half of the 2000s, as the HapMap took off and became part of the background furniture of human population genomics.

Here’s the raw data….

Year Hits
1970 27
1980 1110
1990 2500
1995 3810
2000 6830
2001 7730
2002 8900
2003 11300
2004 13900
2005 14600
2006 18100
2007 18700
2008 20800
2009 21000
2010 21300
2011 21700
2012 21600
2013 20300
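For anyone who wants to check the "levelling off" against the raw numbers, here is a small sketch that takes the table above as a dictionary and computes a compound annual growth rate between two years. The helper name is mine, and Google Scholar hit counts are a rough proxy at best.

```python
# Google Scholar hits for "haplotype" by year, copied from the table above.
hits = {
    1970: 27, 1980: 1110, 1990: 2500, 1995: 3810, 2000: 6830,
    2001: 7730, 2002: 8900, 2003: 11300, 2004: 13900, 2005: 14600,
    2006: 18100, 2007: 18700, 2008: 20800, 2009: 21000, 2010: 21300,
    2011: 21700, 2012: 21600, 2013: 20300,
}


def annual_growth(start_year: int, end_year: int) -> float:
    """Compound annual growth rate in hits between two years in the table."""
    span = end_year - start_year
    return (hits[end_year] / hits[start_year]) ** (1.0 / span) - 1.0


peak_year = max(hits, key=hits.get)
print(f"peak year: {peak_year} ({hits[peak_year]} hits)")
print(f"2000-2006 growth per year: {annual_growth(2000, 2006):.1%}")
print(f"2008-2013 growth per year: {annual_growth(2008, 2013):.1%}")
```

Comparing the two windows shows strong yearly growth through 2006 and essentially flat (slightly negative) growth after 2008.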

Books are best to read with, Jesus told me so

Periodically in my Facebook feed I get people posting articles like this, Science Has Great News for People Who Read Actual Books. By “actual books” the author means a physical book, and in particular a codex. Apparently at a conference last month a study was presented in which a sample of 50 individuals recalled more plot points from a 30-page story when reading a physical book than when reading on an e-reader (N = 25 for each treatment). The report in The Guardian finishes:

The Elizabeth George study included only two experienced Kindle users, and she is keen to replicate it using a greater proportion of Kindle regulars. But she warned against assuming that the “digital natives” of today would perform better.

“I don’t think we should assume it is all to do with habits, and base decisions to replace print textbooks with iPads, for instance, on such assumptions. Studies with students, for instance, have shown that they often prefer to read on paper,” she said.

First, someone who is presenting a huge result based on N = 50 (and a W.E.I.R.D. one at that) has a lot of chutzpah in advising caution at such broad general statements. The only true digital natives today are under the age of 10 when it comes to reading on devices such as the Kindle. These sorts of studies seem keen on reiterating the prejudices of the contemporary median readership. Who cares what median students prefer today? A few years ago MySpace was preferred. I believe that if these studies were going on in the 4th century A.D. then you’d see just how much the well educated Roman preferred the scroll to the uncouth codex (though a well educated Greek slave was probably the most “haptic” and “serendipitous” reading device of all!). The “actual book” is actually an innovation, as widespread utilization of the codex format took centuries to become the norm. The Christian Bible was one of the first books habitually in the codex format, and the spread of Christianity has been credited with the popularization of the codex in relation to the scroll. The point is that a “book” is an abstraction. The codex, scroll, or e-reader, is its concrete manifestation. Perhaps it is true that the codex format is ideally optimized for human comprehension. I suspect not. Humans are much more prejudiced toward their habits than they are optimized toward reading. Humans didn’t evolve with reading.

There are real problems with the e-reader format. I dislike being unable to jump between pages “naturally” too. But, I’m rather sure that these problems will be solved at some point. The codex has had 2,000 years. Give e-readers at least another 10.

Also, let’s keep it real, the average American does not read very much. The main reason I’ve mostly switched to e-reader format is that I hate having to lug around many books (and I am not a hoarder, I sell/discard books regularly). If the mean number of books read is 12, while the median is 5, you know the distribution isn’t normal. There are many people who don’t read at all, and a few who read a lot. Of those 12 books many are going to be paperbacks. It’s pretty easy to imagine storing a dozen paperbacks. I have a lot of textbooks, as well as academic press books. And I’m on the mild side compared to people who are older or have more of a hoarding habit. I wonder how much an acceptance of the convenience of the e-book formats correlates with people who read too much to not clutter their houses if they stick with the traditional physical formats.
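The mean-versus-median point can be made concrete with a toy model. This is a sketch only: it assumes, purely for illustration, that books read per year follow a log-normal distribution, which is right-skewed but cannot represent the people who read zero books. The figures 12 and 5 are the ones used above; everything else is an assumption.

```python
import math
from statistics import NormalDist

MEAN_BOOKS = 12.0
MEDIAN_BOOKS = 5.0

# For a log-normal: median = exp(mu) and mean = exp(mu + sigma^2 / 2),
# so the two summary numbers pin down both parameters.
mu = math.log(MEDIAN_BOOKS)
sigma = math.sqrt(2.0 * math.log(MEAN_BOOKS / MEDIAN_BOOKS))


def share_reading_fewer_than(books: float) -> float:
    """Fraction of the modelled population reading fewer than `books` per year."""
    z = (math.log(books) - mu) / sigma
    return NormalDist().cdf(z)


share_below_mean = share_reading_fewer_than(MEAN_BOOKS)
print(f"share reading fewer than the mean of {MEAN_BOOKS:.0f}: {share_below_mean:.0%}")
print(f"share reading more than 30 books: {1 - share_reading_fewer_than(30):.0%}")
```

Under this assumed shape, roughly three quarters of people read fewer books than the "average" reader, which is what a long right tail of heavy readers does to a mean.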

Addendum: If my hardcover books could be compressed somehow so they took up minimal space I might prefer hardcover to e-book. The main thing I would miss is the search features, but I might trade that for being able to jump easily between pages (there are indexes!). I’m not sure that my daughter or son would make the same decision though.

How Turan invented Islam

Much of the mythology of pre-Islamic Persia involves the tension and conflict between Iran and Turan. In modern parlance “Turan” has become synonymous with Central Asia and the Turk, but in its original meaning it involved two groups of Iranian peoples who were distinctly geographically situated. The eruption of the Turkic tribes can be dated to approximately the middle of the first millennium A.D., so they post-date the mythological era of the Iranian peoples, though they coincide with the arrival of Islam to Central Asia. Lost Enlightenment: Central Asia’s Golden Age from the Arab Conquest to Tamerlane is really the chronicle of the last 500 years of the cultural efflorescence of classical Turan, the ancestors of the people we today term Tajik, as well as nearly extinct groups such as the Sogdians. Though there are numerous ‘call-backs’ to the pre-Islamic era, as well as the requisite scene setting chapters, the heart of the matter occurs during Islam’s Golden Age, in particular that of the Abbasid Caliphate. The last few centuries, from the rise of more self-consciously Turkic political actors to the period of Timur, get short shrift, and the story is tidied up rather quickly.

Lost Enlightenment is also unapologetically a history of intellectuals. Social, cultural, and diplomatic events serve as background furniture. They’re noted in passing and alluded to, but ultimately they are not the center of the story. They’re for intellectuals to be situated within. The key fact which serves as the cause for a book like this is that many are not aware that an enormously disproportionate number of the intellectuals of the Golden Age of Islam were ethnically Iranian and from Central Asia. I say ethnically Iranian, because it is not quite accurate to state they were Persian, because the Iranian languages and ethnic groups differ considerably. Abū Rayḥān al-Bīrūnī was a native of Khwarezm, the Iranian language of which was close to Sogdian, and therefore closer to modern Ossetian. The author observes that because intellectuals from Islam’s Golden Age habitually wrote in Arabic most moderns assume they must be Arabs (perhaps more accurately, the names “look Arabic”, unless they are unrecognizable transliterations). But this is an error of the same class as presuming that because Western scholars utilized Latin as a lingua franca until recently they must have been Latins. A quick perusal of Wikipedia’s entry on the philosophy and science of the Islamic Golden Age will disabuse you of this notion. Though the central focus of Lost Enlightenment is on Iranians from Turan, it is important to remember that many individuals of note don’t quite fall into this exact category but exhibit affinities which might surprise. Though the figure behind the most widespread school of Islamic law, abu Hanifa, is well known to have had his ancestry among the Persians of what is today Afghanistan, ibn Hanbal, founder of the austere Hanbali school (arguably the ancestor of the Wahhabi and Salafi movements) was descended from Khorasani Arabs. In other words, even many of the Arabs had eastern affinities.

To understand why, you need to realize that to a rough approximation the shift from the Umayyad Caliphate to the Abbasid involved a reorientation of the Islamic world away from the Mediterranean world and toward Central Asia, Turan. This is summarized by the reality that the capital shifted from Damascus in Syria to Baghdad in Iraq, but this small distance does not do justice to the shift in mentality. The Abbasids were brought to power by armies and social movements with roots in Khorasan and further north and east. It was in a sense a revenge of the mawalis, non-Arab converts to Islam who were marginalized as second class citizens under the Umayyads. Traditional Muslims sometimes refer to the Umayyads as the “Arab Kingdom” because of the ethnic nature of their polity (evidenced by the fact that there were instances where Arab Christians were privileged over non-Arab Muslim converts). Though the Abbasids were an Arab Caliphate, their ruling culture was much more ethno-linguistically cosmopolitan. Over time the dynasty began to rely more and more upon Turks from Central Asia to man their armies, while the domain of culture and politics was heavily inflected by Iranians and Arabicized Iranians. For a period the caliph al-Ma’mun relocated the locus of the Caliphate to Merv, in modern day Turkmenistan. It is no surprise that al-Ma’mun’s mother was a Persian from Khorasan.

The culturally Turanian color of the Abbasid world is critical because I think it is plausible to argue that Islam as we understand it emerged during the Abbasid period. On the face of it this sounds strange. Islam as a religion obviously dates to the time of Muhammad, in the early 7th century. Salafi purists would purge all that came after the mid-7th century, the period of the “Rightly Guided Caliphs” (i.e., the pre-dynastic period). But to say Islam was formed in this period is like saying Buddhism dates to the time of the Buddha, in the middle of the first millennium B.C., or that Christianity dates to the time of Jesus down to the writing of the Synoptic Gospels a few decades later. No matter what religionists may aver, religions evolve organically through time, and some of their most seminal aspects develop considerably later. Among Christians this is acknowledged by the repeated attempts to recreate “Primitive Christianity,” that is, the Church before it became co-opted by Roman Imperial culture. But even before the conversion of Constantine, Christianity had transformed into a gentile religion with Jewish roots, rather than a Jewish sect. The institutional superstructure of the Christian Church and its theological basis were totally transformed by the immersion of sectarian Judaism in the Greek and Roman world (one could say that this is true of both Christianity and modern Judaism!).

In modern Sunni Islam (~90 percent of Muslims), in comparison to Christianity, theology plays a relatively minor role in relation to law, shariah. One of the primary bases of shariah is the hadith, the sayings of the prophet. It so happens that the two most respected collections of these sayings for Sunni Muslims were authored by Persians from Khorasan. The author of Lost Enlightenment chalks up the prominence of Turan in the compilation of hadith to the pre-Islamic cultural and religious norms, in particular the prominent Buddhist tradition of translation and collection. Though never explicit, the argument seems to be that this region so essential in the development of Islam as we know it remained religiously plural, with Buddhists, Zoroastrians, Christians, and pagans prominent for centuries, and this cultural background could not help but shape the beliefs and practices of local Muslims, many of them converts. But the connections are often not made concrete, and are more suggestive. For example, the connection between Buddhist viharas and the later madrasas. Because the Buddhists of Turan have no modern day cultural descendants it can be quite difficult to comprehend just how prominent this religion was during this period, but it is well known that under the early Abbasids the influential Barmakid family were relatively recently converted Buddhist functionaries. Rather than the specifics, though, I think the point of the fixation in Lost Enlightenment on the non-Muslim milieu that persisted in Turan down to ~1000 A.D. is to emphasize that during Sunni Islam’s formative period the religious culture looked east, that is, to the world of India, as much as it did to the west. The connections between the Near East, Central Asia, and India are ancient, going back to records of Indian merchant communities settled in Sumeria.
It does not take a leap of imagination to wonder if Sufi mysticism may have been influenced by Indian practices and beliefs (some early Sufi mystics do report Indian, or perhaps more accurately Turanian Buddhist, mentors). And there are curious currents in the other direction: “Greek medicine” as transmitted by Central Asians is still practiced in India.

Islamic civilization beginning with Muhammad is at its foundation “West” facing. Muhammad engaged the ideas and thoughts of Christians and Jews, and his foreign travels took him to the margins of Syria. The details of prayer positions among contemporary Muslims reportedly derive from the practice of Syrian monks. The eastern fringe of the Islamic world at its founding was that of the magians, the Zoroastrians, who were also clear influences. But if you accept the proposition that much, perhaps most, of Islamic civilization dates to the Abbasids, then your understanding of West and East must shift. Here the West is the world of Persia-verging-upon-Mesopotamia, Iran, and the East is India, and to a lesser extent China. The center is Turan. This is a somewhat tendentious position, but I do think it is defensible, and it should make us reconsider the genealogy of Islamic culture and civilization.

But one of the aspects of Lost Enlightenment that I found irritating is prefigured by the title, and that is the Whiggish attempt to shoehorn Turanian civilization into the stream of ascending scientific and mechanical complexity of the West. I do think it is interesting that Turanians contributed overwhelmingly in the domains of medicine and the natural sciences, and far less to what we might term the humanities. The author argues rather aggressively that this is due to the fact that the environment of Central Asia requires city-scale hydraulic civilization, putting a premium upon the mechanical sciences. I am moderately skeptical of environmentally deterministic arguments, but they are reasonable. What is harder to excuse is harping upon the same thesis so often, as well as showing your own philosophical preferences so clearly. The author, like myself, is biased toward those scholars with a peripatetic method in regards to the natural sciences. Though making the case for Turan’s role in the formation of Islamic orthodoxy, he is not positively inclined toward the anti-scientific legalist orientation ascendant after ~1000 A.D. Neither am I, nor are most Western readers of this work. If al-Biruni is the hero, then al-Ghazali, a Persian from Khorasan, is the villain. This sort of normative typology is not befitting a scholarly work of this level.

Finally, we have to address the fact that today Turan is not what it once was. The prominence in intellectual endeavors indicates a demographic robustness which is hard to see in modern day Central Asia. The short answer seems to be the Mongols. The author argues that the Mongols were particularly destructive in Central Asia, both through straightforward genocide and through destruction of the material basis of Turanian urban society in the form of hydraulic engineering. It seems clear that this period also saw the shift from a mostly Iranian speaking populace to a Turkic one, as the Turks, already dominant politically, became handmaids to the Mongols. Though Lost Enlightenment gives some space to early Turkic attempts at ethnic assertion (apparently they were segregated in Baghdad in the early years), it is a very secondary aspect. But it may be that ultimately Turanian civilization always had a sell-by date, because the geographic parameters for dense civilization in Central Asia are fragile and marginal. Situated at the center of Eurasia, and forcing its populace to engage in ingenious engineering to simply survive, Turan was bound to be a creative force. But its explosion may inevitably have been ephemeral.

You lose-they win & they win-you win

ProPublica and This American Life have broken an expose of sorts about the spinelessness of the New York Fed in relation to the Wall Street banks which it is enjoined to supervise, specifically Goldman Sachs (which is basically the apotheosis of a Wall Street bank). But this is really all style and no substance: as Daniel Gross points out, the New York Fed has always been a creature of Wall Street, there to do its bidding. The reason that this story is worth reporting on is that a whistle-blower recorded some of the meetings between Fed officials and Goldman Sachs, and therefore highlighted just how clear it is that the latter calls the shots for all practical purposes. But we all knew that after 2008. Wall Street socialized its losses, and came roaring back, privatizing the gains which accrued from the easy money doled out by the Fed, as well as the now explicit back-stop of the American government. They know we know, and they know we won’t do a thing about it. Basically it’s like Wall Street punched us in the face, and then sent us a bill for the injury. Also, they demand an apology whenever we besmirch their honor.

There will always be winners and losers, the high and mighty and the low. The key is that it is optimal for the many when the great gain honor through actions which spill over into the public good. The ‘innovations’ of the financial sector, and the bloat that has occurred in ‘inter-mediation’, do not fall into that category. There are only so many gains on the margin of improved allocation of capital. At some point the proliferation of professions meant to smooth the institutions of an advanced society ends up devolving into a zero-sum game for finite resources. This is true with bankers and lawyers. Both of these are honorable and necessary professions, but when there is a surfeit of both you know that society has gone sclerotic.

This is why I put my hope in Silicon Valley, and in particular men such as Elon Musk. Musk is as much a megalomaniac as a Wall Street “master of the universe,” but his ambitions and greed for glory drive him to found firms which aim to change the fundamental rules of our civilization. And for the better. It’s not a zero-sum game he is playing; he wants to explode the pie and grab a huge chunk of it. A high-risk high-reward endeavor.

Ultimately to fend off sharks you need killer whales. Our civilization is premised on capitalism, and growth. Without growth elite over-production leads to the rise of zero-sum competition for resources, and the brutal games of greed which led in part to the financial crisis of 2008. The hope is that Silicon Valley and other genuinely innovative sectors of society can hoover in enough talent, creativity, and ego, to change the rules so that the crass and Byzantine machinations that are on display in the activities of the New York Fed become blips upon our near term historical trajectory. In contrast, if we stagnate, expect the games to get bloodier and more desperate.

Khorasan is kind of a big deal

Last week the American armed forces attacked a Syrian branch of al Qaeda which went by the name Khorasan. If you read around the web you’ll be informed that the term, a geographic one referring to the lands of Islam’s east, along the fringes of Persia, Central Asia, and western South Asia, is freighted with historical resonance for jihadis whose ideology is strongly inflected by a romantic vision of Islam’s past. By coincidence over the past few weeks I’ve been reading Lost Enlightenment, a book which chronicles Central Asia’s contribution to early Islamic civilization, and therefore a story in which Khorasan looms very large. Of course you don’t need a book length treatment on an obscure historical topic (though I would argue Central Asia shouldn’t be obscure, it is) to understand why Khorasan is important in the imaginations of jihadis. To keep it succinct, though Salafis and their fellow travelers idealize the period of the Rashidun caliphs which ended ~660 A.D., the real historical basis of their movement in terms of an idealized period which is not mythological is that of the early Abbasids, after 750 A.D., and especially 800 A.D. And it is under the Abbasids that the motor engine of Islamic civilization shifted to the east, to Khorasan, the source of the armies which fueled their initial victories, and later of the soldiers and intellectuals who solidified their regime. Though Baghdad was the capital of the Abbasid Caliphate, the tendrils of influence and power always led back to the east so long as the polity was vigorous.

These extremist Islamic sects and movements always seem to deal in mythology and the legends of their own past. Though much of the fabric of their reality is fiction, there is often a thin scaffold of historical basis which serves as a skeleton around the narrative. I am not sure how critical it is to understand this scaffold, but it probably wouldn’t hurt. To some extent these radicals seem to speak in an inadvertent code, in that Western audiences are totally lacking in the historical consciousness that is necessary to properly interpret and comprehend considered and conscious semantic choices.

The phenotypic and genotypic

The image to the left is the ‘average’ face of a Mexican woman as generated by the University of Glasgow Face Research Lab. Aside from the fact that the face is prettier than the typical human, because averaging facial features tends to remove unattractive asymmetry, it is racially what you might expect: a synthesis of an Amerindian and European face, with an Amerindian skew. But a phenotypic average only tells you so much. Variation is one of the key ingredients in evolutionary processes, and by getting a sense of a population’s variation you can infer things about its past and possible future history. For example, if that variation is heritable, then it is amenable raw material for adaptation. In contrast, if the variation is due to environmental parameters then it is not going to be appropriate input for adaptation via natural selection. In a nation like Mexico we see the full range, from ‘typical’ Amerindian phenotype, to someone who looks to be fully European (with a small minority with visible African ancestry).

But if the phenotype is heritable, then underlying this variation is genotype. The extent that genotype controls the variation is contingent upon heritability. The heritability of behavioral phenotypes is often around ~0.5. But for physical traits such as height or pigmentation the heritability is much closer to 1, on the order of ~0.8 to ~0.9. That means 80 to 90 percent of the variation of the trait across the population is due to variation in the genes. When we code someone as “Amerindian” or “European” or “African” we are assessing phenotypes with a strong underlying genotypic component. A new study in PLOS GENETICS outlines just how this plays out in Latin America, a region of the world which has the virtue of being a living experiment in admixture between different geographic races over the past 500 years.

Admixture in Latin America: Geographic Structure, Phenotypic Diversity and Self-Perception of Ancestry Based on 7,342 Individuals:

The current genetic makeup of Latin America has been shaped by a history of extensive admixture between Africans, Europeans and Native Americans, a process taking place within the context of extensive geographic and social stratification. We estimated individual ancestry proportions in a sample of 7,342 subjects ascertained in five countries (Brazil, Chile, Colombia, México and Perú). These individuals were also characterized for a range of physical appearance traits and for self-perception of ancestry. The geographic distribution of admixture proportions in this sample reveals extensive population structure, illustrating the continuing impact of demographic history on the genetic diversity of Latin America. Significant ancestry effects were detected for most phenotypes studied. However, ancestry generally explains only a modest proportion of total phenotypic variation. Genetically estimated and self-perceived ancestry correlate significantly, but certain physical attributes have a strong impact on self-perception and bias self-perception of ancestry relative to genetically estimated ancestry.

The phylogeographic aspect of this paper is not too interesting to me, as it confirms what we’ve known (e.g., more Amerindian ancestry in northern Brazil, Mexicans are somewhat more Amerindian than they are European, etc.). Rather, the biggest findings are those which relate physical appearance, self-identity, and genetic ancestry. In Europe someone who identifies as “white” is invariably ~99% European when assessed using a genetic method (the ~1% balance is often from Iberia). More precisely, white Europeans are ~99% West Eurasian, since a non-trivial amount of trans-Mediterranean gene flow has occurred, meaning there isn’t a clear boundary between Europe and nearby regions. Similarly, in Sub-Saharan Africa someone who identifies as “black” is likely to be nearly all Sub-Saharan African. This is often not the case in Latin America. That is, those who identify as “white” or “black” often have substantial admixture from other geographic racial groups.

One of the major drawbacks of this study is that it relies on 30 ancestrally informative markers (AIMs). Though this is acceptable in forensics, some of the ancestry inferences made on an individual basis are a touch less accurate than they would be on a dense marker SNP chip (e.g., the 650,000 SNPs used on the HGDP). The modest correlations here are probably a little lower than they would be if the ancestry was more accurately adduced. But in the broad sketch the conclusions are likely defensible. One result which may surprise is the very modest correlation between physical traits and ancestry. Here’s the quote from the paper:

Regression of phenotypic variation on genetic ancestry (taking Native American as reference) demonstrates a significant effect for most of the traits examined (p-value <10^-3 using a conservative Bonferroni multiple testing correction, Table 2). Among the non-facial phenotypes (accounting for sex, country, age, educational attainment and wealth) higher European ancestry is associated with: increased height, lighter pigmentation (of hair, skin and eyes) (Figure S6), greater hair curliness and male pattern baldness. Hair graying approaches statistical significance (p-value 10^-2). Higher African ancestry is associated with: increased height, higher skin pigmentation and greater hair curliness. The proportion of phenotypic variance explained by ancestry is highest for skin pigmentation (19%) followed by hair shape (8%) and color of eyes and hair (4% and 5%, respectively) but at most 1% for the other phenotypes.
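On the marker-count caveat, here is a rough sketch of why a few dozen AIMs give noisier individual ancestry estimates than a dense panel. This is a toy model, not the paper's method: it assumes two source populations, independent markers, and made-up allele frequencies (0.9 versus 0.1 at every marker), and the function names are my own. It also uses 3,000 markers as a stand-in for a dense chip just to keep the run time short.

```python
import random
import statistics

def estimate_ancestry(true_fraction, n_markers, rng, freq_a=0.9, freq_b=0.1):
    """Moment estimate of population-A ancestry from one simulated individual.

    freq_a / freq_b are hypothetical allele frequencies in the two source
    populations, identical across markers for simplicity.
    """
    carried = 0
    for _ in range(2 * n_markers):  # two allele copies per marker
        from_a = rng.random() < true_fraction
        freq = freq_a if from_a else freq_b
        carried += rng.random() < freq  # True counts as 1
    mean_freq = carried / (2 * n_markers)
    estimate = (mean_freq - freq_b) / (freq_a - freq_b)
    return min(1.0, max(0.0, estimate))  # ancestry must lie in [0, 1]

def estimate_spread(true_fraction, n_markers, replicates=300, seed=7):
    """Mean and standard deviation of the estimate over many simulated people."""
    rng = random.Random(seed)
    estimates = [estimate_ancestry(true_fraction, n_markers, rng)
                 for _ in range(replicates)]
    return statistics.mean(estimates), statistics.pstdev(estimates)

mean_30, sd_30 = estimate_spread(0.5, 30)
mean_3000, sd_3000 = estimate_spread(0.5, 3000)
mean_extreme, _ = estimate_spread(1.0, 30)

print(f"30 markers:   mean {mean_30:.3f}, sd {sd_30:.3f}")
print(f"3000 markers: mean {mean_3000:.3f}, sd {sd_3000:.3f}")
print(f"true 100% ancestry, 30 markers: mean estimate {mean_extreme:.3f}")
```

Under these assumptions the spread of the estimate shrinks roughly with the square root of the number of markers, and people at the extremes get pulled inward on average because the estimate cannot exceed 100%. Real genomes are inherited in linked chunks, so the numbers here are illustrative only.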

As I said it could be that the AIMs aren’t quite as accurate as they should be, and are underestimating the ancestral fractions on the individuals at the extremes (e.g., someone who is 100% European is estimated to be 95% European, because the marker set lacks precision). So you might bump up the proportion of variance explained a bit, but likely this still seems way too low to you intuitively. There are a few things going on here. First, skin color is controlled between populations by a relatively small set of genetic loci. This means that in admixed populations the sample variance, the random draw of genotypes across the loci, is going to vary a lot even in individuals with the same ancestry. Because of the relatively small number of large effect loci skin color is a trait which shows a lot of variation within families where ancestry is geographically diverse. And within families, or at least across full siblings, total ancestry is not going to vary that much. Second, for some of the “traits” in question that are being measured there is just a lot of variation within geographic races. It makes sense that ancestry would explain only a small fraction within this pooled data set. And yet people can recognize a set of features which are clearly European or Amerindian or African. I think the answer here is that you are picking up on correlation structure across the traits. A suite of subtle facial contours for example connote “European” in a Gestalt manner, even if quantitatively each contour trait has a lot of variation within a population and overlaps across them.
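The point about a small number of large effect loci can be sketched with a toy simulation. The assumptions are mine, not the paper's: every person has exactly the same ancestry fraction, each locus is fixed for different alleles in the two source populations, all loci have equal effect, and allele copies are drawn independently. The only thing that changes is how many loci control the trait.

```python
import random
import statistics

def trait_spread_at_fixed_ancestry(n_loci, ancestry=0.5, n_people=2000, seed=3):
    """Std. dev. of a genetic trait score among people sharing one ancestry fraction."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_people):
        # Each of the 2 * n_loci allele copies traces to population A with
        # probability `ancestry`; the score is the fraction that do.
        copies_from_a = sum(rng.random() < ancestry for _ in range(2 * n_loci))
        scores.append(copies_from_a / (2 * n_loci))
    return statistics.pstdev(scores)

spread_few_loci = trait_spread_at_fixed_ancestry(n_loci=5)
spread_many_loci = trait_spread_at_fixed_ancestry(n_loci=500)

print(f"trait sd with 5 loci:   {spread_few_loci:.3f}")
print(f"trait sd with 500 loci: {spread_many_loci:.3f}")
```

With only a handful of loci, people of identical ancestry land all over the trait scale, which is why ancestry alone can explain a modest share of the variance in something like skin color. With hundreds of loci the draw averages out and the trait tracks ancestry closely.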

Where this all “cashes out” though is in the intersection of the sociocultural and biological. Within the paper itself they observe a few trends which would not be surprising. Skin color and hair form are very salient characteristics, and lead individuals to shift their estimates of their own ancestries. Those with lighter skin tend to overestimate their European ancestry fractions, while those with curlier hair overestimate their African ancestry. These are traits which have the characteristics that they are quite ancestrally informative to particular geographic races, and, very visible (unlike, say, Duffy status). Within these data there are also particular patterns which are intriguing and less obvious; those with low amounts of Amerindian ancestry underestimate the fraction, while those with higher levels overestimate it. The details of these patterns are obviously contextual in terms of time and place (e.g., in Henry Louis Gates Jr.’s genealogy specials many celebrities seem to yearn for exotic lineages, which would not be the case in past decades). What is more interesting is that fine grain patterns of variation in genetic ancestry and how they deviate from perceived ancestry can finally allow social scientists to get a better grip on patterns of discrimination (or lack thereof). It is not entirely uncommon in Latin America for full siblings to sometimes be socially perceived to be different races because of the random segregation of salient characteristics. In the aggregate these sorts of cases would allow one to estimate the effect of social perceptions, slights, or advantages. With the genetic dimension one could also ascertain the possibility of group differences, because many subtle characteristics are going to track genome-wide patterns, rather than a few phenotypes which society privileges when sorting people by geographic origin.

Polygyny – maybe it’s agriculture

Citation: Lippold, Sebastian, et al. "Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences." Investigative Genetics (2014).

Alexander Kim has already responded in depth to a new paper in Investigative Genetics, Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences:

We identified 2,228 SNPs in the NRY sequences and 2,163 SNPs in the mtDNA sequences. Our results confirm the controversial assertion that genetic differences between human populations on a global scale are bigger for the NRY than for mtDNA, although the differences are not as large as previously suggested. More importantly, we find substantial regional variation in patterns of mtDNA versus NRY variation. Model-based simulations indicate very small ancestral effective population sizes (<100) for the out-of-Africa migration as well as for many human populations. We also find that the ratio of female effective population size to male effective population size (Nf/Nm) has been greater than one throughout the history of modern humans, and has recently increased due to faster growth in Nf than Nm.

The NRY and mtDNA sequences provide new insights into the paternal and maternal histories of human populations, and the methods we introduce here should be widely applicable for further such studies.

Comparing male and female demographic histories can be a mug’s game. But if one is appropriately cautious some insight can be gained, and in this paper the authors are appropriately cautious. It isn’t surprising that female effective population sizes are somewhat larger over the long term and across deep history than male ones for our lineage. We’re a mildly sexually dimorphic species, suggestive of possible mild polygyny at best, on average. In other words, males compete, but not that much. Far more interesting to me is what Alexander Kim keys in on:

Among the most interesting inferences is Holocene crash in male Ne, with no clear reflection on the mitochondrial side of things, everywhere but Oceania and America — most dramatically in the Middle East/North Africa:

Not from the Pleistocene

As a speculative matter this might reflect the rise of “super-male” lineages that arose with agriculture and mass society. In other words, extreme levels of polygyny are a novel cultural evolution, which could only emerge with the level of stratification and power accumulation in patrilineages enabled by agricultural, or agro-pastoral, societies. Hyper-polygyny might also be correlated with the extreme mate guarding and sexual jealousy which is the norm among many Eurasian societies. The implication here is that many of the “regressive” social practices we associate with “traditional” Eurasian societies are simply cultural retrofits to adapt to new social circumstances enabled by mass society. Liberal individualism as an ethos may not be a novel innovation, as much as the emergence of long submerged instincts which evolved when collective institutions and interests were far weaker as forces in our day to day decision making.

Epigenetics, the glory and the hype

Epigenetics is real. But it doesn’t change everything. That needs to be said, because people seem to get the impression that everything is changed. In Trends in Genetics, Serving Epigenetics Before Its Time:

Society prizes the rapid translation of basic biological science into ways to prevent human illness. However, the premature rush to take murine epigenetic findings in these directions makes impossible demands on prospective parents and triggers serious social and ethical questions.

In their efforts to anticipate the eventual human applications of emerging areas of science, scholars of the ethical, legal, and social implications of genetics and genomics sometimes become too speculative to engage the immediate concerns of active scientists and policymakers. However, although evidence-based applications of human epigenetics may emerge in the future, premature epigenetic risk messaging is already here and its content and impact must be understood. The messages in circulation raise ethical and social concerns regardless of whether human epigenetic studies eventually confirm the murine results. Because the prospect for any successful human translation of epigenetic research depends as much on the management of these issues as on further human studies, they deserve close attention by all involved in their design, dissemination, and public consumption.

(the link is ungated)

Dissenting from American liberalism & conservatism

I had a long discussion yesterday with an individual who has been reading me since 2003. We talked about lots of things. One issue which perhaps I need to reiterate because it’s implicit is that I dissent to a great extent from the premises which underlie both American conservatism and liberalism. Like American liberals I think the life outcomes of many Americans are not due to their choices simply understood. Rather they are the outcome of chance events, whether it be through social background, or, simple happenstance. Years ago I recall Nassim Taleb complaining that people would read The Millionaire Next Door, and believe that by doing everything those individuals did they too could become millionaires, as if there was no random component to such outcomes. The reality is that some people are in the right place and right time. And, some people are born in the right social positions.

Where I dissent from American liberals is the idea that all of the outcomes in our society, in particular inequality, are due to chance or inherited social position (e.g., race or class privilege). In The Son Also Rises Greg Clark reports on intriguing results which indicate that social competence is heritable. To some extent this is common sense. Personal dispositions are heritable, and some dispositions are more congenial to remunerative activities than others. Though many on the Left (though not all) are willing to acknowledge the arguments in Steve Pinker’s The Blank Slate in the abstract, in the concrete they get very little weight when it comes to social policy. To give an example, for many on the Left we can talk about differences between groups (whether it be cultural or biological) only when all social inequality is abolished. The catch in this though is that any persistent differences may also result in persistent social inequality or difference in outcome.

When it comes to the American Right there are two distinct strands. The first is the child of classical liberalism, to some extent in a more thorough fashion than the American Left. For this element the idea that capitalism is efficient in allocating resources, and that people receive their just deserts due to hard work, becomes such an all-encompassing narrative that other variables are neglected. This was clearly evident in 2008 when some conservative libertarians kept harping on the “free market” mantra because they literally had no other playbook. I recall specifically someone from the American Enterprise Institute on the radio arguing that bankers should keep their bonuses because that’s how capitalism works, even after the bailouts. When confronted by this he really had no response. He was literally dumbfounded. It is as if the market was the ends of the American political system, and all wealth is the product of the market. Though not as constitutionally hostile to the idea of heritable differences this sort of free market conservatism is not comfortable with the idea that not everyone is born with the same opportunities. The reality is that the liberal Left critique of the nature of the outcomes of a free market is correct in some deep sense, even deeper than American liberals may wish to acknowledge. Some people are born with the genetic deck stacked against them, not just the social one (and of course, as noted above there is a lot of random noise). That undermines some of the moral case for the virtue of the market, since it is not blindly arbitrating the outcomes of our choices, as opposed to sifting based on the accumulated weight of inherited history, some of which is due to the genetic lottery.

The second strand in American conservatism is that of the Religious Right. The problem that it has is most clearly illustrated by the issue of gay rights. Though logically toleration of homosexual behavior and its innate or non-innate nature are not related, the Religious Right prefers that homosexuality be a choice for the purposes of moral censure. That is because though these Christians believe in original sin, they seem to espouse a sort of moral perfectionism where all men are equally endowed with the same sentiments and preferences (those sentiments being debased by Satan or the Satanic influence of culture). As opposed to Homo economicus, these Christians believe in Homo christianus. Though I personally espouse the bourgeois virtues of the Religious Right, their neglect of human diversity in disposition and sentiment leads us down the path of great disappointment, as many will miss the mark. A Religious Right which focused more on social cohesion in a general and collective sense, rather than personal and individual moral perfectionism, probably could produce better results (yes, it does take a village!). But the American radical Protestant model is fundamentally individualistic, and treats each human as equal and similar before Christ. And there I believe is the folly with moral crusades which attempt to turn every American family into the same American family. Such a world never was, and such a world will never be.

The Left looks to the perfect future which could be. The Right looks to the perfect past which was, and could be.

Against contingency (in yeast traits)

Quanta Magazine has a piece up audaciously titled Evolution’s Random Paths Lead to One Place. It’s basically a review of the research published in the paper Global epistasis makes adaptation predictable despite sequence-level stochasticity. There’s a lot packed into the title. Here’s the important bit from Quanta:

Many biologists argue that it would not, that chance mutations early in the evolutionary journey of a species will profoundly influence its fate. “If you replay the tape of life, you might have one initial mutation that takes you in a totally different direction,” Desai said, paraphrasing an idea first put forth by the biologist Stephen Jay Gould in the 1980s.

The findings also suggest a disconnect between evolution at the genetic level and at the level of the whole organism. Genetic mutations occur mostly at random, yet the sum of these aimless changes somehow creates a predictable pattern. The distinction could prove valuable, as much genetics research has focused on the impact of mutations in individual genes. For example, researchers often ask how a single mutation might affect a microbe’s tolerance for toxins, or a human’s risk for a disease. But if Desai’s findings hold true in other organisms, they could suggest that it’s equally important to examine how large numbers of individual genetic changes work in concert over time.

There’s been a vogue of late for attacking the utility of mouse genetics for medical research. Perhaps studying flies, yeast, and bacteria to understand evolution is also misguided? Interesting research in any case.

The Islamic States

"Chop-chop square" in Riyadh

Like Joe Young, the Mormon missionary who becomes involved in the porn industry in the late 90s film Orgazmo, our involvement in the Mid-East is probably going to result in the violation of our purity (yes, that’s only in our self-conception as a nation; we’re most definitely only born-again virgins, not the real deal). It’s hard to read anything about the Free Syrian Army which portrays it as anything but hapless, disorganized, if often well meaning and milquetoast (well, when they’re not allying with the Nusra Front and being nasty to Alawites and Christians who support the regime which has been nasty to them). And of course this edition of the coalition of the willing involves our stalwart Western-leaning allies, Bahrain, Jordan, Qatar, Saudi Arabia and the United Arab Emirates. Yes, Bahrain, the sectarian regime dominated by a religious minority which suppresses the majority with foreign forces. Qatar, the Islamists’ number one ally in the Muslim world. The UAE, which is home to the dystopian techno-oligarchy that is Dubai, a glittering vision of the apotheosis of slavery coexistent with post-modernity. And of course there is Saudi Arabia, the only nation in the world which regularly decapitates individuals for capital crimes. Well, except of course the Islamic State if you count that as a state!

I won’t belabor the point. Let’s remember that the Saudi monarchy is quite notably medieval in its practices and institutional arrangement (it abolished slavery in 1962). Our enemies, Iran, and the Syrian regime, are actually much closer to modernity as we’d understand it using the Saudis as the extreme case. As it is we have to ignore this because the Saudis are our bastards, neo-feudal creeps though they may be. And we’re trusting them to help train the Free Syrian Army? Of the 19 9/11 hijackers 15 were Saudi (a further two were from the UAE, our ally). This is not going to end well. We can’t admit that we’re helping the regime of Bashar al-Assad. Yes, he’s a murdering bastard, but he’s not our bastard.

The Islamic State is a nasty piece of work. Unlike Saddam Hussein’s late lamented dictatorship it also has the ability and ambitions to spread its tentacles of nastiness across the region right now. I won’t shed any tears over the pounding Raqqa is receiving from American cruise missiles. But let’s be clear that almost certainly this is going to benefit our Iranian enemies, as well as Hezbollah. Additionally, the Saudis and their Gulf allies will probably attempt to reshape the Sunni insurgency in their own image, which is not one which we in the West would term “moderate,” let alone free. Let’s go into this with eyes open, and acknowledge that it’s a choice between a bad option, and a worse option.

Shorter: America is on the side of the less evil guys. Go America! Also see this cri de coeur, The Barbarians Within Our Gates.

The end of the middle class era

FiveThirtyEight and New York magazine have pieces which look at the prosperity which was the norm in the second half of the 1990s with a soft glow. I was not in the labor market back then, but I recall the excitement, and just how easy it was to get a job for those who wanted one. It seems that these articles reinforce the basic thesis of Tyler Cowen’s Average is Over, looking back to the last golden age for the middle class.

But there’s another angle on this. Would you go back to 1999 if you had twice the income you did today? Probably that depends on your income. But imagine going back to the technology of that era. For many of us the world would be a sluggish and gray place. Cowen would point out that this is not the typical person. But it’s a lot of us, and it isn’t as if smartphones and their various features are low penetration technologies in terms of cultural ubiquity. So a simple apples to apples comparison of income is somewhat misleading. Because of rising demand from China I’m not sure that decreasing expenditures on food is sustainable (though perhaps better GMO would help in that domain). But it does seem to me that public policy is what’s keeping housing prices high, in particular in Blue State urban areas. Make the libertarians happy by getting rid of rent control. Make the conservatives happy by allowing more sprawl. Make the new urbanists happy by allowing more vertical residential housing.

I think the bigger issue here is not economic, but social. Economic productivity is such that a guaranteed minimum income is probably viable for society. A minority may work so that the majority may eat and recreate. But human psychology is such that it seems implausible that a democratic citizenry can be maintained by passive consumption by those who themselves are not economically productive. Many of these discussions about the passing of the middle class society focus on economic well being, or lack thereof. I don’t think that’s the major issue at all, because economic productivity will continue to increase, at least on the margins, and population growth outside Africa has tailed off. Rather, the larger change will be cultural and social. Even in antiquity when societies were highly stratified the great thinkers understood that social well being rested upon the broad shoulders of the free peasantry, who were the ultimate source of most economic activity. This is very different from the model of stratified societies of the future, where both power and productivity will be concentrated nearer the top of the status distribution.

ASHG 2014 meeting poster abstracts of note

Obviously colored by my interests.

Rapid radiation of common Eurasian Y chromosome haplogroups occurred significantly later than the out of Africa migration. M. Järve, International Consortium of the Estonian Centre for Genomics Estonian Biocentre and Department of Evolutionary Biology, University of Tartu, Tartu, Estonia.

   Human genetic diversity outside Africa is low, which is commonly ascribed to a recent out of Africa bottleneck and subsequent rapid colonization of the rest of the world. Previous studies of the male-specific Y chromosome have shown that haplogroups common throughout non-African populations all coalesce to a small number of shared ancestral lineages, the branching order of which is only partly understood. Using 475 high coverage whole Y chromosome sequences, including 317 newly reported here, we selected reliable regions within the Y chromosome based on coverage analysis, mappability and sequence class. Based on these data, we refined the Y chromosome haplogroup tree, applying phylogenetic methods to establish the branching order and temporal dynamics of splits in non-African Y chromosome haplogroups. Compared to the length of the branches that separate African and non-African diversity, the internal branches distinguishing continental and sub-continental differences outside Africa are generally short, consistent with the model of a rapid initial colonization of Eurasia and Oceania. Following the split between African and non-African haplogroups [90 KYA (95% CI: 87-94 KYA)], the differentiation of South and Southeast Asian haplogroups H, S, M, and C did not begin until around 43 KYA, and haplogroups N and R, widely spread among Northeast and Northwest Eurasian populations, started to diversify significantly later [17 KYA (95% CI: 16-19 KYA) and 26 KYA (95% CI: 25-28 KYA), respectively]. Many major phylogenetic groups in different geographic regions seem to originate from the period around 50 KYA.

Use of Long-read-sequence Aided Phasing for Inference of Ancestry Assignment in Admixed Populations. F. L. Mendez1, S. S. Shringarpure1, A. Moreno1, E. R. Martin2, M. L. Cuccaro2, C. D. Bustamante1 1) Department of Genetics, Stanford University, Stanford, CA; 2) Center for Genetic Epidemiology and Statistical Genetics, University of Miami, Miami, FL.

   Correct phase reconstruction of individual chromosomes is important for numerous genetic analyses, including inferring demographic parameters in admixture processes. Admixture is the result of interbreeding of previously differentiated populations. The chromosomes of admixed individuals are composed of segments that can be traced individually to one of the ancestral populations. The abundance and length distributions of these chromosomal segments provide crucial information on the admixture process; however, the correct inference of their length and ancestry requires phasing data from the admixed individuals. Phase reconstruction can be performed applying the rules of Mendelian segregation of variants or using statistical methods that rely on population data. Alternatively, molecular phasing (the observation of different polymorphisms in the same chromosomal sequence) provides direct evidence of phase. In this fashion, methods of molecular-phasing, like long-read sequencing, may be used to extend the range of confident phasing. We simulate long-read sequence data and explore improvement brought by molecular phasing on accuracy of ancestry assignment of chromosomal segments, in definition of segment boundaries, and in overall admixture inference. We then use genotype information from 5 trios of Latino individuals, including 10 individuals with long-read sequence information, together with three reference panels of haplotypes of European, African, and Native American ancestry to evaluate the effect of molecular phasing on inference of the admixture process.

Linear Mixed Model-Based Admixture Mapping. L. Brown, T. Thornton Biostatistics, University of Washington, Seattle, WA.

   Genetic studies in recently admixed populations can provide invaluable insight into novel risk factors contributing to disease. Population admixture results in combined genomes from previously isolated ancestral populations that may have discernible allele frequency differences due to natural selection and genetic drift. Genes that underlie ethnic differences in traits and that show differential risk by ancestry can be identified using admixture mapping. Compared to studies carried out in more ethnically homogenous populations, admixture mapping has potentially greater power to detect certain genetic variants. Linear mixed models have gained traction as a tool for genome-wide association studies. Mixed model methods have been shown to protect against spurious associations in structured samples, a common pitfall in genetic association studies, by directly accounting for sources of dependence including cryptic relatedness and population stratification. We present a linear mixed model approach for admixture mapping in the presence of population structure and hidden relatedness. We implement this method using local ancestry estimates based on genome-wide SNP data. We apply the method to analyze genetic associations with white blood cell count and C-reactive protein phenotypes in the African American cohort of the Women’s Health Initiative study. We demonstrate that our proposed linear mixed model method for admixture mapping provides a substantial improvement over widely used admixture mapping approaches.

Genetic evidence of archaic admixture in India. A. Basu, D. Das, S. Das, N. Biswas National Institute of BioMedical Genomics, Kalyani, India.

   Comparing high-coverage Denisova and Neanderthal whole-genome sequences has revealed significant admixture with all present day non-African populations. Microblade tool usage from central-India has been reported, yet no genetic-study examined archaic admixture in present day South-Asians. We report the first evidence of archaic admixture from whole genome sequence data of 4 present-day Indians. Four individuals, who are at the extremities of a two-dimensional Principal-Component plot summarizing the extent of genomic variance in 237 Indians belonging to 20 linguistically, ethnically diverse populations sampled from different geographic locations in India, were sequenced. Their population identities are Onge, Jamatia, Panniya and Birhore. All individuals showed slight excess of Denisova admixture (D-Statistic 1.6-1.99) when compared with Eurasians, with admixture evidence increasing from Jamatia (1.6) to Onge (1.99). Similar pattern was also observed when compared with Neanderthal. Our findings show evidence of archaic admixture in different present-day populations, not restricted to proximity of archaeological evidence, indicating widespread admixture.
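
For readers who haven't met a D-statistic before, here is a minimal Python sketch of the commonly used ABBA-BABA form. It is only an illustration under my own assumptions: one allele per site per genome, coded 0/1, with the roles P1 (a Eurasian genome), P2 (an Indian genome), P3 (the archaic genome) and an outgroup chosen by me. The function name `d_statistic` is made up, and the abstract does not say how its reported values are scaled (this form is bounded between -1 and 1, so the 1.6-1.99 figures must be on some other scale).

```python
def d_statistic(p1, p2, p3, outgroup):
    """ABBA-BABA D from one 0/1 allele per site for four genomes.

    Positive values mean P2 shares more derived alleles with P3 than P1 does.
    Returns None when no site is informative.
    """
    abba = 0
    baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if a == o and b == c and b != o:
            abba += 1  # P2 and P3 share the non-outgroup allele
        elif b == o and a == c and a != o:
            baba += 1  # P1 and P3 share the non-outgroup allele
    if abba + baba == 0:
        return None
    return (abba - baba) / (abba + baba)


# Toy data (hypothetical): three ABBA sites, one BABA site, two uninformative.
p1 = [0, 0, 1, 0, 1, 0]
p2 = [1, 1, 0, 1, 0, 0]
p3 = [1, 1, 1, 1, 0, 1]
out = [0, 0, 0, 0, 0, 0]
d_value = d_statistic(p1, p2, p3, out)
```

In real analyses the significance of D is usually judged with a block jackknife across the genome, which this toy version leaves out.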

Inferring patterns of demography and assortative mating in the Thousand Genomes Project admixed populations from the Americas. E. E. Kenny1,5,6,7, C. Gignoux2, S. Baharian3, S. Musharoff2, B. Maples2, S. Shringarpure2, A. Auton4, C. D. Bustamante2, S. Gravel3, A. R. Martin2, The 1000 Genomes Consortium 1) Department of Genetics, Icahn School of Medicine Mount Sinai, New York, NY; 2) Department of Genetics, School of Medicine, Stanford University, California, CA; 3) Department of Human Genetics, McGill University, Montreal, Canada; 4) Department of Genetics, Albert Einstein College of Medicine, New York, NY; 5) Institute of Personalized Medicine, Icahn School of Medicine Mount Sinai, New York, NY; 6) Center of Statistical Genetics, Icahn School of Medicine Mount Sinai, New York, NY; 7) Institute of Genomics and Multiscale Biology, Icahn School of Medicine Mount Sinai, New York, NY.

   The phase 3 release of the 1000 Genomes project includes genotype and sequence data for 2,535 individuals from 26 populations around the globe. These include six populations from the Americas with mixed Native American, European and West African ancestry. We have identified admixture proportions in these six populations, which include African Caribbeans from Barbados (ACB), African Americans from south-west USA (ASW), Colombians from Medellin (CLM), Peruvians from Lima (PEL), Mexican-Americans from Los Angeles (MXL), and Puerto Ricans from Puerto Rico (PUR). We show presence of Native American, European and African ancestry in all six populations; in particular, we identify six ASW individuals with > 20% Native American ancestry. The European component of these individuals looks most similar to Nordic ancestry, rather than Spanish ancestry often seen in Hispanic/Latino individuals. Among the ACB, PEL, PUR, CLM and MXL populations, we find an excess of Native American and dearth of European ancestry on chromosome X compared to the autosomes, indicating a history of non-random mating in these populations. We have also inferred local ancestry tracts (LAT), identifying haplotype specific segments of ancestry across chromosomes. We assessed the accuracy of our tract calls and demonstrated accuracies >0.99, >0.98 and >0.97 in African, European and Native American tracts across all populations. By modeling the distribution of ancestral tract lengths, we inferred the timings of migration in the two populations from the Americas that are new to phase 3, ACB and PEL. We estimated the PEL have had more recent admixture with European and African individuals than other Hispanic/Latino groups in the Caribbean and throughout northern South America, consistent with known migration patterns. These analyses have given us an insight into the demographic history and migration patterns among admixed populations in the 1000 Genomes Project.


Sub-continental local ancestry inference in U.S. individuals. B. K. Maples1,2, J. K. Byrnes3, J. M. Granka3, K. Noto3, S. Shringarpure1, M. L. Carpenter1, M. J. Barber3, R. E. Curtis4, N. M. Myres4, C. A. Ball3, K. G. Chahine4, C. D. Bustamante1 1) Genetics, Stanford University, Stanford, CA; 2) Biomedical Informatics, Stanford University, Stanford, CA; 3) AncestryDNA L.L.C., San Francisco, CA; 4) AncestryDNA L.L.C., Provo, UT.

   The United States was populated through a sequence of migratory waves including immigrants from numerous distinct source populations. This “melting pot” process has led to the majority of current U.S. residents being genetically admixed. Understanding this complex genetic diversity is of great interest to the field of population genetics and accounting for it is critical for medical genetics. Numerous methods have been developed for performing local ancestry inference (LAI) in which the ancestry of each genomic locus is estimated, but the majority of these methods are only accurate at the level of continental admixture (e.g. African Americans with ancestry from Africa and Europe). Sub-continental LAI is often more difficult as neighboring populations typically have reduced differentiation. In this study we apply the LAI method RFMix that has been shown to perform well at a sub-continental level (e.g. mixtures of Northern and Southern European ancestry). RFMix is seeded with a reference panel of samples with known origins, and then iteratively learns from a larger collection of test samples. The performance of this method greatly improves with larger reference panel and test sample sizes. Here we use more than 2,000 single-origin reference samples from Ancestry.com and 1000 Genomes, along with over 100,000 research-consented customer samples with admixed origins to train the model to perform inference on individuals with admixed European ancestry. We compare the genome-wide ancestry estimates from RFMix with pedigrees. Using pedigree data as a truth set, we tune the performance of RFMix. We then compare the performance of RFMix with results from the commonly used ancestry estimation method ADMIXTURE run in supervised mode with the same initial single-origin reference panel provided to RFMix. 
We also use single-origin samples to create synthetic admixed samples with known local ancestry patterns to assess the accuracy of RFMix to call individual segments in admixed Europeans. Finally, we apply the highly trained version of RFMix to the National Institute on Aging’s Health and Retirement Study data. We compare county-level geographic summaries of sub-continental ancestry estimates in this data to recent U.S. census data. We find strong evidence of fine-scale population structure with certain localities showing enrichment for particular ancestries (e.g. Irish ancestry in and around Boston and Scandinavian ancestry in the Midwestern states).

Inference of the demographic history of Japan using Approximate Bayesian Computation. C. D. Quinto1, K. R. Veeramah2, A. E. Woerner3, M. F. Hammer3 1) Graduate Interdisciplinary Program in Genetics, University of Arizona, Tucson, Arizona, USA; 2) Department of Ecology and Evolution, Stony Brook University, Stony Brook, NY, USA; 3) Arizona Research Laboratories, Division of Biotechnology, University of Arizona, Tucson, Arizona, USA.

   The genetic exchange between differentiated populations, termed admixture, has increasingly been shown to be an important process in human history. The formation of Hispanic populations in the Americas is one of the best-known examples of this phenomenon. Another important but less well-known example is the origin of modern Japanese. At least two distinct incoming migrations are known to have occurred during the prehistory of Japan. The first took place at least 10,000 years ago and established the Jōmon culture, which was characterized by a semi-sedentary hunter-gatherer way of life and one of the earliest uses of ceramics. Then, around 2,300 years ago, a second migration to the archipelago brought rice agriculture and iron, and established the Yayoi culture. The mixture of the people belonging to these two cultures is believed to have formed the ancestors of the modern Japanese population. Although archaeological records provide information about the time of arrival of the Yayoi people to Japan, the dynamics of the admixture process are still unclear. Previous genetic studies, focusing on mitochondrial DNA and the Y chromosome, have supported an admixture model for the origin of the modern Japanese population. While genome-wide data have been used to investigate this question, there are currently no studies that infer the parameters describing the dynamics of the admixture process. Part of the reason for this is that explicit population genetic modeling is problematic when utilizing genome-wide arrays because of the underlying ascertainment bias in the choice of SNPs. To address these issues, we genotype 500,000 SNPs in 282 samples from populations across the Japanese archipelago and East Asia. We then attempt to correct for the ascertainment bias by using whole genome sequencing data to approximate the discovery sample used to ascertain SNPs. We utilize the SNP genotypes from the different populations to identify ancestry blocks in Japanese samples. 
The distribution of these blocks provides insights about the time and proportion of admixture, and this information is used in an Approximate Bayesian Computation analysis to infer other key demographic parameters such as divergence times, migration rates and population sizes.

A Fine-Scale Comparative Analysis of Population Structure, Divergence and Admixture in Han Chinese, Japanese and Korean Populations. S. Xu1, Y. Wang1, D. Lu1, Y. Chung2 1) Population Genomics, CAS-MPG Partner Institute for Computational Biology, Shanghai, Shanghai, China; 2) Integrated Research Center for Genome Polymorphism, Department of Microbiology, The Catholic University Medical College, Socho-gu Seoul 137-701, Korea.

   In East Asia, human origins and dispersals remain poorly understood and debatable. As the major ethnicities of East Asia, Han Chinese, Japanese and Korean people share many similarities in characteristics, language and culture. However, the genetic relationships, divergence times and subsequent gene flow among the three populations have not been well studied or quantitatively estimated. Here, we conducted a genome-wide study using over 900,000 single nucleotide polymorphisms (SNPs) and evaluated the population structure of 182 unrelated Han Chinese, 90 Japanese and 100 Korean individuals, and compared with 663 individuals representing 8 world-wide populations. Our analysis revealed that Han Chinese, Japanese and Korean populations have distinct genetic makeup and can be well distinguished based on the genome wide data, or a panel of ancestry informative markers (AIMs) screened from genome-wide SNPs, indicating they have been isolated for a substantially long time. Interestingly, population structure corresponds perfectly to the geographical distribution of the three populations, indicating geography was an important factor that resulted in population differentiation. We identified a cline of north/south admixture, which is consistent with either a scenario of isolation by distance (IBD) or that of north/south migrations or both. We theorized that both IBD effect and migrations could have resulted in such a pattern. On the other hand, our analysis revealed patterns of admixture which occurred after initial splits of populations. We further estimated gene flows among the three populations. We concluded that the genetic structure of the present-day Han Chinese, Japanese and Korean people was shaped jointly by common origin, subsequent gene flow and local adaptation. Our results advance the understanding of the genetic relationship and population history in East Asia.

Analysis of autosomal and Y-chromosomal DNA Suggests West Asian Population Derivation from Northern Middle Eastern Populations in the post-Glacial Period. P. Zalloua1,2, F. Utro3, M. Haber1, L. Parida3, E. Matisoo-Smith34, D. Platt3 1) Genomics Laboratory, Graduate School, Beirut, Lebanon; 2) Harvard School of Public Health, Boston, MA, USA; 3) I.B.M. T. J. Watson Research Center, Yorktown Hgts, NY; 4) University of Otago, Dunedin 9054, New Zealand.

   Analysis of Y DNA J and E haplogroups in West Asians (Georgians, Armenians, Turks, Syrians, Lebanese, Jordanians, Saudi Arabians, Yemenis, Kuwaitis) suggest expansions coming primarily from the north (Turkey, Georgia, Armenia), with an early differentiation between those who headed south along the Tigris-Euphrates, versus those who headed south along the Levantine coast. We sought to resolve whether southern variations represented evolution within separate ice age refugia, or evolved from the same northern refugia as suggested by Y chromosome data by revealing population divergence times between Saudi Arabia and Yemen versus Turkey, Syria, and Armenia that predate the post-glacial expansions. We employed IRIS to compute times for grand most recent common ancestors applied between pairs of subjects drawn from Georgians, Armenians, Turks, Syrians, Lebanese, Jordanians, Saudi Arabians and Yemenis, as well as pair-wise FST’s based on the estimated times. We contrasted these results with raw SNP counts and pairwise FST’s obtained from those counts. We applied MDS and hierarchic clustering to identify geographically informative relationships, and observed a clear pattern of a north-to-south gradient. Within the western Middle East, our results suggest population differentiation dates consistent with post Last Glacial expansions, with subsequent population constriction into the Fertile Crescent in the presence of admixture. Our estimates show a north-to-south differentiation time of ~24,800-18,200 y.a., well within the Last Glacial Period. However, the time of J1/J2 haplogroups splits that mark this diversion are dated by BATWING well into the Last Glacial Period, around 31kya. These results place the genetic differentiation of the autosomal genome to be a bit more recent than the J1/J2 split. 
Expansions into Europe show a somewhat more recent record than those into Africa, with signals that show affinities with particular Middle Eastern regions, suggesting more recent trade impacts.

Population Genomics of the South American Andean Region. J. R. Homburger1, A. Moreno-Estrada1, C. R. Gignoux1, E. Sanchez-Rodriguez2, B. A. Pons-Estel3, E. Acevedo4, J. M. Cucho4, P. Miranda5, L. Catoggio6, M. A. García7, G. Berbotto8, A. Babini9, H. Scherbarth10, S. Toloza11, M. Alarcon-Riquelme2, C. D. Bustamante1 1) Department of Genetics, Stanford University, Stanford, CA, USA; 2) Centre for Genomics and Oncological Research (GENYO), University of Granada, Granada, Spain; 3) Sanatorio Parque, Rosario, Argentina; 4) Hospital Nacional Guillermo Almenara Irigoyen, Lima, Peru; 5) Facultad Medicina Occidente, Universidad de Chile, Santiago de Chile, Chile; 6) Hospital Italiano de Buenos Aires, Argentina; 7) H.I.G.A. General San Martin, La Plata, Argentina; 8) Hospital Eva Peron, Granadero Baigorria, Argentina; 9) Hospital Italiano de Córdoba, Córdoba, Argentina; 10) H.I.G.A. Oscar E. Alende, Mar del Plata, Argentina; 11) Hospital Interzonal San Juan Bautista, Catamarca, Argentina.

   The South American continent has experienced multiple migration and admixture events. Here, we examine the genetic history of the Andean region using 551 individuals from Colombia, Ecuador, Peru, Chile, and Argentina genotyped on Illumina SNP arrays. Combining these data with individuals from the 1000 Genomes Project and the Population Reference Panel (POPRES), we show that the admixed individuals have varying degrees of Native American and European ancestry. We use ADMIXTURE and principal component analysis (PCA) to study the genetic ancestry of the admixed South American individuals. We show that on average the Peruvian individuals have a higher amount of Native American ancestry while the Argentinian individuals had on average the highest amount of European ancestry when compared with the other admixed South American samples. We also find that Andean indigenous groups account for the largest proportion of Native American ancestry in the South American individuals. On the other hand, the largest proportion of European ancestry in admixed individuals is from Southern Europe and the Iberian Peninsula. We aim to estimate the specific timing and the subcontinental origin of ancestral components involved in South American admixture by applying ancestry-specific PCA and tract length analysis to admixed genomes.

Fast individual ancestry inference from DNA sequence data leveraging allele frequencies from multiple populations. O. Libiger1,3, V. Bansal1,2 1) Scripps Translational Science Institute, La Jolla, CA; 2) Department of Pediatrics, University of California San Diego, La Jolla CA; 3) MD Revolution, San Diego, CA.

   Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. We describe a fast method for estimating the relative contribution of known reference populations to an individual’s genetic ancestry. Our method utilizes allele frequencies from the reference populations and individual genotype or sequence data to obtain a maximum likelihood estimate of the global admixture proportions using the BFGS optimization algorithm. It accounts for the uncertainty in genotypes present in sequence data by using genotype likelihoods instead of genotypes. Unlike previous methods, our method does not require individual genotype data from external reference panels and can utilize allele frequencies estimated from the analysis of homogeneous as well as admixed human populations. Simulation studies and application of the method to real datasets demonstrate that our method is 8-10 times faster than ADMIXTURE and has comparable accuracy. Using data from the 1000 Genomes project, we show that our method can estimate genome-wide average ancestry of admixed individuals using exome or low-coverage sequence data. Finally, we demonstrate that our method can be used to estimate admixture proportions using pooled sequence data making it a valuable tool for controlling for population stratification in sequencing based association studies that utilize DNA pooling.
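
To make the idea of a maximum likelihood estimate of admixture proportions from reference allele frequencies concrete, here is a rough Python sketch. To be clear about what it is not: the abstract uses BFGS and genotype likelihoods, whereas this toy version swaps in a plain EM update and assumes hard genotype calls (0, 1 or 2 copies of the alternate allele). The function name `estimate_admixture` and the toy frequencies are my own inventions for illustration.

```python
def estimate_admixture(genotypes, ref_freqs, n_iter=50):
    """EM estimate of global ancestry proportions for one individual.

    genotypes: alternate-allele counts (0, 1 or 2), one per SNP.
    ref_freqs: one list of alternate-allele frequencies per reference population.
    Assumes independent SNPs and binomial sampling of alleles.
    """
    k = len(ref_freqs)
    n_snps = len(genotypes)
    q = [1.0 / k] * k  # start from equal proportions
    for _ in range(n_iter):
        new_q = [0.0] * k
        for j, g in enumerate(genotypes):
            # Expected alternate-allele frequency for this individual at SNP j.
            p = sum(q[i] * ref_freqs[i][j] for i in range(k))
            p = min(max(p, 1e-9), 1 - 1e-9)  # guard against division by zero
            for i in range(k):
                f = ref_freqs[i][j]
                # Expected number of this SNP's two allele copies drawn from population i.
                new_q[i] += g * q[i] * f / p + (2 - g) * q[i] * (1 - f) / (1 - p)
        q = [x / (2 * n_snps) for x in new_q]
    return q


# Toy example (hypothetical frequencies): the individual looks like population A.
pop_a = [0.9, 0.9, 0.1, 0.1, 0.9, 0.1]
pop_b = [0.1, 0.1, 0.9, 0.9, 0.1, 0.9]
genotypes = [2, 2, 0, 0, 2, 0]
proportions = estimate_admixture(genotypes, [pop_a, pop_b])
```

EM is slower to converge than a quasi-Newton method like BFGS on real data, which is presumably part of why the authors chose the latter, but it keeps the proportions non-negative and summing to one without any extra machinery.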

How population growth affects linkage disequilibrium. A. Rogers Anthropology, University of Utah, Salt Lake City, UT.

   The “LD curve” relates the linkage disequilibrium (LD) between pairs of nucleotide sites to the distance that separates them along the chromosome. It is used to map disease genes and to search for adaptive evolution. But it also responds to the history of population size. The present research describes new theoretical results about the effect of population history. When a population expands in size, the LD curve grows steeper, and this effect is especially pronounced following a bottleneck in population size. When a population shrinks, the LD curve rises but remains relatively flat. As LD converges toward a new equilibrium, its time path may not be monotonic. Following an episode of growth, for example, it declines to a low value before rising toward the new equilibrium. These changes happen at different rates for different LD statistics. They are especially slow for estimates of σ_d², which therefore allow inferences about ancient population history. For the human population of Europe, these results suggest a history of population growth.

The Structure of Linkage Disequilibrium in the Recently Admixed Populations. H. Zhang, J. Jung, B. Grant National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Rockville, MD.

   The linkage disequilibrium (LD) in human population is found to be stronger with increasing geographic distance from Africa, which reflects the African origin of human history. Recently admixed populations (such as African Americans and Hispanic Americans) are more likely to harbor a larger number of genetic variants, relative to their inferred ancestral populations. However, the pattern of linkage disequilibrium in these admixed populations is not well studied. Here, we conduct an analysis of linkage disequilibrium at 659,184 single nucleotide polymorphisms (SNPs) in 924 unrelated samples from 11 Hapmap3 populations and 24 samples from Karitiana population (Native American in Brazil from Human Genome Diversity Project). African Americans (ASW) derive their genomic ancestry from African and European with an average of 77.3% African and 20.0% European ancestry. Hispanic Americans (MXL) lie on a cline of an average of 45.5% European ancestry, 42.9% Native American ancestry, 4.9% East Asian ancestry and 4.4% African ancestry. The mean of SNP based haplotype heterozygosity across the whole genome in these two admixed populations is greater than that of their major inferred ancestral populations. We further use r² between all possible SNP pairs in various distance classes as a measure of LD and also focus on the proportion of SNP pairs with r² greater than 0.8. Both of these two admixed populations show intermediate LD (as measured in r² and the proportion of SNP pairs with r²>0.8), compared with their two major inferred ancestral populations. The extent of LD (r²) in African Americans (ASW) is closer to that in African population (YRI) in the short distance classes, while the values of LD in African Americans (ASW) are more likely to be similar to the European Americans (CEU) with the increased distance classes. The amount of LD (r²) in Hispanic Americans (MXL) shows a similar pattern, but it is much closer to European Americans (CEU) in all distance classes. 
The findings on the structure of LD in admixed populations are helpful to better understand the evolution of human population and the design of the genetic association studies in admixed populations.
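
Since r² and the share of SNP pairs with r² above 0.8 carry most of the weight in that abstract, here is a small Python sketch of how those two quantities are computed. It assumes phased haplotypes with alleles coded 0/1 (real pipelines often have to estimate this from unphased genotypes), and the function names `r_squared` and `fraction_high_ld` are mine, not from the authors.

```python
from itertools import combinations


def r_squared(site_a, site_b):
    """r^2 between two biallelic sites given one 0/1 allele per haplotype.

    Returns None if either site is monomorphic in the sample.
    """
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    # Frequency of haplotypes carrying allele 1 at both sites.
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b  # the classic disequilibrium coefficient D
    return d * d / denom


def fraction_high_ld(sites, threshold=0.8):
    """Share of site pairs whose r^2 exceeds the threshold."""
    values = [r_squared(a, b) for a, b in combinations(sites, 2)]
    values = [v for v in values if v is not None]
    if not values:
        return 0.0
    return sum(v > threshold for v in values) / len(values)


# Toy data (hypothetical): eight haplotypes, three sites.
sites = [
    [0, 0, 1, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 1, 0, 1],  # perfectly correlated with the first site
    [0, 1, 0, 1, 0, 1, 0, 1],
]
high_ld_share = fraction_high_ld(sites)
```

Binning the pairs by physical distance before calling `fraction_high_ld` would give the "distance classes" the abstract refers to.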

Using linkage disequilibrium to refine estimates of accelerating growth in human populations. M. Reppell1, J. Carlson1, S. Zöllner1,2, The BRIDGES Consortium 1) Department of Biostatistics, University of Michigan, Ann Arbor, MI; 2) Department of Psychiatry, University of Michigan, Ann Arbor, MI.

   Correctly modeling the effective size of a population is critical to making accurate inferences about mutation and migration rates, and the strength of selective pressures. In humans, several large sequencing studies have given us novel insight into a genome characterized by an abundance of extremely rare genetic variation, consistent with a history of recent massive population growth. These large sequencing studies offer us unprecedented resolution for distinguishing between models of recent growth. To improve on conventional inference methods we propose a novel likelihood based approach that incorporates pairwise r2, a measure of linkage disequilibrium, in addition to the site frequency spectrum. We observe that over short genetic distances, pairwise r2 is a function of the variance in ancestral tree branch lengths, and therefore contains information about ancestral population sizes lacking from the site frequency spectrum, which is a function only of the mean total ancestral branch lengths. Using simulations we show that with large samples, the inclusion of pairwise r2 improves the accuracy of demographic inference in populations that have undergone recent growth, relative to methods relying solely on the site frequency spectrum. We quantify how increasing sample size increases the accuracy of inferences about recent demography, and magnifies the improvement our method yields versus conventional approaches. Lastly, we apply our method to regions defined as neutral in whole genome sequence data from ~4,000 European ancestry individuals sequenced as part of the BRIDGES consortium. This dataset has ideal features for our purposes; providing both a large sample and non-coding genetic regions free from evidence of ongoing selection, a mixture unavailable from exome only or functional sequencing projects. 
We use a Monte Carlo method to estimate the likelihood of the observed data under a range of realistic growth models, including those incorporating continuous, accelerating, faster than exponential growth. With our data we are able to simultaneously make inferences about the mutation rate, μ, and the rate of accelerating growth experienced by the European population from which our sample is drawn.
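
The site frequency spectrum that this abstract builds on is simple to tabulate, which is part of why it is the default summary for demographic inference. Here is a rough sketch, assuming a 0/1 haplotype matrix with the derived allele coded as 1; the function name and the input layout are hypothetical, and this shows only the summary statistic, not the authors' likelihood method.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Unfolded site frequency spectrum from a 0/1 haplotype matrix.

    haplotypes is a list of rows, one per sampled chromosome, each a list of
    0 (ancestral) / 1 (derived) alleles per site. Entry k-1 of the result is
    the number of sites where the derived allele appears on exactly k of the
    n chromosomes, for k = 1 .. n-1.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    derived_counts = Counter(sum(site) for site in zip(*haplotypes))
    return [derived_counts.get(k, 0) for k in range(1, n)]


# Toy sample of four chromosomes and four sites:
# two singletons, one site at count 3, one monomorphic site (ignored).
toy = [
    [1, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
sfs = site_frequency_spectrum(toy)
```

Recent rapid growth shows up as an excess in the first entries of this vector (singletons and doubletons), which is the "abundance of extremely rare genetic variation" the abstract refers to.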

Forensic Phenotyping in Brazilian population: SLC24A5 and ASIP as phenotypic predictors genes of skin, eye and hair color. C. Fridman, F. A. Lima, F. T. Gonçalves Dept of Legal Medicine, Ethics and Occupational Health, University of São Paulo, São Paulo, São Paulo, Brazil.

   Pigmentation is a highly variable and complex trait in humans, determined by the interaction of environmental factors, age, disease, drugs, hormones, exposure to ultraviolet radiation and genetic factors, including pigmentation genes. Many of these genes and their variants have been associated with phenotypic diversity of skin, eye and hair color in homogeneous populations. The SLC24A5, TYR, MC1R, SLC45A2, ASIP, OCA2 and HERC2 genes are noteworthy for their important contribution to the pigmentation process. Prediction of phenotypes from genetic information has benefited forensics in many countries because it has made it possible to infer physical characteristics from biological samples and, thus, guide criminal investigations. The aim of this study was to evaluate polymorphisms in the TYR, ASIP, SLC24A5 and SLC45A2 genes in a sample of 350 individuals from the admixed population of Brazil, with the intention of using the data in forensic genetics casework in several situations. Volunteers answered a questionnaire in which they self-reported their skin, eye and hair colors, sun sensitivity and lifestyle. No significant results were observed except for SLC24A5 and ASIP. Homozygosity for the polymorphic allele of rs1426654 in SLC24A5 (OR 32.88, p<0.0001) and of rs6058017 in ASIP (OR 8.68, p<0.007) showed the strongest association with fairer skin. In addition, homozygosity for the polymorphic allele in SLC24A5 was associated with light (green) eye color (OR 9.82, p<0.0001), blond hair (OR 50.14, p<0.0001) and increased sensitivity to sun exposure (OR 7.86, p<0.0002). Our data suggest that the polymorphic allele (A) in the SLC24A5 and ASIP genes is correlated with characteristics of light pigmentation, while the ancestral allele (G) is related to darker traits. Our findings corroborate previously published data from studies in European and African populations. 
These associations between pigmentation genes and skin, eye and hair color show that it is possible to use an individual's molecular information to assess their phenotypic traits and to use the results to help forensic investigations. Additional analyses are ongoing as part of a project that evaluates 600 samples to check possible associations of phenotypic pigmentation in the Brazilian population with the mentioned genes. Financial Support: FAPESP (2012/02043-6), LIM 40/HCFMUSP and Department of Legal Medicine, Ethics and Occupational Health – FMUSP.
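
Since the results above are reported as odds ratios, here is a minimal sketch of how an odds ratio and an approximate 95% confidence interval come out of a 2x2 table. The counts are invented for illustration (the abstract does not report its tables), and the interval uses the standard log-odds (Woolf) approximation, which may not be what the authors used.

```python
import math


def odds_ratio(carrier_with_trait, carrier_without_trait,
               noncarrier_with_trait, noncarrier_without_trait):
    """Odds ratio and approximate 95% CI from a 2x2 table of counts."""
    a, b = carrier_with_trait, carrier_without_trait
    c, d = noncarrier_with_trait, noncarrier_without_trait
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimate")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical counts: 20 of 30 homozygous carriers have fair skin,
# versus 5 of 30 non-carriers.
or_estimate, ci_low, ci_high = odds_ratio(20, 10, 5, 25)
```

With small cell counts the interval gets very wide, which is worth keeping in mind when an odds ratio of 30 or 50 comes out of a sample of 350.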

Maternal Age Effect and Severe Germline Bottleneck in the Inheritance of Human Mitochondrial DNA. M. Su1, B. Rebolledo-Jaramillo2, N. Stoler2, J. A. McElhoe3, B. Dickins4, D. Blankenberg2, T. Korneliussen5, F. Chiaromonte6, R. Nielsen5, M. M. Holland3, I. M. Paul7, A. Nekrutenko2, K. D. Makova1 1) Department of Biology, Penn State University, USA; 2) Department of Biochemistry and Molecular Biology, Penn State University, USA; 3) Forensic Science Program, Penn State University, USA; 4) School of Science and Technology, Nottingham Trent University, UK; 5) The Department of Integrative Biology, the University of California at Berkeley, USA; 6) Department of Statistics, Penn State University, USA; 7) Department of Pediatrics, College of Medicine, Penn State University, USA.

   The manifestation of mtDNA diseases depends on the frequency of heteroplasmy (the presence of several alleles in an individual), yet its transmission across generations cannot be readily predicted due to the lack of data on the size of the mtDNA bottleneck during oogenesis. For deleterious heteroplasmies, a severe bottleneck may abruptly transform a benign (low) frequency in a mother into a disease-causing (high) frequency in her child. Here we present a high-resolution study of heteroplasmy transmission conducted on blood and buccal mtDNA of 39 healthy mother-child pairs of European ancestry (a total of 156 samples, each sequenced at ~20,000x/site). On average, each individual carried one heteroplasmy, and one in eight individuals carried a disease-causing heteroplasmy, with minor allele frequency ≥1%. We observed frequent drastic heteroplasmy frequency shifts between generations and estimated the size of the bottleneck at only ~29-35 mtDNA molecules. Strikingly, we found a positive association between the number of heteroplasmies in a child and maternal age at fertilization, likely attributable to oocyte aging. Accounting for heteroplasmies, we estimate the mtDNA germline mutation rate to be 1.3×10⁻⁸ mutations/site/year, lower than in previous pedigree studies but in agreement with phylogenetic studies, thus solving a long-standing controversy and informing the use of mtDNA in dating evolutionary events. This study takes advantage of droplet digital PCR (ddPCR) to validate heteroplasmies and confirm a de novo mutation. These results have profound implications for predicting the transmission of disease-causing mtDNA variants and illuminate mitochondrial genome evolutionary dynamics.

An estimate of the average number of recessive lethal mutations carried by humans. Z. Gao1, D. Waggoner2,3, M. Stephens2,4, C. Ober1,2,5, M. Przeworski6,7 1) Committee on Genetics, Genomics and Systems Biology; 2) Dept of Human Genetics; 3) Dept of Pediatrics; 4) Dept of Statistics; 5) Dept of Obstetrics and Gynecology, University of Chicago, Chicago, IL; 6) Dept of Biological Sciences; 7) Dept of Systems Biology, Columbia University, New York, NY.

   The effects of inbreeding on human health depend critically on the number and severity of the recessive deleterious mutations carried by an individual. In humans, estimates of the burden of recessive mutations per individual are based either on comparisons between consanguineous and non-consanguineous couples, an approach that confounds socioeconomic and genetic effects, or on carrier screening for disease-causing mutations, which suffers from other biases, notably the highly incomplete catalogue of disease-causing mutations. To circumvent these limitations, we sought to estimate a lower bound of the burden by focusing on recessive lethal disorders in a founder population with almost complete Mendelian disease ascertainment and a known pedigree. By considering all autosomal recessive lethal diseases recognized in the population and simulating allele transmissions along the pedigree, we estimated that each haploid human genome carries on average approximately one autosomal recessive allele that leads to severe disorders at or after birth in homozygous condition. When compared with previous estimates, our result suggests that recessive mutations that are lethal constitute a substantial fraction of the total burden of recessive deleterious mutations in humans.

Inference of mutation rates using hidden relatedness. P. F. Palamara1,2,3, P. Wilton4, M. Fromer5,6, G. Kirov7, S. McCarroll3,6,8, P. Sklar5,9, M. Owen7, S. Purcell5,6,10, M. O’Donovan7, J. Wakeley4, I. Pe’er11, 12 1) Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA; 2) Department of Epidemiology, Harvard School of Public Health, Boston, MA, USA; 3) Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Boston, MA, USA; 4) Department of Organismic and Evolutionary Biology, Harvard University, Boston, MA, USA; 5) Division of Psychiatric Genomics in the Department of Psychiatry, and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, USA; 6) Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Boston, MA, USA; 7) Medical Research Council Centre for Neuropsychiatric Genetics and Genomics, Institute of Psychological Medicine and Clinical Neurosciences, Cardiff University, Cardiff, UK; 8) Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA; 9) Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA; 10) Analytic and Translational Genetics Unit, Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Boston, MA, USA; 11) Center for Computational Biology & Bioinformatics, Columbia University Medical Center, New York, NY, USA; 12) Department of Computer Science, Columbia University, New York, NY.

   Reliably estimating the mutation rate in modern humans has several implications for our understanding of demographic history (Scally and Durbin, Nature Reviews Genetics 2012). Recent estimates of the mutation rate obtained using de novo mutations in next-generation sequencing of families, however, were found to disagree with phylogenetic mutation rates derived from fossil evidence, motivating the development of new analytical methods. We describe an approach for the inference of mutation rates based on sharing of identical-by-descent (IBD) segments in sequencing data across purportedly unrelated individuals from a population. Using coalescent theory, we derive theoretical results for the distribution of mutation events found on IBD segments longer than a specified centimorgan threshold, for arbitrary demographic settings, under the SMC and SMC’ models. Leveraging the relationship between the length and the age of shared IBD haplotypes, we devise a method to estimate both genotype error rates and mutation rates. The proposed approach based on hidden relatedness offers a substantial increase in statistical power compared to family-based analysis of de-novo mutations. This gain in power occurs despite the fact that the fraction of genome shared through long (e.g. >1cM) IBD segments across purportedly unrelated individuals is usually small, since IBD regions harbor events which have occurred in the recent past, over tens to hundreds of generations. Furthermore, analysis of de-novo mutations in trio-based studies is limited to genomic regions transmitted through known pedigree relationships, while when accurately phased data is available, mutation events can be analyzed on IBD segments across the quadratically larger set of all pairs of unrelated individuals. 
We validate the proposed methodology using synthetic datasets for a variety of demographic scenarios, and analyze mutation rates in 1246 trio-phased unrelated individuals from a recent exome sequencing study (Fromer et al., Nature 2014) of schizophrenia patients.

Hundreds of shared ‘deletions’ in ancient hominins are polymorphic in modern human populations. D. Radke1,2, C. Lee3, S. Sunyaev1,2 1) Harvard Medical School, Boston, MA; 2) Brigham and Women’s Hospital, Boston, MA; 3) The Jackson Laboratory for Genomic Medicine, Farmington, CT.

   Deciphering the genetic uniqueness of modern humans in relation to distant hominins and other primates is one of the central goals of human evolutionary genomics. Recently, with the availability of high-coverage sequence data for both Neanderthal and Denisova, it is now possible to more precisely determine the particular loci responsible for modern human uniqueness. While much of the distinguishing variation may be due to single nucleotide variants, genomic structural variants may also play a crucial role. Structural variants can be a potent phenotype-shaping force, particularly for unbalanced events, such as deletions, as they can alter reading frames and remove regulatory component space. Analyzing sequence read depth across archaic genomes, we find hundreds of ‘deleted’ regions in Neanderthal and Denisova (including many shared deletions), which are polymorphic in modern human populations. Some shared deletions overlap genes, and shared deletions as a set have a significantly higher allele frequency in modern human populations. Because these deletions are polymorphic in modern humans, they may represent regions of modern human-specific insertion, regions lost in archaic human lineages, or deletions polymorphic in both modern and archaic populations.

Convergent mechanisms underlying hypoxia adaptation in Drosophila and Humans. A. R. Jha1,2,3, D. Zhou4,5, C. D. Brown1,2, G. G. Haddad4,5, M. Kreitman1,2,3, K. P. White1,2,3 1) Institute for Genomics and Systems Biology, The University of Chicago, Chicago; 2) Department of Human Genetics, The University of Chicago, Chicago, USA; 3) Department of Ecology and Evolution, The University of Chicago, Chicago, USA; 4) Department of Pediatrics, Division of Respiratory Medicine, University of California at San Diego, San Diego, CA, USA; 5) Rady Children’s Hospital, San Diego, CA, USA.

   The ability to withstand low oxygen (hypoxia) is a highly polygenic yet mechanistically conserved trait that has important implications for both human health and evolution. However, little is known about the diversity of genetic mechanisms involved in hypoxia adaptation in evolving populations. We used experimental evolution and whole-genome sequencing in Drosophila melanogaster to investigate the role of natural variation in adaptation to hypoxia. Using a Generalized Linear Mixed Model we identified significant allele frequency differences between three independently evolved hypoxia-tolerant populations and normoxic controls for ~4000 single nucleotide polymorphisms. Many of these variants are clustered in 66 distinct genomic regions representing long-distance linkage in our populations. These regions are enriched for genes associated with metabolic processes and contain genes that are differentially expressed between hypoxia-tolerant and normoxic populations. Additional genes associated with open tracheal system development and Notch signaling pathways also showed evidence of directional selection. Knocking down the expression of a handful of candidate genes showed striking enhancement in survival in severe hypoxia, demonstrating their functional relevance in hypoxia adaptation. Using whole genome genotyping data from three high-altitude human populations, namely Sherpas, Tibetans, and Ethiopians, we show that the human orthologs of the genes under selection in flies are also under positive selection in all three high-altitude human populations. Therefore, comparative genomics approaches, such as the one we have taken here, can be powerful in revealing genes and pathways underlying evolutionarily ancient traits that have conserved functions for millions of years.

Evaluating the impact of recent human demography on the frequency spectra using numerical solution of time-inhomogeneous diffusion equation. E. Koch1, J. Novembre2 1) Department of Ecology and Evolution, University of Chicago, Chicago, IL; 2) Department of Human Genetics, University of Chicago, Chicago, IL.

   Differences in recent demographic history appear to be an important driver of observed levels of genetic diversity among human populations. Recent attention has particularly centered on how populations that went through the out-of-Africa bottleneck have lower heterozygosity and polymorphic sites that are proportionally more likely to be nonsynonymous or predicted to be damaging. These results have suggested that differences in the frequency spectrum of deleterious variation are also caused by varying population demographic histories. To investigate these phenomena in more detail, we numerically solve time-inhomogeneous diffusion equations for the allele frequency spectrum under the Poisson Random Field Model. This allows us to efficiently examine how the frequency spectra have evolved through time under a large number of possible human demographies and distributions of selective effects. We are also able to easily stratify variation observed today by the age at which the variation was generated. Using these tools, we demonstrate the ability of natural selection and demography to produce observed patterns and evaluate the relative impacts of population bottlenecks, recent growth rates, and changing efficacy of selection on the abundance of different variant types. The results emphasize how human frequency spectra are far from equilibrium and make clearer how frequencies are affected by major human demographic events at different timescales. For instance, in out-of-Africa populations the impacts of the bottleneck on the frequency spectra are still being realized, even as more recent growth events lead to an overlaid influx of rare variants. We quantify these effects and discuss their importance for interpretation of human genetic variation patterns.

The Genetic Architecture of Skin Pigmentation in the Southern African ≠Khomani San. A. R. Martin1, J. M. Granka2, C. R. Gignoux1, M. Lin3, C. Uren4, M. Möller4, C. J. Werely4, J. M. Kidd5, M. W. Feldman2, E. G. Hoal4, C. D. Bustamante1, B. M. Henn1,3 1) Genetics Department, Stanford University, Stanford, CA; 2) Department of Biological Sciences, Stanford University, Stanford, CA, 94305; 3) Department of Ecology and Evolution, SUNY Stony Brook, NY 11794; 4) Division of Molecular Biology and Human Genetics, Stellenbosch University, Tygerberg, South Africa; 5) Department of Human Genetics, University of Michigan, Ann Arbor MI.

   Skin pigmentation is one of the most recognizably diverse phenotypes in humans across the globe, but its highly genetic basis has been primarily studied in northern European, Asian, and African American populations. The Eurasian pigmentation alleles are among the most differentiated variants in the genome, suggesting strong selection for light skin pigmentation. Light skin pigmentation is also observed in the far southern latitudes of Africa among KhoeSan hunter-gatherers of the Kalahari Desert. The KhoeSan hunter-gatherers are among the oldest human populations, believed to have diverged from other populations 100,000 years ago, and maintain extraordinary levels of genetic diversity. It is unknown whether light skin pigmentation represents convergent evolution or the ancestral human phenotype. We have collected ethnographic information, pigmentation phenotypes, and genotype data from 136 individuals in the ≠Khomani San from the Kalahari. To understand the genetic basis for light skin pigmentation, we have also exome sequenced 83 ≠Khomani San individuals to high coverage, generating one of the largest indigenous African exome datasets sequenced outside of the 1000 Genomes Project. In this study, ≠Khomani individuals have 11.5% admixture with Europeans and 10.9% admixture with Bantu speakers on average. European ancestry significantly lightens skin and explains 13.3% of the variance in pigmentation, and Bantu ancestry significantly darkens skin and explains 16.1% of the variance in pigmentation on average. We estimate that pigmentation is highly heritable (h2 = 0.887 ± 0.188 standard error) and find that most of the heritability can be explained by 50 known pigmentation genes (0.527 ± 0.310 or 64.1% on average). After controlling for admixture with European and Bantu-speaking populations, a linear mixed model GWAS approach does not identify variants significantly associated with pigmentation. 
However, pigmentation genes are among the most globally differentiated between the ≠Khomani San and European or Bantu individuals, and aggregating differentiation with association data improves power to detect variants influencing selected traits. We identify highly differentiated variants between the ≠Khomani and both European and Bantu populations in multiple canonical pigmentation genes, including OCA2 and MITF. Our results highlight the strength of diverse population studies to explain phenotypic variation impacted by human evolutionary history.

Association study confirms that two OCA2 polymorphisms are involved in normal skin pigmentation variation in East Asian populations. E. Parra, K. Eaton, P. Kavanagh, M. Edwards, S. Krithika Dept Anthropology, Univ Toronto, Toronto, ON, Canada.

   The last decade has witnessed dramatic advances in our understanding of the genetic architecture of normal skin pigmentation variation in European populations. However, evidence is much more limited for East Asian populations. Recently, we carried out a study aimed at identifying putative signatures of positive selection in pigmentation candidate genes in populations of East Asian ancestry. Based on the list of genes that show putative signatures of selection in East Asia, we prioritized a number of polymorphisms based on 1/ allele frequency information (e.g. differences in frequency between East Asian and non-East Asian populations) 2/ potential functional effects (e.g. Polyphen, SIFT and CADD scores) and 3/ conservation (e.g. GERP++ scores). The panel of SNPs selected includes 3 markers in the LYST gene (rs3754234, rs7522053 and rs4659610), one marker in the MLPH gene (rs2292881), 2 markers in the OPRM1 gene (rs1799971 and rs6917661), one marker in the EGFR gene (rs2227983), 4 markers in the BNC2 gene (rs9406647, rs3739714, rs10756778 and rs10962591), one marker in the TH gene (rs4930046), 3 markers in the OCA2 gene (rs1800414, rs74653330 and rs7497270), one marker in the TRPM1 gene (rs3809578) and 2 markers in the MC1R gene (rs33932559 and rs885479). We evaluated the association of these polymorphisms with skin pigmentation measured quantitatively using a DSM II colorimeter in a sample comprising 452 individuals of East Asian ancestry. Two previously described nonsynonymous polymorphisms within the OCA2 gene, rs1800414 (His615Arg) and rs74653330 (Ala481Thr) were strongly associated with melanin levels in this sample. Under an additive model, the common rs1800414 G allele, coding for Arginine, is associated with a decrease of 0.9 units in melanin levels. The rs74653330 A allele, coding for Threonine, is present at low frequency in East Asia (around 3% in our sample) and has a stronger effect on melanin levels than rs1800414 (decrease of 1.3 melanin units). 
No significant associations with skin pigmentation were observed for any of the other variants.
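
The "additive model" in this abstract means regressing the trait on the number of copies of an allele (0, 1 or 2), so the slope is the per-allele effect. Below is a bare-bones sketch with made-up numbers, constructed so that each copy lowers the melanin reading by 0.9 units as reported for rs1800414; a real analysis would presumably include covariates such as sex and ancestry, which the abstract does not detail.

```python
def additive_effect(allele_counts, phenotypes):
    """Least-squares slope and intercept of phenotype on allele count (0/1/2)."""
    if len(allele_counts) != len(phenotypes) or len(allele_counts) < 2:
        raise ValueError("need two or more paired observations")
    n = len(allele_counts)
    mean_g = sum(allele_counts) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in allele_counts)
    if sxx == 0:
        raise ValueError("all genotypes are identical; slope is undefined")
    sxy = sum((g - mean_g) * (y - mean_y)
              for g, y in zip(allele_counts, phenotypes))
    slope = sxy / sxx                      # change in phenotype per allele copy
    intercept = mean_y - slope * mean_g    # expected phenotype with 0 copies
    return slope, intercept


# Invented data: melanin readings drop by 0.9 per copy of the allele.
genotypes = [0, 0, 1, 1, 2, 2]
melanin = [40.0, 40.0, 39.1, 39.1, 38.2, 38.2]
beta, alpha = additive_effect(genotypes, melanin)
```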

Neanderthal Origin of the Haplotypes Carrying the Functional Variant Val92Met in the MC1R in Modern Humans. Q. Ding1, Y. Hu1, S. Xu2, C. Wang1, H. Li1, R. Zhang1, S. Yan1, J. Wang1, L. Jin1,2 1) State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, China; 2) CAS-MPG Partner Institute for Computational Biology, Shanghai Institute for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai, China.

   Skin color is one of the most visible and important phenotypes of modern humans. Melanocyte-stimulating hormone and its receptor play an important role in regulating skin color. Here we present evidence of Neanderthal introgression encompassing the melanocyte-stimulating hormone receptor gene MC1R. The haplotypes from Neanderthal introgression diverged from the Altai Neanderthal 103.3 KYA, which postdates the divergence of anatomically modern humans and Neanderthals. We further discovered that all of the putative Neanderthal introgressive haplotypes carry the Val92Met variant, a loss-of-function variant in MC1R that is associated with multiple dermatological traits including skin color and photoaging. The frequency of this Neanderthal introgression is low in Europeans (~5%), moderate in continental East Asians (~30%), and high in Taiwanese aborigines (60-70%). Since the putative Neanderthal introgressive haplotypes carry a loss-of-function variant that could alter the function of MC1R and is associated with multiple traits related to skin color, we speculate that this Neanderthal introgression, together with the previously reported Neanderthal introgression at HYAL2, may have played an important role in the local adaptation of modern Eurasians to sunlight intensity.

Whole genome sequencing to uncover adaptation to high altitude in the Andes. M. Muzzio1,2, K. Slivinski3, M. C. Yee4, T. Cooke5, C. D. Bustamante5, G. Bailliet1, C. M. Bravi1,2, E. E. Kenny3,4,6,7,8 1) Consejo Nacional de Investigaciones Cientificas y Tecnologicas, La Plata, Buenos Aires, Argentina; 2) Facultad de Ciencias Naturales y Museo, Universidad Nacional de La Plata, Argentina; 3) The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, NY; 4) Dinneny Lab. Carnegie Institution of Washington. Department of Plant Biology, CA; 5) Stanford University School of Medicine, CA; 6) Department of Genetics & Genomic Sciences, Icahn School of Medicine at Mount Sinai, NY; 7) The Center for Statistical Genetics, Icahn School of Medicine at Mount Sinai, NY; 8) The Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, NY.

   There is interest in human adaptation to a diversity of environments, including finding the genetic basis of phenotypes favorable under pressures such as hypoxia. We have preliminary Illumina Exome Array data on a set of 43 individuals from high altitude villages in the Andes from the Humahuaca area, Argentina (~2500 meters above sea level) and 11 individuals from a neighboring lowland population, Tartagal, Argentina (less than 500 meters above sea level), all with over 90% Native American ancestry estimated using the Admixture software. Currently, we are sequencing full genomes of 10 individuals from each of these populations, in search of new population-specific variants. We will use the population branch statistic (PBS) to identify highly differentiated genomic regions between the highlanders (Andean) and lowlanders (Chaqueños). We will discuss the results of our scan in light of related work on the adaptation of Tibetans, Ethiopians, and other Andean populations to hypoxia.
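
The population branch statistic mentioned here turns three pairwise Fst values into a branch length for one focal population. The sketch below follows the form of PBS as I understand it from the Tibetan high-altitude literature (log-transform each Fst into a divergence time, then solve for the focal branch); the abstract does not say which outgroup or Fst estimator the authors will use, so the population labels and numbers are purely illustrative.

```python
import math


def branch_length(fst):
    """Transform Fst into a divergence time in drift units: T = -log(1 - Fst)."""
    if fst >= 1:
        raise ValueError("Fst must be below 1 for the log transform")
    return -math.log(1 - fst)


def pbs(fst_focal_sister, fst_focal_outgroup, fst_sister_outgroup):
    """Population branch statistic for the focal population.

    A large value means allele frequencies changed mostly on the focal
    branch (e.g. highlanders) rather than in the sister or outgroup.
    """
    t_fs = branch_length(fst_focal_sister)
    t_fo = branch_length(fst_focal_outgroup)
    t_so = branch_length(fst_sister_outgroup)
    return (t_fs + t_fo - t_so) / 2


# Illustrative values only: the focal population is differentiated from
# both others, while the sister and outgroup look identical at this locus.
example = pbs(0.5, 0.5, 0.0)
```

In a genome scan this would be computed per SNP or per window, and the top tail of the distribution is where candidate regions come from.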

IFNL3/IFNL4 region shows evidence for recent positive selection specific to Asian populations. G. L. Wojcik, C. D. Bustamante Department of Genetics, Stanford School of Medicine, Stanford, CA.

   Hepatitis C virus (HCV) is a global health burden, chronically infecting 130-150 million people and causing 350,000-500,000 deaths per year from HCV-related liver disease. Twenty-five years after the discovery of HCV, there is no vaccine and treatment remains ineffective in a large proportion of individuals. Heterogeneity in clinical outcomes such as spontaneous clearance of the virus, as well as sustained virologic response (SVR) after treatment, has been observed between individuals of different genetic ancestry. Previous genetic studies have pinpointed a single nucleotide polymorphism (SNP) in the interferon-λ 3 and 4 (IFNL3/IFNL4) region (rs12979860) as being strongly associated with clinical outcome. While the derived and favorable allele of rs12979860 (C) is present globally, its frequency is greatly differentiated by continent with the lowest in African populations (34-49%), and the highest in Asian populations (89-96%). To determine if these differences are due to selective pressures, data from the phase 3 release of the 1000 Genomes Project (TGP) was analyzed for population-specific signatures of selection. Derived allele frequency (DAF), Fst, nucleotide diversity (π), and haplotype structure were examined and compared in populations from Europe, Africa, Asia, and the Americas. A 5 kilobase (kb) region around IFNL3/IFNL4 showed decreased nucleotide diversity, high DAF, and increased haplotype homozygosity in Asian populations. This pattern is not found in Native American populations, suggesting recent positive selection specific to Asia. Historical selective pressures from HCV, or likely a related ancestral virus, may have driven the favorable rs12979860 allele to near fixation. However, Asia currently has disproportionately high HCV-related morbidity and mortality despite this adaptation, suggesting further evolution of the virus. Differences in clinical outcomes within Asian populations may therefore be also due to non-IFNL3/IFNL4 genetic variation. 
Further studies are needed to identify additional genetic associations that will better our knowledge of how HCV interacts with the human immune system.

A genome-wide natural selection scan using 1000 high-coverage, Alzheimer’s-specific whole-genome sequences. M. Ebbert1, H. Smith1, T. Dawson1, S. Grossman2, M. Norton3, J. Tschanz3, R. Munger4, C. Corcoran5, P. Ridge1, J. Kauwe1, ADNI 1) Department of Biology, Brigham Young University, Provo, UT; 2) Broad Institute of MIT and Harvard, Cambridge, MA; 3) Department of Family Consumer and Human Development, Utah State University, Logan, Utah; 4) Department of Nutrition, Dietetics, and Food Sciences, Utah State University, Logan, Utah; 5) Department of Mathematics and Statistics, Utah State University, Logan, Utah.

   Natural selection studies have impacted genetic research and our understanding of human adaptations, including malaria resistance, skin pigmentation, and others. More recently, Grossman et al. discovered adaptations to bacterial response and specific human phenotypes by performing a genome-wide selection scan using the 1000 Genomes data, identifying specific adaptive mutations without foreknown, adaptive phenotypic traits. This scanning approach successfully reversed the study type from a hypothesis-driven to a hypothesis-generating study. While the genome-wide scan was successful, there are potential limitations: (1) the 1000 Genomes data has only 179 whole-genome sequences; (2) the sequences were low coverage (2-6x average coverage); and (3) genotypes for the 1000 Genomes data may be inaccurate due to low coverage and because they were not genotyped using modern ‘joint-calling’ algorithms. We are performing an updated analysis including 1000+ Alzheimer’s-specific, whole-genome sequences with 37x average coverage. Our data set includes 152 Alzheimer’s disease (AD) cases and 211 ‘super controls’. The ‘super controls’ are APOE ε4 positive individuals aged 75+ who do not exhibit AD symptoms. Using our large, high-coverage data set, we will explore whether a larger sample size and deeper coverage reveal previously undiscovered loci under selection. We will also explore whether using an AD-specific data set will enhance selection signals related to AD under the premise that AD-related loci are known to be under selection. As such, AD may be the result of a conflicting pleiotropic effect of an otherwise beneficial genotype. After joint calling all samples using GATK’s HaplotypeCaller, we will perform a genome-wide natural selection scan using the Composite of Multiple Signals (CMS) algorithm on our data set to identify specific loci under selection. 
These results will be compared to Grossman et al.’s previous results to determine whether any new loci show evidence of selection and whether any previously identified regions were eliminated (potential false positives). Previous and newly identified loci will be examined for potential AD implications based on known disease associations and functional annotations. Top candidates will be tested using an association test. Natural selection studies reveal important genetic artifacts for observed phenotypes. Many AD-related genes are under selection and there are likely other undiscovered AD-related genes.

Evolutionary history of pigmentation candidate gene diversity in a Melanesian population. H. Norton, E. Werren Department of Anthropology, University of Cincinnati, Cincinnati, OH.

   Skin, hair, and eye pigmentation are complex phenotypic traits determined by multiple loci. Human skin pigmentation is a trait that is believed to have evolved under strong natural selection in response to varying levels of ultraviolet radiation (UVR) intensity. Lighter skin color has evolved multiple times in human evolutionary history, but it is unclear if the darker skin color observed in many high-UVR populations is also the result of evolutionary convergence (suggesting that population-specific mutations may have been favored by positive selection) or if instead ancestral variants associated with darker skin color have been maintained in high-UVR populations via purifying selection. To begin to address this question we compare DNA sequence variation from multiple pigmentation candidate genes in a Melanesian population to variation observed in European, East Asian, and African populations sequenced in the 1000 Genomes Project. Summaries of the site frequency spectrum, including Tajima’s D (TD), for three genes, ASIP, OCA2, and TYRP1, do not indicate that any of these genes were targeted by positive selection in the Melanesian population (ASIP TD = 0.037, OCA2 TD = -0.85, TYRP1 TD = -0.55). With the exception of a single novel haplotype in the OCA2 locus observed at a frequency of ~10%, there is little evidence that Melanesians exhibit any high-frequency population-specific haplotypes at these loci, suggesting that if an independent adaptation to high-UVR conditions occurred in Melanesians then other pigmentation loci are responsible. 
However, there is also little evidence that Melanesians are similar to Africans at these loci, which one might expect if Melanesians share ancestral haplotypes with other high UVR populations: pairwise FST estimates between Melanesians and Africans for the pigmentation loci examined here range from 0.043-0.443, and the majority of Melanesian haplotypes are common haplotypes shared between Africans, Europeans, and East Asians. We explore these patterns of sequence variation and inter-population divergence at pigmentation loci in the context of evolutionary models for pigmentation change in the human species and with consideration to Melanesian population history.
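
As a rough illustration of the Tajima's D summary quoted above, here is a minimal sketch using the standard Tajima (1989) constants. The input format (equal-length haplotype strings, one character per site) and the function name `tajimas_d` are assumptions made for this example, not the authors' pipeline.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotype strings (one character per site)."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("the variance term is zero for fewer than 4 sequences")
    length = len(haplotypes[0])
    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for j in range(length) if len({h[j] for h in haplotypes}) > 1)
    if seg_sites == 0:
        return None  # D is undefined when there is no variation
    # pi: mean number of pairwise differences.
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Normalising constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy data: all-singleton variation versus intermediate-frequency variation.
rare = ["1000", "0100", "0010", "0001"]
common = ["11", "11", "00", "00"]
d_rare, d_common = tajimas_d(rare), tajimas_d(common)
```

An excess of rare variants pushes D negative (as after a sweep or population growth), while intermediate-frequency variants push it positive. Values close to zero, like those reported for ASIP, OCA2, and TYRP1, give little reason to reject neutrality.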

Inference of the strength of purifying selection based on haplotype patterns. D. Ortega Del Vecchyo1, K. E. Lohmueller1,2, J. Novembre3 1) Interdepartmental Program in Bioinformatics, University of California, Los Angeles, CA; 2) Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA; 3) Department of Human Genetics, University of Chicago, IL.

   The strength of purifying selection is a central factor underlying levels of genetic diversity in a population and is important to characterize to understand the expected genetic architecture of disease traits. Recent sequencing studies with large sample sizes have revealed a much higher proportion of non-synonymous variants among rare versus common variants in human populations. This finding suggests that natural selection is acting against such variants to keep them at low frequencies in the population. To estimate the strength of purifying selection, we have developed a method that uses the lengths of pairwise haplotype identity among rare-variant-carrying haplotypes. Unlike previous approaches, our method conditions on the present-day frequency of the allele and is based on the intuition that alleles under purifying selection are on average younger than neutral alleles and, therefore should have higher average levels of haplotype identity among variant carriers. To obtain the probability distribution on the lengths of pairwise haplotype identity, one needs to perform two integrations: one over all possible allele frequency trajectories and another one over all pairwise coalescent times given a certain allele frequency trajectory. The integration over the space of possible allele frequency trajectories is done using a fast importance-sampling algorithm while the integration over the coalescent times is done using an analytical solution. Using the probability of the lengths of the haplotypes under different selective coefficients, we can calculate the likelihood for a selective coefficient for a single variant or set of variants. We use simulations to test how accurately the method estimates the selective coefficient under different demographic scenarios, such as a constant population size and a realistic model of European population growth. 
Variants with the same selective coefficient are harder to differentiate from neutral variants in scenarios of recent population growth. These methods will be applied to a set of 202 drug target genes sequenced in 14,002 individuals (Nelson et al, 2012, Science) to identify which genes are most likely to harbor damaging variants that may predispose to disease.

Highlighting strongly differentiated regions using three high coverage genomes each from a set of worldwide human populations. L. Pagani1,2,3, T. Kivisild1 1) Division of Biological Anthropology, University of Cambridge, Cambridge, Cambridgeshire, United Kingdom; 2) The Wellcome Trust Sanger Institute, CB10 1SA, Hinxton, UK; 3) Molecular Anthropology Lab, Department of Biological Geological and Environmental Sciences, University of Bologna, Italy.

   Following the steady reduction in sequencing costs, several international projects will shortly make available sets of 2-4 high coverage genomes each from hundreds of worldwide human populations. While these resources allow for refining the demographic histories of the studied populations, little can be done to detect signatures of differentiation, possibly driven by natural selection, on these populations. The selection scan methods available to date focus on various genomic components (SNPs, haplotypes, LD blocks), yet they rely on genome frequencies rather than on the full sequence information. Here we show how the top 1% of genic regions analysed using only three genomes each from two populations (CEU and YRI) contains as many as 25% of the top 5% FST candidates obtained using 160 low coverage individuals from the 1000 Genomes Project. The three genomes from each chosen population are combined in three pairs, and FST based on average pairwise differences is calculated between populations. The average FST is computed on a sliding window of 10,000 or 50,000 bp across all the pop1-pop2 sets of genomic pairs. The top 1% of windows showing the highest differentiation were selected and inspected for their gene content. Of the 1785 genes identified by the FST scan based on the 160 low coverage individuals (taken as the gold standard), 98 were found among the 439 genes included in the top 1% of 50,000 bp windows of the YRI-CEU pairs. This 2.4-fold enrichment was significant by a chi-squared test (p = 10^-19). The empirical ranking nature of the gold standard did not allow a formal assessment of the false positive rate of our newly developed method. However, the overlap between the top genes retrieved using the 10,000 and 50,000 bp windows showed a significant enrichment in high ranking FST signals. 
In summary, the proposed approach based on three genomes per population is capable of retrieving at least 25% of the genes under putative natural selection found by traditional methods. Ongoing power assessment will also inform on the optimal number of high coverage genomes per population required to further reduce the false positive rate. These promising results, given the limitations imposed by the small sample sizes, make our method suitable for application to newly sequenced populations (expected to be released in mid-June 2014, during the SMBE conference).
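
The windowed FST described above can be sketched as follows. The abstract does not state which estimator was used, so this example assumes the common form FST = 1 - (mean within-population differences) / (mean between-population differences). The function names and the toy sequences are hypothetical.

```python
from itertools import combinations, product


def mean_diffs(pairs):
    """Average number of differing sites over an iterable of sequence pairs."""
    pairs = list(pairs)
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)


def fst_pairwise(pop1, pop2):
    """FST = 1 - pi_within / pi_between, from average pairwise differences."""
    pi_within = (mean_diffs(combinations(pop1, 2)) + mean_diffs(combinations(pop2, 2))) / 2
    pi_between = mean_diffs(product(pop1, pop2))
    if pi_between == 0:
        return None  # no differences between populations in this window
    return 1 - pi_within / pi_between


def windowed_fst(pop1, pop2, window, step):
    """Yield (start, FST) for windows of `window` sites moved by `step` sites."""
    length = len(pop1[0])
    for start in range(0, length - window + 1, step):
        w1 = [s[start:start + window] for s in pop1]
        w2 = [s[start:start + window] for s in pop2]
        yield start, fst_pairwise(w1, w2)


# Three toy sequences per population, fixed differences only in the second window.
pop_a = ["AAAAAA", "AAAAAA", "AAAAAA"]
pop_b = ["AAATTT", "AAATTT", "AAATTT"]
scan = list(windowed_fst(pop_a, pop_b, window=3, step=3))
```

Ranking the windows by FST and keeping the top 1% would then give the candidate regions. With so few sequences per population, single-window estimates are noisy, which is presumably why the abstract averages over genome pairs and uses wide windows.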

Searching for soft selective sweeps in worldwide human populations. Z. A. Szpiech1, R. H. Hernandez1,2,3 1) University of California, San Francisco, San Francisco, CA; 2) Institute for Human Genetics, University of California, San Francisco, San Francisco, CA; 3) Institute for Quantitative Biosciences (QB3), University of California, San Francisco, San Francisco, CA.

   There is ample debate about the strength and mode of natural selection that has occurred in recent human evolution. This is particularly so for classical hard sweeps, during which an adaptive allele quickly drags a single haplotype to high frequency. An alternative model of adaptation involves soft sweeps, whereby multiple haplotypes are brought to high frequency (i.e. when a previously segregating neutral or slightly deleterious allele becomes adaptive in a new environment). Existing haplotype-based tests—such as the integrated haplotype score (iHS) that scans for positive selection by tracking the decay of haplotype homozygosity—work under the assumption that a positively selected region will be dominated by a single haplotype. However, iHS is expected to lose power under a soft sweep. Here we develop a statistic, inspired by iHS and recent work in Drosophila population genetics, designed to detect recent soft sweeps by tracking the decay of homozygosity of multiple haplotypes away from a core locus. We evaluate our statistic with rigorous simulations under multiple realistic models of human demography. We find that it has high power to detect both hard and soft sweeps and has improved power compared to iHS. In particular, for a fixed selection coefficient, our simulations suggest that we have greatest power to detect soft sweeps in African populations, which have been understudied to date. We apply this statistic and iHS to a large human genotype dataset of 1,728 unrelated individuals spanning 20 worldwide populations from the 1000 Genomes Project. A large number of regions identified by our statistic are not identified by iHS, in particular in African populations. This suggests a possibly important role of soft sweeps in recent human evolution.
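
To make the idea of haplotype homozygosity decay concrete, here is a small sketch. With `pooled=1` it computes standard extended haplotype homozygosity (EHH) among carriers of the core allele. The `pooled=2` option, which merges the two most common haplotype classes before counting identical pairs, is only an assumption used to show why a multi-haplotype statistic could retain signal under a soft sweep. The abstract does not give the authors' definition.

```python
from collections import Counter
from math import comb


def ehh(carriers, core, distance, pooled=1):
    """Haplotype homozygosity from the core site out to `distance` sites to the right.

    carriers: haplotype strings that all carry the core allele.
    pooled:   number of most-common haplotype classes merged into one class.
    """
    segments = [h[core:core + distance + 1] for h in carriers]
    counts = sorted(Counter(segments).values(), reverse=True)
    merged = [sum(counts[:pooled])] + counts[pooled:]
    # Fraction of carrier pairs that are identical over the segment.
    return sum(comb(c, 2) for c in merged) / comb(len(carriers), 2)


# Toy soft sweep: the core allele (site 0) sits on two equally common backgrounds.
carriers = ["1000", "1000", "1111", "1111"]
hard_view = [ehh(carriers, 0, d) for d in range(4)]
soft_view = [ehh(carriers, 0, d, pooled=2) for d in range(4)]
```

In this toy case standard EHH drops to one third as soon as the two backgrounds differ, while the pooled version stays at 1, which is the intuition behind iHS losing power when more than one haplotype has swept.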

Natural selection at the melanocortin-3 receptor gene loci. I. Yoshiuchi Dept Diabetes Mellitus and Medicine, Yoshiuchi Medical Diabetes Institute, Kamakura, Kanagawa, Japan.

   Obesity is significantly associated with type 2 diabetes mellitus, metabolic syndrome, hypertension, stroke, and cardiovascular diseases. The worldwide prevalence of obesity is increasing steadily. Obesity is a highly heritable disease that causes serious health problems. During the traditional cycles of feast and famine, natural selection on obesity-related genes would have been significant because these genes control body weight and fat levels. Human adaptation to environmental changes in food supply, lifestyle, and geography may have influenced the selection of genes associated with the metabolism of glucose, lipids, carbohydrates, and energy. The melanocortin-3 receptor (MC3R) gene is one of the obesity-associated genes, and MC3R mutations have been shown to be associated with obesity. MC3R-deficient mice showed increased fat mass. Here, we aimed to uncover evidence of selection at the MC3R gene loci. We used a three-step method to detect selection at the MC3R gene loci using the HapMap population data. We used Wright’s F statistics as a measure of population differentiation, the long-range haplotype test to test extended haplotypes, and the integrated haplotype score test to detect selection at the MC3R gene loci. We detected a signal of natural selection at the MC3R gene loci with the integrated haplotype score test in the African population. This finding provides evidence of natural selection at the MC3R gene loci. Further investigation of the adaptive evolution of obesity-associated genes is warranted.

Positive selection in smallpox associated genes among Mesoamericans. O. A. Garcia1, K. Arslanian2, D. Whorf1, M. Shriver3, L. G. Moore4, T. Brutsaert5, A. W. Bigham1 1) Department of Anthropology, University of Michigan, Ann Arbor, MI; 2) Department of Anthropology, Yale University, New Haven, CT; 3) Department of Anthropology, Penn State University, University Park, PA; 4) Department of Obstetrics and Gynecology, University of Colorado, Aurora, CO; 5) Department of Exercise Science, Syracuse University, Syracuse, NY.

   During the colonization of Mesoamerica, one of the major causes of death was the introduction of novel infectious diseases. Among the most lethal infectious diseases was smallpox. Therefore, studying signatures of natural selection in genes related to smallpox infection and immune response not only provides a window to our evolutionary past but is also a particularly attractive strategy to identify host factors for modern infectious disease. To characterize host risk factors within Mesoamerican populations, we interrogated 906,600 SNPs assayed using the Affymetrix 6.0 genotyping array for signatures of natural selection in 231 immune response genes. Our populations included Mesoamericans (25 Maya and 14 Nahua, Mixtec, and Tlapanec speakers from Mexico) and Andeans (25 Aymara from Bolivia and 24 Quechua from Peru). Additionally, we used available data from 60 Europeans of northern European ancestry and 90 East Asians from China and Japan. We applied three statistical tests to identify signatures of natural selection: locus specific branch length (LSBL), the natural log of the ratio of heterozygosities (lnRH), and Tajima’s D. Furthermore, we analyzed partial and hard sweeps with two haplotype tests: integrated haplotype score (iHS) and cross population extended haplotype homozygosity (XP-EHH). We determined statistical significance based on an empirical distribution. Among our strongest results for positive selection were CD74, ZAP-70, and IKZF1, which were significant in all the statistical tests at the 5% and 1% levels in the comparisons of Mesoamericans with East Asians and European Americans. Furthermore, they were statistically significant in comparison to the Andean populations. CD74 is the major histocompatibility complex class II (MHC II) invariant chain. Studies have shown the CD74 protein to function as a receptor for the cytokine MIF, a critical immune response factor. ZAP-70 is an integral part of the T-cell signaling pathway, thereby regulating the adaptive immune response. 
Several studies have shown CD74 and ZAP-70 expression to be correlated. IKZF1 has mostly been studied in autoimmune disorders, as part of the pathway regulating haematopoiesis. The results of this study will aid future studies by pinpointing candidate genes for infectious disease susceptibility and resistance in Mesoamerican populations.
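
Two of the ingredients above are simple enough to sketch: the locus specific branch length, which apportions pairwise FST among three populations, and significance read off an empirical (genome-wide) distribution. The formula is the usual three-population LSBL. The function names and the numbers are hypothetical and are not taken from the study.

```python
def lsbl(fst_ab, fst_ac, fst_bc):
    """Locus-specific branch length for population A in a three-population tree."""
    return (fst_ab + fst_ac - fst_bc) / 2


def empirical_p(observed, genome_wide):
    """Fraction of genome-wide values at least as large as the observed one."""
    return sum(v >= observed for v in genome_wide) / len(genome_wide)


# Hypothetical per-SNP FST values: A = Mesoamerican, B = East Asian, C = European.
branch_a = lsbl(fst_ab=0.3, fst_ac=0.4, fst_bc=0.1)

# Hypothetical genome-wide LSBL values used as the empirical null.
p_value = empirical_p(0.3, [0.1, 0.2, 0.3, 0.4])
```

A SNP would be called significant at the 5% or 1% level when its statistic falls in the corresponding tail of the genome-wide distribution. This ranks loci but does not by itself control the false positive rate.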

Selection and reduced population size cannot explain higher amounts of Neanderthal ancestry in East Asian than European human populations. B. Kim1, K. Lohmueller1,2 1) Ecology and Evolutionary Biology, University of California Los Angeles, Los Angeles, CA; 2) Interdepartmental Program in Bioinformatics, University of California Los Angeles, Los Angeles, CA.

   Understanding the Neanderthal ancestry of modern humans may provide crucial insights into the evolution of different human populations. It is believed that Neanderthals admixed with European and Asian populations to a much greater degree than with African populations. Additionally, recent studies show a higher frequency of Neanderthal alleles in East Asians relative to Europeans. Several hypotheses to explain this difference have been proposed. One hypothesis posits that there was a single admixture event in the population ancestral to modern Europeans and East Asians and that many of the Neanderthal alleles were weakly deleterious in modern humans. Because East Asians have historically had smaller population sizes than Europeans, purifying selection may have been less effective at removing the Neanderthal alleles from East Asian populations, leading to the observed higher proportion of Neanderthal ancestry in East Asians. Here we test this hypothesis using forward-in-time population genetic simulations. These simulations include plausible models of European and East Asian population history which have been estimated from data as well as models of the fitness effects of Neanderthal alleles in humans that include different dominance scenarios and a distribution of selection coefficients. Starting with the same amount of Neanderthal ancestry in both populations, we find that the differences in population size between European and East Asians combined with purifying selection cannot lead to the observed increase in the amount of Neanderthal ancestry in East Asian populations. Furthermore, when starting with the same initial amount of Neanderthal ancestry in both populations, realistic population size changes alone are insufficient to decrease or increase the Neanderthal ancestry in one population relative to the other. The observed data must be explained by some other process, such as additional waves of Neanderthal admixture into East Asian populations.

Asian diversity project: a survey of population structure and local adaptations in Asian populations. X. Liu1,2, D. Lu3, W. Y. Saw1, T. H. Ong1, C. Simmons4, P. Suriyaphol5, S. Tongisma6, B. P. Hoh7, N. Kato8, Y. Y. Teo1,9 1) Saw Swee Hock School of Public Health, National University of Singapore, Singapore; 2) NUS Graduate School, National University of Singapore, Singapore; 3) Max Planck Independent Research Group on Population Genomics, Chinese Academy of Sciences and Max Planck Society Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; 4) Oxford University Clinical Research Unit, Hospital for Tropical Diseases, Ho Chi Minh City, Viet Nam; 5) Division of Bioinformatics and Data Management for Research, Mahidol University, Bangkok, Thailand; 6) Genome Institute, National Center for Genetic Engineering and Biotechnology, Pathumtani, Thailand; 7) Institute of Medical Molecular Biotechnology (IMMB), Faculty of Medicine, Universiti Teknologi MARA (UiTM) Malaysia, Sg Buloh, Selangor, Malaysia; 8) Department of Gene Diagnostics and Therapeutics, Research Institute, National Center for Global Health and Medicine, Tokyo, Japan; 9) Department of Statistics and Applied Probability, National University of Singapore, Singapore.

   As the largest continent on Earth, Asia hosts more than 60% of the world’s human population. Great genetic diversity exists in the Asian populations. The HUGO Pan-Asian SNP consortium provided a valuable genetic resource of Asian populations and performed a thorough survey of genetic diversity and population history of Asian populations. However, the sparse coverage of SNPs made the analysis of natural adaptation difficult to perform. In this study, we collected dense genotyping data from 46 populations across Asia. More than 4093 individuals from East Asia, Central Asia, Southeast Asia and South Asia were genotyped on various genotyping platforms. Principal components analysis (PCA) and admixture analysis were performed to elucidate the population structure in ADP populations. Geography was found to play an important role in shaping the population structure of Asian populations, and the ADP populations were further grouped into East Asian, Central Asian, Southeast Asian and South Asian subgroups. We performed a genome-wide scan of positive selection signals in the ADP populations using iHS, XP-EHH and haploPS. A total of 669 candidate selection regions were detected across the 46 ADP populations. A PCA of the selection signals was performed to investigate the degree of sharing of the selection signals in the 46 populations. It was found that clustering of populations by selection signals resembles the clustering inferred from population structure analysis. East and Southeast Asian groups share the largest number of selection signals, and the South Asian group possesses selection signals distinct from those of the rest of the Asian populations. For selection signals shared by multiple populations, we studied the origin of the selection, i.e., 
whether the selection originated from a single mutation in the common ancestor followed by subsequent gene flow, or was the result of convergent evolution, where the selection emerged separately from multiple mutation events. The origins of positive selection signals were investigated by calculating the haplotype similarity index. The haplotype similarity index identified 36 selection regions under convergent evolution, and most of them involve aboriginal populations from Southeast Asia.

The pleiotropic effects of EDARV370A in an admixed Uyghur population. Q. Peng1, J. Li1, J. Tan2,3, Y. Yang2,3, Y. Guan4, L. Zhang4, Y. Jiao4, P. Sabeti5,6, L. Jin1,2,3, S. Wang1 1) CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China; 2) MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai, China; 3) CMC Institute of Health Sciences, Taizhou, Jiangsu Province, China; 4) Department of Biochemistry, Preclinical Medicine College, Xinjiang Medical University, Urumqi, Xinjiang, China; 5) The Broad Institute of Harvard and MIT, Cambridge, USA; 6) Center for Systems Biology, Department of Organismic and Evolutionary Biology.

   An adaptive variant of the human Ectodysplasin receptor, EDARV370A, showed one of the strongest signals of recent positive selection in genome-wide scans. In transgenic mice and in humans, EDARV370A has been found to affect ectoderm-related phenotypes, including hair thickness and shape, active sweat gland density, and tooth formation. However, previous human studies were all based on East Asian populations, in which the frequency of the ancestral allele 370V is low. It remains inconclusive whether the genetic model of EDARV370A is additive or dominant. The lack of power was due to the scarcity of 370V homozygotes, which made it impractical to explore a large spectrum of potentially affected ectoderm-related phenotypes. In this study, we took advantage of a population admixed between East Asians and Europeans, the Uyghur, to investigate the pleiotropic nature and the genetic model of EDARV370A. By examining a series of ectoderm-related phenotypes and the EDARV370A genotype in 294 Uyghur samples, we replicated the previous association findings for incisor shoveling (P = 5.76×10^-12) and hair straightness (P = 3.37×10^-3), and further confirmed that the association follows an additive genetic model. We also found EDARV370A to be associated with novel phenotypes including higher total sweat gland density (P = 0.03) and triangular earlobes (P = 2.05×10^-4). By revealing more pleiotropic effects of EDARV370A and confirming its genetic model, our study provides a more complete picture of the adaptive evolution of EDARV370A in human history.

A hidden Markov framework to estimate the timing of selection for hard sweeps. J. Smith1, M. Stephens2, M. Przeworski3, G. Coop4, J. Novembre2 1) Department of Ecology and Evolution, University of Chicago, Chicago, IL; 2) Department of Human Genetics, University of Chicago, Chicago, IL; 3) Department of Biological Sciences, Columbia University, New York, NY; 4) Department of Evolution and Ecology, University of California–Davis, Davis, CA.

   Dispersal across the globe has resulted in humans occupying a wide range of ecological habitats. Natural selection seems to have played a role in this process, as current methods have identified a number of well supported loci that have undergone a recent selective sweep. In some cases, comparing estimates for the timing of selection with events in the historical/archeological record can provide a more clear picture of the ecological context driving adaptation in a population. For example, an overlap between cultural shifts towards dairy food production with the timing of selection on the lactase persistence allele has helped evaluate a possible cause for the observed selective sweep. As a result, there is substantial interest in methods to infer the age of a positively selected allele. A key principle for allele age estimation is that due to recombination and mutation, the signature of a selective sweep decays at a constant rate per generation. Current methods to estimate the age of selective sweeps either rely on a heuristic estimate of the length of the selected haplotype or employ a simulation-based framework to identify the distribution of ages that produce the observed summary statistics of the complete data. In practice the confidence intervals for these estimates are large. Here, we provide methods for inferring the ancestral haplotype of the selected allele and the recombination breakpoints off of this haplotype in order to provide more refined estimates of allele age. We do so using a hidden Markov model framework which allows us to integrate over uncertainty in recombination breakpoints. This framework uniquely uses both the present day length distribution of the ancestral haplotype and the number of derived mutations to estimate the number of generations since the sweep occurred. The joint use of haplotype lengths and derived mutations increases the total number of observed events and provides more narrow confidence intervals for the age estimate. 
Using this joint estimator on simulated data, 95% quantiles for estimates of sweep ages from 400 to 500 generations are within 35 generations of the true value, whereas estimates based on derived mutations or haplotype lengths alone provide 95% quantiles ~70 generations from the true value. Future applications will revisit the timing of selection for lactase persistence in Northern Europeans, skin pigmentation alleles in Europe and Asia, and malaria resistance at the G6PD locus in Africa. 

Genome wide survey of positive selection signals in African Americans since admixture. H. Wang1, Y. Choi2, X. Wang3, B. Tayo4, U. Broeckel5, C. Hanis6, S. Kardia7, S. Redline8, R. Cooper4, H. Tang2, X. Zhu1 1) Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH; 2) Department of Genetics, Stanford University, Stanford, CA; 3) Departments of Preventive Medicine, Biomedical Informatics, and Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY; 4) Department of Public Health Science, Loyola University Medical Center, Maywood, IL; 5) Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI; 6) Department of Epidemiology, Human Genetics and Environmental Sciences, University of Texas Health Science Center at Houston, Houston, TX; 7) Department of Epidemiology, University of Michigan, Ann Arbor, MI; 8) Department of Medicine, Harvard Medical School, Boston, MA, USA.

   In an admixed population such as African Americans, excess or deficient ancestry in a local genomic region may suggest natural selection. We scanned three large African American cohorts of 20,153 individuals but failed to identify any genome-wide significant signals of excess or deficient ancestry. We showed that the failure to identify any significant selection signals can be attributed to the estimated variance of the test, which consists of two components: variance due to sampling error and variance due to random genetic drift. The proportion of variance due to random genetic drift increases as sample size increases. Thus, a test based on examining local ancestry excess is not efficient, and its power will not increase with increasing sample size. We also showed that the high correlations of local ancestries between different cohorts are due to historical recombination and random genetic drift. Assuming African Americans have been admixed for 8 to 12 generations, we estimated the effective population size to be between 32,000 and 48,000.
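
The variance argument above can be illustrated with a few lines of arithmetic. The sketch assumes the simplest possible decomposition, total variance = sampling variance / n + drift variance. The default variance values are made up for illustration and are not estimates from the cohorts.

```python
from math import sqrt


def drift_share(n, sampling_var=0.25, drift_var=1e-5):
    """Proportion of the test variance due to drift for a cohort of n people."""
    return drift_var / (sampling_var / n + drift_var)


def z_score(delta, n, sampling_var=0.25, drift_var=1e-5):
    """Standardised local-ancestry deviation `delta` under the two-part variance."""
    return delta / sqrt(sampling_var / n + drift_var)


# Sampling error shrinks with n, but the drift term does not.
for n in (1_000, 20_000, 1_000_000):
    print(n, round(drift_share(n), 3), round(z_score(0.01, n), 2))
```

Because the drift term does not depend on sample size, the test statistic approaches a ceiling of `delta / sqrt(drift_var)` however many individuals are added, which is the sense in which power stops increasing.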

Khoisan hunter-gatherers have been the largest population throughout most of modern human demographic history. HL. Kim1,2, A. Ratan2, GH. Perry3, A. Montenegro4,5, W. Miller2, SC. Schuster1,2 1) Singapore Centre on Environmental Life Sciences Engineering, Nanyang Technological University, Singapore; 2) Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, PA, USA; 3) Department of Anthropology, Pennsylvania State University, PA, USA; 4) Department of Geography, Ohio State University, OH, USA; 5) Campus do Litoral Paulista, Unesp – Univ Estadual Paulista, Brazil.

   We sequenced the complete genomes of five Khoisan hunter-gatherers from the Kalahari Desert and one Bantu-speaking agriculturalist also from southern Africa, with high accuracy. In comparison with a 420K SNP genotyping dataset from 490 worldwide individuals, admixture analyses showed that three of our Khoisan genomes from the Ju/’hoansi group (northern Khoisan) have no or minimal admixture from non-Khoisan populations, allowing us to assess the early demographic history of the human species. Population genomic analyses of our complete genome sequences, along with those from eight non-Khoisan humans, were performed to infer their effective population sizes. These analyses demonstrated that the Ju/’hoansi population has maintained a large effective population size and has been the population most isolated from all other human populations since the earliest population split between the Khoisan and other populations ~100-150 thousand years ago (kya). In contrast, all other human populations, including the ancestral Bantu-speaking agriculturalists (currently the largest population within Africa in terms of census size), experienced severe bottlenecks and lost more than half of their genetic diversity from ~120 to 30 kya. According to paleoclimate records and models, west-central Africa became drier, while southern Africa experienced increases in precipitation, ~80-100 kya. We hypothesize that these climate differences might be related to the divergent ancestral population history within African human populations.

The Kalash isolate from Pakistan. Q. Ayub1, L. Pagani1,2, M. Mezzavilla1,3, C. Tyler-Smith1 1) The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom; 2) Division of Biological Anthropology, University of Cambridge, Cambridge, United Kingdom; 3) Institute for Maternal and Child Health — IRCCS “BurloGarofolo” — Trieste, University of Trieste, Trieste, Italy.

   The Kalash represent an enigmatic isolated population that has been living for centuries in the Hindu Kush mountain ranges of present-day Pakistan. Previous studies of uni-parental (Y and mitochondrial) DNA markers provided no support for their claimed Greek descent following invasion of this region by Alexander III of Macedon, and analysis of autosomal loci provides evidence of a strong genetic bottleneck. To understand their origins and demography further, we genotyped 23 unrelated Kalash samples on the Illumina HumanOmni2.5 BeadChip and sequenced a male individual at high coverage on an Illumina HiSeq 2000. Comparisons with neighboring populations confirmed results based on genotyping 650,000 common single-nucleotide polymorphisms in the Kalash samples from the Centre d’Etude du Polymorphisme Humain (CEPH) Human Genome Diversity Project (HGDP) Cell Line Panel. However, we observed no evidence for admixture as suggested recently by Hellenthal et al. The mean time of divergence between the Kalash and other populations currently residing in this region that also speak Indo-European languages was estimated to be 11.8 (10.6-12.6) KYA. Since the split, the Kalash have experienced little or no gene flow from their geographic neighbors and have maintained a low long-term effective population size (2,247-2,780). They could represent some of the earliest migrants into the Indian sub-continent.

Identifiability and efficient inference of population size histories and locus-specific mutation rates from large-sample genomic variation data. A. Bhaskar1,2, Y. X. R. Wang3, Y. S. Song1,2,3,4 1) Simons Institute for the Theory of Computing, University of California, Berkeley, Berkeley, CA; 2) Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA; 3) Department of Statistics, University of California, Berkeley, Berkeley, CA; 4) Department of Integrative Biology, University of California, Berkeley, Berkeley, CA.

   Several recent large-sample human genetics studies have found a massive excess of rare variants compared to predictions of previously inferred demographic models of human history. A widely cited explanation is that such polymorphism patterns are indicative of explosive and accelerating population growth in recent human history. Using the site frequency spectrum (SFS), a summary of genetic variation in a set of sequences that counts the segregating sites as a function of the mutant allele frequency, we develop an efficient method for inferring recent population demography that can scale to samples involving tens or hundreds of thousands of individuals. Using analytic results for the expected SFS under the coalescent and by leveraging the technique of automatic differentiation, we develop a very efficient algorithm to infer piecewise-exponential models of the historical effective population size from the distribution of sample allele frequencies. Our method is orders of magnitude faster than previous demographic inference methods based on the frequency spectrum and can also accurately estimate locus-specific mutation rates. We show that our method can accurately infer multiple recent epochs of rapid exponential growth, a signal which is difficult to pick up with small sample sizes. We apply our method to a recent large-sample exome-sequencing dataset of 11,000 European individuals and find evidence of rapid recent exponential population growth of 1.5% per generation during the last 370 generations. We also study the statistical identifiability aspect of this inference problem. It has been recently shown that very different population demographies can generate the same SFS for arbitrarily large sample sizes. Although in principle this non-identifiability issue poses a thorny challenge to statistical inference, the population size functions involved in these counterexamples are arguably not biologically realistic. 
We revisit this problem and show that the SFS of even moderate-sized samples uniquely determines the population demography when the population size is piecewise-defined with each piece belonging to some family of biologically-motivated functions. In the cases of piecewise-constant, piecewise-exponential, and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit values for the sample sizes that are sufficient for identifying the demographic model from the SFS.
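
To make the definition of the site frequency spectrum concrete, here is a minimal Python sketch that counts segregating sites by the number of sampled sequences carrying the mutant allele. It is only an illustration of the summary statistic, not the authors' inference method; the function name `site_frequency_spectrum` and the toy 0/1 data are assumptions made for this example.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Unfolded SFS: entry k-1 is the number of sites at which exactly k of
    the n sampled sequences carry the derived (mutant) allele, k = 1..n-1."""
    n = len(haplotypes)
    num_sites = len(haplotypes[0])
    # Count derived alleles per site (column sums of the 0/1 matrix).
    counts = Counter(
        sum(h[site] for h in haplotypes) for site in range(num_sites)
    )
    # Only segregating sites (0 < k < n) belong to the spectrum.
    return [counts.get(k, 0) for k in range(1, n)]


# Hypothetical toy data: 4 sequences x 6 sites, 0 = ancestral, 1 = derived.
toy = [
    [0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 1],
]
sfs = site_frequency_spectrum(toy)
print(sfs)  # singletons, doubletons, tripletons
```

In the toy data two sites are singletons and one is a tripleton; the two sites where no sequence or every sequence carries the derived allele are not segregating and are left out.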

Identity by descent segments within and across worldwide populations from sequence data. S. R. Browning1, B. L. Browning1,2 1) Department of Biostatistics, University of Washington, Seattle, WA; 2) Department of Medicine, Division of Medical Genetics, University of Washington, Seattle, WA.

   Segments of identity by descent (IBD) shared by individuals within and across populations provide information on key aspects of demographic history, such as effective population sizes and migration rates. 
   Sequence data present opportunities and challenges for IBD analysis. Sequence data are more informative than SNP array data, improving power to accurately detect smaller IBD segments and hence obtain higher levels of information about demographic history. On the other hand, low-coverage sequence data have high rates of error, whereas SNP array data are usually extremely accurate. 
   We recently developed two IBD segment detection methods: Refined IBD and IBDseq. Refined IBD is a haplotype-frequency-based method designed for SNP array data, while IBDseq is an allele-frequency-based method designed for low-coverage sequence data. Both methods were developed in the context of samples from a homogeneous population. When using frequency-based methods in a heterogeneous setting we expect increased rates of false-positive IBD within sub-populations. 
   We use 1000 Genomes Project data and simulated data to investigate the performance of the IBDseq and Refined IBD methods when analyzing sequence data from world-wide populations. We find that the allele-frequency-based IBDseq method suffers from increased rates of false positive detected IBD segments due to population heterogeneity, whereas the haplotype-frequency-based Refined IBD approach is much less affected. We develop a strategy using multiple runs of Refined IBD and a process of filling small gaps between adjacent detected segments in order to recover near-complete large IBD segments while having high power to detect short segments. Our approach enables powerful IBD detection in the 1000 Genomes project data.
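
The gap-filling step can be pictured with a short Python sketch. The abstract does not give the actual merging criteria, so the rule used here (join two segments for the same pair of haplotypes when the gap between them is at most `max_gap`), the function name `fill_gaps`, and the coordinates are all assumptions for illustration.

```python
def fill_gaps(segments, max_gap):
    """Merge (start, end) segments whose separating gap is <= max_gap.

    `segments` are assumed to lie on one chromosome and to belong to the
    same pair of haplotypes; units (bp or cM) are up to the caller.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            # Small gap (or overlap): extend the previous segment.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical detected segments, e.g. pooled from several runs.
segs = [(100, 500), (520, 900), (2000, 2400)]
filled = fill_gaps(segs, max_gap=50)
print(filled)
```

Here the first two segments are separated by a gap of 20 and are joined into one longer segment, while the third stays separate.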

The Population Genomic Landscape of Human Genetic Structure, Admixture History and Local Adaptation in Peninsular Malaysia. L. Deng1, B. Hoh2, D. Lu1, R. Fu1, M. Phipps3, S. Li4, A. Nur-Shafawati5, W. Hatin6, E. Ismail7, S. Mokhtar2, L. Jin4, B. Zilfalil5, C. Marshall8, S. Scherer8,9, F. Al-Mulla10, S. Xu1 1) Max Planck Independent Research Group on Population Genomics, Chinese Academy of Sciences and Max Planck Society (CAS-MPG) Partner Institute for Computational Biology (PICB), Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shangh; 2) Institute of Medical Molecular Biotechnology, Faculty of Medicine, Universiti Teknologi MARA, Sungai Buloh Campus, Jalan Hospital, 47000, Sungai Buloh, Selangor, Malaysia; 3) Jeffrey Cheah School of Medicine and Health Sciences, Monash University (Sunway Campus), Selangor 46150, Malaysia; 4) Ministry of Education (MOE) Key Laboratory of Contemporary Anthropology, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, Shanghai 200433, China; 5) Department of Pediatrics, School of Medical Sciences, Universiti Sains Malaysia, Kelantan 16150, Malaysia; 6) Human Genome Center, School of Medical Sciences, Universiti Sains Malaysia, Kelantan 16150, Malaysia; 7) School of Biosciences & Biotechnology, Faculty of Science & Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Malaysia; 8) The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, Ontario, Canada; 9) McLaughlin Centre and Department of Molecular Genetics, University of Toronto, Toronto, Canada; 10) Department of Pathology, Faculty of Medicine, Kuwait University, Safat 13110, Kuwait.

   Peninsular Malaysia is a strategic region which might have played an important role in the initial peopling and subsequent human migrations in Asia. However, the genetic diversity and history of human populations—especially indigenous populations—inhabiting this area remain poorly understood. Here, we conducted a genome-wide study using over 900,000 single nucleotide polymorphisms (SNPs) in four major Malaysian ethnic groups (MEGs; Malay, Proto-Malay, Senoi and Negrito), and made comparisons of 17 world-wide populations. Our data revealed that Peninsular Malaysia has greater genetic diversity corresponding to its role as a contact zone of both early and recent human migrations in Asia. However, each single Orang Asli (indigenous) group was less diverse with a smaller effective population size (Ne) than a European or an East Asian population, indicating a substantial isolation of some duration for these groups. All four MEGs were genetically more similar to Asian populations than to other continental groups, and the divergence time between MEGs and East Asian populations (12,000—6,000 years ago) was also much shorter than that between East Asians and Europeans. Thus, Malaysian Orang Asli groups, despite their significantly different features, may share a common origin with the other Asian groups. Nevertheless, we identified traces of recent gene flow from non-Asians to MEGs. Finally, natural selection signatures were detected in a batch of genes associated with immune response, human height, skin pigmentation, hair and facial morphology and blood pressure in MEGs. Notable examples include SYN3 which is associated with human height in all Orang Asli groups, a height-related gene (PNPT1) and two blood pressure-related genes (CDH13 and PAX5) in Negritos. 
We conclude that a long isolation period, subsequent gene flow and local adaptations have jointly shaped the genetic architectures of MEGs, and this study provides insight into the peopling and human migration history in Southeast Asia.

Shared Identity by Descent segments within current Italian population reveals new details about recent population history. G. Fiorito1,2, C. Di Gaetano1,2, F. Rosa1, S. Guarrera1,2, B. Pardini1, A. Piazza1,2, G. Matullo1,2 1) Human Genetics Foundation, Turin, TORINO, Italy; 2) Department of Medical Sciences, University of Turin, Turin, Italy.

   The inference of shared Identity by Descent (IBD) segments was recently enabled by high-resolution genomic data from large cohorts and novel algorithms for IBD detection. This approach permits a more detailed examination of the genetic structure of a population and provides information about recent demographic events such as bottlenecks and migrations. This study aims to characterize the genetic variability within the Italian population. We present analytical results on the relationship between IBD sharing across 301 unrelated Italian individuals genotyped for about 2.5 million Single Nucleotide Polymorphisms (SNPs). Each sample has well-defined geographical origins (four grandparents coming from the same geographical region). Due to the well-known common ancestral origin of the Italian population, we focused our attention on long-range and relatively recent shared IBD segments. By using Principal Component Analysis (PCA) and ancestry estimation, we ascertain Sardinia as the genetic outlier within Italy. Moreover, a certain degree of differentiation is still detectable within the Aosta Valley population. For each of the 11 subpopulations, we find a significantly higher number of shared IBD segments within than between populations, suggesting isolation by distance. Samples sharing the highest number of internal IBD blocks are Sardinian, as expected, followed by those living in Aosta Valley, Tuscany and Sicily. We also evaluate the relationship between shared IBD segments and geographical distance. Contrary to what is expected, the decay of IBD with distance is not steeper for longer (recent) blocks. This result suggests a constant exchange due to several migratory waves within Italy and/or to the considerably high number of populations that have lived in Italy. 
We finally demonstrate that regions of increased IBD sharing are enriched for structural variation and loci implicated in natural selection, and we highlight the relationship between shared IBD haplotypes and demographic events that occurred both in Sardinia and in the Italian peninsula. In conclusion, our results suggest that the study of shared IBD segments between populations is a useful method to detect novel details about relatively recent population history.

Identity by descent between humans, Denisovans, and Neandertals. S. Hochreiter, G. Povysil Institute of Bioinformatics, Johannes Kepler University Linz, Linz, Austria.

   We analyze the sharing of very short identity by descent (IBD) segments between humans, Neandertals, and Denisovans to gain new insights into their demographic history. Short IBD segments convey information about events far back in time because the shorter IBD segments are, the older they are assumed to be. The identification of short IBD segments becomes possible through next generation sequencing (NGS), which offers high variant density and reports variants of all frequencies. Only recently HapFABIA has been proposed as the first method for detecting very short IBD segments in NGS data. HapFABIA utilizes rare variants to identify IBD segments with a low false discovery rate. We applied HapFABIA to the 1000 Genomes Project whole genome sequencing data to identify IBD segments which are shared within and between populations. Some IBD segments are shared with the reconstructed ancestral genome of humans and other primates. These segments are tagged by rare variants, consequently some rare variants have to be very old. Other IBD segments are also old since they are shared with Neandertals or Denisovans, which explains their shorter lengths. The Denisova genome most prominently matched IBD segments that are shared by Asians. Many of these segments were found exclusively in Asians and they are longer than segments shared between other continental populations and the Denisova genome. Therefore, we could confirm an introgression from Denisovans into ancestors of Asians after their migration out of Africa. While Neandertal-matching IBD segments are most often shared by Asians, Europeans share more than other populations, too. Again, many of the Neandertal-matching IBD segments are found exclusively in Asians, whereas Neandertal-matching IBD segments that are shared by Europeans are often found in other populations, too. Neandertal-matching IBD segments that are shared by Asians or Europeans are longer than those observed in Africans. 
This hints at a gene flow from Neandertals into ancestors of Asians and Europeans after they left Africa. Interestingly, many Neandertal- or Denisova-matching IBD segments are predominantly observed in Africans – some of them even exclusively. IBD segments shared between Africans and Neandertals or Denisovans are strikingly short, therefore we assume that they are very old. This may indicate that these segments stem from ancestors of humans, Neandertals, and Denisovans and have survived in Africans.

Exploring Detailed Demographic Histories of Human Populations Using SNP Frequency Spectrums. X. Liu, Y.-X. Fu Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX.

   Inferring human demographic history using genetic information can shed light on important prehistoric evolutionary events such as population bottleneck, expansion, migration, and admixture, among others. It is also the foundation of many population genetics analyses, as demographic history is one of the most important forces shaping the polymorphic pattern of DNA sequences. We developed a novel model-free method called stairway plot, which infers detailed population size changes over time using SNP frequency spectrums. This method can be applied to low-coverage sequence data, pooled sequence data and even reference-free sequence data for species whose reference genomes are not yet available. Another advantage of this method is the ability to handle whole-genome sequences of hundreds of individuals. Using extensive simulation we compared our method to Li and Durbin’s method based on the pairwise sequentially Markovian coalescent (PSMC) framework and the results show that our method outperformed the PSMC method for inferring recent population size changes. We applied our method to the genomes of nine non-admixed populations (CEU, GBR, TSI, FIN, CHB, CHS, JPT, YRI and LWK) from the 1000 Genomes Project, and showed a detailed pattern of human population fluctuations from 10 to 500 thousand years ago (kya). The results supported many mainstream viewpoints on the demographic histories of human populations, and at the same time produced several interesting observations that merit further and more careful investigation.

Exome sequencing of 3,000 individuals reveals differences in recent demographic history between East Asian and European populations. K. E. Lohmueller1, M. He2,3, Y. Li3, B. Kim1, L. Sun4, X. Zhang4, X. Jin3, K. Kristiansen3,5, T. Hansen6,7, J. Wang3, O. Pedersen7,8,9, E. Huerta-Sanchez10, R. Nielsen5,10 1) Ecology and Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA; 2) Bioinformatics and Computational Biology Program, Iowa State University, Ames, IA; 3) BGI-Shenzhen, Shenzhen, China; 4) Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China; 5) Department of Biology, University of Copenhagen, Copenhagen, Denmark; 6) Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark; 7) The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark; 8) Faculty of Health Sciences, Aarhus University, Aarhus, Denmark; 9) Institute of Biomedical Sciences, University of Copenhagen, Copenhagen, Denmark; 10) Integrative Biology, University of California, Berkeley, Berkeley, CA.

   Studies of genetic variation in thousands of individuals have found evidence for extreme population growth within the last 10,000 years in European and African American populations. The magnitude of recent growth in other continental populations, such as East Asians, has received comparatively little attention. In order to learn more about recent population history in East Asia, here we analyze high-coverage exome sequencing data from 1,449 Han Chinese individuals sampled from the Anhui province of China and 1,449 Danish individuals. We estimated recent demographic history using the site frequency spectrum. We find that the current effective size of the Han is approximately 4-fold larger than that estimated in the Danish population. Thus, while previous studies of common variants suggest historically smaller effective sizes in East Asian populations relative to European populations, our estimates of recent effective population sizes show the opposite pattern and trend in the same direction as the census population sizes. Next, we characterize the relationship between our estimates of the current effective population sizes and the census sizes. The ratio of the census size (over the last 200 years) to the recent effective size is significantly higher in the Han population than in the Danish population (P < 2×10^-4). This difference can be explained by greater variance in reproductive success in the Han population as compared to the Danish population. Alternatively, this result could be due to greater migration into the Danish population than the Han population. While it is appreciated that effective sizes of human populations are smaller than the census sizes, here we demonstrate that the magnitude of this difference varies across populations, even after accounting for population size changes. Finally, we examine patterns of deleterious variants in the Han and Danish populations. 
We find that the proportion of private variants that are nonsynonymous is higher in the Han sample (67.6%) than in the Danish sample (64.6%; P < 10^-10), consistent with recent population growth increasing the input of weakly deleterious mutations into the population that selection has not had sufficient time to remove. Our study provides the first analysis of recent population history and exploration of neutral and deleterious rare variants in an East Asian population.

Analysis of Genetic Diversity Representation of the 1000 Genomes in Worldwide Human Populations. D. Lu, S. Xu Partner institute for Computational Biology, Shanghai, Shanghai, China.

   The 1000 Genomes Project (1KG) aims to provide a deep characterization of human genome sequence variation and, by design, was expected to provide a comprehensive resource on human genetic variation. With an effort of sequencing 2,500 individuals, 1KG is expected to cover the majority of human genetic diversity worldwide. However, it would be interesting to evaluate to what extent the 1KG data represent the genetic diversity of human populations in each region, which will give insight into the power of 1KG and also give guidance to regional efforts for further sequencing projects and study design. In this study, using analysis of population structure based on genome-wide single nucleotide polymorphism (SNP) data, we examined and evaluated the coverage of genetic diversity of 1KG samples with the available genome-wide data from 3,831 individuals representing 140 worldwide population samples. We demonstrated that the 1KG does not have sufficient coverage of human genetic diversity in Asia, especially in Southeast Asia. We thus suggest that better coverage of Southeast Asian populations be considered in 1KG, or that a regional effort be initiated to provide a more comprehensive characterization of human genetic diversity in Asia, which is important for both evolutionary and medical studies in the future.

Visualizing the Geographic Distribution of Genetic Variants. J. H. Marcus, J. Novembre Department of Human Genetics, University of Chicago, Chicago, IL.

   One of the core features of any genetic variant, beyond its potential phenotypic effects or its frequency, is its geographic distribution. The geographic distribution of a genetic variant can shed light on where the variant first arose, in which populations it survived and spread, and in turn help us learn about historical patterns of migration and natural selection. Collectively the geographic distribution of genetic variants can help to explain how populations have been related through time (e.g. levels of gene flow and divergence). For variants with large effects, it can also help us understand the geographic distribution of spatially-varying phenotypes. For these reasons, visual inspection of geographic maps for genetic variants is common practice in genetic studies. Here we develop a series of reusable interactive visualizations for illuminating the geographic distribution of genetic variants. We specifically address several non-trivial challenges of this type of visualization; in particular, how to represent non-uniform levels of uncertainty in allele frequencies due to variable sample sizes; how to represent results from data with >10,000 individuals in which allele frequencies can vary over 4 orders of magnitude; how to display data for regions of the globe with dense sampling of populations; and how to quickly access frequency data from large samples. To meet these challenges, we implement a flexible REST API that allows easy access to allele frequency and sample size data from large-scale public genomic datasets. Built upon this API we develop a web-based browser, the Geography of Genetic Variants (GGV) browser, for visualizing the geographic distribution of genetic variants. The GGV browser rapidly provides maps of derived allele frequencies in populations distributed across the globe. 
The GGV browser builds upon past tools such as the HGDP Selection browser by allowing for more interactive features, new representations of rare variation, as well as incorporating uncertainty in allele frequency estimation. As ancillaries, we also develop a research visualization toolkit that includes a method for displaying high Fst outlier SNPs from the joint site frequency spectrum and an interactive version of commonly used PCA figures. We hope the GGV browser will be a valuable research and education tool for exploring population genetics data.

Finding the oasis of humanity in Neanderthal deserts. B. Vernot, JM. Akey Department of Genome Sciences, University of Washington, Seattle, WA.

   As anatomically modern humans dispersed out of Africa, they encountered Neanderthals in Eurasia and low levels of hybridization occurred such that approximately 2% of each non-African’s genome is inherited from Neanderthal ancestors. Recently, we developed an approach to identify surviving Neanderthal lineages in contemporary individuals, and recovered over 600 Mb of the Neanderthal genome present in modern non-African populations [1]. The map of surviving Neanderthal sequences shows marked heterogeneity across the genome, and we identified many “deserts of Neanderthal sequence” that are almost entirely devoid of Neanderthal sequence. These genomic regions are of particular interest because they delimit sequences that may confer uniquely human characteristics. For example, the largest Neanderthal desert is a 15Mb region on Chromosome 7, centered around the FOXP2 gene, which has previously been implicated in speech and language. Here, we present a detailed characterization of Neanderthal deserts by analyzing surviving archaic sequences in an expanded sample of geographically diverse individuals. We have developed a formal statistical test to identify genomic regions significantly depleted of Neanderthal lineages, and performed extensive simulations to infer the strength of purifying selection acting on these Neanderthal deserts. Additionally, we have utilized extensive bioinformatics analyses superimposing heterogenous functional genomics data to identify candidate causal variants. These analyses provide significant new insights into regions of the human genome that harbor sequences that have played a critical role in the evolution of anatomically modern humans, and suggest that regulatory sequences responsible for muscle, bone, and brain development were key differences between humans and Neanderthals. [1] Vernot and Akey, Science, 2014.

Population structure in the UK: Rare variant analysis using whole genome sequencing in 3,621 samples in the UK10K cohorts project. K. Walter1, S. Metrustry2, E. Zeggini1, Y. Memari1, J. Min3, J. Huang1, M. Cocca4, S. Schiffels1, I. Mathieson5, D. Lawson6, N. Soranzo1, UK10K Consortium Cohorts Group 1) Human Genetics, Wellcome Trust Sanger Institute, Hinxton, United Kingdom; 2) Twin Research & Genetic Epidemiology, Kings College London, United Kingdom; 3) MRC CAiTE Centre, University of Bristol, United Kingdom; 4) Institute for Maternal and Child Health-IRCCS “Burlo Garofolo”-Trieste, University of Trieste, Italy; 5) Harvard Medical School, Boston MA 02115, United States; 6) Heilbronn Institute, School of Mathematics, University of Bristol, United Kingdom.

   Population structure is a well-characterized potential confounder of association studies based on common variants, but the structural pattern for rare variants and their influence on association studies is less understood. The cohorts arm of the UK10K project undertook whole-genome sequencing at low read depth (median ~7x) in nearly 4,000 individuals from two large population samples in the UK (TwinsUK, N=1,754 and the Avon Longitudinal Study of Parents and Children (ALSPAC), N=1,867) in a comprehensive exploration of associations between rare and common genetic variants and a set of 61 bio-medically important quantitative phenotypes. The two study cohorts have marked differences in demographic profile with ALSPAC participants originating from a geographically restricted area (Bristol) in the South West of the UK, while the TwinsUK participants were born in different parts of the UK. After stringent QC steps, the data set comprises 42 million SNPs, 3.5 million INDELs and about 18,000 large deletions across 3,621 study participants. Here we describe the extent to which geographic stratification exists at rare variants by focusing on 31 shared ‘core’ phenotypes in 1,139 twins with available place of birth data throughout the United Kingdom. We modeled genetic structuring using a Euclidean distance metric, a regional grid and generalized additive models (GAM) applied to latitudinal and longitudinal data, and for single nucleotide variants of different minor allele frequencies separately. We further modeled correlation of genotypic and phenotypic data at these geographical locations, and compared them to simulated datasets. Finally, we applied Mantel tests to analyze the significance of genotypic and phenotypic relationships given the distance metrics. 
Overall, these analyses suggested that there is a moderate genetic structuring of very rare alleles (MAF=0.1-0.3%), however this structure is not associated with phenotypic variation and is unlikely to pose a serious concern for association studies of complex quantitative phenotypes and rare variation in the UK.
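
For readers unfamiliar with the Mantel test mentioned in this abstract, the following is a minimal pure-Python sketch of the general technique: correlate the off-diagonal entries of two distance matrices, then permute the labels of one matrix to build a null distribution. The abstract does not describe the authors' implementation, so the one-sided test, the permutation count, the seed, and the toy distances are assumptions for illustration.

```python
import random
from itertools import combinations


def _pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def mantel_test(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test for positive association between two
    symmetric distance matrices given as nested lists."""
    n = len(d1)
    pairs = list(combinations(range(n), 2))  # upper-triangle index pairs
    x = [d1[i][j] for i, j in pairs]
    observed = _pearson(x, [d2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    idx = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(idx)  # relabel rows and columns of d2 together
        y = [d2[idx[i]][idx[j]] for i, j in pairs]
        if _pearson(x, y) >= observed:
            hits += 1
    # Add-one correction counts the observed arrangement itself.
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy data: five sampling sites on a line, with a second
# distance that grows in proportion to the geographic distance.
pos = [0, 1, 2, 3, 4]
geo = [[abs(a - b) for b in pos] for a in pos]
gen = [[2 * abs(a - b) for b in pos] for a in pos]
r, p = mantel_test(geo, gen)
print(round(r, 3), p)
```

Permuting rows and columns together, rather than shuffling individual entries, is what separates a Mantel test from an ordinary correlation test: it respects the fact that pairwise distances sharing an individual are not independent.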

Sequencing the genomes of single cells. P. Ribaux, C. Borel, F. Santoni, E. Falconnet, S. E. Antonarakis Dept Genetic Medicine, Univ Geneva Medical School, Geneva, Switzerland.

   Whole-genome amplification and next-generation sequencing advances enable investigation of somatic structural and nucleotide variation at single-cell resolution. The ultimate goals of our study are (i) to identify disease-associated somatic mutations and (ii) to uncover the extent of low-abundance DNA variations in individual cancer cells in order to elucidate mechanisms of tumor evolution. Because of the technical challenge of detecting and analyzing genomic heterogeneity among single cells, we first analyzed individual cells in culture and tested the robustness of our experimental workflow. We chose K562 cells, a human immortalized myelogenous leukemia line, and F-T21, a human primary Trisomy 21 fibroblast cell line. We used the C1 Single Cell Auto Prep System (Fluidigm) to capture hundreds of individual cells and to generate high-quality amplified DNA from each cell. So far, 96 barcoded whole-exome libraries were sequenced at deep coverage (PE, 100bp). Variant calls (CNVs and SNVs) were generated with an in-house analysis pipeline. Here, we will discuss the amplification uniformity, the detectable fraction of the exome and the level of DNA contamination. By comparing single-cell and bulk-cell datasets, we will assess the percentage of allelic dropout for each single-cell exome based on the heterozygous SNVs. High-quality single-cell genome sequence will greatly enhance the genetic analysis of somatic genomic disorders. C.B. and P.R. contributed equally.
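
One common way to estimate allelic dropout from heterozygous SNVs can be sketched as follows. The abstract does not spell out the authors' calculation, so the definition used here (the fraction of bulk-heterozygous sites that are called homozygous in the single cell, among sites with a call), the function name, and the toy genotypes are assumptions.

```python
def allelic_dropout_rate(bulk_genotypes, cell_genotypes):
    """Fraction of bulk-heterozygous sites called homozygous in one cell.

    Genotypes are dicts mapping a site id to a 2-tuple of alleles;
    a missing key or None in `cell_genotypes` means no call at that site.
    """
    het_sites = [s for s, g in bulk_genotypes.items() if g[0] != g[1]]
    covered = [s for s in het_sites if cell_genotypes.get(s) is not None]
    if not covered:
        raise ValueError("no bulk-heterozygous sites were called in the cell")
    dropped = sum(
        1 for s in covered if cell_genotypes[s][0] == cell_genotypes[s][1]
    )
    return dropped / len(covered)


# Hypothetical toy calls: four heterozygous sites in bulk, one homozygous.
bulk = {"s1": ("A", "G"), "s2": ("C", "T"), "s3": ("G", "G"),
        "s4": ("A", "C"), "s5": ("T", "G")}
cell = {"s1": ("A", "G"), "s2": ("C", "C"), "s3": ("G", "G"),
        "s4": None, "s5": ("T", "G")}
ado = allelic_dropout_rate(bulk, cell)
print(f"{ado:.2f}")
```

In the toy data three of the four bulk-heterozygous sites have a call in the cell, and one of them has lost an allele, giving a dropout rate of one in three. Sites without a call are excluded from the denominator so that locus dropout is not confused with allelic dropout.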

Monozygotic Twin Pairs: CNV and sequence concordance. A. Abdellaoui1, E. Ehli2, J. J. Hottenga1, Z. Weber2, H. Mbarek1, G. Willemsen1, T. van Beijsterveldt1, A. Brooks3, J. J. Hudziak4, P. F. Sullivan5, E. C. J. de Geus1, K. Ye6, P. E. Slagboom7, G. E. Davies2, D. I. Boomsma1 1) Biological Psychology, VU University Amsterdam, Amsterdam, Noord Holland, Netherlands; 2) Avera Institute for Human Genetics, Avera McKennan Hospital & University Health Center, Sioux Falls, SD, USA; 3) Department of Genetics, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA; 4) University of Vermont, College of Medicine, Burlington, VT, USA; 5) Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA; 6) The Genome Institute, Washington University School of Medicine, St. Louis, Missouri, USA; 7) Molecular Epidemiology, Leiden University Medical Center, Leiden, Netherlands.

   Monozygotic (MZ) twins are genetically identical at conception, making them informative subjects for studies on somatic mutations. Copy number variants (CNV) are responsible for a substantial part of genetic variation, have relatively high mutation rates, and have been associated with susceptibility to disease, such as autism and schizophrenia. We conducted a genome-wide survey for post-twinning de novo CNVs (i.e., not shared by co-twins) in ~1,100 MZ twin pairs who had been repeatedly phenotyped across a wide range of traits, and for a large proportion of whom gene-expression and methylation data are available. CNVs from 1,097 MZ twin pairs were measured in DNA from peripheral blood samples (mostly in adults) or buccal epithelium (mostly in children) with the Affymetrix 6.0 microarray. Whole-genome sequencing was performed in DNA from blood samples from 13 MZ twin pairs and their parents (12x coverage – Illumina, and 2 twin pairs additionally sequenced with Complete Genomics). We found a total of 153 putative post-twinning de novo CNVs >100 kb, of which the majority resided in the same unstable genomic region (15q11.2). Based on how well the raw intensity signals visually agreed with CNV calls made by the two algorithms, a first selection was made of eleven de novo CNVs from 15q11.2 for a first series of qPCR validation experiments. Two out of eleven post-twinning de novo CNVs were validated with qPCR in the same twin pair. This 13-year-old twin pair did not show large phenotypic differences. The remaining putative de novo CNVs from 15q11.2 were found significantly more often in older twins, suggesting that we are capturing real signals. The large putative de novo CNVs detected with microarray data were not present in the subsample that had whole-genome sequence data available. We do expect the whole-genome sequence data to allow us to search for smaller de novo CNVs that cannot be detected with microarray data.

Beyond the 1000 Genomes Project. L. Clarke, H. Zheng-Bradley, A. Datta, I. Streeter, D. Richardson, P. Flicek, The 1000 Genomes Consortium Vertebrate Genomics, European Molecular Biology Laboratory – European BioInformatics Institute (EMBL-EBI), The Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom.

   The 1000 Genomes Project provides an essential reference catalog of human variation with more than 60 million variant sites ranging from single nucleotide polymorphisms to structural variant events including inversions and duplications. Also provided are global allele frequencies and genotypes for 2535 individuals from 26 different populations across Europe, Africa, East and South Asia and the Americas, which enable many other projects to better interpret their results. Primary uses for the 1000 Genomes data sets include imputation panels to create whole genome variant sets from exome or array-based genotypes; as filters of “normal” or shared variation in rare disease or cancer sequencing projects; and to explore demography and selection in human populations. The 1000 Genomes Project is now drawing to a close. Here we describe plans to maintain the resource in order to ensure it remains the valuable data set it is today by providing long-term support for the 1000 Genomes Project resource. For example, we will continue to host both the FTP site (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp) and the project website (http://www.1000genomes.org) to ensure the community can access both the raw data and the documentation about the project. We will also create a stable version of the 1000 Genomes Browser (http://browser.1000genomes.org) based on the project’s final data release. This project-specific, Ensembl-based browser displays all of the 1000 Genomes variants as soon as possible and will use the GRCh37 assembly of the human reference genome. We will also maintain the existing tools and incorporate new ones as appropriate to enable users to easily access the data they desire. Our most popular tools are the Data Slicer, which allows users to select genomic subsections of our alignment (BAM) and variant (VCF) files and thus download just the piece of the file they need, and the Variation Pattern Finder, which allows users to discover patterns of shared variation in a specific region of the genome. Other tools include the VCF to PED converter, which allows users to generate PLINK format files from remotely hosted VCF files, and the recently introduced Allele Frequency Calculator, which calculates allele frequencies in bulk for specific subpopulations from our VCF files.
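To make concrete what a per-population allele frequency is, here is a minimal sketch in Python. It is not the project's Allele Frequency Calculator and does not read real VCF files; it assumes a single biallelic site with VCF-style genotype strings already in memory. The function name and the sample names are hypothetical; GBR and YRI are used only as example population labels.

```python
from collections import defaultdict


def allele_frequencies(genotypes, populations):
    """Alternate-allele frequency per population at one biallelic site.

    genotypes:   dict of sample -> VCF-style GT string, e.g. "0|1" or "1/1"
    populations: dict of sample -> population label
    (Both inputs are illustrative assumptions, not a real file format reader.)
    """
    alt_counts = defaultdict(int)
    total_counts = defaultdict(int)
    for sample, gt in genotypes.items():
        pop = populations[sample]
        # Treat phased ("|") and unphased ("/") separators the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":  # missing call: skip, do not count as reference
                continue
            total_counts[pop] += 1
            alt_counts[pop] += int(allele != "0")
    return {pop: alt_counts[pop] / total_counts[pop] for pop in total_counts}


# Hypothetical samples and genotypes.
genotypes = {"S1": "0|1", "S2": "1|1", "S3": "0|0", "S4": "./.", "S5": "0/1"}
populations = {"S1": "GBR", "S2": "GBR", "S3": "YRI", "S4": "YRI", "S5": "YRI"}
result = allele_frequencies(genotypes, populations)
print(result)
```

Missing calls are dropped from both numerator and denominator, so a population's frequency is computed only over the alleles that were actually called.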

Next generation association studies in isolated populations. E. Zeggini1, L. Southam1,2, K. Panoutsopoulou1, K. Hatzikotoulas1, G. R. S. Ritchie1, A.-E. Farmaki3, I. Tachmazidou1, A. Matchan1, N. W. Rayner1,2, J. Schwartzentruber1, I. Ntalla3, E. Tsafantakis4, M. Karaleftheri5, G. Dedoussis3, A. Gilly1 1) Wellcome Trust Sanger Institute, Hinxton, United Kingdom; 2) Wellcome Trust Centre for Human Genetics, University of Oxford, UK; 3) Harokopio University Athens, Athens, Greece; 4) Anogia Medical Centre, Anogia, Greece; 5) Echinos Medical Centre, Echinos, Greece.

   Isolated populations have unique characteristics that can be leveraged to increase power in genetic association studies. In founder populations genetic drift can drive trait-associated alleles to higher frequency and thus enable the identification of rare variant associations with smaller discovery sets. We have collected samples from two isolated populations in Greece (HELlenic Isolated Cohorts study): the Pomak villages (HELIC-Pomak) in the North of Greece; and the Mylopotamos villages (HELIC-MANOLIS) on Crete. All samples (n~3000) have information on a wide array of anthropometric, cardiometabolic, biochemical, haematological and diet-related traits, genotypes from the Illumina OmniExpress and exome-chip platforms, and are being whole-genome sequenced at low depth. Using 1x WGS data from 995 (HELIC-MANOLIS) individuals, we demonstrate that over 80% of true low-frequency (0.01<MAF<0.05) variants are found, compared to an average 60% for 0.001<MAF<0.01 and 40% for MAF<0.001. Genotype concordance reaches >95% and minor allele concordance >90% across the whole MAF spectrum. We replicate known association hits, thereby providing a proof of concept for a robust processing pipeline for low-depth WGS variant calls. Using genotype data, we find that 80% of subjects have at least one “surrogate parent” in the isolates, compared to 1% in the outbred Greek population. In the MANOLIS cohort we observe an enrichment of missense variants amongst the variants that have drifted up in frequency by >5-fold. We have previously reported a lipid traits association with a functional variant in the APOC3 gene in 1267 individuals in MANOLIS. The equivalent sample size needed to detect this in the general European population would be 67,000. In the Pomak cohort we find novel associations at variants on chr11p15.4 showing large allele frequency increases (from 0.2% in the general Greek population to 4.6% in the isolate) with haematological traits, for example with mean corpuscular volume (at rs11035019, beta=-1.249, p=3.45×10^-29). Their detection in cosmopolitan populations would necessitate thirteen times as many samples. We demonstrate the significant power gains that can be afforded by studying well-characterised founder populations.

Admixture mapping of exome genotyping data implicates region 15q21.2-22.3 with keloid risk in African Americans. K. S. Tsosie1,2, D. R. Velez Edwards1,3,4,5, S. M. Williams6, T. L. Edwards1,2,3,4, S. B. Russell1,7 1) Center for Human Genetics, Vanderbilt University, Nashville, TN; 2) Division of Epidemiology, Department of Medicine, Vanderbilt University, Nashville, TN; 3) Vanderbilt Epidemiology Center; 4) Institute for Medicine and Public Health; 5) Department of Obstetrics and Gynecology, Vanderbilt University, Nashville, TN; 6) Department of Genetics, Geisel School of Medicine, Dartmouth University, Hanover, NH; 7) Division of Dermatology, Department of Medicine, Vanderbilt University, Nashville, TN.

   Keloids (MIM 148100) are benign dermal fibrotic tumors with no effective clinical remedy that affect people of recent African ancestry approximately 20 times more often than individuals of Caucasian descent. Possible related fibroproliferative diseases with increased prevalence in African populations include hypertension, nephrosclerosis, allergic disease, and uterine fibroma. Familial aggregation and ancestral differences in risk among geographic subpopulations strongly suggest a genetic association between African ancestry, keloids and fibroproliferative disease risk. There are no published genome-wide studies of keloid risk in African ancestry subjects. We conducted admixture mapping (AM) and whole exome association in 478 African Americans (AAs: 122 cases, 356 controls) with exome arrays to identify regions of local ancestry and SNP genotypes under AM peaks associated with keloid risk. Results: The most significant association with keloids discovered by AM was observed on chr15q21.2-22.3. This 5Mb region includes NEDD4, which was previously implicated in keloid formation by GWAS in Japanese and later validated in Chinese. Though our study nominally replicated this finding by AM and genotype association, the most significant SNP genotype association under the AM peak was observed at MYO1E (rs747722, odds ratio [OR]=4.41, 95% confidence interval [CI]=2.29-8.50, p=9.07×10^-6). A scan of all common genotype associations also identified associations at MYO7A (rs35641839, OR=4.71, 95% CI=2.38-9.32, p=8.34×10^-6) at chr11q13.5. GWAS have linked the chr15q21.2-22.3 region with hypertension in AAs, asthma in Europeans, and atherosclerosis in a Finnish cohort, providing evidence for common genetic elements. Examination of earlier microarray data of fibroblasts from keloids and normal scars that included some subjects from this study also implicated chr15q21.2-22.3 as a causal region for keloids, with increased expression of MYO1E in keloids compared to normal scars. Notably, MYO1E has been shown to be a crucial component of the invadosome, a structure involved in matrix degradation and invasion, and thus may have a functional role in the keloid phenotype. Conclusion: This study is the first to use AM and exome array association analysis to explore the genetics of keloids in AAs. Our findings, strengthened by support from expression data, further elucidate a potential region on chr15q21.2-22.3 for a role in risk of keloids in AAs, Japanese, and Chinese populations.
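As a sanity check on how the reported odds ratios, confidence intervals and p-values relate, the sketch below recovers an approximate z-score and two-sided p-value from an OR and its 95% CI. It assumes a Wald-type interval that is symmetric on the log-odds scale, which the abstract does not state, so the output should be read as a rough consistency check and not as a reproduction of the authors' test. The function name is hypothetical.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Back out SE, z and a two-sided p-value from an OR and its 95% CI.

    Assumes (not stated in the abstract) a Wald interval on the log-odds
    scale: log(OR) +/- z_crit * SE.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p


# Values quoted in the abstract for MYO1E (rs747722) and MYO7A (rs35641839).
for label, (or_, lo, hi) in {
    "MYO1E": (4.41, 2.29, 8.50),
    "MYO7A": (4.71, 2.38, 9.32),
}.items():
    se, z, p = wald_from_ci(or_, lo, hi)
    print(f"{label}: SE={se:.3f}, z={z:.2f}, p={p:.2e}")
```

Under this assumption both variants come out with p-values on the order of 10^-5 to 10^-6, the same order of magnitude as the reported values; small differences are expected because the published OR and CI are rounded and the original test statistic may differ.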

Exome sequencing of 487 Community Acquired Pneumonia patients. K. S. Elliott1, A. Ndungu1, T. C. Mills1, A. L. Rautanen1, P. Hutton2, C. Garrard2, A. Gordon3, C. M. Hinds4, M. Lathrop5, A. V. S. Hill1, S. J. Chapman1 1) Wellcome Trust Centre Human Genetics, University of Oxford, Oxford, UK; 2) Intensive Care Unit, John Radcliffe Hospital, Oxford, UK; 3) Anaesthetics, Pain Medicine and Intensive care, Imperial College, London, UK; 4) William Harvey Research Institute, Queen Mary University of London, London EC1M 6BQ, UK; 5) McGill University-Génome Québec Innovation Centre, Montreal, Canada.

   Respiratory infection is the largest contributor to global disease burden and pneumonia kills over one million children each year. A major genetic component of infectious disease was demonstrated by a study of Danish adoptee children where a 5.8-fold increased risk of death from infectious disease was observed if one of their biological parents had died prematurely from infection. Severe bacterial disease may exert enormous selective pressure leading to the finding of rare susceptibility variants of relatively recent origin. In order to identify such variants an exome sequencing study was undertaken. DNA samples from 487 adult UK individuals admitted to an intensive care unit with severe community-acquired pneumonia (CAP) were collected as part of a study of genetic predictors of death from sepsis in critically ill patients (Genomic Advances in Sepsis [GAinS]). Analysis of sepsis susceptibility was performed on a discovery cohort of 270 CAP samples compared to the UK10K ALSPAC control dataset. After stringent QC criteria were applied, 135,392 variants were identified. Of these, 43 reached the ExWAS significance threshold for association (p < 3.6 × 10^-7) and an additional 63 variants were suggestive (p < 1 × 10^-4). The exomes from the remaining 217 CAP patients are being analysed as a replication dataset against the UK10K TWINSUK control dataset. The sepsis outcome phenotype was also analysed, measured as 28-day mortality post-ICU admission within the 487 combined CAP cohorts (deaths n=237, survivors n=237, unknown n=13). In single-variant analysis of sepsis outcome, seven variants were identified reaching the ExWAS significance threshold, including variants in two related genes known to be involved in thrombosis. Collapsing methods with rare deleterious variants are being performed to detect gene-centric associations. Identification of novel, large-effect genetic variants has the potential to significantly expand current understanding of sepsis biology and may have clinical applications.
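The abstract does not say how its exome-wide significance threshold was derived. One common convention, assumed here purely for illustration, is a Bonferroni correction: divide a family-wise alpha of 0.05 by the number of variants tested. The sketch below applies that to the 135,392 post-QC variants; the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold under a Bonferroni correction."""
    return alpha / n_tests


# 135,392 variants is the post-QC count given in the abstract;
# alpha = 0.05 is an assumed family-wise error rate.
threshold = bonferroni_threshold(0.05, 135_392)
print(f"{threshold:.2e}")  # prints 3.69e-07
```

That lands close to the stated cutoff of 3.6 × 10^-7, which is at least consistent with a Bonferroni-style correction, though the authors may have used a slightly different variant count or rounding.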
