There’s a new paper in Science, A network framework of cultural history, which is interesting and has naturally made a media splash (it is in Science). The paper illustrates the power of “Big Data” in a domain where most people have not thought to utilize big data. The authors state that “we have reconstructed aggregate intellectual mobility over two millennia through the birth and death locations of more than 150,000 notable individuals.” In the past, historical judgments, especially those in the domain of culture, had to be conceded to those with a thick and dense personal database gained through a lifetime of erudition. I’m thinking, for example, of the reduction of a lifetime of scholarship that you can literally feel as you work your way through Jacques Barzun’s magisterial From Dawn to Decadence. But there are serious shortcomings with this sort of intellectual endeavor when it comes to gaining a better grasp of reality. A personal example should suffice. I’ve been thinking of purchasing Alan Cameron’s The Last Pagans of Rome. We don’t need to get into the details, but Cameron basically argues for a major revision of our understanding of the Late Antique transition from paganism to Christianity. This argument is buttressed by the fact that Cameron is one of the world’s experts, a master of the literature without parallel. The problem is this: how exactly do you judge the quality of an argument from someone who has a better grasp of the topic than you, perhaps by orders of magnitude? Who are you to disagree if Peter Brown is impressed?
Some of the same issues outlined by Noam Chomsky in his famous critique of fashionable “deconstructionism” actually apply to elements of humanistic scholarship in practice, if not in the ideal. In principle I could learn Latin and Greek and wend my way through the scholarship which Cameron draws upon (I suspect that I would do so more slowly because I lack a strong natural adult fluency with languages, but it is feasible). But in reality very few people have the time to reproduce, at the same magnitude, the knowledge database of a specialist in a particular area. With the diversity within academia it may even be that a given topic has only a few individuals at parity in terms of expertise. Obviously this causes a problem, because at the end of the day many arguments have to be resolved by appeals to authority. There are some workarounds. For example, one may not have specialized knowledge about a particular area, but in many cases specific instances are likely part of a broader pattern. I’m more liable to give credence to a particular argument if I can check analogous cases in other contexts where I do have thick knowledge, and see that it checks out. If, on the other hand, the argument flies in the face of the general trend, I am more skeptical.
But a better ultimate solution is to quantify and formalize. This is why the above paper is exciting. The only schematic that I can recall from The Origin of Species is a tree of life, the precursor to the phylogenetic trees which are common today. But it was the cladistic revolution, and later the emergence of computational statistical methods, that revolutionized phylogenetics and turned it into a reproducible science which does not rely upon specialized domain knowledge. Before World War II, if you wanted to know about the systematic relationship of ant genera you would have to consult an expert or a work written by experts. Today you can actually pull some sequence data and construct the tree yourself! That is where we need to be when it comes to a scientific understanding of human culture and history.
Nevertheless there are downsides to this process. In National Geographic, Peter Turchin says of this work: “This is a terrific data set, but they are not testing a scientific question here….” If you read the paper, and the press coverage, you see lots of neat visualizations which are representations of the patterns extracted from the data, but to a great extent they are representations of what we already know. Very few non-quantitative scholars would be surprised that eminent individuals tend to move from rural areas to urban ones. Or that Rome and Athens were prominent magnets in 300 AD while London was in 1800 AD. There is value to be gained in formalizing this, to establish an algebra of history if you will. But this is not revolutionary; the field of cliometrics has been around for two generations. What is different is that computational methods can now be brought to bear on the data far more effectively. But a major temptation of this sort of cutting-edge analysis is data dredging, as well as the issues that come with ascertainment bias. For example, National Geographic reports the following about the first author’s results:
The distance that people moved over their lifetimes has also changed “very little,” the study says, over the past eight centuries. It grew from a typical distance of 133 miles (214 kilometers) in the 14th century to 237 miles (382 kilometers) today, despite the advent of automobiles and airplanes. Schich expected that the opening of the 3,000-mile (4,828-kilometer) trip to the New World after 1492 would stretch the distance much farther.
“People in the past were not so different from us,” Schich says, noting the records include accounts of Jesuit priests who traveled to China in the 17th century. “It’s very strange to think my odds of moving a long distance are similar,” he says, with a laugh.
This is certainly a result that would surprise many, but please remember that the database is a selection of notable individuals. I would bet that the change would be far greater if you had a sample of most of the world’s population, rather than ~150,000 extremely notable individuals from the last few thousand years. Immanuel Kant aside, those people in the past who became famous often did so by migrating and getting involved in the events of the world, which entailed travel. They were atypical (consider also that every single Roman Emperor was probably functionally literate in a world where this was a minority capacity; the idea that Justin was illiterate is probably a slander).
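To make the ascertainment worry concrete, here is a toy simulation. It is only a sketch under assumptions I am inventing for illustration: the log-normal distribution of distances, the rule that the chance of becoming notable rises with distance moved, and every number in it are hypothetical. None of it comes from the paper or its data set.

```python
import random
import statistics


def simulate_ascertainment(n_people=50_000, seed=0):
    """Return (population, notable) lists of lifetime migration distances in km.

    Everything here is a made-up toy model, not the paper's data or method.
    """
    rng = random.Random(seed)
    population, notable = [], []
    for _ in range(n_people):
        # Assumed: distances are log-normal, so most people barely move.
        distance_km = rng.lognormvariate(3.0, 1.5)
        population.append(distance_km)
        # Assumed: the chance of ending up "notable" rises with distance moved.
        p_notable = min(1.0, 0.001 * distance_km)
        if rng.random() < p_notable:
            notable.append(distance_km)
    return population, notable


population, notable = simulate_ascertainment()
print("people simulated:       ", len(population))
print("recorded as notable:    ", len(notable))
print("median km, everyone:    ", round(statistics.median(population), 1))
print("median km, notable only:", round(statistics.median(notable), 1))
```

In this toy world the notable subset has a much larger typical distance than the population it was drawn from, simply because of how people got into the sample. The same logic suggests that trends measured on notables need not track trends in everyone else.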
To make the best use of the data we need to be clear about our thinking. I do think it goes beyond just asserting that we need hypotheses, though that’s part of it. Genomics is the product of the age of big(ger) data, and it has had to deal with problems of false positives being confused for real signals because old statistical thresholds became outdated. Culturomics has a lot it could learn from the experience of biologists before 2010. With all that said, there is a body of formal theory which can move in and start to operate upon the data set. Boyd and Richerson’s The Origin and Evolution of Cultures and Cavalli-Sforza and Feldman’s Cultural Transmission and Evolution are good places to start. Recently I read Alex Mesoudi’s Cultural Evolution: How Darwinian Theory Can Explain Human Culture and Synthesize the Social Sciences, which is newer, and probably aimed at an audience that is a touch less specialist. I highly recommend it for those interested in this topic (if you have an evolutionary genetics background much of it goes fast because it is a review of basic theory).
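The point about outdated statistical thresholds can be illustrated with a few lines of code. This is a minimal sketch, assuming a hypothetical scan of 10,000 tests in which there is no real signal at all; the Bonferroni correction used here is simply one standard fix that I am swapping in for illustration, not something the paper or the genomics literature is being quoted on.

```python
import random


def count_false_positives(n_tests=10_000, alpha=0.05, seed=0):
    """Count 'significant' results when every null hypothesis is true."""
    rng = random.Random(seed)
    # Under a true null hypothesis a p-value is uniform on [0, 1).
    p_values = [rng.random() for _ in range(n_tests)]
    # Old-style threshold applied to each test separately.
    naive_hits = sum(p < alpha for p in p_values)
    # Bonferroni: divide the threshold by the number of tests performed.
    bonferroni_hits = sum(p < alpha / n_tests for p in p_values)
    return naive_hits, bonferroni_hits


naive_hits, bonferroni_hits = count_false_positives()
print("false positives at p < 0.05:     ", naive_hits)
print("false positives after Bonferroni:", bonferroni_hits)
```

With thousands of tests, a per-test threshold of 0.05 hands you hundreds of "discoveries" from pure noise, which is the data dredging trap in miniature. A stricter threshold removes nearly all of them, at the cost of statistical power.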
Let me finish with a quote from The Genetic Basis of Evolutionary Change by Richard Lewontin:
For many years population genetics was an immensely rich and powerful theory with virtually no suitable facts on which to operate. It was like a complex and exquisite machine, designed to process a raw material that no one had succeeded in mining. Occasionally some unusually clever or lucky prospector would come upon a natural outcrop of high-grade ore, and part of the machinery would be started up to prove to its backers that it really would work. But for the most part the machine was left to the engineers, forever tinkering, forever making improvements, in anticipation of the day when it would be called upon to carry out full production….