Monday, November 12, 2012

L'Hôte: empiricism is inextricable from theory, data is meaningless ...

This post will be quite long. Please consider that before deciding whether to read it.

I'm writing this from my office. I came here at an ungodly early hour on a Saturday morning because I've been trying to work out a problem. The problem lies in a particular search string. I'm currently engaged in a research project investigating the composition processes of second language learners, in this case Chinese and Hindi L1s. I am using corpus linguistics software to mine a vast archive of student essays for certain patterns of argumentative and rhetorical structures. The software reports back to me about frequency and position, and through these outputs I can statistically compare the use of such structures across demographics-- language of origin, years of education in English, etc. And since pragmatic results in second language studies often depend on reference to native speakers, I'm creating a baseline from an equivalently-sized corpus of L1 English subjects.

Much of corpus linguistics has focused on the level of morphosyntax, for the simple reason that the software is better equipped to look for certain word-level constructions or word pairings than it is to examine the larger, more complex, and more variable argumentative plane. English is notoriously morphologically inert; that is, our use of inflections such as affixes is quite limited in comparison to other languages. (Compare, for instance, a language like Spanish.) For this reason, searching for particular syntactic structures with computers can be quite tricky. It's also for this reason that formalist poets in other languages often have an easier go of it than those writing in English-- it's much harder to write a villanelle or in terza rima when words lack consistent inflectional endings. In a language like Latin, word order is vastly more malleable because the inflections carry so much of the information necessary for meaning. In English, word order is somewhat mutable in an absolute sense but quite restricted in comparison to many languages. (There are exceptions, such as floating quantifiers, e.g. all-- "All the soldiers will eat," "The soldiers all will eat," "The soldiers will all eat," etc.)

But recently, researchers in composition have had some success in looking for certain idiomatic constructions as a clue to the kind of arguments that students are making. Some of these are obvious, such as the use of formal hedges ("to be sure") or boosters ("without question"), and those are types of features for which I'm searching. Some are more complicated and require a little more finesse to search for effectively.
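
To make this concrete, here is a minimal sketch in Python-- not the actual corpus software I use, and the phrase lists are my own invented examples-- of counting a fixed set of hedge and booster phrases in a text:

```python
# A minimal sketch (not the actual corpus software) of counting a fixed
# set of hedge and booster phrases. The phrase lists are invented examples.
import re
from collections import Counter

HEDGES = ["to be sure", "it could be argued", "perhaps"]
BOOSTERS = ["without question", "certainly", "undoubtedly"]

def phrase_counts(text, phrases):
    """Count case-insensitive occurrences of each fixed phrase."""
    counts = Counter()
    for phrase in phrases:
        counts[phrase] = len(re.findall(r"\b" + re.escape(phrase) + r"\b",
                                        text, flags=re.IGNORECASE))
    return counts

essay = "To be sure, the policy has costs. Without question, it also has benefits."
print(phrase_counts(essay, HEDGES))    # Counter({'to be sure': 1, ...})
print(phrase_counts(essay, BOOSTERS))  # Counter({'without question': 1, ...})
```

Frequencies like these, aggregated per essay and per demographic group, are the raw material for the statistical comparisons.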

Code glosses, for example. A code gloss is an attempt by a writer to explain to readers what a particular word or term in his or her text is meant to convey in the context of the particular writing. Code glosses are not, or are not merely, definitions; a definition provides denotative information that is accurate or inaccurate regardless of context. A code gloss, in contrast, has to provide the information necessary for a reader to follow the writer's argument, and so a code gloss could fail as a general definition but succeed in its specific purpose. (This paragraph itself amounts to a code gloss.) The study of these kinds of features in writing, if you're feeling fancy, is referred to as metadiscourse. Many types of metadiscourse have certain formal clues that can be used to search for them in large corpora.

Unfortunately, false positives are common. The further you get from a restricted set of idiomatic phrases, the more likely it becomes that the computer will return a morphologically identical but argumentatively distinct feature-- so a search for "to be sure" as a formal hedge will also return "I looked it up in a dictionary to be sure that I got it right," which is not a hedge. The flexibility of language, one of our great strengths as a species, makes this sort of thing inevitable to a certain extent. The recourse is often just to sift through the returned results, looking for false positives. (Or, if you're lucky enough to have one, making a research assistant do it!) You might ask why bother with the computer at all if you have to perform a reality check yourself on most strings. The answer is that it's possible to look through the, say, 600 returned examples from a given search string and eliminate the false positives, but not to comb through the 500,000-2,000,000 words in a given corpus for what you want to find.
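
The manual culling step looks something like a keyword-in-context (KWIC) concordance. A rough sketch, again in Python with an invented two-sentence corpus:

```python
# A hypothetical KWIC pass: pull every hit for a search string along with
# surrounding text so a human (or research assistant) can cull false
# positives by hand.
import re

def kwic(text, phrase, window=40):
    """Yield each match of `phrase` with `window` characters of context."""
    for m in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        start, end = m.start(), m.end()
        yield text[max(0, start - window):start], m.group(), text[end:end + window]

corpus = ("To be sure, the data are noisy. "
          "I looked it up in a dictionary to be sure that I got it right.")
for left, hit, right in kwic(corpus, "to be sure"):
    print(f"...{left}[{hit}]{right}...")
# The first hit is a hedge; the second is a false positive to be discarded.
```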

Beyond that, your only recourse is building effective search strings within the interface of the particular corpus linguistics software you are using. This requires carefully calibrating wildcards: places in the search string where the software will accept any result. You can restrict these wildcards in a variety of ways-- for example, you can allow the wildcard to match any single letter, or only one of a certain set of letters. Or you can bind the wildcard in terms of proximity to surrounding letters or words; that is, the wildcard can be formatted so that the software will look only a certain distance, in characters, from a particular search term. The more open-ended you make your search strings, the more likely you are to get false positives that have to be laboriously culled for accurate data; the more restrictive you are, the more likely you are to exclude relevant examples and thus jeopardize the quality of your research.
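
Using regular expressions as a stand-in for the software's wildcard syntax (the real tool's syntax differs), here is a sketch of that tradeoff. The restricted pattern culls the common non-hedge frame "to be sure that/of ...", at the risk of also excluding some genuine hedges:

```python
# Open vs. restricted search strings, with regex standing in for the
# corpus software's wildcard syntax. The restricted pattern rejects
# "to be sure that/of ...", a frame that usually signals a non-hedge use.
import re

open_pattern = re.compile(r"\bto be sure\b", re.IGNORECASE)
restricted_pattern = re.compile(r"\bto be sure\b(?!\s+(?:that|of)\b)", re.IGNORECASE)

sample = ("To be sure, the argument has merit. "
          "I checked twice to be sure that it compiled.")
print(len(open_pattern.findall(sample)))        # 2 -- includes the false positive
print(len(restricted_pattern.findall(sample)))  # 1 -- culls it, but a pattern
                                                # like this can also cull
                                                # genuine hedges
```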

And that's why I'm here on a Saturday morning: I'm poking around with a particular search string, looking at the results it returns, and trying to fine-tune it in order to better approach the results that I want. All of this is in the service of coming up with research that can express certain qualification- and caveat-filled conclusions, responsibly presented, in order to provide some small amount of progress in our understanding of second language literacy acquisition, which is one of my primary research interests. It's what I love to do.

*****


This is all an exceptionally long-winded windup. (If you are looking to make the accurate criticism that my posts are way too long, this may be the best proof yet.) I mention it all as context for how I feel when I read this excellent post from Shawn Gude, on Ezra Klein and a certain brand of liberal commentator, commonly referred to as a wonk.

Gude quotes Klein as saying that he doesn't think of himself as a liberal anymore, just an empiricist. Gude points out, correctly, that this is a mistake on any number of levels. He shows an old Bloggingheads video where Will Wilkinson and Klein debate the necessity of first principles. I find Klein to be agonizingly frustrating in the video. Wilkinson keeps asking him to accept a simple fact: that whatever the empirical reality of health care, Klein is embracing that empiricism as a means to advance a particular normative end. And as Wilkinson accurately points out, that normative end must itself be justified and argued. It does not exist a priori. Klein repeatedly and doggedly evades having to make that justification, and in so doing makes his own arguments weaker and his own credibility suspect.

Gude is right to view all of this as a profound mistake on Klein's part. But it would be a mistake to view this as a conflict between empiricism and theory. Rather, it is a failure to understand what empiricism is, both in ideal and real-world terms. I referred to my own research here because I want to discuss how empirical work exists in a theoretical and normative framework. I think that, in addition to its theoretical, political, and moral failings, Klein's worldview contains internal contradictions that leave him bordering on incoherence. And as he is now one of the five or ten most influential journalists in the world, those failings have profound consequences.

Empiricism exists within a framework of theory, and theory cannot be derived empirically. The fact-value distinction is real. (This argument of mine is illustrative of its own point: I take it as an empirical truth, not a normative statement, but its empirical claims are necessarily grounded in theoretical assumptions.) And fact-value problems exist for both the commission of empirical projects and the evaluation of empirical results.

Conducting empirical work requires making a seemingly ceaseless number of choices, choices that cannot be resolved through reference to other people's empiricism. Sadly for all of us, a guidebook for empiricism has never been handed down to us from the heavens. Arguments and complaints about research methods and methodology are vast, and an enormous literature has been written to adjudicate them. These disagreements stretch from the most limited and quantitatively-oriented questions (when is it appropriate to use a p=.05 level of statistical significance? When is it necessary to use a .001 level? What aspects of a given research project determine the use of one or the other?) to the broadest questions of purpose and justification (why research? Towards what end? For what purpose and for whose good?). None of them can be answered empirically-- not that they shouldn't be, but that they can't be. We lack even an idea of how such an empirical investigation of questions of value could be undertaken.
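
A toy illustration of the point, with made-up numbers: the same data can "pass" at one conventional threshold and "fail" at another, and nothing in the data itself tells you which threshold to use. That choice is prior to the empiricism.

```python
# Invented scores for two groups; the threshold choice, not the data,
# determines whether the difference counts as "significant."
from scipy import stats

group_a = [72, 75, 78, 80, 74, 77, 79, 81, 76, 73]
group_b = [70, 72, 74, 75, 71, 73, 76, 74, 72, 70]

t, p = stats.ttest_ind(group_a, group_b)
print(f"p = {p:.4f}")                     # for this sample, p lands between the levels
print("significant at .05: ", p < 0.05)   # True
print("significant at .001:", p < 0.001)  # False
```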

In the example from the Bloggingheads video, Wilkinson is trying to get Klein to acknowledge that before they can empirically assess the data Klein says is in his favor, they have to determine what would constitute empirical success and even what questions they are trying to answer. If Klein prefers the language of the social sciences, Wilkinson is asking him to consider what their construct is, how it is operationalized, and what results must be returned through the assessment of that operationalization in order to suggest success at a certain degree of confidence. Klein cannot answer those questions empirically. I think he knows that, but he is so stuck on this monolithic and transcendent vision of what empiricism is that he seemingly can't confront those necessary and prerequisite questions.

The real shame is that he would never make this mistake if he ever attempted to do social science research himself.

This semester, I'm taking a seminar in quantitative language testing. Among many other methods, one tool we've looked at is the multiple choice test. Some find such tests to be inherently clumsy or reductive instruments, and indeed there are a host of issues with them. But the science of multiple choice testing is extraordinarily sophisticated. Writing a good multiple choice item involves deciding what construct to test, how to format the language in which the question is expressed, how to format the key, what kinds of distractors to include, and much more-- all before you even begin situating it within the context of a larger test. There are certainly statistical and quantitative ways to assess multiple choice tests, and any responsible test administration would have to utilize them. But the stats themselves are meaningless outside of a theoretical understanding of what they mean in application to a particular question. You can get great stats from a multiple choice test with worthless questions.

You might, for example, get a point-biserial correlation that suggests a multiple choice item is perfectly discriminating between high- and low-scoring test takers, in a case where the item is useless. A point-biserial correlation compares a continuous variable to a dichotomous variable; it is often used to show how well a test item (which can be scored dichotomously, i.e., right or wrong) discriminates between higher and lower scorers. This is a clue to the validity of the item: if more low scorers (based on the rest of the test) are getting the question right than high scorers, the suggestion is that something has gone wrong with the item. But even with perfect point-biserial coefficients, you can have flawed, even lousy tests. Construct-irrelevant variance happens. The reason your high scorers are getting the question right and your low scorers are getting it wrong might have nothing to do with the construct your items are meant to test. We have empirical tools that can help us find this kind of error, but we can never farm out our interpretation of such problems to empiricism. They have to be understood theoretically; theory is the only guide to pragmatic solutions to such problems.
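
For the curious, here is a minimal sketch with hypothetical item data. SciPy's pointbiserialr computes the coefficient, but note how little the number alone tells you:

```python
# Hypothetical data: right/wrong (1/0) on one item vs. total scores on the
# rest of the test. A coefficient near +1 means the item separates high
# scorers from low scorers.
from scipy.stats import pointbiserialr

item   = [1, 1, 1, 1, 0, 0, 0, 0]          # dichotomous item scores
totals = [95, 91, 88, 84, 70, 66, 62, 58]  # continuous total scores

r_pb, p = pointbiserialr(item, totals)
print(f"r_pb = {r_pb:.2f}")  # near +1 for this invented data
# The statistic cannot tell you *why* the item discriminates: the variance
# may be construct-irrelevant (cultural knowledge, trick wording) rather
# than the language skill the test is meant to measure.
```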

I think of someone like Dr. Glenn Fulcher, whose credentials and reputation are beyond reproach. His wonderful website is a testament to a decades-long pursuit of better, fairer, more accurate tests of language skill. Dr. Fulcher is an empiricist, as I am, and he believes as I do in the positive power of responsibly-generated social science. And yet his work is filled with caveats, provisos, and discussions of limitation. In his indispensable book Practical Language Testing, he expresses this kind of self-skepticism again and again, declaring repeatedly that any effective empirical inquiry into human behavior requires locating the meaning of data in a theoretical framework. Speaking directly to Gude's point, he says of the task of crafting test specifications, "Specifications therefore reflect the theoretical and practical beliefs and judgments of their creators." The meaning of a test is always a reflection of the pre-empirical beliefs of the people who wrote it. In a world where tests like the SAT or standardized mastery tests in public schools have profound impact on human lives, understanding this empirical-theoretical interchange is essential, and that understanding is what Klein is at risk of obscuring.

Nothing could be more discouraging of the "I only deal in empiricism" mindset than a lifetime spent performing empirical research.

That hard-won skepticism is why wonks frequently worry me: in their (necessary and well-intentioned) policy generalism, they fail to acquire the experience through which a broad understanding of a given field of human inquiry is derived. I don't question the dedication and responsibility of the wonk class. But I do believe that really becoming fluent in the ins-and-outs of messy, contingent research requires that one perform research, and that one spend the endless frustrating hours of reading and writing necessary to become credentialed in a given area of the social sciences. Our system for developing expertise, flawed though it may be, creates the indispensable conditions of time, thoroughness, and review. All three of those are frequently impossible for journalists and bloggers. Smarts are necessary but not sufficient. I have no doubt at all that Dylan Matthews, wonk-of-the-future and Klein's frequent research assistant, is extremely smart. But smarts are not nearly enough. If they were, we'd have fewer problems.

I said before, and will say again, that wonkery is necessary in today's democracy. But it can never be enough, and the sad fact is that in the world of liberal media these days, many people flatter themselves to believe that wonkery is the only way to advance the progressive cause. That's a ruinous and self-defeating notion. It is also one which is applied inconsistently: wonks, for example, tend to be supportive of conventional school reform efforts, despite the incredible empirical failures of those same efforts. (It seems that behavior does not descend so cleanly or inerrantly from empirical results after all.) Liberalism no doubt needs wonks. But the growing resistance to academics, theorists, and philosophers that Gude identifies within the liberal media is a profound mistake.

This all precedes the political question, which to my mind has been answered dispositively by global warming: the facts are not enough. We flatter ourselves to self-identify as the party of facts and science, but the human mind does not operate by facts alone. Last week, I saw a lecture by a rhetoric professor who works in a center for global sustainability. He told stories about how often climatologists and environmental scientists lament to him that the science is sound and yet unconvincing to the people who need to be convinced. His job is to help them better craft their arguments in order to make persuasion easier, and he often has to remind these scientists that conviction does not develop from science alone. Aristotle was one of the fathers of empiricism, and yet he wrote extensively on the dangers of trying to convince people with facts alone. The human animal doesn't work that way, not 2,300 years ago and not now. The facts are not enough.

The urge towards empiricism as a tool is a necessary response to an ideology which has rejected plain facts again and again in the pursuit of its political ends. The urge towards a transcendent or totalizing empiricism, however, is a deeply human, deeply understandable, and deeply flawed one: it seeks to remove messy moral questions from the scramble of everyday life. But no such refuge can be found, not in numbers or science or anywhere else. We are in a period of great liberal self-satisfaction. I get it, and I'm taking part in it. But these periods never last long, and they don't in large part because we mistake political victory for the triumph of the facts. This misunderstands both our strengths and our weaknesses, both the American character and the human character. The facts are only the beginning. To fail to understand that is to fail not only philosophically but practically and politically as well.

Source: http://lhote.blogspot.com/2012/11/empiricism-is-inextricable-from-theory.html

