Notice & Comment

From Replicability to Replication in Systematic Coding of Legal Decisions, by Mark A. Hall and Ronald F. Wright

It is fitting that this symposium focuses on the propriety of systematic coding of caselaw in the context of an ALI Restatement. A decade ago, we wrote a comprehensive review article of legal scholars’ use of systematic case coding up to that time, which we framed as a quasi-“Restatement” of this emerging research methodology. Because this case-counting methodology grew organically in a number of fields of legal research, we believed the time was ripe for a synthesis of best practices in case coding. For that reason, we find it heartening that, apparently for the first time, Restatement reporters have used a version of systematic case coding in their work – an application that plays to the strengths of this methodology.

This symposium, and efforts elsewhere, now bring us to a new phase in methodological development. The first phase concerned how best to code cases systematically. The second concerns how best to replicate case coding projects.

Replicability is the essence of scientific methodology. Using methods that are systematic and transparent places findings and conclusions on a firmer epistemological footing because such methods bring much greater objectivity to findings. Objectivity arises from the fact that using systematic and transparent methods in principle allows anyone with sufficient skills to use the same methods to replicate (or falsify) the findings. Thus, the outcomes of research do not depend on the subjective viewpoints or prior assumptions of researchers. Results are persuasive because other researchers can test them, not because the reader trusts the individual judgment of the original researcher.

Research assumes a more scientific status simply by being replicable, even if it has not in fact been replicated. The simple ability in principle to attempt replication indicates that the researcher used credible scientific methods. Thus, actual replication is not essential for research to have credibility.

Actual replication, however, is important for several reasons, two of which stand out: replication tests both accuracy and robustness. Transparency and objectivity do not guarantee accuracy. Mistakes can occur, and happenstance can skew results; attempts at replication can detect these errors. Moreover, even when initial results are valid, they can be highly sensitive to variations in the particular methods used. Replication with methodological variation can help determine whether initial reports are robust enough to survive seemingly inconsequential contingencies.

With these thoughts in mind, we reflect on the symposium articles, not to judge which of the various substantive positions is better, but to consider how best to go about replication. Useful research does not always need to live up to the highest “gold standard” of rigor. Shortcuts or compromises are often necessary and acceptable, both in initial studies and in replication. Here, the replicating authors began with the set of putatively relevant cases identified by the original Restatement researchers and concluded that this set contains a substantial number of “false positives” (cases that are not actually relevant). A more thorough form of replication would have taken a step further back and searched for potentially relevant cases from scratch, looking for any missed cases (false negatives). The replicators reasonably concluded that this was not essential for their purposes.

On the other hand, these replicators adopted several compromises that are not ideal. First, their reading of the cases from the original study was not fully blinded to the results that the original researchers reported. Blinding is especially important where researchers have a distinct viewpoint going into the project (as appears to be the case here). Second, although Levitin et al. double-coded their cases (meaning two people read each one), they did not use optimal methods of double-coding, which require researchers to report the level of disagreement between the two coders and to discard entirely any coding categories in which the coders do not agree fairly consistently.
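To make the reporting step concrete, here is a minimal sketch of how two coders' answers might be compared category by category. It assumes each coder's answers for a category are stored as parallel lists, one entry per case, and it uses Cohen's kappa, one common chance-corrected agreement statistic. Neither the symposium authors nor we prescribe this particular statistic; the 0.8 cut-off, the category names, and the data are all hypothetical.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of cases on which two coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of cases")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance alone."""
    observed = percent_agreement(coder_a, coder_b)
    n = len(coder_a)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: how often the coders would match if each assigned
    # labels independently at their own observed rates.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders gave every case the same single label
    return (observed - expected) / (1 - expected)


def screen_categories(codings, threshold=0.8):
    """Report kappa for every category; keep only those meeting the (hypothetical) threshold."""
    report = {cat: cohens_kappa(a, b) for cat, (a, b) in codings.items()}
    kept = [cat for cat, kappa in report.items() if kappa >= threshold]
    return report, kept


# Hypothetical double-coded answers for ten cases in two coding categories.
codings = {
    "rule_adopted": (
        ["y", "y", "n", "y", "n", "n", "y", "y", "n", "y"],
        ["y", "y", "n", "y", "n", "n", "y", "y", "n", "y"],
    ),
    "relied_on_dicta": (
        ["y", "n", "y", "n", "y", "n", "y", "n", "y", "n"],
        ["y", "y", "n", "n", "y", "n", "n", "y", "y", "n"],
    ),
}

report, kept = screen_categories(codings)
for category, kappa in report.items():
    print(f"{category}: kappa = {kappa:.2f}")
print("categories retained:", kept)
```

In this made-up example the coders agree on every case for "rule_adopted" (kappa 1.00) but on only 6 of 10 cases for "relied_on_dicta" (kappa about 0.20), so only the first category would be reported as reliable. The second would be discarded or sent back for refinement of the coding protocol rather than resolved by a tie-breaker.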

The ability of blinded and independent readers to see the same thing fairly consistently is critical to the objectivity sought by the systematic coding attempted here – both by the Restatement researchers and the replicators. When coders disagree, it is not necessarily enough to bring in a tie-breaker or to ask the coders to confer. Doing so will produce a distinct data point, but that data point will not necessarily be replicable unless disagreement is rare in the first place or can be reduced by refining the coding protocol.

If low disagreement between independent, blinded coders cannot be achieved, that failure may tell us that the question being posed (e.g., what did the court hold, did it recognize or reject the rule in question, etc.) is not amenable to the classical form of systematic content analysis recognized by social scientists. If so, then using the trappings of those methods could suggest an unjustified level of empirical rigor. Instead, it might be more honest to fall back on conventional methods of legal analysis, which simply declare that the author, an expert in the field, has read and analyzed the relevant cases as described.

Finally, researchers and replicators should take care in drawing statistical inferences from coded case law. One distinct advantage of studying case law is that it is usually possible to analyze the entire universe of relevant instances. When that universe is of manageable size and can be coded in full, there is no need for the types of statistical analyses that social scientists usually employ, which are designed to determine whether a limited study sample accurately represents a larger, unobserved universe.

In that regard, the use that Klass makes of 95 percent confidence intervals around reported results is not a standard application of those statistical methods. As Lee Epstein and Gary King have explained well, sampling statistics indicate whether observed results might be due to random variation rather than reflecting the full reality of the unsampled universe at the time the sampling occurred. Unless variation over time is known to be trivial, standard sampling statistics say nothing about whether a current sample is likely to predict future results. Thus, a poll of voters has a margin of error for what all voters think now, but that margin of error does not tell us what voters are likely to do a month from now. Polling results might well help us predict what voters will do in the future, but that prediction rests more on what we think might change between now and then than on what a standard confidence interval tells us. In coding cases, uncertainty about sampling error simply does not arise when we in fact code all known available cases.
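The point about coding the full universe can be illustrated with a textbook tool. The sketch below computes a normal-approximation 95 percent confidence interval for a proportion and, optionally, applies the standard finite population correction, which accounts for how much of a known, finite universe the sample covers. This is a generic illustration, not a reconstruction of Klass's calculations, and the case counts are hypothetical. Note that even the corrected interval speaks only to sampling error at the time of coding; it says nothing about how courts will rule in the future.

```python
import math


def proportion_ci(successes, n, population=None, z=1.96):
    """Normal-approximation confidence interval for a proportion.

    If `population` (the size of the full universe of relevant cases) is
    given, the finite population correction is applied; it shrinks the
    margin of error to zero when the "sample" is the entire universe.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        if population < n:
            raise ValueError("sample cannot be larger than the population")
        if population > 1:
            se *= math.sqrt((population - n) / (population - 1))
    margin = z * se
    return max(0.0, p - margin), min(1.0, p + margin)


# Hypothetical: 45 of 60 coded cases ruled a particular way.
print(proportion_ci(45, 60))                  # treated as a sample from an unseen, unbounded universe
print(proportion_ci(45, 60, population=120))  # the 60 cases are half of a known universe of 120
print(proportion_ci(45, 60, population=60))   # every known case was coded: (0.75, 0.75)
```

The first interval runs from roughly 0.64 to 0.86, the second is narrower, and the third collapses to the observed proportion itself, because there is no unsampled remainder about which to be uncertain.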

Thus, if case coding is done correctly, it tells us, in a replicable fashion, what courts have actually said and ruled. As such, we think it is a very suitable method for any legal analysis that reports simply that. Since accurate analysis of caselaw is an essential part of Restatement work, we think that systematic case coding furthers this goal.

That said, case coders must confront many difficult issues, not only in the precise design of their project’s methods but also in how to interpret results legitimately. For that reason, we fully agree that case coding methods should be reasonably transparent and available for replication efforts, especially when used for purposes as important as Restatement work. Replicability (and replication) of caselaw studies does not resolve the core substantive questions that drive the interpretation of results. What are the right questions to ask about the cases? Which cases are most or least relevant to those questions? What is the relative importance of dicta and holdings? What exactly does it mean that only three quarters, rather than six sevenths, of all relevant cases have ruled a particular way?

These remaining difficulties should not discourage greater use of systematic coding or replication. Instead, such work, when well done, can help to narrow the focus of remaining areas of productive debate and disagreement.

Mark Hall is the Fred D. & Elizabeth L. Turnage Professor of Law and Director of the Health Law and Policy Program at Wake Forest University School of Law. Ronald Wright is the Needham Yancey Gulley Professor of Criminal Law at Wake Forest University School of Law.

This post is part of a symposium on the Draft Restatement of the Law of Consumer Contracts.

A blog from the Yale Journal on Regulation and ABA Section of Administrative Law & Regulatory Practice.

Made possible in part by the support of Davis Polk & Wardwell LLP