From Replicability to Replication in Systematic Coding of Legal Decisions, by Mark A. Hall and Ronald F. Wright

by Guest Blogger — Tuesday, Mar. 19, 2019

It is fitting that this symposium focuses on the propriety of systematic coding of caselaw in the context of an ALI Restatement. A decade ago, we wrote a comprehensive review article on legal scholars’ use of systematic case coding up to that time, which we framed as a quasi-“Restatement” of this emerging research methodology. Because this case-counting methodology grew organically in a number of fields of legal research, we believed the time was ripe for a synthesis of best practices in case coding. For that reason, we find it heartening that, apparently for the first time, Restatement reporters have used a version of systematic case coding in their work – an application that plays to the strengths of this methodology.

This symposium, and efforts elsewhere, now bring us to a new phase in methodological development. The first phase was how best to code cases systematically. This second stage is how best to replicate case coding projects.

Replicability is the essence of scientific methodology. Using methods that are systematic and transparent places findings and conclusions on a firmer epistemological footing because such methods bring much greater objectivity to findings. Objectivity arises from the fact that using systematic and transparent methods in principle allows anyone with sufficient skills to use the same methods to replicate (or falsify) the findings. Thus, the outcomes of research do not depend on the subjective viewpoints or prior assumptions of researchers. Results are persuasive because other researchers can test them, not because the reader trusts the individual judgment of the original researcher.

Research assumes a more scientific status simply by being replicable, even if it has not in fact been replicated. The simple ability in principle to attempt replication indicates that the researcher used credible scientific methods. Thus, actual replication is not essential for research to have credibility.

Actual replication, however, is important for several reasons. To name two, replication tests accuracy and robustness. Transparency and objectivity do not guarantee accuracy. Mistakes can occur, and happenstance can skew results; attempts at replication can detect these errors. Moreover, even when initial results are valid, they can be highly sensitive to variations in the particular methods used. Replication with methodological variation can help determine whether initial reports are robust enough to survive seemingly inconsequential contingencies.

With these thoughts in mind, we reflect on the symposium articles, not from a viewpoint of which of the various substantive positions is better, but instead how best to go about the process of replication. Useful research does not always need to live up to the highest “gold standard” of rigor. Shortcuts or compromises are often necessary and acceptable, both in initial studies and in replication. Here, these replicating authors began with the set of putatively relevant cases identified by the original Restatement researchers and concluded that they contain a substantial number of “false positives” (not actually relevant). A more thorough form of replication would have taken a step further back to search for potentially relevant cases from scratch, looking for any missed cases (false negatives). The replicators reasonably concluded this was not essential for their purposes.

On the other hand, these replicators adopted several compromises that are not ideal. First, their reading of the cases from the original study was not fully blinded to the results that the original researchers reported. Blinding is especially important where researchers have a distinct viewpoint going into the project (as appears to be the case here). Second, although Levitin et al. double-coded their cases (meaning two people read each one), they did not use optimal methods of double-coding, which require the researchers to report the level of disagreement between two coders and to discard entirely any categories of coding where results do not have fairly high congruence.

The ability of blinded and independent readers to see the same thing fairly consistently is critical to the objectivity sought by the systematic coding attempted here – both by the Restatement researchers and the replicators. When coders disagree, it does not necessarily suffice to bring in a tie-breaker or to ask the coders to confer. Doing that will produce a distinct data point, but the data point will not necessarily be replicable unless disagreement is rare in the first place, or can be reduced through refinement of the coding protocol.
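The agreement statistic that social scientists typically report in content-analysis work of this kind is a chance-corrected measure such as Cohen’s kappa. Neither the Reporters’ nor the replicators’ actual protocol is shown here; the following is only a minimal sketch of how that statistic is computed, using entirely hypothetical codings of ten cases:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders' categorical labels."""
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    # Observed agreement: fraction of cases the two coders labeled identically.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Expected agreement if each coder labeled cases at random
    # according to that coder's own marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    labels = set(coder_a) | set(coder_b)
    p_e = sum((freq_a[lab] / n) * (freq_b[lab] / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codings: did each of ten cases "follow" or "reject" the rule?
a = ["follows", "follows", "rejects", "follows", "rejects",
     "follows", "follows", "rejects", "follows", "follows"]
b = ["follows", "follows", "rejects", "rejects", "rejects",
     "follows", "follows", "rejects", "follows", "follows"]
print(round(cohens_kappa(a, b), 3))  # 0.783 with these hypothetical codings
```

A kappa near 1 indicates that agreement far exceeds what coders with these label frequencies would reach by chance; conventions vary, but coding categories with low kappa are the ones the optimal practice described above would refine or discard.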

If low disagreement between independent, blinded coders cannot be achieved, then it may tell us that the question being posed (e.g., what did the court hold, did it recognize or reject the rule in question, etc.) is not amenable to the classical form of systematic content analysis recognized by social scientists. If so, then using the trappings of those methods could suggest an unjustified level of empirical rigor. Instead, it might be more honest to fall back to conventional methods of legal analysis, which are simply to declare that the author, who is an expert in the field, has read and analyzed the relevant cases as described.

Finally, researchers and replicators should use care in drawing statistical inferences from coding case law. One distinct advantage of studying case law is that usually it is possible to analyze the entire universe of relevant instances. Having universal samples of manageable size eliminates the need to do the types of statistical analyses that social scientists usually employ, which are designed to determine whether a limited study sample is an accurate representation of a larger, more hidden universe.

In that regard, the use that Klass makes of 95 percent confidence intervals around reported results is not a standard application of those statistical methods. As well explained by Lee Epstein and Gary King, sampling statistics indicate whether observed results are possibly due to random variation rather than reflecting the full reality among the unsampled universe, at the time the sampling occurred. Unless variation over time is known to be trivial, standard sampling statistics say nothing about whether a current sample is likely to predict future results. Thus, a poll of voters has a margin of error for what all voters think now, but the margin of error does not specify what voters are likely to do a month from now. Polling results might well help us predict what voters will do in the future, but that prediction is based more on what we think might change between now and then, rather than on what a standard confidence interval tells us. In coding cases, uncertainty about the sampling error simply does not arise when we in fact code all known available cases.
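The polling analogy can be made concrete. Below is a minimal sketch, with hypothetical numbers, of the standard normal-approximation margin of error for a sampled proportion; note what the interval does and does not describe:

```python
import math

def margin_of_error_95(p_hat, n):
    """Half-width of a normal-approximation 95% confidence interval
    for a proportion p_hat observed in a random sample of size n."""
    return 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 520 of 1,000 randomly sampled voters favor a candidate.
p_hat, n = 0.52, 1000
moe = margin_of_error_95(p_hat, n)
# The interval bounds sampling error about the full electorate *now*;
# it says nothing about how opinion may shift before election day.
print(f"52% ± {moe:.1%}")
```

The same logic explains the point about caselaw: when the “sample” is the entire universe of known relevant cases, `n` equals the population size, there is no sampling error to bound, and a confidence interval of this kind has no standard interpretation.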

Thus, if case coding is done correctly, it tells us, in a replicable fashion, what courts have actually said and ruled. As such, we think it is a very suitable method for any legal analysis that reports simply that. Since accurate analysis of caselaw is an essential part of Restatement work, we think that systematic case coding furthers this goal.

That said, there are many difficult issues that case coders must confront – not only in the precise design of their project’s methods, but also in how legitimately to interpret results. For that reason, we fully agree that case coding methods should be reasonably transparent and available for replication efforts, especially when used for purposes as important as Restatement work. Replicability (and replication) of caselaw studies do not resolve core substantive questions that drive the interpretation of results. What are the right questions to ask about the cases? Which cases are most or least relevant to those questions? What is the relative importance of dicta or holding? What exactly does it mean that only three quarters rather than six sevenths of all relevant cases have ruled a particular way?

These remaining difficulties should not discourage greater use of systematic coding or replication. Instead, such work, when well done, can help to narrow the focus of remaining areas of productive debate and disagreement.

Mark Hall is the Fred D. & Elizabeth L. Turnage Professor of Law and Director of the Health Law and Policy Program at Wake Forest University School of Law. Ronald Wright is the Needham Yancey Gulley Professor of Criminal Law at Wake Forest University School of Law.

This post is part of a symposium on the Draft Restatement of the Law of Consumer Contracts. All of the posts in this symposium can be viewed here.

Cite As: Author Name, Title, 36 Yale J. on Reg.: Notice & Comment (date), URL.

One thought on “From Replicability to Replication in Systematic Coding of Legal Decisions, by Mark A. Hall and Ronald F. Wright”

  1. Adam Levitin

    I am responding solely for myself and not on behalf of the co-authors, but as “master of the spreadsheet.” All errors, opinions, and offense given should be debited from my account. Let me first thank Professors Hall and Wright for their thoughtful response before responding to their specific comments.

    Regarding the false negative issue, we did not do a systematic search for false negatives, but in the course of our review we nonetheless found some. Disturbingly, some of these false negatives had been previously brought to the attention of the Reporters (or even appeared in _other_ data sets supporting the draft Restatement), yet continued to be ignored.

    Regarding the strength of our internal coding disagreements, this really isn’t an issue because we did not have many disagreements. Only 13 out of 189 cases involved a disagreement in our coding. (A handful of cases where there was no disagreement also received an additional read, either because of miscommunication among ourselves or because of reader curiosity.) When we disagreed, however, the scope of the disagreement was usually a “strong,” binary matter. Counterintuitively, that actually gives us more confidence in our methodology because these disagreements were generally the result of outright miscoding by a reader, such that the additional “tie-breaker” reader could readily identify what was the obviously and objectively “right” answer. Put another way, because the disagreements were black/white, rather than about shades of grey, it was easy to figure out which reading was correct. Even if we tossed all of the 13 cases from our data, however, there would be no material change in our finding that the majority of cases in both data sets were simply not relevant to the issue for which they were counted.

    On a broader level, what constitutes ideal replication study methodology is beside the point for our project. Our disagreements with the Reporters’ coding are not about close calls where methodological issues matter. I cannot emphasize this enough. Our disagreements with the Reporters’ coding are about black and white questions where the methodological design issues of either the Reporters’ study or of our replication study are beside the point. If a case is a business case, then it has no business being part of a consumer contracts restatement. Likewise, if a case is decided on statutory grounds, then it doesn’t have any relevance to a question of the common law of consumer contracts. One doesn’t need any methodological sophistication to recognize problems like these. Moreover, our findings were not of isolated problems, but of pervasive ones in the Reporters’ coding. Indeed, even if we were wrong in our conclusions half the time, the draft Restatement would still rest on studies in which roughly a third of cases included are irrelevant. That alone would be appalling. Put another way, methodological issues are simply not material to our ultimate findings.

    You, the reader, however, needn’t merely take our word for it. If you doubt our findings, it’s easy to look at a sample of the cases where we found coding problems yourself. Maybe you’ll disagree with us in a particular instance, but look at 10 cases, and we’re confident you’ll come away agreeing that there are serious problems in the draft Restatement’s case coding.

    Ultimately, while methodological debates are interesting, it’s important not to lose sight of the real question here: why is the ALI on the verge of approving a Restatement that is based on the legal equivalent of “junk science,” and which has a substantive position that, as Mel Eisenberg notes in his contribution to this on-line symposium, has resulted in unprecedented opposition from state attorneys general, consumer, labor, and civil rights groups?

