Surrogation is the substitution of a representation—be it a metonym, symbol, proxy, metric, or signal—for some represented whole.
Goodhart’s Law, in the (re)phrasing of anthropologist Marilyn Strathern: “When a measure becomes a target, it ceases to be a good measure.” Since we expect humans to behave roughly rationally in pursuit of self-interest, and since the incentive structure of an activity determines self-interest, a surrogate measure distorts behavior in the direction of optimizing for the surrogate, at a cost to the surrogated proportional to the divergence between surrogate and surrogated.
We will call this general mechanism—where a representation of a holistic target generates its own gravitational field, and in some meaningful way replaces that original whole—“surrogation.” Choi, Hecht, and Tayler, in management accounting papers published in the early 2010s (2011, 2012), proposed that managers exhibited a pattern of losing sight of their original strategic target in favor of an instituted proxy measure set up to represent it. The authors are possibly the first to use the term “surrogation” in this context, and they include in their definition the psychological “amnesia” of managers—but it is well-chosen as an umbrella handle for a broader phenomenon comprising multiple theoretic carvings. In management studies, that of Choi, Hecht, and Tayler. In the social sciences, that of Goodhart’s Law and that of Donald Campbell (i.e. “Campbell’s Law”), which states that “[t]he more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” (In the context of policing, specifically, Campbell accused the Nixon administration’s crackdown on crime of having “as its main effect the corruption of crime-rate indicators, achieved through underrecording and downgrading the crimes to less serious offenses.”) In philosophy, we see C. Thi Nguyen’s theory of value capture as outlined in 2020’s Games and the Art of Agency (and presented in the larger context of gamification, a process similar to surrogation). Nguyen defines value capture as the substitution of a simplified metric, or indicator, for a richer holistic value, distinguishing it from Goodhart’s Law in that the substitution is internalized by the agents situated within the surrogate incentive structure: there is a change not just in the agents’ behavior but in their actually held values.
The surrogate by definition is chosen because it is somehow easier or more tractable than the surrogated “real” or “original” destination. That previous destination may have been hidden from sight; it may have been too difficult to compare or rank among instances; it may simply have been costlier to track in time or money. A proxy is a surrogate; so is a signal, a metric, a marker, and a representation.
Surrogation is linked in meaningful ways to the concepts of “economic thinking” and “commodification,” where the formalization, compression, or technical specification of a vague or humanistic value is “lossy,” i.e. cannot meaningfully capture the whole. You can see its economically reductive—its “classical”—form in an especially weak passage of Brian Eno’s 2015 lecture, “An Ecology of Culture” (emphasis mine):
I started wondering about the genesis of that term [“the music industry”]. I can sort of understand why it’s used because, you know, people who work in the creative arts are always desperate to try to get a little bit of money from government and apparently the way of convincing them that they should give you some money is to tell them that you’re an industry. If you’re an industry that means you’re part of the economic framework and that everything you do can ultimately be expressed as a single number. Like your contribution to GNP or the number of jobs that you provide or things like that, the number of Number Ones you’ve had.
But surrogation is not a problem or phenomenon unique to metrics. Setting reality to words is always the first surrogation: we reify our concepts, confuse them with nature. Were we to avoid quantitative analysis, we would not avoid surrogation. Language, like our internal drives and desires, is always vague—our goals are always underspecified, and our words are always unstable and underdefined. One way to put surrogation in simple English, losing the unnecessary if common statistical bent, is to say that some simplified marker of appearance, eligible for its co-occurrence with an intractable or invisible reality, becomes in some way a substitute for that reality. In the language of signaling theory, the sign is reified in place of the hidden, signaled quality. Markers in fashion—an everyday, material instantiation of signaling theory—are famously contextual. The same piece of clothing can signify very different aspects of its wearer depending on the larger inferred complex of intentionality, knowingness, and provenance in which its display is inferred to originate. A “designer” brand like Lacoste, on its own and out of context, is interpreted as a sign of white wealth only by the excessively naive; like the Silicon Valley hoodie uniform, it is a contextual move able to signify only against an understood landscape of signification.
Much related, a cargocult is the confusion of surface details and instantiation-specific components for substantive or functionally necessary aspects (Feynman 1974, Reason 2016). “The cargoculter builds a motorless airplane from palm fronds, sprinkles it with holy water, and prays to the gods for it to fly” (Reason 2016). Typically the confusion is born of a lack of deep systems understanding of the target domain. As a result, a cargocult imitates superficial and aesthetic elements (markers or ritual indicators) in the expectation that these efforts will reproduce the operation of the original. These confusions may be attributed in part to confusion over the direction of causality, and over the role of the components in the enveloping system. A system which surrogates non-causal attributes, and especially the surface products of deeper causes, can be considered a cargocult.
Kahneman and Tversky have also theorized attribute substitution, in which an agent tasked with a difficult question may resort—unwittingly—to answering a related, proxying but distinct question that is easier to answer. Perhaps the most famous example is the bat-and-ball cognitive reflection test, in which subjects are asked to calculate the cost of a ball, given that the bat and ball together cost $1.10 and the bat costs $1 more than the ball: most respondents immediately answer that the ball costs ten cents, which the behavioral economists speculate results from subjects substituting, for the real task, the easier task of merely parsing large and small quantities (e.g. as a fast-and-frugal heuristic for making financial decisions). Whether subjects come to the correct response depends largely on whether they use System 2 thinking to monitor and correct their System 1 intuition. While analogous in underlying mechanism to surrogation, attribute substitution in Kahneman’s factoring is performed quickly and unconsciously, rather than at the level of conscious institutional or individual structuring. Still, we can think of them as similar in kind.
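The bat-and-ball arithmetic is worth making explicit, since the whole trick is that the intuitive answer fails a constraint check which the substituted task never performs. A minimal sketch in Python (the `check` helper is mine, purely illustrative):

```python
# Bat-and-ball problem: bat + ball = 1.10, and bat = ball + 1.00.

def check(ball, total=1.10, premium=1.00):
    """Does a candidate ball price satisfy both constraints?"""
    bat = ball + premium
    return abs((bat + ball) - total) < 1e-9

# System 1's substituted answer: "1.10 and 1.00" parses to "0.10"...
assert not check(0.10)    # ...but then bat + ball = 1.20, not 1.10.

# System 2's answer: solve 2 * ball + premium = total.
ball = (1.10 - 1.00) / 2
assert check(ball)        # ball = $0.05, bat = $1.05
```

The substituted question answers something else entirely; only the deliberate solve ever touches the constraint.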
Finally, in artificial intelligence research (which frequently mobilizes Goodhart’s Law), there are the concepts of underspecification and nearest unblocked strategy (Manheim 2019, Arbital 2020). Specifying a telos in code proves a hard task: any behaviors not explicitly prohibited may be exploited; incentives turn perverse; roadblocks prove insufficient and—like Midas—the goal literal turns out not to be the goal actual.
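As a toy model (all actions and scores here are hypothetical), the nearest unblocked strategy can be sketched as an optimizer that sees only a proxy score: prohibiting the worst exploit merely shifts optimization pressure to the next-best exploit, never to the real goal.

```python
# Each action maps to a (true_value, proxy_score) pair. The proxy tracks
# true value except where it can be gamed -- hypothetical numbers.
actions = {
    "do_the_real_work":   (10, 7),
    "teach_to_the_test":  (3, 9),
    "falsify_the_report": (0, 10),
}

def optimize(scores, blocked=()):
    """Pick the highest-proxy action not explicitly prohibited."""
    allowed = {a: v for a, v in scores.items() if a not in blocked}
    return max(allowed, key=lambda a: allowed[a][1])  # sees proxy only

# Unconstrained, the optimizer picks the purest exploit...
assert optimize(actions) == "falsify_the_report"

# ...and blocking it shifts pressure to the nearest unblocked exploit,
# not to the real goal.
assert optimize(actions, blocked={"falsify_the_report"}) == "teach_to_the_test"
```

The roadblock changes which exploit wins, but so long as any action out-scores honest work on the proxy, honest work is never selected.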
In his December 2018 article on the origins of Goodhart’s and Campbell’s laws, Jeff Rodamar makes the case that the differing uses of these terms from field to field “harms communication, creates barriers to science, and hinders improvements in practice.” This is to say nothing of the many other similar concepts, enumerated above, in management accounting, behavioral economics, artificial intelligence, and the philosophy of games.
Roughly, there are two kinds, or possible stages, of surrogation: the alteration of behavior towards a surrogate, and/or the psychological internalization of the surrogate—the reification of the surrogate as if it were the thing itself; an amnesia surrounding the switch.
Perhaps the major problem of surrogation is that it alters and corrupts human behaviors, moving their telos away from the originally desired behavior and toward behaviors which, while rational at an individual level (as exploitations of the incentive structure), are inefficient at the level of institution and society. Moreover, in cases in which individuals are aware that their incentivized behavior diverges from pro-social goals, the activity loses meaning and the individuals become “cynical”—they are aware of the performative aspects of their activity. (See, by way of example, Michael Inzlicht’s reflections on his disillusionment with social psychology.) In selectively rewarding individuals or institutions who optimize away from the real target and toward the instituted surrogate, surrogation (1) discourages play aimed at the real target, (2) drives out players interested in the real target, who exit the game, and (3) increasingly promotes and advances individuals or institutions who optimize (undesirably) toward the surrogate at the cost of the real target.
To understand just how perverse (cf. “perverse incentives”) surrogate incentives can be, we can look to Robert Jackall’s sociological study of institutional ethics, Moral Mazes:
[A]t Covenant Corporation the story is told about a plant that produced a useful by-product at no extra cost. One simply had to store it until it was needed for other internal operations. Covenant, however, works with an accounting system that considers by-products as inventory; moreover, inventory counts against one at the end of a fiscal year. In order to cut costs, managers decided to throw out the by-product at the end of a financial cycle. But a sudden shortage of the material trebled its cost two months later. To service their own operations, managers had to go hat in hand to their competitors to buy the material at the premium prices.
The introjective kind of surrogation is performed by (or “happens to”) not just those who are themselves expected to optimize toward the surrogate—as in the behavioral kind—but also those who have constructed the incentive structure, who pass down the surrogate and, in institutional contexts, may even have designed it. In more decentralized social settings, subject to cultural inheritance and the ongoing, distributed negotiation of norms—realms where we “are organized beings, but are not the authors of our organization” (paraphrasing Noë 2015)—values, preferences, and norms are more amorphous, enacted but often only half-consciously known; their slow replacement by a surrogate can occur without intentionality or conscious recognition.
Goodhart and Campbell’s Law approach surrogation from the perspective of the system. Nguyen’s intervention, with his concept of “value capture,” is to shift the perspective away from the system and toward the individual inhabiting it.
By setting a metric as a target, and by linking that target to a reward structure, we create an incentive for that metric to be gamed in some way. (Rodamar 2018)
The central problem in any superorganism or institution is that of aligning values between members so as to coordinate action toward a shared purpose; this dilemma is known as the principal-agent problem. Solving principal-agent problems requires preferential treatment—rewards or punishments doled out on the basis of performance (fixed-rate salary with bonuses is a classic example of financial incentive—though prestige and reputational incentives have proved efficacious on their own).
Broadly speaking, there are three main advantages to instituting measurement across these systems, David Manheim writes in his essay series on measurement for Ribbonfarm: “[It] replaces intuition, which is often fallible. It replaces trust, which is often misplaced. [And it] finesses complexity, which is frequently irreducible”—where irreducible entails intractable (2016a, 2016b). In other words, the organization wishes to supervise and monitor the behavior of its subagents, to ensure honest and high-quality performance. Complexity must somehow be reduced to a synopsis, or indicator, in order to effectively evaluate performance. Additionally, a desire to make the basis of this preference consistent across the organization, and transparent and legible for involved parties—as is frequently expected in a society that values equal opportunity—involves instituting public bases for advancement. In some arenas, such as sales figures, performance plays out in ways that are easily quantitatively tractable; but where there is divergence—where a number or statistic is preferred over “the real deal”—the publicity makes the surrogate gameable, and rational self-interest adjusts its targets accordingly. Our accounting of the motivations for quantitative surrogation thus includes not only the replacement of intuition and the reduction of complexity, but the production of legible, transparent, consistent, fair, and objective-seeming bases to ensure better management of the organization, and (contiguously) the principal-agent-solving preferential treatment of organization members. Nguyen, in Games and the Art of Agency, illustrates how the reduction of qualitative values to quantitative metrics ensures the “units” or denominator of evaluation are consistent across both time and space:
large-scale institutions often need quantified measures of their various functionings for management purposes. High-level administrators in large institutions need to be able to compare, say, productivity, customer satisfaction, and worker satisfaction across various departments. This requires quantified representations of values. An administrator might first need to aggregate productivity numbers across different departments in, say, their Tokyo and Los Angeles locations, or aggregate productivity numbers from all locations to compare institutional productivity over years.
What this involves, necessarily, is the reduction or summarization of complex and fuzzy realities. Manheim:
Complex systems have complex problems that need to be solved. Measures can summarize, but they don’t reduce the complexity. This means that measures hide problems, or create them, instead of solving them. This concept is related to imposed legibility, but we need to clarify how in a bit more detail than the ‘recipe for failure’ discussed in the linked piece. In place of that recipe, I suggest another triad to explain how complexity is hidden and legibility is imposed by metrics, leading to Goodhart’s law failures. These failures are especially probable when dimensionality is reduced, causation is not clarified, and the reification of metrics into goals promotes misunderstanding. (2016a)
In the language of computation, the surrogate lossily compresses some complex whole—its fidelity falling as its divergence from that whole grows—and conflates many possible worlds into a single measure. For instance, a student’s GPA of 3.15 may mask two very different realities: on the one hand, a straight-B student; on the other, an enormously talented physicist who flunked his compulsory literature course. The Australian counterinsurgency expert David Kilcullen writes in “Measuring Progress in Afghanistan”:
Violence tends to be high in contested areas and low in government-controlled areas. But it is also low in enemy-controlled areas, so that a low level of violence indicates that someone is fully in control of a district but does not tell us who.
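Both examples, the GPA and Kilcullen’s violence statistic, are many-to-one compressions: distinct worlds collapse onto one number. A toy sketch with hypothetical transcripts:

```python
# A GPA is a lossy compression: averaging conflates distinct realities.
# Grades below are hypothetical, chosen so the surrogate coincides.

def gpa(grades):
    """Compress a transcript into a single summary statistic."""
    return sum(grades) / len(grades)

steady_b  = [3.0] * 10                    # a uniform B student
physicist = [4.0] * 7 + [0.0, 0.0, 2.0]   # brilliant, but flunked two courses

assert gpa(steady_b) == gpa(physicist) == 3.0  # identical surrogate...
assert steady_b != physicist                   # ...very different worlds
```

Selection on the compressed number cannot distinguish the two cases; the information discarded by the summary is exactly the information a monitor would need.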
When quantitative metrics obscure meaningful valence differences in the compressed whole, selection based on these metrics produces disastrous, counterproductive results which are difficult to monitor precisely because the disaster is invisible to the system of monitoring. The historian Jerry Z. Muller, in The Tyranny of Metrics, describes the downfall of simple counting:
From [commanding officers’] point of view—and from the point of view of the politicians to whom they reported—every arrest was of the same value. The course of action that produced the best performance indicators did little to diminish the sale of narcotics. (2018)
We can adapt the excerpt for psychology—whose use of surrogation, as we will soon see, has led to its generalizability and replication crises:
From the point of view of researchers’ hiring committees and grant foundations, every publication was of the same value after controlling for the prestige of the journal. The course of action that produced the best performance indicators did little to advance the discipline’s larger project of quality research.
While quantitative surrogates are often instituted to provide an objective oversight (to prevent being fooled by an employee’s “spun” self-representation, for instance), in practice, when the interpretation of reality as statistics—the choice of how to compress that reality—is left to the monitored agents, statistics are easily “massaged.” Muller reports:
In 2014, a whistle-blower from the London police force told a parliamentary committee that massaging statistics had become “an ingrained part of policing culture”: serious crimes such as robbery were downgraded to “theft snatch,” and rapes were often underreported so as to hit performance targets. As a retired detective chief superintendent put it, “When targets are set by offices such as the Mayor’s Office for Policing and Crime, what they think they are asking for are 20% fewer victims. That translates into ‘record 20% fewer crimes’ as far as… senior officers are concerned.” Such underreporting and downgrading of crimes “are common knowledge at every level in every police force within England and Wales,” he added. (2018)
Muller quotes a Chicago detective on the ease of “juking the stats” (i.e., orienting “the activity of the department toward seemingly impressive outcomes”):
“It’s so easy [to massage figures].” First, the responding officer can intentionally misclassify a case or alter the narrative to record a lesser charge. A house break-in becomes “trespassing”; a garage break-in becomes “criminal damage to property”; a theft becomes “lost property.”
This massaging occurs wherever information is passed upward in the chain of command, i.e. to the agents responsible for doling out preferential treatment—and since in many cases even the highest-ranking members of an organization are answerable to shareholders or a public, it occurs at every level. And while superiors often prefer accurate over massaged information, in order to make better strategic decisions and fill out the organization with competent workers, in some situations statistics massaged at lower levels may be preferable or even knowingly demanded, since “keeping their hands clean” in this way allows higher-ups plausible deniability in passing their own claims forward.
Games extract pleasure from what C. Thi Nguyen, in 2020’s Games and the Art of Agency, calls “value clarity”:
Life is a confusing welter of subtle values, in a vast and confusing plurality. Living our lives, as fully sensitive valuing agents, involves making painful judgments, tough decision calls, and agonizing comparisons.
In game life, our temporary agency’s values are usually extremely clear. That clarity is encoded into a game’s specification of its goals. The values we take on in games are clearer, easier to apply, and easier to evaluate than our enduring values.
Game play, in other words, involves an “all-consumingly instrumental mode of practical reasoning.” Legibility, meanwhile, allows public ranking, and encourages improvements in productivity and performance by establishing common knowledge of relative performance, fostering competition among members.
The appeal of value clarity can lead human superorganisms into what Nguyen calls accidental gamification, in which game-like features—such as clear, quantified metrics, often introduced top-down for administrative purposes—come to restructure members’ motivations:
[A]cademic life has recently come to be ruled by quantified metrics for research quality—like citation rates and impact factors. These metrics may not have explicitly been designed to produce gamification among researchers. Conceivably, they arose from the bureaucratic need to collate information, or in university administrators’ quest to make more objective-sounding decisions about faculty hiring and promotion. But the clear, simple, and quantified nature of such metrics can foster game-like motivation… We could be drawn to redefine our notion of success in the newly clear terms specified by those metrics. (2020)
The gamification of academia, science, and the “global knowledge game” is discussed in the following section, “Surrogation and the crisis in psychology.”
Value capture occurs when:
- Our values are, at first, rich and subtle.
- We encounter simplified (often quantified) versions of those values.
- Those simplified versions take the place of our richer values in our reasoning and motivation.
- Our lives get worse.
In “simplifying the specification of the target” we end up pursuing, “with ever more fervor and ferocity, the wrong target.” Often, by the laws of complexity—that is, the inevitability of perverse incentives—surrogated efforts make the situation worse than passivity would have; this is part of the case made by Michael Huemer in his defenses of policy passivity (2012).
by simplifying the specification of the target, we may bring ourselves to pursue, with ever more fervor and ferocity, the wrong target.
Such measures are useful, but we must always recall that they are merely abbreviations—usefully portable simplifications of something larger and subtler. But when our values are captured, we are motivationally caught by a simplified measure.
As we have seen, surrogation permeates distributed human projects, or “superorganisms”—institutions like the military and police, the medical and justice systems, diplomacy and trade—but also what Sarah Perry (2020) dubs the “global knowledge game”: the ongoing attempt to discover global truths, spanning scientific and, to a lesser extent, humanities work. In other words, a lifting of knowledge out of context and into some human (aspirational-)universal or generalization.
There are numerous surrogation-caused problems in the global knowledge game (GKG). Because the GKG has become a vast enterprise characterized by information overload—by the simultaneous output of millions of members—and because there is a vast, distributed incentive structure designed to reward certain behaviors ostensibly in the service of knowledge production, we should expect it to have the same institutional issues of stats-gaming (e.g. p-hacking) already discussed with respect to the police and military. (Moreover, surrogation is common across knowledge-oriented fields, such as education, where in the United States we’ve seen controversies over “teaching to the test” as well as more blatantly corrupt Goodhartian actions such as teachers manually altering students’ Scantron forms.)
Additionally, in the “inexact sciences”—that is, those which are attempting to mature past their qualitative roots and into a more quantitative or empirical science, for instance psychology’s abandonment of phenomenology and psychoanalysis in favor of statistical lab studies—there is a problem of wanting to grow up too fast. In their rush to “objectify” and rigorize themselves, many of the social sciences have hastily abandoned old methods, replacing them entirely with a more performatively “scientific” surrogate. Here, I’ll use Tal Yarkoni’s recent assault on social psychology, “The generalizability crisis” (2019), as a launching pad to discuss the phenomenology or psychology of surrogation, as well as some of the sociological reasons that institutions deep in surrogated divergence (i.e., away from the “real” target) are so difficult to correct.
[TK: Gigerenzer on the “surrogate idol” of a universal method; p-hacking and gamification of paper submission; the incentive structure that discouraged replication in the first place.]
The broad argument Yarkoni advances is that psychology studies’ ability to generalize—the capacity of a narrow lab study done with “just one video, one target face, and one set of foils” to provide evidence for the existence of some broad psychological construct like ego depletion—is orders of magnitude lower than traditionally assumed in the field. Yarkoni’s critiques are not new—as he himself notes, many thinkers in the inexact sciences, including Gerd Gigerenzer and Paul Meehl, have been raising the alarm on similar issues, in some cases for upwards of half a century—but he compiles them and makes sense of the scope of the problem social psychology faces.
First, a psychological construct, in order to gather evidence as to its “existence” or “nonexistence”—and even here there is a whiff of conceptual confusion—must be operationalized:
things like cognitive dissonance, language acquisition, and working memory capacity—cannot be directly measured with an acceptable level of objectivity and precision. What can be measured objectively and precisely are operationalizations of those constructs—for example, a performance score on a particular digit span task, or the number of English words an infant has learned by age 3. Trading vague verbal assertions for concrete measures and manipulations is what enables researchers to draw precise, objective quantitative inferences; however, the same move also introduces new points of potential failure, because the validity of the original verbal assertion now depends not only on what happens to be true about the world itself, but also on the degree to which the chosen proxy measures successfully capture the constructs of interest—what psychometricians term construct validity.
Yarkoni himself has characterized the surrogative aspects of operationalization: the validity of any findings depend, post-operationalization, on “the degree to which the chosen proxy measures successfully capture the constructs of interest.”
Once the study is completed, a second stage follows: the discovered quantitative or operationalized reality is re-translated back into language via generalization or loose induction. The coarse metrics to some extent “disappear” as we re-enter the realm of language, where knowledge is hosted and decisions are made. The context is further stripped as the narrow lab finding is “generalized” into a larger claim about human behavior: “Papers should be given titles like ‘Transient manipulation of self-reported anger influences small hypothetical charitable donations,’ and not ones like ‘Hot head, warm heart: Anger increases economic charity,’” Yarkoni writes.
The important thing, it appears, is that the numbers have the right form. (Yarkoni 2019)
Recall that to cargocult is to imitate a work’s surface structures while lacking a proper understanding of the actual mechanisms behind its power. This kind of behavior can be either opportunistic and knowing, putting on a show of appearances for others—as in the cult leader, cynic, or grifter—or else merely a kind of magical thinking and wish fulfillment: “The cargoculter builds a motorless airplane from palm fronds, sprinkles it with holy water, and prays to the gods for it to fly.” The psychologist builds up all the meticulous appearances of real science, and prays that his findings contribute to human knowledge. What’s more, since we live in a society that unwittingly or uncaringly surrogates appearance for reality in decision-making and evaluation—in other words, an optikratic society that lives and dies by appearances—these performances frequently do succeed in “flying,” perpetuating the optikratic incentive structure.
Yarkoni himself uses the phrase “cargocult science” to refer to the performative aspects of empiricism in psychology, and its concurrent optimization of metrics à la p-hacking:
It’s hard to think of a better name for this kind of behavior than what Feynman famously dubbed cargo cult science (Feynman, 1974)—an obsessive concern with the superficial form of a scientific activity rather than its substantive empirical and logical content.
Here, the “superficial” stands as the actually-incentivized surrogate, and the “substantive” as the surrogated destination toward which organizations and players in the global knowledge game purport to navigate.
Ironically, it may be the case that the inexact sciences, rather than abandoning qualitative research, have merely cloaked it in the grand rhetoric of empiricism; Yarkoni concludes that “many fields of psychology currently operate under a kind of collective self-deception, using a thin sheen of quantitative rigor to mask inferences that remain, at their core, almost entirely qualitative.”
Researchers, Yarkoni writes, are driven in psychology and related fields “to expend enormous resources on studies that are likely to have very little informational value even in cases where results can be consistently replicated.” Statistically and inferentially unfounded claims are passed up, from psychology, to the highest levels of public and private decision-making, altering the behavior of governments, corporations, and public institutions alike, in large part because this performance of empiricism is highly effective in lending legitimacy to psychological hypotheses. Psychologists publish bestselling books, and give talks that go viral, leading the public to claims and generalities that their studies do not support. There is widespread abuse and gamification of the statistics of legitimization, the best-known being p-hacking. Yarkoni presents a number of “next steps,” given this horrifying state of affairs, but they are designed for individuals: leave the field, practice slower science, present one’s findings more modestly. As a result, they miss the crucial sociological angle from which these problems originate. There are game-theoretic forces at play here, and the structure of incentives in which the problematic behavior originates is not much altered by individual decision-making.
The first problem is that more modest claims come at the loss of power, prestige, and reputation. Not only would the field be ceding much of its previously claimed credibility, but that credibility would ostensibly drop even further on the basis of the prior deception. It would arguably take quite some time for the field’s place in public discourse to recover.
The second problem is that as individual psychologists leave the field—and this is already happening, at least insofar as graduate students high in integrity are turned off by psychology’s performative pseudoempiricism—or cease to advise public policy, or cease to make grand claims on-stage, they will be replaced by those willing to do so. These replacements will, on average, have less integrity, less interest in rigorous skepticism, and less knowledge of the limitations of their practice than those they replace. They will then train PhD students in their techniques.
In other words, as knowledgeable insiders slowly leave the field (or choose never to join it in the first place), psychology will become increasingly dangerous and destructive until its public credibility collapses entirely. This process has been with the discipline from the beginning; academic psychologists Yoel Inbar and Michael Inzlicht report multiple occasions of “bright undergraduates” voicing complaints similar to Yarkoni’s, and we can only imagine that psychology’s inability to convincingly answer such concerns discourages those with the foresight to see it from entering. In other words, we have both a selection problem and a self-selection problem.
Social psychologist Pamela Smith, in an interview on Two Psychologists Four Beers (recall Bourdieu’s idea that the gossip of a field makes up some of its essential wisdom):
We are still rewarding people based on publications. It is true now more than ever, if you want a publication, maybe people are paying more attention to sample, but they’re very happy to let people do online studies that don’t necessarily map well onto behavior, and just run a bunch of them. You get penalized if you want to do careful work. You get penalized if you want to do work on people other than college undergraduates or people who are willing to do online surveys for fifty cents a shot.
A third problem, related to the second, is that those psychologists who choose to stay will be out-competed, out-hired, and out-tenured by those who are willing to play ball with p-hacking regimes, with performative pseudoempiricism, and with the publish-or-perish emphasis on quantity over quality. Misuse raises the bar of expectation; those who optimize toward “real” science—in other words, the surrogated target—are penalized in their competition with those who more efficiently and directly optimize toward the actual metrics of promotion, advancement, and recognition—the surrogate that is “optics.” This incentive structure is real, and affects not just the career prospects of individuals but the larger efficacy and service of the institution.
Finally, psychology—insofar as it can meaningfully be said to “freeride” on the reputation of legitimate science, enjoying the benefits of that reputation while showing little obligation to the same rigor—will increasingly harm the perceived legitimacy of the sciences overall. We can see some taste of this in the so-called Science Wars of the late twentieth century, where the failings and hubris of the social sciences helped delegitimize the “hard” sciences, in part because the problems of psychology and of physics, though of very different scales, are of the same kind—the seemingly intractable problem of inference.
There is good reason that Yarkoni and Paul Meehl both emphasize that much of the current crisis in psychology comes from the conventional, automatic, and uncritical surrogation of statistical measures. The alternative to surrogating the qualitative-holistic is contextualizing the metric within the qualitative-holistic, using the two as mutual agitation, a dialectic.
Interpretation of indicators is critically important, and requires informed expert judgment. It is not enough merely to count incidents or conduct quantitative or statistical analysis—interpretation is a qualitative activity based on familiarity with the environment, and it needs to be conducted by experienced personnel who have worked in that environment for long enough to detect trends by comparison with previous conditions.
Though surrogation has usually been described in terms of the lossy compression from qualitative, intuitive, holistic judgment into quantitative metrics, it can occur any time a marker is substituted for what it demarcates. In signaling theory, classically, signals are external, public-facing attributes that indicate, to other organisms, the probabilistic presence of some hidden, private trait. Just as in language, with the connection between signifier and signified, this ability to “stand proxy for,” and represent publicly, some private and hard-to-verify truth is built up through brute associative learning: experience with the coincidence of some prominent physical marker and some attribute instills a relationship that can meaningfully serve as the basis for future inference. To illustrate just how common this kind of behavior is in our social lives, consider: how do we size up a stranger’s socioeconomic class, or extrapolate a candidate’s future performance in a hiring interview?
Unfortunately, the metonymic surrogation and reification we see play out in the sphere of metrics plays out in the qualitative signaling sphere as well. To take an example from the history of pop music: authenticity—a hard-to-measure, complex trait—has seen itself instantiated in different ways. The folk scene in Greenwich Village in the 1950s was perceived as having this reputation; the same is true, in the late 20th and early 21st century, of “lo-fi aesthetics”—music recorded on relatively inexpensive amateur equipment. The logic of this association was relatively straightforward, premised not on costly signals but on the lack of financial incentives present in these domains: folk singers were typically single individuals, making almost no money, requiring only a guitar and a small performance venue (e.g. a bar or comedy club); musicians home-recording on Tascam four-tracks did not need to pay studio or producer’s fees, and thus did not need label support. In both cases there is a lack of financial pressure, alongside the recognition that such pressure tends to corrode or compromise an audience ideal of “aesthetic integrity”—the vision of the artist, rather than a catering to the listener.
When such fields of production were ignored, and there was no money available to their agents, there was a meaningful sense in which these associations were costly: artists who cared more about autonomy would forgo the income and reputation that label support might afford them. When the scenes began to attract attention, however, a quick free-rider effect of acting as-if set in: there was nothing intrinsically more “honest” about performing on an acoustic guitar, or about audio distortion from the poor compression capabilities of cheap recording hardware, and by imitating the aesthetic residue and markers—the associated surface signals—of authenticity, acts would see authenticity conferred on them in turn.
This burgeoning fetishization of surface aesthetics still permeates the independent music scene, where tape warble and white noise, vocal clipping and compression, are deployed tactically to give a certain affective impression—and since the affect is so fleeting, who could make an accusation of falsehood “stick”? This is one case of surrogation: by incentivizing compliance with a set of surface qualities, in a purported bid to monitor and secure authenticity, musicians and labels are, in actuality, ironically encouraged to falsify their own material origins and capacities.
It is against this backdrop that we can understand Dylan’s 1965 performance at the Newport Folk Festival—an incident with its own encyclopedia page, the “Electric Dylan controversy,” and the flipside of this surrogation. We can watch footage of the set today: Dylan performing the exact same songs that had been heralded, and borderline sanctified, for their honesty and activism, but with an electric rather than an acoustic guitar. Dylan had “plugged in”; the widespread sentiment was that in doing so he had “sold out,” was no longer a performer of integrity, on the basis of a new guitar sound. Without playing down the complexities of the historical situation—without denying that there is something legitimate about anger over symbols, and that the mythologization of this event has undoubtedly led it to be exaggerated—how else can we make sense of the outrage that followed than as the reification of an associated but causally distinct measure, as the surrogation of a complex trait like “authenticity” by a much simpler one, the way one speaks or the instrument one plays? The reception followed Dylan’s tours for years, in jeers of “Judas” from the crowd.
The imitation of surface attributes, rather than causal mechanisms, is common among beginning artists. In Arthur Danto’s book-length profile of Warhol, we encounter Warhol’s early imitation of AbEx “paint drips,” and his belief that they were somehow critical to the painting project:
[He] applies paint the way an Abstract Expressionist artist would, allowing it to drip. “You can’t do a painting without a drip,” he told Ivan Karp, who was director of the Castelli Gallery. This is what I meant by saying that he used Abstract Expressionist gestural painting as protective coloration. The drips did not come from some inner conviction… (or, we might interpret, an internal logic) …they did not refer to that moment of trance when the A. E. painter moved the paint around without tidying up. “The drip”… for Warhol… [was] an affectation…
For the original Abstract Expressionists, paint drips were a byproduct of a technique that embodied an ideology of art (an ideology much in line with the emphasis on spontaneity and honesty found also in folk music). Here, that very byproduct is lifted out of its context and treated as a goal in its own right. Amidst these performances, which are often enough to fool critics, genuine embodiments of qualities like innovation or integrity go unrecognized, while regurgitation disguised by savvy signaling is showered in praise. Today, in many visual art cultures, the aesthetics of the “zine”—themselves artifacts of the copy-machine technologies of the 1990s, as pioneered by groups like Riot grrrl—surrogate the proxied-for qualities, and are perceived as somehow “more DIY” than projects made with contemporary tools. Filmmakers who wish to be perceived as experimental engage in the now-antiquated techniques of avant-gardes past, in order to seem “of a kind” with their hallowed paters.
This historical residue is all around us—it is the lingering ooze of prestige past, available to any who care more about said prestige than about the field. We can call its effects retrolegitimation. And yet, considered this way—as the anemic surrogate, a pretense of as-if—the appeal to retrolegitimation, and the presence of this residue in works, ought to be treated as a reverse indicator: as zombie art animated by the hungover associations of eras past. AD Jameson describes the dynamic:
The canonical works define the style and range of [what is considered “proper” U.S. experimental] cinema: It is non-narrative (favoring surreal logic or structural organizing principles), abstract, often incorporates found footage, and also frequently involves directly treating the film itself (scratching it, painting it, growing mold on it, and so on). It often demonstrates some aspect of the film apparatus or filmmaking process, sometimes by taking a self-reflexive approach (foregrounding the use of the camera) or a conceptual approach (projecting through alternate substances, or projecting plain black leader, or projecting nothing but the projector light itself).
Imitation of a canon is obviously antithetical to the spirit of experimentalism. And yet “the film students of today frequently make work that employs those techniques [associated with historical experimentalism]. The question then becomes: Are they making experimental films?” We can leave quibbling over labels to art historians while confidently assessing that the original target of experimental practice has been lost, surrogated for those techniques which are known, in the critical and public sphere, to have accompanied it—and which are still met, by critics and elite audiences, with the prestige accorded the originals.
And it is against this backdrop—the nefarious surrogation of real efforts into cardboard cutouts, surface signaling replacing genuine embodiment—that we can understand the emergence of showy, fantasy-ridden, egoic and artificial glam rock in the early 1970s, as well as the disdain it raised. The pop studies scholar Simon Reynolds, in his book on glam, Shock & Awe, sets the scene for us with an illustration of surrogation in sixties theater:
a post-Method school of actors and directors aspired to a de-theatricalised form of naturalistic acting, all mumbling and tics, that inevitably spawned a new set of mannerisms that today look as stagey and trapped in time as the Hollywood golden age of poise and elocution. In all the arts, in fact, every attempt at realism, no matter how stringently stripped down or crude, seems to birth a new repertoire of stylised conventions and stock gestures. Bowie, for one, was acutely aware of this in relation to rock, which he precociously grasped was a performance of real-ness rather than a straightforward presentation of reality onstage.
This is both in the sense that all naturalness is “technically” a performance, and also that the performance had become increasingly and meaningfully more conscious, strategic, and commercial. Glam, as Reynolds shows, took the strongest symbols of Sixties “natural honesty”—hair and nudity—and mocked them with makeup, costume, and dye. What it was really mocking was surrogation—the dangerously cheerful illusion that we can fetishize a measure and it will continue to tell us the truth about its subject. How else can we understand these great developments in the history of pop, other than as products of freeriding and surrogation, of symbols reified as the things themselves?
Measurements alone cannot devastate; they are merely a second source of information. It is when other forms of evaluation are surrogated to measurements, and measurements become instituted metrics, that trouble arises: we now have less information rather than more.
On the one hand, Kahneman found that decisions are subject to cognitive biases and can be systematically improved once we move past our intuition. On the other, despite our systematic biases, as Gary Klein originally noted when studying firefighters, many decisions are made without metrics, and are incredibly effective despite that. In fact, this success comes not despite the lack of explicit deliberation but because of it. Klein’s “recognition-primed decision making” works exactly where our intuition beats measurement. As Klein and Kahneman now agree, there are domains in which “raw intuition” beats reflection.
[Hubbard] opens the book saying that “no matter how ‘fuzzy’ the measurement is, it’s still a measurement if it tells you more than you knew before.”
The issue isn’t the introduction of metrics, which have in fact gotten rid of many human biases: the personal grudges of managers, the inconsistencies of distributed bureaucracies, and so on. It’s the substitution of reductive quantitative evaluation for holistic qualitative evaluation.
But treat them as complementary and you get somewhere. Metrics strip context; allow the evaluated employee to re-insert context by narrativizing their metrics, and we start to iron out the kinks.
Similarly, employees self-narrativizing is great for qualitative richness, but shitty when it comes to reliable, grounded, no-bullshit understandings of performance. Metrics ground them in reality, and shrink the space of fabricated unrealities.
We should’ve expected this in the first place, right? It’s basic Pareto frontier stuff. Single-minded solutions have serious drawbacks, no matter which you pick. Start combining them and you start getting balanced feedback.
It “is the mismatch between our generalization intention and the model specification”—in other words, between holism and quantitative representation—“that introduces an inflated risk of inferential error, and not the model specification alone” (Yarkoni 2019).
Quoting Ben Connable’s RAND study Embracing the Fog of War: Assessment and Metrics in Counterinsurgency:
It would be difficult (if not impossible) to develop a practical, centralized model for COIN assessment because complex COIN environments cannot be clearly interpreted through a centralized process that removes data from their salient local context.
Connable characterizes counterinsurgency as “both art and science, but mostly art.” […] The tendency is to treat as pure, measurable science what is of necessity largely a matter of art, requiring judgment based on experience.
An alternative approach to combining System 1 and System 2 thought—and a proven effective approach to measurement—is to track locals’ subjective, acted-upon, skin-in-the-game assessments of complex situations, e.g. tracking the stability of Afghanistan through the price of market goods:
Afghanistan is an agricultural economy, and crop diversity varies markedly across the country. Given the free-market economics of agricultural production in Afghanistan, risk and cost factors—the opportunity cost of growing a crop, the risk of transporting it across insecure roads, the risk of selling it at market and of transporting money home again—tend to be automatically priced in to the cost of fruits and vegetables. Thus, fluctuations in overall market prices may be a surrogate metric for general popular confidence and perceived security. In particular, exotic vegetables—those grown outside a particular district that have to be transported further at greater risk in order to be sold in that district—can be a useful telltale marker.
(This is similar to the approach in ethnomethodology, which attempts to understand sociality through demonstrated behavioral adaptation and response to stimuli.)
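A hypothetical sketch of how such a telltale marker might be operationalized (the district labels and prices below are invented): compute each district’s markup on “exotic” produce over local produce, and flag districts whose premium sits far above the baseline, since that is where transport risk appears to be priced in.

```python
# Hypothetical illustration: the price premium on "exotic" (transported-in)
# produce as a surrogate for perceived road security. All numbers invented.
district_prices = {
    # district: (local produce price, exotic produce price), arbitrary units
    "A": (10.0, 12.0),
    "B": (10.0, 13.0),
    "C": (10.0, 25.0),
}

def risk_premium(local, exotic):
    # Relative markup on goods that had to travel insecure roads to market.
    return (exotic - local) / local

premiums = {d: risk_premium(*p) for d, p in district_prices.items()}
baseline = sum(premiums.values()) / len(premiums)

for district, prem in sorted(premiums.items()):
    flag = "  <- anomalous premium" if prem > 2 * baseline else ""
    print(f"district {district}: premium {prem:.0%}{flag}")
```

Note that the metric itself decides nothing; as Connable insists, whether an anomalous premium reflects insecurity, a bad harvest, or a festival season is a judgment call for someone who knows the district.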
Here we can understand G as the surrogated original and G* as the surrogate.
From LessWrong (2010):
Pointing out what most people would have in mind as G and showing that institutions all around are not following G, but their own convoluted G*s. Hansonian cynicism is definitely the second step to mitigation in many many cases (Knowing about Goodhart’s law is the first). Most people expect universities to be about education and hospitals to be about health. Pointing out that they aren’t doing what they are supposed to be doing creates a huge cognitive dissonance in the thinking person.
Taking multiple factors into consideration, trying to make G* as strong and spoof-proof as possible. The Scorecard approach is mathematically, the simplest solution that strikes a mind when confronted with Goodhart’s law.
Optimization around the constraint
There are no generic solutions to bridging the gap between G and G*, but the body of knowledge of theory of constraints is a very good starting point for formulating better measures for corporates.
CEV tries to mitigate Goodhart’s law in a better way than mechanical measures by trying to create a complete map of human morality. If G is defined fully, there is no need for a G*. CEV tries to do it for all humanity, but as an example, individual extrapolated volition should be enough. The attempt is incomplete as of now, but it is promising.
Solutions centred around Human discretion
Human discretion is the one thing that can presently beat Goodhart’s law because the constant checking and rechecking that G and G* match. […] However, this is not scalable in a strict sense because of the added testing and quality control requirements.
Left Anarchist ideas
Left anarchist ideas about small firms and workgroups are based on the fact that hierarchy will inevitably introduce goodhart’s law related problems and thus the best groups are small ones doing simple things.
On the other end of the political spectrum, Molbuggian hierarchical rule completely eliminates the mechanical aspects of the law. There is no letter of the law, its all spirit. I am supposed to take total care of my slaves and have total obedience to my master. The scalability is ensured through hierarchy.
Amodei and Clark (2016). “Faulty Reward Functions in the Wild.” OpenAI Blog. https://openai.com/blog/faulty-reward-functions/.
Arbital (2020). “Nearest unblocked strategy.” https://arbital.com/p/nearest_unblocked/.
blogospheroid (2010). “The Importance of Goodhart’s Law.” AI Alignment Forum. https://www.alignmentforum.org/posts/YtvZxRpZjcFNwJecS/the-importance-of-goodhart-s-law.
Choi, Hecht, and Tayler (2012). “Strategy Selection, Surrogation, and Strategic Performance Measurement Systems.” Journal of Accounting Research 51 (1): 105–133. https://doi.org/10.1111/j.1475-679X.2012.00465.x.
Choi, Hecht, and Tayler (2011). “Lost in Translation: The Effects of Incentive Compensation on Strategy Surrogation.” The Accounting Review 87 (4): 1135–1163. https://doi.org/10.2308/accr-10273.
Feynman, Richard (1974). Caltech commencement address.
Huemer, Michael (2012). “In Praise of Passivity.” Studia Humana 1,2 (2012): 12-28.
Inzlicht, Michael (2016). “Reckoning with the past.” Getting Better. http://michaelinzlicht.com/getting-better/2016/2/29/reckoning-with-the-past.
Jackall, Robert (1988). Moral Mazes: The World of Corporate Managers. Oxford University Press.
Kahneman and Tversky (1973). “On the Psychology of Prediction.” Psychological Review 80 (4): 237–51. https://doi.org/10.1037/h0034747.
Manheim, David (2016a). “Goodhart’s Law and Why Measurement is Hard.” Ribbonfarm. https://www.ribbonfarm.com/2016/06/09/goodharts-law-and-why-measurement-is-hard/.
Manheim, David (2016b). “Overpowered Metrics Eat Underspecified Goals.” Ribbonfarm. https://www.ribbonfarm.com/2016/09/29/soft-bias-of-underspecified-goals/.
Manheim, David (2019). “Multiparty Dynamics and Failure Modes for Machine Learning and Artificial Intelligence.” Artificial Superintelligence: Coordination & Strategy.
Nguyen, C. Thi (2020). Games and the Art of Agency. Oxford University Press.
Perry, Sarah (2020). “Ignorance: a skilled practice.” Carcinisation. https://carcinisation.com/2020/01/27/ignorance-a-skilled-practice/.
Noë, Alva (2015). Strange Tools: Art and Human Nature. Hill and Wang.
Ortega, Maini, et al. (2018). “Building safe artificial intelligence: specification, robustness, and assurance.” DeepMind Safety Research.
Reason, Suspended (2016). “Intro to Cargocult.” Suspended Reason. https://suspendedreason.com/2016/11/16/intro-to-cargocult/.
Reason, Suspended (2020). “The Dark Miracle of Optics.” Less Wrong. https://www.lesswrong.com/posts/zzt448rSfwdydinbZ/the-dark-miracle-of-optics.
Rodamar, J. (2018). “There ought to be a law! Campbell versus Goodhart.” Significance, 15(6), 9–9. https://doi.org/10.1111/j.1740-9713.2018.01205.x.
Simler, Kevin (2016). “Minimum Viable Superorganism.” Ribbonfarm. https://www.ribbonfarm.com/2016/02/11/minimum-viable-superorganism/.
Yarkoni, Tal (2019). “The Generalizability Crisis.” PsyArXiv, November 22. https://doi.org/10.31234/osf.io/jqw35.
In this way, the realm of social reputation is prone to a range of related dynamics in which the gap between appearance—assumed signals or markers of some hidden reality—and that reality itself causes problems. ↩︎
Partially this is described by the “barberpole” metaphor of fashion, “where lower classes continually imitate higher classes, who are themselves engaged in a continual quest for ‘distinction’ from the chasing masses… Its cyclical nature is the result of limited options and a continual evasion of freeriders who exploit an associative proxy: clothing for caste” (Reason 2020). ↩︎
Yarkoni specifically refers to psychology as engaging in “cargocult science” in (2019). ↩︎
One reason intuition and subjective judgment have been given over, so easily, to quantitative measures is the case made by behavioral econ against such judgments. (In other words, the defense of pseudoempiricism comes from pseudoempirical work.) And yet our ability to parse highly indexical social signals, or linguistic expressions, within an ecological context, and to an extent that consistently defies our best computational models, testifies to our ignored “indexical genius.” As Perry puts it in “Ignorance, a skilled practice”: “Overconfidence in the global knowledge game, especially [among the] social sciences, threatens the production and appreciation of the genuine kind of indexical knowledge that humans are geniuses at producing and using” (2020). ↩︎
The connection with legibility: a lack of modesty toward complexity, and a lack of faith in the hard-to-scrutinize. In surrogation, the lack of faith lies in judgment; in legibility, in the forces of cultural evolution and the local emergence of sense-making and self-order. ↩︎
Looking to the most sclerotic and dysfunctional arenas of our society—politics, policing, institutionalized art, among others—suggests that one of the main problems of the modern world is of a much softer type than traditional corruption. Our society is not so much meritocratic as it is optikratic: to be seen is to have power, just as being seen to have power is also to have it, and power is awarded not on the basis of “actual” (in the ideal sense) merit or value, but on the basis of sporting their appearance. This is at once utterly obvious—one’s impression of an object is all one can, finally, operate from, and thus there “is no other way”—and at the same time seriously non-trivial: the translation of holistic quality into public appearance is lossier than usually assumed or acted upon. ↩︎
Actions like Yarkoni’s which alter the common knowledge of the field and thus potentially alter its internal incentive structure, may improve the situation negligibly. ↩︎
There is also the fact that those associations which endure—which are robust to destruction by free-riding—must be built on a logic of cost disparity: the public marker must be disproportionately cheap to exhibit for those who really possess the private quality, and, accordingly, disproportionately expensive for those who do not. A middle-class individual might be able to afford a Rolex, or a very expensive car, but it so impacts his finances and freedoms, and requires such significant sacrifices, that it is only rarely, and in specific circumstances, worth it in the final cost-benefit analysis to “purchase” the signal. For the upper-class individual, such expenditures involve very little sacrifice at all, and the small gain in status from parading such luxury goods will outweigh their cost. There are many associative signals that are not robust, however; though they may be ridden into oblivion over months or years, they still exert a presence in daily life. ↩︎