This is the third section in a series on modeling AI governance. You might want to start at the beginning.
Building from our AI governance model, let’s make some assumptions that allow us to describe a scenario in which we have an explicit tradeoff between safety and performance.
We start with our basic model, where the manager’s payoff is $b(y) - c(x)$, with $y = f(x)$ the outcome produced from inputs $x$.
Let’s first assume that $x$ consists of two inputs, $s$ and $p$, which represent effort put into safety and performance, respectively. We’ll then write the net cost as $c(x) = w(s + p)$, where $w$ is a scalar per-unit factor cost. We’ll assume for now that $b$ is a concave, increasing function on $\mathbb{R}$.
We now want to define $f$ so that $y$ is a random variable that tends to be higher when $p$ is greater (it tends upward with increasing performance), and has lower variance when $s$ is greater (it becomes more predictable with increasing safety). One way to do this is simply to let $$y = p + \frac{\varepsilon}{1+s}, \qquad \varepsilon \sim \mathcal{N}(0, 1),$$
although that is somewhat arbitrary; really, any random function whose mean increases in $p$ and whose variance decreases in $s$ will work (though some might be more appropriate than others).
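As a quick sanity check, here is a small simulation of one such specification (the illustrative $y = p + \varepsilon/(1+s)$ with standard-normal $\varepsilon$): the sample mean tracks $p$, while the sample variance shrinks as $s$ grows.

```python
import random
import statistics

def draw_outcomes(s, p, rng, n=200_000):
    """Sample outcomes y = p + eps/(1+s) with eps ~ N(0, 1)."""
    return [p + rng.gauss(0.0, 1.0) / (1.0 + s) for _ in range(n)]

rng = random.Random(0)

low_s  = draw_outcomes(s=1.0, p=2.0, rng=rng)
high_s = draw_outcomes(s=4.0, p=2.0, rng=rng)
high_p = draw_outcomes(s=1.0, p=3.0, rng=rng)

# Mean tracks performance p; variance shrinks as safety s grows.
print(statistics.mean(low_s), statistics.mean(high_p))          # ~2.0 vs ~3.0
print(statistics.variance(low_s), statistics.variance(high_s))  # ~0.25 vs ~0.04
```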
If we do this, then, because we assumed $b$ to be concave, the manager is risk-averse and thus has an incentive to increase $s$.
Let’s assume for now that there aren’t any transfers from the public to the manager, so that the manager receives only the intrinsic benefit $b(y)$, and look more closely at the manager’s optimization problem: the manager chooses $s$ and $p$ to maximize $$\mathbb{E}[b(y)] - w(s + p).$$
We can write the above expectation as an integral, $\mathbb{E}[b(y)] = \int b(y)\,\varphi(y; s, p)\,dy$, where $\varphi(y; s, p)$ is the density function for $y$ given parameters $s$ and $p$. For an interior solution, the manager should choose $s$ and $p$ so that the marginal benefit from each is equal to the marginal cost: $$\frac{\partial}{\partial s}\,\mathbb{E}[b(y)] = w \qquad \text{and} \qquad \frac{\partial}{\partial p}\,\mathbb{E}[b(y)] = w.$$
What does this mean?
Let’s take a simple example to get a feel for how this works. Let’s suppose that we can have either a “good” outcome or a “bad” outcome. Increasing $p$ increases the magnitude of the outcome (either good or bad), and increasing $s$ increases the probability that we get the good outcome: $$y = \begin{cases} p & \text{with probability } \frac{s}{1+s}, \\ -p & \text{with probability } \frac{1}{1+s}. \end{cases}$$
Notice that in this example, increasing performance without also increasing safety can be bad even for a risk-neutral manager: when $s < 1$, the bad outcome is more likely than the good one, so increasing $p$ (and with it the severity of a bad outcome) lowers the expected outcome $\mathbb{E}[y] = p\,\frac{s-1}{1+s}$.
A quick note: why are we using $\frac{s}{1+s}$ for the probability of a good outcome? Why not just use $s$? This is just because we want $s$ to be able to take any positive value, but the probability needs to be between 0 and 1. $s$ therefore represents the odds of a good outcome rather than the probability.
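To make the risk-neutral case concrete, we can compute the expected outcome in this two-outcome example, which works out to $\mathbb{E}[y] = p\,\frac{s-1}{1+s}$:

```python
def expected_outcome(s, p):
    """E[y] when y = +p with probability s/(1+s) and -p otherwise."""
    prob_good = s / (1.0 + s)
    return prob_good * p + (1.0 - prob_good) * (-p)

# With odds against a good outcome (s < 1), raising p hurts on average...
assert expected_outcome(0.5, 2.0) < expected_outcome(0.5, 1.0)
# ...while with favorable odds (s > 1), it helps.
assert expected_outcome(2.0, 2.0) > expected_outcome(2.0, 1.0)
```

With unfavorable odds, scaling up performance scales up the loss more often than the gain, so even a manager who ignores variance is hurt.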
Let’s suppose that the manager’s intrinsic payoff takes the CARA form $b(y) = 1 - e^{-y}$ (one convenient concrete choice). The expected intrinsic payoff is then $$\mathbb{E}[b(y)] = \frac{s}{1+s}\,b(p) + \frac{1}{1+s}\,b(-p) = 1 - \frac{s e^{-p} + e^{p}}{1+s}.$$
To hone our intuition, we can take a look at what this payoff is for a range of values of $s$ and $p$:
In the first plot, we see that, with performance held constant, more safety is always helpful. In the second plot, we see that with safety held constant, more performance is only good up to a point, and thereafter increasing performance is too risky.
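Both observations can be checked numerically; the sketch below assumes, for concreteness, a CARA benefit $b(y) = 1 - e^{-y}$ in the two-outcome example:

```python
import math

def expected_payoff(s, p):
    """E[b(y)] for the two-outcome example (y = +p with odds s, else -p),
    assuming the CARA benefit b(y) = 1 - exp(-y)."""
    b = lambda y: 1.0 - math.exp(-y)
    return (s * b(p) + b(-p)) / (1.0 + s)

# Holding performance fixed, more safety always helps...
assert expected_payoff(2.0, 1.0) < expected_payoff(3.0, 1.0)

# ...but holding safety fixed, performance helps only up to a point.
payoffs = [expected_payoff(4.0, i / 10) for i in range(1, 40)]
best = max(range(len(payoffs)), key=payoffs.__getitem__)
print(f"at s = 4, payoff peaks near p = {(best + 1) / 10}")  # analytically, ln(4)/2 ≈ 0.69
```

Under this benefit function the interior peak sits at $p = \tfrac{1}{2}\ln s$, which matches the hump shape in the second plot.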
We can also solve for the choices of $s$ and $p$ that maximize the manager’s net payoff for a range of values of $w$ – this gives us the following plots of solutions:
Both safety and performance decrease as the per-unit factor cost $w$ increases. That, of course, doesn’t have to be the case in general – exploring variations of this model where, e.g., safety increases with $w$ may be worthwhile (and I’m already working on this elsewhere).
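These comparative statics can be reproduced with a brute-force grid search; the sketch again assumes the two-outcome example with a CARA benefit $b(y) = 1 - e^{-y}$ (an illustrative choice):

```python
import math

def b(y):
    """Assumed CARA intrinsic benefit."""
    return 1.0 - math.exp(-y)

def net_payoff(s, p, w):
    """Expected net payoff in the two-outcome example: outcome +p with
    probability s/(1+s), -p otherwise, minus factor costs w*(s+p)."""
    return (s * b(p) + b(-p)) / (1.0 + s) - w * (s + p)

def best_choice(w):
    """Brute-force the payoff-maximizing (s, p) on a coarse grid."""
    grid = [i / 20 for i in range(1, 301)]  # 0.05, 0.10, ..., 15.0
    return max(((s, p) for s in grid for p in grid),
               key=lambda sp: net_payoff(*sp, w))

results = {w: best_choice(w) for w in (0.02, 0.03, 0.05)}
for w, (s_star, p_star) in results.items():
    print(f"w = {w:.2f}: s* = {s_star:.2f}, p* = {p_star:.2f}")
```

Running this, the optimal $s^*$ and $p^*$ both fall as $w$ rises.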
Now let’s consider the public’s side of things: an important class of problems is where some social planner wants to choose some transfer rule $t(y)$ to maximize public welfare $$\mathbb{E}[v(y) - t(y)],$$ where $v(y)$ is the public’s benefit from the outcome,
subject to the manager’s optimization, as discussed above, but with the transfers added to the manager’s payoff: $$\max_{s,\,p}\ \mathbb{E}[b(y) + t(y)] - w(s + p).$$
Typically, we also set some “individual rationality” condition that stipulates that the manager must be able to achieve some minimum expected payoff under a proposed scheme of transfers. This problem is, at its core, not too tricky: we expect the social planner to set $t$ to reward values of $y$ that are good for the public (for which $v(y)$ is high) and penalize values that are bad for the public (for which $v(y)$ is low). An important question here is why we would expect transfers to be necessary in the first place – after all, if the manager makes choices that optimize public welfare of their own accord, then no transfers will be necessary.
One scenario is where $y$ is some sort of public good that the manager pays to produce but that the public also benefits from. In this case, the manager may not produce enough $y$ on average from the perspective of the public, so the public would be willing to pay the manager to produce more: the simplest case of this would be where $b \equiv 0$, so all the manager’s payoff comes from payments from the public – in that case, the manager has no intrinsic incentive to produce and only does so because the public wants them to.
Another, potentially more interesting, scenario is where the public’s welfare differs from the manager’s in some important way; we’ll focus here on cases where the public is more risk-averse than the manager. Below, I show some example plots where the manager’s intrinsic payoff is $b(y) = 1 - e^{-y}$ but the public’s payoff is $v(y) = 1 - e^{-2y}$, which represents a higher degree of risk aversion for the public.
In the first plot, you can see that the public benefits more from higher levels of safety. In the second plot, you can see that the public has a strong preference for lower levels of performance at any given level of safety. Clearly, the difference in risk aversion gives the public a reason to change the incentives faced by the manager. Figuring out the optimal incentives (as expressed through transfers) for a variety of scenarios seems like a useful area of inquiry.
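To see the divergence directly, we can compare the performance level each side would choose at a fixed level of safety, assuming CARA payoffs $1 - e^{-a y}$ with risk-aversion parameter $a$ (an illustrative parameterization: $a = 1$ for the manager, $a = 2$ for the public):

```python
import math

def expected_utility(s, p, a):
    """E[1 - exp(-a*y)] in the two-outcome example; larger a = more risk-averse."""
    u = lambda y: 1.0 - math.exp(-a * y)
    return (s * u(p) + u(-p)) / (1.0 + s)

def preferred_p(s, a):
    """Performance level a decision-maker with risk aversion a would pick at safety s."""
    grid = [i / 100 for i in range(1, 301)]  # 0.01 .. 3.0
    return max(grid, key=lambda p: expected_utility(s, p, a))

s = 4.0
print("manager (a=1) prefers p =", preferred_p(s, a=1.0))  # near ln(s)/2 ≈ 0.69
print("public  (a=2) prefers p =", preferred_p(s, a=2.0))  # near ln(s)/4 ≈ 0.35
```

Doubling the risk-aversion parameter halves the preferred performance level here, which is the pattern visible in the second plot.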
The examples given here are only meant to give an idea of how this all works; there are lots of other interesting assumptions we could make. In the next section, we’ll examine what happens in this model when we have multiple, competing managers.