This is the fifth section in a series on modeling AI governance. You might want to start at the beginning.
In this post, we’ll use the safety-performance model to explore the question of whether we should give higher or lower factor prices to agents that are more concerned about safety. Recall that an agent’s net payoff is
Here, we want to figure out how we should tweak the factor price in cases where the disaster cost is higher for one of the competitors relative to the others. The disaster cost parameter is the cost that an agent will face (or thinks they will face) in the event of a disaster, so we can think of an agent with a higher disaster cost as being more safety-conscious.
In the plot below, I show how the equilibrium safety changes in a scenario where both players have the same disaster cost as I increase that disaster cost.
You can see that, as the shared disaster cost increases, the players’ chosen level of safety increases consistently. Now, what happens if we increase the disaster cost for just one of the players? I show this below, also including a plot for performance:
We see that as player 2’s disaster cost rises, they increase their safety. Player 1 also increases their safety a bit, since they now don’t have to devote as many resources to competing with player 2.
Now, what happens if we give player 2 a higher disaster cost (make them more safety-conscious) and give the players different factor prices? In the next plots I show a couple of scenarios where I make player 2 more safety-conscious and then increase the factor price for either player 1 or player 2.
The safety scores for each player can be hard to interpret, so I’ve also calculated the probability of a safe outcome in each scenario. (Recall that a player’s safety is the odds that they avoid causing a disaster, and the probability of a safe outcome is the product of each player’s probability of avoiding one.)
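The odds-to-probability step can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the plots: the function names and the example values are hypothetical, and it assumes each player's safety score is the odds that they avoid a disaster, so that odds of s correspond to a probability of s / (1 + s).

```python
from math import prod


def odds_to_probability(odds: float) -> float:
    """Convert odds in favor of an event into a probability."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1.0 + odds)


def prob_safe_outcome(safety_odds: list[float]) -> float:
    """Probability that no player causes a disaster.

    Assumption: players' outcomes are independent, so the joint
    probability is the product of the per-player probabilities.
    """
    return prod(odds_to_probability(s) for s in safety_odds)


# Hypothetical safety scores, chosen only for illustration.
print(prob_safe_outcome([3.0, 1.0]))  # 0.75 * 0.5 = 0.375
```

Because the result is a product, a single player with low safety odds drags down the overall probability no matter how careful the other player is.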
Interesting! It looks like the case in which we give the safety-conscious player (player 2) a higher factor price is safer than the case where we give the other player (player 1) a higher factor price, with the difference in safety growing as that price increases. At least with the assumptions here, if we have to charge someone more, we should charge the safety-conscious player more.
In this example, I assumed . You may remember that in symmetric scenarios, safety increases with the factor price if and only if . That’s what we have here (which intuitively explains why we see safety decreasing when we increase the factor price for either player), so it’s worth asking whether we might see something different (i.e., higher safety when we charge the non-safety-conscious player more) if we have . In this spirit, the next graph shows the result when , and .
Here, the probability of a safe outcome increases as we increase either player’s factor price, but we still see that it’s better to have a relatively higher price for the safety-conscious player.
What is going on here? We might intuitively expect that we should give the safety-conscious player a lower factor price, enabling them to have an edge over their competitors, but the results suggest otherwise. The safety-conscious player puts relatively more resources into safety, so if we have to charge someone a higher factor price, it’s better to charge the player who will respond by changing safety relatively less than they change performance. Note that this doesn’t mean that charging safety-conscious players more is always good, as we can see clearly in the case where ; rather, it just means that if we can only tax or subsidize one player, it’s best to do so in a way that gives the safety-conscious player(s) a relatively higher factor price.
So far we’ve assumed that the players differ only in their perceptions of disaster cost and the factor price we give them. What if we assume that one of the players has a higher productivity than the other?
Let’s start here by saying that (meaning that player 2 can produce performance at a lower cost than player 1). We’re still assuming that player 2 is the safety-conscious player – i.e., – so we’re basically curious here about scenarios in which a safety-conscious actor gets some sort of technological lead that gives them an edge over their competitors. As far as safety goes, should we give them a higher or lower factor price than their competitor(s)? Below I show the resulting strategies for a range of values of for the case where and the case where .
The thing to pay attention to here is the difference between the red and blue lines. At sufficiently low and high values of , it’s best to charge player 2 (the safety-conscious and more productive player) a higher factor price, though there is an interval in which it’s better to charge player 2 a lower factor price.
The same general pattern holds if we tweak the assumed values for , , , etc., while maintaining , though the location of the points where the blue and red lines intersect will change as we change the parameters. One thing to note is that, if we increase the difference in disaster cost between player 1 and player 2, the intermediate interval shrinks, meaning that if player 2 is a lot more safety-conscious than player 1, then there are very few values of for which it’s better to give player 2 a lower factor price.
If, instead of assuming , we assume that (the safety-conscious player produces performance less efficiently), then we again get that it’s always better to give the safety-conscious player a higher factor price. The same goes if we assume that , but (meaning that the safety-conscious player produces safety more efficiently). This result defies intuition (at least for me), since it indicates that even if we have an actor who both cares more about safety and is better at producing it, we should charge higher prices for their factors of production than we charge their competitors.
Finally, let’s assume that (meaning that the safety-conscious player is less efficient at producing safety). Now, we get results like the one shown in the following graph:
Here, it’s better to charge the safety-conscious player a lower factor price while is low, after which point charging them a higher factor price is better. We don’t see the two lines intersect again later.
What’s the takeaway from this section? We seem to be pretty consistently getting that we should charge safety-conscious players higher factor prices than we charge their competitors, with some small caveats. I’m not 100% sure right now what drives this, but it seems like the basic idea is that, if we start charging a non-safety-conscious (low disaster cost) actor more, they’ll respond by reducing their safety more than (or not increasing their safety as much as) a more safety-conscious actor would in the same situation, simply because they value safety less. So if we have to charge somebody more, it’s best to charge the safety-conscious actor more, since we can count on them to respond with a trade-off that’s better for safety.
I suspect that there are some implicit assumptions in the model that are key to generating this result. For example, in this model, any player can cause a disaster, even if they don’t “win” the performance competition, although the probability of them doing so does depend on their performance. This is in contrast to a model like Armstrong, Bostrom, & Shulman’s “Racing to the precipice,” where only the most performant player can cause a disaster. In that model, if a player knows that they can’t get the highest level of performance, they simply choose not to compete (implicitly giving zero probability of causing a disaster); in this model, there is always an incentive for players to purchase at least a small amount of performance, which comes at the expense of safety. This leads to things like what we’ve seen here: in a lot of cases, giving a safety-conscious player a price advantage doesn’t cause their competitors to give up but rather to try riskier strategies to remain competitive.
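To make the structural difference concrete, here is a toy comparison of the two disaster rules. It is a sketch under simplifying assumptions, not an implementation of either model: the `Player` class, both function names, and the numbers are hypothetical. Each player is summarized by a fixed performance level and a fixed probability of avoiding a disaster, and the sketch ignores the dependence of disaster probability on performance.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Player:
    performance: float  # hypothetical performance level
    p_safe: float       # probability this player avoids causing a disaster


def p_safe_any_player(players: list[Player]) -> float:
    """Rule described for this series: every player can cause a disaster."""
    return prod(pl.p_safe for pl in players)


def p_safe_winner_only(players: list[Player]) -> float:
    """Racing-style rule: only the most performant player can cause one."""
    winner = max(players, key=lambda pl: pl.performance)
    return winner.p_safe


# Illustrative values only: a careful leader and a riskier laggard.
players = [
    Player(performance=2.0, p_safe=0.9),
    Player(performance=1.0, p_safe=0.6),
]
print(p_safe_any_player(players))   # laggard's risk lowers the total
print(p_safe_winner_only(players))  # laggard's risk is irrelevant
```

Under the winner-only rule, a laggard's riskier strategy has no effect on the outcome. Under the any-player rule it lowers the overall probability of a safe outcome, which is one way to see why pushing a competitor toward risk is costly in this model.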