Subsidies/taxes for safety-conscious actors

This is the fifth section in a series on modeling AI governance. You might want to start at the beginning.

In this post, we’ll use the safety-performance model to explore the question of whether we should give higher or lower factor prices to agents that are more concerned about safety. Recall that an agent’s net payoff is

$$\left( \prod_{j=1}^n \frac{s_j}{1 + s_j} \right) \left( \frac{p_i}{\sum_{j=1}^n p_j} \right) - \left(1 - \prod_{j=1}^n \frac{s_j}{1+s_j} \right) d_i - r(X_s + X_p).$$

Here, we want to figure out how we should tweak $r$ in cases where $d_i$ is higher for one of the competitors than for the others. The parameter $d_i$ is the cost that an agent will face (or thinks they will face) in the event of a disaster, so we can think of an agent with a higher $d_i$ as being more safety-conscious.
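As a concrete reading of the payoff expression, here is a minimal Python sketch. The function name `net_payoff` and its argument names are illustrative, not taken from the series; it simply evaluates the formula above for player `i`, given everyone’s safety and performance and that player’s own spending.

```python
import math

def net_payoff(i, s, p, d_i, r, x_s, x_p):
    """Net payoff of player i, following the formula in the text.

    s, p : lists of every player's safety (odds) and performance
    d_i  : player i's disaster cost
    r    : factor price paid by player i
    x_s, x_p : player i's spending on safety and on performance
    """
    # Probability that no player causes a disaster.
    safe = math.prod(s_j / (1 + s_j) for s_j in s)
    # Player i's share of the prize if the outcome is safe.
    share = p[i] / sum(p)
    return safe * share - (1 - safe) * d_i - r * (x_s + x_p)

# Two identical players: each is safe with probability 0.5,
# so the outcome is safe with probability 0.25.
print(net_payoff(0, [1.0, 1.0], [1.0, 1.0], d_i=0.5, r=0.1, x_s=1.0, x_p=1.0))
```

Holding everything else fixed, a larger `d_i` lowers the payoff whenever the probability of disaster is positive, which is why a high-$d_i$ player has more reason to buy safety.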

Changes in $d$

In the plot below, I show how the equilibrium safety changes in a scenario where both players have the same disaster cost as I increase that disaster cost.

[Plot: equilibrium safety as the players’ shared disaster cost increases]

You can see that, as $d$ increases, their chosen level of safety increases consistently. Now, what happens if we increase $d$ just for one of the players? I show this below, also including a plot for performance:

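[Plot: each player’s equilibrium safety as player 2’s disaster cost increases]
[Plot: each player’s equilibrium performance as player 2’s disaster cost increases]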

We see that as player 2’s disaster cost rises, they increase their safety. Player 1 also increases their safety a bit, since they now don’t have to devote as many resources to competing with player 2.
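The solver behind these plots isn’t shown in this post. One rough way to approximate equilibria of this kind is to iterate best responses over a grid, sketched below. Several things here are assumptions made for illustration: the production functions $p_i = B_i X_{p,i}^{\beta}$ and $s_i = A_i X_{s,i}^{\alpha} p_i^{-\theta}$ (the series defines the actual forms in earlier posts), the per-player factor prices `r[i]`, the grid, and all function names. Simultaneous best-response iteration is not guaranteed to converge and may cycle, so treat any output as a rough approximation.

```python
import itertools
import math

GRID = [0.02 * k for k in range(1, 41)]  # candidate spending levels (excludes 0)

def outcome(x_s, x_p, A, B, alpha, beta, theta):
    # ASSUMED production functions, for illustration only.
    p = B * x_p ** beta
    s = A * x_s ** alpha * p ** (-theta)
    return s, p

def payoff(i, xs, xp, params):
    """Net payoff of player i given everyone's spending (xs, xp)."""
    outs = [outcome(xs[j], xp[j], params["A"][j], params["B"][j],
                    params["alpha"], params["beta"], params["theta"])
            for j in range(len(xs))]
    safe = math.prod(s / (1 + s) for s, _ in outs)
    share = outs[i][1] / sum(p for _, p in outs)
    return (safe * share - (1 - safe) * params["d"][i]
            - params["r"][i] * (xs[i] + xp[i]))

def best_response(i, xs, xp, params, grid=GRID):
    """Grid search for player i's best spending, holding the others fixed."""
    best = None
    for cand_s, cand_p in itertools.product(grid, repeat=2):
        trial_s, trial_p = list(xs), list(xp)
        trial_s[i], trial_p[i] = cand_s, cand_p
        value = payoff(i, trial_s, trial_p, params)
        if best is None or value > best[0]:
            best = (value, cand_s, cand_p)
    return best[1], best[2]

def iterate_best_responses(params, rounds=30, start=0.2):
    """Update all players simultaneously until nothing changes or rounds run out."""
    n = len(params["d"])
    xs, xp = [start] * n, [start] * n
    for _ in range(rounds):
        responses = [best_response(i, xs, xp, params) for i in range(n)]
        new_xs = [resp[0] for resp in responses]
        new_xp = [resp[1] for resp in responses]
        if new_xs == xs and new_xp == xp:
            break  # a fixed point on the grid
        xs, xp = new_xs, new_xp
    return xs, xp

# Example: player 2 (index 1) has the higher disaster cost.
params = {"A": [10.0, 10.0], "B": [10.0, 10.0],
          "alpha": 0.5, "beta": 0.5, "theta": 0.5,
          "d": [0.5, 1.0], "r": [0.04, 0.04]}
xs, xp = iterate_best_responses(params, rounds=10)
print("safety spending:", xs, "performance spending:", xp)
```

The parameter values in the example are arbitrary placeholders, not the ones used for the plots. Sweeping `d[1]` and recording the resulting safeties would give curves comparable in spirit to the plots above.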

Differences in $r$ when one player is more safety-conscious

Now, what happens if we give player 2 a higher $d$ (make them more safety-conscious) and give the players different factor prices? In the next plots I show a couple of scenarios where I make player 2 more safety-conscious and then increase $r$ for either player 1 or player 2.

[Plots: each player’s strategy when player 2 is more safety-conscious and the factor price is raised for either player 1 or player 2]

The safety scores for each player can be hard to interpret, so I’ve also calculated the probability of a safe outcome in each scenario. (Recall that a player’s safety $s_j$ is the odds that they do not cause a disaster, so the probability of a safe outcome is the product $\prod_j s_j / (1+s_j)$.)
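To make the conversion concrete, here is a small sketch that reads each $s_j$ as odds of avoiding a disaster, which is what the factor $s_j / (1 + s_j)$ implies. The helper names are illustrative.

```python
import math

def odds_to_probability(s):
    """Odds s (in favour of no disaster) -> probability s / (1 + s)."""
    return s / (1 + s)

def probability_to_odds(q):
    """Inverse conversion: probability q -> odds q / (1 - q)."""
    return q / (1 - q)

def probability_of_safe_outcome(safeties):
    """Everyone must avoid a disaster, so the per-player probabilities multiply."""
    return math.prod(odds_to_probability(s) for s in safeties)

# One player at 3:1 odds of staying safe, another at 1:1.
print(probability_of_safe_outcome([3.0, 1.0]))  # 0.75 * 0.5 = 0.375
```

Because the probabilities multiply, the least safe player can dominate the result: raising the first player’s odds from 3 to 30 in this example lifts the overall probability only from 0.375 to about 0.48.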

[Plot: probability of a safe outcome in each scenario as the difference in factor prices increases]

Interesting! It looks like the case in which we give the safety-conscious player (player 2) a higher $r$ is safer than the case where we give the other player (player 1) a higher $r$, with the difference in safety increasing as $\Delta r$ increases. At least with the assumptions here, if we have to charge someone more, we should charge the safety-conscious player more.

In this example, I assumed $\alpha = \beta = \theta = 0.5$. You may remember that in symmetric scenarios, safety increases with $r$ if and only if $\alpha < \theta \beta$. That condition fails here, since $\alpha = 0.5 > \theta \beta = 0.25$ (which intuitively explains why we see safety decreasing when we increase $r$ for either player), so it’s worth asking whether we might see something different (i.e., higher safety when we charge the non-safety-conscious player more) if we have $\alpha < \theta \beta$. In this spirit, the next graph shows the result when $\alpha = \beta = 0.5$ and $\theta = 1.5$.
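As a quick check, the symmetric-case condition recalled above can be evaluated for the two parameter sets used in this section. The function name is illustrative.

```python
def safety_increases_with_r(alpha, beta, theta):
    """Symmetric-case condition recalled in the text: alpha < theta * beta."""
    return alpha < theta * beta

for theta in (0.5, 1.5):
    print(f"theta = {theta}: theta * beta = {theta * 0.5}, "
          f"safety increases with r: {safety_increases_with_r(0.5, 0.5, theta)}")
```

With $\theta = 0.5$ we get $\theta \beta = 0.25 < \alpha$, so the condition is false and safety falls as $r$ rises. With $\theta = 1.5$ we get $\theta \beta = 0.75 > \alpha$, so the condition is true and safety rises with $r$.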

[Plot: probability of a safe outcome in each scenario, with $\alpha = \beta = 0.5$ and $\theta = 1.5$]

Here, the probability of a safe outcome increases as we increase either player’s $r$, but we still see that it’s better to have a relatively higher price for the safety-conscious player.

What is going on here? We might intuitively expect that we should give the safety-conscious player a lower factor price, enabling them to have an edge over their competitors, but this seems to be telling us that’s not true. The safety-conscious player puts relatively more resources into safety, so if we have to charge someone a higher factor price, it’s better to charge the player who will respond by changing safety relatively less than they change performance. Note that this doesn’t mean that charging safety-conscious players more is always good, as we can see clearly in the case where $\alpha > \theta \beta$ and raising either player’s $r$ lowers safety. Rather, it just means that if we can only tax or subsidize one player, it’s best to do so in a way that gives the safety-conscious player(s) a relatively higher factor price.

Differences in $r$ when one player is more safety-conscious and more productive

So far we’ve assumed that the players differ only in their perceptions of disaster cost ($d$) and the factor price we give them ($r$). What if we assume that one of the players has a higher productivity than the other?

Let’s start here by saying that $B_2 > B_1$ (meaning that player 2 can produce performance at a lower cost than player 1). We’re still assuming that player 2 is the safety-conscious player (i.e., $d_2 > d_1$), so we’re basically curious here about scenarios in which a safety-conscious actor gets some sort of technological lead that gives them an edge over their competitors. As far as safety goes, should we give them a higher or lower $r$ than their competitor(s)? Below I show the resulting strategies for a range of values of $\theta$ for the case where $r_1 > r_2$ and the case where $r_2 > r_1$.

[Plot: results across a range of $\theta$ for the cases $r_1 > r_2$ and $r_2 > r_1$, shown as red and blue lines]

The thing to pay attention to here is the difference between the red and blue lines. At sufficiently low and high values of $\theta$, it’s best to charge player 2 (the safety-conscious and more productive player) a higher factor price, though there is an interval in which it’s better to charge player 2 a lower factor price.

The same general pattern holds if we tweak the assumed values for $r$, $d$, $B$, etc., while maintaining $B_2 > B_1$, though the location of the points where the blue and red lines intersect will change as we change the parameters. One thing to note is that, if we increase the difference in $d$ between player 1 and player 2, the intermediate interval shrinks, meaning that if player 2 is a lot more safety-conscious than player 1, then there are very few values of $\theta$ for which it’s better to give player 2 a lower $r$.

If, instead of assuming $B_2 > B_1$, we assume that $B_1 > B_2$ (the safety-conscious player produces performance less efficiently), then we again get that it’s always better to give the safety-conscious player a higher $r$. The same goes if we assume that $B_1 = B_2$ but $A_2 > A_1$ (meaning that the safety-conscious player produces safety more efficiently). This result defies intuition (at least for me), since it indicates that even if we have an actor who both cares more about safety and is better at producing it, we should charge higher prices for their factors of production than we charge their competitors.

Finally, let’s assume that $A_1 > A_2$ (meaning that the safety-conscious player is less efficient at producing safety). Now, we get results like the one shown in the following graph:

[Plot: the same comparison across a range of $\theta$ when $A_1 > A_2$]

Here, it’s better to charge the safety-conscious player a lower $r$ while $\theta$ is low; beyond that point, charging them a higher $r$ is better. We don’t see the two lines intersect again at higher values of $\theta$.

What’s the takeaway from this section? We seem to be pretty consistently getting that we should charge safety-conscious players higher factor prices than we charge their competitors, with some small caveats. I’m not 100% sure right now what drives this, but it seems like the basic idea is that, if we start charging a non-safety-conscious (low $d$) actor more, they’ll respond by reducing their safety more than (or not increasing their safety as much as) a more safety-conscious actor would in the same situation, simply because they value safety less. So if we have to charge somebody more, it’s best to charge the safety-conscious actor more, since we can count on them to respond with a trade-off that’s better for safety.

I suspect that there are some implicit assumptions in the model that are key to generating this result. For example, in this model, any player can cause a disaster, even if they don’t “win” the performance competition, although with $\theta > 0$ the probability of them doing so does depend on their performance. This is in contrast to a model like Armstrong, Bostrom, & Shulman’s “Racing to the precipice,” where only the most performant player can cause a disaster. In that model, if a player knows that they can’t get the highest level of performance, they simply choose not to compete (implicitly giving zero probability of causing a disaster); in this model, there is always an incentive for players to purchase at least a small amount of performance, which comes at the expense of safety. This leads to things like what we’ve seen here: in a lot of cases, giving a safety-conscious player a price advantage doesn’t cause their competitors to give up but rather to try riskier strategies to remain competitive.
