## Reflexive Theory of Mind Reasoning in Games

JUN ZHANG: OK, thank you. Thank you, Tommy, for

giving me this opportunity to give a lecture on one

of my research topics. And I’m very glad to

come here and to talk with many of my colleagues, friends, and the students here. That quote Tommy just read, apparently he looked up my website, which has yet to be updated, is actually taken from Jerome Busemeyer. He said that life is complex because it has both real and imaginary parts. I have another quote, I think,

on my website which I like, which talks about

modeling, actually. The story goes that trains in Austria are usually running late. So one day a passenger waiting for the train in the train station complained about this to the station master. He said, if the trains are always late, why do you have to print out the timetable? What's the use of the timetable? And the station master looked at him and said, well, without the timetable, how would you know the train is late? I think this is quite

an interesting remark about the way models and data interact. We build models, to

some extent, to serve as a kind of a benchmark, a

way of thinking about the data, to think about the

underlying process. So with that, I would like to just mention, as Tommy said, my background: somebody who first studied theoretical physics in China, then got a PhD in neuroscience, and is now in a psychology department with the goal of modeling psychological processes,

modeling the mind. And the approach

which I have taken is trying to look at the mind as kind of a computational device which runs a kind of software on neural hardware. OK, it's a particular kind of view on computational intelligence. I therefore have this motto of

mind, machine, and mathematics. I’ve done a variety

of work, but today the topic I’ll be

talking about is the so-called theory of mind. The notion of theory of mind has a particular kind of meaning in developmental psychology, or in psychology in general, as it evolved as a core domain for cognition. Here's a quote from Henry Wellman's book. Basically, he just

says that, well, we have this lay view about,

say, our perception, which forms the basis of our belief. And we have basic emotions

underlying neurophysiology which give rise to our desire. And then based on our belief

and desire, we take actions. And then we get

reactions from others and from our environment. And therefore from perception,

belief, emotion, desire, action, and so forth,

we have a generic kind of a theory about this

belief-desire psychology. And this emerges in the study of human cognition, in the development of cognition. Very early on in a child's life, he or she will need to acquire this kind of general model of the notion of a mind. So the theory-of-mind

type of reasoning, that is to say, reasoning about beliefs and desires and actions, can be looked at from a more formal perspective via so-called game theory. Well, game theory emerges out of economics. It's a framework for modeling interpersonal, strategic interaction. So the basic

ingredients of a game is that you have a set of

players, a set of players who individually can take actions. And based on the joint actions

of these individual players, there is an outcome. So in other words an outcome is

a result of the combined action of all the players. Now, given an outcome,

different players can have different values or

payoffs for these outcomes. So the outcomes are determined by the actions of all the players, and the resulting outcome is valued, possibly differently, by each of the players. So you have players, outcomes, and payoffs, which are the basic ingredients for a game. Now, in terms of the

solutions for the game, the formal game-theoretic

solutions of the game, people have various notions. One, for instance, is the notion of best reply, which involves modeling the other person's action and trying to devise a best response to the actions of others. The idea of equilibrium is that

somehow the joint reasoning of the players leads to a state

in which no individuals would like to deviate from

that state of affairs. I’m going to get into

more details later on. In the economic studies of games, there are some fundamental assumptions being made. One of these assumptions is the axiom of common knowledge. That is, each player would know the structure of the game, know the players, know their actions, and know the payoffs. And this is all common knowledge. Now being common

knowledge, that is to say that players know

the structure, the nature, the process of the game and know

that other players also know the strategies, the

payoffs of the game, and know that other players

know that they know the game structure and so forth. So this is a part of

the common knowledge. So you know something. You know others know something. And you know that

others know that you know something, et cetera. So it forms a common knowledge. And this axiom of common

knowledge is very important. It plays an important

role, for instance, for that notion of

the Nash equilibrium, for the equilibrium idea. And then along with

that is the notion of the axiom of rationality. In other words, players tend to act rationally. Now, here's a little bit of trouble with this notion of rationality in the case of games. And this is one of the questions that we are going to address, this notion of rationality. What we submit and

what we argue is that while we can

well define the notion of the so-called instrumental

rationality, that is to say once we have

a model of the game situation, a model of the other

players’ action in the game, then we can act rationally

as a best response, as a best action,

out of our model. But nevertheless,

this model itself, the modelling of others

in a game setting, is a much more involved process. It involves a recursive

kind of a reasoning, which is yet to be explored. So let me just give a detailed

exposition of these concepts for the case of, say, for instance, the prisoner's dilemma game, and then the notion of the Nash equilibrium in the prisoner's dilemma game. So a prisoner's dilemma game is shown on the right. Adapting the strategy labels, say you have two players. Each can have a choice of

being selfish or altruistic. And the numbers in the

cells represent the payoffs to these individuals,

with a smaller number meaning less of the

value and larger numbers being more or

higher of the value. OK, so five is more than

three and more than one, more than zero. So now if both players

act altruistically, say, for both players

you can get three points. But if one player acts

selfishly and the other person acts altruistically, the

player who plays selfish would get five points,

whereas the person who played altruistically

would get zero points. And if both play selfishly,

then they both get one point. OK, so this is a

standard setting for a so-called prisoner's dilemma game. This game has puzzled many, many people. For this kind of game,

if you do an analysis as to what players

would do, what people would do in this

kind of setting, the game-theoretic analysis says, for instance, that the notion of the Nash equilibrium basically says that what people would choose would be a state of affairs in which nobody would want to unilaterally deviate. So in other words, if somehow the players have chosen a particular strategy, knowing that the other players have chosen their strategies, no player has the incentive to unilaterally deviate, in other words, to change if the

other player will not change. So let's take a look at this. For instance, the three-three cell. That cell would not be a Nash equilibrium, because if we are in the three-three cell, the row player would change to the other strategy, the selfish strategy, thereby gaining more points: a three becomes a five. And likewise, the column player would also want to deviate from his or her choice to get a five. So three-three, the cooperative solution for this prisoner's dilemma game, is not a Nash equilibrium. Whereas the one-one, the selfish-selfish, or the non-cooperative solution, that is a Nash equilibrium, because nobody could deviate unilaterally without damaging him or herself, without getting a lower payoff. OK, so that's the notion of the Nash equilibrium.
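The unilateral-deviation test just described can be sketched in a few lines of code. This is an illustrative sketch, not part of the lecture; the payoff numbers are the ones given for the game above, and the function name is my own.

```python
# Pure-strategy Nash equilibrium check for the prisoner's dilemma.
# Payoffs (row player's first) use the numbers from the lecture's slide.
payoffs = {
    ("altruistic", "altruistic"): (3, 3),
    ("altruistic", "selfish"):    (0, 5),
    ("selfish",    "altruistic"): (5, 0),
    ("selfish",    "selfish"):    (1, 1),
}
actions = ["altruistic", "selfish"]

def is_nash(row, col):
    """True if neither player gains by unilaterally deviating from (row, col)."""
    r_pay, c_pay = payoffs[(row, col)]
    row_cannot_gain = all(payoffs[(r, col)][0] <= r_pay for r in actions)
    col_cannot_gain = all(payoffs[(row, c)][1] <= c_pay for c in actions)
    return row_cannot_gain and col_cannot_gain

equilibria = [(r, c) for r in actions for c in actions if is_nash(r, c)]
print(equilibria)  # [('selfish', 'selfish')] is the only equilibrium
```

Checking each cell this way reproduces the argument above: from three-three either player gains by switching to selfish, so only the one-one cell survives.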

One way to explain or to solve this prisoner's dilemma game is to say, OK, use this Nash-equilibrium-based idea, which is basically saying that it's a mutual best response, based on a theory-of-mind kind of modeling of what others would do. That is to say, reasoning about whether others would stick with their choice under a unilateral deviation is the foundation for the Nash equilibrium. Now, extensive

psychological research showed that this notion of the Nash equilibrium would not be valid with actual humans when they're engaged in those games. So this is showing that one-one would be the Nash equilibrium in this case. Nobody would want to

unilaterally deviate. Now, another notion

in solving the game is the notion of the

so-called dominant strategy. A dominating strategy

is a strategy such that regardless of what others do,

it’s better for the player to choose one strategy

versus the other, one action versus the other. So now in this case, if you

look at the choices of the two players here, you soon

realize that the one-one cell, the selfish-selfish, is

actually a dominating strategy for both players, for both

the row and column players. In other words, for the

row and column players, for the players

in this game, you do not need to know what the

other person would have chosen. It is always better for

you to choose selfish. Why? Because if the other person

chooses selfish, of course, you should choose selfish. But if the other person

chooses altruistic, you should also choose

selfish because it gets you higher points for this game. So the one-one solution, the

non-cooperative solution, is actually very,

very stable because it is a combination of the

dominating strategies of the two players. A dominating strategy is one that does not rely on a modeling of the other. You really don't even need to extensively model the other player. But just by analysis of this, you are always better off choosing the dominating strategy.
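This dominance argument can likewise be checked mechanically. Again an illustrative sketch with the lecture's payoff numbers; the function name is my own.

```python
# Strict dominance check: one row action strictly dominates another if it
# pays more against every possible action of the column player.
payoffs = {  # (row payoff, column payoff), numbers from the lecture
    ("altruistic", "altruistic"): (3, 3),
    ("altruistic", "selfish"):    (0, 5),
    ("selfish",    "altruistic"): (5, 0),
    ("selfish",    "selfish"):    (1, 1),
}
actions = ["altruistic", "selfish"]

def strictly_dominates(a, b):
    """Row action a strictly dominates b: better whatever the column player does."""
    return all(payoffs[(a, c)][0] > payoffs[(b, c)][0] for c in actions)

print(strictly_dominates("selfish", "altruistic"))  # True: 5 > 3 and 1 > 0
print(strictly_dominates("altruistic", "selfish"))  # False
```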

So for the prisoner's dilemma game, the only way you can actually get out of this non-cooperative solution– OK, so the question is when cooperation in the prisoner's dilemma game can be individually rational, can be rational to the individual or to the players themselves as an instrumentally rational kind of strategy. Well, it turns out that the

only way for that to happen is when you are not playing

this as a single-shot game. And you have to assume there’s

a probability of continuation. There is a probability

of continued interaction. So the nonzero probability

of continued interaction is a necessary condition for

this prisoner's dilemma game to evolve a

cooperative solution. So basically in those

cases, now a rational player would maximize a

total expected payoff, taking into account the

probability of continuation, of continued interaction. What happens is that when that

is taken into account, then you can basically transform the

payoff values of your actions. And what happens in this case is that your play, your actual choice in the current game, not only will give rise

to some immediate payoff but also would influence

the other person’s choice for future games, in

future rounds of the same game. So therefore, your

action not only impacts your immediate

reward, immediate gain, but also impacts on how others

would act for future games. So therefore, the

cooperation will arise. So it turns out that, after a very rigorous analysis of this thought, one can show that the sufficient condition for this to happen is that the continuation probability has to exceed a certain threshold.
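The lecture does not state the threshold explicitly. As one standard worked example, which is my assumption and not the speaker's derivation, consider a grim-trigger strategy: cooperate until the other player defects, then defect forever. With continuation probability p and the payoffs above (temptation T = 5, mutual cooperation R = 3, mutual defection P = 1), cooperation is sustainable exactly when p is at least (T - R) / (T - P).

```python
# Grim-trigger threshold for cooperation in the repeated prisoner's dilemma.
# Assumed strategy and formula; payoff values follow the lecture's matrix.
T, R, P = 5, 3, 1   # temptation, mutual-cooperation, mutual-defection payoffs

def expected_cooperate(p):
    """Total expected payoff from cooperating forever: R in every round."""
    return R / (1 - p)

def expected_defect(p):
    """Defect now for T, then get punished with P in every later round."""
    return T + p * P / (1 - p)

threshold = (T - R) / (T - P)
print(threshold)                                      # 0.5 with these payoffs
print(expected_cooperate(0.6) > expected_defect(0.6)) # True: above threshold
print(expected_cooperate(0.4) > expected_defect(0.4)) # False: below threshold
```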

So to rationally solve the prisoner's dilemma game, one requirement is that you would have to have continued interaction, or a presumed probability or expectation of continued interaction. So that kind of shows

that all these kinds of cooperative solutions in the prisoner's dilemma game would rely on that kind of expectation, which may be neurobiologically grounded in the brain through evolution, for instance. So that is a rational solution for cooperation. Now, I'm not going to expand on this type of analysis for the prisoner's dilemma game and how cooperation arises in this. And I have some

theoretical results, a theoretical paper on this. But rather today, I’m going

to talk about a related issue about this recursive

depth in this theory of mind reasoning, a recursion in

theory of mind reasoning. OK, so to explain

what that is, I’m going to give you an example,

the so-called p-beauty contest game. Now suppose we have,

say, the audience here. Suppose I asked you to submit

a number between 0 and 100. And you’re going to write

it on a piece of paper and hand it to me. I’m going to collect

all the papers, and I’m going to do an

average of all of the numbers that you submitted. And then I'm going to multiply that average by 2/3. After multiplying the average by 2/3, I get some number. Now, the rule of

the game is that whoever submits

the number which is closest to the 2/3 of my

average and not exceeding that, that person

would win the game. OK, so everybody gets a

chance to, say, submit a number between 0 and 100. You submit any number. To make it simple, you can

maybe just submit an integer. And I'm going to take an average of this and multiply it by 2/3, and that's my target number. You are going to shoot

for the target number. Whoever gets closest

to that number would win, say, a big prize. I actually ran this for my mathematical psychology class, actually running this experiment and saying that whoever wins really can get a boost in their grade. First, think about this. What number would you submit? Suppose this is a very important

kind of consequence for you. What number would you submit? I want you to think maybe for

one minute, a couple minutes maybe. AUDIENCE: 30. JUN ZHANG: So

that’s then 30 here. Right, 30? OK. OK, so actually let’s go

through some of the reasoning that you might have in thinking

about what number to submit. OK, now, in order to win this game, what should I submit? So everyone will submit between 0 and 100, but I'm going to take the average and apply 2/3 to it. So the target number which I get cannot be more than 67: assuming everybody submits 100, the average is 100, and 2/3 of that is 67. So there's no way I can win by submitting anything above 67, because I need to be at or below my target number. So I should not submit

anything above 67. But now realizing that maybe

the person that sits next to me thinks the same thing,

they realize that, too, so they think nobody will

submit anything above 67. So then if everybody submits

67, I’m going to take 2/3 of 67, which is 44. So maybe I should

submit, like, 44. And then I realize

that other people may realize the same thing. So I can [INAUDIBLE] to redo

my calculations and so forth. So I submit 29 and so forth. If you keep doing

this, very soon you will find out the best

thing to submit is zero. So this is an example of
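This iterated best-reply argument is easy to simulate. A sketch of my own, not the lecture's: each level of reasoning multiplies the previous guess by 2/3, starting from the upper anchor of 100.

```python
# Level-k reasoning in the 2/3-beauty contest: a level-0 player submits
# the anchor (here 100); each deeper level best-responds with 2/3 of
# the previous level's guess, so guesses shrink geometrically toward 0.
def level_k_guess(k, anchor=100.0, factor=2/3):
    guess = anchor
    for _ in range(k):
        guess *= factor
    return guess

for k in range(6):
    print(k, round(level_k_guess(k)))
# prints 100, 67, 44, 30, 20, 13 at levels 0-5, converging toward zero
```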

So this is an example of this recursive reasoning, this recursion, the theory-of-mind recursion, because you think what others will think; you think what others think that you will think; and so forth. Well, I actually ran this

experiment, you know, with my class, and others

have run this experiment. It turns out that people would almost never submit zero; very few people do. There are people who submit zero anyway, but not all. But you can argue, well, maybe the number they think about is 50. That's in the middle, and then 2/3 of that is, like, 33. Maybe that's where the 33 comes about. It turns out that

such an experiment has been run on subjects

in various contexts. They ran this with some

college students in Germany. And the average number

they get is, like, 35. And they ran it with some Caltech students, whose average was 24. That's one level kind of deeper. And then they have run

other, more prestigious kind of subject groups, like readers

of Financial Times, London. They actually gave out

a real prize for that. So they opened up a window for,

say, two weeks for submission. And then they are going

to tally the results, and they’re going to give

out an actual monetary reward for the person

that does the best. So it turns out that for

that group, the numbers they submit are always

between, like, 24 and 35. OK, so that's the mode. I mean, there is a whole spectrum of numbers being submitted, but that's the mode, which indicates that people actually do no more than, like, a second or third level of recursion. Or even maybe, depending on your interpretation of the data, it can be between 1 and 2, if you think everybody starts out with 50. So this level of

recursion, of how deep people go into it in

real, social interactions and so forth, is

of interest here. In my first studies, I

looked into the literature about how they measure the

depths of recursive reasoning. So it turns out the existing paradigms for measuring this kind of recursive depth are through these so-called dominance-solvable games, through iterated removal of dominated strategies in the dominance-solvable games. And what they do is that

they always run a few games, but against many, many subjects. And so for these

games, you can argue that if a subject is faced with

many, many different subjects in doing this kind

of reasoning, then they will have a model of

the strategic sophistication of the population. So that may be the reason

why people don’t even go to, like, 0, in this

p-beauty contest game. Well, think about this. Even though you think

theoretically the equilibrium solution is zero, you may

not think that other people may have thought about that. Or maybe there’s a distribution

of the strategic sophistication that leads you to say,

oh, maybe I’ll just get like 24 or 35 or something. So that’s the

explanation for that. So even though you are able

to reason in great depth, maybe you’re just not doing

that because your model of the general

population is such that there’s a distribution of

the depths of this recursion. So to study this depth, or order of recursion, in an individual, social, interactive, and dynamic setting, we propose a paradigm in which we have a series

of trial-unique games with a diagnostic

payoff structure that will allow us to

differentiate this order or depth of recursion. And this paradigm has since been

adopted with some modification by a variety of other groups to

probe this depth of recursion. And I’m going to explain to

you what that paradigm is. Now, the paradigm works in the following way. So we have a game. It's a three-step game. So we have two players, player one and player two. And the game starts with cell A, and then there are B, C, and D. So it can move to B, to C, and then to D. There are four cells. And so the numbers, as before, represent the payoffs to the players, where the first

number goes to player one. And the second number

goes to player two. And player one controls the

first and the third move, and player two controls

the second move. So this is a sequentially

played game, a sequential game. So you start out,

say, in cell A. And player one controls

the first move. Player one can decide

whether to stay in cell A and collect the reward,

collect the points, or decide to move on

to cell B and then pass the control to player two. So if player one decides

to move to cell B, and it’s player

two’s turn to decide whether to stay at cell

B and collect the reward or to move the piece to

cell C and let player one have the final say–

so they think about either staying there or moving. If player two then decides to move to cell C, then player one has another chance to either stay at C or move to D. And that's it. OK, so whichever player

decides to stop, then the game ends in that cell,

and each player would collect their

respective payoff. And as you can see, the numbers

differ in these four cells for the two players. And also they differ

from game to game. This payoff matrix is trial-unique in the sense that players encounter it only once in the whole experiment. So, now, this

sequential move game, there are a maximum

of three steps, with four possible outcomes. So if you draw out in kind

of a game-tree diagram, you can see that player one

controls the first and third, and player two controls

the second move. And these are the payoff

values, and so forth. So we instruct the

players such that the game is non-cooperative in the

sense that the goal is to earn as many points as

possible for themselves, and not to worry about how much the other, their opponent or co-player, earns. So the questions we ask our

subjects are as follows. We ask them two questions

in the sequence. First, we ask the subject,

what would player two do if the game

progresses to cell B? So what would player two do if

the game progresses to cell B? And then we ask

them the question, what would player

one do in cell A, given the answer

to question one? So the first question is basically a modeling of what would happen, and the second question is to translate that model into a rational action. So let's take this

as an example. So we have these payoff numbers. I want you to kind

of look at this and see how you would

answer these questions. So now, what would

player two do in cell B with this payoff matrix? So whether player two would

move or would stay– OK, so that’s the question. So there are people who

think they should move. Oh, OK, so we do have

move and stay, right? OK. So now if, say,

we’re in cell B now, whether player two should move

or should stay– so you think, well, because player

two already has a three, but there’s a potential

of getting a four if he or she moves, then

if you think further, player one had the

final chance of deciding whether to stay in cell C

or move away from cell C. So one ought to try to

think whether it makes sense for player one to

move away from cell C or to stay at cell C

upon the final move. OK, so in this case, player one

would get a one by staying in C and get a two in D. So if the

game had progressed to cell C, player one would

want to move to D. So therefore player two would not want to move away from cell B, because there is no way the game can end in cell C. But you can see already

in this type of reasoning, or in answering

the question, you invoke a kind of a

theory-of-mind reasoning. So you can reason, on one hand, saying that, OK, player two would think the payoff is better in C than in B, so he or she should move that way. But on the other hand, if player two thinks about what player one would do, namely moving from C to D, then player two should not move. So you already see this

kind of recursive reasoning. So in one case, a myopic

kind of a reasoning would say player two would move

because you can get a four. But a more predictive

reasoner would say player two

would not move away from B because player one here

would want to move from C to D if he or she has a chance. So this level of

recursion can be revealed as to how you answer

that particular question. And then given that

answer, you can see what would be

the rational thing to do for player one in

cell A. So what would be a rational thing to do? Well, you just need to translate

that model into the action. So these are the two

questions we ask. For this particular game, this is a diagnostic game, diagnostic in the sense that, depending on the level of recursion, whether our subject is reasoning myopically or predictively, you give opposite answers to the question of whether player two would move from B or not. So it's diagnostic.
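Backward induction over this little game tree makes the diagnostic property concrete. A sketch with stand-in numbers: player two's payoffs at B and C and player one's payoffs at C and D follow the example just discussed, while the remaining entries (the payoffs at A, and player two's payoff at D) are hypothetical fillers of my own, as are the function names.

```python
# Backward induction for the three-step sequential game.  Each cell maps
# to a (player-one payoff, player-two payoff) pair.  Values at A and
# player two's value at D are hypothetical; the rest follow the example.
payoffs = {"A": (2, 1), "B": (1, 3), "C": (1, 4), "D": (2, 1)}

def player_one_at_C():
    """Player one's final move: stay at C or move to D, whichever pays more."""
    return "move" if payoffs["D"][0] > payoffs["C"][0] else "stay"

def player_two_at_B(predictive):
    """Myopic: compare B with C directly.  Predictive: anticipate player
    one's final move and compare B with where the game would actually end."""
    if predictive:
        end = "D" if player_one_at_C() == "move" else "C"
    else:
        end = "C"
    return "move" if payoffs[end][1] > payoffs["B"][1] else "stay"

print(player_two_at_B(predictive=False))  # "move": 4 at C beats 3 at B
print(player_two_at_B(predictive=True))   # "stay": the game would end at D
```

Because the myopic and predictive answers differ here, this payoff structure is diagnostic in exactly the sense described above.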

Now, compare with this game here. We have another set of payoff values. In this case, if you think myopically, when player two is in cell B, they should not move, because otherwise they would get a one, decreasing their points. And if you think deeply, predictively, player one will not move away from C to D, because there is a two versus a one. So for that reason, too, you should not move. These two answers, whether you engage in myopic reasoning or predictive reasoning, give rise to the same choice. So this is a game that is non-diagnostic. So we have classes of

games which on one hand are diagnostic and on the

other hand are non-diagnostic. But for the

non-diagnostic games, we use them as a way to serve

as our so-called catch trials because we want the subjects

to engage in reasoning. And if somehow they are

not paying attention to the payoff numbers, they

are just randomly choosing and so forth, they

will get those wrong. So we use them as catch trials. Our block of the games would

mix the diagnostic games with a non-diagnostic game,

using the non-diagnostic games as catch games, catch trials. So we have diagnostic

and non-diagnostic games. And we have four

strategically distinct types of diagnostic games, based

on the payoff values. There are four different

kinds of games. And we counterbalance everything in terms of the predictions, say, yes or no: they will move away; they will not move away. And we also counterbalance their risk attitude about whether you should move, as in, I would move if I started with a lower payoff, and I wouldn't move if I started with a higher payoff, this heuristic kind of risk attitude. We make everything

counterbalanced. So in this case, player two, as I said, may either play myopically or predictively at cell B. And player one's model of player two can either be myopic or predictive. And player one's choice at cell A would depend on this model. Now in the actual experiment,

we assign either player one or player two– one of them

of course is our subject. But the other is an

experimental confederate. We instruct the confederates

to perform according to our instructions,

and these are the experimental manipulations. We start with 24 games as training, to familiarize the subjects with the game. And these are very

simple payoff structures, so the subject would have

no problem in deciding what player two would do in

cell B, move or not move. And then we have two test

blocks, each with 20 games. We have 16 diagnostic games interleaved with four catch-trial, or non-diagnostic, ones. And all games are unique, with

distinct payoff structures. So they only encounter that

once in the experiment. And to avoid heuristics,

we put, like, all the games that start with a two and all of the games that start with a three in separate blocks to avoid an inference

of the risk attitude. So we have a total

of 64 games, all presented in the fixed order. Now, subjects are

assigned either as player one or player two. So it’s a

between-subjects design. The reason we want to

do different assignments is that we want to test this

so-called perspective taking because we ask the

same kind of question. But we want the subjects

to be in different shoes, to assume the role of

player one or player two in this experimental

manipulation. The opponent is always an

experimental confederate. And the games are actually

played out by computer. We actually have a confederate that comes in. They are introduced to each other, interact with our subjects, and then go to different

rooms and play the game. So subjects are asked to

answer these two following questions in a fixed order. So the first question is, what’s

player two’s optimal strategy at cell B? And question two is player

one’s optimal strategy at cell A. Yes? AUDIENCE: Do they

have infinite time? How much time? Do they have x amount of time? JUN ZHANG: They can think

as much time as they want. But once they are

familiarized with the game, it took them, like, a

few seconds to do it. And these subjects are from the intro psych subject pool; we recruit from there. They always want to get

out of the experiment as quick as possible. We allow about one or two hours

for the experimental blocks. So the first question is

about the optimal strategy. This is a reasoning question. When the subject is assigned as player one, this is what we call an anticipation question, taken from a third-person perspective. So we ask that question,

whether the opponent would move if the game

progressed to cell B– so whether the opponent would move if the game progressed to B, and whether you will move away from cell A. That is question two. On the other hand,

if the subject is assigned as

player two– and this is really their planning

for their move in cell B– and then we ask whether you

will move if the game progresses to cell B and whether it

is smart for the opponent to move away from cell A.

So it’s the same question, but phrased slightly

differently to make the subjects reason differently, to take

different perspectives based on their assignment. The first question is referred

to as the theory of mind, or the [INAUDIBLE] question. And the second question is

an instrumental rationality question. For data analysis, we

exclude unmotivated, confused subjects from

our data analysis, based on their performance

on the training games and also on the catch trials. So after this exclusion,

we have 28 subjects playing player one and 36

subjects playing as player two. It’s a between-subject design. And then to score their

choices for the first question, a myopic choice would

be scored as zero. A predictive choice

would be scored as one. And to score the second

question on instrumental rationality, if their choice

is consistent with what their theory of

mind model is, then we score that as being no

rationality error, zero. Otherwise, we scored it as

one, or rationality error. So rationality performance

is scored with respect to their theory of mind

model that they have. And scores for four

successive games are averaged. We call it game-set. It will give rise to a

predictive score from 0.25 and up to 1.0. So if they act predictively for

all four games of the game set, then they get a score of one. Otherwise, you can

just see the proportion of times they act predictively. So now here’s the data. So this is a distribution
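The scoring just described amounts to a simple average per game-set; a minimal sketch, where the function name and labels are my own.

```python
# Predictive score: each diagnostic game is coded 1 if the subject's
# answer to the first question was predictive and 0 if myopic, then
# scores are averaged over a game-set of four successive games.
def predictive_score(choices):
    """choices: list of 'predictive' / 'myopic' labels for one game-set."""
    assert len(choices) == 4, "a game-set is four successive diagnostic games"
    return sum(c == "predictive" for c in choices) / 4

print(predictive_score(["myopic", "predictive", "predictive", "predictive"]))  # 0.75
```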

So now here's the data. This is a distribution of this predictive score for the subjects when they are assigned as player one, on the left, and player two, on the right. So the different shades

represent the predictive score. And the height of the bar represents the proportion of subjects, totaling 100%. So as you can see, in both cases the subjects start out having a relatively low predictive score. That is to say, they play

relatively kind of myopically. There’s very few

people, like, who have a score of, say, one

in the very beginning. And gradually, as

the game progresses, you see the

distribution changes. So there’s a growth of

the number of people who score higher and higher. And this is the case both for

the subjects as player one and subjects as player two. So in the case,

they are actually, through their interaction

with the same subject, they are learning something. They are enhancing their model. Now, in the data that I show

you, the confederate, they always act predictively. OK, the confederate

always acts predictively. This is the data when we average

across the entire population. So the previous slide

shows you the distribution of the number of subjects with

the various predictive scores. This is averaging across

that whole population. As you can see, the

predictive scores increase over the course of the

interaction with the opponent. As player one, they increase. And towards the end, they get

a score of like 0.65 or so, 0.64, 0.65. As player two, they got a much

higher score, too, like, 0.9. The player-two

condition is a case where the subjects are putting themselves in

the shoes of the other person. Effectively, they are reasoning

with one level of recursion in answering the first question. So now this kind of a

pattern, if you compare with their rationality

score– the rationality score is the question about

what would player one do in the first cell. So it turns out that there’s not

much difference between player one and player two in

the rationality score, in the instrumental

rationality, in applying their theory of mind

model to come up with the optimal choice. So there’s not much difference

in terms of the assigned role. So the rationality

error decreased slightly in the second block. There are four game sets,

then a break, and then a second

block; the error decreased slightly in that second block. But there's no difference

between the assignment of player one and player two. Furthermore, we measured

the response time

for the subjects engaging in this task. So you look at the

time it takes for them to answer these two questions. So on the left, these

are the subjects assigned as player one. This is their time to

answer the first question. And these two bars are the time

to answer the second question. Now, we sorted

these answers according to whether the answer is

consistent with a myopic model or a predictive

model, in other words, whether the answer

to the question is consistent with predictive

or deeper reasoning or myopic or shallow reasoning. The hypothesis is

that if you engage in recursive

theory of mind reasoning, you add basically one

more step to that. It will take you longer to

come to that conclusion, so therefore reaction

time would be longer. So this is borne out. As you can see, if there’s

predictive reasoning, then it takes a few

seconds longer than if you reason, like, myopically. Compare this with the

case where your answer to the second question, the

instrumental rationality, that is converting your model,

your prediction of what happens in cell B, to what

you should do in cell A. The reaction time,

the first [INAUDIBLE], there’s no difference

between whether this is based on a myopic

or predictive model. The same pattern

occurred when the subject is assigned as player two. So basically, one

extra step of recursion would cost a few more seconds. So this is

reaction-time data that supports this

idea that in fact they are engaging in this kind

of a recursive, or deep versus shallow, reasoning. Now, next, we look

at the statistics about the performance

of individual subjects as the game progresses. Now, as the game progresses, it

turns out that some subjects may start out by

reasoning myopically, but somewhere toward the

middle or the end they have an

aha moment. They switch to a predictive

mode of reasoning. So we measured

the switch point, looking at their

choice pattern. We measured the time at

which they made this switch to become predictive. And we do this for both the

subject assigned as player one and the subject

assigned as player two. So we looked at

the switch time. It turns out that these

switch-time dynamics do not differ across the

role assignment, going from myopic to predictive. So in other words, this

learning, the acquisition of this recursive thinking,

the moment it occurs to them that they

should actually think that way, think one more step ahead. That acquisition is independent

of the role assignment of the subjects. But it does matter in terms

of whether any subject would convert, would actually switch. So it turns out that in the end,

if the subjects are assigned as player one, only 43% of

the subjects get converted, acquire this kind

of deep reasoning. Whereas if they are

assigned as player two, there are 64% of the

subjects that get converted. So this ratio of conversion

differs by the role assignment, indicating that, indeed,

this change of perspective, in other words asking the

subjects to act as player two, does help them in

reasoning predictively. Now, there is a caveat

to this interpretation, because one possible

way to interpret this

pattern of data could also be, like,

maybe the subjects did not realize that the game has, like,

a final step, the third step. So maybe

this has to do with kind of a reasoning

horizon or decision horizon, like how far ahead you look. Maybe when they are reasoning

what player two would do from cell B to

C, they may not have reasoned far

enough to consider the possibility of what

happens from C to D, so this kind of horizon. So this pattern can also

be consistent with the possibility that there is a change in, or

realization of, the decision horizon or reasoning horizon. But nevertheless, this

difference of these two numbers clearly shows that there is a

benefit for reflexive reasoning by perspective taking. With a perspective switch,

there is a benefit. You’re more likely to engage in

the deeper level of recursion, and this is almost

by definition, given how the

recursion level is defined. OK, and then we also looked at

the effect of the opponent’s strategy– so how our

experimental confederate’s action could impact

the theory of mind model. So in this case, the opponent

on the top would not switch. So either they consistently

played myopically or consistently

played predictively. So we want to see how our

subjects would respond to this kind of a player. So when the opponent plays

consistently predictively, our subjects catch up. On average, gradually,

they increase the theory of mind

level they attribute to the opponent, mirroring the actual

behavior of the opponent. On the other hand, if the

opponent acts consistently myopically, then you

see the ToM model stays at the lower level,

which means that subjects are able to dynamically adjust

their model of their opponent throughout the experiment. Now, this is when we actually

have the opponents switch their strategy from

myopic to predictive and from predictive to myopic. So during the first

block, the opponent acts kind of predictively. So this is the data here. So in the first block, the

opponent acts predictively. So our subject would

have to follow and model them predictively. But during the second block,

we instruct the opponent to switch, to act myopically. And then as a result, the

subject’s model of the opponent also switches. You can see the predictive

score kind of goes down. On the other hand, this

is to be contrasted with when the opponent

first starts out playing myopically in the first block. And in the second block,

they become predictive. So you see this

kind of increase, and there is a crossover

between these [INAUDIBLE]. So this data shows that

the subjects are actually dynamically constructing and

adjusting their theory of mind model of their opponent. And their prediction

mirrors the way that opponent acts

in these games. OK, so to conclude,

we investigated depths in theory of mind reasoning. So basically the subjects

seem to start out with a default myopic

model, but then they are able to modify that

with the dynamic interaction with the opponent. And perspective would affect

the likelihood of engaging in this predictive reasoning. So there is a cost for taking

a third-person perspective compared with a

first-person perspective. But the perspective

taking does not affect the time to acquire

this predictive model. This reaction time data

for the recursive depths is consistent with the idea

that they are actually reasoning with depths of recursion. On the side of the

instrumental rationality, we see that performance

on the second question shows that the

rationality error is not affected by

a change of perspective. And also, it is not

affected by a change of the opponent’s strategy. So this seems to

suggest that depth of ToM recursion and

instrumental rationality may constitute

two separate modules, or two separate processes, for

this theory of mind reasoning. So to conclude my

presentation, I just give this motto of the day. “We more readily account for

others’ reaction to an action we plan than we realize

that others, when planning their action, may

have already accounted for our possible

counter-reaction.” This is from my favorite,

like, the Sorting Hat in the Harry Potter story. These days I’m watching

that with my son, and I really love

this kind of widget. So we more readily account

for others’ reaction to an action we plan than

we realize that others, when planning their action,

may have already accounted for our possible

counter-reactions. OK, thank you very much. That’s the end. [APPLAUSE] Yes? There’s one. Yes? AUDIENCE: How do you make

sure the subjects are clear about the

rules of the game? The learning curve

may reflect a subject gradually learning about

[? the rules ?] of the game rather than really [INAUDIBLE]. JUN ZHANG: Right,

good question– so we have the 24

training games. So before they play

all these games, they play 24 training

games in which the payoffs are very simple. So in other words,

say for instance the payoff for player two from

B-C-D is like one, two, three. So if they understand the rule

of the game, they should know. They should answer

that correctly. So we gave them

the training games, and we look at the

performance in the last, say, eight games of

the training games. And that, also, we coupled

with these catch trials. So these are the ones we

used to basically screen out the subjects. Right, so, yes. Yes? AUDIENCE: I have a

p-beauty contest question. You said that it wasn’t

typically any more than three or four orders

[? of recursion ?] [INAUDIBLE]. Has there been any sort of

correlation between, say, the size of the group that’s

asked the question [INAUDIBLE]? JUN ZHANG: Yeah,

that’s a good question. But I’m not aware

of– I mean, I don’t know much about the

[INAUDIBLE] literature on the question about the

size dependency on this. So the empirical kind

of answer to the level is normally, like, two to three. But it depends on what you

count as, say, the zero level and the first level

because you can say, maybe everybody submitted

like 50 instead. So then, like, the

zeroth level would really be 33 because 2/3

of that and then– so there are some arguments. So you can always have

like one level off. But normally it’s

like the argument has been, like, two to three. And we look at this. This is even, like,

one to two in our game, only investigating one to

two steps of recursion. Yes? AUDIENCE: [INAUDIBLE]

tell people about the recursion

and then [INAUDIBLE]. I don’t know if you do. Do they add more levels of

recursion to their thinking? JUN ZHANG: Ah, so

in other words, whether they learn to

be kind of recursive. It’s a good question. And I have been thinking

about using these as training games

for recursion. So there’s one

issue that we need to resolve first in terms

of why people, say, start out with, like, the

myopic model of the opponent. Or maybe this is kind of

an economy of effort thing. They don’t work hard enough,

and they gradually realize. They adjust and so forth. Or it could also be that this

is a rather abstract kind of a notion, and

the payoff numbers are giving out very

abstract [INAUDIBLE]. And what happens if

we want to give– say we have a very concrete

kind of reasoning paradigm, just like in the Wason

selection task. So there’s a difference

between running an abstract-reasoning game

versus a concrete reasoning game, right? So we have started

running subjects by actually giving

them stories, a cover story for three-step reasoning. Say, like, a typical example

would be an application game. So you apply for a college. You can decide whether

to apply to a college. The college can decide whether

to accept or not accept, and then you can decide whether

to go or not go to a college. So this is a very typical

kind of a three-stage game. The applicant has control

of the first and last step, and the university

or college has

control of

of payoff structures in terms of the desirability

of all possible outcomes, the desirability in a

sense of whether, say, a university would

like more students to apply but reject them so

that their rejection ratio can be higher. Or the university can have some

preference against a person they really didn’t want. They didn’t want

the person to come. And then for the applicant,

they can have relative rankings of the outcome based on

maybe their preference over the various outcomes. So you apply. You get rejected or you get accepted. So we give these

scenarios and then have people reason

on these scenarios and want to see any difference

from the abstract reasoning task. We don’t have the results yet,

but this is the direction that we are testing. But I think the

question about using those to train

recursive reasoning, this is a very

interesting question. We hope this set of games can

be useful towards that goal. AUDIENCE: In the history

of theory of mind, there have been [INAUDIBLE]

the average onset of a working theory of mind falls

in the development of a child between five

and seven years of age. But if a child is taught to

play games at earlier ages, does it or does it not

accelerate the development of a working theory of mind? Do you have an opinion on that? JUN ZHANG: Yeah, so the typical

developmental literature, the time that they pin down is

between, like, three and four. But of course, this is,

say, as evidenced by, like, false-belief tasks and so forth. It does not involve a

recursion [INAUDIBLE]. So I’m not aware of the

development literature about the recursion or what age. This would be an age where

actually training would be possible because they can

understand the structure. They can understand

instructions. So, yes, one thing

to try out is to have them reason through these games

and see whether that would help. And in fact, there are groups. I think they are applying

this to children. I mentioned earlier,

our paradigm has been adopted by one

group in the Netherlands, where it is being used for

training children. Dr. Verbrugge’s group

in the Netherlands, they are devising

a concrete game of this sort, a three-step

game which they are actually running on children. And there are some other

groups, but the other groups are running among adults. I think the first group

is running on children. Yes? AUDIENCE: Did you find

in the data any evidence of social preferences? So for example, like, a person

might– even though you told them to only look at

their own payoffs, they might prefer where

both the agents get three rather than passing it so

that they can get four. But the other person will

only end up with one. Does that explain any

of the myopic behavior? So this is a good

question about, say, whether our subjects

are playing as we instructed, in terms of

playing non-cooperatively. That’s number one. The second is that because

there is prolonged interaction, there is always this

possibility of signaling. So in other words, they may

play the first few games in a certain way to

signal the other person that they are playing this

way so the other person can react in kind. So that’s a

possibility. So we checked a few of the

heuristics about the playing of non-cooperative

[INAUDIBLE]– for instance, if there is a higher number in

that game in the ending cell, in D, whether you should

go based on heuristic because everybody

wants to go there. So we checked some

of these heuristics, but we haven’t systematically

checked the answers based on the second part of it. But we did, again, use

these catch trials, catch games, just to make

sure that they are not doing anything like that, because

if they are, we may spot them in

the catch trials. And then the subject

would get excluded. In the exit questionnaire,

we asked them about strategy and so forth. They didn’t mention that they

were signaling the opponent. We had one manipulation of

the apparent intelligence of a confederate. So in that manipulation,

our confederate comes in. And there are two conditions. One is intelligent. One is, like, a not

as intelligent one. So the intelligent confederate

comes in, like, a minute late and apologizes to the subject. I’m late because I’m just

tutoring math students. And the session runs

long, and the person carries a mathematics kind

of textbook and so forth. And then when sitting down,

interacting with the subjects, he says, OK, I’m a

member of the chess club in honors college

and so forth [INAUDIBLE]. And you get a condition

where, say, the person just carries like a

supermarket tabloid, apologizing that the

calculus tutoring session is running too long and

finds calculus very difficult. And then if asked what the

person wants to do, you know, I just want to hang around,

not having declared any major. So we run this manipulation,

and then we ask for ratings

for intelligence, friendliness, and so forth. It turns out, the

manipulation did work in affecting the people’s

rating of intelligence. I was actually surprised that

such a simple

manipulation worked: when we asked them to rate

the other person, they did show that difference. But it didn’t affect

depths of reasoning at all. So there’s no effect

on the recursive depths for this manipulation

in either direction. So there’s no effect on that. PRESENTER: All right, there

are refreshments outside, and you can continue

the discussion outside. Thank you. JUN ZHANG: OK,

thank you very much. Thank you. Thank you.