Reflexive Theory of Mind Reasoning in Games

JUN ZHANG: OK, thank you. Thank you, Tommy, for giving me this opportunity to give a lecture on one of my research topics. And I'm very glad to come here and talk with many of my colleagues, friends, and students. That quote, which Tommy apparently looked up on my website (which has yet to be updated), is actually taken from Jerome Busemeyer. He said that life is complex because it has both real and imaginary parts. I have another quote on my website which I like, which is actually about modeling. The story goes that trains in Austria usually run late. So one day a passenger waiting for a train in the station complained to the station master and asked: if the trains are always late, why do you have to print the timetable? What's the use of the timetable? And the station master looked at him and said: without the timetable, how would you know the train is late? I think this is quite an interesting remark about the way models and data interact. We build models, to some extent, to serve as a kind of benchmark, a way of thinking about the data and about the underlying process. So with that, I would like
to talk about, as Tommy said, my background: I first studied theoretical physics in China, then got a PhD in neuroscience, and am now in a psychology department with the goal of modeling psychological processes, modeling the mind. The approach I have taken is to look at the mind as a kind of computational device, software running on neural hardware. It's a particular view of computational intelligence, hence my motto of mind, machine, and mathematics. I've done a variety of work, but today I'll be talking about the so-called theory of mind. The notion of theory of mind has a particular meaning in developmental psychology, and in psychology in general, where it has evolved as a core domain of cognition. Here's a quote from Henry Wellman's book. Basically, he just
says that, well, we have this lay view about,
say, our perception, which forms the basis of our belief. And we have basic emotions
underlying neurophysiology which give rise to our desire. And then based on our belief
and desire, we take actions. And then we get
reactions from others and from our environment. So from perception, belief, emotion, desire, action, and so forth, we have a generic theory of this belief-desire psychology. And this emerges in the study of human cognition, in the development of cognition: very early in a child's life, he or she needs to acquire this kind of general model, the notion of a mind. So the theory-of-mind
type of reasoning, that is to say reasoning
about beliefs, desires, and actions, can be looked at from a more formal perspective via so-called game theory. Game theory emerged out of economics; it's a framework for modeling interpersonal, strategic interaction. The basic ingredients of a game are a set of players who can individually take actions, and, based on the joint actions of those individual players, an outcome. In other words, an outcome is a result of the combined actions of all the players. Now, given an outcome, different players can have different values, or payoffs, for that outcome. So the outcomes are determined by the actions of all the players, and each outcome may be valued differently by the different players. So you have players, outcomes, and payoffs, which are the basic ingredients of a game. Now, in terms of the
solutions for the game, the formal game-theoretic
solutions of the game, people have various notions. One for instance is the
notion of best reply, which involves a modelling
of the other person’s action and trying to devise
a best response to the actions of others. The idea of equilibrium is that
somehow the joint reasoning of the players leads to a state
in which no individuals would like to deviate from
that state of affairs. I’m going to get into
more details later on. In the economic
studies of the game, there’s some fundamental
assumptions being made. And these assumptions are the
axioms of common knowledge. That is so each player would
know the structure of the game, know the players,
know their actions, and know the payoffs or
the structure of the game. And this is all
common knowledge. Now being common
knowledge, that is to say that players know
the structure, the nature, the process of the game and know
that other players also know the strategies, the
payoffs of the game, and know that other players
know that they know the game structure and so forth. So this is a part of
the common knowledge. So you know something. You know others know something. And you know that
others know that you know something, et cetera. So it forms a common knowledge. And this axiom of common
knowledge is very important. It plays an important
role, for instance, for that notion of
the Nash equilibrium, for the equilibrium idea. And then along with
that is the notion of the axiom of rationality. In other words, players
tend to act rationally. Now, here’s a little
bit of trouble about this notion
of the rationality in the case of the games. And this is one of
the questions that we are going to address, a part of
this notion of the rationality. What we submit and
what we argue is that while we can
well define the notion of the so-called instrumental
rationality, that is to say once we have
a model of the game situation, a model of the other
players’ action in the game, then we can act rationally
as a best response, as a best action,
out of our model. But nevertheless,
this model itself, the modelling of others
in a game setting, is a much more involved process. It involves a recursive
kind of reasoning, which is yet to be explored. So let me give a detailed exposition of these concepts for the case of the prisoner's dilemma game and the notion of the Nash equilibrium in that game. A prisoner's dilemma game, as shown on the right, has two players. Each has a choice of being selfish or altruistic. The numbers in the cells represent the payoffs to these individuals, with smaller numbers meaning less value and larger numbers meaning higher value. OK, so five is more than
three and more than one, more than zero. So now if both players
act altruistically, say, for both players
you can get three points. But if one player acts
selfishly and the other person acts altruistically, the
player who plays selfish would get five points,
whereas the person who played altruistically
would get zero points. And if both play selfishly,
then they both get one point. OK, so this is the standard setting for the so-called prisoner's dilemma game. This game has puzzled many, many people. If you analyze what players would do in this kind of setting, the game-theoretic analysis, for instance the notion of the Nash equilibrium, basically says that people would choose a state of affairs from which nobody would want to unilaterally deviate. So in other words, if
somehow the players have chosen a
particular strategy, knowing that other players
have chosen that strategy, no player has the incentive
to unilaterally deviate, in other words, to change if the other player will not change. So let's take a look at this. For instance, the
three-three cell. That cell would not be a Nash equilibrium, because if we are in the three-three cell, the row player would change to the other strategy, the selfish strategy, thereby gaining more points: a three becomes a five. And likewise, the column player would also want to deviate from his or her choice to get a five. So three-three, the cooperative solution for this prisoner's dilemma game, is not a Nash equilibrium. Whereas the one-one,
the selfish-selfish, or the non-cooperative solution,
that is a Nash equilibrium, because nobody would like to deviate unilaterally without damaging him- or herself, without getting a lower payoff. OK, so that's the notion of the Nash equilibrium. One way to explain, or to solve, this prisoner's dilemma game is to use this Nash-equilibrium-based idea, which basically says that it's a mutual best response, based on a theory-of-mind kind of model of what others would do, that is to say, whether others would stick with their choice. This robustness to unilateral deviation is the foundation of the Nash equilibrium. Now, extensive
psychological research has shown that this notion of the Nash equilibrium is often not valid for actual humans when they engage in these games. So, to recap: one-one would be the Nash equilibrium in this case. Nobody would want to
unilaterally deviate. Now, another notion in solving a game is that of the so-called dominant strategy. A dominant strategy is a strategy such that, regardless of what the others do, it is better for the player to choose that strategy, that action, over the other. So now in this case, if you
look at the choices of the two players here, you soon realize that selfish, the one-one cell, is actually the dominant strategy for both players, for both the row and the column player. In other words, as a player in this game, you do not need to know what the other person has chosen. It is always better for
you to choose selfish. Why? Because if the other person
chooses selfish, of course, you should choose selfish. But if the other person
chooses altruistic, you should also choose
selfish, because it gets you higher points in this game. So the one-one solution, the non-cooperative solution, is actually very, very stable, because it is a combination of the dominant strategies of the two players. A dominant strategy is one that does not rely on a model of the other; you really don't even need to model the other player extensively. Just by analyzing your own payoffs, you are always better off choosing the dominant strategy.
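To make these two solution concepts concrete, here is a minimal Python sketch (not from the talk) that checks, for the payoff matrix described above, which cells are Nash equilibria and whether "selfish" is a dominant strategy for the row player.

```python
# Prisoner's dilemma payoffs from the talk: (row payoff, column payoff),
# indexed by (row action, column action).
payoffs = {
    ("altruistic", "altruistic"): (3, 3),
    ("altruistic", "selfish"):    (0, 5),
    ("selfish",    "altruistic"): (5, 0),
    ("selfish",    "selfish"):    (1, 1),
}
actions = ["altruistic", "selfish"]

def is_nash(row, col):
    """A cell is a Nash equilibrium if neither player gains by deviating alone."""
    r, c = payoffs[(row, col)]
    no_row_dev = all(payoffs[(a, col)][0] <= r for a in actions)
    no_col_dev = all(payoffs[(row, a)][1] <= c for a in actions)
    return no_row_dev and no_col_dev

def dominant_for_row(action):
    """An action is dominant if it is at least as good regardless of the opponent."""
    other = [a for a in actions if a != action][0]
    return all(payoffs[(action, c)][0] >= payoffs[(other, c)][0] for c in actions)

print([cell for cell in payoffs if is_nash(*cell)])   # only (selfish, selfish)
print(dominant_for_row("selfish"))                    # True
```

Note that the check never needs a model of the other player for the dominant strategy, only for the equilibrium condition, which is exactly the distinction the talk is drawing.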
So for the prisoner's dilemma game, the question is when cooperation can be individually rational, that is, rational for the players themselves as an instrumental strategy, and thus how you can ever get out of this non-cooperative solution. Well, it turns out that the
only way for that to happen is when you are not playing
this as a single-shot game. And you have to assume there’s
a probability of continuation. There is a probability
of continued interaction. So a nonzero probability of continued interaction is a necessary condition for this prisoner's dilemma game to evolve a cooperative solution. So basically in those
cases, now a rational player would maximize a
total expected payoff, taking into account the
probability of continuation, of continued interaction. What happens is that when that
is taken into account, then you can basically transform the
payoff values of your actions. And what happens in this case is that your play, your actual choice in the current game, not only gives rise to some immediate payoff but also influences the other person's choices in future rounds of the same game. So therefore, your
action not only impacts your immediate
reward, immediate gain, but also impacts on how others
would act for future games. So therefore, the
cooperation will arise. So it turns out that, after a rigorous analysis along these lines, one can show that a sufficient condition for this to happen is that the continuation probability exceeds a certain threshold. So to rationally solve the prisoner's dilemma game, one requirement is that you have continued interaction, or a presumed probability or expectation of continued interaction. So that kind of shows
that all these kinds of cooperative solutions in the prisoner's dilemma game rely on that kind of expectation, which may be neurobiologically grounded in the brain through evolution, for instance.
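The talk does not spell out the threshold, but under one standard assumption (grim-trigger strategies in the repeated game, where a single defection triggers permanent mutual defection), it can be computed explicitly; this is an illustrative sketch, not the speaker's own derivation.

```python
# Assumed grim-trigger analysis: a defector gets the temptation payoff T once
# and then mutual punishment P forever; cooperating yields the reward R every
# round. With continuation probability delta, cooperation is sustainable when
#   R / (1 - delta) >= T + delta * P / (1 - delta),
# which rearranges to delta >= (T - R) / (T - P).

def cooperation_threshold(T, R, P):
    """Minimum continuation probability sustaining mutual cooperation."""
    return (T - R) / (T - P)

# Payoffs from the lecture's prisoner's dilemma: T=5 (defect on a cooperator),
# R=3 (mutual cooperation), P=1 (mutual defection).
print(cooperation_threshold(T=5, R=3, P=1))  # 0.5
```

So with these particular numbers, the expectation of continued interaction would have to exceed one half for cooperation to be individually rational under this assumed strategy.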
So that is a rational solution for cooperation. Now, I'm not going to expand on this type of analysis of the prisoner's dilemma game and how cooperation arises in it. And I have some
theoretical results, a theoretical paper, on this. But today I'm going to talk about a related issue: the recursive depth in this theory-of-mind reasoning. OK, so to explain what that is, I'm going to give you an example, the so-called p-beauty contest game. Now suppose we have,
say, the audience here. Suppose I asked you to submit
a number between 0 and 100. And you’re going to write
it on a piece of paper and hand it to me. I’m going to collect
all the papers, and I’m going to do an
average of all the numbers that you submitted, and then I'm going to multiply that average by 2/3. So after multiplying the average by 2/3, I get some number. Now, the rule of the game is that whoever submits the number closest to 2/3 of the average, without exceeding it, wins the game. OK, so everybody gets a
chance to, say, submit a number between 0 and 100. You submit any number. To make it simple, you can
maybe just submit an integer. And I’m going to take an average
of this and multiply 2/3, and that’s my target number. You are going to shoot
for the target number. Whoever gets closest
to that number would win, say, a big prize. I actually run this for my
mathematical psychology class, and by actually
running this experiment and saying that whoever
wins this really can get a boost in their grade. First, think about this. What number would you submit? Suppose this is a very important
kind of consequence for you. What number would you submit? I want you to think maybe for
one minute, a couple minutes maybe. AUDIENCE: 30. JUN ZHANG: So
that’s then 30 here. Right, 30? OK. OK, so actually let’s go
through some of the reasoning that you might have in thinking
about what number to submit. OK, now, in order to win this
game, what should I submit? Everyone will submit between 0 and 100, and I'm going to take the average and multiply it by 2/3. So the target number cannot be more than 67: assuming everybody submits 100, the average is 100, and 2/3 of that is 67. So there's no way I can win by submitting anything above 67, because I need to be at or below my target number. So I should not submit
anything above 67. But now realizing that maybe
the person that sits next to me thinks the same thing,
they realize that, too, so they think nobody will
submit anything above 67. So then if everybody submits
67, I’m going to take 2/3 of 67, which is 44. So maybe I should
submit, like, 44. And then I realize
that other people may realize the same thing. So I can [INAUDIBLE] to redo
my calculations and so forth. So I submit 29 and so forth. If you keep doing
this, very soon you will find out the best
thing to submit is zero. So this is an example of
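This iterated reasoning can be sketched in a few lines of Python (an editorial illustration, with 100 as an assumed level-0 anchor): each extra level of recursion best-responds by taking 2/3 of the previous level's number.

```python
# Level-k reasoning in the 2/3-beauty contest: a level-0 player anchors at
# some starting guess; each deeper level multiplies the previous guess by 2/3.
def level_k_guess(start, k):
    guess = start
    for _ in range(k):
        guess *= 2 / 3
    return guess

for k in range(5):
    print(k, round(level_k_guess(100, k)))
# 0 100, 1 67, 2 44, 3 30, 4 20, ... shrinking toward zero
```

(The talk's 29 at the third step comes from rounding 44 before taking 2/3 again; either way the sequence converges to zero, the equilibrium.)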
So this is an example of recursive reasoning, of theory-of-mind recursion: you think about what others will think, and about what others think that you will think, and so forth. Well, I actually ran this
experiment, you know, with my class, and others
have run this experiment. It turns out that
people would never submit zero, very few people. There are people who submit
zero anyway, but not all. But you can argue,
well, maybe the number they think about maybe is 15. That’s in the middle, and
then 2/3 of that is, like, 33. Maybe that’s where
the 33 comes about. It turns out that
such an experiment has been run on subjects
in various contexts. They ran this with college students in Germany, and the average number they get is about 35. And they ran it with Caltech students, who averaged about 24; that's roughly one level of reasoning deeper. And then they have run other, more prestigious subject groups, like readers of the Financial Times in London. They actually gave out
a real prize for that. So they opened up a window for,
say, two weeks for submission. And then they are going
to tally the results, and they’re going to give
out an actual monetary reward for the person
that does the best. So it turns out that for
that group, the numbers they submit are always
between, like, 24 and 35. OK, so that’s the mode. I mean, there is a whole
spectrum of numbers being submitted, but the
mode of that, which indicates that
people actually do know more than, like, second
or third level of recursion. Or even maybe, depending on
your interpretation of data, it can be between 1
and 2, if you think everybody starts out with 50. So this level of
recursion, of how deep people go into it in
real, social interactions and so forth, is
of interest here. In my first studies, I looked into the literature on how the depth of recursive reasoning is measured. It turns out the existing paradigms for measuring this kind of recursive depth rely on so-called dominance-solvable games, through iterated removal of dominated strategies. And what they do is that
they always run a few games, but against many, many subjects. And so for these
games, you can argue that if a subject is faced with
many, many different subjects in doing this kind
of reasoning, then they will have a model of
the strategic sophistication of the population. So that may be the reason
why people don’t even go to, like, 0, in this
p-beauty contest game. Well, think about this. Even though you think
theoretically the equilibrium solution is zero, you may
not think that other people may have thought about that. Or maybe there’s a distribution
of the strategic sophistication that leads you to say,
oh, maybe I’ll just get like 24 or 35 or something. So that’s the
explanation for that. So even though you are able
to reason in great depth, maybe you’re just not doing
that because your model of the general
population is such that there’s a distribution of
the depths of this recursion. So, to study this depth, or order, of recursion in an individualized, dynamic social-interaction setting, we propose a paradigm with a series of trial-unique games whose diagnostic payoff structure allows us to differentiate this order or depth of recursion. And this paradigm has since been
adopted with some modification by a variety of other groups to
probe this depth of recursion. And I’m going to explain to
you what that paradigm is. Now, the paradigm works in the following way. We have a three-step game with two players, player one and player two. The game starts in cell A and can move to B, then to C, and then to D; there are four cells. The numbers, as before, represent the payoffs to the players, where the first number goes to player one and the second number goes to player two. Player one controls the first and the third move, and player two controls the second move.
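The whole sequential game can be solved by backward induction, working from the last decision to the first. Here is a minimal sketch with hypothetical payoff numbers (the actual games are trial-unique, so these are illustrative only); cells map to (player one's payoff, player two's payoff).

```python
# Hypothetical payoffs for one game; player one decides at A and C,
# player two decides at B. Backward induction from the last decision:
payoff = {"A": (2, 2), "B": (1, 3), "C": (1, 4), "D": (2, 2)}

def backward_induction(payoff):
    """Return the cell where the game ends if both players are predictive."""
    # Player one at C: stay at C, or end the game in D?
    end_after_c = "C" if payoff["C"][0] >= payoff["D"][0] else "D"
    # Player two at B: stay, or move knowing the game then ends in end_after_c?
    end_after_b = "B" if payoff["B"][1] >= payoff[end_after_c][1] else end_after_c
    # Player one at A: stay, or move knowing the game then ends in end_after_b?
    return "A" if payoff["A"][0] >= payoff[end_after_b][0] else end_after_b

print(backward_induction(payoff))
```

With these assumed numbers, player one anticipates that moving would not pay, so a fully predictive game ends immediately in A; change the payoffs and the stopping cell changes with them.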
So this is a sequentially played game, a sequential game. You start out, say, in cell A, and player one controls the first move. Player one can decide whether to stay in cell A and collect the points, or to move on
to cell B and then pass control to player two. If player one decides to move to cell B, then it's player two's turn to decide whether to stay at cell B and collect the reward, or to move the piece to cell C and let player one have the final say. If player two then decides to move to cell C, player one has one more chance: either stay at C or move to D. And that's it. OK, so whichever player
decides to stop, then the game ends in that cell,
and each player would collect their
respective payoff. And as you can see, the numbers
differ in these four cells for the two players. And also they differ
from game to game. This payoff matrix is
trial unique in the sense that players encounter that
once in the whole experiment. So, now, this
sequential move game, there are a maximum
of three steps, with four possible outcomes. So if you draw out in kind
of a game-tree diagram, you can see that player one
controls the first and third, and player two controls
the second move. And these are the payoff
values, and so forth. So we instruct the
players such that the game is non-cooperative in the
sense that the goal is to earn as many points as
possible for themselves and not to worry about how much the other player, their opponent or co-player, earns. So the questions we ask our
subjects are as follows. We ask them two questions in sequence. First, we ask the subject: what would player two do if the game progresses to cell B? And then we ask them: what would player one do in cell A, given the answer to question one? So the first question is basically about modeling what would happen, and the second question is about translating that model into a rational action. So let's take this
as an example. So we have these payoff numbers. I want you to kind
of look at this and see how you would
answer these questions. So now, what would
player two do in cell B with this payoff matrix? So whether player two would
move or would stay– OK, so that’s the question. So there are people who
think they should move. Oh, OK, so we do have
move and stay, right? OK. So now, say we're in cell B: should player two move or stay? You might think, well, player two already has a three, but there's a potential of getting a four if he or she moves. But if you think further, player one has the final chance of deciding whether to stay in cell C or move away from it. So one ought to think about whether it makes sense for player one to move away from cell C or to stay there on the final move. OK, so in this case, player one would get a one by staying in C and a two by moving to D. So if the game progressed to cell C, player one would want to move to D. Therefore player two would not want to move to cell C, because there is no way the game can end in cell C. But you can see already
in this type of reasoning, or in answering
the question, you invoke a kind of a
theory of mind reasoning. On one hand, you can reason that player two should move, because the payoff in C is better than in B. But on the other hand, if player two thinks about what player one would do, anticipating the move from C to D, then player two should not move. So you already see this
kind of recursive reasoning. So in one case, a myopic kind of reasoning would say player two should move, because they can get a four. But a more predictive reasoner would say player two should not move away from B, because player one would want to move from C to D if he or she had the chance.
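The myopic-versus-predictive contrast, and the diagnosticity of a game, can be written out explicitly. In this sketch only the numbers mentioned in the talk are fixed (3 and 4 for player two in B and C, 1 and 2 for player one in C and D); the remaining payoffs are hypothetical fill-ins.

```python
# Payoffs are (player one, player two) per cell.
def player_two_answers(payoff):
    """Player two's answer at cell B under myopic vs. predictive reasoning."""
    myopic = "move" if payoff["C"][1] > payoff["B"][1] else "stay"
    # Predictive: first work out player one's final choice at C.
    end_cell = "C" if payoff["C"][0] >= payoff["D"][0] else "D"
    predictive = "move" if payoff[end_cell][1] > payoff["B"][1] else "stay"
    return myopic, predictive

def is_diagnostic(payoff):
    """Diagnostic games make myopic and predictive reasoners answer differently."""
    myopic, predictive = player_two_answers(payoff)
    return myopic != predictive

game1 = {"B": (2, 3), "C": (1, 4), "D": (2, 2)}   # the diagnostic example above
print(player_two_answers(game1))   # ('move', 'stay')
print(is_diagnostic(game1))        # True
game2 = {"B": (1, 3), "C": (2, 1), "D": (1, 2)}   # a non-diagnostic catch trial
print(player_two_answers(game2))   # ('stay', 'stay')
print(is_diagnostic(game2))        # False
```

The same comparison underlies the non-diagnostic catch trials discussed below: when the two reasoning modes give the same answer, the game cannot reveal the subject's depth of recursion.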
So this level of recursion can be revealed by how you answer that particular question. And then given that
answer, you can see what would be
the rational thing to do for player one in
cell A. So what would be a rational thing to do? Well, you just need to translate
that model into the action. So these are the two
questions we ask. For this particular game,
this is a diagnostic game, a diagnostic in the
sense that depending on the level of recursion,
whether our subject is reasoning myopically
or predictively, you give opposite
answers to the question whether player two would
move from B or not. So it’s diagnostic. Now, compare with
this game here. We have another set
of payoff values here. In this case, think myopically: when player two is in cell B, they should not move, because otherwise they would get a one, decreasing their points. And if you think deeply, predictively, player one would not move away from C to D, because there is a two in C versus a one in D; so for that reason, too, player two should not move. These two answers, whether you engage in myopic reasoning or predictive reasoning, are the same. So this is a game that is non-diagnostic. So we have classes of
games which on one hand are diagnostic and on the
other hand are non-diagnostic. But for the
non-diagnostic games, we use them as a way to serve
as our so-called catch trials because we want the subjects
to engage in reasoning. And if somehow they are
not paying attention to the payoff numbers, they
are just randomly choosing and so forth, they
will get those wrong. So we use them as catch trials. Our block of the games would
mix the diagnostic games with a non-diagnostic game,
using the non-diagnostic games as catch games, catch trials. So we have diagnostic
and non-diagnostic games. And we have four
strategically distinct types of diagnostic games, based
on the payoff values. There are four different
kinds of games. And we balance. We counterbalance everything
in terms of the predictions, whether player two will move away or not. And we also counterbalance for risk attitude, heuristics like "I would move if I started with a lower payoff, and I wouldn't move if I started with a higher payoff." We make everything
counterbalanced. So in this case, the
player two's strategy, as I said, may be either myopic or predictive at cell B. And player one's model of player two can be either myopic or predictive; player one's choice at cell A would depend on this model. Now in the actual experiment,
we assign either player one or player two– one of them
of course is our subject. But the other is an
experimental confederate. We instruct the confederates
to perform according to our instructions,
and these are the experimental manipulations. We start with 24
games as training for training the subjects to
familiarize them with the game. And these are very
simple payoff structures, so the subject would have
no problem in deciding what player two would do in
cell B, move or not move. And then we have two test
blocks, each with 20 games. We have 16 diagnostic games
interlinked with four catch trial or non-diagnostic ones. And all games are unique, with
distinct payoff structures. So they only encounter that
once in the experiment. And to avoid heuristics,
we block, like, all the games that
start with a two with all of the games that start
with a three in separate blocks to avoid an inference
of the risk attitude. So we have a total
of 64 games, all presented in the fixed order. Now, subjects are
assigned either as player one or player two. So it’s a
between-subjects design. The reason we use different assignments is that we want to test so-called perspective taking: we ask the same kind of question, but we want the subjects to be in different shoes, assuming the role of player one or player two, in this experimental
experimental confederate. And the games are actually
played out by computer. We actually have a confederate who comes in; they are introduced to each other, interact with our subject, and then go to different rooms to play the game. So subjects are asked to
answer these two following questions in a fixed order. So the first question is, what’s
player two’s optimal strategy at cell B? And question two is player
one’s optimal strategy at cell A. Yes? AUDIENCE: Do they
have infinite time? How much time? Do they have x amount of time? JUN ZHANG: They can take as much time as they want. But once they are familiarized with the game, it takes them only a few seconds. These are subjects recruited from the intro-psych subject pool, and they always want to get out of the experiment as quickly as possible. We allow about one or two hours
for the experimental blocks. So the first question is
about the optimal strategy. This is a reasoning question. This is what we
call an anticipation question, or third person. When the subject is
assigned as player one, then this is an
anticipation question, a third-person perspective. So we ask whether the opponent would move if the game progressed to cell B, and whether you will move away from cell A; that is question two. On the other hand,
if the subject is assigned as
player two– and this is really their planning
for their move in cell B– and then we ask whether you
will move if the game progresses to cell B and whether it
is smart for the opponent to move away from cell A.
So it’s the same question, but phrased slightly
differently to make the subjects reason differently, to take
different perspectives based on their assignment. The first question is referred
to as the theory of mind, or the [INAUDIBLE] question. And the second question is
an instrumental rationality question. For data analysis, we
exclude unmotivated, confused subjects from
our data analysis, based on their performance
on the training games and also on the catch trials. So after this exclusion,
we have 28 subjects playing player one and 36
subjects playing as player two. It’s a between-subject design. And then to score their
choices for the first question: a myopic choice is scored as zero, and a predictive choice is scored as one. To score the second question, on instrumental rationality: if their choice is consistent with their theory-of-mind model, we score that as no rationality error, zero; otherwise, we score it as one, a rationality error. So rationality performance is scored with respect to the theory-of-mind model they hold. The scores for four successive games are averaged; we call this a game set. This gives a predictive score from 0 up to 1.0, in steps of 0.25. If they act predictively on all four games of the game set, they get a score of one; otherwise, the score is just the proportion of games on which they act predictively.
just see the proportion of times they act predictively. So now here’s the data. So this is a distribution
of this predictive score for the subjects when they
are assigned as player one, on the left, and player
two, on the right. So the different shades
represent the predictive score. And the height of
the bar represents the amount of subjects,
total percent, 100%. So as you can see, in
both cases the subject starts out as
having a relatively low predictive score. That is to say they play
relatively kind of myopically. There’s very few
people, like, who have a score of, say, one
in the very beginning. And gradually, as
the game progresses, you see the
distribution changes. So there’s a growth of
the number of people who score higher and higher. And this is the case both for
the subjects as player one and subjects as player two. So in the case,
they are actually, through their interaction
with the same subject, they are learning something. They are enhancing their model. Now, in the data that I show
you, the confederate, they always act predictively. OK, the confederate
always acts predictively. This is the data when we average
across the entire population. So the previous slide
shows you the distribution of the number of subjects with
the various predictive scores. This is averaging across
that whole population. As you can see, the
predictive scores would increase as the
interaction with the opponent. As player one, so they increase. And towards the end, they get
a score of like 0.65 or so, 0.64, 0.65. As player two, they got a much
higher score, too, like, 0.9. The player-two
condition is a case where the subjects are putting themselves in
the shoes of the other person. So, effectively, they are reasoning
with one level of recursion in answering the first question. So now this kind of a
pattern, if you compare with their rationality
score– the rationality score is the question about
what would player one do in the first cell. So it turns out that there’s not
much difference between player one and player two in
the rationality score, in the instrumental
rationality, in applying their theory of mind
model to come up with the optimal choice. So there’s not much difference
in terms of the assigned role. The rationality
error decreased slightly in the second block. So there are four game sets, then a break,
and then a second
block, in which the error decreased slightly. But there’s no difference
between the assignment of player one and player two. Furthermore, we measured
the response time. So we measured the response
time for the subjects engaging in this task. So you look at the
time it takes for them to answer these two questions. So on the left, these
are the subjects assigned as player one. This is their time to
answer the first question. And these two bars are the time
to answer the second question. Now, we sorted
these answers according to whether the answer is
consistent with a myopic model or a predictive
model, in other words, whether the answer
to the question is consistent with predictive
or deeper reasoning or myopic or shallow reasoning. And the
hypothesis is that if you engage in recursive
theory of mind reasoning, you add basically one
more step to that. It will take you longer to
come to that conclusion, so therefore reaction
time would be longer. So this is borne out. As you can see, if there’s
a predictive reasoning, then it takes a few
seconds longer than if you reason, like, myopically. Compare this with the
case where your answer to the second question, the
instrumental rationality, that is converting your model,
your prediction of what happens in cell B, to what
you should do in cell A. The reaction time,
the first [INAUDIBLE], there’s no difference
between whether this is based on a myopic
or predictive model. The same pattern
occurred when player is assigned as player two. So basically, one
extra step of recursion would cost a few more seconds. So this is
reaction-time data that supports this
idea that in fact they are engaging this kind
of a recursive, or deep versus shallow, reasoning. Now, next, we look
at the statistics about the performance
of individual subjects as the game progresses. Now, as the game progresses, it
turns out that some subjects, they may start out by
reasoning myopically, but eventually towards the
end they kind of realize, maybe they have an
aha moment in the middle, and they switch to a predictive
mode of reasoning. So we measured like
the switch point, if you look at their
choice pattern. So we measured the time by
which they did this switch to become predictive. And we do this for both the
subject assigned as player one and the subject
assigned as player two. So we looked at
the switching time. It turns out that these
switching-time dynamics do not differ across the
role assignment, going from myopic to predictive. So in other words, this
learning, or the acquisition of this recursive thinking,
or that it happened, it occurred to them they
should actually think that way, think kind of one more step. That acquisition is independent
of the role assignment of the subjects. But it does matter in terms
of whether any subject would convert, would actually switch. So it turns out that in the end,
if the subjects are assigned as player one, only 43% of
the subjects get converted, acquire this kind
of deep reasoning. Whereas if they are
assigned as player two, there are 64% of the
subjects that get converted. So this ratio of conversion
differs by the role assignment, indicating that, indeed,
this change of perspective, in other words asking the
subjects to act as player two, does help them in
reasoning predictively. Now, there is a
caveat to this kind of
interpretation, because one possible
interpretation, or a way to interpret this
pattern of data, could also be, like,
maybe the subjects did not realize that the game has, like,
a final step, the third step. So maybe this has to do with kind of a reasoning
horizon or decision horizon, like how far ahead you look. Maybe when they are reasoning
what player two would do from cell B to
C, they may not have reasoned far
enough to consider the possibility of what
happens with C to D, so this kind of horizon. So this pattern can also
be consistent with the possibility that there is a change or
realization of the decision horizon or reasoning horizon. But nevertheless, this
difference of these two numbers clearly shows that there is a
benefit for reflexive reasoning by perspective taking. With a perspective switch,
there is a benefit. You’re more likely to engage in
the deeper level of recursion, and this is almost
by definition, definition about
recursive level. OK, and then we also looked at
the effect of the opponent’s strategy– so what our
experimental confederate, how his or her action could impact
the theory of mind model. So in this case, the opponent
on the top would not switch. So either they consistently
played myopically or consistently
played predictively. So we want to see how our
subjects would respond to this kind of a player. So when the opponent plays
consistently predictively, our subject kind of catches up. On average gradually,
they just increase their level, the theory of
mind level of the opponent, mirroring the actual
behavior of the opponent. On the other hand, if the
opponent acts consistently myopically, then you
see the ToM model stays at the lower level,
which means that subjects are able to dynamically adjust
their model of their opponent throughout the experiment. Now, this is when we actually
have the opponents switch their strategy from a
myopic to a predictive and from predictive to myopic. So during the first
block, the opponent acts kind of predictively. So this is the data here. So in the first block, the
opponent acts predictively. So our subjects would
have to follow, modeling them as predictive. But during the second block,
we instruct the opponent to switch, to act myopically. And then as a result, the
subject’s model of the opponent also switches. You can see the predictive
score kind of goes down. On the other hand, this
is to be contrasted with when the opponent
first starts out playing myopically in the first block. And in the second block,
they become predictive. So you see this
kind of increase, and there is a crossover. There’s a crossover
between these [INAUDIBLE]. So this data shows that
the subjects are actually dynamically constructing and
adjusting their theory of mind model of their opponent. And their prediction
mirrors the way that opponent acts
in these games. OK, so to conclude,
we investigated depths of theory of mind reasoning. So basically the subjects
seem to start out with a default myopic
model, but then they are able to modify that
with the dynamic interaction with the opponent. And perspective would affect
the likelihood of engaging in this predictive reasoning. So there is a cost for taking
a third-person perspective compared with a
first-person perspective. But the perspective
taking does not affect the time to acquire
this predictive model. This reaction time data
for the recursive depths is consistent with the idea
that they are actually reasoning with depths of recursion. On the side of the
instrumental rationality, we see that the performance
on the second question shows that their
rationality error, their
instrumental rationality, is not affected by
a change of perspective. And also, it is not
affected by a change of the opponent’s strategy. So this seems to be
suggesting that depths of ToM recursion and
instrumental rationality, they may constitute
two separate modules, or two separate processes, for
this theory of mind reasoning. So to conclude my
presentation, I just give this motto of the day. “We more readily account for
others’ reaction to an action we plan than we realize
that others, when planning their action, may
have already accounted for our possible
counter-reaction.” This is from my favorite,
like, the sorting hat in the Harry Potter story. These days I’m watching
that with my son, and I really love
this kind of widget. So we more readily account
for others’ reaction to an action we plan than
we realize that others, when planning their action,
may have already accounted for our possible
counter-reactions. OK, thank you very much. That’s the end. [APPLAUSE] Yes? There’s one. Yes? AUDIENCE: How do you make
sure the subjects are clear about the
rules of the game? The learning curve
may involve, like, a subject gradually learns about
[? the rules ?] of the game rather than really [INAUDIBLE]. JUN ZHANG: Right,
good question– so we have the 24
training games. So before they play
all these games, they play 24 training
games in which the payoffs are very simple. So in other words,
say for instance the payoff for player two from
B-C-D is like one, two, three. So if they understand the rule
of the game, they should answer
that correctly. So we gave them
the training games, and we look at the
performance in the last, say, eight games of
the training games. And that, also, we coupled
with these catch trials. So these are the ones we
used to basically screen out the subjects. Right, so, yes. Yes? AUDIENCE: I have a
p-beauty contest question. You said that it wasn’t
typically any more than three or four orders
[? of recursion ?] [INAUDIBLE]. Has there been any sort of
correlation between, say, the size of the group that’s
asked the question [INAUDIBLE]? JUN ZHANG: Yeah,
that’s a good question. But I’m not aware
of– I mean, I don’t know much about the
[INAUDIBLE] literature on the question about the
size dependency on this. So the empirical kind
of answer to the level is normally, like, two to three. But it depends on what you
count as, say, the zero level and the first level
because you can say, maybe everybody submitted
like 15 instead. So then, like, the
zeroth level would really be 33 because 2/3
of that and then– so there are some arguments. So you can always have
like one level off. But normally it’s
like the argument has been, like, two to three. And we look at this. This is even, like,
one to two in our game, only investigating one to
two steps of recursion. Yes? AUDIENCE: [INAUDIBLE]
tell people about the recursion
and then [INAUDIBLE]. I don’t know if you do. Do they add more levels of
recursion to their thinking? JUN ZHANG: Ah, so
in other words, whether they learn to
be kind of recursive. It’s a good question. And I have been thinking
about just using these as, like, training games. I’ve been thinking
about using these as training for recursion. So there’s one
issue that we need to resolve first in terms
of kind of why people, say, start out with, like, the
myopic model of the opponent. Or maybe this is kind of
an economy of effort thing. They don’t work hard enough,
and they gradually realize. They adjust and so forth. Or it could also be that this
is a rather abstract kind of a notion, and
the payoff numbers are giving out very
abstract [INAUDIBLE]. And what happens if
we want to give– say we have a very concrete
kind of reasoning paradigm, just like in the
Wason selection task. So there’s a difference
between running an abstract-reasoning game
versus a concrete reasoning game, right? So we have started
running subjects by actually giving
them stories, a cover story for three-step reasoning. Say, like, a typical example
would be an application game. So you apply for a college. You can decide whether
to apply to a college. The college can decide whether
to accept or not accept, and then you can decide whether
to go or not go to a college. So this is a very typical
kind of a three-stage game. The applicant has control
of the first and last step, and the university
has the control of
the second step. So you can give a variety
of payoff structures in terms of the desirability
of all possible outcomes, the desirability in a
sense of whether, say, a university would
like more students to apply but reject them so
that their rejection ratio can be higher. Or the university can have some
preference about a person they really didn’t want; they didn’t want
the person to come. And then for the applicant,
they can have relative rankings of the outcome based on
maybe their preference of the various outcomes. So you apply. You reject or you get accepted. So we give these
scenarios and then have people reason
on these scenarios and want to see any difference
from the abstract reasoning task. We don’t have the result yet,
but this is the direction that we are testing. But I think the
question about using those to train
recursive reasoning, this is a very
interesting question. We hope this set of games can
be useful towards that goal. AUDIENCE: In the history
of theory of mind, there have been [INAUDIBLE]
the average onset of a working theory of mind falls
in the development of a child between five
and seven years of age. But if a child is taught to
play games at earlier ages, does it or does it not
accelerate the development of a working theory of mind? Do you have an opinion on that? JUN ZHANG: Yeah, so the typical
developmental literature, the time that they pin down is
between, like, three and four. But of course, this is,
say, as evidenced by, like, false-belief tasks and so forth. It does not involve a
recursion [INAUDIBLE]. So I’m not aware of the
development literature about the recursion or what age. This would be an age where
actually training would be possible because they can
understand the structure. They can understand
instructions. So, yes, one thing
to try out is to have them reason through these games
and whether that would help. And in fact, there are groups. I think they are applying
this to children. I mentioned earlier,
our paradigm has been adopted by one
group in the Netherlands, where it is being used on
children for training. Dr. Verbrugge’s group
in the Netherlands, they are devising
a concrete game of this sort, a three-step
game which they are actually running on children. And there are some other
groups, but the other groups are running among adults. I think the first group
is running on children. Yes? AUDIENCE: Did you find
in the data any evidence of social preferences? So for example, like, a person
might– even though you told them to only look at
their own payoffs, they might prefer where
both the agents get three rather than passing it so
that they can get four. But the other person will
only end up with one. Does that explain any
of the myopic behavior? JUN ZHANG: So this is a good
question about, say, whether our subjects
are playing as we instructed, in terms of
playing a non-cooperative game. And that’s number one. The second is that because
there is prolonged interaction, there is always this
possibility of signaling. So in other words, they may
play the first few games in a certain way to
signal the other person that they are playing this
way so the other person can react in kind. So that’s kind of a,
like, possibility. So we checked a few of the
heuristics about the playing of non-cooperative
[INAUDIBLE]– for instance, if there is a higher number in
that game in the ending cell, in D, whether you should
go based on heuristic because everybody
wants to go there. So we checked some
of these heuristics, but we hadn’t systematically
kind of checked the answers based on the second part of it. But we did, again, use
these catch trials, catch games, just to make
sure that they are not doing anything like that because
if they are doing anything like that, we may catch them. We may spot them in
the catch trials. And then the subject
would get excluded. In the exit questionnaire,
we asked them about strategy and so forth. They didn’t mention that they
were signalling the opponent. We had one manipulation of
the apparent intelligence of a confederate. So in that manipulation,
our confederate comes in. And there are two conditions. One is intelligent. One is, like, a not
as intelligent one. So the intelligent confederate
comes in, like, a minute late and apologizes to the subject. I’m late because I’m just
tutoring math students. And the session runs
long, and the person carries a mathematics kind
of textbook and so forth. And then when sitting down,
interacting with the subjects, and says, OK, I’m a
member of the chess club in honors college
and so forth [INAUDIBLE]. And you get the other condition
where, say, the person just carries, like, a
supermarket tabloid, apologizing that their
calculus tutoring session ran too long and
saying they find calculus very difficult. And then if asked what the
person wants to do, they say, you know, I just want to hang around,
haven’t declared any major. So we ran this manipulation,
and then we ask for ratings
for intelligence, friendliness, and so forth. It turns out, the
manipulation did work in affecting the people’s
rating of intelligence. I’m actually surprised by
the effectiveness of this simple
manipulation: when we asked them to rate
the other person, they did show that difference. But it didn’t affect
depths of reasoning at all. So there’s no effect
on the recursive depths for this manipulation
in either direction. So there’s no effect on that. PRESENTER: All right, there
are refreshments outside, and you can continue
the discussion outside. Thank you. JUN ZHANG: OK,
thank you very much. Thank you. Thank you.
