Mechanisms of Transitions in Strategic Behavior in a Mixed-Strategy Game

By Mitchell Ostrow

My most significant research project involves computational modeling of macaque behavior and its neural correlates, conducted in a psychiatry lab at Yale University.

Matching Pennies is a binary-choice game played repeatedly against an opponent. In this instance, six monkeys played an unbiased matching pennies game across many sessions of varying length. On each trial, a monkey made a saccade to a target on the left or right of a fixation point. Because the optimal policy is to choose each option with equal probability, behavior appears random. However, a monkey cannot play truly randomly, due to biological constraints. Preliminary results have shown that simple learning strategies, such as those following Thorndike's Law of Effect or reinforcement learning theory, predict behavior significantly better than chance. However, the effect sizes are weak: these models typically achieve only slightly above-chance accuracy (0.51–0.6). We sought to infer the function that generates the policy, or decision strategy, used by these monkeys.
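To make the above concrete, here is a minimal sketch of the kind of learning model being fit: a tabular, Law-of-Effect-style value learner choosing between left and right saccades via a softmax, playing against an unbiased opponent. The parameters (`alpha`, `beta`) and the prediction rule are illustrative assumptions, not the lab's actual model; the point is that against a random opponent such a model predicts choices only slightly better than chance.

```python
import math
import random

def simulate_q_learner(n_trials=2000, alpha=0.2, beta=3.0, seed=0):
    """Tabular value learner playing matching pennies vs. a random opponent.

    Returns the fraction of trials on which the model's higher-probability
    action matched the sampled choice -- a toy analogue of the predictive
    accuracy reported when fitting learning models to monkey choice data.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]  # action values: 0 = left saccade, 1 = right saccade
    correct = 0
    for _ in range(n_trials):
        # Softmax probability of choosing the right target
        p_right = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        choice = 1 if rng.random() < p_right else 0
        predicted = 1 if p_right > 0.5 else 0
        correct += int(choice == predicted)
        opponent = rng.randrange(2)  # unbiased matching-pennies opponent
        reward = 1.0 if choice == opponent else 0.0
        # Law-of-Effect-style update: strengthen rewarded actions
        q[choice] += alpha * (reward - q[choice])
    return correct / n_trials
```

Because the opponent is unbiased, the learned values hover near equality and predictive accuracy stays in the weak, above-chance range described above.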

Optimal behavior requires that strategic dominance fluctuate over time; that is, the relative importance of each factor in determining choice is itself a function of time. For example, a simple heuristic-like algorithm (such as tabular Q-learning) might dominate in the early stages of a session, when there is insufficient data to estimate statistical patterns. As certainty about these patterns increases, pattern-based algorithms gain overall importance. However, if such a model becomes highly inaccurate, intensive re-learning diminishes its likelihood and therefore scales down its contribution to overall behavior.
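One simple way to operationalize this time-varying importance is to weight candidate strategies by an exponentially discounted log-likelihood of the choices they predicted. The sketch below is a hypothetical weighting scheme (the `decay` parameter and the two-model setup are assumptions for illustration): a strategy that starts predicting poorly is rapidly down-weighted, capturing the "re-learning" effect described above.

```python
import math

def strategy_weights(probs_a, probs_b, decay=0.9):
    """Time-varying mixture weight for strategy A over strategy B.

    probs_a, probs_b: per-trial probabilities each candidate strategy
    assigned to the choice the animal actually made. Each strategy's
    running log-likelihood is exponentially discounted, so recent
    predictive success dominates. Returns the weight of strategy A
    on every trial (a value in (0, 1)).
    """
    ll_a = ll_b = 0.0
    weights = []
    for pa, pb in zip(probs_a, probs_b):
        # Discount old evidence, then add this trial's log-likelihood
        ll_a = decay * ll_a + math.log(max(pa, 1e-12))
        ll_b = decay * ll_b + math.log(max(pb, 1e-12))
        # Normalize in a numerically stable way
        m = max(ll_a, ll_b)
        wa = math.exp(ll_a - m)
        wb = math.exp(ll_b - m)
        weights.append(wa / (wa + wb))
    return weights
```

If strategy A consistently assigns higher probability to the observed choices, its weight climbs toward 1; a run of poor predictions pulls it back down within a few trials because of the discounting.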

In this project, I designed several computational models of behavior, including deep learning, reinforcement learning, and logistic regression models. The ultimate goal of the project is to develop and analyze a model that effectively captures ground-truth behavior. Later stages will involve finding neural correlates of these strategies.
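For the logistic regression approach, a standard setup is to predict the next choice from a short history of past choices and rewards. The feature scheme below (number of lags, choice/reward interleaving) is a hypothetical illustration of such a design matrix, not the lab's actual specification.

```python
def lagged_features(choices, rewards, n_lags=3):
    """Build a design matrix of past choices and rewards.

    choices, rewards: sequences of 0/1 values, one per trial.
    Returns (X, y) where row t holds the previous n_lags choices and
    rewards (interleaved), and y[t] is the choice on trial t. X and y
    can then be fed to any logistic regression fitter.
    """
    X, y = [], []
    for t in range(n_lags, len(choices)):
        row = []
        for k in range(1, n_lags + 1):
            row.append(choices[t - k])  # choice k trials back
            row.append(rewards[t - k])  # reward k trials back
        X.append(row)
        y.append(choices[t])
    return X, y
```

Fitted regression weights on these lagged features then quantify, per lag, how strongly past choices and outcomes drive the next decision, which is one way to compare heuristic and learning-based accounts.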
