
This ICLR 2018 accepted paper, Deep Mean Field Games for Learning Optimal Behavior Policy of Large Populations, deals with inference in models of collective behavior, specifically with how to infer the parameters of a mean field game (MFG) representation of collective behavior. The paper is authored by Jiachen Yang, Xiaojing Ye, Rakshit Trivedi, Huan Xu, and Hongyuan Zha. The 6th annual ICLR conference is scheduled for April 30 – May 3, 2018.

Mean field game theory is the study of decision making in very large populations of small interacting agents. It models the behavior of many agents, each individually trying to optimize their position in space and time, but with preferences that are partly determined by the choices of all the other agents.
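For readers who want the mathematics behind this description, the classic continuous-time formulation couples a backward Hamilton-Jacobi-Bellman equation for a representative agent's value function with a forward Fokker-Planck equation for the population density. The form below is a textbook version with one common sign convention, not the paper's discrete-time notation:

```latex
% Value function u(x,t) of a representative agent (backward HJB) and
% population density m(x,t) (forward Fokker-Planck), coupled through
% the running cost f(x,m) and the Hamiltonian H:
-\partial_t u - \nu \Delta u + H(x, \nabla u) = f(x, m)
\partial_t m - \nu \Delta m - \operatorname{div}\!\big( m\, \nabla_p H(x, \nabla u) \big) = 0
```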

Estimating the optimal behavior policy of large populations with Deep Mean Field Games

What problem is the paper attempting to solve?

The paper considers the problem of representing and learning the behavior of a large population of agents, to construct an effective predictive model of the behavior. For example, a population’s behavior directly affects the ranking of a set of trending topics on social media, represented by the global population distribution over topics. Each user’s observation of this global state influences their choice of the next topic in which to participate, thereby contributing to future population behavior.
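As a toy illustration of this feedback loop (the topic count, transition matrix, and function name below are our own inventions, not the paper's), the population distribution over topics can be advanced one step at a time by a row-stochastic matrix whose rows act as per-topic policies:

```python
import numpy as np

def step_population(pi, P):
    """Advance the population distribution one step.

    pi : (d,) distribution over d topics (entries sum to 1)
    P  : (d, d) row-stochastic matrix; P[i, j] is the fraction of users
         currently on topic i who move to topic j at the next step.
    """
    return pi @ P

# Toy example with 3 topics: users slowly drift toward topic 2.
pi = np.array([0.5, 0.3, 0.2])
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.7, 0.2],
              [0.0, 0.1, 0.9]])
for _ in range(5):
    pi = step_population(pi, P)
print(pi)  # population distribution after 5 steps
```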

Classical predictive methods such as time series analysis can also be used to build predictive models from data. However, these models do not treat the behavior as the result of optimizing a reward function, and so they may not provide insight into the motivations that produce a population’s behavior policy. Alternatively, methods that employ the underlying population network structure assume that nodes are influenced only by a local neighborhood and do not include a representation of a global state. Hence, they face difficulty in explaining events as the result of uncontrolled implicit optimization.

Mean field games overcome the limitations of these alternative predictive methods by determining how a system naturally behaves according to its underlying optimal control policy. The paper proposes a novel approach for estimating the parameters of an MFG. Its main contribution is to relate the theories of MFGs and reinforcement learning within the classic context of Markov Decision Processes (MDPs). The suggested method uses inverse RL to learn both the reward function and the forward dynamics of the MFG from data.

Paper summary

The paper covers the problem in three sections: theory, algorithm, and experiment. The theoretical contribution begins by transforming a continuous-time MFG formulation into a discrete-time formulation, and then relates the MFG to an associated MDP problem.
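To make this reduction concrete, here is the discrete-time MFG written as a single MDP, in our own notation (a paraphrase of the construction, not the paper's exact statement): the MDP's state is the population distribution over the d topics, and its action is the matrix of per-topic policies.

```latex
% State: population distribution \pi^n in the probability simplex.
% Action: a row-stochastic matrix P^n whose row P^n_i is the policy
% followed by agents currently at topic i.
% Forward dynamics and population-level reward:
\pi^{n+1} = \pi^{n} P^{n}, \qquad
R(\pi^{n}, P^{n}) = \sum_{i=1}^{d} \pi^{n}_i \, r_i\!\big(\pi^{n}, P^{n}_i\big)
```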
In the algorithm phase, an RL solution to the MFG problem is proposed. The authors relate solving an optimization problem on the MDP of a single agent to solving the inference problem of the (population-level) MFG. This leads to learning a reward function from demonstrations using a maximum likelihood approach, with the reward represented by a deep neural network. The policy is learned through an actor-critic algorithm, based on gradient descent with respect to the policy parameters. A simplified sketch of this training loop appears below.
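The following is a minimal, self-contained PyTorch sketch of such a loop, under heavy simplifying assumptions of our own: the action is collapsed to a proposed next distribution rather than a full transition matrix, the demonstrations are synthetic, and the actor update is a deterministic (DDPG-flavored) stand-in for the paper's actor-critic. It illustrates the structure, not the authors' released code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 5  # number of topics (illustrative)

def sample_demonstration(batch=32):
    """Synthetic stand-in for demonstration data: distributions that
    drift toward topic 0."""
    pi = torch.softmax(torch.randn(batch, d), dim=-1)
    drift = torch.eye(d)[0].expand(batch, d)
    pi_next = 0.9 * pi + 0.1 * drift  # observed next-step distributions
    return pi, pi_next

# Reward r_theta(pi, a), actor (proposes next distribution), critic V(pi).
reward_net = nn.Sequential(nn.Linear(2 * d, 64), nn.ReLU(), nn.Linear(64, 1))
actor = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, d))
critic = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, 1))

opt_r = torch.optim.Adam(reward_net.parameters(), lr=1e-3)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def reward(pi, a):
    return reward_net(torch.cat([pi, a], dim=-1)).squeeze(-1)

def policy(pi):
    return torch.softmax(actor(pi), dim=-1)  # proposed next distribution

gamma = 0.99
for step in range(500):
    pi, pi_next = sample_demonstration()

    # Maximum-likelihood (MaxEnt-IRL-style) reward update: raise reward
    # on demonstrated transitions, lower it on the policy's own samples.
    loss_r = -(reward(pi, pi_next).mean() - reward(pi, policy(pi).detach()).mean())
    opt_r.zero_grad(); loss_r.backward(); opt_r.step()

    # Critic: TD(0) regression against the learned reward.
    with torch.no_grad():
        a = policy(pi)
        target = reward(pi, a) + gamma * critic(a).squeeze(-1)
    loss_c = (critic(pi).squeeze(-1) - target).pow(2).mean()
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()

    # Actor: ascend learned reward plus discounted critic value of the
    # proposed next state (deterministic policy-gradient-style update).
    a = policy(pi)
    loss_a = -(reward(pi, a) + gamma * critic(a).squeeze(-1)).mean()
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()
```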
The algorithm is first compared with previous approaches on toy problems with artificially created reward functions. The authors then demonstrate it on real-world social data, with the aim of recovering the reward function and predicting the future trajectory.

Key Takeaways

  • This paper describes a data-driven method to solve a mean field game model of population evolution, by proving a connection between Mean Field Games and Markov Decision Processes and building on methods in reinforcement learning.
  • This method is scalable to arbitrarily large populations because the Mean Field Games framework represents population density rather than individual agents.
  • In experiments on real data, the Mean Field Game framework emerges as a powerful tool for learning a reward and policy that can predict the trajectories of a real-world population more accurately than alternatives.

Reviewer feedback summary

Overall Score: 26/30
Average Score: 8.66

The reviewers were unanimous in finding the work in this paper highly novel and significant. According to the reviewers, there is still little work at the intersection of machine learning and collective behavior, and this paper could help stimulate the growth of that intersection. On the flip side, surprisingly, one review criticized the paper, stating that the “scientific content of the work has critical conceptual flaws”. However, the authors’ rebuttals persuaded the reviewers that these concerns were largely addressed.

