New Book Announcement
Multi-Agent Machine Learning : A Reinforcement Approach
Published: 2015-12-17


[Book Description]

The book begins with a chapter on traditional methods of supervised learning, covering recursive least squares, least mean squares, and stochastic approximation. Chapter 2 covers single-agent reinforcement learning; topics include learning value functions, Markov decision processes, and TD learning with eligibility traces. Chapter 3 discusses two-player matrix games with both pure and mixed strategies; numerous algorithms and examples are presented. Chapter 4 covers learning in multiplayer stochastic (Markov) games, focusing on multiplayer grid games, minimax-Q learning, and Nash Q-learning. Chapter 5 discusses differential games, including multi-player differential games, the actor-critic structure, adaptive fuzzy control and fuzzy inference systems, the evader-pursuer game, and the differential game of guarding a territory. Chapter 6 presents new ideas on learning within robotic swarms and the innovative idea of the evolution of personality traits.

* Provides a framework for understanding a variety of methods and approaches in multi-agent machine learning
* Discusses methods of reinforcement learning such as a number of forms of multi-agent Q-learning
* Applicable to research professors and graduate students studying electrical and computer engineering, computer science, and mechanical and aerospace engineering
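The single-agent Q-learning covered in Chapter 2 can be illustrated with a short sketch. The toy one-dimensional grid world, parameter values, and helper names below are illustrative assumptions for this announcement, not material taken from the book:

```python
import random

# Minimal tabular Q-learning sketch: an agent on a 5-cell line learns
# to walk right to the goal cell. Illustrative only.
random.seed(0)
n_states, n_actions = 5, 2       # actions: 0 = left, 1 = right
goal = n_states - 1              # reaching the rightmost cell pays reward 1
alpha, gamma, eps = 0.5, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(n_states)]

def step(s, a):
    """Move left/right along the line; the episode ends at the goal."""
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s2, (1.0 if s2 == goal else 0.0), s2 == goal

def greedy(s):
    return 0 if Q[s][0] > Q[s][1] else 1

for _ in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy exploration
        a = random.randrange(n_actions) if random.random() < eps else greedy(s)
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
```

After training, the greedy policy moves right from every non-goal state, the optimal behavior for this toy task.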

[Table of Contents]
Preface                                                                  ix
Chapter 1 A Brief Review of Supervised Learning                          1 (11)
  1.1 Least Squares Estimates                                            1 (4)
  1.2 Recursive Least Squares                                            5 (1)
  1.3 Least Mean Squares                                                 6 (4)
  1.4 Stochastic Approximation                                           10 (2)
  References                                                             11 (1)
Chapter 2 Single-Agent Reinforcement Learning                            12 (26)
  2.1 Introduction                                                       12 (1)
  2.2 n-Armed Bandit Problem                                             13 (2)
  2.3 The Learning Structure                                             15 (2)
  2.4 The Value Function                                                 17 (1)
  2.5 The Optimal Value Functions                                        18 (5)
    2.5.1 The Grid World Example                                         20 (3)
  2.6 Markov Decision Processes                                          23 (2)
  2.7 Learning Value Functions                                           25 (1)
  2.8 Policy Iteration                                                   26 (2)
  2.9 Temporal Difference Learning                                       28 (2)
  2.10 TD Learning of the State-Action Function                          30 (2)
  2.11 Q-Learning                                                        32 (1)
  2.12 Eligibility Traces                                                33 (5)
  References                                                             37 (1)
Chapter 3 Learning in Two-Player Matrix Games                            38 (35)
  3.1 Matrix Games                                                       38 (4)
  3.2 Nash Equilibria in Two-Player Matrix Games                         42 (1)
  3.3 Linear Programming in Two-Player Zero-Sum Matrix Games             43 (4)
  3.4 The Learning Algorithms                                            47 (1)
  3.5 Gradient Ascent Algorithm                                          47 (4)
  3.6 WoLF-IGA Algorithm                                                 51 (1)
  3.7 Policy Hill Climbing (PHC)                                         52 (2)
  3.8 WoLF-PHC Algorithm                                                 54 (3)
  3.9 Decentralized Learning in Matrix Games                             57 (2)
  3.10 Learning Automata                                                 59 (1)
  3.11 Linear Reward-Inaction Algorithm                                  59 (1)
  3.12 Linear Reward-Penalty Algorithm                                   60 (1)
  3.13 The Lagging Anchor Algorithm                                      60 (2)
  3.14 LR-I Lagging Anchor Algorithm                                     62 (11)
    3.14.1 Simulation                                                    68 (2)
  References                                                             70 (3)
Chapter 4 Learning in Multiplayer Stochastic Games                       73 (71)
  4.1 Introduction                                                       73 (2)
  4.2 Multiplayer Stochastic Games                                       75 (4)
  4.3 Minimax-Q Algorithm                                                79 (8)
    4.3.1 2x2 Grid Game                                                  80 (7)
  4.4 Nash Q-Learning                                                    87 (9)
    4.4.1 The Learning Process                                           95 (1)
  4.5 The Simplex Algorithm                                              96 (4)
  4.6 The Lemke-Howson Algorithm                                         100 (7)
  4.7 Nash-Q Implementation                                              107 (4)
  4.8 Friend-or-Foe Q-Learning                                           111 (1)
  4.9 Infinite Gradient Ascent                                           112 (2)
  4.10 Policy Hill Climbing                                              114 (1)
  4.11 WoLF-PHC Algorithm                                                114 (3)
  4.12 Guarding a Territory Problem in a Grid World                      117 (8)
    4.12.1 Simulation and Results                                        119 (6)
  4.13 Extension of LR-I Lagging Anchor Algorithm to Stochastic Games    125 (3)
  4.14 The Exponential Moving-Average Q-Learning (EMA Q-Learning)
       Algorithm                                                         128 (3)
  4.15 Simulation and Results Comparing EMA Q-Learning to Other Methods  131 (13)
    4.15.1 Matrix Games                                                  131 (3)
    4.15.2 Stochastic Games                                              134 (7)
  References                                                             141 (3)
Chapter 5 Differential Games                                             144 (56)
  5.1 Introduction                                                       144 (2)
  5.2 A Brief Tutorial on Fuzzy Systems                                  146 (9)
    5.2.1 Fuzzy Sets and Fuzzy Rules                                     146 (2)
    5.2.2 Fuzzy Inference Engine                                         148 (3)
    5.2.3 Fuzzifier and Defuzzifier                                      151 (1)
    5.2.4 Fuzzy Systems and Examples                                     152 (3)
  5.3 Fuzzy Q-Learning                                                   155 (4)
  5.4 Fuzzy Actor-Critic Learning                                        159 (3)
  5.5 Homicidal Chauffeur Differential Game                              162 (3)
  5.6 Fuzzy Controller Structure                                         165 (1)
  5.7 Q(λ)-Learning Fuzzy Inference System                               166 (5)
  5.8 Simulation Results for the Homicidal Chauffeur                     171 (3)
  5.9 Learning in the Evader-Pursuer Game with Two Cars                  174 (3)
  5.10 Simulation of the Game of Two Cars                                177 (3)
  5.11 Differential Game of Guarding a Territory                         180 (4)
  5.12 Reward Shaping in the Differential Game of Guarding a Territory   184 (1)
  5.13 Simulation Results                                                185 (15)
    5.13.1 One Defender Versus One Invader                               185 (6)
    5.13.2 Two Defenders Versus One Invader                              191 (6)
  References                                                             197 (3)
Chapter 6 Swarm Intelligence and the Evolution of Personality Traits    200 (37)
  6.1 Introduction                                                       200 (1)
  6.2 The Evolution of Swarm Intelligence                                200 (1)
  6.3 Representation of the Environment                                  201 (2)
  6.4 Swarm-Based Robotics in Terms of Personalities                     203 (3)
  6.5 Evolution of Personality Traits                                    206 (1)
  6.6 Simulation Framework                                               207 (1)
  6.7 A Zero-Sum Game Example                                            208 (8)
    6.7.1 Convergence                                                    208 (6)
    6.7.2 Simulation Results                                             214 (2)
  6.8 Implementation for Next Sections                                   216 (2)
  6.9 Robots Leaving a Room                                              218 (3)
  6.10 Tracking a Target                                                 221 (11)
  6.11 Conclusion                                                        232 (5)
  References                                                             233 (4)
Index                                                                    237
 



Copyright: Xi'an Jiaotong University Library      Design and production: Xi'an Jiaotong University Data and Information Center
Address: 28 Xianning West Road, Beilin District, Xi'an, Shaanxi Province      Postcode: 710049
