New Book Announcement
Multi-Agent Machine Learning : A Reinforcement Approach
Published: 2015-12-17


[Book Description]

The book begins with a chapter on traditional methods of supervised learning, covering recursive least squares, least mean squares, and stochastic approximation. Chapter 2 covers single-agent reinforcement learning; topics include learning value functions, Markov decision processes, and TD learning with eligibility traces. Chapter 3 discusses two-player matrix games with both pure and mixed strategies; numerous algorithms and examples are presented. Chapter 4 covers learning in multiplayer stochastic (Markov) games, focusing on multiplayer grid games, minimax-Q learning, and Nash Q-learning. Chapter 5 discusses differential games, including multi-player differential games, the actor-critic structure, adaptive fuzzy control and fuzzy inference systems, the evader-pursuer game, and the differential game of guarding a territory. Chapter 6 presents new ideas on learning within robotic swarms and the innovative idea of the evolution of personality traits.

* Provides a framework for understanding a variety of methods and approaches in multi-agent machine learning
* Discusses methods of reinforcement learning such as a number of forms of multi-agent Q-learning
* Applicable to research professors and graduate students studying electrical and computer engineering, computer science, and mechanical and aerospace engineering
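The single-agent Q-learning covered in Chapter 2 can be illustrated with a short sketch. The toy one-dimensional grid world, parameter values, and helper names below are illustrative assumptions for this announcement, not material taken from the book:

```python
import random

# Minimal tabular Q-learning sketch: an agent on a 5-cell line learns
# to walk right to the goal cell. Illustrative only.
random.seed(0)
n_states, n_actions = 5, 2       # actions: 0 = left, 1 = right
goal = n_states - 1              # reaching the rightmost cell pays reward 1
alpha, gamma, eps = 0.5, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(n_states)]

def step(s, a):
    """Move left/right along the line; the episode ends at the goal."""
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s2, (1.0 if s2 == goal else 0.0), s2 == goal

def greedy(s):
    return 0 if Q[s][0] > Q[s][1] else 1

for _ in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy exploration
        a = random.randrange(n_actions) if random.random() < eps else greedy(s)
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2
```

After training, the greedy policy moves right from every non-goal state, the optimal behavior for this toy task.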

[Table of Contents]
Preface                                                                  ix
Chapter 1 A Brief Review of Supervised Learning                          1 (11)
  1.1 Least Squares Estimates                                            1 (4)
  1.2 Recursive Least Squares                                            5 (1)
  1.3 Least Mean Squares                                                 6 (4)
  1.4 Stochastic Approximation                                           10 (2)
  References                                                             11 (1)
Chapter 2 Single-Agent Reinforcement Learning                            12 (26)
  2.1 Introduction                                                       12 (1)
  2.2 n-Armed Bandit Problem                                             13 (2)
  2.3 The Learning Structure                                             15 (2)
  2.4 The Value Function                                                 17 (1)
  2.5 The Optimal Value Functions                                        18 (5)
    2.5.1 The Grid World Example                                         20 (3)
  2.6 Markov Decision Processes                                          23 (2)
  2.7 Learning Value Functions                                           25 (1)
  2.8 Policy Iteration                                                   26 (2)
  2.9 Temporal Difference Learning                                       28 (2)
  2.10 TD Learning of the State-Action Function                          30 (2)
  2.11 Q-Learning                                                        32 (1)
  2.12 Eligibility Traces                                                33 (5)
  References                                                             37 (1)
Chapter 3 Learning in Two-Player Matrix Games                            38 (35)
  3.1 Matrix Games                                                       38 (4)
  3.2 Nash Equilibria in Two-Player Matrix Games                         42 (1)
  3.3 Linear Programming in Two-Player Zero-Sum Matrix Games             43 (4)
  3.4 The Learning Algorithms                                            47 (1)
  3.5 Gradient Ascent Algorithm                                          47 (4)
  3.6 WoLF-IGA Algorithm                                                 51 (1)
  3.7 Policy Hill Climbing (PHC)                                         52 (2)
  3.8 WoLF-PHC Algorithm                                                 54 (3)
  3.9 Decentralized Learning in Matrix Games                             57 (2)
  3.10 Learning Automata                                                 59 (1)
  3.11 Linear Reward-Inaction Algorithm                                  59 (1)
  3.12 Linear Reward-Penalty Algorithm                                   60 (1)
  3.13 The Lagging Anchor Algorithm                                      60 (2)
  3.14 LR-I Lagging Anchor Algorithm                                     62 (11)
    3.14.1 Simulation                                                    68 (2)
  References                                                             70 (3)
Chapter 4 Learning in Multiplayer Stochastic Games                       73 (71)
  4.1 Introduction                                                       73 (2)
  4.2 Multiplayer Stochastic Games                                       75 (4)
  4.3 Minimax-Q Algorithm                                                79 (8)
    4.3.1 2x2 Grid Game                                                  80 (7)
  4.4 Nash Q-Learning                                                    87 (9)
    4.4.1 The Learning Process                                           95 (1)
  4.5 The Simplex Algorithm                                              96 (4)
  4.6 The Lemke-Howson Algorithm                                         100 (7)
  4.7 Nash-Q Implementation                                              107 (4)
  4.8 Friend-or-Foe Q-Learning                                           111 (1)
  4.9 Infinite Gradient Ascent                                           112 (2)
  4.10 Policy Hill Climbing                                              114 (1)
  4.11 WoLF-PHC Algorithm                                                114 (3)
  4.12 Guarding a Territory Problem in a Grid World                      117 (8)
    4.12.1 Simulation and Results                                        119 (6)
  4.13 Extension of LR-I Lagging Anchor Algorithm to Stochastic Games    125 (3)
  4.14 The Exponential Moving-Average Q-Learning (EMA Q-Learning)
       Algorithm                                                         128 (3)
  4.15 Simulation and Results Comparing EMA Q-Learning to Other Methods  131 (13)
    4.15.1 Matrix Games                                                  131 (3)
    4.15.2 Stochastic Games                                              134 (7)
  References                                                             141 (3)
Chapter 5 Differential Games                                             144 (56)
  5.1 Introduction                                                       144 (2)
  5.2 A Brief Tutorial on Fuzzy Systems                                  146 (9)
    5.2.1 Fuzzy Sets and Fuzzy Rules                                     146 (2)
    5.2.2 Fuzzy Inference Engine                                         148 (3)
    5.2.3 Fuzzifier and Defuzzifier                                      151 (1)
    5.2.4 Fuzzy Systems and Examples                                     152 (3)
  5.3 Fuzzy Q-Learning                                                   155 (4)
  5.4 Fuzzy Actor-Critic Learning                                        159 (3)
  5.5 Homicidal Chauffeur Differential Game                              162 (3)
  5.6 Fuzzy Controller Structure                                         165 (1)
  5.7 Q(λ)-Learning Fuzzy Inference System                               166 (5)
  5.8 Simulation Results for the Homicidal Chauffeur                     171 (3)
  5.9 Learning in the Evader-Pursuer Game with Two Cars                  174 (3)
  5.10 Simulation of the Game of Two Cars                                177 (3)
  5.11 Differential Game of Guarding a Territory                         180 (4)
  5.12 Reward Shaping in the Differential Game of Guarding a Territory   184 (1)
  5.13 Simulation Results                                                185 (15)
    5.13.1 One Defender Versus One Invader                               185 (6)
    5.13.2 Two Defenders Versus One Invader                              191 (6)
  References                                                             197 (3)
Chapter 6 Swarm Intelligence and the Evolution of Personality Traits    200 (37)
  6.1 Introduction                                                       200 (1)
  6.2 The Evolution of Swarm Intelligence                                200 (1)
  6.3 Representation of the Environment                                  201 (2)
  6.4 Swarm-Based Robotics in Terms of Personalities                     203 (3)
  6.5 Evolution of Personality Traits                                    206 (1)
  6.6 Simulation Framework                                               207 (1)
  6.7 A Zero-Sum Game Example                                            208 (8)
    6.7.1 Convergence                                                    208 (6)
    6.7.2 Simulation Results                                             214 (2)
  6.8 Implementation for Next Sections                                   216 (2)
  6.9 Robots Leaving a Room                                              218 (3)
  6.10 Tracking a Target                                                 221 (11)
  6.11 Conclusion                                                        232 (5)
  References                                                             233 (4)
Index                                                                    237
 



Copyright: Xi'an Jiaotong University Library      Design and production: Xi'an Jiaotong University Data and Information Center
Address: 28 Xianning West Road, Beilin District, Xi'an, Shaanxi Province      Postcode: 710049
