Combining Pattern Classifiers : Methods and Algorithms - Xi'an Jiaotong University Library

Combining Pattern Classifiers : Methods and Algorithms

Published: 2015-12-17

[Book Description]

A unified, coherent treatment of current classifier ensemble methods, from fundamentals of pattern recognition to ensemble feature selection, now in its second edition.

The art and science of combining pattern classifiers has flourished into a prolific discipline since the first edition of Combining Pattern Classifiers was published in 2004. Dr. Kuncheva has plucked from the rich landscape of recent classifier ensemble literature the topics, methods, and algorithms that will guide the reader toward a deeper understanding of the fundamentals, design, and applications of classifier ensemble methods.

Thoroughly updated, with MATLAB(R) code and practice data sets throughout, Combining Pattern Classifiers includes:

* Coverage of Bayes decision theory and experimental comparison of classifiers
* Essential ensemble methods such as Bagging, Random forest, AdaBoost, Random subspace, Rotation forest, Random oracle, and Error Correcting Output Code, among others
* Chapters on classifier selection, diversity, and ensemble feature selection

With firm grounding in the fundamentals of pattern recognition, and featuring more than 140 illustrations, Combining Pattern Classifiers, Second Edition is a valuable reference for postgraduate students, researchers, and practitioners in computing and engineering.

[Table of Contents]

Preface xv

Acknowledgements xxi

1 Fundamentals of Pattern Recognition 1 (48)

1.1 Basic Concepts: Class, Feature, Data Set 1 (8)

1.1.1 Classes and Class Labels 1 (1)

1.1.2 Features 2 (1)

1.1.3 Data Set 3 (3)

1.1.4 Generate Your Own Data 6 (3)

1.2 Classifier, Discriminant Functions, Classification Regions 9 (2)

1.3 Classification Error and Classification Accuracy 11 (8)

1.3.1 Where Does the Error Come From? Bias and Variance 11 (2)

1.3.2 Estimation of the Error 13 (1)

1.3.3 Confusion Matrices and Loss Matrices 14 (1)

1.3.4 Training and Testing Protocols 15 (2)

1.3.5 Overtraining and Peeking 17 (2)

1.4 Experimental Comparison of Classifiers 19 (11)

1.4.1 Two Trained Classifiers and a Fixed Testing Set 20 (2)

1.4.2 Two Classifier Models and a Single Data Set 22 (4)

1.4.3 Two Classifier Models and Multiple Data Sets 26 (1)

1.4.4 Multiple Classifier Models and Multiple Data Sets 27 (3)

1.5 Bayes Decision Theory 30 (5)

1.5.1 Probabilistic Framework 30 (1)

1.5.2 Discriminant Functions and Decision Boundaries 31 (2)

1.5.3 Bayes Error 33 (2)

1.6 Clustering and Feature Selection 35 (5)

1.6.1 Clustering 35 (2)

1.6.2 Feature Selection 37 (3)

1.7 Challenges of Real-Life Data 40 (9)

Appendix 41 (1)

1.A.1 Data Generation 41 (1)

1.A.2 Comparison of Classifiers 42 (1)

1.A.2.1 MATLAB Functions for Comparing Classifiers 42 (3)

1.A.2.2 Critical Values for Wilcoxon and Sign Test 45 (2)

1.A.3 Feature Selection 47 (2)

2 Base Classifiers 49 (45)

2.1 Linear and Quadratic Classifiers 49 (6)

2.1.1 Linear Discriminant Classifier 49 (3)

2.1.2 Nearest Mean Classifier 52 (1)

2.1.3 Quadratic Discriminant Classifier 52 (1)

2.1.4 Stability of LDC and QDC 53 (2)

2.2 Decision Tree Classifiers 55 (11)

2.2.1 Basics and Terminology 55 (2)

2.2.2 Training of Decision Tree Classifiers 57 (1)

2.2.3 Selection of the Feature for a Node 58 (2)

2.2.4 Stopping Criterion 60 (3)

2.2.5 Pruning of the Decision Tree 63 (1)

2.2.6 C4.5 and ID3 64 (1)

2.2.7 Instability of Decision Trees 64 (1)

2.2.8 Random Trees 65 (1)

2.3 The Naive Bayes Classifier 66 (2)

2.4 Neural Networks 68 (5)

2.4.1 Neurons 68 (2)

2.4.2 Rosenblatt's Perceptron 70 (1)

2.4.3 Multi-Layer Perceptron 71 (2)

2.5 Support Vector Machines 73 (7)

2.5.1 Why Would It Work? 73 (1)

2.5.2 Classification Margins 74 (2)

2.5.3 Optimal Linear Boundary 76 (2)

2.5.4 Parameters and Classification Boundaries of SVM 78 (2)

2.6 The k-Nearest Neighbor Classifier (k-nn) 80 (2)

2.7 Final Remarks 82 (12)

2.7.1 Simple or Complex Models? 82 (1)

2.7.2 The Triangle Diagram 83 (2)

2.7.3 Choosing a Base Classifier for Ensembles 85 (1)

Appendix 85 (1)

2.A.1 MATLAB Code for the Fish Data 85 (1)

2.A.2 MATLAB Code for Individual Classifiers 86 (1)

2.A.2.1 Decision Tree 86 (3)

2.A.2.2 Naive Bayes 89 (1)

2.A.2.3 Multi-Layer Perceptron 90 (2)

2.A.2.4 1-nn Classifier 92 (2)

3 An Overview of the Field 94 (17)

3.1 Philosophy 94 (4)

3.2 Two Examples 98 (2)

3.2.1 The Wisdom of the "Classifier Crowd" 98 (1)

3.2.2 The Power of Divide-and-Conquer 98 (2)

3.3 Structure of the Area 100(5)

3.3.1 Terminology 100(1)

3.3.2 A Taxonomy of Classifier Ensemble Methods 100(4)

3.3.3 Classifier Fusion and Classifier Selection 104(1)

3.4 Quo Vadis? 105(6)

3.4.1 Reinventing the Wheel? 105(1)

3.4.2 The Illusion of Progress? 106(1)

3.4.3 A Bibliometric Snapshot 107(4)

4 Combining Label Outputs 111(32)

4.1 Types of Classifier Outputs 111(1)

4.2 A Probabilistic Framework for Combining Label Outputs 112(1)

4.3 Majority Vote 113(12)

4.3.1 "Democracy" in Classifier Combination 113(1)

4.3.2 Accuracy of the Majority Vote 114(3)

4.3.3 Limits on the Majority Vote Accuracy: An Example 117(2)

4.3.4 Patterns of Success and Failure 119(5)

4.3.5 Optimality of the Majority Vote Combiner 124(1)

4.4 Weighted Majority Vote 125(3)

4.4.1 Two Examples 126(1)

4.4.2 Optimality of the Weighted Majority Vote Combiner 127(1)

4.5 Naive-Bayes Combiner 128(4)

4.5.1 Optimality of the Naive Bayes Combiner 128(2)

4.5.2 Implementation of the NB Combiner 130(2)

4.6 Multinomial Methods 132(3)

4.7 Comparison of Combination Methods for Label Outputs 135(8)

Appendix 137(1)

4.A.1 Matan's Proof for the Limits on the Majority Vote Accuracy 137(2)

4.A.2 Selected MATLAB Code 139(4)

5 Combining Continuous-Valued Outputs 143(43)

5.1 Decision Profile 143(1)

5.2 How Do We Get Probability Outputs? 144(6)

5.2.1 Probabilities Based on Discriminant Scores 144(3)

5.2.2 Probabilities Based on Counts: Laplace Estimator 147(3)

5.3 Nontrainable (Fixed) Combination Rules 150(16)

5.3.1 A Generic Formulation 150(2)

5.3.2 Equivalence of Simple Combination Rules 152(1)

5.3.3 Generalized Mean Combiner 153(3)

5.3.4 A Theoretical Comparison of Simple Combiners 156(4)

5.3.5 Where Do They Come From? 160(6)

5.4 The Weighted Average (Linear Combiner) 166(6)

5.4.1 Consensus Theory 166(1)

5.4.2 Added Error for the Weighted Mean Combination 167(1)

5.4.3 Linear Regression 168(4)

5.5 A Classifier as a Combiner 172(3)

5.5.1 The Supra Bayesian Approach 172(1)

5.5.2 Decision Templates 173(2)

5.5.3 A Linear Classifier 175(1)

5.6 An Example of Nine Combiners for Continuous-Valued Outputs 175(1)

5.7 To Train or Not to Train? 176(10)

Appendix 178(1)

5.A.1 Theoretical Classification Error for the Simple Combiners 178(1)

5.A.1.1 Set-up and Assumptions 178(2)

5.A.1.2 Individual Error 180(1)

5.A.1.3 Minimum and Maximum 180(1)

5.A.1.4 Average (Sum) 181(1)

5.A.1.5 Median and Majority Vote 182(1)

5.A.1.6 Oracle 183(1)

5.A.2 Selected MATLAB Code 183(3)

6 Ensemble Methods 186(44)

6.1 Bagging 186(4)

6.1.1 The Origins: Bagging Predictors 186(1)

6.1.2 Why Does Bagging Work? 187(2)

6.1.3 Out-of-bag Estimates 189(1)

6.1.4 Variants of Bagging 190(1)

6.2 Random Forests 190(2)

6.3 AdaBoost 192(11)

6.3.1 The AdaBoost Algorithm 192(2)

6.3.2 The arc-x4 Algorithm 194(1)

6.3.3 Why Does AdaBoost Work? 195(4)

6.3.4 Variants of Boosting 199(1)

6.3.5 A Famous Application: AdaBoost for Face Detection 199(4)

6.4 Random Subspace Ensembles 203(1)

6.5 Rotation Forest 204(4)

6.6 Random Linear Oracle 208(3)

6.7 Error Correcting Output Codes (ECOC) 211(19)

6.7.1 Code Designs 212(2)

6.7.2 Decoding 214(2)

6.7.3 Ensembles of Nested Dichotomies 216(2)

Appendix 218(1)

6.A.1 Bagging 218(2)

6.A.2 AdaBoost 220(3)

6.A.3 Random Subspace 223(2)

6.A.4 Rotation Forest 225(3)

6.A.5 Random Linear Oracle 228(1)

6.A.6 ECOC 229(1)

7 Classifier Selection 230(17)

7.1 Preliminaries 230(1)

7.2 Why Classifier Selection Works 231(2)

7.3 Estimating Local Competence Dynamically 233(6)

7.3.1 Decision-Independent Estimates 233(5)

7.3.2 Decision-Dependent Estimates 238(1)

7.4 Pre-Estimation of the Competence Regions 239(3)

7.4.1 Bespoke Classifiers 240(1)

7.4.2 Clustering and Selection 241(1)

7.5 Simultaneous Training of Regions and Classifiers 242(2)

7.6 Cascade Classifiers 244(3)

Appendix: Selected MATLAB Code 244(1)

7.A.1 Banana Data 244(1)

7.A.2 Evolutionary Algorithm for a Selection Ensemble for the Banana Data 245(2)

8 Diversity in Classifier Ensembles 247(43)

8.1 What Is Diversity? 247(3)

8.1.1 Diversity for a Point-Value Estimate 248(1)

8.1.2 Diversity in Software Engineering 248(1)

8.1.3 Statistical Measures of Relationship 249(1)

8.2 Measuring Diversity in Classifier Ensembles 250(6)

8.2.1 Pairwise Measures 250(1)

8.2.2 Nonpairwise Measures 251(5)

8.3 Relationship Between Diversity and Accuracy 256(14)

8.3.1 An Example 256(2)

8.3.2 Relationship Patterns 258(4)

8.3.3 A Caveat: Independent Outputs ≠ Independent Errors 262(3)

8.3.4 Independence Is Not the Best Scenario 265(2)

8.3.5 Diversity and Ensemble Margins 267(3)

8.4 Using Diversity 270(9)

8.4.1 Diversity for Finding Bounds and Theoretical Relationships 270(1)

8.4.2 Kappa-error Diagrams and Ensemble Maps 271(4)

8.4.3 Overproduce and Select 275(4)

8.5 Conclusions: Diversity of Diversity 279(11)

Appendix 280(1)

8.A.1 Derivation of Diversity Measures for Oracle Outputs 280(1)

8.A.1.1 Correlation ρ 280(1)

8.A.1.2 Interrater Agreement κ 281(1)

8.A.2 Diversity Measure Equivalence 282(2)

8.A.3 Independent Outputs ≠ Independent Errors 284(2)

8.A.4 A Bound on the Kappa-Error Diagram 286(1)

8.A.5 Calculation of the Pareto Frontier 287(3)

9 Ensemble Feature Selection 290(36)

9.1 Preliminaries 290(5)

9.1.1 Right and Wrong Protocols 290(4)

9.1.2 Ensemble Feature Selection Approaches 294(1)

9.1.3 Natural Grouping 294(1)

9.2 Ranking by Decision Tree Ensembles 295(4)

9.2.1 Simple Count and Split Criterion 295(2)

9.2.2 Permuted Features or the "Noised-up" Method 297(2)

9.3 Ensembles of Rankers 299(6)

9.3.1 The Approach 299(1)

9.3.2 Ranking Methods (Criteria) 300(5)

9.4 Random Feature Selection for the Ensemble 305(10)

9.4.1 Random Subspace Revisited 305(1)

9.4.2 Usability, Coverage, and Feature Diversity 306(6)

9.4.3 Genetic Algorithms 312(3)

9.5 Nonrandom Selection 315(2)

9.5.1 The "Favorite Class" Model 315(1)

9.5.2 The Iterative Model 315(1)

9.5.3 The Incremental Model 316(1)

9.6 A Stability Index 317(9)

9.6.1 Consistency Between a Pair of Subsets 317(2)

9.6.2 A Stability Index for K Sequences 319(1)

9.6.3 An Example of Applying the Stability Index 320(2)

Appendix 322(1)

9.A.1 MATLAB Code for the Numerical Example of Ensemble Ranking 322(1)

9.A.2 MATLAB GA Nuggets 322(2)

9.A.3 MATLAB Code for the Stability Index 324(2)

10 A Final Thought 326(1)

References 327(26)

Index 353
