Recognized as an essential component of Chinese culture, Traditional Chinese Medicine (TCM) is both an ancient medical system and one that is still widely used in China today. TCM's independently evolved knowledge system is expressed mainly in the Chinese language, and much of the information is available only through ancient classics and confidential family records, making it difficult to utilize. The major concern in TCM is how to consolidate and integrate the data, enabling efficient retrieval and the discovery of novel knowledge from dispersed sources. Computational approaches such as data mining, semantic reasoning and computational intelligence have emerged as innovative means for the preservation and utilization of this knowledge system. Typically, this requires an interdisciplinary approach involving Chinese culture, computer science, modern healthcare and life sciences. This book examines the computerization of TCM information and knowledge to provide intelligent resources and supporting evidence for clinical decision-making, drug discovery, and education. Recent research results from the Traditional Chinese Medicine Informatics Group of Zhejiang University are presented, gathering in one resource systematic approaches for massive data processing in TCM. These include the utilization of modern Semantic Web and data mining methods for more advanced data integration, data analysis and integrative knowledge discovery. This book will appeal to medical professionals, life sciences students, computer scientists, and those interested in integrative, complementary, and alternative medicine. This interdisciplinary book brings together Traditional Chinese Medicine and computer science. It introduces novel network technologies to Traditional Chinese Medicine informatics and provides theory, practical examples and case studies of new techniques.
Preface
1 Overview of Knowledge Discovery in Traditional Chinese Medicine
1.1 Introduction
1.2 The State of the Art of TCM Data Resources
1.2.1 Traditional Chinese Medical Literature Analysis and Retrieval System
1.2.2 Figures and Photographs of Traditional Chinese Drug Database
1.2.3 Database of Chinese Medical Formulae
1.2.4 Database of Chemical Composition from Chinese Herbal Medicine
1.2.5 Clinical Medicine Database
1.2.6 TCM Electronic Medical Record Database
1.3 Review of KDTCM Research
1.3.1 Knowledge Discovery for CMF Research
1.3.2 Knowledge Discovery for CHM Research
1.3.3 Knowledge Discovery for Research of TCM Syndrome
1.3.4 Knowledge Discovery for TCM Clinical Diagnosis
1.4 Discussions and Future Directions
1.5 Conclusions
2 Integrative Mining of Traditional Chinese Medicine Literature and MEDLINE for Functional Gene Networks
2.1 Introduction
2.2 Connecting TCM Syndrome to Modern Biomedicine by Integrative Literature Mining
2.3 Related Work on Biomedical Literature Mining
2.4 Name Entity and Relation Extraction Methods
2.4.1 Bubble-Bootstrapping Method
2.4.2 Relation Weight Computing
2.5 MeDisco/3S System
2.6 Results
2.6.1 Functional Gene Networks
2.6.2 Functional Analysis of Genes from Syndrome Perspective
2.7 Conclusions
3 MapReduce-Based Network Motif Detection for Traditional Chinese Medicine
3.1 Introduction
3.2 Related Work
3.3 MapReduce-Based Pattern Finding
3.3.1 MRPF Framework
3.3.2 Neighbor Vertices Finding and Pattern Initialization
3.3.3 Pattern Extension
3.3.4 Frequency Computing
3.4 Application to Prescription Compatibility Structure Detection
3.4.1 Motifs Detection Results
3.4.2 Performance Analysis
3.5 Conclusions
4 Data Quality for Knowledge Discovery in Traditional Chinese Medicine
4.1 Introduction
4.2 Key Data Quality Dimensions in TCM
4.2.1 Representation Granularity
4.2.2 Representation Consistency
4.2.3 Completeness
4.3 Methods to Handle Data Quality Problems
4.3.1 Handling Representation Granularity
4.3.2 Handling Representation Consistency
4.3.3 Handling Completeness
4.4 Conclusions
5 Service-Oriented Data Mining in Traditional Chinese Medicine
5.1 Introduction
5.2 Related Work
5.2.1 Traditional Data Mining Software
5.2.2 Data Mining Systems for Specific Field
5.2.3 Distributed Data Mining Platform
5.2.4 The Spora Demo
5.3 System Architecture and Data Mining Service
5.3.1 Hierarchical Structure
5.3.2 Service Operator Organization
5.3.3 User Interaction and Visualization
5.4 Case Studies
5.4.1 Case 1: Domain-Driven KDD Support for TCM
5.4.2 Case 2: Data Mining Based on Distributed Resources
5.4.3 Case 3: Data Mining Process as a Service
5.5 Conclusions
6 Semantic E-Science for Traditional Chinese Medicine
6.1 Introduction
6.2 Results
6.2.1 System Architecture
6.2.2 TCM Domain Ontology
6.2.3 DartMapping
6.2.4 DartSearch
6.2.5 DartQuery
6.2.6 TCM Service Coordination
6.2.7 Knowledge Discovery Service
6.2.8 DartFlow
6.2.9 TCM Collaborative Research Scenario
6.2.10 Task-Driven Information Allocation
6.2.11 Collaborative Information Sharing
6.2.12 Scientific Service Coordination
6.3 Discussion
6.4 Conclusions
6.5 Methods
6.5.1 TCM Ontology Engineering
6.5.2 View-Based Semantic Mapping
6.5.3 Semantic-Based Service Matchmaking
7 Ontology Development for Unified Traditional Chinese Medical Language System
7.1 Introduction
7.2 The Principle and Knowledge System of TCM
7.3 What Is an Ontology?
7.4 Protege 2000: The Tool We Use
7.5 Ontology Design and Development for UTCMLS
7.5.1 Methodology of Ontology Development
7.5.2 Knowledge Acquisition
7.5.3 Integrating and Merging of TCM Ontology
7.6 Results
7.6.1 The Core Top-Level Categories
7.6.2 Subontologies and the Hierarchical Structure
7.6.3 Concept Structure
7.6.4 Semantic Structure
7.6.5 Semantic Types and Semantic Relationships
7.7 Conclusions
8 Causal Knowledge Modeling for Traditional Chinese Medicine Using OWL 2
8.1 Introduction
8.2 Causal TCM Knowledge Modeling
8.3 Causal Reasoning
8.4 Evaluation
8.5 Conclusions
9 Dynamic Subontology Evolution for Traditional Chinese Medicine Web Ontology
9.1 Introduction
9.2 TCM Domain Ontology
9.2.1 Ontology Framework
9.2.2 User Interface
9.3 Subontology Model
9.3.1 Preliminaries
9.3.2 Subontology Definition
9.3.3 Subontology Operators
9.4 Ontology Cache for Knowledge Reuse
9.4.1 Reusing Subontologies as Ontology Cache
9.4.2 Knowledge Search with Ontology Cache
9.4.3 On SubO Structural Optimality
9.5 Dynamic Subontology Evolution
9.5.1 Chromosome Representation
9.5.2 Fitness Evaluation
9.5.3 Genetic Operators
9.5.4 Evolution Procedure
9.5.5 Consistency
9.6 Experiment and Evaluation
9.6.1 Experiment Design
9.6.2 Compare Cache Performance
9.6.3 Knowledge Structure
9.6.4 Traversal Depth for SubO Extraction
9.7 Related Work
9.8 Conclusions
10 Semantic Association Mining for Traditional Chinese Medicine
10.1 Introduction
10.1.1 The Semantic Web for Collaborative Knowledge Discovery
10.1.2 The Motivating Story
10.1.3 HerbNet: The Knowledge Network for Herbal Medicine
10.1.4 Paper Organization
10.2 Related Work
10.2.1 Domain-Driven Relationship Mining for Biomedicine
10.2.2 Linked Data on the Semantic Web
10.2.3 Semantic Association Mining
10.3 Methods
10.3.1 Semantic Graph Model
10.3.2 Hypothesis and Hypothetical Graph
10.3.3 Evidence and Evidentiary Graph
10.3.4 Semantic Schema
10.3.5 Semantic Association Mining
10.3.6 Semantic Association Ranking
10.3.7 Summary
10.4 Evaluation
10.4.1 Synthetic Graph Generation
10.4.2 Engine Implementation
10.4.3 Miner Implementation
10.4.4 Collaborative Discovery Process
10.4.5 Result Analysis
10.5 Use Cases
10.5.1 The HerbNet
10.5.2 Formula System Interpretation
10.5.3 Herb-Drug Interaction Network Analysis
10.6 Conclusions
11 Semantic-Based Database Integration for Traditional Chinese Medicine
11.1 Introduction
11.2 System Architecture and Technical Features
11.2.1 System Architecture
11.2.2 Technical Features
11.3 Semantic Mediation
11.3.1 Semantic View and View-Based Mapping
11.3.2 Visualized Semantic Mapping Tool
11.4 TCM Semantic Portals
11.4.1 Dynamic Semantic Query Interface
11.4.2 Intuitive Search Interface with Concepts Ranking and Semantic Navigation
11.5 User Evaluation and Lesson Learned
11.5.1 Feedback from CATCM
11.5.2 A Survey on the Usage of RDF/OWL Predicates
11.6 Related Work
11.6.1 Semantic Web Context
11.6.2 Conventional Data Integration Context
11.7 Conclusions
12 Probabilistic Semantic Relationship Discovery from Traditional Chinese Medical Literature
12.1 Background
12.2 Related Work
12.3 Methods
12.3.1 Instance Extraction
12.3.2 Instance Pair Discovery
12.3.3 Semantic Relationship Evaluation
12.3.4 Probability-Based Semantic Relationship Extraction
12.4 Results and Discussions
12.5 Conclusions
13 Deriving Similarity Graphs from Traditional Chinese Medicine Linked Data on the Semantic Web
13.1 Introduction
13.2 Related Work
13.2.1 Taxonomy-Based Approach
13.2.2 Relationship-Based Approach
13.3 SST Approach
13.3.1 Similarity Transition
13.3.2 Similarity between Sets of Objects
13.4 Experiments and Results
13.4.1 Dataset Preparation
13.4.2 Results Analysis
13.4.3 Result Visualization
13.5 Conclusions