Solution Manual for Data Mining Concepts and Techniques, Third Edition by Jiawei Han, Micheline Kamber, Jian Pei


Data Mining Concepts and Techniques, Third Edition by Jiawei Han, Micheline Kamber, Jian Pei (Solution Manual) ISBN-13: 9780123814791 ISBN-10: 0123814790




Table of Contents:
Chapter 1 Introduction

1.1 Why Data Mining?

1.1.1 Moving toward the Information Age

1.1.2 Data Mining as the Evolution of Information Technology

1.2 What Is Data Mining?

1.3 What Kinds of Data Can Be Mined?

1.3.1 Database Data

1.3.2 Data Warehouses

1.3.3 Transactional Data

1.3.4 Other Kinds of Data

1.4 What Kinds of Patterns Can Be Mined?

1.4.1 Class/Concept Description: Characterization and Discrimination

1.4.2 Mining Frequent Patterns, Associations, and Correlations

1.4.3 Classification and Regression for Predictive Analysis

1.4.4 Cluster Analysis

1.4.5 Outlier Analysis

1.4.6 Are All Patterns Interesting?

1.5 Which Technologies Are Used?

1.5.1 Statistics

1.5.2 Machine Learning

1.5.3 Database Systems and Data Warehouses

1.5.4 Information Retrieval

1.6 Which Kinds of Applications Are Targeted?

1.6.1 Business Intelligence

1.6.2 Web Search Engines

1.7 Major Issues in Data Mining

1.7.1 Mining Methodology

1.7.2 User Interaction

1.7.3 Efficiency and Scalability

1.7.4 Diversity of Database Types

1.7.5 Data Mining and Society

1.8 Summary

1.9 Exercises

1.10 Bibliographic Notes

Chapter 2 Getting to Know Your Data

2.1 Data Objects and Attribute Types

2.1.1 What Is an Attribute?

2.1.2 Nominal Attributes

2.1.3 Binary Attributes

2.1.4 Ordinal Attributes

2.1.5 Numeric Attributes

2.1.6 Discrete versus Continuous Attributes

2.2 Basic Statistical Descriptions of Data

2.2.1 Measuring the Central Tendency: Mean, Median, and Mode

2.2.2 Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range

2.2.3 Graphic Displays of Basic Statistical Descriptions of Data

2.3 Data Visualization

2.3.1 Pixel-Oriented Visualization Techniques

2.3.2 Geometric Projection Visualization Techniques

2.3.3 Icon-Based Visualization Techniques

2.3.4 Hierarchical Visualization Techniques

2.3.5 Visualizing Complex Data and Relations

2.4 Measuring Data Similarity and Dissimilarity

2.4.1 Data Matrix versus Dissimilarity Matrix

2.4.2 Proximity Measures for Nominal Attributes

2.4.3 Proximity Measures for Binary Attributes

2.4.4 Dissimilarity of Numeric Data: Minkowski Distance

2.4.5 Proximity Measures for Ordinal Attributes

2.4.6 Dissimilarity for Attributes of Mixed Types

2.4.7 Cosine Similarity

2.5 Summary

2.6 Exercises

2.7 Bibliographic Notes

Chapter 3 Data Preprocessing

3.1 Data Preprocessing: An Overview

3.1.1 Data Quality: Why Preprocess the Data?

3.1.2 Major Tasks in Data Preprocessing

3.2 Data Cleaning

3.2.1 Missing Values

3.2.2 Noisy Data

3.2.3 Data Cleaning as a Process

3.3 Data Integration

3.3.1 Entity Identification Problem

3.3.2 Redundancy and Correlation Analysis

3.3.3 Tuple Duplication

3.3.4 Data Value Conflict Detection and Resolution

3.4 Data Reduction

3.4.1 Overview of Data Reduction Strategies

3.4.2 Wavelet Transforms

3.4.3 Principal Components Analysis

3.4.4 Attribute Subset Selection

3.4.5 Regression and Log-Linear Models: Parametric Data Reduction

3.4.6 Histograms

3.4.7 Clustering

3.4.8 Sampling

3.4.9 Data Cube Aggregation

3.5 Data Transformation and Data Discretization

3.5.1 Data Transformation Strategies Overview

3.5.2 Data Transformation by Normalization

3.5.3 Discretization by Binning

3.5.4 Discretization by Histogram Analysis

3.5.5 Discretization by Cluster, Decision Tree, and Correlation Analyses

3.5.6 Concept Hierarchy Generation for Nominal Data

3.6 Summary

3.7 Exercises

3.8 Bibliographic Notes

Chapter 4 Data Warehousing and Online Analytical Processing

4.1 Data Warehouse: Basic Concepts

4.1.1 What Is a Data Warehouse?

4.1.2 Differences between Operational Database Systems and Data Warehouses

4.1.3 But, Why Have a Separate Data Warehouse?

4.1.4 Data Warehousing: A Multitiered Architecture

4.1.5 Data Warehouse Models: Enterprise Warehouse, Data Mart, and Virtual Warehouse

4.1.6 Extraction, Transformation, and Loading

4.1.7 Metadata Repository

4.2 Data Warehouse Modeling: Data Cube and OLAP

4.2.1 Data Cube: A Multidimensional Data Model

4.2.2 Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Data Models

4.2.3 Dimensions: The Role of Concept Hierarchies

4.2.4 Measures: Their Categorization and Computation

4.2.5 Typical OLAP Operations

4.2.6 A Starnet Query Model for Querying Multidimensional Databases

4.3 Data Warehouse Design and Usage

4.3.1 A Business Analysis Framework for Data Warehouse Design

4.3.2 Data Warehouse Design Process

4.3.3 Data Warehouse Usage for Information Processing

4.3.4 From Online Analytical Processing to Multidimensional Data Mining

4.4 Data Warehouse Implementation

4.4.1 Efficient Data Cube Computation: An Overview

4.4.2 Indexing OLAP Data: Bitmap Index and Join Index

4.4.3 Efficient Processing of OLAP Queries

4.4.4 OLAP Server Architectures: ROLAP versus MOLAP versus HOLAP

4.5 Data Generalization by Attribute-Oriented Induction

4.5.1 Attribute-Oriented Induction for Data Characterization

4.5.2 Efficient Implementation of Attribute-Oriented Induction

4.5.3 Attribute-Oriented Induction for Class Comparisons

4.6 Summary

4.7 Exercises

4.8 Bibliographic Notes

Chapter 5 Data Cube Technology

5.1 Data Cube Computation: Preliminary Concepts

5.1.1 Cube Materialization: Full Cube, Iceberg Cube, Closed Cube, and Cube Shell

5.1.2 General Strategies for Data Cube Computation

5.2 Data Cube Computation Methods

5.2.1 Multiway Array Aggregation for Full Cube Computation

5.2.2 BUC: Computing Iceberg Cubes from the Apex Cuboid Downward

5.2.3 Star-Cubing: Computing Iceberg Cubes Using a Dynamic Star-Tree Structure

5.2.4 Precomputing Shell Fragments for Fast High-Dimensional OLAP

5.3 Processing Advanced Kinds of Queries by Exploring Cube Technology

5.3.1 Sampling Cubes: OLAP-Based Mining on Sampling Data

5.3.2 Ranking Cubes: Efficient Computation of Top-k Queries

5.4 Multidimensional Data Analysis in Cube Space

5.4.1 Prediction Cubes: Prediction Mining in Cube Space

5.4.2 Multifeature Cubes: Complex Aggregation at Multiple Granularities

5.4.3 Exception-Based, Discovery-Driven Cube Space Exploration

5.5 Summary

5.6 Exercises

5.7 Bibliographic Notes

Chapter 6 Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods

6.1 Basic Concepts

6.1.1 Market Basket Analysis: A Motivating Example

6.1.2 Frequent Itemsets, Closed Itemsets, and Association Rules

6.2 Frequent Itemset Mining Methods

6.2.1 Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation

6.2.2 Generating Association Rules from Frequent Itemsets

6.2.3 Improving the Efficiency of Apriori

6.2.4 A Pattern-Growth Approach for Mining Frequent Itemsets

6.2.5 Mining Frequent Itemsets Using Vertical Data Format

6.2.6 Mining Closed and Max Patterns

6.3 Which Patterns Are Interesting?—Pattern Evaluation Methods

6.3.1 Strong Rules Are Not Necessarily Interesting

6.3.2 From Association Analysis to Correlation Analysis

6.3.3 A Comparison of Pattern Evaluation Measures

6.4 Summary

6.5 Exercises

6.6 Bibliographic Notes

Chapter 7 Advanced Pattern Mining

7.1 Pattern Mining: A Road Map

7.2 Pattern Mining in Multilevel, Multidimensional Space

7.2.1 Mining Multilevel Associations

7.2.2 Mining Multidimensional Associations

7.2.3 Mining Quantitative Association Rules

7.2.4 Mining Rare Patterns and Negative Patterns

7.3 Constraint-Based Frequent Pattern Mining

7.3.1 Metarule-Guided Mining of Association Rules

7.3.2 Constraint-Based Pattern Generation: Pruning Pattern Space and Pruning Data Space

7.4 Mining High-Dimensional Data and Colossal Patterns

7.4.1 Mining Colossal Patterns by Pattern-Fusion

7.5 Mining Compressed or Approximate Patterns

7.5.1 Mining Compressed Patterns by Pattern Clustering

7.5.2 Extracting Redundancy-Aware Top-k Patterns

7.6 Pattern Exploration and Application

7.6.1 Semantic Annotation of Frequent Patterns

7.6.2 Applications of Pattern Mining

7.7 Summary

7.8 Exercises

7.9 Bibliographic Notes

Chapter 8 Classification: Basic Concepts

8.1 Basic Concepts

8.1.1 What Is Classification?

8.1.2 General Approach to Classification

8.2 Decision Tree Induction

8.2.1 Decision Tree Induction

8.2.2 Attribute Selection Measures

8.2.3 Tree Pruning

8.2.4 Scalability and Decision Tree Induction

8.2.5 Visual Mining for Decision Tree Induction

8.3 Bayes Classification Methods

8.3.1 Bayes’ Theorem

8.3.2 Naïve Bayesian Classification

8.4 Rule-Based Classification

8.4.1 Using IF-THEN Rules for Classification

8.4.2 Rule Extraction from a Decision Tree

8.4.3 Rule Induction Using a Sequential Covering Algorithm

8.5 Model Evaluation and Selection

8.5.1 Metrics for Evaluating Classifier Performance

8.5.2 Holdout Method and Random Subsampling

8.5.3 Cross-Validation

8.5.4 Bootstrap

8.5.5 Model Selection Using Statistical Tests of Significance

8.5.6 Comparing Classifiers Based on Cost–Benefit and ROC Curves

8.6 Techniques to Improve Classification Accuracy

8.6.1 Introducing Ensemble Methods

8.6.2 Bagging

8.6.3 Boosting and AdaBoost

8.6.4 Random Forests

8.6.5 Improving Classification Accuracy of Class-Imbalanced Data

8.7 Summary

8.8 Exercises

8.9 Bibliographic Notes

Chapter 9 Classification: Advanced Methods

9.1 Bayesian Belief Networks

9.1.1 Concepts and Mechanisms

9.1.2 Training Bayesian Belief Networks

9.2 Classification by Backpropagation

9.2.1 A Multilayer Feed-Forward Neural Network

9.2.2 Defining a Network Topology

9.2.3 Backpropagation

9.2.4 Inside the Black Box: Backpropagation and Interpretability

9.3 Support Vector Machines

9.3.1 The Case When the Data Are Linearly Separable

9.3.2 The Case When the Data Are Linearly Inseparable

9.4 Classification Using Frequent Patterns

9.4.1 Associative Classification

9.4.2 Discriminative Frequent Pattern–Based Classification

9.5 Lazy Learners (or Learning from Your Neighbors)

9.5.1 k-Nearest-Neighbor Classifiers

9.5.2 Case-Based Reasoning

9.6 Other Classification Methods

9.6.1 Genetic Algorithms

9.6.2 Rough Set Approach

9.6.3 Fuzzy Set Approaches

9.7 Additional Topics Regarding Classification

9.7.1 Multiclass Classification

9.7.2 Semi-Supervised Classification

9.7.3 Active Learning

9.7.4 Transfer Learning

9.8 Summary

9.9 Exercises

9.10 Bibliographic Notes

Chapter 10 Cluster Analysis: Basic Concepts and Methods

10.1 Cluster Analysis

10.1.1 What Is Cluster Analysis?

10.1.2 Requirements for Cluster Analysis

10.1.3 Overview of Basic Clustering Methods

10.2 Partitioning Methods

10.2.1 k-Means: A Centroid-Based Technique

10.2.2 k-Medoids: A Representative Object-Based Technique

10.3 Hierarchical Methods

10.3.1 Agglomerative versus Divisive Hierarchical Clustering

10.3.2 Distance Measures in Algorithmic Methods

10.3.3 BIRCH: Multiphase Hierarchical Clustering Using Clustering Feature Trees

10.3.4 Chameleon: Multiphase Hierarchical Clustering Using Dynamic Modeling

10.3.5 Probabilistic Hierarchical Clustering

10.4 Density-Based Methods

10.4.1 DBSCAN: Density-Based Clustering Based on Connected Regions with High Density

10.4.2 OPTICS: Ordering Points to Identify the Clustering Structure

10.4.3 DENCLUE: Clustering Based on Density Distribution Functions

10.5 Grid-Based Methods

10.5.1 STING: STatistical INformation Grid

10.5.2 CLIQUE: An Apriori-like Subspace Clustering Method

10.6 Evaluation of Clustering

10.6.1 Assessing Clustering Tendency

10.6.2 Determining the Number of Clusters

10.6.3 Measuring Clustering Quality

10.7 Summary

10.8 Exercises

10.9 Bibliographic Notes

Chapter 11 Advanced Cluster Analysis

11.1 Probabilistic Model-Based Clustering

11.1.1 Fuzzy Clusters

11.1.2 Probabilistic Model-Based Clusters

11.1.3 Expectation-Maximization Algorithm

11.2 Clustering High-Dimensional Data

11.2.1 Clustering High-Dimensional Data: Problems, Challenges, and Major Methodologies

11.2.2 Subspace Clustering Methods

11.2.3 Biclustering

11.2.4 Dimensionality Reduction Methods and Spectral Clustering

11.3 Clustering Graph and Network Data

11.3.1 Applications and Challenges

11.3.2 Similarity Measures

11.3.3 Graph Clustering Methods

11.4 Clustering with Constraints

11.4.1 Categorization of Constraints

11.4.2 Methods for Clustering with Constraints

11.5 Summary

11.6 Exercises

11.7 Bibliographic Notes

Chapter 12 Outlier Detection

12.1 Outliers and Outlier Analysis

12.1.1 What Are Outliers?

12.1.2 Types of Outliers

12.1.3 Challenges of Outlier Detection

12.2 Outlier Detection Methods

12.2.1 Supervised, Semi-Supervised, and Unsupervised Methods

12.2.2 Statistical Methods, Proximity-Based Methods, and Clustering-Based Methods

12.3 Statistical Approaches

12.3.1 Parametric Methods

12.3.2 Nonparametric Methods

12.4 Proximity-Based Approaches

12.4.1 Distance-Based Outlier Detection and a Nested Loop Method

12.4.2 A Grid-Based Method

12.4.3 Density-Based Outlier Detection

12.5 Clustering-Based Approaches

12.6 Classification-Based Approaches

12.7 Mining Contextual and Collective Outliers

12.7.1 Transforming Contextual Outlier Detection to Conventional Outlier Detection

12.7.2 Modeling Normal Behavior with Respect to Contexts

12.7.3 Mining Collective Outliers

12.8 Outlier Detection in High-Dimensional Data

12.8.1 Extending Conventional Outlier Detection

12.8.2 Finding Outliers in Subspaces

12.8.3 Modeling High-Dimensional Outliers

12.9 Summary

12.10 Exercises

12.11 Bibliographic Notes

Chapter 13 Data Mining Trends and Research Frontiers

13.1 Mining Complex Data Types

13.1.1 Mining Sequence Data: Time-Series, Symbolic Sequences, and Biological Sequences

13.1.2 Mining Graphs and Networks

13.1.3 Mining Other Kinds of Data

13.2 Other Methodologies of Data Mining

13.2.1 Statistical Data Mining

13.2.2 Views on Data Mining Foundations

13.2.3 Visual and Audio Data Mining

13.3 Data Mining Applications

13.3.1 Data Mining for Financial Data Analysis

13.3.2 Data Mining for Retail and Telecommunication Industries

13.3.3 Data Mining in Science and Engineering

13.3.4 Data Mining for Intrusion Detection and Prevention

13.3.5 Data Mining and Recommender Systems

13.4 Data Mining and Society

13.4.1 Ubiquitous and Invisible Data Mining

13.4.2 Privacy, Security, and Social Impacts of Data Mining

13.5 Data Mining Trends

13.6 Summary

13.7 Exercises

13.8 Bibliographic Notes

Instant Access After Placing The Order.
All The Chapters Are Included.
Electronic Versions Only DOC/PDF. No Shipping Address Required.
This is the SOLUTION MANUAL Only. Not The Textbook.


The Solutions Manual contains answers to all the questions and case studies in your textbook, usually broken down into more understandable steps and organized by chapter.

Other terms for the Solutions Manual include solution manual, solutions manuals, answer book, case answers, textbook answers, instructor manual, instructor solutions manual, and SM.


The clock is ticking, and every second you spend stressing over your academic performance is time you could spend taking action to turn your grades around. And in case you are wondering, getting good grades does not mean you have to turn into a bookworm or nerd.

After all, you still wish to have a life, go to the movies, go out with friends and have fun.

That is why you must learn the secrets that will help you digest, absorb and remember large chunks of info easily and quickly so you get the best grades!

Solutions Manuals do not cut corners, but they cut to the chase so you can get the best grades!