Data Mining

The vast mountains of data in modern databases contain undiscovered knowledge that is difficult to uncover without the right tools. This is where data mining comes in, providing methods and algorithms for discovering previously unknown connections.

The book covers the material of a one-semester lecture on data mining at universities and technical colleges and is designed as a classic textbook.

Chapter 1 - Introduction

This chapter provides a general introduction to the field of data mining and explains some basic terms.

  • What is data mining and what can it do?
  • Structure of the book
  • Data analysis process
  • Interdisciplinarity
  • Tools

Files

  • [1] Weather data, numercal, [2] and nominal
  • [3] Knime-Weather-Workflow Fig. 1.6 / 1.8 (you have to configure the FileReader)
  • [4] Iris data, arff-Format
  • [5] Knime-Loops Fig. 1.13 (you have to configure the FileReader)

Tools

Competitions

Data collections:

Here are some notes on the tasks in the book

Chapter 2 - Fundamentals

This chapter explains some basic terms.

  • Basic terms
  • Data types
  • Similarity/distance measures
  • Fundamentals of artificial neural networks
  • Logic
  • Supervised and unsupervised learning

Chapter 3 - Application classes

This chapter provides an overview of the application classes in data mining.

  • Cluster analysis
  • Classification
  • Numerical prediction
  • Association analysis
  • Text mining
  • Web mining

Chapter 4 - Knowlede Representation

This chapter provides an overview of the possibilities for representing (Data Mining) knowledge on a computer.

  • Decision table
  • Decision trees
  • Rules
  • Association rules
  • Instance-based representation
  • Representation of clusters
  • Neural networks as knowledge stores

Chapter 5 - Classification

This chapter provides an overview of basic classification methods.

  • k-Nearest Neighbour
  • Decision Trees
  • Naive Bayes
  • Feed-forward NN
  • SVM

Files

  • [1] Income data
  • [2] Knime-KNN-Income (you have to reconfigure the FileReader)
  • [3] und [3a] Weather data (nominal, csv)
  • [4] Knime-NaiveBayes-Weather (you have to reconfigure the FileReader)
  • [5] Employee data (csv)
  • [6] Knime-SVM Employee data (you have to reconfigure the FileReader)

Chapter 6 - Cluster analysis

This chapter provides an overview of basic clustering methods.

  • k-Means
  • k-Medoid
  • Expectation maximisation
  • Density-based clustering
  • SOM
  • Neural gas
  • ART nets

Files

  • [1] Iris data, arff-Format
  • [2] Knime-KMeans Fig. 6.14 (configure the FileReader)
  • [3] File for String Replace-Node

Chapter 7 - Association analysis

This chapter provides an overview of basic association analysis methods.

  • A-Priori
  • Frequent Pattern Growth

Files

Chapter 8 - Data Preparation

This chapter provides an overview of basic data preparation techniques.

  • Data selection and integration
  • Data cleansing
  • Data reduction
  • Data transformation

Files

  • [1] Iris-Daten, arff-Format
  • [2] Condition data, csv-Format (Ex. 8.7)
  • [3] KNIME-Workflow 1, KNN without Normalisationg
  • [4] KNIME-Workflow 2, KNN with Normilisation

Chapter 9 - Evaluation

This chapter provides an overview of basic options for evaluating results and for visualisation.

  • Interest measures
  • Quality measures and error costs
  • Test and validation sets
  • Visualisation

Files

  • [1] GnuPlot-File Fig. 9.6
  • [2] GnuPlot-File Fig. 9.14

Chapter 10 - Example

This chapter discusses possible approaches to the Data Mining Cup 2002.

Best score in the DM Cup 2002

  • 12.12.13 Tanja Ciernioch (Wismar University of Applied Sciences): 7771.80

Dateien

  • Die Aufgabe des Data-Mining-Cups 2002.
  • [2]Knime-Workflow für den ersten Versuch (Abb. 10.1).
  • [3]Knime-Workflow für kNN (Abb. 10.5).
  • [4]Knime-Workflow für Naive Bayes (Abb. 10.6).
  • [5]Knime-Workflow 1 für DecTree (Abb. 10.7).
  • [6]Knime-Workflow 2 für DecTree (Abb. 10.8).
  • [7]Knime-Workflow 1 für NN (Abb. 10.9).