Data Mining Book - Wismar Business School

Data Mining

The vast mountains of data in modern databases contain undiscovered knowledge that is difficult to uncover without the right tools. This is where data mining comes in, providing methods and algorithms for discovering previously unknown connections.

The book covers the material of a one-semester lecture on data mining at universities and technical colleges and is designed as a classic textbook.

Chapter 1 - Introduction

This chapter provides a general introduction to the field of data mining and explains some basic terms.

What is data mining and what can it do?
Structure of the book
Data analysis process
Interdisciplinarity
Tools

Files

[1] Weather data, numercal, [2] and nominal
[3] Knime-Weather-Workflow Fig. 1.6 / 1.8 (you have to configure the FileReader)
[4] Iris data, arff-Format
[5] Knime-Loops Fig. 1.13 (you have to configure the FileReader)

Tools

Knime: http://www.knime.org/
WEKA: http://www.cs.waikato.ac.nz/ml/weka/
The JavaNNS is a Java-based GUI for SNNS.
Rapid Miner http://rapid-i.com

Competitions

Kaggle: http://www.kaggle.com/competitions

Data collections:

Kaggle: http://www.kaggle.com/
Open Data Inception opendatainception.io
Google Clud https://cloud.google.com/public-datasets/
GitHub https://github.com/awesomedata/awesome-public-datasets
Open Data on AWS https://registry.opendata.aws/
EU Open Data Portal https://data.europa.eu/euodp/en/data/
World-Bank https://data.worldbank.org/
WHO https://www.who.int/gho/database/en/
Data.gov https://catalog.data.gov/dataset

Here are some notes on the tasks in the book.

Chapter 2 - Fundamentals

This chapter explains some basic terms.

Basic terms
Data types
Similarity/distance measures
Fundamentals of artificial neural networks
Logic
Supervised and unsupervised learning

Chapter 3 - Application classes

This chapter provides an overview of the application classes in data mining.

Cluster analysis
Classification
Numerical prediction
Association analysis
Text mining
Web mining

Chapter 4 - Knowlede Representation

This chapter provides an overview of the possibilities for representing (Data Mining) knowledge on a computer.

Decision table
Decision trees
Rules
Association rules
Instance-based representation
Representation of clusters
Neural networks as knowledge stores

Chapter 5 - Classification

This chapter provides an overview of basic classification methods.

k-Nearest Neighbour
Decision Trees
Naive Bayes
Feed-forward NN
SVM

Files

[1] Income data
[2] Knime-KNN-Income (you have to reconfigure the FileReader)
[3] und [3a] Weather data (nominal, csv)
[4] Knime-NaiveBayes-Weather (you have to reconfigure the FileReader)
[5] Employee data (csv)
[6] Knime-SVM Employee data (you have to reconfigure the FileReader)

Chapter 6 - Cluster analysis

This chapter provides an overview of basic clustering methods.

k-Means
k-Medoid
Expectation maximisation
Density-based clustering
SOM
Neural gas
ART nets

Files

[1] Iris data, arff-Format
[2] Knime-KMeans Fig. 6.14 (configure the FileReader)
[3] File for String Replace-Node

Chapter 7 - Association analysis

This chapter provides an overview of basic association analysis methods.

A-Priori
Frequent Pattern Growth

Files

[1] Condition data, csv-Format (Tab. 7.1)
Link auf die Umfrage 2006

Chapter 8 - Data Preparation

This chapter provides an overview of basic data preparation techniques.

Data selection and integration
Data cleansing
Data reduction
Data transformation

Files

[1] Iris-Daten, arff-Format
[2] Condition data, csv-Format (Ex. 8.7)
[3] KNIME-Workflow 1, KNN without Normalisationg
[4] KNIME-Workflow 2, KNN with Normilisation

Chapter 9 - Evaluation

This chapter provides an overview of basic options for evaluating results and for visualisation.

Interest measures
Quality measures and error costs
Test and validation sets
Visualisation

Files

[1] GnuPlot-File Fig. 9.6
[2] GnuPlot-File Fig. 9.14

Chapter 10 - Example

This chapter discusses possible approaches to the Data Mining Cup 2002.

Best score in the DM Cup 2002

12.12.13 Tanja Ciernioch (Wismar University of Applied Sciences): 7771.80

Dateien

Die Aufgabe des Data-Mining-Cups 2002.
[2]Knime-Workflow für den ersten Versuch (Abb. 10.1).
[3]Knime-Workflow für kNN (Abb. 10.5).
[4]Knime-Workflow für Naive Bayes (Abb. 10.6).
[5]Knime-Workflow 1 für DecTree (Abb. 10.7).
[6]Knime-Workflow 2 für DecTree (Abb. 10.8).
[7]Knime-Workflow 1 für NN (Abb. 10.9).

Ihre Datenschutz-Optionen

Data Mining

Chapter 1 - Introduction

Chapter 2 - Fundamentals

Chapter 3 - Application classes

Chapter 4 - Knowlede Representation

Chapter 5 - Classification

Chapter 6 - Cluster analysis

Chapter 7 - Association analysis

Chapter 8 - Data Preparation

Chapter 9 - Evaluation

Chapter 10 - Example

Kontakt

Fakultät

Bildungspartner

Studium

Forschung & Kooperationen

Hochschul-Unternehmen

Social Media