Data Mining
Full course description
Data mining is a relatively new scientific field concerned with discovering interesting knowledge in (very large) data sets. In practice it is often a mixed-initiative process that can be used to predict events or to analyze them in retrospect. Data mining draws on artificial intelligence, machine learning, and statistics.
A typical database yields data, information, or even knowledge when the appropriate queries are submitted and answered. The situation changes when you have to analyze large databases with many variables: elementary database queries and standard statistical analysis are no longer sufficient to meet your information need. Your intuition tells you that the database contains more knowledge on a specific topic, knowledge that you would like to have explicitly. Data mining can assist you in acquiring it.
The course shows you within two months how this works. You will learn new techniques, methods, and tools of data mining. The course focuses on techniques with direct practical use. A step-by-step introduction to powerful (freeware) data-mining tools will help you gain specific skills, autonomy, and hands-on experience. A number of real data sets will be analyzed and discussed. By the end of the course you will be able to apply data-mining techniques for research and business purposes.
The following points will be addressed during the course:
* Data Mining and Knowledge Discovery
* Data Preparation
* Basic Techniques for Data Mining:
- Decision-Tree Induction
- Rule Induction
- Instance-Based Learning
- Bayesian Learning
- Ensemble Techniques
- Clustering
- Association Rules
* Tools for Data Mining
* How to Interpret and Evaluate Data-Mining Results
Course objectives
- To provide an introduction to the fundamental concepts of data mining.
- To provide practical experience in applying data-mining techniques to analyze data and derive new knowledge.
Prerequisites
SCI2039 Computer Science or SCI2011 Introduction to Programming and SSC2061 Statistics I.
Recommended reading
- Mitchell, T. (1997). Machine Learning. McGraw Hill. ISBN 0070428077.