People often collect and store data because they believe valuable information is implicitly hidden in it. For instance, in business, data may capture information about customers, competitors, critical markets, fraud, etc. Although most of this information is stored in large databases, SQL queries are not always adequate for analyzing the data. Many interesting queries, such as
See this interesting recent article Data Mining hottest skills, as cited by respondents to Computerworld's annual Forecast survey.
Data mining is the part of the knowledge discovery process that consists of algorithms for finding patterns or models hidden in the data. Hence, data mining algorithms help answer many interesting queries that traditional database techniques cannot cope with. Modern data mining techniques are nowadays used by most web search engines (e.g. Google) and can also be used to reveal patterns hidden in the vast amount of data on the World Wide Web. For instance, have you heard of Wal*Mart's success?
Data mining connects with several other important fields. Some are listed below.
The major topics discussed in this course are the following.
Course plan from the study handbook (studiehandbok).
The course is based on
Other course material, such as papers, will be suggested during the course. You are also free to use any other book(s) you prefer.
You can pass this course with grade 3, 4, or 5. To be approved in this course you must
You should form groups of 3 students and work together with your group on tasks 2 and 3 above. All groups should be formed no later than October 29th, and each group should inform the course responsible, by e-mail, who its members are.
The presentation, for point 2 above, should be divided among the group members, and at the beginning of the presentation it should be clearly indicated what each member is responsible for. The final practical problem should also be divided into sub-tasks that are then assigned to the group members. The final report must also indicate the contribution of each person in the group.
You can read more about the seminars and the practical problem.
The following concrete topics are presented during lectures.
Lecture slides are based on the slides of the following books
Notes for the lectures (including the slides) will be posted on this web page before each lecture. The lecture slides below are from the course given in 2010. If any lecture slides are updated, the new version will be posted on this web page.
Three seminars of 2 hours each have been booked in week 49. During each seminar two groups of students present their selected topic. The topic can be a new topic not discussed in the lectures, a complement to a topic discussed in the lectures, or a practical application of a technique discussed in the lectures. In week 46 a list of topics for the seminars will be made available on the course web page. Each group must choose a topic and inform the course responsible of its choice by November 18th. If more than one group expresses interest in a topic, I apply the rule "first come, first served". You inform the course responsible of your choice by sending an e-mail indicating your group number and your chosen topic.
Seminars and presentations follow the rules below.
You are welcome to discuss any issues of your presentation with me in advance.
You can find a suggestion about how to structure your paper here.
Note that I take the following aspects into account when evaluating your presentation.
Your grade for this part will take into account both the presentation and the short paper you submitted about your topic.
Below, you can find a list of proposed topics and some references. You are free to find your own references.
You can get the recommended book sections from me.
| Group Number | Group Members |
|---|---|
| Group 1: Web mining; [Slides] [Paper]; Dec. 7, 8:00-9:00, TP31 | |
| Group 2: Grafted decision trees; [Slides] [Paper]; Dec. 7, 9:00-10:00, TP31 | |
| Group 3: Self-organizing maps (SOM); [Slides] [Paper]; Dec. 9, 15:00-16:00, TP31 | |
| Group 4: DBSCAN algorithm; [Slides] [Paper]; Dec. 9, 16:00-17:00, TP31 | |
The software you are going to use is WEKA, which is installed in the lab rooms. WEKA is open source software, so you can also install it on your own machines. You can download it and access the WEKA manuals and tutorials. Some more information about WEKA is available at
Use the following simple lab exercises to get acquainted with WEKA.
In week 49, during your seminar presentation, every group must also present a practical problem of their own choice. This part of the presentation should take at most 10 minutes. The following points must be clearly addressed in your presentation.
The following aspects are taken into consideration when evaluating your report.
The deadline for submitting the final report is January 20th, 2012. You must send a PDF file with your report together with the data set you worked with.
| Task | Deadline |
|---|---|
| Course start | Week 43, Monday, 13h-15h |
| Form group and inform course responsible by e-mail | October 29th, week 43 |
| List of seminar topics available | Week 46 |
| Choose a seminar topic and inform course responsible by e-mail | November 18th, week 46 |
| Deliver slides and paper for the seminar presentation | December 1st, week 48, 15h |
| Present the practical problem for the final course project | Seminar on week 49 |
| Deliver report for the practical problem | January 20th, week 3, 2012 |
You can find the schedule for the lectures (Fö), seminars (SE), and labs (LA) of the course here.