Read the Blog It has the reading assignments
Topics for the quarter
- What are Concepts, Instances, Attributes, what is learning?
- Knowledge representation
- Rule Lists, Trees, Linear Models
- Training a machine learning system
- Cleaning and Transforming Data
- Bayesian Networks
- Clustering
- Neural Networks
- Regression
Readings for this week and last week
- in Witten: Chapter 2 and section 11.3 for lab
- From last week, the examples were important: weather, contact lenses, irises, labor
contracts, soybean classification. What was important for each of them. - Also, the types of concepts: classification, clustering, associations
- ethics: why do we need to think about it?
Additional resources
I have moved them to Docs
Concepts, Instances, Attributes
- a conceptis what the system is learning, e.g. classification, clustering, …
- an instanceis an irreducible collection of data that is input to the learning system, i.e. an example of a concept
- an attribute is a property/component of an instance
Relations
- What is the difference between a function and a relation? Can a table be both?
What is the closed world assumption? What is a predicate? - The book describes the sister-of relation. What input would you give the learning
algorithm to train it? - Look at Figure 2.1, does the table have enough information to define the relation?
- What about combined with table 2.4 on P. 45?
- Database relations are optimized for space and time, but not necessarily for learning algorithms.
- Suppose you had two tables (relations in the database sense), how would you combine them?
- What is denormalization? What is a multi-instance problem?
- What do you do when the relation is not finite, e.g. ancestor-of. This needs to
be defined recursively.
Attributes and Data
- What kinds of data are there in Weka?
- What is the difference between string and {yes, no}
- What is another name for categorical?
- Look at P 55 for an example of an ARFF file.
- Why data needs to be cleaned/preprocessed: missing values, inconsistent values
- Summarizing data: mean, standard deviation, min, max, quartiles
- attribute subset selection: finding a minimum set of attributes that adequately describes the concept.
Functions
- rules, P. 6, there are 24 rows, how many possible functions are there from a
set of 24 instances to a set of 3 outcomes? - In general, the data is noisy
- decision list vs decision tree. P. 13
- generalization