Readings for the quiz
- in Witten: Chapters 1 and 2 and section 11.3
- types of concepts: classification, clustering, associations
- types of data: nominal, numeric, ordinal
- examples: sister-of, contact lenses, weather, irises, soybean classification, etc.
- ethics: why do we need to think about it?
Additional resources
- Computer Vision by Shapiro and Stockman
- Image processing by Steve Tanimoto (probably too elementary)
Attributes and Data
- What kinds of data are there in Weka?
- Why data needs to be cleaned/preprocessed: missing values, inconsistent values
- Summarizing data: mean, standard deviation, min, max, quartiles
- goal: finding a minimum set of attributes that adequately describes the concept.
- What is the syntax for ARFF format?
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@data
- How does ARFF denote string data?
- How does ARFF handle multi-instance data?
- What does the following mean?
@attribute bag relational
  @attribute outlook {sunny, overcast, rainy}
  @attribute temperature numeric
@end bag
@attribute play? {yes, no}
- attribute subset selection: finding a minimum set of attributes that adequately describes the concept.
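To make the ARFF syntax and the summary statistics above concrete, here is a minimal Python sketch. It is not Weka's own loader: it handles only nominal and numeric attributes, the data rows are made up for illustration, and the names parse_arff, column and summarize are hypothetical helpers. It assumes ARFF's convention that a missing value is written as "?".

```python
import statistics

# Toy ARFF text. The header mirrors the weather example in these notes;
# the data rows are invented for illustration only.
ARFF_TEXT = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@data
sunny,85
overcast,83
rainy,?
rainy,70
sunny,68
"""


def parse_arff(text):
    """Parse a small ARFF subset: nominal and numeric attributes only."""
    attributes = []  # (name, 'numeric') or (name, [nominal labels])
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # blank line or ARFF comment
            continue
        lower = line.lower()
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                labels = [s.strip() for s in spec.strip("{}").split(",")]
                attributes.append((name, labels))
            else:
                attributes.append((name, spec.lower()))
        elif lower.startswith("@data"):
            in_data = True
    return attributes, rows


def column(attributes, rows, name):
    """Return one attribute's values, with '?' mapped to None (missing)."""
    idx = [a[0] for a in attributes].index(name)
    return [None if r[idx] == "?" else r[idx] for r in rows]


def summarize(values):
    """Mean, standard deviation, min, max and quartiles, ignoring missing values."""
    present = [float(v) for v in values if v is not None]
    q1, median, q3 = statistics.quantiles(present, n=4)
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "stdev": statistics.stdev(present),
        "min": min(present),
        "max": max(present),
        "quartiles": (q1, median, q3),
    }


attrs, rows = parse_arff(ARFF_TEXT)
print(summarize(column(attrs, rows, "temperature")))
```

Dropping missing values before summarizing is only one possible preprocessing choice; replacing them with the mean or the most frequent value is another.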
Filtering data
- A filter can be almost any function that transforms the input
- There are different types of filters: supervised and unsupervised, attribute filters and instance filters.
- Examples:
AddCluster: adds a new nominal attribute holding the ID of the cluster each instance belongs to
AddNoise: flips some of the values of a nominal attribute
NominalToBinary (supervised in the case of numeric values)
Randomize (unsupervised, instance)
Resample (unsupervised, instance)
SpreadSubsample (supervised, instance)
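
The idea that a filter is just a function transforming the input can be sketched in plain Python. These are not Weka's implementations. The functions randomize, resample and add_noise are hypothetical stand-ins named after the filters listed above, and the parameters (percent, seed) and toy data are assumptions made for illustration.

```python
import random

# Toy dataset: each instance is [outlook, temperature]; values are made up.
data = [["sunny", 85], ["overcast", 83], ["rainy", 70], ["rainy", 65]]
outlook_labels = ["sunny", "overcast", "rainy"]


def randomize(instances, seed=42):
    """Instance filter: return the same instances in shuffled order."""
    rng = random.Random(seed)
    shuffled = list(instances)
    rng.shuffle(shuffled)
    return shuffled


def resample(instances, percent=100.0, seed=1):
    """Instance filter: draw a random subsample with replacement."""
    rng = random.Random(seed)
    k = round(len(instances) * percent / 100.0)
    return [rng.choice(instances) for _ in range(k)]


def add_noise(instances, attr_index, labels, percent=10.0, seed=1):
    """Change a nominal attribute to a different label in `percent` of the instances."""
    rng = random.Random(seed)
    noisy = [list(row) for row in instances]  # copy so the input is untouched
    k = round(len(noisy) * percent / 100.0)
    for i in rng.sample(range(len(noisy)), k):
        others = [lab for lab in labels if lab != noisy[i][attr_index]]
        noisy[i][attr_index] = rng.choice(others)
    return noisy


# Filters compose like ordinary functions: shuffle, then corrupt half the outlook values.
print(add_noise(randomize(data), 0, outlook_labels, percent=50.0))
```

None of these three looks at the class attribute, which is what makes them unsupervised in the sense used above; a supervised filter would take the class values as an extra input.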