UC Irvine
ML Repository

ISOLET

Download (9.6 MB)

About

Goal: Predict which letter name was spoken, a simple classification task.

This data set was generated as follows. 150 subjects spoke the name of each letter of the alphabet twice, so there are 52 examples from each speaker. The speakers are grouped into sets of 30 speakers each, referred to as isolet1, isolet2, isolet3, isolet4, and isolet5. The data appears in isolet1+2+3+4.data in sequential order: first the speakers from isolet1, then isolet2, and so on. The test set, isolet5, is a separate file.

You will note that 3 examples are missing (7,797 instances rather than the nominal 150 × 52 = 7,800). I believe they were dropped due to difficulties in recording.

I believe this is a good domain for a noisy, perceptual task. It is also a very good domain for testing the scaling abilities of algorithms. For example, C4.5 on this domain is slower than backpropagation! I have formatted the data for C4.5 and provided a C4.5-style names file as well.
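The counts in the description can be checked with a little arithmetic, and the files can be read with a small parser. The following is a minimal sketch, not the authors' tooling. It assumes each row of the data file is comma-separated, with the 617 feature values followed by a numeric class label. That layout is an assumption based on the C4.5 formatting mentioned above, so verify it against the downloaded files. The function names `expected_counts` and `load_isolet` are hypothetical.

```python
import csv
import io

N_SPEAKERS = 150     # from the description
N_LETTERS = 26       # letters of the alphabet
N_REPETITIONS = 2    # each letter name was spoken twice
N_MISSING = 3        # examples reported as missing
N_FEATURES = 617     # from the dataset summary


def expected_counts():
    """Return (examples per speaker, nominal total, total after missing)."""
    per_speaker = N_LETTERS * N_REPETITIONS
    nominal = N_SPEAKERS * per_speaker
    return per_speaker, nominal, nominal - N_MISSING


def load_isolet(lines):
    """Parse rows into (features, labels).

    ASSUMPTION: each row holds N_FEATURES comma-separated floats followed
    by one numeric class label. `lines` may be an open file or any
    iterable of strings.
    """
    features, labels = [], []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        if len(row) != N_FEATURES + 1:
            raise ValueError(f"expected {N_FEATURES + 1} fields, got {len(row)}")
        features.append([float(value) for value in row[:N_FEATURES]])
        # Go through float first so a label written like "3." still parses.
        labels.append(int(float(row[N_FEATURES])))
    return features, labels


if __name__ == "__main__":
    print(expected_counts())
    # With the real file (name taken from the description):
    # with open("isolet1+2+3+4.data", newline="") as handle:
    #     X_train, y_train = load_isolet(handle)
```

If the assumed layout holds, the row counts of the training file and the isolet5 test file together should match the 7,797 instances listed in the summary.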
Subject Area: Computer Science
Instances: 7,797
Features: 617
Data Types: Multivariate
Tasks: Classification
Feature Types: Continuous

Features

Name | Role | Type | Units | Missing Values

Introductory Paper

–

Additional Metadata

Keywords: –
Authors: Ron Cole, Mark Fanty
Year Created: 1991
License: CC BY 4.0