UC Irvine
ML Repository

US Census Data (1990)

Download (160.8 MB)

About

The USCensus1990raw data set contains a one percent sample of the Public Use Microdata Samples (PUMS) person records drawn from the full 1990 census sample. The data was collected as part of the 1990 census.

The USCensus1990 data set described here has 68 categorical attributes and was derived from the USCensus1990raw data set. The attributes are listed in the file USCensus1990.attributes.txt (repeated below) and the coding for the values is described below. Many of the less useful attributes in the original data set have been dropped, the few continuous variables have been discretized, and the few discrete variables with a large number of possible values have been collapsed to have fewer possible values.

More specifically, the USCensus1990 data set was obtained from the USCensus1990raw data set by the following sequence of operations:

- Randomization: The order of the cases in the original USCensus1990raw data set was randomly permuted.
- Selection of attributes: The 68 attributes included in the data set are given below. In the USCensus1990 data set a single-letter prefix has been added to each original name: 'i' indicates that the original attribute values are used, and 'd' indicates that the original attribute values for each case have been mapped to new values (the precise mapping is described below).

Hierarchies of values are provided in the file USCensus1990raw.coding.htm, and the mapping functions used to transform the USCensus1990raw data set into the USCensus1990 data set are given in the file USCensus1990.mapping.sql.

The data is contained in a file called USCensus1990.data.txt. The first row contains the list of attributes. The first attribute is a caseid and should be ignored during analysis. The data is comma delimited with one case per row.
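The file layout described above (header row, leading caseid column, comma-delimited cases, 'i'/'d' name prefixes) could be read along the following lines. This is a minimal sketch using only the Python standard library; the helper names `load_census` and `split_prefix` are hypothetical, and the column names and values in the in-memory sample are invented for illustration rather than taken from USCensus1990.data.txt.

```python
import csv
import io


def load_census(fileobj):
    """Read comma-delimited cases, dropping the leading caseid column.

    Returns (attribute_names, rows); values are kept as strings because
    the value coding is not assumed here.
    """
    reader = csv.reader(fileobj)
    header = next(reader)          # first row lists the attributes
    names = header[1:]             # first attribute is the caseid: ignore it
    rows = [row[1:] for row in reader if row]  # skip blank lines
    return names, rows


def split_prefix(name):
    """Split an attribute name into (kind, original_name).

    'i' means the original values are used; 'd' means they were mapped.
    Any other prefix is reported as 'unknown'.
    """
    kinds = {"i": "original", "d": "mapped"}
    return kinds.get(name[:1], "unknown"), name[1:]


# Invented sample in the documented layout (not real census records).
sample = io.StringIO("caseid,dAge,iSex\n10000,5,0\n10001,6,1\n")
names, rows = load_census(sample)
kinds = [split_prefix(n) for n in names]
```

For the real file, the same function could be called on `open("USCensus1990.data.txt", newline="")`, assuming the file has been downloaded to the working directory. With roughly 2.4 million cases, streaming rows instead of building a list may be preferable.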
Subject Area
Social Science
Instances
2,458,285
Features
68
Data Types
Multivariate
Tasks
Clustering
Feature Types
Categorical

Introductory Paper

–

Additional Metadata

Authors
Chris Meek
Bo Thiesson
David Heckerman
Year Created
2001
License
CC BY 4.0