Welcome to the UC Irvine Machine Learning Repository
We currently maintain 624 datasets as a service to the machine learning community. Here, you can donate and find datasets used by millions of people all around the world!
Popular Datasets
Iris
A small classic dataset from Fisher, 1936. One of the earliest datasets used for evaluation of classification methodologies.
Heart Disease
Four databases: Cleveland, Hungary, Switzerland, and VA Long Beach.
Dry Bean Dataset
Images of 13,611 grains of 7 different registered dry beans were taken with a high-resolution camera. A total of 16 features (12 dimensions and 4 shape forms) were obtained from the grains.
Adult
Predict whether income exceeds $50K/yr based on census data. Also known as "Census Income" dataset.
Diabetes
This diabetes dataset is from AIM '94.
Rice (Cammeo and Osmancik)
A total of 3,810 images of rice grains were taken for the two species, then processed, and feature inferences were made. Seven morphological features were obtained for each grain of rice.
New Datasets
MetroPT-3 Dataset
Readings of pressure, temperature, motor current, and air intake valves were collected from a compressor's Air Production Unit (APU) on a metro train in an operational context. This dataset reveals real predictive maintenance challenges encountered in the industry. It can be used for failure predictions, anomaly explanations, and other tasks.
HAR70+
The Human Activity Recognition 70+ (HAR70+) dataset is a professionally-annotated dataset containing 18 fit-to-frail older-adult subjects (70-95 years old) wearing two 3-axial accelerometers for around 40 minutes during a semi-structured free-living protocol. The sensors were attached to the right thigh and lower back.
HARTH
The Human Activity Recognition Trondheim (HARTH) dataset is a professionally-annotated dataset containing 22 subjects wearing two 3-axial accelerometers for around 2 hours in a free-living setting. The sensors were attached to the right thigh and lower back. The professional recordings and annotations provide a promising benchmark dataset for researchers to develop innovative machine learning approaches for precise HAR in free living.
DeFungi
DeFungi is a dataset for direct mycological examination of microscopic fungi images. The images are from superficial fungal infections caused by yeasts, moulds, or dermatophyte fungi. The images have been manually labelled into five classes and curated with the assistance of a subject matter expert. The images have been cropped with automated algorithms to produce the final dataset.
NASA Flood Extent Detection
This dataset contains synthetic aperture radar (SAR) raster imagery for various flood events acquired from the European Space Agency's Sentinel-1A and Sentinel-1B missions, providing C-Band dual-polarized imagery that spans geographical areas of interest in the United States and Bangladesh. The main emphasis was on the labeling of open water areas, where specular reflection of the radar signal off the relatively still, flat open water surface results in reduced backscatter, low amplitude, and an overall darkened appearance within the image. The labels for the water surface reflectance are also provided in GeoTIFF rasterized file format in scenes aligned with the SAR source raster imagery.
Turkish User Review Dataset
This dataset contains Turkish comments made by customers on products (computer, tea machine, headphones, modem, perfume, mobile phone, TV, USB) sold on a website. This dataset was created by Asst. Prof. Dr. Ekin Ekinci and Prof. Sevinç İlhan Omurca. Please refer to the study "An alternative word embedding approach for knowledge representation in online consumers’ reviews" when using this dataset.