Welcome to the UC Irvine Machine Learning Repository

We currently maintain 623 datasets as a service to the machine learning community. Here, you can donate and find datasets used by millions of people all around the world!

Popular Datasets

Iris

A small classic dataset from Fisher, 1936. One of the earliest datasets used for evaluation of classification methodologies.

Dry Bean Dataset

Images of 13,611 grains of 7 different registered dry beans were taken with a high-resolution camera. A total of 16 features (12 dimensions and 4 shape forms) were obtained from the grains.

Heart Disease

Four databases: Cleveland, Hungary, Switzerland, and VA Long Beach.

Adult

Predict whether income exceeds $50K/yr based on census data. Also known as the "Census Income" dataset.

Diabetes

This diabetes dataset is from AIM '94.

Rice (Cammeo and Osmancik)

A total of 3,810 rice grain images were taken for the two species and processed, and feature inferences were made. 7 morphological features were obtained for each grain of rice.

New Datasets

DeFungi

DeFungi is a dataset for direct mycological examination of microscopic fungi images. The images are from superficial fungal infections caused by yeasts, moulds, or dermatophyte fungi. The images have been manually labelled into five classes and curated with the assistance of a subject matter expert. The images have been cropped with automated algorithms to produce the final dataset.

NASA Flood Extent Detection

This dataset contains synthetic aperture radar (SAR) raster imagery for various flood events acquired from the European Space Agency's Sentinel-1A and Sentinel-1B missions, providing C-band dual-polarized imagery that spans geographical areas of interest in the United States and Bangladesh. The main emphasis was on the labeling of open water areas, where specular reflection of the radar signal off the relatively still, flat open water surface results in reduced backscatter, low amplitude, and an overall darkened appearance within the image. The labels for the water surface reflectance are also provided in GeoTIFF rasterized file format in scenes aligned with the SAR source raster imagery.

Land Mines

Detecting mines buried in the ground is very important for the safety of life and property. Many different methods have been used for this purpose; however, it has not yet been possible to achieve 100% success. The mine detection process consists of sensor design, data analysis, and decision algorithm phases. The magnetic anomaly method works by measuring the anomalies that an object produces when it disturbs the structure of the magnetic field; the data obtained are then used to determine conditions such as motion and position. The determination of parameters such as position, depth, or direction of motion using magnetic anomalies has been carried out since 1970.

Multivariate Gait Data

Bilateral (left, right) joint angle (ankle, knee, hip) time series data collected from 10 healthy subjects under 3 walking conditions (unbraced, knee braced, ankle braced). For each condition, each subject’s data consists of 10 consecutive gait cycles.

Glioma Grading Clinical and Mutation Features Dataset

Gliomas are the most common primary tumors of the brain. They can be graded as LGG (Lower-Grade Glioma) or GBM (Glioblastoma Multiforme) depending on histological/imaging criteria. Clinical and molecular/mutation factors are also crucial for the grading process, but the molecular tests that help accurately diagnose glioma patients are expensive. In this dataset, the 20 most frequently mutated genes and 3 clinical features are considered from the TCGA-LGG and TCGA-GBM brain glioma projects. The prediction task is to determine whether a patient is LGG or GBM given the clinical and molecular/mutation features. The main objective is to find the optimal subset of mutation genes and clinical features for the glioma grading process to improve performance and reduce costs.

accelerometer_gyro_mobile_phone_dataset

Data collected in 2022 at King Saud University in Riyadh for recognizing human activities using mobile phone IMU sensors (accelerometer and gyroscope). Each activity is classified as standing (stop) or walking.
