Welcome to the UC Irvine Machine Learning Repository
We currently maintain 623 datasets as a service to the machine learning community. Here, you can donate and find datasets used by millions of people all around the world!
Popular Datasets

Iris
A small classic dataset from Fisher, 1936. One of the earliest datasets used for evaluation of classification methodologies.

Dry Bean Dataset
Images of 13,611 grains of 7 different registered dry beans were taken with a high-resolution camera. A total of 16 features (12 dimensions and 4 shape forms) were obtained from the grains.

Heart Disease
4 databases: Cleveland, Hungary, Switzerland, and the VA Long Beach

Adult
Predict whether income exceeds $50K/yr based on census data. Also known as the "Census Income" dataset.

Diabetes
This diabetes dataset is from AIM '94

Rice (Cammeo and Osmancik)
A total of 3810 rice grain images were taken for the two species, then processed, and feature inferences were made. 7 morphological features were obtained for each grain of rice.
New Datasets

DeFungi
DeFungi is a dataset for direct mycological examination of microscopic fungi images. The images are from superficial fungal infections caused by yeasts, moulds, or dermatophyte fungi. The images have been manually labelled into five classes and curated with the assistance of a subject matter expert. The images have been cropped with automated algorithms to produce the final dataset.

NASA Flood Extent Detection
This dataset contains synthetic aperture radar (SAR) raster imagery for various flood events acquired from the European Space Agency's Sentinel-1A and Sentinel-1B missions, providing C-Band dual-polarized imagery that spans geographical areas of interest in the United States and Bangladesh. The main emphasis was on the labeling of open water areas, where specular reflection of the radar signal off the relatively still, flat open water surface results in reduced backscatter, low amplitude, and an overall darkened appearance within the image. The labels for the water surface reflectance are also provided in GeoTIFF rasterized file format in scenes aligned with the SAR source raster imagery.

Land Mines
Detection of mines buried in the ground is very important for the safety of life and property. Many different methods have been used for this purpose; however, it has not yet been possible to achieve 100% success. The mine detection process consists of sensor design, data analysis, and decision algorithm phases. The magnetic anomaly method works by measuring the anomalies that an object produces when it disturbs the structure of the magnetic field; the data obtained are then used to determine conditions such as motion and position. The determination of parameters such as position, depth, or direction of motion using magnetic anomaly has been carried out since 1970.

Multivariate Gait Data
Bilateral (left, right) joint angle (ankle, knee, hip) time series data collected from 10 healthy subjects under 3 walking conditions (unbraced, knee braced, ankle braced). For each condition, each subject’s data consists of 10 consecutive gait cycles.

Glioma Grading Clinical and Mutation Features Dataset
Gliomas are the most common primary tumors of the brain. They can be graded as LGG (Lower-Grade Glioma) or GBM (Glioblastoma Multiforme) depending on the histological/imaging criteria. Clinical and molecular/mutation factors are also crucial for the grading process. Molecular tests that help accurately diagnose glioma patients are expensive. In this dataset, the 20 most frequently mutated genes and 3 clinical features are considered from the TCGA-LGG and TCGA-GBM brain glioma projects. The prediction task is to determine whether a patient is LGG or GBM given the clinical and molecular/mutation features. The main objective is to find the optimal subset of mutation genes and clinical features for the glioma grading process to improve performance and reduce costs.

accelerometer_gyro_mobile_phone_dataset
Data collected in 2022 at King Saud University in Riyadh for recognizing human activities using mobile phone IMU sensors (accelerometer and gyroscope). The activities are classified as standing (stop) or walking.