UC Irvine
ML Repository

Toxicity

About

The dataset includes 171 molecules designed for the functional domains of CRY1, a core clock protein responsible for generating the circadian rhythm. Of these, 56 molecules are toxic and the remaining 115 are non-toxic.

Preprocessing description: The data consist of a complete set of 1,203 molecular descriptors and need feature selection before classification, since some of the features are redundant. We used Recursive Feature Elimination (RFE) together with a Decision Tree Classifier (DTC) to obtain the best set of molecular descriptors for the DTC. A subset of the data with 13 features is included as a supplementary file.
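The feature-selection step described above can be sketched with scikit-learn's `RFE` wrapped around a `DecisionTreeClassifier`. This is a minimal illustration, not the authors' actual pipeline: the helper name `select_descriptors`, the `step` and `seed` values, and the randomly generated stand-in data are all assumptions. Only the shape (171 molecules, 1,203 descriptors), the class counts (56 toxic, 115 non-toxic), and the target of 13 features come from the description.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier


def select_descriptors(X, y, n_features=13, step=50, seed=0):
    """Fit RFE around a decision tree and return the fitted selector.

    `step` is the number of lowest-ranked features dropped per round;
    50 is an arbitrary choice to keep the sketch fast (hypothetical,
    not taken from the dataset description).
    """
    tree = DecisionTreeClassifier(random_state=seed)
    selector = RFE(estimator=tree, n_features_to_select=n_features, step=step)
    selector.fit(X, y)
    return selector


# Synthetic stand-in with the dataset's shape; replace with the real
# descriptor matrix and toxicity labels after downloading the data.
rng = np.random.default_rng(0)
X = rng.normal(size=(171, 1203))
y = np.array([1] * 56 + [0] * 115)  # 1 = toxic, 0 = non-toxic

selector = select_descriptors(X, y)
X_selected = selector.transform(X)               # keep only retained columns
selected_idx = np.flatnonzero(selector.support_)  # indices of kept descriptors
print(X_selected.shape)
```

With the real data, `selected_idx` would map back to descriptor column names. The features chosen may differ from the 13 in the supplementary file, because the result depends on the tree's random seed and on the elimination step size.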
Subject Area
Biology
Instances
171
Features
1,203
Data Types
Tabular
Tasks
Classification
Feature Types

Introductory Paper

Structure-based design and classifications of small molecules regulating the circadian rhythm period
Seref Gul, F. Rahim, Safak Isin, Fatma Yilmaz, Nuri Ozturk, M. Turkay, I. Kavakli. 2021.
Scientific Reports

Additional Metadata

Authors
Şeref Gül
Fatih Rahim
Year Created
2021
License
CC BY 4.0