UC Irvine
ML Repository

Ecoli

About

This data contains protein localization sites.

The references below describe a predecessor to this dataset and its development. They also give results (not cross-validated) for classification by a rule-based expert system with that version of the dataset.

Reference: "Expert System for Predicting Protein Localization Sites in Gram-Negative Bacteria", Kenta Nakai & Minoru Kanehisa, PROTEINS: Structure, Function, and Genetics 11:95-110, 1991.

Reference: "A Knowledge Base for Predicting Protein Localization Sites in Eukaryotic Cells", Kenta Nakai & Minoru Kanehisa, Genomics 14:897-911, 1992.

Variables Info:
1. Sequence Name: Accession number for the SWISS-PROT database.
2. mcg: McGeoch's method for signal sequence recognition.
3. gvh: von Heijne's method for signal sequence recognition.
4. lip: von Heijne's Signal Peptidase II consensus sequence score. Binary attribute.
5. chg: Presence of charge on N-terminus of predicted lipoproteins. Binary attribute.
6. aac: Score of discriminant analysis of the amino acid content of outer membrane and periplasmic proteins.
7. alm1: Score of the ALOM membrane spanning region prediction program.
8. alm2: Score of ALOM program after excluding putative cleavable signal regions from the sequence.

Class labels (number of instances):
cp (cytoplasm): 143
im (inner membrane without signal sequence): 77
pp (periplasm): 52
imU (inner membrane, uncleavable signal sequence): 35
om (outer membrane): 20
omL (outer membrane lipoprotein): 5
imL (inner membrane lipoprotein): 2
imS (inner membrane, cleavable signal sequence): 2
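The class counts listed above are heavily skewed, which matters when judging any classifier on this data. The following is a minimal Python sketch that uses only the counts from the description; it totals them, computes the accuracy of always predicting the largest class, and lists the classes with five or fewer examples. The variable names and the cutoff of five are illustrative choices, not part of the dataset.

```python
# Class counts copied from the dataset description.
class_counts = {
    "cp": 143, "im": 77, "pp": 52, "imU": 35,
    "om": 20, "omL": 5, "imL": 2, "imS": 2,
}

total = sum(class_counts.values())

# Accuracy obtained by always predicting the most frequent class.
majority_label = max(class_counts, key=class_counts.get)
majority_baseline = class_counts[majority_label] / total

# Classes with very few examples (cutoff of 5 is an arbitrary choice here).
rare_classes = [label for label, n in class_counts.items() if n <= 5]

print(f"total instances: {total}")
print(f"majority class: {majority_label} ({majority_baseline:.1%})")
print(f"rare classes: {rare_classes}")
```

The total comes to 336, matching the instance count reported for the dataset, and a classifier needs to beat roughly 43% accuracy to improve on the majority-class baseline.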
Subject Area
Biology
Instances
336
Features
8
Data Types
Multivariate
Tasks
Classification
Feature Types
Continuous
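
For a classification task, each record has to be split into a sequence name, seven numeric attributes, and a class label. The sketch below assumes, since this page does not state the file layout, that each record is one whitespace-separated line in the order given under Variables Info with the class label last. The `EcoliRecord` type, the `parse_record` helper, and the sample line are hypothetical; the sample values are placeholders and are not taken from the dataset.

```python
from dataclasses import dataclass

# Numeric attribute names in the order listed in the dataset description.
NUMERIC_FIELDS = ["mcg", "gvh", "lip", "chg", "aac", "alm1", "alm2"]


@dataclass
class EcoliRecord:
    sequence_name: str
    features: dict
    label: str


def parse_record(line: str) -> EcoliRecord:
    """Parse one record, assuming whitespace-separated fields (an assumption)."""
    parts = line.split()
    expected = 1 + len(NUMERIC_FIELDS) + 1  # name + attributes + class label
    if len(parts) != expected:
        raise ValueError(f"expected {expected} fields, got {len(parts)}")
    name, *values, label = parts
    features = {key: float(value) for key, value in zip(NUMERIC_FIELDS, values)}
    return EcoliRecord(sequence_name=name, features=features, label=label)


# Placeholder line with made-up values, for illustration only.
sample = "EXAMPLE_ECOLI 0.50 0.40 0.48 0.50 0.60 0.30 0.35 cp"
record = parse_record(sample)
print(record.sequence_name, record.label, record.features["mcg"])
```

Keeping the sequence name out of the feature dictionary reflects that it is an identifier rather than a predictor; the field-count check makes a wrong layout assumption fail early instead of silently misaligning columns.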

Introductory Paper

A Probabilistic Classification System for Predicting the Cellular Localization Sites of Proteins
P. Horton, K. Nakai. 1996.
Intelligent Systems in Molecular Biology

Additional Metadata

Authors
Kenta Nakai
Year Created
1996
License
CC BY 4.0