KEEL-dataset - data set description

This section describes the main characteristics of the mutagenesis-chains data set and its attributes:

General information

Mutagenesis-Chains data set
Type: Multi instance
Origin: Real world
Features: 25 (Real / Integer / Nominal: 24 / 0 / 1)
Instances: 5349
Classes: 2
Missing values: No

Attribute description

Attribute        Domain
Chains-bag-id    {5262, ... , 4835}
Bond1            [1.0, 7.0]
Bond2            [1.0, 7.0]
Charge1          [-0.781, 1.002]
Charge2          [-0.781, 1.002]
Charge3          [-0.755, 0.597]
E1=br            [0.0, 1.0]
E1=c             [0.0, 1.0]
E1=cl            [0.0, 1.0]
E1=f             [0.0, 1.0]
E1=h             [0.0, 1.0]
E1=i             [0.0, 1.0]
E1=n             [0.0, 1.0]
E1=o             [0.0, 1.0]
E2=c             [0.0, 1.0]
E2=n             [0.0, 1.0]
E2=o             [0.0, 1.0]
E3=c             [0.0, 1.0]
E3=f             [0.0, 1.0]
E3=h             [0.0, 1.0]
E3=n             [0.0, 1.0]
E3=o             [0.0, 1.0]
Q1               [1.0, 232.0]
Q2               [10.0, 232.0]
Q3               [1.0, 195.0]
Class            {0, 1}

Additional information

The problem consists of predicting the mutagenicity of molecules, that is, determining whether a molecule is mutagenic or non-mutagenic. The mutagenesis data consist of 188 molecules, of which 125 are mutagenic (active) and 63 are non-mutagenic (inactive). From a multiple-instance learning (MIL) perspective, several transformations of these data are possible; mutagenesis-chains represents each compound molecule as a bag whose instances are all pairs of adjacent bonds in that molecule.
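The bag structure can be illustrated with a short Python sketch. It groups instance rows by their Chains-bag-id value and derives one label per molecule under the standard MIL assumption that a bag is positive if at least one of its instances is positive (if every row already repeats its molecule's label, this simply returns that label). The function name group_into_bags and the toy feature values are hypothetical and are not taken from the data set files.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Sequence[float]


def group_into_bags(
    bag_ids: Sequence[str],
    features: Sequence[Row],
    labels: Sequence[int],
) -> Dict[str, Tuple[List[Row], int]]:
    """Group instance rows (adjacent bond pairs) into bags keyed by Chains-bag-id.

    Assumption: the bag label follows the standard MIL rule, i.e. a bag is
    positive (1, mutagenic) if any of its instances is labelled 1.
    """
    if not (len(bag_ids) == len(features) == len(labels)):
        raise ValueError("bag_ids, features and labels must have equal length")
    instances: Dict[str, List[Row]] = defaultdict(list)
    bag_label: Dict[str, int] = {}
    for bag_id, row, label in zip(bag_ids, features, labels):
        instances[bag_id].append(row)
        # Labels are 0/1, so max() implements "positive if any instance is positive".
        bag_label[bag_id] = max(bag_label.get(bag_id, 0), label)
    return {b: (instances[b], bag_label[b]) for b in instances}


# Toy usage: hypothetical Bond1/Bond2 values only, not real rows from the data set.
bags = group_into_bags(
    ["5262", "5262", "4835"],
    [[1.0, 7.0], [7.0, 1.0], [1.0, 1.0]],
    [1, 1, 0],
)
print(len(bags))              # 2 bags (molecules)
print(bags["5262"][1])        # 1 (mutagenic)
print(len(bags["5262"][0]))   # 2 instances in bag 5262
```

Keeping the instances of a bag together in this way is what allows a MIL learner to classify whole molecules rather than individual bond pairs.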

In this section you can download some files related to the mutagenesis-chains data set:

  • The complete data set, already formatted in KEEL format, can be downloaded from here.
  • A copy of the data set already partitioned by means of a 10-fold cross-validation procedure can be downloaded from here.
  • The header file associated with this data set can be downloaded from here.
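For readers who want to load the downloaded files directly, a minimal Python reader might look like the sketch below. It assumes the usual KEEL text layout: a @relation line, one @attribute line per attribute, optional @inputs/@outputs lines, then @data followed by comma-separated rows; the header file listed above should be checked to confirm this. Skipping lines that start with '%' is also an assumption. The function parse_keel and the toy sample are hypothetical and only cover simple cases (no quoted attribute names, no missing-value markers).

```python
from typing import List, Tuple


def parse_keel(text: str) -> Tuple[List[str], List[List[str]]]:
    """Minimal reader for a KEEL-style .dat file (hypothetical helper).

    Returns attribute names in declaration order and data rows as lists of
    strings; converting values to numbers is left to the caller.
    """
    names: List[str] = []
    rows: List[List[str]] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # blank or assumed comment line
            continue
        if in_data:
            rows.append([field.strip() for field in line.split(",")])
        elif line.lower().startswith("@attribute"):
            # e.g. "@attribute Bond1 real [1.0, 7.0]" -> "Bond1"
            names.append(line.split()[1])
        elif line.lower() == "@data":
            in_data = True
        # @relation, @inputs and @outputs lines are ignored in this sketch
    return names, rows


# Hypothetical toy file, not the real mutagenesis-chains header or data.
sample = """@relation mutagenesis-chains-toy
@attribute Chains-bag-id {5262, 4835}
@attribute Bond1 real [1.0, 7.0]
@attribute Class {0, 1}
@inputs Chains-bag-id, Bond1
@outputs Class
@data
5262, 1.0, 1
4835, 7.0, 0
"""
names, rows = parse_keel(sample)
print(names)    # ['Chains-bag-id', 'Bond1', 'Class']
print(rows[0])  # ['5262', '1.0', '1']
```

Values are kept as strings so that the nominal Chains-bag-id and the real-valued attributes can be converted separately; the resulting columns can then be passed to a bag-grouping step keyed on Chains-bag-id.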

