KEEL-dataset - data set description

This section describes the main characteristics of the musk2 data set and its attributes:

General information

Musk 2 data set
Type: Multi-instance
Origin: Real world
Features: 167 (166 real, 0 integer, 1 nominal)
Instances: 6598
Classes: 2
Missing values: No

Attribute description

Molecule_name {MUSK-jf66, ... , MUSK-331}    F56 [-279.0, 214.0]    F112 [-266.0, 214.0]
F1 [-31.0, 292.0]    F57 [-170.0, 229.0]    F113 [-224.0, 194.0]
F2 [-199.0, 95.0]    F58 [-178.0, 113.0]    F114 [-204.0, 180.0]
F3 [-167.0, 81.0]    F59 [-172.0, 200.0]    F115 [-250.0, 216.0]
F4 [-114.0, 161.0]    F60 [-250.0, 200.0]    F116 [-257.0, 253.0]
F5 [-118.0, 325.0]    F61 [-102.0, 254.0]    F117 [-103.0, 315.0]
F6 [-183.0, 200.0]    F62 [-196.0, 180.0]    F118 [-212.0, 156.0]
F7 [-171.0, 220.0]    F63 [-100.0, 284.0]    F119 [-196.0, 209.0]
F8 [-225.0, 320.0]    F64 [-195.0, 225.0]    F120 [-201.0, 152.0]
F9 [-245.0, 147.0]    F65 [-165.0, 112.0]    F121 [-121.0, 267.0]
F10 [-286.0, 231.0]    F66 [-9.0, 315.0]    F122 [-117.0, 258.0]
F11 [-328.0, 176.0]    F67 [-167.0, 234.0]    F123 [-129.0, 276.0]
F12 [-321.0, 184.0]    F68 [-195.0, 149.0]    F124 [-127.0, 227.0]
F13 [-305.0, 195.0]    F69 [-134.0, 232.0]    F125 [-144.0, 299.0]
F14 [-342.0, 158.0]    F70 [-191.0, 158.0]    F126 [-69.0, 308.0]
F15 [-294.0, 172.0]    F71 [-174.0, 86.0]    F127 [-286.0, 219.0]
F16 [-327.0, 80.0]    F72 [-152.0, 99.0]    F128 [-221.0, 241.0]
F17 [-224.0, 138.0]    F73 [-324.0, 181.0]    F129 [-307.0, 206.0]
F18 [-308.0, 189.0]    F74 [-333.0, 172.0]    F130 [-189.0, 122.0]
F19 [-286.0, 225.0]    F75 [-274.0, 203.0]    F131 [-123.0, 281.0]
F20 [-252.0, 227.0]    F76 [-195.0, 21.0]    F132 [-140.0, 255.0]
F21 [-295.0, 194.0]    F77 [-259.0, 156.0]    F133 [-319.0, 176.0]
F22 [-185.0, 190.0]    F78 [-313.0, 235.0]    F134 [-338.0, 169.0]
F23 [-253.0, 213.0]    F79 [-306.0, 193.0]    F135 [-336.0, 219.0]
F24 [-76.0, 317.0]    F80 [-202.0, 309.0]    F136 [-196.0, 125.0]
F25 [-100.0, 277.0]    F81 [-255.0, 198.0]    F137 [-197.0, 186.0]
F26 [-242.0, 183.0]    F82 [-175.0, 201.0]    F138 [-199.0, 130.0]
F27 [-205.0, 164.0]    F83 [-299.0, 175.0]    F139 [-243.0, 202.0]
F28 [-166.0, 145.0]    F84 [-98.0, 273.0]    F140 [-283.0, 203.0]
F29 [-142.0, 174.0]    F85 [-220.0, 193.0]    F141 [-290.0, 188.0]
F30 [-162.0, 266.0]    F86 [-203.0, 194.0]    F142 [-185.0, 184.0]
F31 [-117.0, 309.0]    F87 [-207.0, 109.0]    F143 [-157.0, 239.0]
F32 [-143.0, 310.0]    F88 [-213.0, 172.0]    F144 [-171.0, 208.0]
F33 [-139.0, 207.0]    F89 [-111.0, 152.0]    F145 [-179.0, 213.0]
F34 [-279.0, 160.0]    F90 [-157.0, 269.0]    F146 [-106.0, 261.0]
F35 [-160.0, 220.0]    F91 [-202.0, 235.0]    F147 [-136.0, 172.0]
F36 [-7.0, 324.0]    F92 [-16.0, 306.0]    F148 [-200.0, 130.0]
F37 [-175.0, 147.0]    F93 [-125.0, 223.0]    F149 [-213.0, 117.0]
F38 [-190.0, 187.0]    F94 [-328.0, 184.0]    F150 [-190.0, 185.0]
F39 [-148.0, 107.0]    F95 [-119.0, 238.0]    F151 [-140.0, 244.0]
F40 [-180.0, 194.0]    F96 [-69.0, 347.0]    F152 [-128.0, 153.0]
F41 [-188.0, 90.0]    F97 [-191.0, 165.0]    F153 [-114.0, 211.0]
F42 [-150.0, 367.0]    F98 [-190.0, 203.0]    F154 [-173.0, 120.0]
F43 [-295.0, 225.0]    F99 [-157.0, 40.0]    F155 [-143.0, 379.0]
F44 [-343.0, 198.0]    F100 [-156.0, 237.0]    F156 [-198.0, 153.0]
F45 [-310.0, 147.0]    F101 [-209.0, 91.0]    F157 [-257.0, 145.0]
F46 [-340.0, 161.0]    F102 [-33.0, 348.0]    F158 [-328.0, 94.0]
F47 [-159.0, 110.0]    F103 [-299.0, 173.0]    F159 [-219.0, 179.0]
F48 [-290.0, 179.0]    F104 [-324.0, 191.0]    F160 [-136.0, 192.0]
F49 [-265.0, 273.0]    F105 [-319.0, 154.0]    F161 [-120.0, 411.0]
F50 [-279.0, 215.0]    F106 [-284.0, 212.0]    F162 [-69.0, 355.0]
F51 [-326.0, 172.0]    F107 [-200.0, 159.0]    F163 [73.0, 625.0]
F52 [-206.0, 177.0]    F108 [-292.0, 167.0]    F164 [-289.0, 295.0]
F53 [-206.0, 169.0]    F109 [-249.0, 200.0]    F165 [-428.0, 168.0]
F54 [-147.0, 335.0]    F110 [-291.0, 141.0]    F166 [-471.0, 367.0]
F55 [-112.0, 269.0]    F111 [-250.0, 209.0]    Class {0, 1}

Additional information

The problem consists of determining whether a drug molecule will bind strongly to a target protein. Each molecule can adopt a wide range of shapes, or conformations. A positive molecule has at least one conformation that binds well (although it is not known which one), whereas none of the conformations of a negative molecule binds well. This problem maps naturally onto the multiple-instance learning (MIL) setting: each molecule is a bag, and the conformations it can adopt are the instances in that bag. In this data set, the nominal Molecule_name attribute identifies the bag (molecule) to which each instance (conformation) belongs.
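The bag structure described above can be sketched in a few lines of Python. This is a minimal illustration, not KEEL code: it assumes each row has already been read as a tuple (molecule name, feature values, class), and the helper names `group_into_bags` and `predict_bag`, the toy molecule names, and the threshold rule are all hypothetical. It applies the standard MIL assumption: a bag is positive if and only if at least one of its instances is positive.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed row layout (hypothetical): (molecule name, feature values, class 0/1).
Row = Tuple[str, Sequence[float], int]


def group_into_bags(rows: List[Row]) -> Tuple[Dict[str, List[Sequence[float]]], Dict[str, int]]:
    """Group conformation rows by molecule name and derive one label per bag."""
    bags: Dict[str, List[Sequence[float]]] = defaultdict(list)
    bag_labels: Dict[str, int] = {}
    for name, features, label in rows:
        bags[name].append(features)
        # Standard MIL assumption: one positive instance makes the bag positive.
        bag_labels[name] = max(bag_labels.get(name, 0), label)
    return dict(bags), bag_labels


def predict_bag(instances: List[Sequence[float]],
                is_positive: Callable[[Sequence[float]], bool]) -> int:
    """Predict 1 if at least one conformation is classified as binding well."""
    return int(any(is_positive(x) for x in instances))


# Toy rows with 2 features instead of 166; names and values are made up.
rows = [
    ("MOL-A", [10.0, -5.0], 1),
    ("MOL-A", [-40.0, 12.0], 1),
    ("MOL-B", [-30.0, 7.0], 0),
]
bags, labels = group_into_bags(rows)
print(labels)  # {'MOL-A': 1, 'MOL-B': 0}

rule = lambda x: x[0] > 0  # hypothetical instance-level classifier
print({name: predict_bag(inst, rule) for name, inst in bags.items()})  # {'MOL-A': 1, 'MOL-B': 0}
```

Keeping the instance classifier as a plain callable separates the bag-level "any positive instance" rule from whatever instance model is used, which mirrors the fact that the binding conformation of a positive molecule is not known.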

The following files related to the musk2 data set are available for download:

  • The complete data set, already formatted in KEEL format, can be downloaded from here.
  • A copy of the data set already partitioned by means of a 10-fold cross-validation procedure can be downloaded from here.
  • The header file associated with this data set can be downloaded from here.
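To load a downloaded file without extra tooling, a small reader may be enough. The sketch below assumes the usual KEEL .dat layout: an @-prefixed header with one `@attribute <name> <type> ...` line per attribute (separated by spaces), followed by `@data` and comma-separated rows. The function name `read_keel` and the two-feature sample are hypothetical; the real musk2 file has 166 numeric features, and its exact header may differ.

```python
from typing import List, Tuple


def read_keel(text: str) -> Tuple[List[str], List[list]]:
    """Parse KEEL-style .dat text into attribute names and typed rows.

    Assumes '@attribute <name> <type> ...' declarations separated by spaces;
    'real'/'integer' columns become floats, nominal columns stay strings.
    """
    names: List[str] = []
    numeric: List[bool] = []
    rows: List[list] = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        lower = line.lower()
        if not in_data:
            if lower.startswith("@attribute"):
                parts = line.split(None, 2)  # ['@attribute', name, type-and-range]
                names.append(parts[1])
                numeric.append(parts[2].lower().startswith(("real", "integer")))
            elif lower.startswith("@data"):
                in_data = True
            # Other header lines (@relation, @inputs, @outputs) are skipped.
            continue
        values = [v.strip() for v in line.split(",")]
        rows.append([float(v) if is_num else v for v, is_num in zip(values, numeric)])
    return names, rows


# Made-up two-feature sample in the assumed layout.
SAMPLE = """@relation musk2_sample
@attribute Molecule_name {MOL-A, MOL-B}
@attribute F1 real [-31.0, 292.0]
@attribute F2 real [-199.0, 95.0]
@attribute Class {0, 1}
@inputs Molecule_name, F1, F2
@outputs Class
@data
MOL-A, 46, -108, 1
MOL-B, 12, 3, 0
"""

names, rows = read_keel(SAMPLE)
print(names)    # ['Molecule_name', 'F1', 'F2', 'Class']
print(rows[0])  # ['MOL-A', 46.0, -108.0, '1']
```

The first column of each parsed row can then serve as the bag identifier when grouping conformations into molecules.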

 Copyright 2004-2018, KEEL (Knowledge Extraction based on Evolutionary Learning)