KEEL-dataset - data set description

This section describes the main characteristics of the wdbc data set and its attributes:

General information

Breast Cancer Wisconsin (Diagnostic) data set
Type: Classification
Origin: Real world
Features: 30
(Real / Integer / Nominal): (30 / 0 / 0)
Instances: 569
Classes: 2
Missing values? No

Attribute description

Radius1             [6.981, 28.11]    Radius2             [0.112, 2.873]    Radius3             [7.93, 36.04]
Texture1            [9.71, 39.28]     Texture2            [0.36, 4.885]     Texture3            [12.02, 49.54]
Perimeter1          [43.79, 188.5]    Perimeter2          [0.757, 21.98]    Perimeter3          [50.41, 251.2]
Area1               [143.5, 2501.0]   Area2               [6.802, 542.2]    Area3               [185.2, 4254.0]
Smoothness1         [0.053, 0.163]    Smoothness2         [0.0020, 0.031]   Smoothness3         [0.071, 0.223]
Compactness1        [0.019, 0.345]    Compactness2        [0.0020, 0.135]   Compactness3        [0.027, 1.058]
Concavity1          [0.0, 0.427]      Concavity2          [0.0, 0.396]      Concavity3          [0.0, 1.252]
Concave_points1     [0.0, 0.201]      Concave_points2     [0.0, 0.053]      Concave_points3     [0.0, 0.291]
Symmetry1           [0.106, 0.304]    Symmetry2           [0.0080, 0.079]   Symmetry3           [0.156, 0.664]
Fractal_dimension1  [0.05, 0.097]     Fractal_dimension2  [0.0010, 0.03]    Fractal_dimension3  [0.055, 0.208]
Class               {M, B}

Additional information

This database contains 30 features computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.

The task is to determine if a found tumor is benign or malignant (M = malignant, B = benign).

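Ten real-valued features are computed for each cell nucleus. Each one appears three times in the attribute list: in the original UCI data, the suffixes 1, 2 and 3 correspond to the mean, the standard error, and the "worst" value (mean of the three largest values) of that feature over the image. The ten features are the following: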

a) radius: mean of distances from center to points on the perimeter.
b) texture: standard deviation of gray-scale values.
c) perimeter.
d) area.
e) smoothness: local variation in radius lengths.
f) compactness: perimeter^2 / area - 1.0
g) concavity: severity of concave portions of the contour.
h) concave points: number of concave portions of the contour.
i) symmetry.
j) fractal dimension: "coastline approximation" - 1
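
To make the geometric definitions above more concrete, the following minimal Python sketch computes radius, perimeter, area and compactness for a nucleus contour. It is an illustration only, not the software used to build the data set: the function name contour_features, the representation of the contour as a closed polygon of (x, y) points, and the choice of the mean of the contour points as the center are all assumptions made for this example.

```python
import math


def contour_features(points):
    """Return radius, perimeter, area and compactness of a closed polygon.

    `points` is a list of (x, y) tuples describing the contour in order.
    This representation is an assumption made for this example.
    """
    n = len(points)
    if n < 3:
        raise ValueError("a contour needs at least three points")

    # Assumption: the center is the mean of the contour points.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n

    # a) radius: mean of distances from center to points on the perimeter.
    radius = sum(math.hypot(x - cx, y - cy) for x, y in points) / n

    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]  # wrap around to close the polygon
        perimeter += math.hypot(x1 - x0, y1 - y0)  # c) perimeter
        twice_area += x0 * y1 - x1 * y0            # shoelace formula term

    area = abs(twice_area) / 2.0  # d) area
    if area == 0.0:
        raise ValueError("degenerate contour with zero area")

    # f) compactness: perimeter^2 / area - 1.0
    compactness = perimeter ** 2 / area - 1.0

    return {
        "radius": radius,
        "perimeter": perimeter,
        "area": area,
        "compactness": compactness,
    }


# Usage: a unit square has perimeter 4 and area 1, so compactness is 4^2 / 1 - 1 = 15.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
features = contour_features(square)
```

With this definition, compactness grows as a contour becomes more elongated or irregular for a given area.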

In this section you can download some files related to the wdbc data set:

  • The complete data set already formatted in KEEL format can be downloaded from here.
  • A copy of the data set already partitioned by means of a 10-fold cross-validation procedure can be downloaded from here.
  • A copy of the data set already partitioned by means of a 5-fold cross-validation procedure can be downloaded from here.
  • The header file associated with this data set can be downloaded from here.
  • This is not a native data set from the KEEL project. It has been obtained from the UCI Machine Learning Repository. The original page where the data set can be found is:
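
As a rough illustration of how a KEEL-format file might be read, the sketch below separates header lines from data rows. It assumes the usual KEEL layout, in which header lines begin with "@" (for example @relation and @attribute), a line containing only @data starts the data block, and each data row is a comma-separated list of values ending with the class label. The function name parse_keel and the sample text are hypothetical; the sample values are made up for the example and are not rows copied from the wdbc files.

```python
def parse_keel(text):
    """Split KEEL-style text into header lines and (features, label) rows.

    Assumes every line before "@data" is a header line and every
    non-empty line after it is a comma-separated row whose last field
    is the class label.
    """
    header, rows = [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if not in_data:
            if line.lower() == "@data":
                in_data = True  # everything after this marker is data
            else:
                header.append(line)
            continue
        *features, label = [field.strip() for field in line.split(",")]
        rows.append(([float(value) for value in features], label))
    return header, rows


# Hypothetical two-attribute sample, reduced from the 30 real attributes.
SAMPLE = """@relation demo
@attribute Radius1 real [6.981, 28.11]
@attribute Texture1 real [9.71, 39.28]
@attribute Class {M, B}
@data
10.0, 20.0, M
12.5, 15.0, B
"""

header, rows = parse_keel(SAMPLE)
```

To read a downloaded file, the same function could be called with the contents of that file instead of SAMPLE, provided the file follows the layout assumed here.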

 Copyright 2004-2018, KEEL (Knowledge Extraction based on Evolutionary Learning)