This data set contains information extracted from the 1994 US census. The goal is to predict, and to explain, whether a person earns more than $50,000 a year based on the following 14 socio-occupational variables:

age: continuous.
workclass: Private, Self-emp-not-inc, Self-emp-inc, Federal-gov, Local-gov, State-gov, Without-pay, Never-worked.
fnlwgt: continuous.
education: Bachelors, Some-college, 11th, HS-grad, Prof-school, Assoc-acdm, Assoc-voc, 9th, 7th-8th, 12th, Masters, 1st-4th, 10th, Doctorate, 5th-6th, Preschool.
education-num: continuous.
marital-status: Married-civ-spouse, Divorced, Never-married, Separated, Widowed, Married-spouse-absent, Married-AF-spouse.
occupation: Tech-support, Craft-repair, Other-service, Sales, Exec-managerial, Prof-specialty, Handlers-cleaners, Machine-op-inspct, Adm-clerical, Farming-fishing, Transport-moving, Priv-house-serv, Protective-serv, Armed-Forces.
relationship: Wife, Own-child, Husband, Not-in-family, Other-relative, Unmarried.
race: White, Asian-Pac-Islander, Amer-Indian-Eskimo, Other, Black.
sex: Female, Male.
capital-gain: continuous.
capital-loss: continuous.
hours-per-week: continuous.
native-country: United-States, Cambodia, England, Puerto-Rico, Canada, Germany, Outlying-US(Guam-USVI-etc), India, Japan, Greece, South, China, Cuba, Iran, Honduras, Philippines, Italy, Poland, Jamaica, Vietnam, Mexico, Portugal, Ireland, France, Dominican-Republic, Laos, Ecuador, Taiwan, Haiti, Columbia, Hungary, Guatemala, Nicaragua, Scotland, Thailand, Yugoslavia, El-Salvador, Trinadad&Tobago, Peru, Hong, Holand-Netherlands.

The "adult" data set, available at https://archive.ics.uci.edu/ml/datasets/Adult, contains 48,842 instances. It has been split 80%-20% into a training set (39,074 instances) and a test set (9,768 instances), with every instance that has missing values (3,620 in total) placed in the training set. Class noise was then applied to the training set by changing the class label of 10% of its instances.
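The preparation described above can be sketched as follows. This is only an illustration under stated assumptions, not the procedure actually used: the text does not say how the 80%-20% split was drawn, so a plain random shuffle is assumed; missing values are assumed to be encoded as "?"; and the function name, the "class" key, the label strings and the synthetic rows are hypothetical.

```python
import random

MISSING = "?"  # assumption: missing values are encoded as "?"


def has_missing(row):
    """True if any attribute of the row holds the missing-value marker."""
    return any(value == MISSING for value in row.values())


def split_with_noise(rows, label_key="class", labels=("<=50K", ">50K"),
                     test_fraction=0.2, noise_fraction=0.1, seed=0):
    """Hypothetical helper: split into train/test and flip training labels.

    Rows with missing values always go to the training set; the test set
    is drawn only from complete rows.
    """
    rng = random.Random(seed)
    incomplete = [r for r in rows if has_missing(r)]
    complete = [r for r in rows if not has_missing(r)]
    rng.shuffle(complete)  # assumption: simple random split

    n_test = round(len(rows) * test_fraction)
    test = complete[:n_test]
    # Copy training rows so the label flips do not modify the input data.
    train = [dict(r) for r in complete[n_test:] + incomplete]

    # Class noise: flip the binary label of a random 10% of training rows.
    n_noisy = round(len(train) * noise_fraction)
    for i in rng.sample(range(len(train)), n_noisy):
        current = train[i][label_key]
        train[i][label_key] = labels[1] if current == labels[0] else labels[0]
    return train, test


# Synthetic stand-in rows (not the real census data), 10% with a missing value.
rows = [
    {"id": i,
     "workclass": MISSING if i % 10 == 0 else "Private",
     "class": ">50K" if i % 4 == 0 else "<=50K"}
    for i in range(1000)
]
train, test = split_with_noise(rows)
print(len(train), len(test))
```

Because the label flips are applied only to copies of the training rows, the test set keeps its original labels, which matches the description of noise being added to the training set alone.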
-----------
C4.5 RESULTS WITH DEFAULT PARAMETERS

WEKA 3.7

TRAINING
Correctly Classified Instances       31459     80.5113 %
Incorrectly Classified Instances      7615     19.4887 %
Kappa statistic                          0.472
Mean absolute error                      0.2931
Root mean squared error                  0.3826
Relative absolute error                 71.3663 %
Root relative squared error             84.4243 %
Coverage of cases (0.95 level)         100      %
Mean rel. region size (0.95 level)      99.4702 %
Total Number of Instances            39074

TEST
Correctly Classified Instances        8392     85.9132 %
Incorrectly Classified Instances      1376     14.0868 %
Kappa statistic                          0.5966
Mean absolute error                      0.2565
Root mean squared error                  0.3299
Relative absolute error                 65.1911 %
Root relative squared error             76.059  %
Coverage of cases (0.95 level)          99.5803 %
Mean rel. region size (0.95 level)      99.5035 %
Total Number of Instances             9768

KEEL V2.0

TRAIN RESULTS
============
Classifier= adult
Summary of data, Classifiers: adult
Fold 0 : CORRECT=0.805087782156933 N/C=0.0
Global Classification Error + N/C: 0.194912217843067
stddev Global Classification Error + N/C: 0.0
Correctly classified: 0.805087782156933
Global N/C: 0.0

TEST RESULTS
============
Classifier= adult
Fold 0 : CORRECT=0.8592342342342343 N/C=0.0
Global Classification Error + N/C: 0.14076576576576577
stddev Global Classification Error + N/C: 0.0
Correctly classified: 0.8592342342342343
Global N/C: 0.0
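As a quick sanity check on the figures above, the Weka percentages can be recomputed from the reported counts, and the KEEL fractions can be converted back into instance counts. This is a minimal sketch; it assumes KEEL evaluated the same 39,074 training and 9,768 test instances as Weka, which the logs do not state explicitly.

```python
# Counts reported by Weka 3.7: (correctly classified, total instances).
weka = {"train": (31459, 39074), "test": (8392, 9768)}

# "Correctly classified" fractions reported by KEEL V2.0.
keel = {"train": 0.805087782156933, "test": 0.8592342342342343}

summary = {}
for split, (correct, total) in weka.items():
    weka_pct = 100.0 * correct / total
    # Assumption: KEEL used the same number of instances as Weka.
    keel_correct = round(keel[split] * total)
    summary[split] = {
        "weka_pct": weka_pct,
        "keel_correct": keel_correct,
        "difference": correct - keel_correct,
    }
    print(f"{split}: Weka {weka_pct:.4f} %, "
          f"KEEL about {keel_correct} correct of {total}")
```

Under that assumption the two tools agree to within a single instance on each set: KEEL's fractions correspond to 31,458 correct training instances and 8,393 correct test instances, against Weka's 31,459 and 8,392.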