Missing Values in Data Mining - Bibliography

Bibliography on Missing Values in Data Mining, sorted by year:


1997 (1 paper)

  • S.M. Chen, M.S. Yeh. Generating fuzzy rules from relational database systems for estimating null values. Cybernetics and Systems 28:8 (1997) 695-723 doi:10.1080/019697297125912

1998 (1 paper)

  • M.R. Berthold, K.P. Huber. Missing Values and Learning of Fuzzy Rules. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 6:2 (1998) 171-178 doi:10.1142/S021848859800015X

1999 (1 paper)

  • M. Kryszkiewicz. Rules in incomplete information systems. Information Sciences 113 (1999) 271-292 doi:10.1016/S0020-0255(98)10065-8

2001 (3 papers)

  • C.M. Ennett, M. Frize, C.R. Walker. Influence of missing values on artificial neural network performance. Medinfo 10 (2001) 449-453
  • T. Schneider. Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values. Journal of Climate 14 (2001) 853-871 doi:10.1175/1520-0442(2001)014<0853:AOICDE>2.0.CO;2
  • O. Troyanskaya, M. Cantor, G. Sherlock, P. Brown, T. Hastie, R. Tibshirani, D. Botstein, R.B. Altman. Missing value estimation methods for DNA microarrays. Bioinformatics 17:6 (2001) 520-525 doi:10.1093/bioinformatics/17.6.520

2002 (4 papers)

  • B. Gabrys. Neuro-fuzzy approach to processing inputs with missing values in pattern recognition problems. International Journal of Approximate Reasoning 30:3 (2002) 149-179 doi:10.1016/S0888-613X(02)00070-1
  • X. Huang, Q. Zhu. A pseudo-nearest-neighbor approach for missing data recovery on Gaussian random data sets. Pattern Recognition Letters 23:13 (2002) 1613-1622 doi:10.1016/S0167-8655(02)00125-3
  • J.L. Schafer, J.W. Graham. Missing data: our view of the state of the art. Psychological Methods 7:2 (2002) 147-177 doi:10.1037/1082-989X.7.2.147
  • J.L. Schafer, R.M. Yucel. Computational strategies for multivariate linear mixed-effects models with missing values. Journal of Computational and Graphical Statistics 11:2 (2002) 437-457 doi:10.1198/106186002760180608

2003 (6 papers)

  • G.E.A.P.A. Batista, M.C. Monard. An analysis of four missing data treatment methods for supervised learning. Applied Artificial Intelligence 17 (2003) 519-533 doi:10.1080/713827181
  • S.M. Chen, C.M. Huang. Generating weighted fuzzy rules from relational database systems for estimating null values using genetic algorithms. IEEE Transactions on Fuzzy Systems 11:4 (2003) 495-506
  • S.M. Chen, S.W. Lee. A new method to generate fuzzy rules from relational database systems for estimating null values. Cybernetics and Systems 34:1 (2003) 33-57 doi:10.1080/01969720302850
  • S.A. Oba, M.A. Sato, I.C. Takemasa, M.C. Monden, K.I. Matsubara, S.A. Ishii. A Bayesian missing value estimation method for gene expression profile data. Bioinformatics 19:16 (2003) 2088-2096 doi:10.1093/bioinformatics/btg287
  • S.M. Tseng, K.H. Wang, C.I. Lee. A pre-processing method to deal with missing values by integrating clustering and regression techniques. Applied Artificial Intelligence 17:5-6 (2003) 535-544 doi:10.1080/713827170
  • X.A. Zhou, X.B. Wang, E.R. Dougherty. Missing-value estimation using linear and non-linear regression with Bayesian gene selection. Bioinformatics 19:17 (2003) 2302-2307 doi:10.1093/bioinformatics/btg323

2004 (7 papers)

  • O.T. Abdala, M.A. Saeed. Estimation of missing values in clinical laboratory measurements of ICU patients using a weighted K-nearest neighbors algorithm. Computers in Cardiology 31 (2004) 693-696 doi:10.1109/CIC.2004.1443033
  • F.A. Barzi, M.A. Woodward. Imputations of missing values in practice: Results from imputations of serum cholesterol in 28 cohort studies. American Journal of Epidemiology 160:1 (2004) 34-45 doi:10.1093/aje/kwh175
  • T.H. Bo, B. Dysvik, I. Jonassen. LSimpute: accurate estimation of missing values in microarray data with least squares methods. Nucleic Acids Research 32:3 (2004) 1-8 doi:10.1093/nar/gnh026
  • A. Figueroa, J.B. Borneman, T.A. Jiang. Clustering binary fingerprint vectors with missing values for DNA array data analysis. Journal of Computational Biology 11:5 (2004) 887-901 doi:10.1109/CSB.2003.1227302
  • P.A. Gourraud, E.B. Genin, A.A. Cambon-Thomsen. Handling missing values in population data: Consequences for maximum likelihood estimation of haplotype frequencies. European Journal of Human Genetics 12:10 (2004) 805-812 doi:10.1038/sj.ejhg.5201233
  • K. Honda, H. Ichihashi. Linear fuzzy clustering techniques with missing values and their application to local principal component analysis. IEEE Transactions on Fuzzy Systems 12:2 (2004) 183-193 doi:10.1109/TFUZZ.2004.825073
  • R.A. Little, H.A. An. Robust likelihood-based analysis of multivariate data with missing values. Statistica Sinica 14:3 (2004) 949-968

2005 (8 papers)

  • M. Abdella, T. Marwala. The use of genetic algorithms and neural networks to approximate missing data in database. Computing and Informatics 24:6 (2005) 577-589
  • S.M. Chen, S.W. Lee. Estimating null values in relational database systems based on genetic algorithms. Cybernetics and Systems 36:1 (2005) 85-106 doi:10.1080/01969720590887333
  • S.M. Chen, H.R. Hsiao. A new method to estimate null values in relational database systems based on automatic clustering techniques. Information Sciences 169:1 (2005) 47-69 doi:10.1016/j.ins.2004.02.012
  • H.A. Kim, G.H.B. Golub, H.A. Park. Missing value estimation for DNA microarray gene expression data: Local least squares imputation. Bioinformatics 21:2 (2005) 187-198 doi:10.1093/bioinformatics/bth499
  • S. Konias, I.A. Chouvarda, I.B. Vlahavas, N.A. Maglaveras. A novel approach for incremental uncertainty rule generation from databases with missing values handling: Application to dynamic medical databases. Medical Informatics and the Internet in Medicine 30:3 (2005) 211-225 doi:10.1080/14639230500209336
  • K.A. Pelckmans, J.B. De Brabanter, J.A.K.A. Suykens, B.A. De Moor. Handling missing values in support vector machine classifiers. Neural Networks 18:5-6 (2005) 684-692 doi:10.1016/j.neunet.2005.06.025
  • I.A. Scheel, M.B. Aldrin, I.K.A. Glad, R.A. Sorum, H.C. Lyng, A.B. Frigessi. The influence of missing value imputation on detection of differentially expressed genes from microarray data. Bioinformatics 21:23 (2005) 4272-4279 doi:10.1093/bioinformatics/bti708
  • S. Zhang, Z. Qin, C.X. Ling, S. Sheng. “Missing is Useful”: Missing values in cost-sensitive decision trees. IEEE Transactions on Knowledge and Data Engineering 17:12 (2005) 1-5 doi:10.1109/TKDE.2005.188

2006 (6 papers)

  • X.B.C. Chai, R.A.D. Pan. Test-cost sensitive classification on data with missing values. IEEE Transactions on Knowledge and Data Engineering 18:5 (2006) 626-637 doi:10.1109/TKDE.2006.84
  • I.A. Fortes, L.B. Mora-Lopez, R.B. Morales, F.B. Triguero. Inductive learning models with missing values. Mathematical and Computer Modelling 44:9-10 (2006) 790-806 doi:10.1016/j.mcm.2006.02.013
  • R.S. Lokupitiya, E.B. Lokupitiya, K.B. Paustian. Comparison of missing value imputation methods for crop yield data. Environmetrics 17:4 (2006) 339-349 doi:10.1002/env.773
  • M.K. Markey, G.D. Tourassi, M. Margolis, D.M. DeLong. Impact of missing data in evaluating artificial neural networks trained on complete data. Computers in Biology and Medicine 36:5 (2006) 516-525 doi:10.1016/j.compbiomed.2005.02.001
  • A. Vellido. Missing data imputation through GTM as a mixture of t-distributions. Neural Networks 19:10 (2006) 1624-1635 doi:10.1016/j.neunet.2005.11.003
  • X. Wang, Z. Jiang, H. Feng. Missing value estimation for DNA microarray gene expression data by Support Vector Regression imputation and orthogonal coding scheme. BMC Bioinformatics 7:32 (2006) 1-10 doi:10.1186/1471-2105-7-32

2007 (18 papers)

  • J. Banasik, J. Crook. Reject inference, augmentation, and sample selection. European Journal of Operational Research 183:3 (2007) 1582-1594 doi:10.1016/j.ejor.2006.06.072
  • L.P. Bras, J.C. Menezes. Improving cluster-based missing value estimation of DNA microarray data. Biomolecular Engineering 24:2 (2007) 273-282 doi:10.1016/j.bioeng.2007.04.003
  • F.A. Dahl. Convergence of random k-nearest-neighbour imputation. Computational Statistics and Data Analysis 51:12 (2007) 5913-5917 doi:10.1016/j.csda.2006.11.007
  • M. Di Zio, U. Guarnera, O. Luzi. Imputation through finite Gaussian mixture models. Computational Statistics and Data Analysis 51:11 (2007) 5305-5316 doi:10.1016/j.csda.2006.10.002
  • A. Farhangfar, L.A. Kurgan, W. Pedrycz. A novel framework for imputation of missing values in databases. IEEE Transactions on Systems, Man, and Cybernetics 37:5 (2007) 692-709 doi:10.1109/TSMCA.2007.902631
  • J.W. Graham, A.E. Olchowski, T.D. Gilreath. How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prevention Science 8:3 (2007) 206-213 doi:10.1007/s11121-007-0070-9
  • E.R. Hruschka Jr., E.R. Hruschka, N.F.F. Ebecken. Bayesian networks for imputation in classification problems. Journal of Intelligent Information Systems 29:3 (2007) 231-252 doi:10.1007/s10844-006-0016-x
  • K. Metaxoglou, A. Smith. Maximum likelihood estimation of VARMA models using a state-space EM algorithm. Journal of Time Series Analysis 28:5 (2007) 666-685 doi:10.1111/j.1467-9892.2007.00529.x
  • M. Mojirsheibani. Nonparametric curve estimation with missing data: A general empirical process approach. Journal of Statistical Planning and Inference 137:9 (2007) 2733-2758 doi:10.1016/j.jspi.2006.02.016
  • J.D. Parker, N. Schenker. Multiple imputation for national public-use datasets and its possible application for gestational age in United States Natality files. Paediatric and Perinatal Epidemiology 21:2 (2007) 97-105 doi:10.1111/j.1365-3016.2007.00866.x
  • H. Peng, S. Zhu. Handling of incomplete data sets using ICA and SOM in data mining. Neural Computing and Applications 16:2 (2007) 167-172 doi:10.1007/s00521-006-0058-6
  • Y. Qin, S. Zhang, X. Zhu, J. Zhang, C. Zhang. Semi-parametric optimization for missing data imputation. Applied Intelligence 27:1 (2007) 79-88 doi:10.1007/s10489-006-0032-0
  • M. Saar-Tsechansky, F. Provost. Handling missing values when applying classification models. Journal of Machine Learning Research 8 (2007) 1625-1657
  • T.H. Scheike, Y. Sun. Maximum likelihood estimation for tied survival data under Cox regression model via EM-algorithm. Lifetime Data Analysis 13:3 (2007) 399-420 doi:10.1007/s10985-007-9043-3
  • Q. Song, M. Shepperd. A new imputation method for small software project data sets. Journal of Systems and Software 80:1 (2007) 51-62 doi:10.1016/j.jss.2006.05.003
  • D. Williams, X. Liao, Y. Xue, L. Carin, B. Krishnapuram. On Classification with Incomplete Data. IEEE Transactions on Pattern Analysis and Machine Intelligence 29:3 (2007) 427-436 doi:10.1109/TPAMI.2007.52
  • D.S.V. Wong, F.K. Wong, G.R. Wood. A multi-stage approach to clustering and imputation of gene expression profiles. Bioinformatics 23:8 (2007) 998-1005 doi:10.1093/bioinformatics/btm053
  • D. Yoon, E.K. Lee, T. Park. Robust imputation method for missing values in microarray data. BMC Bioinformatics 8:2 (2007) 1-7 doi:10.1186/1471-2105-8-S2-S6

2008 (3 papers)

  • G. Corani, M. Zaffalon. Learning Reliable Classifiers From Small or Incomplete Data Sets: The Naive Credal Classifier 2. Journal of Machine Learning Research 9 (2008) 581-621
  • A. Farhangfar, L. Kurgan, J. Dy. Impact of imputation of missing values on classification error for discrete data. Pattern Recognition 41 (2008) 3692-3705 doi:10.1016/j.patcog.2008.05.019
  • Q. Song, M. Shepperd, X. Chen, J. Liu. Can k-NN imputation improve the performance of C4.5 with small software project data sets? A comparative evaluation. Journal of Systems and Software 81:12 (2008) 2361-2370 doi:10.1016/j.jss.2008.05.008

2009 (2 papers)

  • P.J. García-Laencina, J.L. Sancho-Gómez, A.R. Figueiras-Vidal. Pattern classification with missing data: a review. Neural Computing and Applications 9:1 (2009) 1-12 doi:10.1007/s00521-009-0295-6
  • B. Twala. An empirical comparison of techniques for handling incomplete data using decision trees. Applied Artificial Intelligence 23 (2009) 373-405 doi:10.1080/08839510902872223

2010 (8 papers)

  • W-K. Ching, L. Li, N.K. Tsing, C.W. Tai, T.W. Ng, A.S. Wong. A Weighted Local Least Squares Imputation method for missing value estimation in microarray gene expression data. International Journal of Data Mining and Bioinformatics 4:3 (2010) 331-347
  • B. Twala, M. Cartwright. Ensemble missing data techniques for software effort prediction. Intelligent Data Analysis 14:3 (2010) 299-331
  • T.P. Hong, L.H. Tseng, B.C. Chien. Mining from incomplete quantitative data by fuzzy rough sets. Expert Systems with Applications 37:3 (2010) 2644-2653
  • I.A. Gheyas, L.S. Smith. A neural network-based framework for the reconstruction of incomplete data sets. Neurocomputing 73:16-18 (2010) 3039-3065
  • Y. Ding, J.S. Simonoff. An Investigation of Missing Data Methods for Classification Trees Applied to Binary Response Data. Journal of Machine Learning Research 11 (2010) 131-170
  • M. Ghannad-Rezaie, H. Soltanian-Zadeh, H. Ying, M. Dong. Selection-fusion approach for classification of data sets with missing values. Pattern Recognition 43 (2010) 2340-2350 doi:10.1016/j.patcog.2009.12.003
  • J. Luengo, S. García, F. Herrera. A Study on the Use of Imputation Methods for Experimentation with Radial Basis Function Network Classifiers Handling Missing Attribute Values: The good synergy between RBFs and EventCovering method. Neural Networks 23 (2010) 406-418 doi:10.1016/j.neunet.2009.11.014
  • P. Merlin, A. Sorjamaa, B. Maillet, A. Lendasse. X-SOM and L-SOM: A double classification approach for missing value imputation. Neurocomputing 73:7-9 (2010) 1103-1108

2011 (6 papers)

  • J. Ning, P.E. Cheng. A comparison study of nonparametric imputation methods. Statistics and Computing in press (2011) 1-13
  • Y. Endo, Y. Hasegawa, Y. Hamasuna, Y. Kanzawa. Fuzzy c-means clustering for uncertain data using quadratic penalty-vector regularization. Journal of Advanced Computational Intelligence and Intelligent Informatics 15:1 (2011) 76-82
  • P. Rey-del-Castillo, J. Cardeñosa. Fuzzy min-max neural networks for categorical data: application to missing data imputation. Neural Computing and Applications in press (2011) 1-14
  • E.L. Silva-Ramírez, R. Pino-Mejías, M. López-Coello, M.D. Cubiles-de-la-Vega. Missing value imputation on missing completely at random data using multilayer perceptrons. Neural Networks 24:1 (2011) 121-129
  • S. Zhang. Shell-neighbor method and its application in missing data imputation. Applied Intelligence 35:1 (2011) 123-133
  • X. Zhu, S. Zhang, Z. Jin, Z. Zhang, Z. Xu. Missing value estimation for mixed-attribute data sets. IEEE Transactions on Knowledge and Data Engineering 23:1 (2011) 110-121