Fingerprint identification in large databases

The web is organized according to the following summary:

  1. Introduction to fingerprint recognition
  2. Preprocessing fingerprints
  3. Fingerprint identification in large databases
  4. Database penetration rate reduction: fingerprint classification

This Website contains SCI2S research material on fingerprint recognition and classification, focusing on very large-scale environments. All information shown here is related to the following SCI2S journal papers and algorithms developed:

Throughout this Website, we have also included the source code for the algorithms associated with the former papers, as well as new approaches that are under development. Readers may find the implementations in the corresponding Github and Spark Packages linksplaced in those sections devoted to describe each framework. Both are marked with the corresponding logo:

Gitthub

Introduction to fingerprint recognition

Personal identification has been an increasingly important issue in a wide variety of fields, such as access control, criminology, forensics or automatic payment. In particular, in the last few years the amount of people that must be identified has been hugely increased: the identification requirements of big companies, law-enforcement departments or public administrations reach the hundreds of millions of individuals. This problem has been dealt with in various manners, some of the most popular of which are passwords and tokens. However, these solutions present some problems: passwords can be forgotten, tokens can be lost, and both can be stolen with relative easiness. Therefore, there has been a great interest in the scientific community to find a mean for identification that is not based on what we know (like passwords) nor what we have (like tokens), but rather on who we are (A. K. Jain, P. Flynn, and A. Ross, Handbook of biometrics, Springer, 2007).

Biometrics provide an answer to that question by using features that are intrinsic to each person for the identification. The biometric features that can be used to identify individuals are diverse, such as face, fingerprints, ear, palmprint, finger veins, DNA, iris, and many others. Among these, fingerprints are the most used ones due to their desirable properties (D. Maltoni, D. Maio, A. Jain, S. Prabhakar, Handbook of Fingerprint Recognition, Springer-Verlag, 2009):

  • Universality: everybody has fingerprints, except in rare cases of severe amputations.
  • Uniqueness: every finger of every person in the planet has a unique fingerprint.
  • Invariability: fingerprints do not change along a person's life.
  • Easiness of use: collecting fingerprint images is fast, cheap and non-invasive, especially with the development of specific electronic devices for this purpose.

A fingerprint is essentially a pattern of ridges and valleys located on a fingertip. These ridges and valleys form different types of patterns that can be used for their recognition. Although fingerprint patterns have been scientifically studied for more than a century, manual comparison is a tedious and time-consuming process.

In this context, automatic fingerprint recognition aims to speed up this process by registering the image of a fingerprint in a computer support, where the matching between fingerprints can be carried out in a systematic and efficient way.

Although fingerprints can be compared directly at the image level, such approaches usually do not yield good results due to the variances between different images of the same fingerprints, such as rotations, translations and deformations of the skin. Image level comparison is also time consuming due to the high number of pixels of the image matrices. Therefore, the first step in the fingerprint recognition process is feature extraction. This process consists of obtaining the relevant information of the image, so as to use it in further steps of the recognition. The various types of features that can be extracted from fingeprints are usually grouped into three levels: global (singular points, orientation maps and pseudoridges), local (minutiae) and detail (pores and intra-ridge features).

Among these, minutiae are by far the most used features for fingerprint recognition. Minutiae are the bifurcations and the endings of the fingerprint ridges. They are easily described by their position and angle, and their number allows for efficient comparison algorithms.

Minutiae-based fingerprint matching algorithms compare two sets of minutiae to determine whether they belong to the same fingerprint or not. The final output of the matching function is a measure of the similarity between the two compared fingerprints, which is usually either a binary truth-or-false value either a real-valued similarity score. Most of the current matching algorithms in the specialized literature start by computing a set of local structures, usually involving minutiae neighborhoods. Then, the local structures are compared with each other and a final consolidation step is applied to obtain the final similarity score or matching decision. Some matching methods involve complex computations and are very accurate, whilst others can compare sets of minutiae very fast, with a slightly lower accuracy.

The fingerprint recognition problem can be addressed from two points of view, each of which represents a different problem on its own

  • Verification consists of comparing two fingerprint captures to determine whether they were taken from the same finger or not. It is a 1:1 comparison problem that typically involves a single application of a matching function.
  • Identification consists of exploring a database of template fingerprints to find the match of an input fingerprint, that is, a 1:n comparison problem for a database of n templates.

This website focuses on the identification problem. The general steps of an Automatic Fingerprint Identification System (AFIS) are depicted in the following figure:

Figure 1: Workflow of an AFIS

First, the fingerprints of all the users that are to be identified are captured in a process called enrollment, to build the template fingerprint database. Then, when a new input fingerprint has to be identified, it is compared with each template fingerprint and the best match is returned.

The main requirements of an AFIS can be synthesized into the following:

  • Accuracy: the error rate of the identification. It should be as low as possible, to avoid both false positives (accepted impostors) and false negatives (rejected genuine users).
  • Efficiency: the time needed to locate a fingerprint in the database. It should be kept as small as possible. Many real applications of fingerprint identification involve real-time constraints, so that a late response of the system is equivalent to a system failure. Very often, this time threshold is in the order of a few seconds.
  • Scalability: the current needs of large-scale identification systems involve the possibility that template databases are likely to grow in all contexts. Therefore, an AFIS must be able to efficiently cope with such increased sizes, for instance by an adequate increase of the hardware resources.
  • Flexibility: the system should be able to cope with any database size, any fingerprint characteristics (such as low-quality images or rolled prints), any performance requirement and any hardware configuration.

It is immediate to deduce that identification is intrinsically more difficult than verification, as it can be approached as a succession of n verification steps. In fact, this is the approach followed by many AFIS: the input fingerprint is compared to each fingerprint in the database, and the identity that yields the highest matching score is returned. Most approaches also apply a certain threshold on the response, to take into account the possibility that the input fingerprint is actually not in the database. The difficulty of identification with respect to verification is twofold:

  • High identification time: a basic identification system takes at least n times longer than the underlying verification algorithm to identify a given fingerprint.
  • Accuracy loss: an identification system not only has to find the single correct match for the input fingerprint among all database templates; it must also ensure that non-matching templates will not be detected as genuine matches. As the number of non-matching templates is at least n-1, the probability of a false matching is usually not negligible.

A review on the literature on fingerprint matching and identification can be found in the following reference:

D. Peralta, M. Galar, I. Triguero, D. Paternain, S. García, E. Barrenechea, J. M. Benítez, H. Bustince, F. Herrera. A Survey on Fingerprint Minutiae-based Local Matching for Verification and Identification: Taxonomy and Experimental Evaluation.Information Sciences 315 (2015) 67-87. doi: 10.1016/j.ins.2015.04.013

 

Preprocessing fingerprints

Preprocessing is one of the key components of any process involving data mining, as it is necessary to obtain all the benefits from the data mining techniques. Data preprocessing typically involves dealing with the aspects that hinder the performance such as noise, missing values, inconsistent and superfluous data. A complete Website with information regarding this topic can be accessed through the Webpage associated to the book S. García, J. Luengo, and F. Herrera, Data Preprocessing in Data Mining. Springer, (2015) clicking on the following figure:

 

In the context of fingerprint identification, preprocessing techniques can be applied to enhance the accuracy and the runtime of the identification itself, for instance to enhance the features extracted from the fingerprints.

Although there are several minutiae extractors proposed in the literature, the problems encountered with the process are common for all of them and can be classified into two types:

  • Missing minutiae that are not detected by the extractor and therefore cannot be used for the matching.
  • Spurious minutiae that are erroneously detected by the extractor, introducing noise into the resulting minutiae set. Most spurious minutiae are detected on the borderline of the fingerprint.

In general, minutiae extractors tend to suffer more from the latter, so as not to omit real minutiae that might be crucial for the comparison. These problems increase the difference between captures of the same fingerprint, whilst they can also cause false similarities between captures of different fingers. They become more acute when dealing with low quality fingerprints or with large-scale identification, in which the huge amount of non-matching templates requires an accurate matching algorithm to avoid identification errors.

We have tackled the problem of removing spurious borderline minutiae after their extraction. For this goal we have applied different strategies to extract a candidate set of spurious minutiae for a fingerprint:

  • Computing the convex hull of the extracted minutiae set; all the minutiae in the convex hull are included into the candidate set.
  • Using an image segmentation-based approach aimed at discerning the background and the foreground of the fingerprint image. The approach includes four steps: normalization, block-wise variance computation, thresholding and refinement. The minutiae that are detected within background areas are included into the candidate set.

Both strategies have been combined with the quality information provided by the minutiae extractor, so that the minutiae selected by the preprocessing that fall under a certain quality threshold are eliminated prior to the recognition itself. As a side objective, the influence of the spurious minutiae on several different matching methods has also been studied.

The proposals and the results obtained are published in the following reference:

D. Peralta, M. Galar, I. Triguero, O. Miguel-Hurtado, J.M. BenítezF. Herrera. Minutiae Filtering to Improve Both Efficacy and Efficiency of Fingerprint Matching Algorithms. Engineering Applications of Artificial Intelligence, 32 (2014) 37-53. doi: 1010.1016/j.engappai.2014.02.016

 

Fingerprint identification in large databases

By definition, the two main difficulties that arise from fingerprint identification (that is, the high identification time and the loss of accuracy) increase along with the number of fingerprints in the database, n. Therefore, a direct brute-force approach for an AFIS that must identify within a database of more than a few thousands of people is not possible. From the identification time point of view, most of these systems have real time constraints; for instance, the identification to access a building should not take much more than one second. From the accuracy point of view, a large value of n implies a large number of non-matching templates in the database, with the consequent increase of the probability of a wrong identification. As current society necessities for identification are reaching the order of hundreds of millions of people, there is a strong need of scalable, accurate solutions that can adequately deal with this problem.

High Performance Computing (HPC) is one of the tools that support the modern Science, as it enables the computation of multiple calculations in a reasonable time by means of massive computational resources (H. Stone, High-Performance Computer Architecture, Addison-Wesley Longman, 1992). The scientific literature thrives with successful applications of HPC systems to real problems, and most big companies and public institutions implement their own HPC infrastructures to process their data. As such, HPC is a very suitable tool for the problem of fingerprint identification in large databases.

We designed an HPC-based two-level parallel framework for fingerprint identification, which divides the computation in several processes and threads so as to extract the highest performance from a computing cluster. The software developed is published in the following reference:

D. Peralta, I. Triguero, R. Sanchez-Reillo, F. HerreraJ.M. Benítez. Fast Fingerprint Identification for Large Databases. Pattern Recognition 47:2 (2014) 588–602. doi: 10.1016/j.patcog.2013.08.002 

 dperaltac/mpi-afis

In this context, one of the hot research topics during the last few years has been big data, which can be defined as the amount of data that cannot be processed within a single machine. Big data poses an interesting challenge in many fields, but also offers an opportunity to extract better and more valuable knowledge. So far, big data techniques have been successfully applied to various problems (A. Fernandez, S. Río, V. López, A. Bawakid, M.J. del Jesus, J.M. BenítezF. Herrera. Big Data with Cloud Computing: An Insight on the Computing Environment, MapReduce and Programming Frameworks. WIREs Data Mining and Knowledge Discovery 4(5): 380-409). Several HPC-based frameworks have been developed to help with this task; two of the most popular ones are Apache Hadoop (which implements the MapReduce paradigm) and Apache Spark. Information fusion is a paradigm used in many disciplines to improve the overall precision of a given process, including biometrics (A. Ross and A.K. Jain, Information fusion in biometrics, Pattern Recognition Letters, 24(13): 2115–2125).

We developed a generic decomposition methodology to adapt virtually any fingerprint matching algorithm to these big data frameworks, so as to take advantage of their main strengths: scalability, massively parallel computing and fail-tolerance. This methodology introduces the concept of partial score, which is a piece of information that describes the similarity between subsets of local structures of the two compared fingerprints. Partial scores can be aggregated to generate new partial scores, and ultimately to calculate the overall similarity score between the two fingerprints.

The developed software is publicly available in the following reference:

 dperaltac/bigdata-fingerprint

 

Biometric problems are intrinsically adapted to information fusion approaches; the fusion of information can be performed at many levels (A. Lumini, L. Nanni, Overview of the combination of biometric matchers. Information Fusion 33 (2017) 71–85): data level, feature level, score level or decision level. In particular, fusion approaches in fingerprint identification are usually grouped into two categories: using several fingerprint images (A. K. Jain, P. Flynn and A. Ross, Handbook of biometrics, Springer, 2007) and using several matching algorithms (A. K. Jain, S. Prabhakar, and S. Chen, Combining multiple matchers for a high security fingerprint verification system, Pattern Recognition Letters 20 (1999) 1371–1379). Information fusion has been steadily used to increase the accuracy of fingerprint and other biometric recognition systems. However, it affects negatively the runtime, as more computation has to be performed to obtain redundant information. In this context, it would be desirable to use information fusion from a different point of view to tackle both the accuracy and the runtime problems.

A new framework for fingerprint identification has been designed: DPD-DFF, which uses two fingerprints and two matching algorithms to accelerate the identification process. First, a fast matching algorithm is used to compare the input fingerprint pair with those of the database, so as to obtain a set of candidate identities. Then, an accurate matcher is applied on the identities of the candidate set, to obtain the most similar match. The proposal is published in the following reference, and the software has been integrated with the aforementioned parallel framework:

D. Peralta, I. Triguero, S. GarcíaF. HerreraJ.M. Benítez. DPD-DFF: A Dual Phase Distributed Scheme with Double Fingerprint Fusion for Fast and Accurate Identification in Large Databases. Information Fusion 32 (2016) 40–51. doi: 10.1016/j.inffus.2016.03.002

 dperaltac/mpi-afis

Database penetration rate reduction: fingerprint classification

One of the most extended ways to improve the identification time of an AFIS is the use of classification techniques. Fingerprints can be divided into five classes according to the visual pattern of their ridges (E. Henry, Classification and Uses of Finger Prints, George Routledge and Sons, 1900). If the class of an input fingerprint is correctly determined, then it is possible to perform the identification by comparing it only with the template fingerprints belonging to the same class (D. Maltoni, D. Maio, A. Jain, S. Prabhakar, Handbook of Fingerprint Recognition, Springer-Verlag, 2009). The number of comparisons with template fingerprints with respect to the database size is called penetration rate and its reduction can be key in the performance of identification system. However, the misclassification of a fingerprint can lead to identification errors; therefore, it is crucial to reduce the classification error as much as possible.

In this context, two reviews on fingerprint classification and feature extraction have been published, along with the associated software:

So far, we have proposed two different approaches to improve the accuracy and the rejection of the classification.

On the one hand, a hierarchical procedure has been designed, which combines several sets of feature extractors. Different combinations of feature extractors are fused by concatenating their output vector, so as to improve the classification accuracy. Then, a feature selection algorithm is applied to reduce the size of the feature vector and to further improve the accuracy. Each combination of feature extractors represents a level of the hierarchy, so that if a fingerprint is rejected by one of the levels, it may be classified by another one. Finally, the classifier outputs an ordering of the five classes according to their probability, so that the database can be incrementally explored and the penetration rate is reduced as much as possible.

On the other hand, we explored the possibilities of deep learning approaches for fingerprint classification. Several deep neural networks were designed and trained over sets of fingerprints of different qualities. The results obtained in the classification improved all those of the traditional combinations of feature extractors and classifiers.