Document type: Peer-reviewed scientific article
Authors
1 Dept. of Computer Science Faculty of Specific Education Mansoura, Egypt
2 Dept of Home Economy Faculty of Specific Education Mansoura, Egypt
1. Introduction
Bacteria are unicellular microscopic organisms that can only be seen through a microscope. They exist in different sizes and shapes and are measured in micrometers (a micrometer is one millionth of a meter). Bacteria are found everywhere and in all types of environments, and there are numerous types of bacteria in the world. Bacteria are mainly classified based on their shapes, biochemistry and staining methods [1].
Bacteria are microorganisms that have both positive and negative impacts on humans. They are beneficial because they are used in a number of industries, for example in producing dairy products such as yogurt and cheese. They are also used in the leather industry for making leather. Bacteria are also economically beneficial organisms: nitrogen-fixing bacteria increase the fertility of the soil by processing nitrogen in it. Bacteria are also important for human health because they are present in the human body, where they produce a type of vitamin. Moreover, bacteria are used in manufacturing medicines such as antibacterial medicines. Along with these benefits, bacteria also have some harmful effects on the human body.
Figure 1 shows some types of harmful bacteria.
The direct approach to examine the microbe's world from its own perspective is microscopy, which is one of the most important techniques in microbial ecology. The value of quantitative microscopy in microbial ecological studies can be increased even further when used in conjunction with computer-assisted image analysis [2].
Image processing and computer modeling are important tools in most medical imaging domains, and have more recently started to attract the attention of the biological community and to take a growing role in biological imaging applications.
To date, many biological and microbiological data analyses entail a substantial amount of human intervention. Manual procedures are based on subjective human interpretation, are prone to large variability between human experts, are time consuming and are costly [3].
Digital image processing and pattern recognition techniques are used in conjunction with microscopy for quantitative studies of microbial ecology. These techniques provide an important quantitative tool to analyze the structures and spatial features of complex microbial communities [4].
One of the most important and yet most tedious tasks performed during microscopical analysis of microbial communities is the classification of observed cells into known morphological categories and recognition of new categories as well if new distinct characteristics are captured [5].
Content Based Image Retrieval (CBIR) is a technology that in principle helps to organize digital image archives by their visual content. By this definition, anything ranging from an image similarity function to a robust image annotation engine falls under the purview of CBIR [6].
In CBIR systems, images are automatically indexed by summarizing their visual features. A feature is a characteristic that can capture a certain visual property of an image, either globally for the entire image or locally for regions or objects. Color, texture and shape are commonly used features in CBIR systems [7].
Feature extraction is the basis of content-based image retrieval. It is the process of extracting features from the image such as color, shape and texture. It computes a numerical or alphabetical representation of some attribute of digital images [8].
The main goal of CBIR is efficiency during image indexing and retrieval, thereby reducing the need for human intervention in the indexing process [6]. The computer must be able to retrieve images from a database without relying on human assumptions about a specific domain [9].
The process of CBIR consists of three stages [10]:
(1) Image acquisition
(2) Feature Extraction
(3) Similarity Matching
Figure 2 shows the architecture of a CBIR system [11].
Fig. 2 Architecture of CBIR system
For the given image database, features are first extracted from the individual images. The features can be visual features such as color, texture, shape, region or spatial features, or some compressed-domain features. The extracted features are described by feature vectors, which are then stored to form the image feature database. For a given query image, we similarly extract its features and form a feature vector. This feature vector is matched against the vectors already stored in the image feature database. Sometimes dimensionality reduction techniques are employed to reduce the computation. The distance between the feature vector of the query image and those of the images in the database is then calculated. The distance of a query image to itself is zero if it is in the database. The distances are then sorted in increasing order and retrieval is performed with the help of an indexing scheme.
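The indexing and matching steps described above can be sketched as follows. This is a minimal illustration, not the system used in the paper: the function names, the toy one-dimensional feature (`mean_intensity`) and the plain Euclidean distance are all assumptions chosen to keep the example short.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def build_feature_db(images, extract):
    """Map each image id to its feature vector (the image feature database)."""
    return {image_id: extract(pixels) for image_id, pixels in images.items()}

def retrieve(query_pixels, feature_db, extract, top_k=3):
    """Rank database images by increasing distance to the query's features."""
    q = extract(query_pixels)
    ranked = sorted((euclidean(q, f), image_id) for image_id, f in feature_db.items())
    return ranked[:top_k]

def mean_intensity(pixels):
    """Toy feature extractor (hypothetical stand-in): mean gray level."""
    return [sum(pixels) / len(pixels)]

# Tiny made-up "images", each a flat list of gray levels
images = {
    "a": [10, 12, 11, 9],
    "b": [200, 210, 190, 205],
    "c": [12, 14, 13, 11],
}
db = build_feature_db(images, mean_intensity)
results = retrieve(images["a"], db, mean_intensity, top_k=2)
print(results)
```

Because the query is itself in the database, the first result has distance zero, as noted in the text.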
2. Survey of Content Based Image Retrieval
In CBIR systems, a feature is a characteristic that can capture a certain visual property of an image, either globally for the entire image or locally for regions or objects. The low-level features commonly used in CBIR are color, texture, shape and edge.
2.1 Color Features
Color features are extracted using color moments, the color histogram, the color coherence vector, the invariant color histogram, and the dominant color. Color moments and the color coherence vector are explained in the following sections.
2.1.1 Color Moments
Four color moments are used for color feature extraction: the mean, the standard deviation, the skewness and the kurtosis [12].
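The first color moment, the mean ($E_i$), can be calculated using the following formula [13]:
$$E_i = \frac{1}{N}\sum_{j=1}^{N} p_{ij} \qquad (1)$$
Where:
$N$ = number of pixels in the image
$p_{ij}$ = value of the jth pixel of the image at the ith color channel.
The second color moment, the standard deviation ($\sigma_i$), can be calculated using the following formula:
$$\sigma_i = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(p_{ij}-E_i\right)^{2}} \qquad (2)$$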
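The third color moment is the skewness ($S_i$). It can be calculated using the following formula:
$$S_i = \sqrt[3]{\frac{1}{N}\sum_{j=1}^{N}\left(p_{ij}-E_i\right)^{3}} \qquad (3)$$
The fourth color moment is the kurtosis ($K_i$). It can be calculated using the following formula:
$$K_i = \sqrt[4]{\frac{1}{N}\sum_{j=1}^{N}\left(p_{ij}-E_i\right)^{4}} \qquad (4)$$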
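As a rough illustration, the four moments can be computed per channel and concatenated into a 12-element feature vector for an RGB image. The sketch below assumes the root-normalized forms of skewness and kurtosis that are common in the color-moment literature (cube root of the third central moment, fourth root of the fourth); the function names and the sample pixels are hypothetical.

```python
import math

def channel_moments(values):
    """Return (mean, std, skewness, kurtosis) for one color channel."""
    n = len(values)
    mean = sum(values) / n

    def central(k):
        # k-th central moment: average of (p - mean)^k over all pixels
        return sum((p - mean) ** k for p in values) / n

    std = math.sqrt(central(2))
    m3 = central(3)
    # Signed cube root, so a negative third moment keeps its sign
    skewness = math.copysign(abs(m3) ** (1.0 / 3.0), m3)
    kurtosis = central(4) ** 0.25
    return mean, std, skewness, kurtosis

def color_moment_vector(pixels):
    """Concatenate the four moments of each channel of a list of (R, G, B) pixels."""
    vector = []
    for channel in zip(*pixels):
        vector.extend(channel_moments(channel))
    return vector

# Four made-up RGB pixels
pixels = [(0, 10, 5), (0, 10, 5), (0, 10, 5), (4, 10, 9)]
features = color_moment_vector(pixels)
print(len(features), features[:4])
```

A constant channel (the green channel here) yields zero for every moment except the mean, which is a quick sanity check on the implementation.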
2.1.2 Color Coherence Vector
A color's coherence is defined as the degree to which pixels of that color are members of large similarly-colored regions. These significant regions are important in characterizing images. Colored pixels are either coherent or incoherent. Coherent pixels are part of some sizable contiguous region, while incoherent pixels are not. A color coherence vector (CCV) represents this classification for each color in the image [14].
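A pixel is coherent if the size of its connected component exceeds a fixed value τ; otherwise, the pixel is incoherent. The color coherence vector for the image consists of the pairs [15]:
$$\left\langle (\alpha_1,\beta_1),\ (\alpha_2,\beta_2),\ \ldots,\ (\alpha_n,\beta_n) \right\rangle$$
where $\alpha_j$ and $\beta_j$ are the number of coherent pixels and the number of incoherent pixels of the jth discrete color, respectively, and $n$ is the number of discrete colors.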
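The coherent/incoherent classification can be sketched with a breadth-first search over connected components. This is a simplified illustration that assumes the image has already been quantized to a small number of discrete color indices and that pixels are connected through their eight neighbors; the function name and the sample image are hypothetical.

```python
from collections import deque

def color_coherence_vector(img, n_colors, tau):
    """Return [(alpha, beta), ...]: coherent and incoherent pixel counts per color."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    ccv = [[0, 0] for _ in range(n_colors)]
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            color = img[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            # Grow the connected component of same-colored pixels
            while queue:
                y, x = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and img[ny][nx] == color):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Coherent only if the component size exceeds tau
            ccv[color][0 if size > tau else 1] += size
    return [tuple(pair) for pair in ccv]

# 4x4 image with three quantized colors (0, 1, 2)
img = [
    [0, 0, 1, 1],
    [0, 0, 1, 2],
    [2, 0, 1, 1],
    [2, 2, 0, 1],
]
ccv = color_coherence_vector(img, n_colors=3, tau=2)
print(ccv)
```

In this example the isolated pixel of color 2 is counted as incoherent, while its three-pixel region is coherent because its size exceeds τ = 2.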
2.2 Texture Features
Texture is another important property of images. Various texture representations have been investigated in pattern recognition and computer vision [16]. Texture features are extracted using the Gray Level Co-occurrence Matrix (GLCM), the Gabor transform and Tamura features [17]. The Gray Level Co-occurrence Matrix (GLCM) is explained in the following section.
2.2.1 Gray Level Co-occurrence Matrix
The GLCM is created from a gray-scale image. It records how often a pixel with gray level i occurs horizontally, vertically, or diagonally adjacent to a pixel with gray level j [18]. Each entry is the relative frequency with which two pixels with gray levels i and j occur separated by a distance of d pixels in the direction θ. The distance d can take the values 1, 2, 3, etc., and θ can take the values 0° (horizontal), 90° (vertical), and 45° and 135° (diagonal) [19].
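A minimal sketch of building a normalized GLCM for a given distance d and direction θ is shown below. It assumes the image is a 2-D list of integer gray levels in the range 0 to `levels - 1`, counts each pixel pair in one direction only (a non-symmetric matrix), and maps the four angles to row and column offsets as in the comments; these conventions are assumptions, and other implementations may differ.

```python
def glcm(img, levels, d=1, theta=0):
    """Normalized gray-level co-occurrence matrix for distance d and angle theta."""
    # (row offset, column offset) for each supported direction
    offsets = {0: (0, d), 45: (-d, d), 90: (-d, 0), 135: (-d, -d)}
    dy, dx = offsets[theta]
    rows, cols = len(img), len(img[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                # Pixel with level i at (y, x), neighbor with level j at (ny, nx)
                counts[img[y][x]][img[ny][nx]] += 1
                total += 1
    # Convert counts to relative frequencies p(i, j)
    return [[c / total for c in row] for row in counts]

# 4x4 image with four gray levels
img = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(img, levels=4, d=1, theta=0)
print(p[2][2], p[0][1])
```

For this image there are 12 horizontal pairs at distance 1, three of which are (2, 2), so p(2, 2) = 0.25.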
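The GLCM is used to extract four texture features: the energy, the contrast, the correlation and the homogeneity.
The first GLCM texture feature is the energy. It can be calculated using the following formula [20]:
$$\text{Energy} = \sum_{i}\sum_{j} p(i,j)^{2} \qquad (5)$$
Where:
i, j are the gray levels of the two pixels in a pair.
p(i, j) is the probability that gray levels i and j co-occur, i.e., the normalized GLCM entry.
The second GLCM texture feature is the contrast. It can be calculated using the following formula:
$$\text{Contrast} = \sum_{i}\sum_{j} (i-j)^{2}\, p(i,j) \qquad (6)$$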
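The third GLCM texture feature is the correlation. It can be calculated using the following formula:
$$\text{Correlation} = \sum_{i}\sum_{j} \frac{(i-\mu_i)(j-\mu_j)\, p(i,j)}{\sigma_i\,\sigma_j} \qquad (7)$$
where $\mu_i$ represents the horizontal mean and $\mu_j$ the vertical mean in the matrix, and $\sigma_i$ and $\sigma_j$ represent the dispersion around the mean of the combinations of target and neighbor pixels.
The fourth GLCM texture feature is the homogeneity. It can be calculated using the following formula:
$$\text{Homogeneity} = \sum_{i}\sum_{j} \frac{p(i,j)}{1+|i-j|} \qquad (8)$$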
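The four texture features can be computed directly from a normalized GLCM, as in the sketch below. The definitions used here are assumptions stated in the comments: energy as the sum of squared entries, contrast weighted by (i - j)^2, correlation normalized by the marginal standard deviations, and homogeneity with the 1 + |i - j| denominator (some texts use 1 + (i - j)^2 instead). The function name and the 2x2 sample matrix are hypothetical.

```python
import math

def glcm_features(p):
    """Energy, contrast, correlation and homogeneity of a normalized GLCM p."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    energy = sum(v * v for _, _, v in cells)
    contrast = sum((i - j) ** 2 * v for i, j, v in cells)
    homogeneity = sum(v / (1 + abs(i - j)) for i, j, v in cells)
    # Marginal means and standard deviations along rows (i) and columns (j)
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    if sd_i > 0 and sd_j > 0:
        correlation = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells) / (sd_i * sd_j)
    else:
        correlation = float("nan")  # undefined for a constant image
    return {"energy": energy, "contrast": contrast,
            "correlation": correlation, "homogeneity": homogeneity}

# Small made-up normalized GLCM (entries sum to 1)
p = [[0.4, 0.1],
     [0.1, 0.4]]
feats = glcm_features(p)
print(feats)
```

Mass concentrated on the diagonal (pairs of equal gray levels) gives low contrast and high homogeneity, which matches the intuition for a smooth texture.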
Pattern discrimination
Many methods can be used for pattern classification, such as the weighted Euclidean distance measure [21], the correlation coefficient method, the logarithmic magnitude distance measure, the minimum mean distance rule, Artificial Neural Networks (ANN) [22], decision trees [23] and K-means [24]. In this paper the Weighted Euclidean Distance (WED) is used.
(9)
Where:
to balance the variations in the dynamic range.
the weight added to the component.
is the matched image index.
(10)
N = the number of images in databases.
(11)
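One way weighted Euclidean matching could look in code is sketched below. The exact form is an assumption and not necessarily that of Eq. (9): each component difference is divided by a per-component scale (here the standard deviation of that component over the database, to balance differences in dynamic range) and multiplied by a weight, and the matched image is the one with the smallest distance. All names and values are hypothetical.

```python
import math

def feature_scales(db):
    """Per-component standard deviation over the database (1.0 if constant)."""
    n = len(db)
    scales = []
    for column in zip(*db):
        mean = sum(column) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)
        scales.append(std if std > 0 else 1.0)
    return scales

def weighted_euclidean(q, x, weights, scales):
    """Assumed form: sqrt(sum_i w_i * ((q_i - x_i) / s_i)^2)."""
    return math.sqrt(sum(w * ((a - b) / s) ** 2
                         for a, b, w, s in zip(q, x, weights, scales)))

def best_match(q, db, weights):
    """Return the index of the closest database vector and all distances."""
    scales = feature_scales(db)
    distances = [weighted_euclidean(q, x, weights, scales) for x in db]
    index = min(range(len(db)), key=distances.__getitem__)
    return index, distances

# Made-up two-component feature vectors with very different dynamic ranges
db = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]]
query = [2.1, 290.0]
index, distances = best_match(query, db, weights=[1.0, 1.0])
print(index, distances)
```

Without the scaling step the second component would dominate the distance simply because its values are two orders of magnitude larger.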
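3. Performance Evaluation
Evaluation of retrieval performance is a crucial problem in content-based image retrieval (CBIR). Researchers have created and used many different methods for measuring the performance of a system. The most common evaluation measures used in CBIR are precision and recall, which are defined as [11]:
$$\text{Precision} = \frac{\text{number of relevant images retrieved}}{\text{total number of images retrieved}}$$
$$\text{Recall} = \frac{\text{number of relevant images retrieved}}{\text{total number of relevant images in the database}}$$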
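A single measure that trades off precision versus recall is the F-measure, which is the weighted harmonic mean of precision and recall [25]:
$$\text{F-Measure} = \frac{(\beta^{2}+1)\cdot \text{Precision}\cdot \text{Recall}}{\beta^{2}\cdot \text{Precision} + \text{Recall}}$$
where $\beta$ sets the relative weight of recall; with $\beta = 1$ the measure reduces to $2\cdot\text{Precision}\cdot\text{Recall}/(\text{Precision}+\text{Recall})$.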
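The three measures can be computed for a single query as in the following sketch. The image identifiers and the function name are hypothetical; the relevant set stands for all database images of the same class as the query.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Precision, recall and F-measure for one query."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant images retrieved
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    denom = beta ** 2 * precision + recall
    f_measure = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure

# Made-up example: 5 images retrieved, 6 relevant images in the database
retrieved = ["b1", "b2", "c1", "b3", "s1"]
relevant = ["b1", "b2", "b3", "b4", "b5", "b6"]
precision, recall, f1 = precision_recall_f(retrieved, relevant)
print(precision, recall, f1)
```

Here three of the five retrieved images are relevant, giving a precision of 0.6 and a recall of 0.5.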
4. A Comparative Study
A comparison among three feature extraction methods (Gray Level Co-occurrence Matrix, Color Moments and Color Coherence Vector) for content-based bacterial image classification is presented.
Each feature extraction technique has its own strengths and weaknesses.
The features extracted by each of these methods are described by feature vectors.
These feature vectors are then stored to form an image feature database. Thus, three image feature databases have been formed, one for each feature extraction method. For a given query image, each method is applied in the same way to extract its features and form a feature vector. This feature vector is matched against the vectors already stored in the corresponding image feature database. Recall, precision and F-measure have been calculated for each method for a query image in order to identify the best method. Run time and CPU time have also been calculated for each method.
5. Experimental Results
An image database is used for bacteria classification [26]. It includes 150 images and consists of 3 classes of bacteria, namely Bacilli, Cocci and Spiral. A MATLAB program was developed and used for image retrieval.
Table 1 shows the run time and CPU time for the feature extraction methods.

| Feature extraction method | Learn time, CPU (sec) | Learn time, Run (sec) | Test time, CPU (sec) | Test time, Run (sec) |
|---|---|---|---|---|
| GLCM | 53.9 | 53.5 | 250.9 | 2.1 |
| CCV | 5.9e+003 | 5913 | 20.4 | 20.6 |
| Color Moments | 28.4 | 20.0 | 290.2 | 1.0 |
Comparing the CPU time and run time results in Table 1, Color Moments has the shortest learning time (28.4 s CPU, 20.0 s run) and the shortest test run time (1.0 s), ahead of the Gray Level Co-occurrence Matrix and the Color Coherence Vector.
Figure 3 shows a bacilli bacterial cell used as the query image.
Figure 4 shows the images retrieved using the GLCM feature extraction method for content-based bacterial image retrieval.
Figure 5 shows the images retrieved using the CCV feature extraction method for content-based bacterial image retrieval.
Figure 6 shows the images retrieved using the Color Moments feature extraction method for content-based bacterial image retrieval.
Figure 7 shows the recall curve for the Color Coherence Vector feature extraction method.
Figure 8 shows the recall curve for the GLCM feature extraction method.
Figure 9 shows the recall curve for the Color Moments feature extraction method.
From Figures 7 to 9, one can conclude that the Color Moments recall results are better than those of the GLCM and Color Coherence Vector methods.
Figure 10 shows the precision curve for the GLCM feature extraction method.
Figure 11 shows the precision curve for the Color Coherence Vector feature extraction method.
Figure 12 shows the precision curve for the Color Moments feature extraction method.
Comparing the precision curves, one can see that the Color Moments results are better than those of GLCM and Color Coherence Vector.
Figure 13 shows the F-measure curve for the GLCM feature extraction method.
Figure 14 shows the F-measure curve for the Color Coherence Vector feature extraction method.
Figure 15 shows the F-measure curve for the Color Moments feature extraction method.
Comparing the F-measure curves, one can see that the Color Moments results are better than those of GLCM and Color Coherence Vector.
6. Conclusion
Feature extraction techniques play an important role in content-based classification of bacterial cells.
In this paper a comparative study of different feature extraction techniques is presented. The GLCM, CCV and Color Moments techniques were selected for performance evaluation. Based on the experimental results, it was concluded that the Color Moments technique consumes the least CPU time and the least run time, and that it also outperforms the other feature extraction methods in recall, precision and F-measure. This can lead to better classification of bacterial cells and can be used for multiple purposes, such as diagnosing bacterial diseases and helping researchers identify the type of bacteria and its relation to food spoilage. It can also help students in the Department of Microbiology to understand the subject of bacteria classification. Identifying bacterial cell growth and the time of cell division is of great benefit in sensitivity tests against a particular drug when designing effective medication to control bacterial disease.