IMS 1: Iris matching using multi-dimensional artificial neural network

Iris recognition is one of the most widely used biometric techniques for personal identification. In this work, identification relies on the fact that iris patterns are statistically unique and suitable for biometric measurement. A novel method of recognizing iris patterns using a multidimensional artificial neural network is proposed. The proposed technique has the distinct advantage of using the entire resized iris as input at once. It offers excellent pattern recognition performance because iris texture is unique to every person. The system is trained and tested using two publicly available databases (CASIA and UBIRIS). Compared with other conventional matching techniques, the proposed approach shows significant promise and potential for improvement with regard to time and efficiency of results.

IMS 2: Privacy Protection of Fingerprint Database

A fingerprint authentication system that protects the privacy of the fingerprint template stored in a database is introduced here. The fingerprint data considered is a binary thinned fingerprint image, into which private user information is embedded during the enrollment phase without causing obvious abnormality. In the authentication phase, the hidden user data can be extracted from the stored template to verify the authenticity of the person who provides the query fingerprint. A novel data hiding scheme is proposed for the thinned fingerprint template. This scheme does not produce any boundary pixels in the thinned fingerprint during data embedding. Thus, the abnormality caused by data hiding is visually imperceptible in the marked thinned fingerprint. Compared with existing binary image data hiding techniques, the proposed method causes the least abnormality in a thinned fingerprint without compromising fingerprint identification performance.

IMS 3: Smart card with iris recognition for high security access environment

Smart cards are increasingly being used as a form of identification and authentication. One inherent problem with smart cards, however, is the possibility of loss or theft. Current options for securing smart cards against unauthorized use are primarily restricted to passwords, which are easy enough for others to steal that they do not offer sufficient protection. This has promoted interest in biometric identification methods, including iris recognition. Owing to its unique biological properties, the iris is exceptionally suited for identification: it is protected from the environment, stable over time, unique in shape, and contains a large amount of discriminating information. This paper proposes a method to integrate iris recognition with the smart card to develop a high security access environment. An iris recognition system and a smart card programming circuit with its software have been designed. The template-on-card (TOC) category has been employed, so the extracted iris features stored on the smart card are compared against the data acquired from a camera or database for authentication. The proposed algorithm has superior performance in terms of security, accuracy and consistency compared with other published technology.
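
The abstract does not say how the features stored on the card are compared with the live features. A common choice for binary iris codes is a masked, normalized Hamming distance, so the sketch below assumes that representation (boolean code and mask arrays). The names and the decision threshold are hypothetical, and this is not presented as the paper's matcher.

```python
import numpy as np

def masked_hamming(code_a, code_b, mask_a, mask_b):
    """Fraction of disagreeing bits, counted only where both codes are valid."""
    valid = mask_a & mask_b
    n_valid = int(valid.sum())
    if n_valid == 0:
        raise ValueError("no overlapping valid bits")
    disagree = (code_a ^ code_b) & valid
    return disagree.sum() / n_valid

def authenticate(card_code, card_mask, live_code, live_mask, threshold=0.32):
    """Accept when the stored (on-card) and live codes are close enough.

    The threshold is a hypothetical operating point, not a value from the paper."""
    return masked_hamming(card_code, live_code, card_mask, live_mask) <= threshold
```

In a template-on-card setting, only `card_code` and `card_mask` would be read from the card; the live code comes from the camera pipeline.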

IMS 4: Digital signature with localization for image authentication

This paper proposes a method of extracting a digital signature that can localize tampered areas. The digital signature of an image is generated based on the regularity properties of wavelet transform coefficients.

Objective: Person identification as a security measure has a variety of important applications. Many techniques and automated systems have been developed over the past few decades; each has its own advantages and limitations. There are often trade-offs among reliability, ease of use, ethical/human rights issues, and acceptability in a particular application. Multimodal identification and authentication can, to some extent, alleviate these dilemmas and improve overall performance. This paper proposes a new method that combines signatures and utterances of pronounced names to identify or authenticate persons. Unlike typical signature verification methods, the dynamic features of signatures are captured as sound. The multimodal approach shows increased reliability, providing a relatively simple and potentially useful method for person identification and authentication.
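
The paragraph above does not state how the signature and utterance evidence are combined. The sketch below assumes a generic score-level fusion: each modality produces one similarity score per enrolled person, scores are min-max normalized, and a weighted sum picks the identity. The weighting and function names are hypothetical.

```python
import numpy as np

def min_max(scores):
    """Rescale scores to [0, 1]; a constant score vector maps to zeros."""
    s = np.asarray(scores, dtype=float)
    span = s.max() - s.min()
    return np.zeros_like(s) if span == 0 else (s - s.min()) / span

def fuse(signature_scores, utterance_scores, weight=0.5):
    """Weighted-sum fusion of per-person similarity scores (higher = more similar)."""
    return weight * min_max(signature_scores) + (1 - weight) * min_max(utterance_scores)

def identify(signature_scores, utterance_scores, weight=0.5):
    """Index of the enrolled person with the highest fused score."""
    return int(np.argmax(fuse(signature_scores, utterance_scores, weight)))
```

Normalizing first matters because the two modalities produce scores on unrelated scales.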

Video Processing

IMS 5: An Advanced Motion Detection Algorithm With Video Quality Analysis for Video Surveillance Systems

Motion detection is the first essential process in extracting information about moving objects, and it supports the stability of functional areas such as tracking, classification, and recognition. In this paper, we propose a novel and accurate approach to motion detection for automatic video surveillance systems. Our method achieves complete detection of moving objects through three proposed modules: a background modeling (BM) module, an alarm trigger (AT) module, and an object extraction (OE) module. In the BM module, a unique two-phase background matching procedure performs rapid matching followed by accurate matching to produce optimum background pixels for the background model. Next, the AT module eliminates unnecessary examination of the entire background region, allowing the subsequent OE module to process only blocks containing moving objects. Finally, the OE module forms the binary object detection mask to achieve highly complete detection of moving objects. The detection results produced by our proposed (PRO) method were analyzed both qualitatively, through visual inspection, and quantitatively, for accuracy, and were compared with the results of other state-of-the-art methods. The analyses show that our PRO method has a substantially higher degree of efficacy, outperforming the other methods by up to 53.43% in terms of the accuracy metric.
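
The alarm-trigger idea, examining only blocks that differ from the background model, can be sketched as follows. This assumes the background model is already available as an array. The block size and both thresholds are hypothetical, and a plain sum of absolute differences stands in for the paper's AT and OE modules.

```python
import numpy as np

def alarm_blocks(frame, background, block=4, block_thresh=40.0):
    """Flag blocks whose sum of absolute differences from the background exceeds a threshold."""
    h, w = frame.shape
    diff = np.abs(frame.astype(float) - background.astype(float))
    hb, wb = h // block, w // block
    # Per-block sum of absolute differences (frame cropped to whole blocks)
    sad = diff[:hb * block, :wb * block].reshape(hb, block, wb, block).sum(axis=(1, 3))
    return sad > block_thresh

def object_mask(frame, background, alarms, block=4, pixel_thresh=15.0):
    """Binary object mask, computed only inside the alarmed blocks."""
    diff = np.abs(frame.astype(float) - background.astype(float))
    mask = np.zeros(frame.shape, dtype=bool)
    for by, bx in zip(*np.nonzero(alarms)):
        ys = slice(by * block, (by + 1) * block)
        xs = slice(bx * block, (bx + 1) * block)
        mask[ys, xs] = diff[ys, xs] > pixel_thresh
    return mask
```

Casting to float before subtracting avoids unsigned-integer wraparound when frames are `uint8`.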

IMS 6: Face Region Based Conversational Video Coding

Face regions are the visual focus in conversational video communication, so better reconstruction quality of these regions of interest (ROI) is highly desirable, or even necessary, in bandwidth-constrained conversational video coding. In this paper, we first introduce an efficient motion-based face detection method to identify face blocks, which substantially reduces computational complexity without any loss in face detection results. Then an active contour model is applied to find face contours, yielding more refined and compact face regions. Based on these well-located and compact face regions, facial-feature-priority-based bit allocation is proposed for face-ROI-based conversational video coding. Experimental results demonstrate that the proposed face-region-based coding considerably improves coding results in the face regions compared with two other relevant video coding schemes, in terms of both objective rate-distortion performance and subjective visual quality.
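
The abstract does not give the priority rule. The simplest form of ROI-based bit allocation splits a frame's bit budget across blocks in proportion to a weight that is larger for face blocks. The sketch below assumes that proportional rule and uses a hypothetical `face_weight`. Mapping the per-block targets to quantization parameters is left out.

```python
import numpy as np

def allocate_bits(frame_budget, face_mask, face_weight=2.0):
    """Split a frame's bit budget across blocks, giving face blocks a larger share.

    face_mask: boolean array with one entry per block (True = face block).
    face_weight: hypothetical priority of a face block relative to background."""
    weights = np.where(face_mask, face_weight, 1.0)
    return frame_budget * weights / weights.sum()
```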

IMS 7: Motion and feature based person tracking in surveillance videos

This work describes a method for accurately tracking persons in indoor surveillance video streams obtained from a static camera under difficult scene conditions, including illumination changes, and addresses the major problem of occlusion. First, moving objects are precisely extracted by determining their motion, for further processing. Scene illumination changes are averaged during the background subtraction process to obtain accurate moving objects. When objects occlude one another, color feature information is used to distinguish them accurately. The method is able to identify moving persons, track them, and assign a unique tag to each tracked person. The effectiveness of the proposed method is demonstrated with experiments in an indoor environment.
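
One way to use color features to tell occluding persons apart is to keep a normalized color histogram per tracked person and reassign a region to the track whose histogram is most similar. The sketch below assumes joint RGB histograms compared with the Bhattacharyya coefficient. The abstract does not name a color space or a similarity measure, so both are assumptions.

```python
import numpy as np

def colour_histogram(pixels, bins=8):
    """Normalized joint RGB histogram of an (N, 3) uint8 pixel array."""
    q = (pixels.astype(int) * bins) // 256            # quantize each channel to `bins` levels
    idx = (q[:, 0] * bins + q[:, 1]) * bins + q[:, 2]  # flatten the 3-D bin index
    hist = np.bincount(idx, minlength=bins ** 3).astype(float)
    return hist / hist.sum()

def bhattacharyya(h1, h2):
    """Bhattacharyya coefficient: 1.0 for identical normalized histograms."""
    return float(np.sum(np.sqrt(h1 * h2)))

def match_identity(region_pixels, track_histograms, bins=8):
    """Index of the stored track whose colour model best matches the region."""
    h = colour_histogram(region_pixels, bins)
    return int(np.argmax([bhattacharyya(h, t) for t in track_histograms]))
```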

IMS 8: Identification and analysis of human pose in video

Human figure identification has always been a challenging task in the field of pattern recognition. This paper presents a complete algorithm to find a single object (a human body) and identify it as a human being. The algorithm starts the segmentation process with the basic frame difference method and then uses morphological operators, edge detection, feature point generation, and finally spline interpolation to find the human-like object. After successful segmentation, the algorithm identifies the object as a human body and detects its pose by template matching. This paper describes every step of single human body detection in detail, and the method is presented as ready for real-life use.
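
The first two segmentation steps named above, frame differencing followed by morphological operators, can be sketched with plain NumPy. The threshold and the choice of a 3x3 opening (erosion then dilation) are assumptions. The later edge-detection, feature-point and spline steps are not shown.

```python
import numpy as np

def _neighbourhood_stack(mask):
    """All 3x3-shifted copies of a boolean mask (zero padding at the border)."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    return np.stack([p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])

def erode(mask):
    return _neighbourhood_stack(mask).all(axis=0)

def dilate(mask):
    return _neighbourhood_stack(mask).any(axis=0)

def moving_region(prev_frame, frame, thresh=25):
    """Frame difference, then a 3x3 opening to remove isolated noise pixels."""
    diff = np.abs(frame.astype(int) - prev_frame.astype(int)) > thresh
    return dilate(erode(diff))
```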

IMG1: 3-D Reconstruction of Microtubules From Multi-Angle Total Internal Reflection Fluorescence Microscopy Using Bayesian Framework

Abstract

Total internal reflection fluorescence (TIRF) microscopy excites a thin evanescent field which theoretically decays exponentially. Each TIRF image is actually the projection of a 3-D volume and hence cannot alone produce an accurate localization of structures in the z-dimension; however, it provides greatly improved axial resolution for biological samples. Multi-angle TIRF microscopy allows controlled variation of the incident angle of the illuminating laser beam, thus generating a set of images of different penetration depths with the potential to reconstruct the 3-D volume of the sample. With the ultimate goal of quantifying important biological parameters of microtubules, we present a method to reconstruct the 3-D position and orientation of microtubules based on multi-angle TIRF data, as well as an experimental calibration of the actual decay function of the evanescent field at each angle. We validate our method using computer simulations, by creating a phantom that simulates the curvilinear characteristics of microtubules and projecting the artificially constructed volume into a set of TIRF images for different penetration depths. The reconstructed depth information for the phantom data is shown to be accurate and robust to noise. We apply our method to microtubule TIRF images of PtK2 cells in vivo. By comparing microtubule curvatures of the reconstruction results with several electron microscopy (EM) images of vertically sliced samples of microtubules, we find that the curvature statistics of our reconstruction agree well with the ground truth (EM data). Quantifying the distribution of microtubule curvature reveals that microtubules can buckle and form local bends with a considerably small radius of curvature, which are also visible in the EM images, while microtubule bends on a larger scale generally have a much larger radius and cannot bear the stress of a large curvature.
The presented method has the potential to provide a reliable tool for 3-D reconstruction and tracking of microtubules.
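
Why several penetration depths constrain z can be seen from the idealized model. If a single fluorophore at depth z gives intensity I_k = A·exp(−z/d_k) at penetration depth d_k, then log I_k is linear in the unknowns (log A, z). The sketch fits that by least squares. It assumes the textbook exponential decay and penetration-depth formula, with typical glass/water refractive indices; the paper instead calibrates the actual decay function at each angle.

```python
import numpy as np

def penetration_depth(wavelength, theta, n1=1.518, n2=1.33):
    """Intensity penetration depth d = lambda / (4 pi) * (n1^2 sin^2(theta) - n2^2)^(-1/2).

    n1, n2 (glass and aqueous medium) are assumed typical values."""
    arg = (n1 * np.sin(theta)) ** 2 - n2 ** 2
    if np.any(arg <= 0):
        raise ValueError("incidence angle must exceed the critical angle")
    return wavelength / (4 * np.pi) / np.sqrt(arg)

def fit_depth(intensities, depths):
    """Least-squares fit of log I_k = log A - z / d_k; returns (A, z)."""
    depths = np.asarray(depths, dtype=float)
    design = np.column_stack([np.ones(len(depths)), -1.0 / depths])
    (log_a, z), *_ = np.linalg.lstsq(design, np.log(intensities), rcond=None)
    return np.exp(log_a), z
```

With noisy data, more angles (depths) make the fit better conditioned.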

IMG2: Two Efficient Label-Equivalence-Based Connected-Component Labeling Algorithms for 3-D Binary Images

Abstract

Whenever one wants to distinguish, recognize, and/or measure objects (connected components) in binary images, labeling is required. This paper presents two efficient label-equivalence-based connected-component labeling algorithms for 3-D binary images: one voxel based and the other run based. For the voxel-based algorithm, we present an efficient method of deciding the order for checking voxels in the mask. For the run-based algorithm, instead of assigning a provisional label to each foreground voxel, we assign one to each run. Moreover, we use run data to label foreground voxels without scanning any background voxel in the second scan. Experimental results demonstrate that our voxel-based algorithm is efficient for 3-D binary images with complicated connected components, that our run-based one is efficient for those with simple connected components, and that both are much more efficient than conventional 3-D labeling algorithms.
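
For readers unfamiliar with label-equivalence labeling, here is a minimal two-scan, voxel-based version. It assumes 6-connectivity and resolves provisional-label equivalences with union-find. It does not include the paper's optimized mask-checking order or its run-based variant.

```python
import numpy as np

def _find(parent, x):
    # Find the representative label, halving the path as we go
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def _union(parent, a, b):
    ra, rb = _find(parent, a), _find(parent, b)
    if ra != rb:
        # Keep the smaller label as the representative
        parent[max(ra, rb)] = min(ra, rb)

def label_3d(volume):
    """Two-scan 6-connected labeling; returns (labels, number_of_components)."""
    vol = np.asarray(volume, dtype=bool)
    labels = np.zeros(vol.shape, dtype=np.int64)
    parent = [0]  # index 0 is the background
    depth, rows, cols = vol.shape
    # First scan: provisional labels from the already-visited 6-neighbours
    for z in range(depth):
        for y in range(rows):
            for x in range(cols):
                if not vol[z, y, x]:
                    continue
                neigh = []
                if x > 0 and labels[z, y, x - 1]:
                    neigh.append(int(labels[z, y, x - 1]))
                if y > 0 and labels[z, y - 1, x]:
                    neigh.append(int(labels[z, y - 1, x]))
                if z > 0 and labels[z - 1, y, x]:
                    neigh.append(int(labels[z - 1, y, x]))
                if not neigh:
                    parent.append(len(parent))      # new provisional label
                    labels[z, y, x] = len(parent) - 1
                else:
                    m = min(neigh)
                    labels[z, y, x] = m
                    for n in neigh:                 # record label equivalences
                        _union(parent, m, n)
    # Map each equivalence class to a consecutive final label
    lookup = np.zeros(len(parent), dtype=np.int64)
    final = {}
    for p in range(1, len(parent)):
        root = _find(parent, p)
        final.setdefault(root, len(final) + 1)
        lookup[p] = final[root]
    # Second scan: replace provisional labels by final ones
    return lookup[labels], len(final)
```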

IMG3: Stationary Probability Model for Bitplane Image Coding Through Local Average of Wavelet Coefficients

Abstract

This paper introduces a probability model for symbols emitted by bitplane image coding engines, which is conceived from a precise characterization of the signal produced by a wavelet transform. The main insights behind the proposed model are the estimation of the magnitude of wavelet coefficients as the arithmetic mean of their neighbors' magnitudes (the so-called local average), and the assumption that emitted bits are under-complete representations of the underlying signal. The local-average-based probability model is introduced in the framework of JPEG2000. While the resulting system is not JPEG2000 compatible, it preserves all features of the standard. Practical benefits of our model are enhanced coding efficiency, more opportunities for parallelism, and improved spatial scalability.
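
The local average itself is simple to compute. The sketch assumes an 8-connected neighbourhood within one subband, with border coefficients averaged over their in-bounds neighbours only. The neighbourhood the paper uses, and how the estimate is turned into bit probabilities, are not reproduced here.

```python
import numpy as np

def local_average(coeffs):
    """Mean magnitude of the 8-connected neighbours of each coefficient.

    Out-of-range neighbours are ignored, so border coefficients are
    averaged over fewer neighbours."""
    mag = np.abs(np.asarray(coeffs, dtype=float))
    h, w = mag.shape
    p = np.pad(mag, 1)                        # zero-padded magnitudes
    valid = np.pad(np.ones_like(mag), 1)      # 1 where a neighbour exists
    total = np.zeros_like(mag)
    count = np.zeros_like(mag)
    for dy in range(3):
        for dx in range(3):
            if dy == 1 and dx == 1:
                continue                      # exclude the coefficient itself
            total += p[dy:dy + h, dx:dx + w]
            count += valid[dy:dy + h, dx:dx + w]
    return total / count
```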

IMG4: Natural Image Segmentation Based on Tree Equipartition, Bayesian Flooding and Region Merging

Abstract

We propose a general-purpose image segmentation framework, which involves feature extraction and classification in feature space, followed by flooding and merging in the spatial domain. Region growing is based on the computed local measurements and distances from the distribution of features describing the different classes. Spatial coherence is ensured by the properties of the label-dependent distances, since the image features are described globally. The distributions of the features for the different classes are obtained by block-wise unsupervised clustering, based on the construction of the minimum spanning tree of the blocks' grid using the Mallows distance and on the equipartition of the resulting tree. The final clustering is obtained using the k-centroids algorithm. With high probability and under topological constraints, connected components of the maximum likelihood classification map are used to compute a map of initially labelled pixels. An efficient flooding algorithm, the Priority Multi-Class Flooding Algorithm (PMCFA), is introduced that assigns pixels to labels using Bayesian dissimilarity criteria. A new region merging method, which incorporates boundary information, is introduced for obtaining the final segmentation map; the merging stage is thus based on region features and edge localization. Segmentation results on the Berkeley benchmark data set demonstrate the effectiveness of the proposed methods.
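
The Mallows distance used to weight the edges of the block grid has a simple closed form for one-dimensional samples. For two equal-size samples, the order-2 Mallows (Wasserstein-2) distance is the root-mean-square difference between the sorted samples. The sketch below covers only that 1-D, equal-size case. How the paper handles multidimensional block features is not shown.

```python
import numpy as np

def mallows_1d(a, b):
    """Order-2 Mallows (Wasserstein-2) distance between two equal-size 1-D samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.sqrt(np.mean((a - b) ** 2)))
```

Because it compares distributions rather than means, two blocks with the same average grey level but different textures can still be far apart.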

IMG5: LCD Motion Blur: Modeling, Analysis, and Algorithm

Abstract

Liquid crystal display (LCD) devices are well known for their slow responses due to the physical limitations of liquid crystals. Therefore, fast moving objects in a scene are often perceived as blurred. This effect is known as LCD motion blur. In order to reduce LCD motion blur, an accurate LCD model and an efficient deblurring algorithm are needed. However, existing LCD motion blur models are insufficient to reflect the limitations of the human eye-tracking system. Also, although spatiotemporal equivalence is widely used in LCD motion blur models, it has not been proven directly in the discrete 2-D spatial domain. This paper makes three main contributions: modeling, analysis, and algorithm. First, a comprehensive LCD motion blur model is presented, in which human eye-tracking limits are taken into consideration. Second, a complete analysis of spatiotemporal equivalence is provided and verified using real video sequences. Third, an LCD motion blur reduction algorithm is proposed. The proposed algorithm solves an l1-norm regularized least-squares minimization problem using a subgradient projection method. Numerical results show that the proposed algorithm gives higher peak SNR, lower temporal error, and lower spatial error than motion-compensated inverse filtering and the Lucy-Richardson deconvolution algorithm, two state-of-the-art LCD deblurring algorithms.
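
Spatiotemporal equivalence means that a temporal hold on the display appears, to an eye smoothly tracking the motion, as a spatial blur along the motion direction. Under the crudest assumptions (ideal sample-and-hold, instantaneous liquid-crystal switching, perfect eye tracking, integer speed in pixels per frame), that blur is a box filter whose length equals the speed. The sketch below implements only this simplified forward model, with circular boundaries. It does not include the paper's eye-tracking limits or its deblurring algorithm.

```python
import numpy as np

def hold_blur_row(row, speed):
    """Perceived row on an ideal hold-type display for motion of `speed` pixels/frame.

    Simplified model: a length-`speed` box filter along the motion direction,
    implemented with circular shifts for brevity."""
    row = np.asarray(row, dtype=float)
    return sum(np.roll(row, k) for k in range(speed)) / speed
```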

IMG6: Illumination Recovery From Image With Cast Shadows Via Sparse Representation

Abstract

In this paper, we propose using sparse representation to recover the illumination of a scene from a single image with cast shadows, given the geometry of the scene. Images with cast shadows can be quite complex and, therefore, cannot be well approximated by low-dimensional linear subspaces. However, it can be shown that the set of images produced by a Lambertian scene with cast shadows can be efficiently represented by a sparse set of images generated by directional light sources. We first model an image with cast shadows as composed of a diffusive part (without cast shadows) and a residual part that captures cast shadows. Then, we express the problem in an ℓ1-regularized least-squares formulation, with nonnegativity constraints (as light has to be nonnegative at any point in space). This sparse representation enjoys an effective and fast solution thanks to recent advances in compressive sensing. In experiments on synthetic and real data, our approach performs favorably in comparison with several previously proposed methods.
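
With the nonnegativity constraint, the ℓ1 penalty reduces to a plain sum, so the problem min ||Ax − b||² + λ·Σx subject to x ≥ 0 can be solved by projected gradient descent, with the projection being a clip at zero. In the paper's setting the columns of A would be images rendered under individual directional lights and x the light intensities. The sketch below is a generic solver under that reading, not the compressive-sensing solver the paper relies on.

```python
import numpy as np

def nonneg_l1_least_squares(A, b, lam, iters=500):
    """Projected gradient for  min_x ||A x - b||^2 + lam * sum(x)  subject to x >= 0.

    For x >= 0 the l1 norm equals sum(x), so its gradient is the constant lam."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ x - b) + lam
        x = np.maximum(0.0, x - step * grad)          # gradient step, then project onto x >= 0
    return x
```

With `A` the identity, the solution is max(0, b − λ/2) elementwise, which makes a convenient sanity check.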

IMG7: FSIM: A Feature Similarity Index for Image Quality Assessment

Abstract

Image quality assessment (IQA) aims to use computational models to measure image quality consistently with subjective evaluations. The well-known structural similarity index moved IQA from the pixel-based to the structure-based stage. In this paper, a novel feature similarity (FSIM) index for full-reference IQA is proposed, based on the fact that the human visual system (HVS) understands an image mainly according to its low-level features. Specifically, phase congruency (PC), a dimensionless measure of the significance of a local structure, is used as the primary feature in FSIM. Since PC is contrast invariant while contrast information does affect the HVS's perception of image quality, the image gradient magnitude (GM) is employed as the secondary feature in FSIM. PC and GM play complementary roles in characterizing local image quality. After obtaining the local quality map, we use PC again as a weighting function to derive a single quality score. Extensive experiments performed on six benchmark IQA databases demonstrate that FSIM achieves much higher consistency with subjective evaluations than state-of-the-art IQA metrics.
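
The pooling step described above can be sketched once the phase-congruency maps are available. Each feature gets a similarity term of the form (2·f1·f2 + T)/(f1² + f2² + T). Their product is the local quality map, which is pooled with max(PC1, PC2) as the weight. Computing PC itself is out of scope here, so the PC maps are passed in. `np.gradient` stands in for the paper's gradient operator, and the constants `t1`, `t2` are assumed defaults that should be checked against the paper.

```python
import numpy as np

def gradient_magnitude(img):
    # Central differences (np.gradient) stand in for the paper's gradient operator
    gy, gx = np.gradient(np.asarray(img, dtype=float))
    return np.hypot(gx, gy)

def fsim_from_pc(img1, img2, pc1, pc2, t1=0.85, t2=160.0):
    """FSIM-style score from precomputed phase-congruency maps pc1, pc2."""
    g1, g2 = gradient_magnitude(img1), gradient_magnitude(img2)
    s_pc = (2 * pc1 * pc2 + t1) / (pc1 ** 2 + pc2 ** 2 + t1)   # PC similarity
    s_g = (2 * g1 * g2 + t2) / (g1 ** 2 + g2 ** 2 + t2)        # GM similarity
    s_l = s_pc * s_g                                           # local quality map
    pc_m = np.maximum(pc1, pc2)                                # PC reused as pooling weight
    return float(np.sum(s_l * pc_m) / np.sum(pc_m))
```

Each similarity term is at most 1, with equality only when the two feature values match, so identical inputs score exactly 1.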

IMG8: Enhanced Adaptive Loop Filter for Motion Compensated Frame

Abstract

We propose an adaptive loop filter that removes the redundancy between the current and motion-compensated frames so that the residual signal is minimized, thereby increasing coding efficiency. The loop filter coefficients and offset are optimized for each frame or set of blocks to minimize the total energy of the residual signal resulting from motion estimation and compensation. The optimized loop filter with offset is applied to the set of blocks where filtering yields a coding gain based on rate-distortion cost. The proposed loop filter is used for the motion-compensated frame, whereas the conventional adaptive interpolation filter (AIF) is applied to the reference frames to interpolate subpixel values. Another conventional scheme, the adaptive loop filter (ALF), is applied after deblocking filtering to enhance the quality of reconstructed frames, not to minimize the energy of the residual signal. The proposed loop filter can be used in combination with the AIF and ALF. Experimental results show that the proposed algorithm provides an average bit reduction of 8% compared with the conventional H.264/AVC scheme. When the proposed scheme is combined with AIF and ALF, the coding gain increases even further.
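
Minimizing the residual energy over filter taps plus an offset is an ordinary linear least-squares (Wiener-style) problem. Each pixel's neighbourhood in the motion-compensated frame is one row of the design matrix, with an extra constant column for the offset. The sketch assumes a 3x3 support and a whole-frame fit. The per-block rate-distortion on/off decision is not shown.

```python
import numpy as np

def neighbourhoods(frame, radius=1):
    """One row per pixel holding its (2r+1)^2 neighbourhood (edge-replicated borders)."""
    h, w = frame.shape
    p = np.pad(frame, radius, mode="edge")
    size = 2 * radius + 1
    cols = [p[dy:dy + h, dx:dx + w].ravel() for dy in range(size) for dx in range(size)]
    return np.stack(cols, axis=1)

def fit_loop_filter(mc_frame, current, radius=1):
    """Least-squares taps and offset minimizing ||current - (filter(mc_frame) + offset)||^2."""
    X = neighbourhoods(np.asarray(mc_frame, dtype=float), radius)
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])            # last column models the offset
    sol, *_ = np.linalg.lstsq(X1, np.asarray(current, dtype=float).ravel(), rcond=None)
    return sol[:-1], sol[-1]

def apply_loop_filter(mc_frame, coeffs, offset, radius=1):
    X = neighbourhoods(np.asarray(mc_frame, dtype=float), radius)
    return (X @ coeffs + offset).reshape(np.shape(mc_frame))
```

In a codec, the encoder would compute `current - apply_loop_filter(...)` as the residual and transmit the taps and offset as side information.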

IMG9: Composite Model-Based DC Dithering for Suppressing Contour Artifacts in Decompressed Video

Abstract

Because of its outstanding contribution to compression efficiency, block-based quantization has been widely adopted in state-of-the-art image/video coding standards. However, it introduces false contour artifacts, which reduce the fidelity of the decoded image/video, especially in terms of subjective quality. In this paper, a block-based decontouring method is proposed to reduce false contour artifacts in the decoded image/video by automatically dithering the direct current (DC) value of each block according to a composite model established between gradient smoothness and block-edge smoothness. Feature points on the model are compared with the corresponding criteria for suppressing contour artifacts, showing good consistency between the model and the actual processing effects. A discrete cosine transform (DCT)-based, block-level contour artifact detection mechanism ensures that blocks within texture regions are not affected by the DC dithering. Both the implementation method and the algorithm complexity are analyzed to demonstrate the feasibility of integrating the proposed method into an existing video decoder on an embedded platform or system-on-chip (SoC). Experimental results demonstrate the effectiveness of the proposed method, in terms of both subjective quality and processing complexity, in comparison with previous methods.
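
The DCT-based screening step can be sketched as follows. A block whose AC energy is small is flat, and so prone to false contours; a textured block is left untouched. The AC-energy threshold and the uniform random DC offset are hypothetical simplifications. The paper sets the dither from its composite gradient and block-edge smoothness model, which is not reproduced here.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def is_smooth_block(block, ac_thresh=50.0):
    """Flat (contour-prone) block if the energy outside the DC coefficient is small."""
    c = dct_matrix(block.shape[0])
    coeffs = c @ np.asarray(block, dtype=float) @ c.T     # 2-D DCT of a square block
    ac_energy = np.sum(coeffs ** 2) - coeffs[0, 0] ** 2
    return ac_energy < ac_thresh

def dither_dc(block, rng, amplitude=1.0):
    """Shift the whole block by one random offset, which changes only its DC value."""
    return np.asarray(block, dtype=float) + rng.uniform(-amplitude, amplitude)
```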

IMG10: ADART: An Adaptive Algebraic Reconstruction Algorithm for Discrete Tomography

Abstract

In this paper, we propose an algorithm based on the Discrete Algebraic Reconstruction Technique (DART) that is capable of computing high-quality reconstructions from substantially fewer projections than conventional continuous tomography requires. Adaptive DART (ADART) goes a step beyond DART in reducing the number of unknowns of the associated linear system, achieving a significant reduction in the pixel error rate of reconstructed objects. The proposed methodology automatically adapts the border definition criterion at each iteration, reducing the number of pixels belonging to the border and consequently the number of unknowns in the general algebraic reconstruction linear system to be solved; this reduction is especially important at the final stage of the iterative process. Experimental results show that reconstruction errors are considerably reduced using ADART compared with the original DART, in both clean and noisy environments.
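
DART's unknowns are the border pixels of the current segmented reconstruction. The sketch shows the segmentation step and a border test with a tunable criterion `min_diff`: the number of 4-neighbours that must have a different grey level. Raising `min_diff` shrinks the border and therefore the unknowns, which is one hypothetical way to parametrize an adaptive border definition. It is not ADART's actual rule.

```python
import numpy as np

def segment(img, grey_levels):
    """Snap every pixel to the nearest admissible grey level."""
    levels = np.asarray(grey_levels, dtype=float)
    idx = np.argmin(np.abs(np.asarray(img, dtype=float)[..., None] - levels), axis=-1)
    return levels[idx]

def border_mask(seg, min_diff=1):
    """Pixels with at least `min_diff` 4-neighbours of a different grey level."""
    p = np.pad(seg, 1, mode="edge")   # out-of-range neighbours copy the pixel itself
    h, w = seg.shape
    centre = p[1:h + 1, 1:w + 1]
    diffs = sum((p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w] != centre).astype(int)
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)))
    return diffs >= min_diff
```

With `min_diff=1` this is the usual DART border. On a 3x3 square inside a 5x5 image it marks 20 pixels, while `min_diff=2` keeps only the square's 4 corners.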

IMG11: t-Tests, F-Tests and Otsu's Methods for Image Thresholding

Abstract

Otsu's binarization method is one of the most popular image-thresholding methods; Student's t-test is one of the most widely used statistical tests to compare two groups. This paper aims to stress the equivalence between Otsu's binarization method and the search for an optimal threshold that provides the largest absolute Student's t-statistic. It is then naturally demonstrated that the extension of Otsu's binarization method to multi-level thresholding is equivalent to the search for optimal thresholds that provide the largest F-statistic through one-way analysis of variance (ANOVA). Furthermore, general equivalences between some parametric image-thresholding methods and the search for optimal thresholds with the largest likelihood-ratio test statistics are briefly discussed.
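
The two-class equivalence can be checked numerically. For a split into classes with n0 and n1 samples, the pooled-variance t-statistic satisfies t² = (n − 2)·σ_B²/σ_W², where σ_B² is Otsu's between-class variance and σ_W² = σ_T² − σ_B². So t² increases with σ_B², and both criteria select the same threshold. The sketch scores every candidate threshold both ways.

```python
import numpy as np

def between_class_variance(x, t):
    """Otsu's criterion for the split x <= t versus x > t."""
    lo, hi = x[x <= t], x[x > t]
    w0, w1 = lo.size / x.size, hi.size / x.size
    return w0 * w1 * (lo.mean() - hi.mean()) ** 2

def student_t(x, t):
    """Absolute two-sample Student's t-statistic (pooled variance) for the same split."""
    lo, hi = x[x <= t], x[x > t]
    n0, n1 = lo.size, hi.size
    ss = ((lo - lo.mean()) ** 2).sum() + ((hi - hi.mean()) ** 2).sum()
    sp2 = ss / (n0 + n1 - 2)
    return abs(lo.mean() - hi.mean()) / np.sqrt(sp2 * (1 / n0 + 1 / n1))

def best_threshold(x, score):
    """Candidate thresholds are all distinct values except the largest."""
    candidates = np.unique(x)[:-1]
    return candidates[int(np.argmax([score(x, t) for t in candidates]))]
```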

IMG12: Rate Control Scheme for Consistent Video Quality in Scalable Video Codec

Abstract

The delivery of multimedia data to mobile devices over wireless channels or the Internet is complicated by bandwidth fluctuation and the variety of mobile devices. Scalable video coding has been developed as an extension of H.264/AVC to solve this problem. Since a scalable video codec provides various scalabilities to adapt the bitstream to channel conditions and terminal types, it is useful for wired or wireless multimedia communication systems, such as IPTV and streaming services. In such scalable multimedia communication systems, video quality fluctuation significantly degrades visual perception. It is important to use the target bits efficiently in order to maintain consistent video quality or achieve a small distortion variation throughout the whole video sequence. Whereas conventional schemes control video quality in H.264 and MPEG-4 systems, the scheme proposed in this paper provides video quality control for applications supporting scalability. The proposed algorithm decides the quantization parameter of the enhancement layer to maintain consistent video quality throughout the entire sequence. The video quality of the enhancement layer is controlled based on a closed-form formula that uses the residual data and quantization error of the base layer. The simulation results show that the proposed algorithm controls the frame quality of the enhancement layer with a simple operation, in which the parameter decision algorithm is applied to each frame.
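
The paper's closed-form formula is not given in the abstract. The sketch below only illustrates the general idea of choosing a QP per frame to hit a target distortion, using two standard approximations. The first is the H.264/AVC step-size relation Qstep ≈ 0.625·2^(QP/6), under which the step size doubles every 6 QP. The second is the high-rate uniform-quantizer model D ≈ Qstep²/12. Both are approximations, and neither is the paper's base-layer-based formula.

```python
import math

def qstep_from_qp(qp):
    """Approximate H.264/AVC quantizer step size: 0.625 at QP 0, doubling every 6 QP."""
    return 0.625 * 2 ** (qp / 6)

def qp_for_target_mse(target_mse, qp_min=0, qp_max=51):
    """QP whose step size gives roughly the target MSE, assuming D = Qstep^2 / 12."""
    qstep = math.sqrt(12.0 * target_mse)
    qp = round(6 * math.log2(qstep / 0.625))
    return max(qp_min, min(qp_max, qp))
```

Holding `target_mse` constant from frame to frame is the simplest way to aim for consistent quality, which is the goal stated above.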

IMG13: Learning Adaptive Metric for Robust Visual Tracking

Abstract

Matching the visual appearance of the target over consecutive image frames is the most critical issue in video-based object tracking. The distance metric chosen for matching determines its accuracy and robustness, and thus significantly influences tracking performance. Most existing tracking methods employ fixed, pre-specified distance metrics. However, this simple treatment is problematic and limited in practice, because a pre-specified metric is unlikely to guarantee that the closest match is the true target of interest. This paper presents a new tracking approach that incorporates adaptive metric learning into the framework of visual object tracking. By collecting a set of supervised training samples on the fly from the observed video, this approach automatically learns the optimal distance metric for more accurate matching. The learned metric is designed so that, based on the supervised training, the closest match is very likely to be the true target of interest. Such a learned metric is discriminative and adaptive. This paper realizes the approach in a solid case study of adaptive-metric differential tracking and obtains a closed-form analytical solution to motion estimation and visual tracking. Moreover, this paper extends the basic linear distance metric learning method to a more powerful nonlinear kernel metric learning method. Extensive experiments validate the effectiveness of the proposed approach and demonstrate the improved performance of the new tracking method.

IMG14: Image Decomposition With Multilabel Context: Algorithms and Applications

Abstract

Most research on image decomposition, e.g., image segmentation and image parsing, has focused on the low-level visual cues within a single image and neglected the contextual information across images. In this paper, we present a new perspective on image decomposition guided by the multilabel context associated with each individual image. Observing that contextual information exists across images (i.e., local representations of the same label are similar, while those of different labels are dissimilar), we propose to perform image decomposition collectively and obtain an optimal representation for each label from a set of multilabeled images. We formulate this as an optimization problem that maximizes the inter-label difference while minimizing the intra-label difference of the target label representations, and we propose two ways to solve it. Such contextual image decomposition has a wide variety of applications; two exemplary ones, multilabel image annotation and label ranking, are presented and evaluated with different classification techniques. Extensive experiments on two benchmark datasets demonstrate promising results.

IMG15: Graph Laplace for Occluded Face Completion and Recognition

Abstract

This paper proposes a spectral-graph-based algorithm for face image repair, which can improve recognition performance on occluded faces. The face completion algorithm proposed in this paper includes three main procedures: 1) sparse representation for partially occluded face classification; 2) image-based data mining; and 3) graph Laplace (GL) for face image completion. The novel part of the proposed framework is GL, named after graphical models and the Laplace equation, which can achieve high-quality repair of damaged or occluded faces. The relationship between GL and the traditional Poisson equation is proven. We apply our face repair algorithm to produce completed faces and use face recognition to evaluate the performance of the algorithm. Experimental results verify the effectiveness of the GL method for occluded face completion.
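
As a point of reference for the Laplace-equation side of GL, the plain pixel-grid version fills a hole by requiring each missing pixel to equal the mean of its four neighbours, with the known pixels as fixed boundary values. This is basic harmonic inpainting, solved here with Jacobi iterations. It is not the paper's GL, which builds on graphical models and mined face data. The sketch assumes the hole does not touch the image border.

```python
import numpy as np

def laplace_fill(img, hole, iters=2000):
    """Fill pixels where `hole` is True by solving the discrete Laplace equation.

    Jacobi iteration: every hole pixel is repeatedly replaced by the mean of its
    4 neighbours, while known pixels stay fixed. Assumes the hole does not touch
    the image border (np.roll would otherwise wrap around)."""
    u = np.asarray(img, dtype=float).copy()
    u[hole] = u[~hole].mean()                      # neutral initial guess
    for _ in range(iters):
        avg = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
               np.roll(u, 1, 1) + np.roll(u, -1, 1)) / 4.0
        u[hole] = avg[hole]
    return u
```

Any linear intensity ramp satisfies the discrete Laplace equation, so a hole punched into a ramp is restored exactly. Textured regions such as eyes cannot be recovered this way, which is the gap that the data-mining step in the paper addresses.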

IMG16: Fast Transforms for Acoustic Imaging—Part II: Applications

Abstract

In Part I [“Fast Transforms for Acoustic Imaging-Part I: Theory,” IEEE Transactions on Image Processing], we introduced the Kronecker array transform (KAT), a fast transform for imaging with separable arrays. Given a source distribution, the KAT produces the spectral matrix that would be measured by a separable sensor array. In Part II, we establish connections between the KAT, beamforming and 2-D convolutions, and show how these results can be used to accelerate classical and state-of-the-art array imaging algorithms. We also propose using the KAT to accelerate general-purpose regularized least-squares solvers. Using this approach, we avoid ill-conditioned deconvolution steps and obtain more accurate reconstructions than previously possible, while maintaining low computational costs. We also show how the KAT performs when imaging near-field source distributions, and illustrate the trade-off between accuracy and computational complexity. Finally, we show that separable designs can deliver accuracy competitive with multi-arm logarithmic spiral geometries, while retaining the computational advantages of the KAT.
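
Fast transforms for separable arrays typically rely on the Kronecker-product identity (A ⊗ B)·vec(X) = vec(B·X·Aᵀ), where vec stacks columns. This identity lets the large Kronecker matrix be applied without ever forming it. The sketch illustrates only that identity; it is not the KAT as defined in Part I.

```python
import numpy as np

def kron_apply(a, b, x):
    """Compute kron(a, b) @ vec(x) as vec(b @ x @ a.T) without forming kron(a, b).

    vec() stacks columns (Fortran order), which is what makes the identity hold.
    For square m x m and n x n factors the cost drops from O((mn)^2) to O(mn(m + n))."""
    return (b @ x @ a.T).ravel(order="F")
```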

IMG17: Denoising-Enhancing Images on Elastic Manifolds

Abstract

The conflicting demands for simultaneous low-pass and high-pass processing, required in image denoising and enhancement, still present an outstanding challenge, although a great deal of progress has been made by means of adaptive diffusion-type algorithms. To further advance such processing methods and algorithms, we introduce a family of second-order (in time) partial differential equations. These equations describe the motion of a thin elastic sheet in a damping environment. They are also derived by a variational approach in the context of image processing. The new operator enables better edge preservation in denoising applications by offering an adaptive lowpass filter, which preserves high-frequency components in the pass-band better than the adaptive diffusion filter, while offering slower error propagation across edges. We explore the action of this powerful operator in the context of image processing and exploit for this purpose the wealth of knowledge accumulated in physics and mathematics about the action and behavior of this operator. The resulting methods are further generalized for color and/or texture image processing, by embedding images in multidimensional manifolds. A specific application of the proposed new approach to superresolution is outlined.

IMG18: Affine Legendre Moment Invariants for Image Watermarking Robust to Geometric Distortions

Abstract

Geometric distortions are generally simple and effective attacks against many watermarking methods. They can make detection and extraction of the embedded watermark difficult or even impossible by destroying the synchronization between the watermark reader and the embedded watermark. In this paper, we propose a new watermarking approach that allows watermark detection and extraction under affine transformation attacks. The novelty of our approach lies in a set of affine invariants that we derived from Legendre moments. Watermark embedding and detection are performed directly on this set of invariants. We also show how these moments can be exploited to estimate the geometric distortion parameters in order to permit watermark extraction. Experimental results show that the proposed watermarking scheme is robust to a wide range of attacks: geometric distortion, filtering, compression, and additive noise.
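
The invariants are built from Legendre moments, which can be computed with NumPy's Legendre utilities. The image is mapped onto [−1, 1]², and λ_pq = (2p+1)(2q+1)/4 · ∬ P_p(x) P_q(y) f(x, y) dx dy is approximated by a sum over pixel centres. Only the moments are computed here; deriving the affine invariants and embedding the watermark are beyond this sketch.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_moments(img, order):
    """Discrete Legendre moments lambda[p, q] (p along x, q along y) up to `order`."""
    f = np.asarray(img, dtype=float)
    ny, nx = f.shape
    x = -1 + (2 * np.arange(nx) + 1) / nx          # pixel-centre coordinates in [-1, 1]
    y = -1 + (2 * np.arange(ny) + 1) / ny
    dx, dy = 2.0 / nx, 2.0 / ny
    # px[p, i] = P_p(x_i) and py[q, j] = P_q(y_j)
    px = np.array([legendre.legval(x, [0] * p + [1]) for p in range(order + 1)])
    py = np.array([legendre.legval(y, [0] * q + [1]) for q in range(order + 1)])
    k = 2 * np.arange(order + 1) + 1
    norm = np.outer(k, k) / 4.0                    # (2p+1)(2q+1)/4
    return norm * (px @ f.T @ py.T) * dx * dy
```

For a constant image of ones, λ_00 = 1 and all odd-order moments vanish by symmetry.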

 

IMG19: A New Color Filter Array With Optimal Properties for Noiseless and Noisy Color Image Acquisition

Abstract

Digital color cameras acquire color images by means of a sensor on which a color filter array (CFA) is overlaid. The Bayer CFA dominates the consumer market, but there has recently been renewed interest in the design of CFAs. However, robustness to noise is often neglected in the design, although it is crucial in practice. In this paper, we present a new 2 × 3-periodic CFA which provides, by construction, the optimal tradeoff between robustness to aliasing, chrominance noise, and luminance noise. Moreover, we describe a simple and efficient linear demosaicking algorithm that fully exploits the spectral properties of the CFA. Practical experiments confirm the superiority of our design in both noiseless and noisy scenarios.

 

IMG20: Topology Preserving Warping of 3-D Binary Images According to Continuous One-to-One Mappings

Abstract

The estimation of one-to-one mappings is one of the most intensively studied topics in nonrigid registration research. Although such mappings can now be computed accurately and efficiently, the solutions for using them to deform binary images are much less satisfactory. In particular, warping a binary image with such transformations may alter its discrete topological properties if common resampling strategies are used. To deal with this issue, this paper proposes a method for warping such images according to continuous and bijective mappings while preserving their discrete topological properties (i.e., their homotopy type). Results obtained in the atlas-based segmentation of complex anatomical structures highlight the advantages of the proposed approach.

 

IMG21: Robust Spatiotemporal Matching of Electronic Slides to Presentation Videos

Abstract

We describe a robust and efficient method for automatically matching and time-aligning electronic slides to videos of the corresponding presentations. Matching electronic slides to videos provides new methods for indexing, searching, and browsing videos in distance-learning applications. However, robust automatic matching is challenging due to varied frame composition, slide distortion, camera movement, low-quality video capture, and arbitrary slide sequences. Our fully automatic approach combines image-based matching of slides to video frames with a temporal model for slide changes and camera events. To address these challenges, we begin by extracting scale-invariant feature transform (SIFT) keypoints from both slides and video frames, and matching them subject to a consistent projective transformation (homography) by using random sample consensus (RANSAC). We use the initial set of matches to construct a background model and a binary classifier that separates video frames showing slides from those without. We then introduce a new matching scheme that exploits less distinctive SIFT keypoints, enabling us to tackle more difficult images. Finally, we improve upon the matching based on visual information by using estimated matching probabilities as part of a hidden Markov model (HMM) that integrates temporal information and detected camera operations. Detailed quantitative experiments characterize each part of our approach and demonstrate an average accuracy of over 95% on 13 presentation videos.
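
The final HMM step can be illustrated with a generic Viterbi decoder that picks the most likely slide for every frame. This is a minimal sketch under stated assumptions: per-frame log-likelihoods from image matching are taken as given, and the hypothetical `slide_transitions` helper favours staying on the same slide or advancing by one (wrapping to the first slide for simplicity). The camera-event states used in the paper are not modeled.

```python
import numpy as np

def slide_transitions(n_slides, p_stay=0.8, p_next=0.15):
    """Hypothetical transition matrix: stay, advance by one, or jump elsewhere."""
    p_other = (1 - p_stay - p_next) / max(n_slides - 2, 1)
    trans = np.full((n_slides, n_slides), p_other)
    for i in range(n_slides):
        trans[i, i] = p_stay
        trans[i, (i + 1) % n_slides] = p_next   # wraps around for simplicity
    return trans

def viterbi(log_emit, log_trans, log_init):
    """Most likely slide index per frame.

    log_emit:  (T, S) log P(frame t | slide s), e.g. from keypoint matching
    log_trans: (S, S) log P(slide j at t + 1 | slide i at t)
    log_init:  (S,)   log P(slide s at t = 0)
    """
    T, S = log_emit.shape
    score = log_init + log_emit[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans       # cand[i, j]: from slide i to slide j
        back[t] = np.argmax(cand, axis=0)       # best predecessor for each slide j
        score = cand[back[t], np.arange(S)] + log_emit[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):               # follow back-pointers to t = 0
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

Working in log space avoids numerical underflow on long videos, and the transition prior is what lets the decoder override an isolated ambiguous frame.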

 

IMG22: Maintaining Temporal Coherence in Video Retargeting Using Mosaic-Guided Scaling

Abstract

Retargeting a full-resolution video to a lower-resolution display inevitably causes information loss. Content-aware video retargeting techniques have been studied to avoid losing critical visual information while resizing a video. Maintaining the spatiotemporal coherence of a retargeted video is critical to its visual quality. Camera motion and object motion, however, usually make it difficult to maintain temporal coherence with existing schemes. In this paper, we propose using a panoramic mosaic to guide the scaling of corresponding regions of the video frames in a shot to ensure good temporal coherence. In the proposed method, after the video frames in a shot are aligned to a panoramic mosaic constructed for that shot, a global scaling map for these frames is derived from the mosaic. The local scaling maps of individual frames are then derived from the global map and further refined according to spatial coherence constraints. Our experimental results show that the proposed method can effectively maintain temporal coherence and thus achieve good visual quality even when a video contains camera and object motion.

 

IMG23: Integer Computation of Lossy JPEG2000 Compression

Abstract

In this paper, we present an integer-based Cohen-Daubechies-Feauveau (CDF) 9/7 wavelet transform, together with an integer quantization method, for use in a lossy JPEG2000 compression engine. Combining an integer transform with an integer quantization step allows lossy JPEG2000 compression to be computed entirely in integer arithmetic. The lossy mode of compression uses the CDF 9/7 wavelet filter, which transforms integer input pixel values into floating-point wavelet coefficients; these are then quantized back into integers and finally compressed by the tier-1 encoder of embedded block coding with optimal truncation (EBCOT). Integer computation of JPEG2000 reduces the computational complexity of the wavelet transform and eases implementation in embedded systems, yielding higher computational performance. The integer computation achieves a rate/distortion curve equivalent to that of the JasPer JPEG2000 compression engine, with an average reduction of 30% in wavelet-transform computation time and 56% in quantization computation time.
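
One common way to keep a CDF 9/7 transform in integer arithmetic is integer-to-integer lifting, where every lifting step adds a rounded fixed-point product. The sketch below illustrates that idea for a single 1-D decomposition level; it is not necessarily the authors' scheme. The 12-bit fixed-point precision, the whole-sample symmetric boundary extension, and the omission of the final scaling factor K (assumed to be folded into the quantizer step sizes) are all our assumptions.

```python
import numpy as np

SHIFT = 12                      # assumed fixed-point precision (bits)
HALF = 1 << (SHIFT - 1)         # rounding offset for the right shift
# Standard CDF 9/7 lifting coefficients, scaled to integers
ALPHA_Q = int(round(-1.586134342 * (1 << SHIFT)))
BETA_Q = int(round(-0.052980118 * (1 << SHIFT)))
GAMMA_Q = int(round(0.882911076 * (1 << SHIFT)))
DELTA_Q = int(round(0.443506852 * (1 << SHIFT)))

def _predict(s, cq):
    # round(c * (s[i] + s[i+1])) in integer arithmetic; symmetric extension s[n] = s[n-1]
    s_next = np.append(s[1:], s[-1])
    return (cq * (s + s_next) + HALF) >> SHIFT

def _update(d, cq):
    # round(c * (d[i-1] + d[i])) in integer arithmetic; symmetric extension d[-1] = d[0]
    d_prev = np.concatenate(([d[0]], d[:-1]))
    return (cq * (d_prev + d) + HALF) >> SHIFT

def int_cdf97_forward(x):
    """One level of integer-to-integer CDF 9/7 lifting on an even-length signal."""
    x = np.asarray(x, dtype=np.int64)
    s, d = x[0::2].copy(), x[1::2].copy()
    d = d + _predict(s, ALPHA_Q)
    s = s + _update(d, BETA_Q)
    d = d + _predict(s, GAMMA_Q)
    s = s + _update(d, DELTA_Q)
    return s, d                 # integer low-pass and high-pass bands

def int_cdf97_inverse(s, d):
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    s = s - _update(d, DELTA_Q)
    d = d - _predict(s, GAMMA_Q)
    s = s - _update(d, BETA_Q)
    d = d - _predict(s, ALPHA_Q)
    x = np.empty(2 * len(s), dtype=np.int64)
    x[0::2], x[1::2] = s, d
    return x
```

Because each step only adds a function of the other band, the inverse subtracts the same rounded values in reverse order, so reconstruction is exact even though rounding makes each band a slight approximation of the floating-point transform.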

 

IMG24: Hybrid No-Reference Natural Image Quality Assessment of Noisy, Blurry, JPEG2000, and JPEG Images

Abstract

In this paper, we propose a new image quality assessment method, called the hybrid no-reference (HNR) model, based on a hybrid of curvelet, wavelet, and cosine transforms. Owing to the properties of natural scene statistics, the peak coordinates of the transformed-coefficient histograms of filtered natural images occupy well-defined clusters in peak-coordinate space, which makes NR assessment possible. Compared with other methods, HNR has three benefits: 1) it is an NR method applicable to arbitrary images without compromising the prediction accuracy of full-reference methods; 2) to the best of our knowledge, it is the only general NR method well suited to four types of filters: noise, blur, JPEG2000, and JPEG compression; and 3) it can classify the filter type of an image and predict the filter level even when the image results from the application of two different filters. We tested HNR on a very intensive video image database (our own image library) and on the Laboratory for Image & Video Engineering (LIVE) database (a public library). The results are compared with state-of-the-art methods, including peak SNR, structural similarity, and visual information fidelity.

 

IMG26: Fast Transforms for Acoustic Imaging— Part I: Theory

Abstract

The classical approach for acoustic imaging consists of beamforming, and produces the source distribution of interest convolved with the array point spread function. This convolution smears the image of interest, significantly reducing its effective resolution. Deconvolution methods have been proposed to enhance acoustic images and have produced significant improvements. Other proposals involve covariance fitting techniques, which avoid deconvolution altogether. However, in their traditional presentation, these enhanced reconstruction methods have very high computational costs, mostly because they have no means of efficiently transforming back and forth between a hypothetical image and the measured data. In this paper, we propose the Kronecker Array Transform (KAT), a fast separable transform for array imaging applications. Under the assumption of a separable array, it enables the acceleration of imaging techniques by several orders of magnitude with respect to the fastest previously available methods, and enables the use of state-of-the-art regularized least-squares solvers. Using the KAT, one can reconstruct images with higher resolutions than was previously possible and use more accurate reconstruction techniques, opening new and exciting possibilities for acoustic imaging.
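
The speed-up that separability offers rests on the identity (A ⊗ B) vec(X) = vec(B X Aᵀ), with column-major vec. The sketch below shows only this identity and its adjoint in NumPy for small dense matrices; it illustrates why a separable array model avoids forming the full transform matrix, it is not the KAT itself, and the function names are hypothetical.

```python
import numpy as np

def kron_apply(A, B, X):
    """Compute kron(A, B) @ vec(X) as vec(B @ X @ A.T), with column-major vec."""
    return (B @ X @ A.T).ravel(order="F")

def kron_adjoint(A, B, y):
    """Compute kron(A, B)^H @ y as vec(B^H @ Y @ conj(A))."""
    Y = y.reshape(B.shape[0], A.shape[0], order="F")   # undo the column-major vec
    return (B.conj().T @ Y @ A.conj()).ravel(order="F")
```

For A of size m × p and B of size n × q, the dense product costs about mnpq multiplications, whereas the factored form costs about nqp + nmp. Having both the forward map and its adjoint is what iterative regularized least-squares solvers need to move between a hypothetical image and the data.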

 

IMG27: Depth No-Synthesis-Error Model for View Synthesis in 3-D Video

Abstract

Currently, 3-D video targets disparity-adjustable stereoscopic video applications, in which view synthesis based on depth-image-based rendering (DIBR) is employed to generate virtual views. Distortions in depth information may introduce geometry changes or occlusion variations in the synthesized views. In practice, depth information is stored in 8-bit grayscale format, whereas the disparity range for a visually comfortable stereo pair is usually much smaller than 256 levels. Thus, several depth levels may correspond to the same integer (or sub-pixel) disparity value in DIBR-based view synthesis, so that some depth distortions may not result in geometry changes in the synthesized view. Based on this observation, we develop a depth no-synthesis-error (D-NOSE) model to examine the allowable depth distortions when rendering a virtual view without introducing any geometry changes. We further show that depth distortions within the proposed D-NOSE profile also do not compromise the occlusion order in view synthesis. Therefore, a virtual view can be synthesized losslessly if depth distortions follow the thresholds specified by D-NOSE. Our simulations validate the proposed D-NOSE model for lossless view synthesis and demonstrate the gain it brings to depth coding.
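
The key observation, that several 8-bit depth levels map to the same integer disparity, can be checked numerically. The sketch below assumes the inverse-depth quantization commonly used in DIBR (1/Z = v/255 · (1/Z_near - 1/Z_far) + 1/Z_far, disparity d = f·b/Z rounded to an integer pixel). The focal length, baseline, and depth range in the test are hypothetical values, and sub-pixel disparities are not handled.

```python
import numpy as np

def depth_to_disparity(focal_px, baseline, z_near, z_far, bits=8):
    """Integer disparity for every depth level v in [0, 2**bits - 1] (assumed mapping)."""
    levels = np.arange(2 ** bits)
    inv_z = levels / (2 ** bits - 1) * (1 / z_near - 1 / z_far) + 1 / z_far
    return np.rint(focal_px * baseline * inv_z).astype(int)

def nose_interval(disparity, v):
    """Range [v_lo, v_hi] of depth levels that share level v's integer disparity.

    Disparity is non-decreasing in v, so these levels form one contiguous run;
    any distortion keeping v inside this run leaves the warped position unchanged.
    """
    same = np.flatnonzero(disparity == disparity[v])
    return int(same[0]), int(same[-1])
```

With a disparity range of a few dozen pixels, each integer disparity is shared by several consecutive depth levels, which is the slack a depth coder can exploit.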

 

IMG28: Comparison of Texture Analysis Schemes Under Nonideal Conditions

Abstract

Several recent advancements in the field of texture analysis prompt some fundamental questions. For instance, what is the true impact of these novel advancements in real-world environments? When do these novel advancements fail to perform? Which methods perform better, and under what conditions? In this work, we investigate these and other issues under nonideal image acquisition environments, specifically, environments with changing conditions due to illumination variations and those caused by both affine and nonaffine transformations. We study the performance of nine popular texture analysis algorithms using three different datasets with varying levels of difficulty. Experiments are performed on nonideal texture datasets under five different setups. We find that most state-of-the-art techniques do not perform well under these conditions. To a large extent, their performance under nonideal conditions depends critically on the nature of the textural surface. Moreover, most techniques fail to perform reliably when the number of classes in the dataset is increased significantly beyond that of the regular-size datasets used in previous work. Multiscale features perform reasonably well against variations caused by illumination and rotation but are prone to fail under changes in scale. Surprisingly, the performance of most of the algorithms is generally stable on structured or periodic textures, even with variations in illumination or affine transformations.

 

IMG29: A New Scheme for Robust Gradient Vector Estimation in Color Images

Abstract

Gradient estimators are mostly designed to yield accurate and robust estimates of the gradient magnitude, not the gradient direction. This paper proposes a method for the accurate and robust estimation of both the gradient magnitude and direction. It robustly estimates the gradient in the x- and y-directions. Robustness against noise is achieved by prefiltering and postfiltering the gradient in each direction. To reduce the edge blurring introduced by these filters, the gradient in a given direction is obtained by applying the prefilter and postfilter in the perpendicular direction. The basic elements employed in each window are highpass, lowpass, and aggregation operators. The highpass operator is used as a gradient estimator, the lowpass operator for prefiltering and postfiltering, and the aggregation operator for aggregating the prefiltered and postfiltered gradients. Four different combinations of highpass, lowpass, and aggregation operators are proposed: MVD-Median-Mean, MVD-Median-Max, RCMG-Median-Mean, and RCMG-Median-Max, where MVD denotes the minimum vector dispersion and RCMG the robust color morphological gradient. Experimental results show that RCMG-Median-Mean performs best in estimating the gradient and detecting edges in noisy color images. It is computationally more efficient than state-of-the-art gradient estimators and accurately estimates the gradient direction as well as the gradient magnitude. Computer simulation results show that the proposed method outperforms other recently proposed color gradient estimators and edge detectors.

 

IMG30: DART: A Practical Reconstruction Algorithm for Discrete Tomography

Abstract

In this paper, we present an iterative reconstruction algorithm for discrete tomography, called the discrete algebraic reconstruction technique (DART). DART can be applied if the scanned object is known to consist of only a few different compositions, each corresponding to a constant gray value in the reconstruction. Prior knowledge of the gray values for each of the compositions is exploited to steer the current reconstruction towards a reconstruction that contains only these gray values. Based on experiments with both simulated CT data and experimental µCT data, it is shown that DART is capable of computing more accurate reconstructions from a small number of projection images, or from a small angular range, than alternative methods. It is also shown that DART can deal effectively with noisy projection data and that the algorithm is robust with respect to errors in the estimation of the gray values.
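
Between algebraic reconstruction iterations, DART segments the current reconstruction using the known gray values and then restricts further updates to a subset of free pixels. The sketch below shows these two steps as we understand the general DART scheme: thresholds halfway between neighbouring gray values, and free pixels taken as boundary pixels (a 4-neighbour with a different label) plus a random fraction of the others. The `fix_prob` value is a placeholder, and the ART/SIRT solver itself is omitted.

```python
import numpy as np

def dart_segment(recon, gray_values):
    """Snap each pixel to the nearest prior gray value (thresholds at midpoints)."""
    g = np.sort(np.asarray(gray_values, dtype=float))
    thresholds = (g[:-1] + g[1:]) / 2
    return g[np.digitize(recon, thresholds)]

def dart_free_pixels(segmented, fix_prob=0.9, rng=None):
    """Boolean mask of pixels left free for the next algebraic update.

    Boundary pixels (any 4-neighbour with a different label) are always free;
    every other pixel is freed with probability 1 - fix_prob (placeholder value).
    """
    rng = np.random.default_rng() if rng is None else rng
    padded = np.pad(segmented, 1, mode="edge")
    center = padded[1:-1, 1:-1]
    boundary = ((padded[:-2, 1:-1] != center) | (padded[2:, 1:-1] != center) |
                (padded[1:-1, :-2] != center) | (padded[1:-1, 2:] != center))
    random_free = rng.random(segmented.shape) > fix_prob
    return boundary | random_free
```

In a full DART loop the non-free pixels would be held at their segmented gray values while the reconstruction algorithm updates only the free ones.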

 

IMG31: Automatic Craniofacial Structure Detection on Cephalometric Images

Abstract

Tracing anatomical structures on cephalograms is an important step in cephalometric analysis. Cephalometric analysis approaches fall into two categories: manual and automatic. The manual approach is limited in accuracy and repeatability because of inter- and intra-observer differences in marking. In this paper, we develop and test a novel method for automatically localizing craniofacial structures based on the edges detected in a region of interest. Before edge detection, the region is filtered with an adaptive nonlocal filter that removes noise while leaving the edge information undisturbed. A modified Canny edge detection algorithm, adapted to the gray-scale features of the different regions of the cephalogram, is proposed to obtain tissue contours. After morphological opening and edge linking, an improved bidirectional contour tracing method is applied: starting from interactively selected edge pixels, the tracking process repeatedly searches for an edge pixel in the neighborhood of the previously found edge pixel, thereby segmenting the image and yielding the craniofacial structures. Preliminary experimental results demonstrate the effectiveness of the proposed method.
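
The tracing step can be illustrated with a plain 8-connected edge follower that starts from a user-chosen pixel and walks in both directions until no unvisited edge neighbour remains. This is a generic sketch on a binary edge map, not the authors' improved method; the neighbour search order and the function names are our own assumptions.

```python
import numpy as np

# 8-connected neighbour offsets, searched in this (arbitrary) order
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def _trace_one_way(edges, start, visited):
    """Repeatedly step to the first unvisited edge neighbour of the current pixel."""
    path, current = [], start
    while True:
        r, c = current
        nxt = None
        for dr, dc in NEIGHBOURS:
            rr, cc = r + dr, c + dc
            if (0 <= rr < edges.shape[0] and 0 <= cc < edges.shape[1]
                    and edges[rr, cc] and (rr, cc) not in visited):
                nxt = (rr, cc)
                break
        if nxt is None:
            return path
        visited.add(nxt)
        path.append(nxt)
        current = nxt

def trace_bidirectional(edges, start):
    """Follow an 8-connected edge from start in both directions and join the branches."""
    visited = {start}
    first_branch = _trace_one_way(edges, start, visited)
    second_branch = _trace_one_way(edges, start, visited)
    return second_branch[::-1] + [start] + first_branch

if __name__ == "__main__":
    demo = np.zeros((5, 7), dtype=bool)
    demo[2, 1:6] = True                      # a short horizontal edge
    print(trace_bidirectional(demo, (2, 3)))
```

Tracing both ways from an interior seed recovers the whole contour as one ordered pixel list, which is why the starting pixel does not have to be an endpoint.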

 

IMG32: Generating Descriptive Visual Words and Visual Phrases for Large-Scale Image Applications

Abstract

Bag-of-visual-words (BoW) representation has been applied to various problems in multimedia and computer vision. The basic idea is to represent images as visual documents composed of repeatable and distinctive visual elements, which are comparable to text words. Notwithstanding its great success and wide adoption, a visual vocabulary created from single-image local descriptors has often been shown to be less effective than desired. In this paper, descriptive visual words (DVWs) and descriptive visual phrases (DVPs) are proposed as the visual counterparts of text words and phrases, where visual phrases are frequently co-occurring visual word pairs. Since images are the carriers of visual objects and scenes, a descriptive visual element set can be composed of the visual words and word combinations that are effective in representing certain visual objects or scenes. Based on this idea, a general framework is proposed for generating DVWs and DVPs for image applications. In a large-scale image database containing 1506 object and scene categories, the visual words and visual word pairs that are descriptive of certain objects or scenes are identified and collected as DVWs and DVPs. Experiments show that DVWs and DVPs are informative and descriptive and are, thus, more comparable to text words than classic visual words are. We apply the identified DVWs and DVPs in several applications, including large-scale near-duplicate image retrieval, image search re-ranking, and object recognition. The combination of DVWs and DVPs outperforms the state of the art in large-scale near-duplicate image retrieval in terms of accuracy, efficiency, and memory consumption. The proposed image search re-ranking algorithm, DWPRank, outperforms the state-of-the-art algorithm by 12.4% in mean average precision and is about 11 times faster.
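
Visual phrases are defined here as frequently co-occurring visual word pairs. A minimal way to find such pairs is to count, for each unordered pair of word ids, how many images contain both, and keep the pairs above a support threshold. The sketch below assumes co-occurrence at the whole-image level and a hypothetical `min_support` parameter; the paper's descriptiveness criteria for selecting DVWs and DVPs are not reproduced.

```python
from collections import Counter
from itertools import combinations

def frequent_word_pairs(images, min_support):
    """Visual word pairs that co-occur in at least min_support images.

    images: iterable of collections of visual-word ids, one collection per image.
    Returns a dict mapping (word_a, word_b) with word_a < word_b to its image count.
    """
    counts = Counter()
    for words in images:
        # each unordered pair is counted at most once per image
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

Counting per image rather than per occurrence keeps a single cluttered image from dominating the statistics; a spatially restricted variant would only pair words whose keypoints lie close together.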

 

IMG33: Fast Bilateral Filter With Arbitrary Range and Domain Kernels

Abstract

In this paper, we present a fast implementation of the bilateral filter with arbitrary range and domain kernels. It is based on the histogram-based fast bilateral filter approximation that uses a uniform box as the domain kernel. Instead of a single box kernel, multiple box kernels are used and optimally combined to approximate an arbitrary domain kernel. The method approximates the bilateral filter better than the single-box-kernel version, with little increase in computational complexity. We also derive the optimal kernel size when a single box kernel is used.
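
The core of combining multiple box kernels can be shown in one dimension: choose weights for centred boxes of different half-widths so that their weighted sum approximates a target domain kernel in the least-squares sense. The sketch below fits such weights to a sampled kernel with NumPy; the radii and the least-squares criterion are assumptions for illustration, and the histogram-based range filtering is not shown.

```python
import numpy as np

def fit_box_weights(target, radii):
    """Least-squares weights so that sum_k w_k * box_{r_k} approximates target.

    target: 1-D kernel sampled at offsets -R..R (odd length 2R + 1).
    radii:  half-widths of the centred box kernels, each at most R.
    """
    R = (len(target) - 1) // 2
    offsets = np.arange(-R, R + 1)
    # column k is the indicator of the box |offset| <= radii[k]
    boxes = np.stack([(np.abs(offsets) <= r).astype(float) for r in radii], axis=1)
    weights, *_ = np.linalg.lstsq(boxes, target, rcond=None)
    error = np.linalg.norm(boxes @ weights - target)
    return weights, error
```

Because any single box is also available in a larger set, adding boxes can never increase the least-squares fitting error; a staircase of nested boxes is how a smooth kernel such as a Gaussian gets approximated.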

 

IMG34: Computational Color Constancy: Survey and Experiments

Abstract

Computational color constancy is a fundamental prerequisite for many computer vision applications. This paper presents a survey of many recent developments and state-of-the-art methods. Several criteria are proposed for assessing the approaches. A taxonomy of existing algorithms is proposed, and methods are separated into three groups: static methods, gamut-based methods, and learning-based methods. Further, the experimental setup is discussed, including an overview of publicly available datasets. Finally, various freely available methods, some of which are considered to be state of the art, are evaluated on two datasets.
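
As an example of the static-method group, the classic Gray-World algorithm assumes that the average scene reflectance is achromatic and rescales each channel so that the channel means become equal. The sketch below is a minimal NumPy version for a linear RGB float image; it is a textbook baseline, not a method proposed in the survey.

```python
import numpy as np

def gray_world(img):
    """Gray-World color constancy on a linear RGB float image of shape (H, W, 3).

    Each channel is multiplied by a gain so that all channel means equal the
    overall mean, which removes a global color cast under the gray-world assumption.
    """
    means = img.reshape(-1, 3).mean(axis=0)
    gains = means.mean() / means
    return img * gains
```

Other static methods (for example, White-Patch or edge-based variants) differ mainly in which image statistic is assumed to be achromatic.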

 

IMG35: A No-Reference Image Blur Metric Based on the Cumulative Probability of Blur Detection (CPBD)

Abstract

This paper presents a no-reference image blur metric based on the study of human blur perception at varying contrast values. The metric uses a probabilistic model to estimate the probability of detecting blur at each edge in the image, and this information is then pooled by computing the cumulative probability of blur detection (CPBD). The performance of the metric is demonstrated by comparing it with existing no-reference sharpness/blurriness metrics on various publicly available image databases.
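
Given edge widths and local contrasts, the pooling step can be sketched as follows. We assume the psychometric model commonly quoted for CPBD: P_blur = 1 - exp(-|w/w_JNB|^β), with w_JNB = 5 for contrast ≤ 50 and 3 otherwise, β = 3.6, and P_JNB = 63%; these constants should be checked against the paper. Edge detection and edge-width measurement are omitted, so widths and contrasts are taken as inputs.

```python
import numpy as np

BETA = 3.6     # assumed psychometric slope
P_JNB = 0.63   # assumed blur-detection probability at the just-noticeable blur width

def blur_detection_probability(width, contrast):
    """P_blur(e) = 1 - exp(-|w(e) / w_JNB(C)|^beta) for each edge e."""
    width = np.asarray(width, dtype=float)
    # just-noticeable blur width depends on local contrast (assumed thresholds)
    w_jnb = np.where(np.asarray(contrast) <= 50, 5.0, 3.0)
    return 1 - np.exp(-np.abs(width / w_jnb) ** BETA)

def cpbd(edge_widths, edge_contrasts):
    """Fraction of edges whose blur-detection probability does not exceed P_JNB."""
    p = blur_detection_probability(edge_widths, edge_contrasts)
    return float(np.mean(p <= P_JNB))
```

Higher values mean that more edges fall below the just-noticeable blur level, so a sharper image scores closer to 1.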

 

IMG36: A Closed-Form Approximation of the Exact Unbiased Inverse of the Anscombe Variance-Stabilizing Transformation

Abstract

We presented an exact unbiased inverse of the Anscombe variance-stabilizing transformation in [M. Mäkitalo and A. Foi, “Optimal inversion of the Anscombe transformation in low-count Poisson image denoising,” IEEE Trans. Image Process., vol. 20, no. 1, pp. 99–109, Jan. 2011.] and showed that, when applied to Poisson image denoising, the combination of variance stabilization and state-of-the-art Gaussian denoising algorithms is competitive with some of the best Poisson denoising algorithms. We also provided a MATLAB implementation of our method, in which the exact unbiased inverse transformation appears in nonanalytical form. Here, we propose a closed-form approximation of the exact unbiased inverse in order to facilitate its use. The proposed approximation produces results equivalent to those obtained with the accurate (nonanalytical) exact unbiased inverse and, thus, notably better than those obtained with the asymptotically unbiased inverse transformation that is commonly used in applications.
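
For reference, the closed-form approximation is commonly quoted as f⁻¹(D) ≈ D²/4 + (1/4)√(3/2) D⁻¹ - (11/8) D⁻² + (5/8)√(3/2) D⁻³ - 1/8. The sketch below codes that expression next to the forward Anscombe transform 2√(x + 3/8) and the asymptotically unbiased inverse (D/2)² - 1/8. The coefficients should be verified against the published expression before use, and behaviour for very small D (where the negative powers grow) is not addressed.

```python
import numpy as np

def anscombe(x):
    """Forward Anscombe transform: 2 * sqrt(x + 3/8)."""
    return 2.0 * np.sqrt(np.asarray(x, dtype=float) + 3.0 / 8.0)

def inverse_asymptotic(d):
    """Asymptotically unbiased inverse: (d/2)^2 - 1/8."""
    d = np.asarray(d, dtype=float)
    return (d / 2.0) ** 2 - 1.0 / 8.0

def inverse_closed_form(d):
    """Closed-form approximation of the exact unbiased inverse (coefficients as quoted above)."""
    d = np.asarray(d, dtype=float)
    s = np.sqrt(1.5)
    return (0.25 * d ** 2 + 0.25 * s / d - (11.0 / 8.0) / d ** 2
            + (5.0 / 8.0) * s / d ** 3 - 1.0 / 8.0)
```

For large D the extra terms vanish and the closed form approaches the asymptotic inverse; the correction terms matter in the low-count regime, where the two inverses differ most.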

 

IMG37: A Bayesian Network Model for Automatic and Interactive Image Segmentation

Abstract

We propose a new Bayesian network (BN) model for both automatic and interactive image segmentation. A multilayer BN is constructed from an oversegmentation to model the statistical dependencies among superpixel regions, edge segments, vertices, and their measurements. The BN also incorporates various local constraints to further restrain the relationships among these image entities. Given the BN model and various image measurements, belief propagation is performed to update the probability of each node. Image segmentation is generated by most probable explanation inference of the true states of both region and edge nodes from the updated BN. Besides automatic image segmentation, the proposed model can also be used for interactive image segmentation. While existing interactive segmentation (IS) approaches often depend passively on the user to provide exact intervention, we propose a new active input selection approach that provides suggestions for the user's intervention. Such intervention can be conveniently incorporated into the BN model to perform active IS. We evaluate the proposed model on both the Weizmann dataset and the VOC2006 cow images. The results demonstrate that the BN model can be used for automatic segmentation and, more importantly, for active IS. The experiments also show that IS with active input selection can improve both the overall segmentation accuracy and efficiency over IS with passive intervention.

 

IMG38: A Flexible Content-Adaptive Mesh-Generation Strategy for Image Representation

Abstract

Based on the greedy-point removal (GPR) scheme of Demaret and Iske, a simple yet highly effective framework for constructing triangle-mesh representations of images, called GPRFS, is proposed. By using this framework and ideas from the error diffusion (ED) mesh-generation scheme of Yang et al., a highly effective mesh-generation method, called GPRFS-ED, is derived and presented. Since the ED scheme plays a crucial role in our work, the factors affecting its performance are also studied in detail. Experimental results show that our GPRFS-ED method generates meshes of quality comparable to, and in many cases better than, the state-of-the-art GPR scheme, while requiring substantially less computation and memory. Furthermore, with our GPRFS-ED method, one can easily trade off mesh quality against computational/memory complexity. A reduced-complexity version of the GPRFS-ED method (called GPRFS-MED) is also introduced to further demonstrate the computational/memory-complexity scalability of GPRFS-ED.

IMGV39: High Capacity Color Barcodes: Per Channel Data Encoding via Orientation Modulation in Elliptical Dot Arrays

 

We present a new high-capacity color barcode. The barcode we propose uses the cyan, magenta, and yellow (C, M, Y) colorant separations available in color printers and achieves high capacity by independently encoding data in each of these separations. In each colorant channel, payload data is conveyed by a periodic array of elliptically shaped dots whose individual orientations are modulated to encode the data. Orientation-based data encoding provides beneficial robustness against printer and scanner tone variations. The overall color barcode is obtained when these color separations are printed in overlay, as is common in color printing. A reader recovers the barcode data from a conventional color scan of the barcode, using the red, green, and blue (R, G, B) channels, which are complementary to the print C, M, and Y channels, respectively. For each channel, the periodic arrangement of dots is first exploited at the reader to achieve synchronization by compensating for both global rotation/scaling in scanning and local distortion in printing. To overcome the color interference resulting from colorant absorptions in noncomplementary scanner channels, we propose a novel interference-minimizing data encoding approach and a statistical channel model (at the reader) that captures the characteristics of the interference, enabling more accurate data recovery. We also employ an error correction methodology that effectively utilizes the channel model. The experimental results show that the proposed method works well, offering (error-free) operational rates that are comparable to or better than those of the highest-capacity barcodes known in the literature.
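
The orientation-modulation idea for a single channel can be illustrated by drawing each bit as an elliptical dot tilted at +45° or -45° and recovering the tilt from second-order image moments. The sketch below assumes that bit-to-angle mapping and the dot dimensions (both hypothetical) and ignores synchronization, color interference, and error correction, which the full method handles.

```python
import numpy as np

ANGLES = {0: np.pi / 4, 1: -np.pi / 4}   # hypothetical bit-to-orientation map

def render_dot(theta, size=16, a=6.0, b=2.5):
    """Binary elliptical dot (semi-axes a, b) with its major axis at angle theta."""
    c = (size - 1) / 2
    y, x = np.mgrid[0:size, 0:size] - c
    u = x * np.cos(theta) + y * np.sin(theta)     # coordinate along the major axis
    v = -x * np.sin(theta) + y * np.cos(theta)    # coordinate along the minor axis
    return ((u / a) ** 2 + (v / b) ** 2 <= 1).astype(float)

def estimate_orientation(cell):
    """Major-axis angle of the dot from its second-order central moments."""
    h, w = cell.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    m = cell.sum()
    xc, yc = (x * cell).sum() / m, (y * cell).sum() / m
    mu20 = ((x - xc) ** 2 * cell).sum()
    mu02 = ((y - yc) ** 2 * cell).sum()
    mu11 = ((x - xc) * (y - yc) * cell).sum()
    return 0.5 * np.arctan2(2 * mu11, mu20 - mu02)

def encode(bits):
    return [render_dot(ANGLES[bit]) for bit in bits]

def decode(cells):
    # positive tilt -> bit 0, negative tilt -> bit 1
    return [0 if estimate_orientation(cell) > 0 else 1 for cell in cells]
```

Because the decision depends only on the sign of the tilt, moderate changes in dot darkness or size (tone variations) leave the decoded bits unchanged, which is the robustness property the abstract attributes to orientation-based encoding.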

 

IMGV40: ViBe: A Universal Background Subtraction Algorithm for Video Sequences

 

This paper presents a technique for motion detection that incorporates several innovative mechanisms. For example, our proposed technique stores, for each pixel, a set of values taken in the past at the same location or in the neighborhood. It then compares this set to the current pixel value to determine whether that pixel belongs to the background, and it adapts the model by randomly choosing which values to substitute in the background model. This approach differs from those based on the classical belief that the oldest values should be replaced first. Finally, when the pixel is found to be part of the background, its value is propagated into the background model of a neighboring pixel. We describe our method in full detail (including pseudocode and the parameter values used) and compare it with other background subtraction techniques. Efficiency figures show that our method outperforms recent and proven state-of-the-art methods in terms of both computation speed and detection rate. We also analyze the performance of a version of our algorithm downscaled to the absolute minimum of one comparison and one byte of memory per pixel. It appears that even such a simplified version of our algorithm performs better than mainstream techniques.
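
The mechanisms described in this abstract (a per-pixel sample set, a match count within a radius, random sample replacement, and propagation to a neighbour) fit in a short NumPy sketch. It is a simplified grayscale illustration, not the authors' reference pseudocode. The defaults of N = 20 samples, radius R = 20, two required matches, and a subsampling factor of 16 are the values usually cited for ViBe and are assumptions here; the neighbour chosen for propagation may occasionally be the pixel itself.

```python
import numpy as np

class ViBeSketch:
    """Simplified grayscale ViBe-style background model (illustrative only)."""

    def __init__(self, first_frame, n_samples=20, radius=20, min_matches=2,
                 subsampling=16, seed=0):
        self.rng = np.random.default_rng(seed)
        self.radius, self.min_matches, self.phi = radius, min_matches, subsampling
        frame = first_frame.astype(np.int16)
        h, w = frame.shape
        padded = np.pad(frame, 1, mode="edge")
        ys, xs = np.mgrid[0:h, 0:w]
        self.samples = np.empty((n_samples, h, w), dtype=np.int16)
        for k in range(n_samples):
            # initialise each sample from a random pixel in the 3x3 neighbourhood
            dy = self.rng.integers(-1, 2, size=(h, w))
            dx = self.rng.integers(-1, 2, size=(h, w))
            self.samples[k] = padded[ys + 1 + dy, xs + 1 + dx]

    def apply(self, frame):
        """Return a boolean foreground mask and update the model in place."""
        frame = frame.astype(np.int16)
        n, h, w = self.samples.shape
        close = np.abs(self.samples - frame[None]) < self.radius
        background = close.sum(axis=0) >= self.min_matches

        # random-in-time update: overwrite a random sample, not the oldest one
        ys, xs = np.nonzero(background & (self.rng.integers(0, self.phi, (h, w)) == 0))
        self.samples[self.rng.integers(0, n, ys.size), ys, xs] = frame[ys, xs]

        # spatial propagation: write the value into a random neighbour's model
        ys, xs = np.nonzero(background & (self.rng.integers(0, self.phi, (h, w)) == 0))
        ny = np.clip(ys + self.rng.integers(-1, 2, ys.size), 0, h - 1)
        nx = np.clip(xs + self.rng.integers(-1, 2, xs.size), 0, w - 1)
        self.samples[self.rng.integers(0, n, ys.size), ny, nx] = frame[ys, xs]
        return ~background
```

Only pixels classified as background ever update the model, which keeps foreground objects from being absorbed too quickly; the spatial propagation is what lets ghosts and slowly changing regions eventually blend back into the background.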

 

IMGV41: GRISSOM Platform: Enabling Distributed Processing and Management of Biological Data Through Fusion of Grid and Web Technologies

 

Transcriptomic technologies have a critical impact on the revolutionary changes that are reshaping biological research. Through the adoption of novel high-throughput instrumentation and advanced computational methodologies, an unprecedented wealth of quantitative data is being produced. Microarray experiments are considered high-throughput, both in terms of data volume (data intensive) and processing complexity (computationally intensive). In this paper, we present Grids for In Silico Systems Biology and Medicine (GRISSOM), a web-based application that exploits grid infrastructures for the distributed processing and management of DNA microarray data (cDNA, Affymetrix, Illumina) through a generic, consistent computational analysis framework. GRISSOM performs versatile annotation and integrative analysis tasks through third-party application programming interfaces delivered as web services. In parallel, because it conforms to service-oriented architectures, it can be encapsulated in other biomedical processing workflows with the help of workflow-enacting software such as Taverna Workbench, thus rendering access to its algorithms transparent and generic. GRISSOM aims to set a generic paradigm of efficient metamining that promotes translational research in biomedicine through the fusion of grid and semantic web computing technologies.

 

IMGV42: MR Image Reconstruction From Highly Undersampled k-Space Data by Dictionary Learning

Compressed sensing (CS) utilizes the sparsity of magnetic resonance (MR) images to enable accurate reconstruction from undersampled k-space data. Recent CS methods have employed analytical sparsifying transforms such as wavelets, curvelets, and finite differences. In this paper, we propose a novel framework for adaptively learning the sparsifying transform (dictionary) while simultaneously reconstructing the image from highly undersampled k-space data. The sparsity in this framework is enforced on overlapping image patches, emphasizing local structure. Moreover, the dictionary is adapted to the particular image instance, thereby favoring sparser representations and consequently much higher undersampling rates. The proposed alternating reconstruction algorithm learns the sparsifying dictionary and uses it to remove aliasing and noise in one step, and subsequently restores and fills in the k-space data in the other step. Numerical experiments are conducted on MR images and on real MR data of several anatomies with a variety of sampling schemes. The results demonstrate dramatic improvements, on the order of 4-18 dB in reconstruction error, and a doubling of the acceptable undersampling factor when using the proposed adaptive dictionary compared with previous CS methods. These improvements persist over a wide range of practical data signal-to-noise ratios, without any parameter tuning.
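
The k-space step can be sketched for the noiseless case: transform the current image estimate to k-space, overwrite the sampled locations with the measured data, keep the estimate elsewhere, and transform back. The sketch below uses orthonormal 2-D FFTs from NumPy and assumes Cartesian sampling given by a boolean mask; with noisy data a weighted combination of measured and estimated values would be used instead, and the dictionary-learning step is not shown.

```python
import numpy as np

def kspace_data_consistency(image_estimate, measured_kspace, mask):
    """Noiseless data-consistency step: sampled k-space entries are replaced by measurements.

    image_estimate:  (H, W) image, e.g. the output of patch-based denoising
    measured_kspace: complex (H, W) array, only meaningful where mask is True
    mask:            boolean (H, W) Cartesian sampling pattern
    """
    k = np.fft.fft2(image_estimate, norm="ortho")
    k = np.where(mask, measured_kspace, k)    # trust the data where it was acquired
    return np.fft.ifft2(k, norm="ortho")      # unsampled frequencies keep the estimate
```

Because the orthonormal FFT preserves energy, this step removes the estimate's error at the sampled frequencies and leaves the rest unchanged, so it can only reduce the overall error in the noiseless case.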

 

IMGV43: Game-Theoretic Strategies and Equilibriums in Multimedia Fingerprinting Social Networks

A multimedia social network is a network infrastructure in which users share multimedia content for many different purposes. Analyzing user behavior in multimedia social networks helps in designing more secure and efficient multimedia and networking systems. Multimedia fingerprinting protects multimedia content from illegal alterations, and multiuser collusion is a cost-effective attack against it. A colluder social network is naturally formed during multiuser collusion, through which colluders gain rewards by redistributing the colluded multimedia content. Since the colluders have conflicting interests, the maximal-payoff collusion for one colluder may not be the maximal-payoff collusion for the others. Hence, before a collusion can succeed, the colluders must bargain with each other to reach agreements. We first model the bargaining behavior among colluders as a noncooperative game and study four different bargaining solutions of this game. Moreover, the market value of the redistributed multimedia content is often time-sensitive: the earlier the colluded copy is released, the more people are willing to pay for it. Thus, the colluders have to reach agreements on how to distribute reward and risk among themselves as soon as possible. This paper further incorporates this time sensitivity of the colluders' reward and studies the time-sensitive bargaining equilibrium. The study reveals the strategies that are optimal for the colluders; thus, no colluder has an incentive to disagree. Such understanding reduces the possible types of collusion to a small finite set.

 

IMGV44: Moving Region Segmentation From Compressed Video Using Global Motion Estimation and Markov Random Fields

In this paper, we propose an unsupervised segmentation algorithm for extracting moving regions from compressed video using global motion estimation (GME) and Markov random field (MRF) classification. First, motion vectors (MVs) are compensated for global motion and quantized into several representative classes, from which MRF priors are estimated. Then, a coarse segmentation map of the MV field is obtained using a maximum a posteriori estimate of the MRF label process. Finally, the boundaries of the segmented moving regions are refined using color and edge information. The algorithm has been validated on a number of test sequences, and experimental results are provided to demonstrate its advantages over state-of-the-art methods.
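
The global-motion compensation and quantization steps can be sketched with a six-parameter affine motion model fitted to block motion vectors by ordinary least squares. The affine model, the class boundaries in `edges`, and the function names are assumptions for illustration; practical GME usually uses a robust fit so that moving objects do not bias the model, and the MRF/MAP labelling step is not shown.

```python
import numpy as np

def _design(xs, ys):
    # rows [1, x, y] for the affine model mv = p0 + p1 * x + p2 * y
    return np.column_stack([np.ones_like(xs, dtype=float), xs, ys])

def fit_affine_global_motion(xs, ys, mvx, mvy):
    """Least-squares affine global motion: separate 3-parameter fits for mvx and mvy."""
    A = _design(xs, ys)
    a, *_ = np.linalg.lstsq(A, mvx, rcond=None)
    b, *_ = np.linalg.lstsq(A, mvy, rcond=None)
    return a, b

def residual_motion_classes(xs, ys, mvx, mvy, edges=(0.5, 2.0, 4.0)):
    """Compensate global motion, then quantize the residual magnitude into classes 0..len(edges)."""
    a, b = fit_affine_global_motion(xs, ys, mvx, mvy)
    A = _design(xs, ys)
    rx, ry = mvx - A @ a, mvy - A @ b
    return np.digitize(np.hypot(rx, ry), edges)
```

Blocks that follow the camera motion end up in class 0, while blocks on independently moving objects keep a large residual and land in the higher classes, which would then seed the MRF priors.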

IMGV44: Towards Brain First-Aid: A Diagnostic Device for Conscious Awareness

When the brain is damaged, evaluating an individual's level of awareness can be a major diagnostic challenge (Is he or she in there?). Existing tests typically rely on behavioral indicators, which are incorrect in as many as one out of every two cases. The current paper presents a diagnostic device that addresses this problem. The technology circumvents behavioral limitations through noninvasive brain wave measurements (electroencephalography, or EEG). Unlike traditional EEG, the device is designed for point-of-care use by incorporating a portable, user-friendly, and stable design. It uses a novel software algorithm that automates subject stimulation, data acquisition/analysis, and the reporting of results. The test provides indicators for five identifiable levels of neural processing: sensation, perception, attention, memory, and language. The results are provided as rapidly obtained diagnostic, reliability, validity, and prognostic scores. The device can be applied to a wide variety of patients across a host of different environments. The technology is designed to be wireless-enabled for remote monitoring and assessment capabilities. In essence, the device is developed to scan for conscious awareness in order to optimize subsequent patient care.

IMGV45: Size-Controllable Region-of-Interest in Scalable Image Representation

Differentiating a region of interest (ROI) from the non-ROI part of an image, in terms of relative size as well as fidelity, is becoming an important functionality for future visual communication environments with a variety of display devices. In this paper, we propose a scalable image representation with ROI functionality in the spatial domain, which allows us to generate a hierarchy of images of arbitrary sizes. The ROI functionality of our scalable representation results from a nonuniform grid transformation in the spatial domain, which requires only the center of the ROI and an expansion parameter. Our grid transformation guarantees no loss of information within the ROI.

IMGV46: Optimizing a Tone Curve for Backward-Compatible High Dynamic Range Image and Video Compression

For backward-compatible high dynamic range (HDR) video compression, the HDR sequence is reconstructed by inverse tone-mapping a compressed low dynamic range (LDR) version of the original HDR content. In this paper, we show that an appropriate choice of tone-mapping operator (TMO) can significantly improve the reconstructed HDR quality. We develop a statistical model that approximates the distortion resulting from the combined processes of tone-mapping and compression. Using this model, we formulate a numerical optimization problem to find the tone curve that minimizes the expected mean square error (MSE) in the reconstructed HDR sequence. We also develop a simplified model that reduces the optimization problem to a closed-form solution, lowering its computational complexity. Performance evaluations show that the proposed methods outperform existing tone-mapping schemes in terms of HDR MSE and SSIM. It is also shown that the LDR image quality resulting from the proposed methods matches that produced by perceptually based TMOs.
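
To show how a closed-form tone curve can arise, here is a sketch under a simplified model of our own, not necessarily the paper's: the HDR log-luminance axis is split into equal bins with probabilities p_k, the tone curve is piecewise linear and spends an LDR range d_k on bin k, LDR quantization adds error of constant variance, and the HDR error in bin k therefore scales like 1/d_k^2. Minimizing the expected error sum(p_k / d_k^2) subject to sum(d_k) = D with a Lagrange multiplier gives -2 p_k / d_k^3 + lambda = 0, that is, d_k proportional to p_k^(1/3). The function names and the 255 LDR range are illustrative assumptions.

```python
def tone_curve_increments(histogram, ldr_max=255.0):
    """LDR range assigned to each HDR bin: proportional to the cube root of
    the bin probability (closed form of the simplified model above)."""
    total = float(sum(histogram))
    weights = [(h / total) ** (1.0 / 3.0) for h in histogram]
    norm = sum(weights)
    return [ldr_max * w / norm for w in weights]


def tone_curve(histogram, ldr_max=255.0):
    """LDR value at each bin edge of the piecewise-linear tone curve."""
    curve = [0.0]
    for d in tone_curve_increments(histogram, ldr_max):
        curve.append(curve[-1] + d)
    return curve


def relative_mse(histogram, increments):
    """Expected HDR error up to a constant factor: sum of p_k / d_k**2
    (empty bins contribute nothing)."""
    total = float(sum(histogram))
    return sum((h / total) / d ** 2
               for h, d in zip(histogram, increments) if h > 0)
```

For a histogram of [1, 8, 27, 64] the increments come out in the ratio 1:2:3:4, and the model error is lower than that of a linear curve that gives every bin the same range.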

IMGV47: Retinal Image Analysis Using Curvelet Transform and Multistructure Elements Morphology by Reconstruction

Retinal images can be used in several applications, such as ocular fundus operations and human recognition. They also play an important role in the early-stage detection of some diseases, such as diabetes, which can be performed by comparing the states of the retinal blood vessels. Intrinsic characteristics of retinal images make blood vessel detection difficult. Here, we propose a new algorithm to detect retinal blood vessels effectively. Because the curvelet transform represents edges well, modifying its coefficients to enhance the retinal image edges better prepares the image for the segmentation stage. The directionality of the multistructure elements method makes it an effective tool for edge detection. Hence, morphology operators using multistructure elements are applied to the enhanced image to find the retinal image ridges. Afterward, morphological operators by reconstruction eliminate the ridges that do not belong to the vessel tree while trying to preserve thin vessels unchanged. To increase the efficiency of the morphological operators by reconstruction, they are also applied using multistructure elements. A simple thresholding method, along with connected components analysis (CCA), identifies the remaining ridges that belong to vessels. To use CCA more efficiently, we apply CCA and length filtering locally instead of over the whole image. Experimental results on the well-known DRIVE database, with more than 94% accuracy in about 50 s for blood vessel detection, show that the blood vessels can be effectively detected by applying our method to retinal images.
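
The final thresholding and CCA stage can be sketched as follows. This assumes the input is an already enhanced ridge image; the curvelet and morphological steps are not shown. Component size in pixels is used as a stand-in for ridge length, 8-connectivity is assumed, and the optional `tile` argument is a hypothetical way to run the CCA locally block by block. All names and parameters are illustrative.

```python
from collections import deque


def threshold(image, t):
    """Binary mask of pixels brighter than t."""
    return [[1 if v > t else 0 for v in row] for row in image]


def remove_short_segments(binary, min_size, tile=None):
    """Keep 8-connected foreground components with at least min_size pixels.

    If tile is given, components are labeled independently inside each
    tile x tile block, so the filtering is applied locally.
    """
    h, w = len(binary), len(binary[0])
    tile = tile or max(h, w)
    out = [[0] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for top in range(0, h, tile):
        for left in range(0, w, tile):
            bottom, right = min(top + tile, h), min(left + tile, w)
            for i in range(top, bottom):
                for j in range(left, right):
                    if not binary[i][j] or seen[i][j]:
                        continue
                    # Breadth-first flood fill restricted to the current block.
                    component, queue = [], deque([(i, j)])
                    seen[i][j] = True
                    while queue:
                        y, x = queue.popleft()
                        component.append((y, x))
                        for dy in (-1, 0, 1):
                            for dx in (-1, 0, 1):
                                ny, nx = y + dy, x + dx
                                if (top <= ny < bottom and left <= nx < right
                                        and binary[ny][nx] and not seen[ny][nx]):
                                    seen[ny][nx] = True
                                    queue.append((ny, nx))
                    if len(component) >= min_size:
                        for y, x in component:
                            out[y][x] = 1
    return out
```

One design consequence worth noting: with small tiles, a long vessel crossing tile borders is split into short pieces, so the length threshold has to be chosen with the tile size in mind.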

BIOMEDICINE

IMGV48: Hybrid Genetic and Variational Expectation-Maximization Algorithm for Gaussian-Mixture-Model-Based Brain MR Image Segmentation

The expectation-maximization (EM) algorithm has been widely applied to estimating Gaussian mixture models (GMMs) in brain MR image segmentation. However, the EM algorithm is deterministic and intrinsically prone to overfitting the training data and to being trapped in local optima. In this paper, we propose a hybrid genetic and variational EM (GA-VEM) algorithm for brain MR image segmentation. In this approach, the VEM algorithm is performed to estimate the GMM, and the GA is employed to initialize the hyperparameters of the conjugate prior distributions of the GMM parameters involved in the VEM algorithm. Since the GA has the potential to achieve global optimization and VEM can steadily avoid overfitting, the hybrid GA-VEM algorithm is capable of overcoming the drawbacks of traditional EM-based methods. We compared our approach with the EM-based, VEM-based, and GA-EM-based segmentation algorithms, and with the segmentation routines of the Statistical Parametric Mapping package and the FMRIB Software Library, on 20 low-resolution and 17 high-resolution brain MR studies. Our results show that the proposed approach can substantially improve the performance of brain MR image segmentation.
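
For reference, the following is a minimal sketch of the plain EM baseline that the abstract contrasts with GA-VEM, fitted to 1-D voxel intensities (for example, three tissue classes). The GA initialization and the variational conjugate-prior updates are not shown; the initial means, variances, and weights passed in are exactly the quantities whose poor choice can trap EM in a local optimum. The function names and the variance floor are illustrative assumptions.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(data, means, variances, weights, iterations=100, var_floor=1e-6):
    """Maximum-likelihood EM for a 1-D Gaussian mixture."""
    means, variances, weights = list(means), list(variances), list(weights)
    n, k = len(data), len(means)
    for _ in range(iterations):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p) or 1e-300  # guard against underflow
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)  # floor limits variance collapse
    return means, variances, weights


def classify(x, means, variances, weights):
    """Assign a sample to the component with the largest posterior."""
    return max(range(len(means)),
               key=lambda j: weights[j] * gaussian_pdf(x, means[j], variances[j]))
```

The variance floor is a common ad hoc guard against the overfitting the abstract mentions; the VEM approach instead handles this through priors on the GMM parameters.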

MEDICAL IMAGE

IMGV49: Vessel Boundary Delineation on Fundus Images Using Graph-Based Approach

This paper proposes an algorithm to measure the width of retinal vessels in fundus photographs using a graph-based algorithm that segments both vessel edges simultaneously. First, the simultaneous two-boundary segmentation problem is modeled as a two-slice, 3-D surface segmentation problem, which is further converted into the problem of computing a minimum closed set in a node-weighted graph. An initial segmentation is generated from a vessel probability image. We use the REVIEW database to evaluate diameter measurement performance. The algorithm is robust and estimates the vessel width with subpixel accuracy. The method is used to explore the relationship between the average vessel width and the distance from the optic disc in 600 subjects.

MULTIMEDIA

IMGV50: Routing-Aware Multiple Description Video Coding Over Mobile Ad-Hoc Networks

This paper proposes a cross-layer approach, called routing-aware multiple description coding with multipath transport, to support video communications over mobile ad-hoc networks. The approach establishes a packet loss model based on the MAC access mechanism and network parameters, and uses it together with the routing messages from multipath routing to estimate the packet loss probability of transmitted video packets. The estimated results are then passed to the application layer to assist reference frame selection for multiple description coding, in order to mitigate the error propagation introduced in the motion-compensated loop. Results show that this is an effective approach to improving the error resilience of video transmission over mobile ad-hoc networks and enhancing the video experience for multiple users.
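
The loss-estimation and reference-selection chain can be sketched as below, under assumptions that are ours rather than the paper's: a packet is dropped at a hop only if all `retry_limit + 1` MAC transmission attempts collide, hop losses along a path are independent, and the encoder picks the most recent previously sent frame whose path delivery probability meets a threshold, falling back to the most reliable frame otherwise. All function names and the `min_delivery` parameter are hypothetical.

```python
def hop_loss_probability(collision_prob, retry_limit):
    """Drop probability at one hop: every one of retry_limit + 1 attempts collides."""
    return collision_prob ** (retry_limit + 1)


def path_loss_probability(hop_losses):
    """Loss probability over a multi-hop path, assuming independent hops."""
    delivered = 1.0
    for p in hop_losses:
        delivered *= 1.0 - p
    return 1.0 - delivered


def select_reference(sent_frames, path_loss, min_delivery=0.95):
    """Choose a reference frame for the next encoded frame.

    sent_frames: list of (frame_number, path_id) already transmitted.
    path_loss: dict mapping path_id to its estimated loss probability.
    Prefers the most recent frame that very likely arrived; otherwise
    falls back to the frame sent over the most reliable path.
    """
    for frame, path in sorted(sent_frames, reverse=True):
        if 1.0 - path_loss[path] >= min_delivery:
            return frame
    return max(sent_frames, key=lambda fp: (1.0 - path_loss[fp[1]], fp[0]))[0]
```

Referencing an older but reliably delivered frame costs some coding efficiency, which is the trade-off the cross-layer loss estimate is meant to inform.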

IMGV51: Scalable Video Multicast in Hybrid 3G/Ad-Hoc Networks

Mobile video broadcasting service, or mobile TV, is expected to become a popular application for 3G wireless network operators. Most existing solutions for video Broadcast Multicast Services (BCMCS) in 3G networks employ a single transmission rate to cover all viewers. The system-wide video quality of the cell is therefore throttled by a few viewers close to the cell boundary and is far from reaching the social optimum allowed by the radio resources available at the base station. In this paper, we propose a novel scalable video broadcast/multicast solution, SV-BCMCS, that efficiently integrates scalable video coding, 3G broadcast, and ad-hoc forwarding to balance the system-wide and worst-case video quality of all viewers in a 3G cell. We solve the optimal resource allocation problem in SV-BCMCS and develop practical helper discovery and relay routing algorithms. Moreover, we analytically study the gain of using ad-hoc relays in terms of users' effective distance to the base station. Through extensive simulations driven by real video sequences, we show that SV-BCMCS significantly improves the system-wide perceived video quality. The users' average PSNR increases by as much as 1.70 dB, with slight quality degradation for the few users close to the 3G cell boundary.

NEURAL NETWORK

IMGV52: A New Supervised Method for Blood Vessel Segmentation in Retinal Images by Using Gray-Level and Moment Invariants-Based Features

This paper presents a new supervised method for blood vessel detection in digital retinal images. The method uses a neural network (NN) scheme for pixel classification and represents each pixel by a 7-D vector composed of gray-level and moment invariants-based features. The method was evaluated on the publicly available DRIVE and STARE databases, which are widely used for this purpose because they contain retinal images in which the vascular structure has been precisely marked by experts. On both sets of test images, the method performs better than other existing solutions in the literature. It proves especially accurate for vessel detection in STARE images: applied to this database, even when the NN was trained on the DRIVE database, it outperforms all analyzed segmentation approaches. Its effectiveness and robustness under different image conditions, together with its simplicity and fast implementation, make this blood vessel segmentation proposal suitable for retinal image computer analyses such as automated screening for early diabetic retinopathy detection.
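
As an illustration of what a moment invariants-based feature can look like, the sketch below computes the first two Hu moment invariants of a small gray-level window around a pixel. The abstract does not say which invariants, window size, or preprocessing the method uses, so the choice of Hu's first two invariants and the raw-window input here are assumptions. Invariance to a 90-degree rotation serves as a sanity check.

```python
def hu_first_two(window):
    """First two Hu moment invariants (phi1, phi2) of a 2-D gray-level window."""
    m00 = sum(sum(row) for row in window)
    if m00 == 0:
        return 0.0, 0.0
    # Intensity-weighted centroid (x = column index, y = row index).
    xbar = sum(x * v for row in window for x, v in enumerate(row)) / m00
    ybar = sum(y * v for y, row in enumerate(window) for v in row) / m00

    def mu(p, q):  # central moment of order (p, q)
        return sum((x - xbar) ** p * (y - ybar) ** q * v
                   for y, row in enumerate(window) for x, v in enumerate(row))

    def eta(p, q):  # scale-normalized central moment
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    e20, e02, e11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = e20 + e02
    phi2 = (e20 - e02) ** 2 + 4 * e11 ** 2
    return phi1, phi2
```

Because phi1 and phi2 do not change when the window is rotated, a vessel segment yields similar features whatever its orientation, which is the motivation for using invariants as pixel features.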