IMS 1: Iris matching using multi-dimensional artificial neural network
Iris recognition is one of the most widely used biometric techniques for personal identification. It relies on the fact that iris patterns are statistically unique and therefore suitable for biometric measurement. In this study, a novel method for recognizing iris patterns using a multidimensional artificial neural network is considered. The proposed technique has the distinct advantage of using the entire resized iris as an input at once. It offers excellent pattern recognition properties because the iris texture is unique to every person. The system is trained and tested using two publicly available databases (CASIA and UBIRIS). The proposed approach shows significant promise and potential for improvement compared with conventional matching techniques with regard to time and efficiency of results.

IMS 2: Privacy Protection of Fingerprint Database
A fingerprint authentication system for the privacy protection of the fingerprint
template stored in a database is introduced here. The considered fingerprint
data is a binary thinned fingerprint image, which will be embedded with some
private user information without causing obvious abnormality in the enrollment
phase. In the authentication phase, these hidden user data can be extracted
from the stored template for verifying the authenticity of the person who
provides the query fingerprint. A novel data hiding scheme is proposed for the
thinned fingerprint template. This scheme does not produce any boundary pixel
in the thinned fingerprint during data embedding. Thus, the abnormality caused
by data hiding is visually imperceptible in the marked-thinned fingerprint.
Compared with using existing binary image data hiding techniques, the proposed
method causes the least abnormality for a thinned fingerprint without
compromising the performance of the fingerprint identification.

IMS 3: Smart card with iris recognition for high security access environment
Smart cards are increasingly being used as a form of identification and authentication. One inherent problem with smart cards, however, is the possibility of loss or theft. Current options for securing smart cards against unauthorized use are primarily restricted to passwords. Passwords are easy enough for others to steal that they do not offer sufficient protection. This has promoted interest in biometric identification methods, including iris recognition. The iris is, due to its unique biological properties, exceptionally suited for identification. It is protected from the environment, stable over time, unique in shape and contains a high amount of discriminating information. This paper proposes a method to integrate iris recognition with the smart card to develop a high security access environment. An iris recognition system and a smart card programming circuit with its software have been designed. The template on card (TOC) category has been employed. Hence, the extracted iris features stored in the smart card are compared against the data acquired from a camera or database for authentication. The proposed algorithm has superior performance in terms of security, accuracy and consistency compared with other published technology.

IMS 4: Digital signature with localization for image authentication
This paper proposes a method of extracting a digital signature that can localize tampered areas. The method of generating the digital signature of an image is based on the regularity properties of wavelet transform coefficients.
Objective: Person identification as a security means has a variety of important applications. Many techniques and automated systems have been developed over the past few decades; each has its own advantages and limitations. There are often trade-offs amongst reliability, ease of use, ethical/human rights issues, and acceptability in a particular application. Multimodal identification and authentication can, to some extent, alleviate the dilemmas and improve the overall performance. This paper proposes a new method of the combined use of signatures and utterances of pronounced names to identify or authenticate persons. Unlike typical signature verification methods, the dynamic features of signatures are captured as sound. The multimodal approach shows increased reliability, providing a relatively simple and potentially useful method for person identification and authentication.

Video Processing
IMS 5: An Advanced Motion Detection Algorithm With Video Quality Analysis for Video Surveillance Systems
Motion detection is the first essential process in the extraction of information regarding moving objects and makes use of stabilization in functional areas, such as tracking, classification, recognition, and so on. In this paper, we propose a novel and accurate approach to motion detection for the automatic video surveillance system. Our method achieves complete detection of moving objects by involving three significant proposed modules: a background modeling (BM) module, an alarm trigger (AT) module, and an object extraction (OE) module. For our proposed BM module, a unique two-phase background matching procedure is performed using rapid matching followed by accurate matching in order to produce optimum background pixels for the background model. Next, our proposed AT module eliminates the unnecessary examination of the entire background region, allowing the subsequent OE module to only process blocks containing moving objects. Finally, the OE module forms the binary object detection mask in order to achieve highly complete detection of moving objects. The detection results produced by our proposed (PRO) method were both qualitatively and quantitatively analyzed through visual inspection and for accuracy, along with comparisons to the results produced by other state-of-the-art methods. The analyses show that our PRO method has a substantially higher degree of efficacy, outperforming other methods by a metric accuracy rate of up to 53.43%.
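The division of labor between the AT and OE modules can be illustrated with a minimal sketch. It is not the authors' algorithm: the block size, both thresholds, the sum-of-absolute-differences test and the function names moving_blocks and object_mask are all assumptions made for illustration, and the background model is simply taken as given.

```python
def moving_blocks(frame, background, block=2, threshold=20):
    """Alarm-trigger step (assumed form): flag blocks whose summed absolute
    difference from the background model exceeds `threshold`."""
    rows, cols = len(frame), len(frame[0])
    flagged = []
    for r0 in range(0, rows, block):
        for c0 in range(0, cols, block):
            sad = sum(
                abs(frame[r][c] - background[r][c])
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            if sad > threshold:
                flagged.append((r0 // block, c0 // block))
    return flagged


def object_mask(frame, background, blocks, block=2, pixel_threshold=15):
    """Object-extraction step (assumed form): build the binary mask,
    examining only pixels inside the flagged blocks."""
    rows, cols = len(frame), len(frame[0])
    mask = [[0] * cols for _ in range(rows)]
    for br, bc in blocks:
        for r in range(br * block, min((br + 1) * block, rows)):
            for c in range(bc * block, min((bc + 1) * block, cols)):
                if abs(frame[r][c] - background[r][c]) > pixel_threshold:
                    mask[r][c] = 1
    return mask


# Toy 4x4 grayscale data: only the lower-right block contains motion.
background = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
frame = [[10, 11, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 90, 95],
         [10, 10, 12, 88]]

blocks = moving_blocks(frame, background)
mask = object_mask(frame, background, blocks)
print(blocks)  # [(1, 1)]
```

In this toy case three of the four blocks are never examined pixel by pixel, which is the saving the AT module is described as providing.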

Face regions are the visual focus in conversational video communications; thus, better reconstruction quality of the regions of interest (ROI) is highly desired or necessary in bandwidth-constrained conversational video coding. In this paper, we introduce an efficient motion based face detection method to identify face blocks in the first step, which can reduce computational complexity substantially without any loss in face detection results. Then an active contour model is applied to find face contours for more refined and compact face regions. Based on the well-located and compact face regions, facial feature priority based bit allocation is proposed for face ROI based conversational video coding. Experimental results demonstrate that the proposed face region based coding can considerably improve the coding results in the face regions, compared with two other relevant video coding schemes, in terms of objective rate-distortion performance as well as subjective visual quality.

IMS 7: Motion and feature based person tracking in surveillance videos
This work describes a method for accurately tracking persons in an indoor surveillance
video stream obtained from a static camera with difficult scene properties,
including illumination changes, and addresses the major occlusion problem. First,
moving objects are precisely extracted by determining their motion,
for further processing. The scene illumination changes are averaged to obtain
an accurate moving object during the background subtraction process. In case of
object occlusion, we use color feature information to accurately
distinguish between objects. The method is able to identify moving persons,
track them and provide a unique tag for each tracked person. The effectiveness of
the proposed method is demonstrated with experiments in an indoor environment.

IMS 8: Identification and analysis of human pose in video
Human figure identification has always been a challenging task in the field of pattern
recognition. This paper presents a complete algorithm to find a single object
(human body) and identify the object as a human being. The algorithm starts the
segmentation process with the basic frame difference method and uses morphological
operators, edge detection, feature point generation
and finally spline interpolation to find the human-like object. After
successful segmentation, the algorithm identifies
the object as a human body and detects the pose by matching templates. This
paper describes every step of single human body detection
in detail and is intended to be ready for real-life use.

IMG1: 3-D Reconstruction of Microtubules From Multi-Angle Total Internal Reflection Fluorescence Microscopy Using Bayesian Framework
Total internal reflection fluorescence (TIRF) microscopy excites a thin evanescent
field which theoretically decays exponentially. Each TIRF image is actually the
projection of a 3-D volume and hence cannot alone produce an accurate
localization of structures in the z-dimension; however, it provides
greatly improved axial resolution for biological samples. Multiple angle-TIRF
microscopy allows controlled variation of the incident angle of the
illuminating laser beam, thus generating a set of images of different
penetration depths with the potential to reconstruct the 3-D volume of the
sample. With the ultimate goal to quantify important biological parameters of microtubules,
we present a method to reconstruct 3-D position and orientation of microtubules
based on multi-angle TIRF data, as well as experimental calibration of the
actual decay function of the evanescent field at each angle. We validate our
method using computer simulations, by creating a phantom simulating the
curvilinear characteristics of microtubules and projecting the artificially
constructed volume into a set of TIRF images for different penetration depths.
The reconstructed depth information for the phantom data is shown to be
accurate and robust to noise. We apply our method to microtubule TIRF images of
PtK2 cells in vivo. By comparing microtubule curvatures of the
reconstruction results and several electron microscopy (EM) images of
vertically sliced sample of microtubules, we find that the curvature statistics
of our reconstruction agree well with the ground truth (EM data). Quantifying
the distribution of microtubule curvature reveals an interesting discovery that
microtubules can buckle and form local bendings of considerably small radius of
curvature which is also visually spotted on the EM images, while microtubule
bendings on a larger scale generally have a much larger radius and cannot bear
the stress of a large curvature. The presented method has the potential to
provide a reliable tool for 3-D reconstruction and tracking of
microtubules.
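The link between incident angle and penetration depth can be sketched with the textbook evanescent-field model, in which the intensity decays as exp(-z/d) with d = wavelength / (4*pi*sqrt(n1^2 sin^2(theta) - n2^2)). This is only the idealized exponential decay; the abstract states that the actual decay function is calibrated experimentally. The wavelength, refractive indices and angles below are illustrative assumptions, not values from the paper.

```python
import math


def penetration_depth(wavelength_nm, theta_deg, n_glass=1.518, n_sample=1.33):
    """Idealized 1/e penetration depth (nm) of the evanescent field.

    n_glass and n_sample are assumed refractive indices (coverslip, cell medium).
    """
    theta = math.radians(theta_deg)
    s = (n_glass * math.sin(theta)) ** 2 - n_sample ** 2
    if s <= 0:
        raise ValueError("angle is at or below the critical angle")
    return wavelength_nm / (4 * math.pi * math.sqrt(s))


def relative_intensity(z_nm, depth_nm):
    """Excitation intensity at height z relative to the interface."""
    return math.exp(-z_nm / depth_nm)


# Critical angle for total internal reflection with the assumed indices.
critical_deg = math.degrees(math.asin(1.33 / 1.518))

# Steeper incident angles give shallower excitation (488 nm laser assumed).
depths = {angle: penetration_depth(488, angle) for angle in (63, 66, 70, 75)}
for angle, depth in depths.items():
    print(f"{angle} deg: d = {depth:.1f} nm, I(100 nm)/I(0) = "
          f"{relative_intensity(100, depth):.3f}")
```

Because each angle weights a given height differently, a stack of images taken at several angles carries depth information that a single TIRF image does not.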
IMG2: Two Efficient Label-Equivalence-Based Connected-Component Labeling Algorithms for 3-D Binary Images
Abstract
Whenever one wants to distinguish, recognize, and/or measure objects (connected
components) in binary images, labeling is required. This paper presents two
efficient label-equivalence-based connected-component labeling algorithms for
3-D binary images. One is voxel based and the other is run based. For the
voxel-based one, we present an efficient method of deciding the order for
checking voxels in the mask. For the run-based one, instead of assigning each
foreground voxel, we assign each run a provisional label. Moreover, we use run
data to label foreground voxels without scanning any background voxel in the
second scan. Experimental results have demonstrated that our voxel-based
algorithm is efficient for 3-D binary images with complicated connected components,
that our run-based one is efficient for those with simple connected components,
and that both are much more efficient than conventional 3-D labeling
algorithms.
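For readers unfamiliar with label-equivalence labeling, the following is a minimal sketch of the conventional voxel-based two-scan scheme that such algorithms build on. It is not either of the paper's algorithms: it uses 6-connectivity for brevity, a plain union-find table for equivalences, and no optimized mask-checking order or run data. The function name label_3d is hypothetical.

```python
def label_3d(volume):
    """Two-scan connected-component labeling of a 3-D binary volume.

    volume[z][y][x] is 0 (background) or 1 (foreground). Returns the label
    volume and the number of components (6-connectivity assumed).
    """
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    labels = [[[0] * width for _ in range(height)] for _ in range(depth)]
    parent = [0]  # parent[i]: union-find parent of provisional label i

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # First scan: assign provisional labels and record label equivalences.
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if not volume[z][y][x]:
                    continue
                neighbours = []
                if x > 0 and labels[z][y][x - 1]:
                    neighbours.append(labels[z][y][x - 1])
                if y > 0 and labels[z][y - 1][x]:
                    neighbours.append(labels[z][y - 1][x])
                if z > 0 and labels[z - 1][y][x]:
                    neighbours.append(labels[z - 1][y][x])
                if not neighbours:
                    parent.append(len(parent))  # new provisional label
                    labels[z][y][x] = len(parent) - 1
                else:
                    labels[z][y][x] = min(neighbours)
                    for n in neighbours:
                        union(labels[z][y][x], n)

    # Second scan: replace provisional labels with consecutive final labels.
    final = {}
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if labels[z][y][x]:
                    root = find(labels[z][y][x])
                    labels[z][y][x] = final.setdefault(root, len(final) + 1)
    return labels, len(final)


volume = [
    [[1, 0, 1],
     [1, 1, 1]],
    [[0, 0, 0],
     [0, 0, 0]],
    [[0, 1, 0],
     [0, 0, 0]],
]
labels, count = label_3d(volume)
print(count)  # 2
```

Note that this baseline visits every voxel in both scans, including background voxels; avoiding that in the second scan is one of the improvements the run-based algorithm is described as making.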
IMG3: Stationary Probability Model for Bitplane Image Coding Through Local Average of Wavelet Coefficients
Abstract
This paper introduces a probability model for symbols emitted by bitplane image
coding engines, which is conceived from a precise characterization of the
signal produced by a wavelet transform. The main insights behind the proposed model
are the estimation of the magnitude of a wavelet coefficient as the arithmetic
mean of its neighbors' magnitudes (the so-called local average), and the
assumption that emitted bits are under-complete representations of the
underlying signal. The local average-based probability model is introduced in
the framework of JPEG2000. While the resulting system is not JPEG2000
compatible, it preserves all features of the standard. Practical benefits of
our model are enhanced coding efficiency, more opportunities for parallelism,
and improved spatial scalability.
IMG4: Natural Image Segmentation Based on Tree Equipartition, Bayesian Flooding and Region Merging
Abstract
We propose a general purpose image segmentation framework, which involves feature
extraction and classification in feature space, followed by flooding and
merging in spatial domain. Region growing is based on the computed local
measurements and distances from the distribution of features describing the
different classes. Using the properties of the label-dependent distances,
spatial coherence is ensured, since the image features are described globally.
The distributions of the features for the different classes are obtained by
block-wise unsupervised clustering based on the construction of the minimum
spanning tree of the blocks' grid using the Mallows distance and the
equipartition of the resulting tree. The final clustering is obtained by using
the k-centroids algorithm. With high probability and under topological
constraints, connected components of the maximum likelihood classification map
are used to compute a map of initially labelled pixels. An efficient flooding
algorithm is introduced, namely, the Priority Multi-Class Flooding Algorithm
(PMCFA), which assigns pixels to labels using Bayesian dissimilarity criteria. A
new region merging method, which incorporates boundary information, is
introduced for obtaining the final segmentation map. Therefore, the merging
stage is based on region features and edge localization. Segmentation results
on the Berkeley benchmark data set demonstrate the effectiveness of the
proposed methods.
IMG5: LCD Motion Blur: Modeling, Analysis, and Algorithm
Abstract
Liquid crystal display (LCD) devices are well known for their slow responses due to
the physical limitations of liquid crystals. Therefore, fast moving objects in
a scene are often perceived as blurred. This effect is known as the LCD motion
blur. In order to reduce LCD motion blur, an accurate LCD model and an
efficient deblurring algorithm are needed. However, existing LCD motion blur
models are insufficient to reflect the limitations of the human eye-tracking system.
Also, the spatiotemporal equivalence in LCD motion blur models has not been
proven directly in the discrete 2-D spatial domain, although it is widely used.
There are three main contributions of this paper: modeling, analysis, and
algorithm. First, a comprehensive LCD motion blur model is presented, in which
human-eye-tracking limits are taken into consideration. Second, a complete
analysis of spatiotemporal equivalence is provided and verified using real
video sequences. Third, an LCD motion blur reduction algorithm is proposed. The
proposed algorithm solves an l1-norm regularized
least-squares minimization problem using a subgradient projection method.
Numerical results show that the proposed algorithm gives higher peak SNR, lower
temporal error, and lower spatial error than motion-compensated inverse
filtering and Lucy-Richardson deconvolution algorithm, which are two
state-of-the-art LCD deblurring algorithms.
IMG6: Illumination Recovery From Image With Cast Shadows Via Sparse Representation
Abstract
In this paper, we propose using sparse representation for recovering the
illumination of a scene from a single image with cast shadows, given the
geometry of the scene. The images with cast shadows can be quite complex and,
therefore, cannot be well approximated by low-dimensional linear subspaces.
However, it can be shown that the set of images produced by a Lambertian scene
with cast shadows can be efficiently represented by a sparse set of images
generated by directional light sources. We first model an image with cast
shadows as composed of a diffusive part (without cast shadows) and a residual part
that captures cast shadows. Then, we express the problem in an ℓ1-regularized
least-squares formulation, with nonnegativity constraints (as light has to be
non-negative at any point in space). This sparse representation enjoys an
effective and fast solution thanks to recent advances in compressive sensing.
In experiments on synthetic and real data, our approach performs favorably in
comparison with several previously proposed methods.
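To make the ℓ1-regularized least-squares formulation with nonnegativity constraints concrete, here is a minimal sketch that minimizes 0.5*||Ax - b||^2 + lam*sum(x) subject to x >= 0 by projected proximal-gradient steps. The solver is a generic stand-in, not the one used in the paper, and the matrix A (columns standing for images rendered under individual directional lights), the observation b, lam and the iteration count are toy assumptions.

```python
def nonneg_l1_least_squares(A, b, lam=0.01, iters=2000):
    """Minimize 0.5*||Ax - b||^2 + lam*sum(x) subject to x >= 0.

    For x >= 0 the l1 norm equals sum(x), so each iteration is a gradient
    step on the smooth part plus lam, followed by clipping at zero.
    """
    m, n = len(A), len(A[0])
    # 1 / ||A||_F^2 is a safe step size (never larger than 1 / Lipschitz constant).
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iters):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) + lam for j in range(n)]
        x = [max(0.0, x[j] - step * grad[j]) for j in range(n)]
    return x


def objective(A, b, x, lam):
    r = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return 0.5 * sum(v * v for v in r) + lam * sum(x)


# Toy setup: 3 pixels, 4 candidate light directions; the observation is
# generated by lights 0 and 2 only (weights 2 and 1).
A = [[1.0, 0.2, 0.0, 0.5],
     [0.0, 1.0, 0.3, 0.5],
     [0.2, 0.0, 1.0, 0.5]]
b = [2.0, 0.3, 1.4]

x = nonneg_l1_least_squares(A, b)
print([round(v, 3) for v in x])
```

The clipping step is what enforces the physical constraint mentioned in the abstract, that light cannot be negative, while the lam term pushes weights of unneeded light directions toward zero.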
IMG7: FSIM: A Feature Similarity Index for Image Quality Assessment
Abstract
Image quality assessment (IQA) aims to use computational models to measure the image
quality consistently with subjective evaluations. The well-known structural
similarity index brought IQA from the pixel-based to the structure-based stage. In this
paper, a novel feature similarity (FSIM) index for full reference IQA is
proposed based on the fact that the human visual system (HVS) understands an image
mainly according to its low-level features. Specifically, the phase congruency
(PC), which is a dimensionless measure of the significance of a local
structure, is used as the primary feature in FSIM. Considering that PC is
contrast invariant while the contrast information does affect the HVS's perception
of image quality, the image gradient magnitude (GM) is employed as the
secondary feature in FSIM. PC and GM play complementary roles in characterizing
the image local quality. After obtaining the local quality map, we use PC again
as a weighting function to derive a single quality score. Extensive experiments
performed on six benchmark IQA databases demonstrate that FSIM can achieve much
higher consistency with the subjective evaluations than state-of-the-art IQA
metrics.
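The pooling step, in which a local quality map is weighted by PC to give a single score, can be sketched as follows. The abstract does not give the formulas, so the ratio-style similarity (2ab + T)/(a^2 + b^2 + T), the use of max(PC_ref, PC_dist) as the weight, and the constants t1 and t2 are assumptions here; the PC and GM maps are taken as precomputed, flattened lists, since computing phase congruency is outside the scope of this sketch.

```python
def fsim_score(pc_ref, pc_dist, gm_ref, gm_dist, t1=0.85, t2=160.0):
    """PC-weighted pooling of a local quality map (assumed formulation).

    pc_*: phase congruency values per pixel, gm_*: gradient magnitudes per
    pixel, for the reference and distorted image respectively.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for p1, p2, g1, g2 in zip(pc_ref, pc_dist, gm_ref, gm_dist):
        s_pc = (2 * p1 * p2 + t1) / (p1 * p1 + p2 * p2 + t1)  # PC similarity
        s_gm = (2 * g1 * g2 + t2) / (g1 * g1 + g2 * g2 + t2)  # GM similarity
        weight = max(p1, p2)  # PC reused as the weighting function
        weighted_sum += s_pc * s_gm * weight
        weight_total += weight
    return weighted_sum / weight_total if weight_total else 1.0


# Toy per-pixel feature values (hypothetical, not from real images).
pc_ref = [0.9, 0.1, 0.5, 0.0]
gm_ref = [40.0, 5.0, 20.0, 1.0]
pc_blur = [0.6, 0.1, 0.3, 0.0]
gm_blur = [15.0, 4.0, 8.0, 1.0]

same = fsim_score(pc_ref, pc_ref, gm_ref, gm_ref)
blurred = fsim_score(pc_ref, pc_blur, gm_ref, gm_blur)
print(same, blurred)
```

Pixels with high phase congruency dominate the score, so degradations at perceptually significant structures are penalized more than changes in flat regions.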
IMG8: Enhanced Adaptive Loop Filter for Motion Compensated Frame
Abstract
We propose an adaptive loop filter to remove the redundancy between current and
motion compensated frames so that the residual signal is minimized and coding
efficiency increases. The loop filter coefficients and offset are optimized for
each frame or a set of blocks to minimize the total energy of the residual
signal resulting from motion estimation and compensation. The optimized loop
filter with offset is applied for the set of blocks where the filtering process
gives coding gain based upon rate-distortion cost. The proposed loop filter is
used for the motion compensated frame whereas the conventional adaptive
interpolation filter (AIF) is applied to the reference frames to interpolate
the subpixel values. Another conventional scheme, the adaptive loop filter (ALF), is
used after deblocking filtering to enhance quality of reconstructed frames, not
to minimize energy of residual signal. The proposed loop filter can be used in
combination with the AIF and ALF. Experimental results show that the proposed
algorithm provides an average bit reduction of 8% compared to the conventional
H.264/AVC scheme. When the proposed scheme is combined with AIF and ALF, the
coding gain increases even further.
IMG9: Composite Model-Based DC Dithering for Suppressing Contour Artifacts in Decompressed Video
Abstract
Because of its outstanding contribution to improving compression efficiency,
block-based quantization has been widely accepted in state-of-the-art image/video
coding standards. However, false contour artifacts are introduced, which result
in reducing the fidelity of the decoded image/video especially in terms of
subjective quality. In this paper, a block-based decontouring method is
proposed to reduce the false contour artifacts in the decoded image/video by
automatically dithering its direct current (DC) value according to a composite
model established between gradient smoothness and block-edge smoothness.
Feature points on the model with the corresponding criteria in suppressing
contour artifacts are compared to show a good consistency between the model and
the actual processing effects. Discrete cosine transform (DCT)-based block
level contour artifacts detection mechanism ensures the blocks within the
texture region are not affected by the DC dithering. Both the implementation
method and the algorithm complexity are analyzed to present the feasibility in
integrating the proposed method into an existing video decoder on an embedded
platform or system-on-chip (SoC). Experimental results demonstrate the
effectiveness of the proposed method both in terms of subjective quality and
processing complexity in comparison with the previous methods.
IMG10: ADART: An Adaptive Algebraic Reconstruction Algorithm for Discrete Tomography
Abstract
In this paper we suggest an algorithm based on the Discrete Algebraic
Reconstruction Technique (DART) which is capable of computing high quality
reconstructions from substantially fewer projections than required for conventional
continuous tomography. Adaptive DART (ADART) goes a step further than DART on
the reduction of the number of unknowns of the associated linear system
achieving a significant reduction in the pixel error rate of reconstructed
objects. The proposed methodology automatically adapts the border definition
criterion at each iteration, resulting in a reduction of the number of pixels
belonging to the border, and consequently of the number of unknowns in the
general algebraic reconstruction linear system to be solved, this
reduction being especially important at the final stage of the iterative process.
Experimental results show that reconstruction errors are considerably reduced
using ADART when compared to original DART, both in clean and noisy
environments.
IMG11: t-Tests, F-Tests and Otsu's Methods for Image Thresholding
Abstract
Otsu's binarization method is one of the most popular image-thresholding methods;
Student's t-test is one of the most widely-used statistical tests to
compare two groups. This paper aims to stress the equivalence between Otsu's
binarization method and the search for an optimal threshold that provides the
largest absolute Student's t-statistic. It is then naturally
demonstrated that the extension of Otsu's binarization method to multi-level
thresholding is equivalent to the search for optimal thresholds that provide
the largest F-statistic through one-way analysis of variance (ANOVA).
Furthermore, general equivalences between some parametric image-thresholding
methods and the search for optimal thresholds with the largest likelihood-ratio
test statistics are briefly discussed.
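The two-class equivalence can be checked numerically on a small example. With the pooled-variance (equal-variance) form of Student's t, t^2 = (n - 2) * B / W, where B is Otsu's between-class variance and W the within-class variance; since B + W is the fixed total variance, maximizing B and maximizing |t| select the same threshold. The pixel values below are made up for illustration, and the helper names are hypothetical.

```python
import math


def split(values, threshold):
    low = [v for v in values if v <= threshold]
    high = [v for v in values if v > threshold]
    return low, high


def between_class_variance(values, threshold):
    """Otsu's criterion: w0 * w1 * (mu0 - mu1)^2."""
    low, high = split(values, threshold)
    n = len(values)
    w0, w1 = len(low) / n, len(high) / n
    mu0, mu1 = sum(low) / len(low), sum(high) / len(high)
    return w0 * w1 * (mu0 - mu1) ** 2


def abs_t_statistic(values, threshold):
    """Absolute two-sample Student's t with pooled variance."""
    low, high = split(values, threshold)
    n0, n1 = len(low), len(high)
    mu0, mu1 = sum(low) / n0, sum(high) / n1
    ss = sum((v - mu0) ** 2 for v in low) + sum((v - mu1) ** 2 for v in high)
    pooled_var = ss / (n0 + n1 - 2)
    return abs(mu0 - mu1) / math.sqrt(pooled_var * (1 / n0 + 1 / n1))


pixels = [12, 15, 14, 10, 13, 60, 200, 210, 190, 205, 198, 220]
# Every candidate threshold must leave at least one pixel on each side.
candidates = sorted(set(pixels))[:-1]

otsu_threshold = max(candidates, key=lambda t: between_class_variance(pixels, t))
t_threshold = max(candidates, key=lambda t: abs_t_statistic(pixels, t))
print(otsu_threshold, t_threshold)  # 60 60
```

Both searches pick 60, which places the outlying value 60 in the dark class and separates it from the bright cluster around 200.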
IMG12: Rate Control Scheme for Consistent Video Quality in Scalable Video Codec
Abstract
Multimedia data delivered to mobile devices over wireless channels or the Internet are
complicated by bandwidth fluctuation and the variety of mobile devices.
Scalable video coding has been developed as an extension of H.264/AVC to solve
this problem. Since scalable video codec provides various scalabilities to
adapt the bitstream for the channel conditions and terminal types, scalable
codec is one of the useful codecs for wired or wireless multimedia
communication systems, such as IPTV and streaming services. In such scalable
multimedia communication systems, video quality fluctuation degrades the visual
perception significantly. It is important to efficiently use the target bits in
order to maintain a consistent video quality or achieve a small distortion
variation throughout the whole video sequence. The scheme proposed in this
paper provides a useful function to control video quality in applications
supporting scalability, whereas conventional schemes have been proposed to
control video quality in the H.264 and MPEG-4 systems. The proposed algorithm
decides the quantization parameter of the enhancement layer to maintain a
consistent video quality throughout the entire sequence. The video quality of
the enhancement layer is controlled based on a closed-form formula which
utilizes the residual data and quantization error of the base layer. The
simulation results show that the proposed algorithm controls the frame quality
of the enhancement layer in a simple operation, where the parameter decision
algorithm is applied to each frame.
IMG13: Learning Adaptive Metric for Robust Visual Tracking
Abstract
Matching the visual appearances of the target over consecutive image frames is the most
critical issue in video-based object tracking. Choosing an appropriate distance
metric for matching determines its accuracy and robustness, and thus
significantly influences the tracking performance. Most existing tracking
methods employ fixed pre-specified distance metrics. However, this simple
treatment is problematic and limited in practice, because a pre-specified
metric is not likely to guarantee that the closest match is the true target of
interest. This paper presents a new tracking approach that incorporates
adaptive metric learning into the framework of visual object tracking.
Collecting a set of supervised training samples on-the-fly in the observed
video, this new approach automatically learns the optimal distance metric for
more accurate matching. The design of the learned metric ensures that the
closest match is very likely to be the true target of interest based on the
supervised training. Such a learned metric is discriminative and adaptive. This
paper substantializes this new approach in a solid case study of
adaptive-metric differential tracking, and obtains a closed-form analytical
solution to motion estimation and visual tracking. Moreover, this paper extends
the basic linear distance metric learning method to a more powerful nonlinear
kernel metric learning method. Extensive experiments validate the effectiveness
of the proposed approach, and demonstrate the improved performance of the
proposed new tracking method.
IMG14: Image Decomposition With Multilabel Context: Algorithms and Applications
Abstract
Most research on image decomposition, e.g., image segmentation and image parsing,
has predominantly focused on the low-level visual clues within a single image
and neglected the contextual information across images. In this paper, we
present a new perspective to image decomposition piloted by the multilabel
context associated with each individual image. Observing that the contextual
information (i.e., local label representations of the same label are similar
while those from different labels are dissimilar) exists across images, we
propose to perform image decomposition in a collective way and obtain an
optimal representation for each label from a set of multilabeled images. We
formulate the problem as an optimization problem which maximizes inter-label
difference while minimizing the intra-label difference of the target label
representations and propose two ways to solve this problem. Such a contextual
image decomposition has a wide variety of applications, among which two
exemplary ones, multilabel image annotation and label ranking, are presented and
evaluated with different classification techniques. Extensive experiments on
two benchmark datasets demonstrate promising results.
IMG15: Graph Laplace for Occluded Face Completion and Recognition
Abstract
This paper proposes a spectral-graph-based algorithm for face image repairing, which
can improve the recognition performance on occluded faces. The face completion
algorithm proposed in this paper includes three main procedures: 1) sparse
representation for partially occluded face classification; 2) image-based data
mining; and 3) graph Laplace (GL) for face image completion. The novel part of
the proposed framework is GL, as named from graphical models and the Laplace
equation, and can achieve a high-quality repairing of damaged or occluded
faces. The relationship between the GL and the traditional Poisson equation is
proven. We apply our face repairing algorithm to produce completed faces, and
use face recognition to evaluate the performance of the algorithm. Experimental
results verify the effectiveness of the GL method for occluded face completion.
IMG16: Fast Transforms for
Acoustic Imaging—Part II: Applications
Abstract
In Part I [“Fast Transforms for Acoustic Imaging-Part I: Theory,” IEEE Transactions on
Image Processing], we introduced the Kronecker array transform (KAT), a fast
transform for imaging with separable arrays. Given a source distribution, the
KAT produces the spectral matrix which would be measured by a separable sensor
array. In Part II, we establish connections between the KAT, beamforming and
2-D convolutions, and show how these results can be used to accelerate
classical and state of the art array imaging algorithms. We also propose using
the KAT to accelerate general purpose regularized least-squares solvers. Using
this approach, we avoid ill-conditioned deconvolution steps and obtain more
accurate reconstructions than previously possible, while maintaining low
computational costs. We also show how the KAT performs when imaging near-field
source distributions, and illustrate the trade-off between accuracy and
computational complexity. Finally, we show that separable designs can deliver
accuracy competitive with multi-arm logarithmic spiral geometries, while having
the computational advantages of the KAT.
IMG17: Denoising-Enhancing Images on Elastic Manifolds
Abstract
The conflicting demands for simultaneous low-pass and high-pass processing,
required in image denoising and enhancement, still present an outstanding
challenge, although a great deal of progress has been made by means of adaptive
diffusion-type algorithms. To further advance such processing methods and
algorithms, we introduce a family of second-order (in time) partial
differential equations. These equations describe the motion of a thin elastic
sheet in a damping environment. They are also derived by a variational approach
in the context of image processing. The new operator enables better edge
preservation in denoising applications by offering an adaptive lowpass filter,
which preserves high-frequency components in the pass-band better than the
adaptive diffusion filter, while offering slower error propagation across
edges. We explore the action of this powerful operator in the context of image
processing and exploit for this purpose the wealth of knowledge accumulated in
physics and mathematics about the action and behavior of this operator. The
resulting methods are further generalized for color and/or texture image
processing, by embedding images in multidimensional manifolds. A specific
application of the proposed new approach to superresolution is outlined.
IMG18: Affine Legendre Moment Invariants for Image Watermarking Robust to Geometric Distortions
Abstract
Geometric distortions are generally simple and effective attacks for many watermarking
methods. They can make detection and extraction of the embedded watermark
difficult or even impossible by destroying the synchronization between the
watermark reader and the embedded watermark. In this paper, we propose a new
watermarking approach which allows watermark detection and extraction under
affine transformation attacks. The novelty of our approach stands on a set of
affine invariants we derived from Legendre moments. Watermark embedding and
detection are directly performed on this set of invariants. We also show how
these moments can be exploited for estimating the geometric distortion
parameters in order to permit watermark extraction. Experimental results show
that the proposed watermarking scheme is robust to a wide range of attacks:
geometric distortion, filtering, compression, and additive noise.
IMG19: A New Color Filter Array With Optimal Properties for Noiseless and Noisy Color Image Acquisition
Abstract
Digital color cameras acquire color images by means of a sensor on which a color filter
array (CFA) is overlaid. The Bayer CFA dominates the consumer market, but there
has recently been renewed interest in the design of CFAs. However,
robustness to noise is often neglected in the design, though it is crucial in
practice. In this paper, we present a new 2 × 3-periodic CFA which provides, by
construction, the optimal tradeoff between robustness to aliasing, chrominance
noise and luminance noise. Moreover, a simple and efficient linear demosaicking
algorithm is described, which fully exploits the spectral properties of the
CFA. Practical experiments confirm the superiority of our design, both in
noiseless and noisy scenarios.
IMG20: Topology Preserving Warping of 3-D Binary Images According to Continuous One-to-One Mappings
Abstract
The estimation of one-to-one mappings is one of the most intensively studied topics
in the research field of nonrigid registration. Although the computation of
such mappings can now be performed accurately and efficiently, the solutions
for using them in the context of binary image deformation are much less
satisfactory. In particular, warping a binary image with such transformations
may alter its discrete topological properties if common resampling strategies
are considered. In order to deal with this issue, this paper proposes a method
for warping such images according to continuous and bijective mappings while
preserving their discrete topological properties (i.e., their homotopy type).
Results obtained in the context of the atlas-based segmentation of complex
anatomical structures highlight the advantages of the proposed approach.
IMG21: Robust Spatiotemporal Matching of Electronic Slides to Presentation Videos
Abstract
We describe a robust and efficient method for automatically matching and
time-aligning electronic slides to videos of corresponding presentations.
Matching electronic slides to videos provides new methods for indexing,
searching, and browsing videos in distance-learning applications. However,
robust automatic matching is challenging due to varied frame composition, slide
distortion, camera movement, low-quality video capture, and arbitrary slide
sequences. Our fully automatic approach combines image-based matching of slides
to video frames with a temporal model for slide changes and camera events. To
address these challenges, we begin by extracting scale-invariant feature
transform (SIFT) keypoints from both slides and video frames, and
matching them subject to a consistent projective transformation (homography) by
using random sample consensus (RANSAC). We use the initial set of matches to
construct a background model and a binary classifier for separating video
frames showing slides from those without. We then introduce a new matching
scheme for exploiting less distinctive SIFT keypoints that enables us to tackle
more difficult images. Finally, we improve upon the matching based on visual
information by using estimated matching probabilities as part of a hidden
Markov model (HMM) that integrates temporal information and detected camera
operations. Detailed quantitative experiments characterize each part of our
approach and demonstrate an average accuracy of over 95% in 13 presentation
videos.
IMG22: Maintaining Temporal Coherence in Video Retargeting Using Mosaic-Guided Scaling
Abstract
Video retargeting from a full-resolution video to a lower resolution display will
inevitably cause information loss. Content-aware video retargeting techniques
have been studied to avoid critical visual information loss while resizing a
video. Maintaining the spatio-temporal coherence of a retargeted video is
critical to visual quality. Camera motions and object motions, however, usually
make it difficult to maintain temporal coherence using existing schemes. In
this paper, we propose the use of a panoramic mosaic to guide the scaling of
corresponding regions of video frames in a video shot to ensure good temporal
coherence. In the proposed method, after aligning video frames in a shot to a
panoramic mosaic constructed for the shot, a global scaling map for these
frames is derived from the panoramic mosaic. Subsequently, the local scaling
maps of individual frames are derived from the global map and are further refined
according to spatial coherence constraints. Our experimental results show that
the proposed method can effectively maintain temporal coherence so as to
achieve good visual quality even when a video contains camera motions and object
motions.
IMG23: Integer Computation of Lossy JPEG2000 Compression
Abstract
In this paper, an integer-based Cohen-Daubechies-Feauveau (CDF) 9/7 wavelet
transform as well as an integer quantization method used in a lossy JPEG2000
compression engine is presented. The conjunction of both an integer transform
and quantization step allows for a complete integer computation of lossy
JPEG2000 compression. The lossy method of compression utilizes the CDF 9/7
wavelet filter, which transforms integer input pixel values into floating-point
wavelet coefficients that are then quantized back into integers and finally
compressed by the embedded block coding with optimal truncation tier-1 encoder.
Integer computation of JPEG2000 allows a reduction in computational complexity
of the wavelet transform as well as ease of implementation in embedded systems
for higher computational performance. The results of the integer computation
show an equivalent rate/distortion curve to the JasPer JPEG2000 compression
engine, as well as a 30% reduction in computation time of the wavelet transform
and a 56% reduction in computation time of the quantization processing on
average.
IMG24: Hybrid No-Reference Natural Image Quality Assessment of Noisy, Blurry, JPEG2000, and JPEG Images
Abstract
In this paper, we propose a new image quality assessment method based on a hybrid
of curvelet, wavelet, and cosine transforms, called the hybrid no-reference (HNR)
model. From the properties of natural scene statistics, the peak coordinates of
the transformed coefficient histogram of filtered natural images occupy
well-defined clusters in peak coordinate space, which makes NR possible.
Compared to other methods, HNR has three benefits: 1) It is an NR method
applicable to arbitrary images without compromising the prediction accuracy of
full-reference methods; 2) as far as we know, it is the only general NR method
well suited for four types of filters: noise, blur, JPEG2000, and JPEG
compression; and 3) it can classify the filter types of the image and predict
filter levels even when the image results from the application of two
different filters. We tested HNR on very intensive video image database (our
image library) and Laboratory for Image & Video Engineering (a public
library). Results are compared to the state-of-the-art methods including peak
SNR, structural similarity, visual information fidelity, and so on.
IMG26: Fast Transforms for Acoustic Imaging - Part I: Theory
Abstract
The classical approach for acoustic imaging consists of beamforming, and produces
the source distribution of interest convolved with the array point spread
function. This convolution smears the image of interest, significantly reducing
its effective resolution. Deconvolution methods have been proposed to enhance
acoustic images and have produced significant improvements. Other proposals
involve covariance fitting techniques, which avoid deconvolution altogether.
However, in their traditional presentation, these enhanced reconstruction
methods have very high computational costs, mostly because they have no means
of efficiently transforming back and forth between a hypothetical image and the
measured data. In this paper, we propose the Kronecker Array Transform (KAT), a
fast separable transform for array imaging applications. Under the assumption
of a separable array, it enables the acceleration of imaging techniques by
several orders of magnitude with respect to the fastest previously available
methods, and enables the use of state-of-the-art regularized least-squares
solvers. Using the KAT, one can reconstruct images with higher resolutions than
was previously possible and use more accurate reconstruction techniques,
opening new and exciting possibilities for acoustic imaging.
IMG27: Depth No-Synthesis-Error Model for View Synthesis in 3-D Video
Abstract
Currently, 3-D video targets the application of disparity-adjustable stereoscopic
video, where view synthesis based on depth-image-based rendering (DIBR) is
employed to generate virtual views. Distortions in depth information may
introduce geometry changes or occlusion variations in the synthesized views. In
practice, depth information is stored in 8-bit grayscale format, whereas the
disparity range for a visually comfortable stereo pair is usually much less
than 256 levels. Thus, several depth levels may correspond to the same integer
(or sub-pixel) disparity value in the DIBR-based view synthesis such that some
depth distortions may not result in geometry changes in the synthesized view.
From this observation, we develop a depth no-synthesis-error (D-NOSE) model to
examine the allowable depth distortions in rendering a virtual view without
introducing any geometry changes. We further show that the depth distortions
prescribed by the proposed D-NOSE profile also do not compromise the occlusion
order in view synthesis. Therefore, a virtual view can be synthesized
losslessly if depth distortions follow the D-NOSE specified thresholds. Our
simulations validate the proposed D-NOSE model in lossless view synthesis and
demonstrate the gain with the model in depth coding.
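The observation that several 8-bit depth levels map to the same rounded disparity can be illustrated with a short sketch. The depth-to-disparity conversion below follows a commonly used convention (inverse depth interpolated linearly between a near and a far plane); the camera values, the round-to-nearest rule, and all function names are assumptions for illustration and are not taken from the paper.

```python
import math


def disparity(level, focal, baseline, z_near, z_far):
    """Real-valued disparity (in pixels) for an 8-bit depth level 0..255."""
    inv_z = level / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    return focal * baseline * inv_z


def quantized_disparity(level, camera, precision=1):
    """Disparity rounded to the nearest 1/precision pixel, as an integer index."""
    return math.floor(disparity(level, *camera) * precision + 0.5)


def no_error_interval(level, camera, precision=1):
    """Largest run of depth levels around `level` sharing its rounded disparity."""
    target = quantized_disparity(level, camera, precision)
    low = high = level
    while low > 0 and quantized_disparity(low - 1, camera, precision) == target:
        low -= 1
    while high < 255 and quantized_disparity(high + 1, camera, precision) == target:
        high += 1
    return low, high


# Hypothetical camera: focal length (pixels), baseline, nearest and farthest depth.
CAMERA = (1000.0, 0.05, 2.0, 10.0)

low, high = no_error_interval(128, CAMERA)
```

Any distortion that keeps depth level 128 inside `[low, high]` leaves its rounded disparity, and therefore the warped pixel position, unchanged. Passing `precision=2` or `precision=4` narrows the intervals, which corresponds to the sub-pixel case mentioned in the abstract.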
IMG28: Comparison of Texture Analysis Schemes Under Nonideal Conditions
Abstract
Several recent advancements in the field of texture analysis prompt some fundamental
questions. For instance, what is the true impact of these novel advancements
under real-world environments? When do these novel advancements fail to
perform? Which methods perform better and under what conditions? In this work,
we investigate these and other issues under nonideal image acquisition
environments, specifically, environments with changing conditions due to
illumination variations and those caused by both affine and nonaffine
transformations. We study the performance of nine popular texture analysis
algorithms using three different datasets, with varying levels of difficulty.
Experiments are performed on nonideal texture datasets under five different
setups. We find that most state-of-the-art techniques do not perform well under
these conditions. To a large extent, their performance under nonideal
conditions depends critically on the nature of the textural surface. Moreover,
most techniques fail to perform reliably when the number of classes in the
dataset is increased significantly, over the regular-size datasets used in
previous work. Multiscale features performed reasonably well against variations
caused by illumination and rotation but are prone to fail under changes in
scale. Surprisingly, the performance for most of the algorithms is generally
stable on structured or periodic textures, even with variations in illumination
or affine transformations.
IMG29: A New Scheme for Robust Gradient Vector Estimation in Color Images
Abstract
Gradient estimators are mostly designed to yield accurate and robust estimates of the
gradient magnitude, not the gradient direction. This paper proposes a method
for the accurate and robust estimation of both the gradient magnitude and
direction. It robustly estimates the gradient in the x- and y-directions.
The robustness against noise is achieved by prefiltering and postfiltering of
the gradient in each direction. To reduce edge blurring effects introduced by
these filters, the gradient in a certain direction is obtained by applying the
prefilter and postfilter in the perpendicular direction. The basic elements
employed in each window are: highpass, lowpass and aggregation operators. The
highpass operator is used as a gradient estimator, the lowpass operator is for
prefiltering and postfiltering, and the aggregation operator is for aggregating
the prefiltered and postfiltered gradients. Four different combinations of
highpass, lowpass and aggregation operators are proposed: MVD-Median-Mean,
MVD-Median-Max, RCMG-Median-Mean, and RCMG-Median-Max. Experimental results
show that the RCMG-Median-Mean has the best performance in estimating the
gradient and detecting the edges in noisy color images. It is computationally
more efficient than the state-of-the-art gradient estimators and is able to
accurately estimate the gradient direction as well as the gradient magnitude.
Computer simulation results show that the proposed method outperforms other
recently proposed color gradient estimators and edge detectors.
IMG30: DART: A Practical Reconstruction Algorithm for Discrete Tomography
Abstract
In this paper, we present an iterative reconstruction algorithm for discrete
tomography, called discrete algebraic reconstruction technique (DART). DART can
be applied if the scanned object is known to consist of only a few different
compositions, each corresponding to a constant gray value in the
reconstruction. Prior knowledge of the gray values for each of the compositions
is exploited to steer the current reconstruction towards a reconstruction that
contains only these gray values. Based on experiments with both simulated CT
data and experimental μCT data, it is shown that DART is capable of
computing more accurate reconstructions from a small number of projection
images, or from a small angular range, than alternative methods. It is also
shown that DART can deal effectively with noisy projection data and that the
algorithm is robust with respect to errors in the estimation of the gray values.
IMG31: Automatic Craniofacial Structure Detection on Cephalometric Images
Abstract
Anatomical structure tracing on cephalograms is a significant way to obtain cephalometric
analysis. Cephalometric analysis is divided into two categories, manual and
automatic approaches. The manual approach is limited in accuracy and
repeatability due to differences in inter- and intra-personal marking. In this
paper, we have attempted to develop and test a novel method for automatic
localization of craniofacial structures based on the detected edges in the
region of interest. Before edge detection, the region is filtered by an
adaptive nonlocal filter that removes noise while keeping the edge
information undisturbed. A modified Canny edge detection algorithm, adapted
to the gray-scale features of the different regions of the cephalograms, is
proposed for obtaining tissue contours. With the application of morphological
opening and edge linking approaches, an improved bidirectional contour tracing
methodology is proposed. After an interactive selection of the starting edge
pixels, the tracking process searches repetitively for an edge pixel in the
neighborhood of the previously found edge pixel to segment the image, and the
craniofacial structures are then obtained. The effectiveness of the algorithm is
demonstrated by the preliminary experimental results obtained with the proposed
method.
IMG32: Generating Descriptive Visual Words and Visual Phrases for Large-Scale Image Applications
Abstract
The bag-of-visual-words (BoWs) representation has been applied to various problems in the fields
of multimedia and computer vision. The basic idea is to represent images as
visual documents composed of repeatable and distinctive visual elements, which
are comparable to the text words. Notwithstanding its great success and wide
adoption, visual vocabulary created from single-image local descriptors is
often shown to be not as effective as desired. In this paper, descriptive
visual words (DVWs) and descriptive visual phrases (DVPs) are proposed as the
visual correspondences to text words and phrases, where visual phrases refer to
the frequently co-occurring visual word pairs. Since images are the carriers of
visual objects and scenes, a descriptive visual element set can be composed by
the visual words and their combinations which are effective in representing
certain visual objects or scenes. Based on this idea, a general framework is
proposed for generating DVWs and DVPs for image applications. In a large-scale
image database containing 1506 object and scene categories, the visual words
and visual word pairs descriptive to certain objects or scenes are identified
and collected as the DVWs and DVPs. Experiments show that the DVWs and DVPs are
informative and descriptive and, thus, are more comparable with the text words
than the classic visual words. We apply the identified DVWs and DVPs in several
applications including large-scale near-duplicated image retrieval, image
search re-ranking, and object recognition. The combination of DVW and DVP
performs better than the state of the art in large-scale near-duplicated image
retrieval in terms of accuracy, efficiency and memory consumption. The proposed
image search re-ranking algorithm, DWPRank, outperforms the state-of-the-art algorithm
by 12.4% in mean average precision and is about 11 times faster.
IMG33: Fast Bilateral Filter With Arbitrary Range and Domain Kernels
Abstract
In this paper, we present a fast implementation of the bilateral filter with
arbitrary range and domain kernels. It is based on the histogram-based fast
bilateral filter approximation that uses a uniform box as the domain kernel.
Instead of using a single box kernel, multiple box kernels are used and
optimally combined to approximate an arbitrary domain kernel. The method
achieves better approximation of the bilateral filter compared to the single
box kernel version with little increase in computational complexity. We also
derive the optimal kernel size when a single box kernel is used.
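For reference, the filter that such fast methods approximate can be written as a direct, brute-force bilateral filter that accepts arbitrary domain and range kernels. The sketch below is this slow reference form only, not the histogram-based or multiple-box approximation described in the abstract; the function names are illustrative, and the kernels are assumed to be positive at zero so that the normalizing weight never vanishes.

```python
import math


def bilateral_filter(image, radius, domain_kernel, range_kernel):
    """Brute-force bilateral filter on a 2-D list of gray values."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            center = image[i][j]
            weighted_sum = 0.0
            weight_total = 0.0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < rows and 0 <= nj < cols:
                        neighbor = image[ni][nj]
                        # Weight = spatial closeness * intensity similarity.
                        w = domain_kernel(di, dj) * range_kernel(neighbor - center)
                        weighted_sum += w * neighbor
                        weight_total += w
            out[i][j] = weighted_sum / weight_total
    return out


def box_domain(di, dj):
    """Uniform box kernel: every pixel in the window counts equally."""
    return 1.0


def gaussian_domain(sigma):
    return lambda di, dj: math.exp(-(di * di + dj * dj) / (2.0 * sigma * sigma))


def gaussian_range(sigma):
    return lambda diff: math.exp(-(diff * diff) / (2.0 * sigma * sigma))


# Hypothetical step-edge image: the edge should survive the smoothing.
step = [[0.0, 0.0, 0.0, 100.0, 100.0, 100.0]] * 3
smoothed = bilateral_filter(step, 1, gaussian_domain(1.0), gaussian_range(5.0))
```

Swapping `gaussian_domain(1.0)` for `box_domain` gives the single-box variant that the abstract names as its starting point.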
IMG34: Computational Color Constancy: Survey and Experiments
Abstract
Computational color constancy is a fundamental prerequisite for many computer vision
applications. This paper presents a survey of many recent developments and
state-of-the-art methods. Several criteria are proposed that are used to assess
the approaches. A taxonomy of existing algorithms is proposed and methods are
separated into three groups: static methods, gamut-based methods, and
learning-based methods. Further, the experimental setup is discussed including
an overview of publicly available datasets. Finally, various freely available
methods, of which some are considered to be state of the art, are evaluated on
two datasets.
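As an illustration of the simplest of these families, the sketch below implements the Gray-World assumption, a commonly cited static method: the average color of a scene is taken to be achromatic, so the per-channel means serve as the illuminant estimate. The abstract does not name this method, so it is offered only as a representative example; the function name and the assumption that no channel mean is zero are ours.

```python
def gray_world(pixels):
    """Estimate the illuminant and correct a list of (r, g, b) pixels.

    Returns (channel_means, corrected_pixels). Assumes every channel has a
    nonzero mean.
    """
    count = len(pixels)
    means = [sum(p[c] for p in pixels) / count for c in range(3)]
    gray = sum(means) / 3.0
    # Scale each channel so that its mean matches the common gray level.
    gains = [gray / m for m in means]
    corrected = [tuple(p[c] * gains[c] for c in range(3)) for p in pixels]
    return means, corrected


# Hypothetical reddish image made of two pixels.
estimate, balanced = gray_world([(200.0, 100.0, 50.0), (100.0, 50.0, 25.0)])
```

After correction, the three channel means coincide, which is exactly what the Gray-World assumption demands.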
IMG35: A No-Reference Image Blur Metric Based on the Cumulative Probability of Blur Detection (CPBD)
Abstract
This paper presents a no-reference image blur metric that is based on the study of
human blur perception for varying contrast values. The metric utilizes a
probabilistic model to estimate the probability of detecting blur at each edge
in the image, and then the information is pooled by computing the cumulative
probability of blur detection (CPBD). The performance of the metric is
demonstrated by comparing it with existing no-reference sharpness/blurriness
metrics for various publicly available image databases.
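The pooling step can be sketched as follows, assuming that edge widths and the contrast-dependent just-noticeable-blur (JNB) widths have already been measured. The psychometric form of the detection probability, the exponent `beta = 3.6`, and the threshold `1 - exp(-1)` (about 63%) are assumptions based on how the metric is commonly described; they are not stated in the abstract and should be checked against the paper. Edge detection and width measurement are omitted.

```python
import math


def blur_detection_probability(edge_width, jnb_width, beta=3.6):
    """Probability of detecting blur at one edge (assumed psychometric form)."""
    return 1.0 - math.exp(-abs(edge_width / jnb_width) ** beta)


def cpbd(edge_widths, jnb_widths, beta=3.6):
    """Fraction of edges whose blur-detection probability stays at or below
    the just-noticeable level. Higher values indicate a sharper image.

    Assumes at least one edge is supplied.
    """
    p_jnb = 1.0 - math.exp(-1.0)  # probability reached when width == JNB width
    probabilities = [
        blur_detection_probability(w, w_jnb, beta)
        for w, w_jnb in zip(edge_widths, jnb_widths)
    ]
    below = sum(1 for p in probabilities if p <= p_jnb)
    return below / len(probabilities)


# Hypothetical measurements: two sharp edges and two blurred ones.
score = cpbd([2.0, 3.0, 8.0, 10.0], [5.0, 5.0, 5.0, 5.0])
```

An edge whose width equals its JNB width sits exactly at the threshold, so the score counts the share of edges at which blur is unlikely to be noticed.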
IMG36: A Closed-Form Approximation of the Exact Unbiased Inverse of the Anscombe Variance-Stabilizing Transformation
Abstract
We presented an exact unbiased inverse of the Anscombe variance-stabilizing transformation
in [M. Mäkitalo and A. Foi, “Optimal inversion of the Anscombe transformation
in low-count Poisson image denoising,” IEEE Trans. Image Process., vol. 20, no.
1, pp. 99–109, Jan. 2011.] and showed that when applied to Poisson image
denoising, the combination of variance stabilization and state-of-the-art
Gaussian denoising algorithms is competitive with some of the best Poisson
denoising algorithms. We also provided a MATLAB implementation of our method,
where the exact unbiased inverse transformation appears in nonanalytical form.
Here, we propose a closed-form approximation of the exact unbiased inverse in
order to facilitate the use of this inverse. The proposed approximation
produces results equivalent to those obtained with the accurate (nonanalytical)
exact unbiased inverse, and thus, notably better than one would get with the
asymptotically unbiased inverse transformation that is commonly used in
applications.
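The bias that motivates an exact unbiased inverse can be checked numerically. The sketch below implements the forward Anscombe transformation together with the algebraic and asymptotically unbiased inverses, and evaluates them on the expected transformed value of a Poisson variable. The expression in `inverse_closed_form` is our recollection of the published closed-form approximation and should be verified against the paper before use; all function names are illustrative.

```python
import math


def anscombe(x):
    """Forward Anscombe variance-stabilizing transformation."""
    return 2.0 * math.sqrt(x + 3.0 / 8.0)


def inverse_algebraic(d):
    """Direct algebraic inverse of the forward transformation."""
    return (d / 2.0) ** 2 - 3.0 / 8.0


def inverse_asymptotic(d):
    """Asymptotically unbiased inverse, accurate only for large counts."""
    return (d / 2.0) ** 2 - 1.0 / 8.0


def inverse_closed_form(d):
    """Closed-form approximation of the exact unbiased inverse
    (assumed form, recalled from the literature; verify before use)."""
    root = math.sqrt(3.0 / 2.0)
    return (d * d / 4.0 + root / (4.0 * d) - 11.0 / (8.0 * d * d)
            + 5.0 * root / (8.0 * d ** 3) - 1.0 / 8.0)


def expected_anscombe(mean, terms=200):
    """E[anscombe(X)] for X ~ Poisson(mean), summing the first `terms` terms."""
    pmf = math.exp(-mean)
    total = 0.0
    for k in range(terms):
        total += pmf * anscombe(k)
        pmf *= mean / (k + 1)
    return total


# Ideal denoising returns E[anscombe(X)]; a good inverse should map it to the mean.
d_low = expected_anscombe(1.0)
estimates = (inverse_algebraic(d_low), inverse_asymptotic(d_low),
             inverse_closed_form(d_low))
```

At a Poisson mean of 1, the algebraic inverse underestimates the mean and the asymptotic inverse overestimates it, while the closed-form expression lands much closer. At large means, the last two nearly coincide.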
IMG37: A Bayesian Network Model for Automatic and Interactive Image Segmentation
Abstract
We propose a new Bayesian network (BN) model for both automatic and interactive
image segmentation. A multilayer BN is constructed from an oversegmentation to
model the statistical dependencies among superpixel regions, edge segments,
vertices, and their measurements. The BN also incorporates various local
constraints to further restrain the relationships among these image entities.
Given the BN model and various image measurements, belief propagation is
performed to update the probability of each node. Image segmentation is
generated by the most probable explanation inference of the true states of both
region and edge nodes from the updated BN. Besides the automatic image
segmentation, the proposed model can also be used for interactive image segmentation.
While existing interactive segmentation (IS) approaches often passively depend
on the user to provide exact intervention, we propose a new active input
selection approach to provide suggestions for the user's intervention. Such
intervention can be conveniently incorporated into the BN model to perform
active IS. We evaluate the proposed model on both the Weizmann dataset and
VOC2006 cow images. The results demonstrate that the BN model can be used for
automatic segmentation, and more importantly, for active IS. The experiments
also show that the IS with active input selection can improve both the overall
segmentation accuracy and efficiency over the IS with passive intervention.
IMG38: A Flexible Content-Adaptive Mesh-Generation Strategy for Image Representation
Abstract
Based on the greedy-point removal (GPR) scheme of Demaret and Iske, a simple yet
highly effective framework for constructing triangle-mesh representations of
images, called GPRFS, is proposed. By using this framework and ideas from the
error diffusion (ED) scheme (for mesh-generation) of Yang, a highly effective
mesh-generation method, called GPRFS-ED, is derived and presented. Since the ED
scheme plays a crucial role in our work, factors affecting the performance of
this scheme are also studied in detail. Through experimental results, our
GPRFS-ED method is shown to be capable of generating meshes of quality
comparable to, and in many cases better than, the state-of-the-art GPR scheme,
while requiring substantially less computation and memory. Furthermore, with
our GPRFS-ED method, one can easily trade off between mesh quality and
computational/memory complexity. A reduced-complexity version of the GPRFS-ED
method (called GPRFS-MED) is also introduced to further demonstrate the
computational/memory-complexity scalability of our GPRFS-ED method.
IMGV39: High Capacity Color Barcodes: Per Channel Data Encoding via Orientation Modulation in Elliptical Dot Arrays
We present a new high capacity color barcode. The barcode we propose uses the
cyan, magenta, and yellow (C,M,Y) colorant separations available in color
printers and enables high capacity by independently encoding data in each of
these separations. In each colorant channel, payload data is conveyed by using
a periodic array of elliptically shaped dots whose individual orientations are
modulated to encode the data. The orientation based data encoding provides
beneficial robustness against printer and scanner tone variations. The overall
color barcode is obtained when these color separations are printed in overlay
as is common in color printing. A reader recovers the barcode data from a
conventional color scan of the barcode, using red, green, and blue (R,G,B)
channels complementary, respectively, to the print C, M, and Y channels. For
each channel, first the periodic arrangement of dots is exploited at the
reader to enable synchronization by compensating for both global
rotation/scaling in scanning and local distortion in printing. To overcome the
color interference resulting from colorant absorptions in noncomplementary
scanner channels, we propose a novel interference minimizing data encoding
approach and a statistical channel model (at the reader) that captures the
characteristics of the interference, enabling more accurate data recovery. We
also employ an error correction methodology that effectively utilizes the
channel model. The experimental results show that the proposed method works
well, offering (error-free) operational rates that are comparable to or better
than the highest capacity barcodes known in the literature.
IMGV40: ViBe: A Universal Background Subtraction Algorithm for Video Sequences
This paper presents a technique for motion detection that incorporates several
innovative mechanisms. For example, our proposed technique stores, for each
pixel, a set of values taken in the past at the same location or in the
neighborhood. It then compares this set to the current pixel value in order to
determine whether that pixel belongs to the background, and adapts the model
by choosing randomly which values to substitute from the background model.
This approach differs from those based upon the classical belief that the
oldest values should be replaced first. Finally, when the pixel is found to be
part of the background, its value is propagated into the background model of a
neighboring pixel. We describe our method in full detail (including
pseudo-code and the parameter values used) and compare it to other background
subtraction techniques. Efficiency figures show that our method outperforms
recent and proven state-of-the-art methods in terms of both computation speed
and detection rate. We also analyze the performance of a version of our
algorithm downscaled to the absolute minimum of one comparison and one byte of
memory per pixel. It appears that even such a simplified version of our
algorithm performs better than mainstream techniques.
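The three mechanisms described above (a per-pixel sample set, random replacement instead of oldest-first replacement, and propagation into a neighboring model) can be sketched for grayscale frames as follows. This is a simplified illustration, not the authors' pseudo-code: the default parameter values (20 samples, matching radius 20, 2 required matches, update probability 1/16) are assumed, the models are initialized from each pixel's own first value rather than from its neighborhood, and all names are hypothetical.

```python
import random


class PixelModel:
    """Sample-based background model for one grayscale pixel."""

    def __init__(self, value, num_samples=20):
        # Simplification: fill every slot with the pixel's first value.
        self.samples = [value] * num_samples

    def is_background(self, value, radius=20, min_matches=2):
        matches = sum(1 for s in self.samples if abs(s - value) < radius)
        return matches >= min_matches

    def replace_random_sample(self, value, rng):
        # Random replacement: any stored sample may be overwritten,
        # regardless of its age.
        self.samples[rng.randrange(len(self.samples))] = value


def init_models(frame):
    return [[PixelModel(v) for v in row] for row in frame]


def process_frame(models, frame, rng, subsampling=16):
    """Return a foreground mask and update the models in place."""
    rows, cols = len(frame), len(frame[0])
    mask = [[False] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            value = frame[i][j]
            if not models[i][j].is_background(value):
                mask[i][j] = True  # foreground values never enter a model
                continue
            if rng.randrange(subsampling) == 0:
                models[i][j].replace_random_sample(value, rng)
            if rng.randrange(subsampling) == 0:
                # Propagate into a random nearby model (clamped at borders;
                # the draw may occasionally select the pixel itself).
                ni = min(rows - 1, max(0, i + rng.choice((-1, 0, 1))))
                nj = min(cols - 1, max(0, j + rng.choice((-1, 0, 1))))
                models[ni][nj].replace_random_sample(value, rng)
    return mask


# Hypothetical usage: a static scene with one pixel that suddenly changes.
rng = random.Random(0)
first = [[100] * 4 for _ in range(3)]
models = init_models(first)
frame = [row[:] for row in first]
frame[1][2] = 200
mask = process_frame(models, frame, rng)
```

Because only values classified as background are ever written into a model, a foreground object is not absorbed by its own appearance; it can fade into the background only through propagation from neighboring pixels.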
IMGV41: GRISSOM Platform: Enabling Distributed Processing and Management of Biological Data Through Fusion of Grid and Web Technologies
Transcriptomic technologies have a critical impact on the revolutionary changes
that reshape biological research. Through the recruitment of novel
high-throughput instrumentation and advanced computational methodologies, an
unprecedented wealth of quantitative data is produced. Microarray experiments
are considered high-throughput, both in terms of data volumes (data intensive)
and processing complexity (computationally intensive). In this paper, we
present grids for in silico systems biology and medicine (GRISSOM), a
web-based application that exploits GRID infrastructures for distributed data
processing and management of DNA microarrays (cDNA, Affymetrix, Illumina)
through a generic, consistent, computational analysis framework. GRISSOM
performs versatile annotation and integrative analysis tasks through the use
of third-party application programming interfaces, delivered as web services.
In parallel, by conforming to service-oriented architectures, it can be
encapsulated in other biomedical processing workflows with the help of
workflow enacting software, like Taverna Workbench, thus rendering access to
its algorithms transparent and generic. GRISSOM aims to set a generic paradigm
of efficient metamining that promotes translational research in biomedicine,
through the fusion of grid and semantic web computing technologies.
IMGV42: MR Image Reconstruction From Highly Undersampled k-Space Data by Dictionary Learning
Compressed sensing (CS) utilizes the sparsity of magnetic resonance (MR) images
to enable accurate reconstruction from undersampled k-space data. Recent CS
methods have employed analytical sparsifying transforms such as wavelets,
curvelets, and finite differences. In this paper, we propose a novel framework
for adaptively learning the sparsifying transform (dictionary), and
reconstructing the image simultaneously from highly undersampled k-space data.
The sparsity in this framework is enforced on overlapping image patches
emphasizing local structure. Moreover, the dictionary is adapted to the
particular image instance thereby favoring better sparsities and consequently
much higher undersampling rates. The proposed alternating reconstruction
algorithm learns the sparsifying dictionary, and uses it to remove aliasing
and noise in one step, and subsequently restores and fills in the k-space data
in the other step. Numerical experiments are conducted on MR images and on
real MR data of several anatomies with a variety of sampling schemes. The
results demonstrate dramatic improvements on the order of 4-18 dB in
reconstruction error and doubling of the acceptable undersampling factor using
the proposed adaptive dictionary as compared to previous CS methods. These
improvements persist over a wide range of practical data signal-to-noise
ratios, without any parameter tuning.
IMGV43: Game-Theoretic Strategies and Equilibriums in Multimedia Fingerprinting Social Networks
A multimedia social network is a network infrastructure in which the social
network users share multimedia contents for all kinds of purposes. Analyzing
user behavior in multimedia social networks helps design more secure and
efficient multimedia and networking systems. Multimedia fingerprinting
protects multimedia from illegal alterations, and multiuser collusion is a
cost-effective attack against it. The colluder social network is naturally
formed during multiuser collusion, in which colluders gain reward by
redistributing the colluded multimedia contents. Since the colluders have
conflicting interests, the maximal-payoff collusion for one colluder may not be
the maximal-payoff collusion for others. Hence, before a collusion can succeed,
the colluders must bargain with each other to reach agreements. We first model
the bargaining behavior among colluders as a noncooperative game and study
four different bargaining solutions of this game. Moreover, the market value
of the redistributed multimedia content is often time-sensitive. The earlier
the colluded copy is released, the more people are willing to pay for it.
Thus, the colluders have to reach agreements on how to distribute reward and
risk among themselves as soon as possible. This paper further incorporates
this time-sensitiveness of the colluders' reward and studies the
time-sensitive bargaining equilibrium. The study in this paper reveals the
strategies that are optimal for the colluders; thus, all the colluders have no
incentive to disagree. Such understanding reduces the possible types of
collusion into a small finite set.
IMGV44: Moving Region Segmentation From Compressed Video Using Global Motion Estimation and Markov Random Fields
In this paper, we propose an unsupervised segmentation algorithm for extracting
moving regions from compressed video using global motion estimation (GME) and
Markov random field (MRF) classification. First, motion vectors (MVs) are
compensated from global motion and quantized into several representative
classes, from which MRF priors are estimated. Then, a coarse segmentation map
of the MV field is obtained using a maximum a posteriori estimate of the MRF
label process. Finally, the boundaries of segmented moving regions are refined
using color and edge information. The algorithm has been validated on a number
of test sequences, and experimental results are provided to demonstrate its
advantages over state-of-the-art methods.
IMGV44: Towards Brain First-Aid: A Diagnostic Device for Conscious Awareness
When the brain is damaged, evaluating an individual's level of awareness can be
a major diagnostic challenge (Is he or she in there?). Existing tests typically
rely on behavioral indicators, which are incorrect in as many as one out of
every two cases. The current paper presents a diagnostic device that addresses
this problem. The technology circumvents behavioral limitations through
noninvasive brain wave measurements (electroencephalography, or EEG). Unlike
traditional EEG, the device is designed for point-of-care use by incorporating
a portable, user-friendly, and stable design. It uses a novel software
algorithm that automates subject stimulation, data acquisition/analysis, and
the reporting of results. The test provides indicators for five identifiable
levels of neural processing: sensation, perception, attention, memory, and
language. The results are provided as rapidly obtained diagnostic,
reliability, validity, and prognostic scores. The device can be applied to a
wide variety of patients across a host of different environments. The
technology is designed to be wireless-enabled for remote monitoring and
assessment capabilities. In essence, the device is developed to scan for
conscious awareness in order to optimize subsequent patient care.
IMGV45: Size-Controllable Region-of-Interest in Scalable Image Representation
Differentiating the region-of-interest (ROI) from non-ROI in an image in terms of relative size as well as fidelity is becoming an important functionality for future visual communication environments with a variety of display devices. In this paper, we propose a scalable image representation with the ROI functionality in the spatial domain, which allows us to generate a hierarchy of images with arbitrary sizes. The ROI functionality of our scalable representation is a result of a nonuniform grid transformation in the spatial domain, where only the center of the ROI and an expansion parameter need to be known. Our grid transformation guarantees no loss of information within the area of the ROI.
IMGV46: Optimizing a Tone Curve for Backward-Compatible High Dynamic Range Image and Video Compression
For backward-compatible high dynamic range (HDR) video compression, the HDR sequence is reconstructed by inverse tone-mapping a compressed low dynamic range (LDR) version of the original HDR content. In this paper, we show that the appropriate choice of a tone-mapping operator (TMO) can significantly improve the reconstructed HDR quality. We develop a statistical model that approximates the distortion resulting from the combined processes of tone-mapping and compression. Using this model, we formulate a numerical optimization problem to find the tone curve that minimizes the expected mean square error (MSE) in the reconstructed HDR sequence. We also develop a simplified model that reduces the computational complexity of the optimization problem to a closed-form solution. Performance evaluations show that the proposed methods provide superior performance in terms of HDR MSE and SSIM compared to existing tone-mapping schemes. It is also shown that the LDR image quality resulting from the proposed methods matches that produced by perceptually-based TMOs.
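The reconstruction path described above (tone-map to LDR, degrade, inverse tone-map, measure MSE) can be illustrated with a minimal sketch. It is not the optimized tone curve of the paper: the TMO here is an assumed plain logarithmic curve, compression is replaced by uniform quantization of the LDR codes, and the error is measured in the log-luminance domain, all of which are assumptions made only for illustration.

```python
import math

def tone_map(lum, l_min, l_max, levels):
    """Map HDR luminance to an integer LDR code with a log curve (assumed TMO)."""
    span = math.log10(l_max) - math.log10(l_min)
    t = (math.log10(lum) - math.log10(l_min)) / span
    return min(levels - 1, max(0, round(t * (levels - 1))))

def inverse_tone_map(code, l_min, l_max, levels):
    """Reconstruct HDR luminance from an LDR code by inverting the log curve."""
    span = math.log10(l_max) - math.log10(l_min)
    return 10 ** (math.log10(l_min) + (code / (levels - 1)) * span)

def log_mse(hdr, levels):
    """Mean squared error in log10 luminance after the LDR round trip."""
    l_min, l_max = min(hdr), max(hdr)
    total = 0.0
    for lum in hdr:
        rec = inverse_tone_map(tone_map(lum, l_min, l_max, levels), l_min, l_max, levels)
        total += (math.log10(lum) - math.log10(rec)) ** 2
    return total / len(hdr)

# Synthetic HDR samples spanning almost six decades of luminance.
hdr = [0.01 * (1.07 ** i) for i in range(200)]
mse_8bit = log_mse(hdr, levels=256)  # 8-bit LDR layer
mse_5bit = log_mse(hdr, levels=32)   # coarser LDR layer, standing in for stronger compression
```

A coarser LDR layer yields a larger reconstruction error for the same curve, which is the quantity a tone-curve optimization of this kind would try to minimize.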
IMGV47: Retinal Image Analysis Using Curvelet Transform and Multistructure Elements Morphology by Reconstruction
Retinal images can be used in several applications, such as ocular fundus operations as well as human recognition. They also play important roles in the detection of some diseases in early stages, such as diabetes, which can be performed by comparing the states of retinal blood vessels. Intrinsic characteristics of retinal images make the blood vessel detection process difficult. Here, we propose a new algorithm to detect the retinal blood vessels effectively. Due to the high ability of the curvelet transform in representing edges, modification of the curvelet transform coefficients to enhance the retinal image edges better prepares the image for the segmentation part. The directionality feature of the multistructure elements method makes it an effective tool in edge detection. Hence, morphology operators using multistructure elements are applied to the enhanced image in order to find the retinal image ridges. Afterward, morphological operators by reconstruction eliminate the ridges not belonging to the vessel tree while trying to preserve the thin vessels unchanged. In order to increase the efficiency of the morphological operators by reconstruction, they were applied using multistructure elements. A simple thresholding method along with connected components analysis (CCA) identifies the remaining ridges belonging to vessels. In order to utilize CCA more efficiently, we applied the CCA and length filtering locally instead of considering the whole image. Experimental results on a known database, DRIVE, achieving more than 94% accuracy in about 50 s for blood vessel detection, show that the blood vessels can be effectively detected by applying our method to the retinal images.
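The final stage (thresholding followed by connected components analysis and length filtering) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses 8-connectivity, approximates component length by pixel count, and processes the whole toy grid at once, whereas the abstract describes applying CCA locally. The function names and the `min_size` parameter are hypothetical.

```python
from collections import deque

def connected_components(mask):
    """Return the 8-connected foreground components of a binary grid as pixel lists."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, comp = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                comp.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(comp)
    return comps

def threshold_and_filter(image, thresh, min_size):
    """Threshold a ridge-strength image and keep only components of at least min_size pixels."""
    mask = [[v > thresh for v in row] for row in image]
    kept = [[0] * len(image[0]) for _ in image]
    for comp in connected_components(mask):
        if len(comp) >= min_size:
            for y, x in comp:
                kept[y][x] = 1
    return kept

# Toy ridge map: one isolated bright pixel (noise) and one diagonal ridge.
image = [
    [9, 0, 0, 0, 0],
    [0, 0, 8, 0, 0],
    [0, 0, 0, 8, 0],
    [0, 0, 0, 0, 8],
]
vessels = threshold_and_filter(image, thresh=5, min_size=3)
```

The isolated pixel is discarded by the size filter while the three-pixel diagonal ridge survives, which mirrors the role of length filtering in removing ridges that do not belong to the vessel tree.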
BIOMEDICINE
IMGV48: Hybrid Genetic and Variational Expectation-Maximization Algorithm for Gaussian-Mixture-Model-Based Brain MR Image Segmentation
The expectation-maximization (EM) algorithm has been widely applied to the estimation of the Gaussian mixture model (GMM) in brain MR image segmentation. However, the EM algorithm is deterministic and intrinsically prone to overfitting the training data and being trapped in local optima. In this paper, we propose a hybrid genetic and variational EM (GA-VEM) algorithm for brain MR image segmentation. In this approach, the VEM algorithm is performed to estimate the GMM, and the GA is employed to initialize the hyperparameters of the conjugate prior distributions of the GMM parameters involved in the VEM algorithm. Since GA has the potential to achieve global optimization and VEM can steadily avoid overfitting, the hybrid GA-VEM algorithm is capable of overcoming the drawbacks of traditional EM-based methods. We compared our approach to the EM-based, VEM-based, and GA-EM-based segmentation algorithms, and the segmentation routines used in the statistical parametric mapping package and FMRIB Software Library in 20 low-resolution and 17 high-resolution brain MR studies. Our results show that the proposed approach can substantially improve the performance of brain MR image segmentation.
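For context, the baseline that the abstract contrasts against, plain EM for a Gaussian mixture, can be sketched in one dimension as below. This is not the GA-VEM method: there are no priors, no variational step, and no genetic initialization. The initial means are supplied by hand, which is exactly the dependence on initialization that the abstract identifies as a weakness. The function name and the toy intensities are assumptions.

```python
import math

def em_gmm_1d(data, init_means, iters=50):
    """Plain EM for a 1-D Gaussian mixture, started from the given means."""
    k = len(init_means)
    means = list(init_means)
    overall = sum(data) / len(data)
    var0 = sum((x - overall) ** 2 for x in data) / len(data)
    variances = [var0] * k   # start every component at the overall data variance
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                1e-6,
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
            )
    return weights, means, variances

# Toy voxel intensities drawn from two tissue classes around 10 and 50.
data = [9, 10, 11, 10, 9.5, 10.5, 49, 50, 51, 50, 49.5, 50.5]
weights, means, variances = em_gmm_1d(data, init_means=[0.0, 60.0])

# Hard segmentation: assign each sample to the nearest estimated mean.
labels = [min(range(2), key=lambda j: abs(x - means[j])) for x in data]
```

With well-separated classes the two means settle near 10 and 50; on real MR data with overlapping intensity distributions the outcome depends much more on the starting point.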
MEDICAL IMAGE
IMGV49: Vessel Boundary Delineation on Fundus Images Using Graph-Based Approach
This paper proposes an algorithm to measure the width of retinal vessels in fundus photographs using a graph-based algorithm to segment both vessel edges simultaneously. First, the simultaneous two-boundary segmentation problem is modeled as a two-slice, 3-D surface segmentation problem, which is further converted into the problem of computing a minimum closed set in a node-weighted graph. An initial segmentation is generated from a vessel probability image. We use the REVIEW database to evaluate diameter measurement performance. The algorithm is robust and estimates the vessel width with subpixel accuracy. The method is used to explore the relationship between the average vessel width and the distance from the optic disc in 600 subjects.
MULTIMEDIA
IMGV50: Routing-Aware Multiple Description Video Coding Over Mobile Ad-Hoc Networks
This paper proposes a cross-layer approach called routing-aware multiple description coding with multipath transport to support video communications over mobile ad-hoc networks. This approach establishes a packet loss model based on the MAC access mechanism and network parameters, and utilizes it along with the routing messages from multipath routing to estimate the packet loss probability of transmitted video packets. Then the estimated results are passed to the application layer to assist reference frame selection for multiple description coding in order to mitigate error propagation introduced in the motion-compensated loop. Results show that this is an effective approach to improve error resilience of video transmission over mobile ad-hoc networks and enhance the video experience for multiple users.
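The idea of feeding path-level loss estimates into reference frame selection can be illustrated with a minimal sketch. It assumes independent per-hop losses, so that the end-to-end loss of a path is one minus the product of the per-hop delivery probabilities; the paper's MAC-based loss model is not reproduced here, and the selection rule, function names, and frame labels are hypothetical.

```python
def path_loss_probability(hop_losses):
    """End-to-end packet loss of a path, assuming independent per-hop losses."""
    delivered = 1.0
    for p in hop_losses:
        delivered *= (1.0 - p)
    return 1.0 - delivered

def pick_reference(candidates):
    """Choose the candidate reference frame whose carrying path has the lowest loss.

    candidates maps a frame label to the per-hop loss list of the path
    that carried it (hypothetical selection rule).
    """
    return min(candidates, key=lambda f: path_loss_probability(candidates[f]))

# Two candidate reference frames, each sent over a different path.
candidates = {
    "frame_n-1_via_path_A": [0.05, 0.10],        # two lossy hops
    "frame_n-2_via_path_B": [0.02, 0.02, 0.02],  # three cleaner hops
}
loss_a = path_loss_probability(candidates["frame_n-1_via_path_A"])
loss_b = path_loss_probability(candidates["frame_n-2_via_path_B"])
best = pick_reference(candidates)
```

Under these numbers the longer path is still the more reliable one, so the encoder would predict from the older frame to limit error propagation.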
IMGV51: Scalable Video Multicast in Hybrid 3G/Ad-Hoc Networks
Mobile video broadcasting service, or mobile TV, is expected to become a popular application for 3G wireless network operators. Most existing solutions for video Broadcast Multicast Services (BCMCS) in 3G networks employ a single transmission rate to cover all viewers. The system-wide video quality of the cell is therefore throttled by a few viewers close to the boundary, and is far from reaching the social optimum allowed by the radio resources available at the base station. In this paper, we propose a novel scalable video broadcast/multicast solution, SV-BCMCS, that efficiently integrates scalable video coding, 3G broadcast, and ad-hoc forwarding to balance the system-wide and worst-case video quality of all viewers in a 3G cell. We solve the optimal resource allocation problem in SV-BCMCS and develop practical helper discovery and relay routing algorithms. Moreover, we analytically study the gain of using ad-hoc relay, in terms of users' effective distance to the base station. Through extensive simulations driven by real video sequences, we show that SV-BCMCS significantly improves the system-wide perceived video quality. The users' average PSNR increases by as much as 1.70 dB, with slight quality degradation for the few users close to the 3G cell boundary.
NEURAL NETWORK
IMGV52: A New Supervised Method for Blood Vessel Segmentation in Retinal Images by Using Gray-Level and Moment Invariants-Based Features
This paper presents a new supervised method for blood vessel detection in digital retinal images. This method uses a neural network (NN) scheme for pixel classification and computes a 7-D vector composed of gray-level and moment invariants-based features for pixel representation. The method was evaluated on the publicly available DRIVE and STARE databases, widely used for this purpose, since they contain retinal images where the vascular structure has been precisely marked by experts. Method performance on both sets of test images is better than that of other existing solutions in the literature. The method proves especially accurate for vessel detection in STARE images. Its application to this database (even when the NN was trained on the DRIVE database) outperforms all analyzed segmentation approaches. Its effectiveness and robustness with different image conditions, together with its simplicity and fast implementation, make this blood vessel segmentation proposal suitable for retinal image computer analyses such as automated screening for early diabetic retinopathy detection.
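To make the notion of moment invariants-based features concrete, the sketch below computes the first two Hu moment invariants of a small gray-level patch. It is an illustration only: the abstract does not say which invariants or which gray-level features make up the 7-D vector, and the function name and toy patch are assumptions.

```python
def hu_first_two(patch):
    """First two Hu moment invariants of a 2-D gray-level patch (list of rows)."""
    m00 = sum(sum(row) for row in patch)
    # Intensity-weighted centroid of the patch.
    xc = sum(x * v for row in patch for x, v in enumerate(row)) / m00
    yc = sum(y * v for y, row in enumerate(patch) for v in row) / m00

    def eta(p, q):
        """Normalized central moment of order (p, q)."""
        mu = sum(((x - xc) ** p) * ((y - yc) ** q) * v
                 for y, row in enumerate(patch) for x, v in enumerate(row))
        return mu / m00 ** ((p + q) / 2 + 1)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2

# Toy patch containing a short vessel-like segment.
patch = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 1],
]
# The same patch rotated by 90 degrees clockwise.
rotated = [list(row) for row in zip(*patch[::-1])]

features = hu_first_two(patch)
features_rotated = hu_first_two(rotated)
```

Both patches give the same two values, which is why such invariants are attractive for describing vessel pixels regardless of local vessel orientation.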