The presence of shadow in an image is a major problem associated with various visual processing applications such as object recognition, traffic surveillance and segmentation. In this paper, we introduce a method to remove the shadow from a real image using the morphological diversities of shadows and sparse representation. The proposed approach first generates an invariant image and further processing is applied to the invariant image. Here, shadow removal is formulated as a decomposition problem that uses separate local dictionaries for shadow and nonshadow parts, without using single global or fixed generic dictionary. These local dictionaries are constructed from the patches extracted from the residual of the image obtained after invariant image formation. Finally, noniterative Morphological Component Analysisbased image decomposition using local dictionaries is performed to add the geometric component to the nonshadow part of the image so as to obtain shadow free version of the input image. The proposed approach of shadow removal works well for indoor and outdoor images, and the performance has been compared with previous methods and found to be better in terms of RMSE.
Shadow ; Dictionary learning ; Shadow removal ; Sparse coding ; MCA
Shadow detection and removal process is widely used as a preprocessing operation in various image processing applications for the removal of undesirable noise and objects. For example, applications such as video surveillance [1] , scene interpretation [2] and object recognition [3] require shadow removal as an initial step to eliminate the undesirable effect of noise on the performance of such systems. Once detected, shadows in images are used for applications such as detection of object shape and size in aerial images, detection of movement of objects in video surveillance system, and finding the number of light sources and illumination conditions in natural images. In digital photography, removal of shadows can help to improve the visual quality of photographs. Ignoring the existence of shadows in images can, in general, degrade the output quality.
Shadow detection and removal is an active research area for the last two decades. Several algorithms have been proposed based on learning [4] , color models [5] , region [6] and [7] and invariant image models [8] and [9] for shadow detection and removal in image as well as in videos. A major work by Lalonde et al. [10] mainly focuses on the shadows cast by objects onto the ground plane. Other notable works are based on assumptions of Lambertian reflectance and Planckian lighting [11] . Interested readers can see a review article by Sasi and Govindan [12] to get a more comprehensive report of the methodologies reported in the field of shadow detection and removal during the last decade.
Though shadow removal involving multiple images [13] and interactive methodologies [14] , [15] and [16] provides superior performance, fully automatic approaches available for single image shadow removal stand behind them in terms of performance. This is because of the fact that indoor and outdoor shadows are much affected by the direction, intensity of light source, as well as geometry and texture of the objects where shadow is cast.
The review carried out reveals that the research work reported in shadow detection and removal works satisfactorily in the case of user interaction [14] , [15] and [16] and for multiple images [13] . The automatic approaches available for single shadow removal are more complex to implement and set much restriction in the class of images under consideration [10] and [11] . Reintegration methods and local area processing are time intensive. Also, many methods cannot distinguish between near dark objects and shadows. In the case of smallpatch regions, image inpainting method is more suitable, whereas inpainting large patch holes involve huge computation. Thus, the topic of single image shadow detection and removal requires a good amount of further research to develop an approach that provides satisfactory performances.
This paper proposes a method to remove shadow from a real image using sparse representation and a variation of MCA. Sparsity is a powerful way to approximate signals and images by using a sparse linear combination of atoms from an overcomplete dictionary. Sparse representation is being used in signal and image processing applications such as denoising [17] , superresolution [18] , inpainting [19] , deblurring [20] , segmentation [21] , and compression [22] , demonstrating that sparse models are well suited to natural images as well. Starck et al. developed the idea of MCA in a series of papers and it was used in separating the texture from the geometric component [23] , [24] and [25] . MCA algorithm works by decomposing the image into edges and textures, using the morphological differences in these features [23] and [24] . Each of the shape features is related to a fixed dictionary of atoms such as wavelets or Discrete Cosine Transforms. Typical MCA is an iterative thresholding scheme in which the threshold linearly decreases to zero. Another similar work in the area is “Learning the Morphological Diversity” by Peyré et al. [26] . They introduced an adaptive MCA scheme by learning the morphologies of image layers. A combination of adaptive local dictionaries and fixed generic dictionaries using wavelets and curvelets is used for decomposition. The main deficiency of these models is the existence of similar atoms corresponding to cartoon and texture dictionaries that produce coherence. So, to get the expected solution, proper manual initialization is essential.
Apart from sparse representation, we are also using invariant image formation as a base step in the proposed shadow removal method. Many of the works in the area of shadow removal using invariant image formation were authored by Finlayson and his students [8] , [9] , [27] , [28] , [29] , [30] and [31] . In general, his methods are based on forming an invariant image, in which shadows do not appear followed by reconstructing the required missing components using reintegration. Invariant image formation results in the loss of photo quality of image. To bring back the fine details lost in the invariant image formation, reintegration is performed using Poisson equation by averaging over retinex paths [8] or Hamiltonian paths [27] . In most of the methods missing information after shadow removal is interpolated using image inpainting methods. Finlayson also limits his work to images that follow Lambertian model where Planckian illumination lights the scenes. However, real scenes need not satisfy Lambertian assumption.
The remaining part of this research report is put forth as in the following: The basics of sparse coding, dictionary learning, MCA and invariant image formation required for better readability of the paper is presented in Section 2 . The proposed method of shadow removal using sparse representation is discussed in Section 3 . The dataset used, results obtained and further discussions of the proposed work are given in Section 4 . The paper is concluded in Section 5 highlighting the approach used and performance gain achieved.
This section briefly reviews the theory behind sparse coding (SC), dictionary learning, morphological component analysis (MCA) and intrinsic image formation to better understand the proposed shadow removal methodology using MCA.
Using sparse coding, an image y can be expressed as a set of few elementary signals taken from an overcomplete dictionary A , subject to α should be sparse.

( 1) 
Considering noise and sparsity constraint, we can add a regularization parameter λ and reformulate (1) as

( 2) 
Solution to the above problem is NP hard; however, many convex [32] , nonconvex [33] optimization and greedy approximation algorithms [34] and [35] exist in literature to deal with problems having the above formulation. Since norm 2 minimization is equivalent to norm 1, L1 regularized LS also gives solution for the above problem [32] , [36] and [37] .
In dictionary learning, the algorithm is given samples of the form y = Aα , where is an unknown random sparse vector and A is an unknown dictionary matrix in . The goal is to learn A and α from given y such that
Dictionary learning can be formulated as given in (3) . For fixed dictionary solve system of equations subject to α is sparse. Then, for fixed α , update A.

( 3) 
where k denotes sparsity.
Dictionaries can be fixed, global or local. Global dictionaries are built from clean patches of selected images of a database. Earlier, in the sparse coding area, focus was mostly given to fixed overcomplete dictionaries, such as wavelets and discrete cosine transform [38] . These approaches are called generic since the dictionary is predefined. Local dictionaries are learned online and hence more adapted to the input image. Different methodologies exist in dictionary learning literature, starting from fixed dictionaries to online dictionaries [39] . Fixed dictionaries find sparse approximations of the set of training signals for fixed dictionary, whereas optimized dictionaries keep sparse signals fixed and build optimized dictionaries [40] and [41] . MOD, KSVD [42] , and online dictionary learning [39] and [43] are the popular algorithms in the area.
Morphological Component Analysis is used for the separation of the components of an image having different morphologies. MCA and Basis Pursuit are based on sparsity, but MCA is much faster and is capable of handling large data sets.
Consider an image y having ‘s ’ morphological components, such that , where y_{k} denotes the k th geometric or textural component of y . To decompose the image y into , the MCA algorithm finds the sparsest solution over the dictionaries A_{k} such that

( 4) 
where α_{k} denotes the k th sparse solution
MCA algorithm uses fixed generic dictionaries such as wavelets and curvelets in representing geometric component and DCT for textural components. A major step in this algorithm is the selection of dictionaries that are mutually incoherent. Further details about the MCA algorithm can be found in Reference [24] .
An invariant image is invariant to illumination, color and intensity. Illumination invariant image is invariant to illumination. Shadow is caused by illumination and hence illumination invariant image is free from shadows. However, invariant image formation degrades photo quality of image. The proposed method of shadow removal uses invariant image formation proposed by Finlayson et al. [28] . Finlayson et al. say that the correct angle of projection is one with minimum entropy in the resulting invariant image. The result is a gray scale image that is independent of shadows.
In the proposed approach, shadow removal is formulated as a variation of image decomposition problem using MCA, which uses separate local dictionaries for shadow and nonshadow parts. The input image is initially used to generate an illumination invariant image [28] in which shadows are absent. The formation of invariant image also results in the loss of photo quality of image; hence, texture and edge information is lost and we need to bring back these fine details of the image to get shadowfree resultant image. Invariant image is then subtracted from input image to get the residual image that consists of shadow and geometric information of the input image. Sparse coding using orthogonal Matching Pursuit is applied using locally learned dictionaries. Finally, missing geometric components are reintegrated to the invariant image to get the resultant shadowfree image. The major steps during the shadow removal process is given in Fig. 1 (a)–(f).

Fig. 1. (a) Input image I ; (b) shadow free invariant image I_{N} of input I ; (c) I_{S} = I − I_{N} ; (d) geometric component; (e) output of the proposed approach; and (f) ground truth.

Major differences between the proposed approach and MCA are
Input shadow image I is initially used to generate an illumination invariant image, I_{N} , in which shadows and relevant geometric information are absent. Difference between input image I and illumination invariant image I_{N} generates a residual image I_{S} where the shadows and the other geometric information are retained. Thus, I is now split into two images; the shadow image and the nonshadow image, I = I_{N} + I_{S} . Dictionary learning generates a dictionary using the training patches extracted from I_{S} . is further divided into two subdictionaries, which represent the shadow and the geometric components of I_{S} , respectively. The problem of shadow removal using the proposed approach is formulated as given in (5) . Sparse coding is performed using Mallat and Zhangs Orthogonal matching pursuit [34] .

( 5) 
where denotes the k th patch extracted from I_{S} . are the sparse coefficients of with respect to and L denotes the maximum number of nonzero coefficients of .
The proposed approach of shadow removal uses separate local dictionaries for shadow and nonshadow parts, instead of using single global dictionary. Even though shadows are affected globally, we perform decomposition and reconstruction locally, using local dictionary. Hence, using local dictionaries can contribute more results than using global dictionary. The proposed work mainly focuses on the use of a local dictionary or an adaptive strategy that builds the dictionary from the exemplar patches extracted from the shadow and the nonshadow regions of residual image (I_{S} ). The reasons for not selecting a single global dictionary are
Initial dictionary is built from a set of overlapping patches from an image having shadow part I_{S} . Dictionary learning for shadow detection problem can be formulated as follows

( 6) 
where α^{k} denotes the sparse representation coefficients of y^{k} based on , and λ represents regularization parameter.
To solve (6) , we used an online dictionary learning algorithm proposed by Mairal et al. [43] . The atoms constituting are further divided into two classes representing the geometric ( ) and the shadow components ( ). Figure 2 (a)–(f) illustrates the steps involved in partition. Dictionary partition is performed by computing texton histogram followed by kmeans clustering. The sum of the norm of each atoms texton histogram in a cluster is computed and the one giving the minimum norm belongs to a shadow cluster and the other cluster is a geometric component. This is based on the observation that norm of shadow cluster when ploted always gives less area under the curve, whereas geometric cluster gives higher value for the same. Thus

( 7) 

Fig. 2. (a) Input image I , (b) invariant image; (c) I_{S} = I_{N} − I ; (d) initial dictionary learned from I_{s} (e) Local Shadow dictionary; (f) Local Geometric dictionary.

The MCA algorithms distinguish between the morphological diversities of geometric, , and shadow, , subdictionaries by using mutual incoherence between them. The mutual coherence of and is defined as

( 8) 
where stand for the i th and j th atoms in and , respectively and denotes the inner product of . When each atom is normalized to have a unit l_{2} norm, the range of µ (A_{1} , A_{2} ) is [0, 1]. The mutual incoherence of is . A smaller value of mutual coherence indicates that there are much diversities in the two sub dictionaries and decomposition will be better.
OMP algorithm is applied to each extracted from I_{S} via minimization of (5) based on the two dictionaries and to find its sparse coefficients . To recover geometric component of I_{S} we use reconstructed patch as follows.

Finally, the shadowremoved image can be obtained via . Instead of iteratively performing sparse coding, here, MCA based image decomposition is performed only one time for each patch with respect to
The entire algorithm is summarized in Algorithm 1: Noniterative MCA based shadow removal.
This section gives detailed description about the implementation details, the dataset used, evaluation metrics and performance of the proposed shadow removal methodology.
Guo et al. [6] provide datasets for shadow detection as well as removal which are available in the University of Illinois at UrbanaChampaign database (UIUC).^{1} This dataset consists of 108 indoor as well as outdoor image pairs, photographed under a variety of illumination conditions. As this dataset also provides shadowfree image and ground truth of shadow region, it is useful for the evaluation of detection and removal.
Evaluation of the proposed method is performed using RMSE (Root mean squared error). Most of the shadow removal works reported in still images have not performed quantitative evaluation of the result. Guo et al. [6] , Miyazaki et al. [14] Gong and Cosker [15] , and Gryka et al. [16] are the major works that provide quantitative evaluation. Other notable works provide only visual comparison.
Experimental evaluation was performed on both indoor and outdoor images from UIUC database [6] . The images used in this paper include human subjects and natural objects in various environments comprising different background materials and textured surfaces. Figure 3 gives the sample result of few images using the proposed method. The procedure and implementation details of the proposed method are described as follows.

Fig. 3. Results of the proposed shadow removal process: Left – input image I ; Middle – output of the proposed approach; Right – Ground truth.

The implementation of the intrinsic image is adopted from Finlayson et al. [28] . Successful removal of shadow depends to a great extent on this first phase. This method is simple and works by finding the invariant direction followed by generating a gray scale image. Finally, an L1, a chromatic intrinsic image that is free of shadows, is formed. The method does not need any sort of camera calibration or prior knowledge about the image. Invariant image results in the loss of fine details, namely edges and geometric features. The proposed method tries to bring back these fine details into this invariant image to get the final shadowfree image. A plot given in Fig. 4 shows the difference in RMSE for invariant image and resultant image using proposed approach for the entire set of 108 images from the UIUC dataset. It can be observed that the RMSE of the proposed approach always remains better than that of the invariant image.

Fig. 4. RMSE for the invariant image and the resultant image for the entire dataset. It can be observed that RMSE of the proposed approach always remain less than that of the invariant image.

In the dictionary learning step, we used online dictionary learning proposed by Mairal et al. [43] to solve (6) with the suggested regularization parameter λ set to 0.15. Initial dictionary is built from a set of overlapping patches from residual image I_{S} (see Fig. 2 (c) and (d)). Size of dictionary is set as 1024 and the dictionary learning iteration is fixed as 100. For dictionary learning, residual image of size 200 × 200 is used, from which patches of size 18 × 18 are extracted. The criteria for selecting patch size is further explained in this section. Kmeans clustering is used to classify initial dictionary into shadow and geometric dictionary as the proposed approach uses separate local dictionaries for shadow and geometric parts. Figure 2 (e) and (f) shows a sample of shadow and geometric dictionaries. Sparse coding was performed using Mallat and Zhangs Orthogonal Matching pursuit [34] .
We measure the diversities of two sub dictionaries, shadow and geometric ( and ), using mutual coherence µ defined in (8) . Smaller value of mutual coherence leads to much diverse subdictionaries and improves the decomposition based on the two subdictionaries. As the dictionary atom size increases, mutual coherence also increases. Size of dictionary atoms influences the mutual coherence of the subdictionaries. We tried to plot the dictionary atom size versus the mutual coherence of local, global and generic dictionaries in Fig. 5 . For generic dictionaries such as Haar wavelet, DCT and Gaussian, mutual coherence decreases as atom size increases, whereas for local and global dictionaries, mutual coherence increases along with atom size. Also, µ of local dictionaries is less compared to that of global dictionaries. From Fig. 5 its clear that as atom size increases, mutual coherence of local and global dictionaries also increases, whereas decreasing atom size affects the computation time. So, we try to keep atom size as small as possible so that it always maintains a reasonable computation time during the online dictionary learning process. Hence, we fixed the patch size as 18 × 18, which in turn is equal to atom size 324. Even though µ of fixed generic dictionaries is less, the resultant shadowfree image using local dictionaries has less RMSE than the one using generic dictionaries.

Fig. 5. Dictionary atom size versus Mutual coherence (µ ) of generic, local and global dictionaries.

The major difference between the proposed method and MCA was explained in Section 3 . MCA algorithms are iterative, whereas the proposed MCA approach is noniterative. Even if the algorithm iterates for a few number of times, the result somewhat remains the same and there is no notable difference in the RMSE of the resultant image. This is clear if we observe the plot given in Fig. 6 , where RMSE of the proposed approach for image in Fig. 2 over 100 iteration is given. The intention of Fig. 6 is just to clarify that RMSE value never changes over iteration. Another difference from MCA is that the proposed approach uses local subdictionaries, whereas the MCA approach uses fixed generic dictionary. Plot given in Fig. 6 also gives the difference in performance using RMSE value of the resultant image using fixed dictionary, local subdictionary and after invariant image formation. The proposed approach using local dictionary substantially improves the performance.

Fig. 6. RMSE vs Iteration for image in Fig. 2 using generic dictionary, local dictionary and invariant image. Proposed approach is noniterative and even if the algorithm iterates for 100 times, RMSE value of the resultant image remains almost constant.

Quantitative evaluation of the proposed method has been performed using RMSE. Table 1 gives the performance of the proposed approach in terms of RMSE using generic, local and global dictionaries. Performance is comparatively better when using local dictionary. Even though µ of generic dictionaries is comparatively lower, RMSE of shadow removal using the proposed approach is better. The method has been quantitatively compared with Guo et al. [6] and Gryka et al. [16] and found to be better in terms of RMSE. Details of evaluation using images from UIUC dataset are given in Table 2 . The proposed method has been visually compared using results from Gong and Cosker [15] and is given in Fig. 7 .
Generic dictionary  

1^{a}  2^{b}  3^{c}  Global Dictionary  Global + Local Dictionary [26]  Proposed Approach 
14.67  14.85  13.93  14.10  13.5  12.23 
a. Haar wavelet packet and DCT dictionary.
b. DCT and shifted Kronecker delta.
c. iid Gaussian entries with zero mean.
Methodology  RMSE 

Guo et al. 2013 [6]  19.85 
Gryka et al. 2015 [16]  13.83 
Invariant image [28]  15.10 
Proposed approach  12.23 

Fig. 7. Result of the proposed shadow removal process and Gong and Cosker [15] : Left – Input image I ; Middle – output of the proposed approach; Right – Result of Gong and Cosker [15] .

Table 3 gives a qualitative evaluation of six major shadow removal algorithms – Miyazaki et al. [14] , Lalonde et al. [10] , Guo et al. [6] , Gong and Cosker [15] , Zhu et al. 2015 [11] and Gryka et al. [16] – to compare their performance with the proposed approach using properties such as computational load and preservation of textures.
Author  Method  Preservation of Texture  Computational load 

Miyazaki et al. [14]  Interactive  Excellent  Low 
Lalonde et al. [10]  Automatic  Good  Low 
Guo et al. [6]  Automatic  Excellent  Medium 
Gong and Cosker [15]  Interactive  Good  Low 
Zhu et al. [11]  Automatic  Average  Medium 
Gryka et al. [16]  Interactive  Excellent  Low 
Proposed  Automatic  Excellent  Low 
Average running time of an image from UIUC dataset [6] is 78.34 ± 18 s/image, out of which 69.6 ± 10 is for dictionary learning. In comparison, for the same dataset, the method by Guo et al. [6] takes 104.718 s/image for shadow removal. Computational time of the proposed method is better compared with state of the art methods.
We have presented a method to remove shadow from an image using the morphological diversities of shadow coupled with invariant image formation. In the proposed approach, shadow removal is formulated as a decomposition problem that uses separate local dictionaries for shadow and nonshadow parts, without using single global dictionary. Local dictionaries used in the proposed approach are learned from the patches extracted from the residual of image obtained after invariant image formation. Finally, a variation of MCAbased image decomposition using local dictionaries is performed to add the geometric component to the nonshadow part of the image to obtain the shadowremoved version of input image. The proposed approach of shadow removal works well for indoor and outdoor images, and the method has been compared with previous approaches and was found to be better in terms of RMSE.
Published on 10/04/17
Licence: Other
Are you one of the authors of this document?