The observation data from ground surface meteorological stations is an important basis on which climate change research is carried out, while the homogenization of the data is necessary for improving the quality and homogeneity of the time series. This paper reviews recent advances in the techniques of identifying and adjusting inhomogeneity in climate series. We briefly introduce the results of applying two commonly accepted and well-developed methods (RHtest and MASH) to surface climate observations such as temperature and wind speed in China. We then summarize current progress and problems in this field, and propose ideas for future studies in China. Along with collecting more detailed metadata, more research on homogenization technology should be done in the future. On the basis of comparing and evaluating advantages and disadvantages of different homogenization methods, the homogenized climate data series of the last hundred years should be rebuilt.
climate observation; time series; homogeneity; uncertainty
The Earth’s surface climate has warmed significantly over the past hundred years, and the regional climate of China has warmed almost in step. Much research has been done, and a number of China-averaged air temperature series have been established. Although the correlation coefficients among the main series range from 0.73 to 0.97, the air temperature changes over the past 50 or 100 years differ significantly depending on the data processing method used [Tang et al., 2010]. In long-term climate change analysis, using climate series that contain non-climatic signals may therefore lead to different conclusions.
Homogeneous long-term climate series are the basis of climate change research. They are used to assess historical climate trends and variability, and are especially important for research on climatology and extreme events. However, most long-term climatological time series have been affected by non-climatic factors that introduce discontinuities, including changes in instruments, observation methods, and station location. Exploratory research by researchers outside China began as early as the mid-1980s and has made considerable progress. Many methods have been put into operational use by climate data departments, and with further research a number of homogenization methods have been developed and established [Alexandersson, 1986; Jones et al., 1986; Easterling and Peterson, 1995; Vincent, 1998]. In recent years, many researchers have begun to focus on China [Li et al., 2004; Yan and Jones, 2008; Guo et al., 2009; Li and Yan, 2009; Li and Dong, 2009]. A number of homogeneity tests and adjustments have been carried out on surface and sounding data. Although breakpoints in air temperature data have been identified and homogenized datasets have been established [Li et al., 2006], work on other meteorological elements remains at the experimental stage and no homogenized datasets have been produced. Many climate change studies still use original data without homogenization, which leads to large uncertainty in their conclusions. Therefore, studying homogeneity testing and adjustment methods for the various elements of climate data, and exploring techniques suitable for practical application, remains an important task in climate change research.
Based on the summary of the homogenization methods at home and abroad in recent years, the problems in the current homogeneity research of climatic data are analyzed and the future development directions are proposed in this paper.
Inhomogeneity in a climate time series is a systematic difference, caused by non-natural sources, that cannot be ignored relative to natural variability. A homogeneous climate time series is one in which the variations are caused only by variations in weather and climate. Inhomogeneity can take the form of a gradual trend or a sudden discontinuity (breakpoint). Over the course of many homogeneity studies, a number of methods were developed for different climate elements and time scales. Fourteen homogenization methods that are widely used by international research institutions are summarized in the “Guidelines on climate metadata and homogenization” of the World Climate Data and Monitoring Programme (WCDMP) [Aguilar et al., 2003], which gives guidance on compiling station metadata and on homogenization methods. Because these methods have different characteristics and are based on different climatological principles or time scales, they may yield contradictory conclusions when applied to the same time series. It is therefore useful to select specific testing and adjustment methods for each element according to its characteristics, as identified through experiments.
With continued research, homogenization techniques have improved considerably. Strict quality control is needed before homogeneity testing and adjustment, so that random errors in the climate series are greatly reduced and the series reach a good quality. After quality control, many artificial breakpoints in a climate series can be found easily by analyzing the metadata. The direct homogenization method examines and adjusts breakpoints subjectively, using the metadata and visual inspection to determine the time and cause of each inhomogeneity. However, detailed station metadata are difficult to obtain for a variety of historical reasons. Therefore, more and more scientists use objective methods that detect breakpoints in the series mathematically.
A change in a surface climate data series may reflect an inhomogeneity, but it may also be a genuine changepoint in the local climate. In most cases, the magnitude of the inhomogeneity is comparable to, or even smaller than, the true magnitude of climate change. To separate these two signals, many detection techniques use data series from nearby stations as a reference for local climate change; any break that differs significantly from the local climate signal can be considered a possible artificial changepoint. Using surrounding stations to develop a reference series is thus a basic approach in homogeneity testing. The most common method is to calculate the correlation coefficient between the station of interest and each surrounding station and then select the station with the largest correlation coefficient as the reference station. Homogeneity testing techniques fall into two groups, usually referred to as “absolute” and “relative” methods, according to whether a reference series is used. In the absolute methods, statistical tests are applied to each station independently; in the relative methods, surrounding stations that are presumed to be homogeneous are used as reference stations. While both approaches are worthwhile and valid, both have drawbacks. With the absolute methods it is difficult to determine whether a changepoint is caused by climatic factors. To overcome this problem, metadata from the station history are essential for evaluating the breaks detected. The relative methods attempt to isolate the impacts of non-climatic factors by assuming that the climate pattern is the same throughout a geographical area and that all sites in the region reflect it. The climate data collected at all sites in the region should therefore be highly correlated and have similar variability, differing only in scaling factors and random sampling variability.
However, problems also arise if the observing practices at all stations change at the same time. For example, if the observation method changes over the same period throughout the network, all the time series are affected simultaneously and the resulting changepoints cannot be detected by the relative methods. In addition, clear conclusions may not be possible when many of the surrounding series used as references are themselves inhomogeneous. Although the use of reference series has these flaws, most homogeneity studies still rely on reference series and take the following four steps: 1) metadata analysis and basic quality control; 2) establishment of the reference series; 3) changepoint detection; 4) data adjustment. An effective reference series improves the power of the test. A good reference series should be homogeneous and highly correlated with the target series. Changepoints are then easier to detect, because the variability of the difference or ratio series becomes smaller once the trends and cycles shared by the two series are removed or reduced.
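The reference-series step can be illustrated with a minimal Python sketch. It assumes the simplest variant described above: the neighbor with the largest correlation coefficient is taken as the reference, and the difference series is the target minus that neighbor. The function name `pick_reference` and the synthetic data (a shared sinusoidal signal with an artificial 1.0 shift) are hypothetical and chosen only for illustration.

```python
import numpy as np

def pick_reference(target, neighbors):
    """Select the neighbor best correlated with the target.

    Returns (index of reference, its correlation, difference series).
    This is an illustrative sketch, not any published procedure.
    """
    target = np.asarray(target, dtype=float)
    neighbors = np.asarray(neighbors, dtype=float)  # shape (n_stations, n_times)
    corrs = np.array([np.corrcoef(target, nb)[0, 1] for nb in neighbors])
    best = int(np.argmax(corrs))
    return best, float(corrs[best]), target - neighbors[best]

# Synthetic example (assumed data): a shared regional signal plus a shift.
t = np.arange(60)
regional = np.sin(t / 5.0)
target = regional.copy()
target[30:] += 1.0                      # artificial inhomogeneity at index 30
neighbors = np.stack([
    regional + 0.01 * np.cos(t),        # close neighbor, same climate signal
    np.cos(t / 3.0),                    # unrelated series
    -regional,                          # anti-correlated series
])

best, r, diff = pick_reference(target, neighbors)
step = diff[30:].mean() - diff[:30].mean()  # shift size seen in the difference series
```

Because the shared signal cancels in the difference series, its variance is much smaller than that of the target series, and the artificial shift stands out clearly.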
Many comparative studies of these methods, with comprehensive analyses and comparisons, have been carried out worldwide [Peterson et al., 1998; Ducré-Robitaille et al., 2003; Li et al., 2003; DeGaetano, 2006; Reeves et al., 2007; Costa and Soares, 2009]. Li et al. summarized nine of the more commonly used homogeneity testing and adjustment methods, together with results for national or regional climate series in countries that rely on them. Many of these methods have been widely used in China, including the standard normal homogeneity test (SNHT), two-phase regression (TPR), multiple analysis of series for homogenization (MASH), and multiple-phase linear regression. Reeves et al. analyzed and compared eight methods, including SNHT and TPR. Costa and Soares summarized nine homogeneity test methods, including MASH, regression-based tests (TPR, multiple linear regression, etc.), and SNHT. These homogenization methods were often designed for different research objectives, so comparing them objectively is not easy. To compare their ability to detect inhomogeneity, the common approach is to fix evaluation criteria that do not depend on parameter estimates, such as the ability to identify the correct changepoints (hit rate, false alarm rate, etc.). However, even after a number of comparative studies, no single method can be recommended for homogenization. Overall, most of the widely used homogeneity methods are based on the maximum likelihood ratio test. Other methods have been changed and improved continuously; TPR, one of the most important, has been revised many times. Hinkley [1969; 1971] proposed the TPR test around 1970. Solow used this method to test air temperature series.
Lund and Reeves improved the TPR test (LR), and Turner et al. further applied the TPR and LR methods to homogeneity testing. Building on this method, Wang [2003; 2006] established a new test that has been applied to several climatic time series.
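The evaluation criteria mentioned above (hit rate and false alarm rate) can be sketched as follows. Definitions vary between comparison studies; this sketch assumes that a true changepoint counts as a hit if any detection falls within a tolerance window `tol`, and it reports false alarms as the fraction of detections that match no true changepoint. The function name `score_detections` and the example positions are hypothetical.

```python
def score_detections(true_cps, detected_cps, tol=2):
    """Return (hit rate, false alarm ratio) for detected changepoints.

    Assumed definitions (other conventions exist):
      hit rate          = true changepoints matched within +/- tol, over all true ones
      false alarm ratio = detections matching no true changepoint, over all detections
    """
    hits = sum(any(abs(t - d) <= tol for d in detected_cps) for t in true_cps)
    false_alarms = sum(all(abs(d - t) > tol for t in true_cps) for d in detected_cps)
    hit_rate = hits / len(true_cps) if true_cps else float("nan")
    far = false_alarms / len(detected_cps) if detected_cps else 0.0
    return hit_rate, far

# Two true changepoints, three detections: one hit, two false alarms.
hit_rate, false_alarm_ratio = score_detections([40, 80], [41, 60, 95], tol=2)
```

Scores of this kind depend only on changepoint positions, which is what allows methods with different internal parameterizations to be compared on the same simulated series.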
Most previous homogeneity studies tested only yearly time series. Monthly and daily series have periodic characteristics, which make the identification of inhomogeneity extremely complex. The established international methods for testing and adjusting inhomogeneous series are not suitable for series with high temporal resolution (such as daily or hourly). Based on an analysis of the air temperature observations in Beijing and the homogeneity results of the traditional Alexandersson-type test, Yan et al. suggested that, because of disturbances from local weather, homogeneity results are more reliable on the seasonal scale than on monthly or shorter time scales. For series shorter than the seasonal scale, homogeneity analysis based on “adjacent reference time series” rarely reaches statistical significance. The WMO provides no detailed instructions for adjusting data below the monthly scale. It suggests only that, when analyzing long-term climate change, breakpoints in sub-monthly data should be determined with reference to the inhomogeneities found in the monthly series. Autocorrelation is another issue. Most earlier methods assume independent errors, which is reasonable for yearly series but not for daily or monthly series, which usually show strong autocorrelation. Lund et al. argue that tests developed under the assumption of independent errors produce more false changepoints when the autocorrelation is positive. Because periodicity and autocorrelation both clearly complicate detection, the two need to be considered together.
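To make the autocorrelation issue concrete, the sketch below estimates the lag-1 autocorrelation of a series and converts it into an effective sample size. The formula n(1 - r1)/(1 + r1) is a common first-order autoregressive approximation and is an assumption of this sketch, not part of any method reviewed here; the function names are hypothetical.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation of a series about its mean."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d * d))

def effective_sample_size(n, r1):
    """Approximate number of independent values in an AR(1)-like series.

    Assumed approximation: n_eff = n * (1 - r1) / (1 + r1).
    """
    return n * (1.0 - r1) / (1.0 + r1)

# A smooth, strongly persistent series versus a rapidly alternating one.
smooth = np.arange(100, dtype=float)
alternating = np.array([1.0, -1.0] * 50)
r_smooth = lag1_autocorr(smooth)
r_alt = lag1_autocorr(alternating)
n_eff = effective_sample_size(100, 0.5)   # 100 values behave like about 33
```

With positive autocorrelation the effective sample size is smaller than the nominal one, so a test that assumes independent errors overstates significance, which is consistent with the excess of false changepoints noted above.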
In addition, earlier homogeneity tests assume that there is at most one changepoint in the time series. Depending on the research purpose, many researchers find only the most significant changepoint, adjust for it, and then apply the method again to the adjusted series. When there is more than one changepoint, this may lead to wrong adjustments, because the adjustment for the first changepoint may itself be biased [Reeves et al., 2007]. Ideally, all possible changepoints should be identified simultaneously, and the detection and adjustment of multiple changepoints is therefore an active area of study. Wang and Feng developed a method based on a semi-hierarchical splitting algorithm to identify multiple changepoints, and Menne et al. developed a similar procedure. In this process, the timing of a previously identified changepoint is re-evaluated once the other changepoints have been located. The detection and adjustment of “undocumented” changepoints is also receiving more and more attention. These are changepoints for which no supporting metadata exist, and some earlier homogeneity methods have limited ability to detect them.
In summary, current studies on the homogeneity of surface climate data show the following trends. 1) An increase in the range of elements tested. Most of the early homogeneity methods were designed for air temperature, pressure, and other meteorological elements with a normal distribution. In recent years, more and more methods have been developed for precipitation, wind speed, and other climatic elements that have non-normal probability distributions and are strongly affected by local environmental factors. 2) Refinement of the time scale. The purpose of a homogeneity test is to find non-natural changepoints in a time series, which requires removing the interference of autoregressive behavior. The best detection results are obtained for yearly series, because they are less affected by seasonal variability. However, as climate change research deepens, monthly and daily data are urgently needed as the basis for research on extreme events. More and more attention has therefore been paid to homogenization methods for monthly and daily series, as reflected in the emergence of adjustment methods based on the distribution of daily observations [Della-Marta and Wanner, 2006; Toreti et al., 2010], of time-frequency domain tests based on the wavelet decomposition of different climatic scales [Yan and Jones, 2008], and of other methods. 3) Growing attention to undocumented changepoints and multiple changepoints. Detailed metadata can serve as a strong complement to homogeneity studies. Where metadata are lacking, however, composite approaches that combine several homogeneity techniques to detect changepoints have been developed and applied.
Most meteorological observations in China started only half a century ago, and many stations have since experienced relocations and changes in the observing system. Over the years, many studies of air temperature change in China were based on data that had not been tested for homogeneity. Beginning in 2001, with five years of support from the Ministry of Science and Technology of China through the platform special project of the scientific data sharing system, Li et al. applied strict quality control and then used the TPR homogeneity method developed by the U.S. National Climatic Data Center (NCDC) to test and adjust the air temperature series of 731 standard and basic stations. In 2006, the first edition of a homogenized air temperature dataset (1951–2004) [Li et al., 2006] was published. It provides climate change detection and related operational and research work with a high-quality air temperature dataset that reflects the true climate change of nearly 50 years in China. Since 2004, new changes in the observing system, such as the introduction of automated equipment, station relocations caused by urban development, and operational restructuring, have been introducing new inhomogeneities into the data series. This dataset therefore urgently needs to be updated and improved. Climate scientists in China have also made a number of attempts to test the homogeneity of precipitation, wind speed, radiation, pressure, and other elements [Jiang et al., 2008; Liu, 2000; Ju et al., 2006; Wu et al., 2008]. For example, Li et al. found that the wind speed of Beijing shows significant effects of urbanization after homogenization, whereas the geographical distribution of the changes shown by the original data is hard to explain. Nevertheless, there are as yet no homogenized datasets for precipitation, wind, and other elements.
In recent years, the MASH method developed by Szentimrey [1999; 2003] and the RHtest method established by Wang et al. [2007; 2008] have been used successfully in homogeneity studies of climate data series in China. In the following section we discuss some of the issues in homogeneity research, starting from specific applications of the two methods.
MASH is an internationally accepted climate data processing method with a rigorous statistical basis. The basic idea is that series from multiple stations in the same climatic zone are compared pairwise in a step-by-step procedure, through which the inhomogeneous breakpoints of each station are determined and adjusted. The result can be regarded as a homogenized dataset at the regional scale. The method does not assume that the reference series are homogeneous; instead, all series may contain inhomogeneities. On this assumption, the series of each station is tested and adjusted by comparison with the other series in the same climatic region. The series to be tested (the candidate) is selected from all available series, and the remaining series serve as reference series. All series are eventually analyzed in the same way, and the test results and adjustments are determined. Additive or multiplicative models are applied depending on the climatic element, and a multiplicative model can be converted to an additive one by taking logarithms. Each difference series is formed from the candidate series and a weighted reference series, with the optimal weights determined by minimizing the variance of the difference series. To improve the efficiency of the statistical tests, the candidate series is taken to be the only series common to all the difference series, so that breakpoints detected in all of the difference series are attributed to the candidate series. At present, the dates of possible changepoints provided by the metadata can be either used or left out when running the MASH software.
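The weighting idea can be sketched in Python. The sketch assumes that the weights are constrained to sum to one and solves the resulting constrained least-variance problem in closed form with a Lagrange multiplier. It is an illustration of variance-minimizing weights only, not the MASH software or its exact procedure; the function name `optimal_weights` and the synthetic data are hypothetical.

```python
import numpy as np

def optimal_weights(candidate, refs):
    """Weights w (summing to 1) minimizing Var(candidate - w @ refs).

    With C the covariance matrix of the reference series and c their
    covariances with the candidate, the solution is
        w = C^-1 c + lam * C^-1 1,  lam = (1 - 1' C^-1 c) / (1' C^-1 1).
    """
    candidate = np.asarray(candidate, dtype=float)
    refs = np.asarray(refs, dtype=float)            # shape (k, n_times)
    full = np.cov(np.vstack([candidate, refs]))     # (k+1, k+1) covariance
    c, C = full[0, 1:], full[1:, 1:]
    ones = np.ones(refs.shape[0])
    cinv_c = np.linalg.solve(C, c)
    cinv_1 = np.linalg.solve(C, ones)
    lam = (1.0 - ones @ cinv_c) / (ones @ cinv_1)
    return cinv_c + lam * cinv_1

# Synthetic example (assumed data): references of differing noise level.
rng = np.random.default_rng(42)
regional = rng.normal(size=50)
candidate = regional + rng.normal(scale=0.2, size=50)
refs = np.stack([regional + rng.normal(scale=s, size=50) for s in (0.2, 0.5, 1.0)])

w = optimal_weights(candidate, refs)
var_opt = np.var(candidate - w @ refs)              # optimally weighted reference
var_equal = np.var(candidate - refs.mean(axis=0))   # equally weighted reference
```

For elements handled with a multiplicative model, the same calculation could be applied to the logarithms of the series, in line with the conversion described above.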
By investigating the homogeneity of the air temperature in Beijing with the MASH method, Li and Yan found that, for the same station dataset, the use of metadata has almost no effect on the main breakpoints detected, which shows that the objective analysis of MASH is meaningful. The MASH analysis is particularly valuable when the metadata do not record all changepoints or when metadata are lacking. On this basis, Li and Yan tested and adjusted the 1960–2008 daily mean, maximum, and minimum air temperature series of 549 standard and basic stations in China using MASH. The main changepoints are basically the same as the aforementioned results of Li et al., which were obtained with an improved method based strictly on the metadata record. However, MASH also detected a systematic bias at some stations, caused by the change of the observing system from manual to automatic stations in recent years. The latest homogenized air temperature data show an average warming rate over China of about 0.26°C per decade from 1960 to 2008, slightly larger than the rate calculated from the original data.
RHtest is based on the penalized maximal t-test (PMT) and the penalized maximal F-test (PMFT), which are embedded in a recursive testing algorithm, with the lag-1 autocorrelation of the time series accounted for empirically. It can be used to test the homogeneity of yearly, monthly, and daily series and to adjust series that have lag-1 autocorrelated errors and multiple changepoints (mean shifts). The empirical penalty function greatly reduces the problems of false alarms and of uneven detection power along the series. The recursive testing algorithm is used to detect multiple changepoints. First, it identifies the most likely changepoint in each segment of the series and compares the statistics of these candidates to determine the first changepoint. It then again identifies the most likely changepoint in each of the resulting segments and estimates their statistical significance to determine the next changepoint. This process is repeated until all changepoints have been found. The changepoints are then ranked by statistical significance, and the least significant one is tested. If it is not significant, it is deleted, and the significance of the remaining changepoints is re-evaluated. The changepoints that finally remain significant are taken as the real changepoints.
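The stepwise search described above can be illustrated with a simplified Python sketch. It is not RHtest: it uses an ordinary (unpenalized) maximal two-sample t statistic, ignores autocorrelation, replaces the empirical critical values with a fixed and arbitrary `threshold`, and omits the final pruning of the least significant changepoints. All function names and the synthetic series are hypothetical.

```python
import numpy as np

def max_t_split(x, min_seg=5):
    """Return (k, |t|) for the split x[:k] / x[k:] with the largest t statistic."""
    n = len(x)
    best_k, best_t = None, 0.0
    for k in range(min_seg, n - min_seg + 1):
        a, b = x[:k], x[k:]
        ss = np.sum((a - a.mean()) ** 2) + np.sum((b - b.mean()) ** 2)
        s2 = ss / (n - 2)                      # pooled variance
        if s2 <= 0:
            continue
        t = abs(a.mean() - b.mean()) / np.sqrt(s2 * (1.0 / k + 1.0 / (n - k)))
        if t > best_t:
            best_k, best_t = k, float(t)
    return best_k, best_t

def detect_changepoints(x, threshold=4.0, min_seg=5):
    """Greedy stepwise search: at each step accept the strongest candidate
    over all current segments, until none exceeds the (assumed) threshold."""
    x = np.asarray(x, dtype=float)
    bounds, found = [0, len(x)], []
    while True:
        candidates = []
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            if hi - lo >= 2 * min_seg:
                k, t = max_t_split(x[lo:hi], min_seg)
                if k is not None:
                    candidates.append((t, lo + k))
        if not candidates:
            break
        t, pos = max(candidates)
        if t < threshold:
            break
        found.append(pos)
        bounds = sorted(bounds + [pos])
    return sorted(found)

# Synthetic series (assumed data): mean shifts at indices 40 and 80.
t_axis = np.arange(120)
series = 0.1 * np.sin(t_axis)
series[40:80] += 1.0
series[80:] -= 1.0
changepoints = detect_changepoints(series)
```

In this example the first pass locates the larger shift at index 80, and the second pass then finds the shift at index 40 within the left segment, mirroring the segment-by-segment search described above.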
PMFT can be applied without a reference series. It therefore avoids detection errors caused by inhomogeneous reference series and can be used to test wind speed and other meteorological elements that are strongly affected by the local environment. Cao et al. tested the annual mean wind speed data of 701 meteorological stations in China using PMFT and detailed metadata. The results show that the method works well for annual mean wind speed over China. Of the 701 stations tested, 61.3% have homogeneous annual mean wind speed series, showing that the homogeneity of annual mean wind speed is good in many areas of China. Changes in instruments and in station location are the two main causes of inhomogeneity, with changes in instrument type being the most important.
Climate change research places new demands on the quality of climate data. To improve the quality of climate data in China and reduce the uncertainty in climate studies, both research and operational work on the homogenization of climate data need to be strengthened. Homogenization methods should be actively improved, and operational procedures based on more sophisticated methods should be developed in order to publish high-quality homogenized products. In the coming years, the following issues deserve attention: 1) The collection and processing of metadata should be enhanced. Important metadata for evaluating the homogeneity of surface climate series include station relocations, instrument changes and damage, new statistical formulas, changes in the surrounding environment, and observation times. Detailed metadata provide necessary and objective support and will promote the development of homogenization research. 2) In basic research, homogenization techniques, and the comparison and assessment of different methods, should be studied in depth. The ultimate goal of homogenization research is to establish procedures for data products that can be regularly updated and used operationally. These procedures need to be improved continuously by comparing the methods currently in use, for example the advantages and disadvantages of MASH, RHtest, and other methods when applied to different climatic elements. The aim is to develop operational homogenization techniques that rest on scientific evidence. 3) The climate series of the past hundred years should be rebuilt for different regions of China. Although China is vast and has many early meteorological observation series, there is still no reliable set of centennial-scale observations.
The most important reason is the lack of homogeneity research on centennial-scale observations. For climate change research, it is a difficult task to collect early air temperature and precipitation observations, metadata, and station history information in China, and to analyze and adjust the inhomogeneities caused by human factors such as station relocations and changes in observing systems.
In short, as in-depth global change research is carried out in China, it is imperative to enhance the homogenization research of climate data, as its application is urgently needed by various industries.
This research is jointly supported by the National Program on Key Basic Research Project (No. 2010CB951602, 2009CB421401), National Science and Technology Ministry (No. 2008BAK50B07), China Special Fund for Meteorological Research in the Public Interest (No. 200906041-052), and the Project of National Natural Science Foundation of China (No. 40805060).