This paper analyzes the status of existing resources, through extensive research and international cooperation, on the basis of four typical global monthly surface temperature datasets: the climate research dataset of the University of East Anglia (CRUTEM3), the dataset of the U.S. National Climatic Data Center (GHCN-V3), the dataset of the U.S. National Aeronautics and Space Administration (GISSTMP), and the Berkeley Earth surface temperature dataset (Berkeley). China's first global monthly temperature dataset over land was developed by integrating these four global temperature datasets with several regional datasets from major countries or regions. The dataset contains records of at least 20 years from 9,519 stations worldwide for monthly mean temperature, 7,073 for maximum temperature, and 6,587 for minimum temperature. Compared with CRUTEM3 and GHCN-V3, the station density is much higher, particularly for South America, Africa, and Asia. Moreover, data from significantly more stations are available after 1990, which dramatically reduces the uncertainty of the estimated global temperature trend during 1990–2011. The integrated dataset can serve as a reliable data source for global climate change research.
Global monthly surface temperature dataset ; Integration of multi-source data ; Climate change
In recent years, changes in land surface temperature on global and hemispherical scales have been most thoroughly studied through CRUTEM3, the dataset of the University of East Anglia (Jones, 1994; Jones and Moberg, 2003), GHCN-V3, the dataset of the U.S. National Climatic Data Center (Peterson and Vose, 1997), and GISSTMP, the dataset of the U.S. National Aeronautics and Space Administration (Hansen and Lebedeff, 1987). These three datasets have been fully developed in recent years, although each has limitations. For example, few stations are available in some regions of South America, Asia, and Africa. Such stations differ in representativeness, which results in large differences in homogeneity treatment. When these datasets are applied to describe air temperature changes on the global or regional scale, inconsistencies of various degrees occur (Gong and Wang, 2002; Wang et al., 2009). Furthermore, because they differ in data collection and treatment techniques as well as in focus, the three datasets have distinct advantages and disadvantages in application. In recent years, the International Surface Temperature Initiative (Thorne et al., 2011) and the Berkeley Earth surface temperature research team (Rohde et al., 2013), which produced the Berkeley Earth surface temperature dataset (Berkeley), have conducted a large amount of research on this problem.
Several internationally renowned global climate datasets, including those for air temperature and precipitation, come mainly from the U.S., the United Kingdom, and Russia. China lags behind in its capacity for collecting and processing global data. Chinese scholars have devoted great effort to the development of homogenized climate datasets in recent years and have accumulated experience. The National Meteorological Information Center, China Meteorological Administration, released China's homogenized air temperature dataset (1951–2004) version 1.0 in December 2006 (Li et al., 2004; Li and Dong, 2009). Li et al. (2010) conducted homogeneity tests and corrections on the air temperature series of China during the past century (1900–2006). They developed a homogenized air temperature dataset and air temperature series for China and systematically evaluated the uncertainty level of climate warming in China during the past century. Li and Yan (2010) adopted the Multiple Analysis of Series for Homogenization method to homogenize and correct the daily air temperature series at more than 500 stations nationwide during 1960–2006. Cao et al. (2013) interpolated and corrected 16 long-term monthly mean air temperature series of eastern China to construct an air temperature variation series. Xu et al. (2013) studied the homogeneity of daily climate series and compiled a second-generation homogenized air temperature dataset for China. This dataset demonstrates great advantages when applied to extreme climate events and interannual variations. On the basis of the dataset compiled by Li et al. (2010), Wang et al. (2014) used the best unbiased method to reconstruct the air temperature series. Because of a lack of the necessary technologies for data collection and treatment, however, Chinese experts have not yet commenced the development of global data products.
With the advance of scientific research on climate change and the growing demand for global air temperature datasets, China has made new progress in this area. Thus, a foundation has been established for developing a new version of a global air temperature dataset. In this study, the advantages of several typical datasets of global monthly air temperature are combined with those of datasets for some regions and countries to form China's first dataset of global monthly air temperature. The dataset provides data support for real-time monitoring and study of global climate variation.
The Global Historical Climatology Network (GHCN) monthly air temperature dataset has been developed by the U.S. National Climatic Data Center since the early 1990s, when its first version was released (Vose et al., 1992). GHCN-V3 was released in 2011, with quality control for repetitive data, climate anomalies, and spatial inconsistency (Durre et al., 2007). Homogeneity testing and correction of the temperature series were conducted by an automated pairwise comparison algorithm (Menne and Williams, 2009). GHCN-V3 consists of two types of data. The first is original data, which are usually used as the fundamental data for other datasets such as GISSTMP; the second is the homogenized data. CRUTEM (Jones, 1994; Jones and Moberg, 2003) is widely used in the study of air temperature changes and trends. In particular, CRUTEM2 uses the homogenized data of many countries and districts with good quality control. The dataset has good spatial representativeness, high stability, and the widest application. With further improvement of the quality control method, CRUTEM3 was released (Brohan et al., 2006). Fig. 1 shows the spatial distribution of the stations included in GHCN-V3 and CRUTEM3. Although the number of stations differs between GHCN-V3 and CRUTEM3, the long-term data series in both are mainly distributed in North America and Europe. Furthermore, the data length of CRUTEM3 is much longer than that of GHCN-V3. South America, Asia, and Africa have fewer stations and shorter time spans; at most stations there, the data length is less than 50 years.
Fig. 1. Spatial distribution of stations by time span in (a) GHCN-V3 and (b) CRUTEM3 (unit: year).
In addition to these two datasets, GISSTMP (Hansen et al., 1999) introduced data from several stations in Antarctica and incorporated the homogenized U.S. Historical Climatology Network data from more than 1,200 stations. The data from stations located in cities with populations of more than 50,000 were homogenized. Because GISSTMP was developed on the basis of the original data of GHCN-V2, the two datasets are consistent in their data sources. In recent years, the Berkeley Earth surface air temperature research team combined 1.6 billion temperature records from 16 datasets to build an integrated dataset of global monthly air temperature. A new algorithm was developed that can make use of short or discontinuous records. After the removal of repetitive records, Berkeley covers 36,000 stations (Rohde et al., 2013).
Through international cooperation, global datasets such as CRUTEM3 and GHCN-V3 were collected along with the regional datasets released or exchanged by typical regions or countries. The supplementary datasets fall into the following categories.
GHCN-V3 has relatively stable data sources, and its non-homogenized original data can serve as the basic data for building a new air temperature dataset. When multi-source data are integrated, the stations covered by both GHCN-V3 and other datasets must be identified. The following principles apply to the identification of repetitive stations: (a) each GHCN-V3 station is taken as the station to be inspected, and the stations in other datasets within 0.25° of the inspected station are the candidate stations; (b) if a candidate station and the inspected station have the same World Meteorological Organization station number, the two are considered the same station; (c) if a candidate station and the inspected station have the same name, the two are also considered the same station. Once the repetitive stations are determined, the priority of each data source relative to GHCN-V3 must be set. Data sources belonging to (1) and (3) in Section 3.1 are the homogenized datasets and first-hand data of individual countries or districts and have a higher priority than GHCN-V3. Therefore, when these sources are integrated, the repetitive stations in GHCN-V3 are replaced by the data from these two sources, and the stations not found in GHCN-V3 are added. The data sources belonging to (2) in Section 3.1 are the original datasets released by the meteorological departments of various countries. For these, the differences between the source and GHCN-V3 at repetitive stations during the overlapping period are calculated, and the priority is determined by data length and integrity. If the consistency of the data in the overlapping period is over 95%, priority is given to the source with the longer record, and the data absent in the priority source are supplemented from the other source. If the consistency is below 95%, the source with the longer record is selected, and no supplementation is performed.
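The station-matching and priority rules described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: "within 0.25°" is read as a latitude/longitude box, names are compared case-insensitively, "consistency" is taken to be the fraction of overlapping months whose values agree within a hypothetical tolerance `tol`, and the names `Station`, `is_same_station`, and `merge_overlap` are invented here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Series = Dict[Tuple[int, int], float]  # {(year, month): temperature in deg C}


@dataclass
class Station:
    name: str
    lat: float
    lon: float
    wmo_id: Optional[str] = None  # WMO station number, if known


def is_candidate(ref: Station, other: Station, max_deg: float = 0.25) -> bool:
    """Rule (a): `other` lies within max_deg of `ref` (assumed lat/lon box)."""
    dlat = abs(ref.lat - other.lat)
    dlon = abs(ref.lon - other.lon) % 360.0
    dlon = min(dlon, 360.0 - dlon)  # wrap around the date line
    return dlat <= max_deg and dlon <= max_deg


def is_same_station(ref: Station, other: Station, max_deg: float = 0.25) -> bool:
    """Rules (b) and (c): same WMO number or same name among candidates."""
    if not is_candidate(ref, other, max_deg):
        return False
    if ref.wmo_id is not None and ref.wmo_id == other.wmo_id:
        return True
    # Assumption: names are compared after trimming and lower-casing.
    return ref.name.strip().lower() == other.name.strip().lower()


def merge_overlap(ghcn: Series, regional: Series,
                  tol: float = 0.1, threshold: float = 0.95) -> Series:
    """Merge two series for one repetitive station (original-data sources)."""
    overlap = set(ghcn) & set(regional)
    agree = sum(1 for key in overlap if abs(ghcn[key] - regional[key]) <= tol)
    consistency = agree / len(overlap) if overlap else 0.0
    # The longer record always takes priority.
    if len(ghcn) >= len(regional):
        primary, secondary = ghcn, regional
    else:
        primary, secondary = regional, ghcn
    merged = dict(primary)
    if consistency > threshold:
        # Consistent sources: fill months absent in the priority source.
        for key, value in secondary.items():
            merged.setdefault(key, value)
    return merged
```

In this sketch the longer record is kept in both branches, and the 95% test only decides whether gaps are filled from the other source, which mirrors the two cases in the text.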
After integrating the above three data sources, CRUTEM3 and Berkeley datasets are used to supplement the data of Africa, Asia, and South America to increase the density of stations in these areas.
Table 1 gives the basic information on the fusion of the various data sources with GHCN-V3. For the U.S., Canada, and Australia, the homogenized datasets released by the meteorological departments are assigned higher priority. Before the 1950s, China had only 192 stations; the number increased rapidly after the 1950s and reached 825 in 2005. Currently, GHCN-V3 includes approximately 416 Chinese stations, all with original data. Thus, when China's air temperature data were integrated, the data homogenized by Xu et al. (2013) were used for the 633 stations built after 1950, and the data homogenized by Li et al. (2010) were used for the 192 stations built before 1950. Regarding China's neighboring countries, the Japan Meteorological Agency updates the data of 151 stations on a monthly basis. The air temperature data since 1960 from 76 Korean stations and 25 Vietnamese stations, obtained through exchange, are all original data with strict quality control. Russia has released historical climate series at 518 stations since its founding, together with metadata on the operation, shutdown, and relocation of the stations. After comparison with the repetitive stations in GHCN-V3, the data from 426 stations with higher integrity and longer time spans are assigned higher priority.
|Source of regional dataset||Stations in regional dataset, not in GHCN-V3||Stations in both, regional dataset with higher priority||Stations in both with GHCN-V3 higher priority, or only in GHCN-V3||Stations supplemented by other global datasets||Stations by homogenization||Period|
|Canada's national climate and weather data archive||59||279||591||112||338||1870–2011|
|Australia's high-quality climate change datasets||0||110||473||19||110||1880–2011|
|U.S. Historical Climatology Network||0||1,218||703||362||1,218||1870–2011|
|China Meteorological Administration||309||416||0||0||633||1900–2011|
|Russian Meteorological Agency||31||395||123||125||0||1900–2010|
|Japan Meteorological Agency||5||146||21||0||146||1870–2011|
|Korea exchange dataset||0||66||25||0||0||1960–2007|
|Vietnam exchange dataset||17||3||5||0||0||1960–2010|
|European regional datasets||200||400||1,155||347||678||1760–2009|
|South American regional datasets||0||0||353||80 (CRUTEM3) 311 (Berkeley)||80||1951–2011|
|Africa regional datasets||0||0||751||70 (CRUTEM3) 277 (Berkeley)||70||1951–2011|
|Antarctic climate data||12||34||0||0||0||1950–2011|
Excluding Russia, the European region uses two major sources of homogenized data: that for the Greater Alpine Region (Chimani et al., 2012) and the European Climate Assessment & Dataset (Klein Tank et al., 2002). The stations in these two datasets are assigned higher priority, and they comprise 600 non-repetitive stations. GHCN-V3 supplements data at 1,155 stations in the European region, and CRUTEM3 and Berkeley combined supplement data at 347 stations. For South America and Africa, where stations are sparsely distributed, almost no data sources other than GHCN-V3, CRUTEM3, and Berkeley are used. Thus, for South America, the data of 353 stations are from GHCN-V3, the data of 80 stations are from CRUTEM3, and the data of 311 stations are from Berkeley. In Africa, the data of 751 stations are from GHCN-V3, the data of 70 stations are from CRUTEM3, and the data of 277 stations are from Berkeley. For the Antarctic region, the Scientific Committee on Antarctic Research dataset covers 46 stations on land. The time span of most series is 1950–2012, and the longest is approximately 50 years.
Although each source has undergone quality control, the use of different methods can leave quality problems in the integrated dataset. Following the quality control method used for GHCN-V3, a three-step quality control process was therefore applied to the integrated dataset.
Step 1: check for climate anomalies. Values that depart from the monthly mean at a station by more than five times the standard deviation are checked. Such anomalies were found at 54, 39, and 129 stations for monthly mean temperature, maximum temperature, and minimum temperature, respectively. These anomalous values are treated as missing (default) values.
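As an illustration of Step 1, the following Python sketch flags values that lie more than five standard deviations from a station's mean for the same calendar month. It is only a sketch under stated assumptions: the statistics are computed per calendar month over all available years, the suspect value itself is included in the mean and standard deviation, and the function name `flag_climate_outliers` is hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Set, Tuple

Series = Dict[Tuple[int, int], float]  # {(year, month): temperature in deg C}


def flag_climate_outliers(series: Series, n_sigma: float = 5.0) -> Set[Tuple[int, int]]:
    """Return the (year, month) keys whose values exceed n_sigma deviations."""
    flagged: Set[Tuple[int, int]] = set()
    for month in range(1, 13):
        items = [(key, value) for key, value in series.items() if key[1] == month]
        if len(items) < 3:
            continue  # too few values to estimate a standard deviation
        values = [value for _, value in items]
        mu, sd = mean(values), stdev(values)
        if sd == 0:
            continue  # all values identical, nothing to flag
        flagged.update(key for key, value in items if abs(value - mu) > n_sigma * sd)
    return flagged
```

Because the standard deviation here includes the suspect value, a series needs roughly 30 or more values for a single outlier to exceed five standard deviations; a robust spread estimate would avoid this, but that would be a different technique from the one described.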
Step 2: check for spatial consistency. The normalized air temperature at the target station, $Z_i$, is compared with the normalized air temperatures $Z_j$ at the neighboring stations (not exceeding 20) within 500 km of the target station, using the mean $\bar{Z}_j$ and the standard deviation $\sigma_j$ of the normalized air temperature at the neighboring stations. The test showed that monthly mean air temperature, maximum temperature, and minimum temperature have spatial inconsistency problems at 349, 170, and 505 stations, respectively. These values are treated as missing (default) values.
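A spatial consistency check of this kind can be sketched in Python as follows. This is a hedged illustration, not the authors' code. The threshold multiplier is not given in the text, so it is a required argument rather than a default. Choosing the nearest 20 stations, using great-circle distance on a sphere of radius 6,371 km, and the function names are all assumptions.

```python
import math
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float,
                    radius_km: float = 6371.0) -> float:
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def select_neighbors(target: Tuple[float, float],
                     stations: Dict[str, Tuple[float, float]],
                     max_km: float = 500.0, max_count: int = 20) -> List[str]:
    """Return up to max_count station ids within max_km, nearest first."""
    dists = sorted((great_circle_km(*target, *pos), sid) for sid, pos in stations.items())
    # d > 0 skips the target itself if it is present in `stations`.
    return [sid for d, sid in dists if 0 < d <= max_km][:max_count]


def spatially_inconsistent(z_target: float, z_neighbors: Sequence[float],
                           threshold: float) -> bool:
    """True if the target departs from the neighbor mean by more than
    `threshold` neighbor standard deviations (threshold is an assumption)."""
    if len(z_neighbors) < 2:
        return False  # not enough neighbors to judge
    z_bar, sigma = mean(z_neighbors), stdev(z_neighbors)
    if sigma == 0:
        return z_target != z_bar
    return abs(z_target - z_bar) > threshold * sigma
```

A caller would first normalize each station's monthly value by that station's own mean and standard deviation, then pass the target value and the neighbors' values for the same month.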
Step 3: check for internal consistency. Most data sources contain monthly mean temperature, maximum temperature, and minimum temperature simultaneously. The mean temperature is the value of a fixed time or a result of a weather forecast and is usually not the average of the maximum and the minimum. Therefore, internal inconsistencies may arise, such as a mean temperature lower than the minimum temperature or higher than the maximum temperature. The test showed that internal inconsistency occurs at approximately 1,544 stations. The remedy is to replace the mean temperature with the average of the maximum and minimum temperatures.
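The internal consistency rule is simple enough to state as a short Python sketch. The function name `fix_internal_consistency` and the use of `None` for a missing value are assumptions made for illustration.

```python
from typing import Optional


def fix_internal_consistency(tmean: Optional[float],
                             tmax: Optional[float],
                             tmin: Optional[float]) -> Optional[float]:
    """Return a monthly mean that satisfies tmin <= tmean <= tmax.

    If any of the three values is missing, the check cannot be made and
    the mean is returned unchanged.
    """
    if tmean is None or tmax is None or tmin is None:
        return tmean
    if tmin <= tmean <= tmax:
        return tmean  # already consistent
    # Inconsistent: replace with the average of maximum and minimum.
    return (tmax + tmin) / 2.0
```

For example, a mean of 4.0 with a maximum of 15.0 and a minimum of 5.0 is below the minimum, so it would be replaced by 10.0.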
The integrated dataset includes 9,519, 7,073, and 6,587 stations whose series of monthly mean, monthly maximum, and monthly minimum air temperature, respectively, are at least 20 years long. Fig. 2 shows the spatial distribution of the 9,519 stations included in the monthly mean dataset. The station density in the integrated dataset is higher than that in GHCN-V3 or CRUTEM3, particularly in South America, Africa, and Asia. The length of the data series increases most obviously in the U.S., China, and the adjacent regions. As indicated by the number of stations in each time span interval (Fig. 3), the integrated dataset has significantly more stations in every interval than GHCN-V3 and CRUTEM3. Thirty-nine stations cover more than 200 years; except for one station in the U.S., all of them are located in central Europe. There are 6,121 stations covering 50–200 years, accounting for approximately 64% of the total. These stations are mainly distributed in the U.S., Europe, Asia, and Australia. Approximately 3,359 stations have time spans of 20–50 years, accounting for 35% of the total. They are distributed sparsely in South America and Africa. As shown by the changes in the number of stations in 1900–2011 (Fig. 4), the yearly number of stations in the integrated dataset is significantly greater than that in GHCN-V3 and CRUTEM3, and the difference is largest after the 1990s. The yearly station numbers of GHCN-V3 and CRUTEM3 in 1900–2011 increased from 1900 and were highest in 1960–1990. After 1990, the number of stations decreased sharply. By 2011, GHCN-V3 and CRUTEM3 had only approximately 3,000 and 1,600 stations, respectively. The additional stations in the integrated dataset will decrease the uncertainty in the estimation of the global air temperature trend since 1990.
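The station counts by record length quoted above can be checked with a few lines of Python. The counts are taken from the text; the dictionary labels are chosen here only for illustration.

```python
# Station counts by record length for the monthly mean dataset (from the text).
counts = {
    "more than 200 years": 39,
    "50-200 years": 6121,
    "20-50 years": 3359,
}

total = sum(counts.values())
shares = {label: 100.0 * n / total for label, n in counts.items()}

for label, n in counts.items():
    print(f"{label}: {n} stations ({shares[label]:.1f}% of {total})")
```

The three classes sum to the 9,519 stations of the monthly mean dataset, and the two larger classes round to the 64% and 35% given in the text.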
Fig. 2. Spatial distribution of the monthly mean temperature stations by time span in the integrated dataset (unit: year).
Fig. 3. Station numbers in various time span intervals in the integrated dataset, GHCN-V3, and CRUTEM3.
Fig. 4. Yearly station numbers during 1900–2011 in the integrated dataset, GHCN-V3, and CRUTEM3.
The features of several typical datasets of global monthly mean temperature were analyzed, and these datasets were combined with the regional datasets of major countries or regions to create a new dataset of global long-term monthly air temperature over land. The station density in the integrated dataset increased in each time span interval and in various regions of the world. Fig. 5 shows a comparison of the annual mean land surface temperature anomaly in the integrated dataset, GHCN-V3, and CRUTEM3. The three datasets describe a similar overall trend of global land surface mean temperature. In the period 1972–1985, the three series nearly coincide. In other periods, certain differences are apparent. For example, the integrated dataset is much closer to CRUTEM3 in 1900–1910, lies between CRUTEM3 and GHCN-V3 in 1920–1970, and is closer to GHCN-V3 after 1990.
Fig. 5. Annual global land surface air temperature anomalies during 1900–2011 relative to the 1961–1990 means.
For various periods (Table 2), the integrated dataset gave a lower annual mean temperature in 1900–1950 than CRUTEM3 and GHCN-V3. In 1951–2011, the estimate from the integrated dataset lay between those of the other two datasets. Over the entire period (1900–2011), the global annual mean temperature estimated from the integrated dataset was slightly lower than that of GHCN-V3 but slightly higher than that of CRUTEM3. These results indicate that the integrated dataset yields an estimate of the global mean air temperature trend similar to those from CRUTEM3 and GHCN-V3. With the involvement of additional stations, the differences in the long-term variation trend of air temperature appeared to diminish, which was expected. Fig. 5 shows the global annual mean temperature anomalies in 1900–2011 relative to the 1961–1990 means.
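The anomalies discussed here are departures from the 1961–1990 mean. A minimal Python sketch of that calculation for a single annual series is given below. It is a simplification: the way station values are gridded and averaged into a global series is not described in this section and is not reproduced, and the function name `annual_anomalies` is hypothetical.

```python
from statistics import mean
from typing import Dict


def annual_anomalies(annual_means: Dict[int, float],
                     base_start: int = 1961,
                     base_end: int = 1990) -> Dict[int, float]:
    """Subtract the base-period mean from every annual value."""
    base = [value for year, value in annual_means.items()
            if base_start <= year <= base_end]
    if not base:
        raise ValueError("no data in the base period")
    baseline = mean(base)
    return {year: value - baseline for year, value in sorted(annual_means.items())}
```

With annual means of 10.0 in 1961 and 12.0 in 1990 as the only base-period values, the baseline is 11.0, so a value of 12.5 in 2011 gives an anomaly of 1.5.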
Although some countries or regions have released homogenized datasets, many more countries do not conduct homogenization treatment. As a result, the datasets of many regions contain the influences of non-natural factors. Therefore, it is highly important to remove the errors related to the lack of homogenization treatment or at least to determine the range of the relevant errors. Future work will thus focus on data homogenization and correction, and a homogenized, real-time dataset of global air temperature will be established. This study provides a crucial basis for improving China's monitoring and understanding of global climate change and of the mechanism of climate change in Asian countries.
Deepest gratitude goes to Prof. Philip D. JONES from University of East Anglia, Prof. Manfred from Austria, and REN Yu-Yu from the National Climate Center, China Meteorological Administration for their assistance in data collection. This paper is supported by the China Meteorological Administration Special Public Welfare Research Fund (GYHY201206012 , GYHY201406016 ) and the Climate Change Foundation of the China Meteorological Administration (CCSF201338 ).