Geoscientists often face interpolation and estimation problems when analyzing sparse data from field observations. Geostatistics is an invaluable tool that can be used to characterize spatial or temporal phenomena. Classic statistics is generally devoted to the analysis and interpretation of uncertainties caused by limited sampling of a property under study.
Geostatistics, however, deviates from classic statistics in that it is not tied to a population distribution model that assumes, for example, that all samples of a population are normally distributed and independent of one another. Earth science data (e.g., rock properties, contaminant concentrations) often do not satisfy these assumptions, as they can be highly skewed and/or possess spatial correlation (i.e., data values from locations that are closer together tend to be more similar than data values from locations that are farther apart). The goal of geostatistics is to predict the possible spatial distribution of a property.
Such prediction often takes the form of a map or a series of maps. Two basic forms of prediction exist: estimation (Figure 1) and simulation (Figure 2).
In estimation, a single, statistically "best" estimate (map) of the spatial occurrence is produced. The estimation is based on both the sample data and a model (variogram) judged to represent the spatial correlation of the sample data most accurately. This single estimate or map is usually produced by the kriging technique.
In simulation, on the other hand, many equally likely maps (sometimes called "images") of the property distribution are produced, using the same model of spatial correlation as required for kriging. Differences between the alternative maps provide a measure of the uncertainty, an option not available with kriging estimation.
Geostatistics versus Simple Interpolation
In geostatistical estimation, we wish to estimate a property at an unsampled location, based on the spatial correlation characteristics of this property and its values at existing sampled locations. But why not just use simple interpolation? How is spatial correlation incorporated in the geostatistical approach? A simple example may illustrate this point more clearly (Figure 3): we know the permeability at n sampled locations, and we wish to estimate the permeability, z0, at an unsampled location. Using inverse distance, the unknown value can be evaluated as:
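A standard form of the inverse distance estimator, consistent with the weighted sum described here, can be written as follows. The symbol d_i, the distance from sampled location i to the unsampled location, is an assumed notation; z0, zi, and Wi follow the text.

```latex
z_0 = \sum_{i=1}^{n} W_i \, z_i ,
\qquad
W_i = \frac{1/d_i}{\sum_{j=1}^{n} 1/d_j} ,
\qquad
\sum_{i=1}^{n} W_i = 1
```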
We can see that the above relation is a linear estimator, i.e., z0 is a weighted sum of the n known values. Each weight Wi (assigned to a known value zi) is determined by the distance from the known data point to the unsampled location. For n = 7, for example, the weights can be calculated easily, as shown in Figures 4 and 5.
Using this scheme, the weights assigned to points 1, 2, 4, and 6 are all equal to 0.2. However, from our understanding of the geology, we realize that permeability within the elongated sand body should be more similar in the lateral direction. Thus, points 4 and 6 should be given higher weights than points 1 and 2. This is clearly not the case when using inverse distance, because the weights depend on distance alone.
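The weight calculation can be sketched in a few lines of Python. This is a minimal illustration, not the calculation from the figures: the distances and permeability values below are hypothetical, with the distances chosen only so that points 1, 2, 4, and 6 receive a weight of 0.2 as in the text. The function name and the `power` parameter are likewise assumptions for illustration.

```python
import numpy as np

def inverse_distance_weights(distances, power=1.0):
    """Return normalized inverse-distance weights (they sum to 1).

    distances must be strictly positive; power=1 gives inverse distance,
    power=2 gives inverse distance squared.
    """
    d = np.asarray(distances, dtype=float)
    inv = 1.0 / d**power
    return inv / inv.sum()

# Hypothetical distances from the 7 sampled points to the unsampled location.
# Points 1, 2, 4, 6 are equally close; points 3, 5, 7 are three times farther.
distances = [1.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
weights = inverse_distance_weights(distances)

# Hypothetical permeability values (e.g., in mD) at the 7 sampled points.
z = np.array([120.0, 95.0, 60.0, 150.0, 70.0, 140.0, 80.0])

# Linear estimator: z0 is the weighted sum of the known values.
z0 = float(weights @ z)

print("weights:", np.round(weights, 4))
print("estimate z0:", z0)
```

Because the weights depend only on distance, points 1 and 2 receive exactly the same weight as points 4 and 6 here, regardless of the orientation of the sand body.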
Thus, conventional interpolation methods (e.g., inverse distance, inverse distance squared) do not incorporate information on spatial correlation. Geostatistical estimation, on the other hand, considers both distance and spatial correlation. In general, geostatistical estimation consists of three steps: (1) examining the similarity between a set of sample (known) data points via an experimental variogram analysis; (2) fitting a permissible mathematical function to the experimental variogram; (3) conducting kriging interpolation based on this function.
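The three steps can be sketched in Python with NumPy. This is a simplified illustration under several assumptions: the coordinates and values are hypothetical, the exponential model is just one example of a permissible function, its sill and range are hand-picked rather than fitted to the experimental variogram, and the variogram is treated as isotropic (a directional variogram would be needed to capture the lateral continuity of an elongated sand body). All function names are illustrative.

```python
import numpy as np

def experimental_variogram(coords, values, lag_edges):
    """Step 1: mean of half squared differences over all pairs, binned by distance."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)          # all unique pairs
    h = np.linalg.norm(coords[i] - coords[j], axis=1)  # pair separation
    half_sq = 0.5 * (values[i] - values[j]) ** 2
    gamma = np.full(len(lag_edges) - 1, np.nan)        # NaN marks empty bins
    for k in range(len(lag_edges) - 1):
        in_bin = (h >= lag_edges[k]) & (h < lag_edges[k + 1])
        if in_bin.any():
            gamma[k] = half_sq[in_bin].mean()
    return gamma

def exponential_variogram(h, sill, range_):
    """Step 2 (model form only): exponential model without nugget."""
    return sill * (1.0 - np.exp(-3.0 * np.asarray(h, dtype=float) / range_))

def ordinary_kriging(coords, values, target, sill, range_):
    """Step 3: solve the ordinary kriging system; return (estimate, weights)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Left-hand side: variogram between data points, bordered by the
    # unbiasedness constraint (weights sum to 1) via a Lagrange multiplier.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exponential_variogram(d, sill, range_)
    A[n, n] = 0.0
    # Right-hand side: variogram between each data point and the target.
    b = np.ones(n + 1)
    b[:n] = exponential_variogram(np.linalg.norm(coords - target, axis=1), sill, range_)
    solution = np.linalg.solve(A, b)
    weights = solution[:n]
    return float(weights @ values), weights

# Hypothetical sample locations and property values.
coords = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
values = [10.0, 12.0, 11.0, 15.0, 14.0]

gamma = experimental_variogram(coords, values, lag_edges=[0.0, 1.5, 3.0, 4.5])
# Assumed (not fitted) model parameters.
estimate, weights = ordinary_kriging(coords, values, [1.0, 1.0], sill=4.0, range_=3.0)

print("experimental variogram per lag bin:", gamma)
print("kriging weights:", np.round(weights, 4))
print("kriging estimate:", estimate)
```

Unlike the inverse distance weights, the kriging weights here depend on the variogram model, so changing the assumed sill, range, or model form changes how much influence each sample has on the estimate.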