The minimum sample size for kriging
September 17, 2011 | Posted by karmadsen under blog, Geostatistics, Modeling Software, Statistics
In attempting to find a rule of thumb for the minimum number of observation points needed for kriging, I found that it’s not particularly straightforward. Kriging is a linear least squares estimation algorithm. Even in simple least squares regression, there is a lot of guesswork involved in determining the minimum sample size needed prior to conducting a study. Various estimation methods exist to help researchers design regression analysis studies based on the number of variables and the desired power level. But in kriging, there are added levels of complexity. The x,y distribution of points matters (i.e., they shouldn’t all be clumped together). The volatility of the z-value matters (i.e., you need more points to capture busy spatial trends).
Various methods exist to design spatial sampling for interpolation. The simplest and most obvious is to sample on a grid, with the resolution as dense as feasible given the project budget. If a specific portion of the sample area is especially sensitive, the sampling resolution could be higher in that region and lower elsewhere. Once kriging has been conducted on a set of points, the kriging mathematics can be used to identify where the “next” sampling point or points should go to reduce uncertainty in the model.
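One common way to choose the "next" sampling point is to place it where the kriging variance is largest. The following is a minimal sketch of that idea, not a method taken from any particular package. The exponential covariance model, the sill and range values, the function names exp_cov and ok_variance, and the example coordinates are all assumptions made for illustration.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=10.0):
    """Exponential covariance model (an assumed choice for this sketch)."""
    return sill * np.exp(-h / corr_range)

def ok_variance(obs_xy, targets_xy, sill=1.0, corr_range=10.0):
    """Ordinary kriging variance at each target location."""
    n = len(obs_xy)
    # Pairwise distances between observation points
    d_obs = np.linalg.norm(obs_xy[:, None, :] - obs_xy[None, :, :], axis=2)
    # Ordinary kriging system: covariances plus a Lagrange multiplier row/column
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_cov(d_obs, sill, corr_range)
    A[n, n] = 0.0
    # Right-hand sides: covariance between each observation and each target
    d_tgt = np.linalg.norm(targets_xy[:, None, :] - obs_xy[None, :, :], axis=2)
    B = np.ones((n + 1, len(targets_xy)))
    B[:n, :] = exp_cov(d_tgt, sill, corr_range).T
    sol = np.linalg.solve(A, B)  # each column holds the weights and the multiplier
    # variance = C(0) - sum_i w_i * c_i0 - mu
    return sill - np.sum(sol * B, axis=0)

# Hypothetical observations, all clumped in one corner of a 30 x 30 study area
obs = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0], [5.0, 5.0]])

# Candidate locations on a regular grid covering the study area
gx, gy = np.meshgrid(np.linspace(0, 30, 31), np.linspace(0, 30, 31))
candidates = np.column_stack([gx.ravel(), gy.ravel()])

var = ok_variance(obs, candidates)
next_point = candidates[np.argmax(var)]
print("Suggested next sampling location:", next_point)
```

Note that the kriging variance depends only on the point locations and the covariance model, not on the measured z-values, so this calculation can be run before any new data are collected.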
Rather than establishing a rule of thumb for the minimum number of sampling points, a more important question is how to get the most out of the points that you have. Each prediction within the calculated surface is based on the nearest observation points, and the number of neighbors to use is set by the user. According to the Harvard School of Public Health, a general rule of thumb is to use a sizable fraction of the total data set: “For example, for 100 data points, I would try to use at least 25 neighbors, and more if possible. For 1000, I would use at least 25-50 and ideally a few hundred, but the computations may be too slow with this many.”
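To make the neighbor setting concrete, here is a small sketch of selecting the nearest observation points for a single prediction location. The function name nearest_neighbors, the random example data, and the choice of 25 neighbors out of 100 points (taken from the quote above) are illustrative assumptions. Kriging software typically exposes this as a search-neighborhood option rather than requiring you to code it.

```python
import numpy as np

def nearest_neighbors(obs_xy, target_xy, k):
    """Return indices of the k observations closest to target_xy, nearest first."""
    d = np.linalg.norm(obs_xy - target_xy, axis=1)
    k = min(k, len(obs_xy))
    # argpartition finds the k smallest distances without sorting everything
    idx = np.argpartition(d, k - 1)[:k]
    return idx[np.argsort(d[idx])]

# Hypothetical data set: 100 observation points scattered over a 100 x 100 area
rng = np.random.default_rng(0)
obs_xy = rng.uniform(0, 100, size=(100, 2))
target = np.array([50.0, 50.0])

idx = nearest_neighbors(obs_xy, target, k=25)
print("Neighbors used for this prediction:", len(idx))
```

With a neighborhood of k points, each prediction solves a small system built from those k points only, rather than one built from the full data set.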
In general, more is better when it comes to observation points. However, it is also possible to have too many observation points for kriging. During the kriging calculation, an N x N matrix (where N is the number of sample points) must be inverted, which becomes computationally intensive as N grows. The N x N matrix also becomes ill-conditioned when sample points are located close together. Thus, kriging works best for sparse sample sets [2].
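The ill-conditioning is easy to see numerically. The sketch below compares the condition number of a covariance matrix for evenly spaced points with that of the same set plus one near-duplicate point. The exponential covariance model, its parameters, and the function name exp_cov_matrix are assumptions chosen for illustration.

```python
import numpy as np

def exp_cov_matrix(xy, sill=1.0, corr_range=10.0):
    """N x N covariance matrix under an assumed exponential model."""
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    return sill * np.exp(-d / corr_range)

# 25 points on a regular 5 x 5 grid with spacing 10
gx, gy = np.meshgrid(np.arange(5) * 10.0, np.arange(5) * 10.0)
spread = np.column_stack([gx.ravel(), gy.ravel()])

# Same grid plus one extra point only 0.01 units from the center point
clustered = np.vstack([spread, spread[12] + np.array([0.01, 0.0])])

cond_spread = np.linalg.cond(exp_cov_matrix(spread))
cond_clustered = np.linalg.cond(exp_cov_matrix(clustered))
print("Condition number, evenly spaced:      ", cond_spread)
print("Condition number, with near-duplicate:", cond_clustered)
```

Two points that nearly coincide produce two nearly identical rows in the matrix, which is what drives the condition number up and makes the inversion numerically unstable.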
1. Ciol, M.A. (2008) Presentation: Sample Size and Power Calculations. University of Washington.
2. Swiler, L.P., Slepoy, R., Giunta, A.A. Evaluation of Sampling Methods in Constructing Response Surface Approximations. Albuquerque, NM: Sandia National Laboratories. American Institute of Aeronautics and Astronautics.