Population and representative samples

When it comes to mining, we aim to extract the economic portion of a population (the portion that makes a profit).  We do not know the full population.  Instead, we rely on a minuscule subset of it (the samples) to make our decisions.  For those decisions to be reasonable, the samples we rely on must represent the total population.

What makes a sample set representative?  In statistical terms, this means the samples we collect of the population provide a fair indication of how the population behaves.

A histogram is a useful plot for understanding a population and summarising its distribution.  A histogram plots the count of data points within successive intervals (see the example histogram for copper in Figure 1).  This bar chart summarises the typical spread of grades in the data set.

Figure 1 : Example of a Histogram
Figure 2 : Example of a Histogram on a Log-scale

When a data set has a positive skew, most of the sample values are low grade, with a small percentage of more extreme high grades.  This means most of the samples fall within a few intervals of the histogram (see the left-hand bars of the histogram in Figure 1).  A useful option is to change the intervals for the histogram.  The easiest way to do this is to apply a log-transform to the data.  A log-transform maintains the data order, so the low-grade samples on a normal scale are still low grade on a log-scale, and the highest sample on a normal scale is still the highest sample on the log-scale.  Figure 2 is a log-scale histogram of the data presented in Figure 1.  A log-transformation effectively magnifies the lower-grade end of the distribution and contracts the higher-grade end.
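The effect of the log-transform on the histogram intervals can be sketched in a few lines of Python.  This is only an illustration: the grades are invented, and the helper histogram_counts and the choice of five equal-width intervals are assumptions made for the example, not part of the course material.

```python
import math

# Hypothetical, positively skewed copper grades (% Cu), invented for illustration.
grades = [0.1, 0.12, 0.15, 0.2, 0.2, 0.25, 0.3, 0.4, 0.5, 0.8, 1.5, 3.0, 6.0]


def histogram_counts(values, n_bins):
    """Count the values falling in n_bins equal-width intervals from min to max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last interval rather than a new one.
        i = min(int((v - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts


# Histogram on the normal scale: most samples pile up in the first interval.
raw_counts = histogram_counts(grades, 5)

# Histogram on a log-scale: the same samples, spread over more intervals.
log_grades = [math.log10(g) for g in grades]
log_counts = histogram_counts(log_grades, 5)

print("normal scale:", raw_counts)
print("log scale:   ", log_counts)
```

With these invented grades, ten of the thirteen samples fall in the first interval on the normal scale, while on the log-scale the low grades are spread over several intervals.  The ranking of the samples is the same on both scales.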

A sample histogram is representative when its shape, centre and spread closely match those of the population histogram (Figure 3).

Figure 3 : Histogram of samples and populations

A good way to ensure a representative data set is to have fair coverage of the population (no bias introduced by clustered drilling) and even sampling within each geologically controlled population (as best as can be achieved; often samples are collected over variable lengths to represent the various geological units or population controls).

In addition, for a sample to be representative, the difference between the sample value we obtain and the true value should be as small as possible.  If we take numerous repeat samples at the identical location, the differences between them should be small and their average should be as close to the true value as possible.  This means the samples are precise (small spread between repeats) and accurate (average close to the true value).  A consistent difference between the average of the repeat samples and the true value is called a bias.

Precision, accuracy and bias

In reality, data collection errors lead to a mismatch between what we sample and the population we are trying to represent.  This mismatch is described using the following terms:

·   Precision describes our ability to be specific about a grade; the number of decimal places we report describes our ability to be precise.  Precision is measured by comparing repeat samples.

·   Accuracy describes how well the average of the repeat samples targets the true (but unknown) grade.

Figure 4 : Accuracy and precision – target example

Bias is the measure of the systematic difference between the average of our repeat samples and the true grade.

In statistical terms, the histogram of the repeat samples shows how well they reflect the true but unknown value: a shift of the average away from the true mean indicates a bias, and a wider than acceptable spread indicates a lack of precision (Figure 5).

In reality, samples invariably incur a degree of imprecision and inaccuracy.  We need to ensure that through proper sampling practices the errors incurred are as small as possible.
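As a minimal sketch of how these quantities might be calculated from repeat samples, the Python below uses invented assay values and an assumed reference grade standing in for the true grade, which in practice is unknown.  The variable names and the use of the standard deviation as the measure of spread are choices made for this example only.

```python
import statistics

# Hypothetical repeat assays (g/t) taken at one location, invented for illustration.
repeats = [2.31, 2.18, 2.25, 2.40, 2.22, 2.29]

# Assumed reference grade standing in for the true (normally unknown) grade.
true_grade = 2.00

# Accuracy: how close the average of the repeats is to the true grade.
mean_repeat = statistics.mean(repeats)
bias = mean_repeat - true_grade

# Precision: how tightly the repeats agree with each other,
# measured here by the sample standard deviation.
spread = statistics.stdev(repeats)

print(f"average of repeats: {mean_repeat:.3f}")
print(f"bias:               {bias:+.3f}")
print(f"standard deviation: {spread:.3f}")
```

In this invented case the repeats agree closely with each other (a small spread, so good precision) but their average sits well above the assumed reference grade (a positive bias, so poor accuracy).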

Figure 5 : Accuracy and precision in statistical terms

Samples and lots

Samples are collected at different volumes; consider the difference between the volume of 1 m of RC chips and the volume of the pulp that is eventually analysed in the laboratory.  Pierre Gy distinguishes between the “lot” and the “sample”: the lot is the volume of material collected for sampling, while the sample is the volume ultimately analysed for grade.

Figure 6 : The difference between a “lot” and a “sample”

Other examples of lots are: blasthole cone, diamond core, development face chips and stockpiles.  Examples of samples are half diamond core, riffle split sample, mill pulp and fire assay sample.

Activity

Consider a 48 kg lot that contains precious grains.  We are interested in the number of grains per kg.  The lot is divided into 48 one-kilogram samples.

1.   Count the grains per kilogram within each sample (record in count template).

2.   Calculate the overall average of the samples (the total grain count divided by the number of samples).  This is the grade of the lot.

3.   How well does each sample reflect the grade of the lot?   To answer this question, plot a histogram of the sample grades.

Highlight the lot grade on the histogram.  Compare the sample grades to the lot grade.

Calculate the variance and standard deviation.

What does this tell you about the precision of the samples?

4.   Suppose we take bigger samples, say 6 kg samples.  How precise will these samples be?  To answer this, calculate each 6 kg sample’s grade as grains per kilogram.

Calculate the overall average.  Calculate the standard deviation.  How does this compare to the standard deviation of the 1 kg samples?

5.   Suppose there is a problem with your counting apparatus.  For each 1 kg sample, a grain is lost.  With this adjustment, recalculate the true grade of the lot.

Recalculate the standard deviation of the samples and plot the histogram of the problem data.

What do you observe?
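The calculations in steps 2 to 5 can be sketched in Python as follows.  The grain counts are invented for illustration and do not come from the count template.  Two further assumptions are made: the 6 kg samples are formed by combining consecutive groups of six 1 kg samples, and the population forms of the variance and standard deviation are used because the 48 samples make up the whole lot.  If the course intends the n - 1 form, statistics.variance and statistics.stdev can be substituted.

```python
import statistics

# Hypothetical grain counts for the 48 one-kilogram samples (invented data).
counts_1kg = [
    3, 5, 4, 6, 2, 4,   5, 7, 3, 4, 6, 5,
    2, 4, 5, 3, 6, 4,   6, 3, 5, 4, 7, 5,
    4, 5, 2, 6, 3, 4,   5, 4, 6, 3, 5, 7,
    3, 6, 4, 5, 4, 2,   7, 4, 5, 6, 3, 5,
]

# Step 2: the grade of the lot is the average grains per kilogram.
lot_grade = statistics.mean(counts_1kg)

# Step 3: spread of the 1 kg sample grades around the lot grade.
var_1kg = statistics.pvariance(counts_1kg)
sd_1kg = statistics.pstdev(counts_1kg)

# Step 4: combine consecutive groups of six 1 kg samples into 6 kg samples
# and express each as grains per kilogram.
grades_6kg = [sum(counts_1kg[i:i + 6]) / 6 for i in range(0, len(counts_1kg), 6)]
mean_6kg = statistics.mean(grades_6kg)
sd_6kg = statistics.pstdev(grades_6kg)

# Step 5: the faulty apparatus loses one grain from every 1 kg sample.
# (No invented count is zero, so subtracting one never goes negative.)
counts_biased = [c - 1 for c in counts_1kg]
mean_biased = statistics.mean(counts_biased)
sd_biased = statistics.pstdev(counts_biased)

print(f"1 kg samples:   mean {lot_grade:.2f}, variance {var_1kg:.2f}, sd {sd_1kg:.2f}")
print(f"6 kg samples:   mean {mean_6kg:.2f}, sd {sd_6kg:.2f}")
print(f"lost-grain run: mean {mean_biased:.2f}, sd {sd_biased:.2f}")
```

With these invented counts, the 6 kg samples have the same average as the 1 kg samples but a smaller standard deviation, and the lost grain shifts the average down by one grain per kilogram without changing the spread.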