Monday, July 27, 2009

Calculation of SD

Standard deviation (SD) is a relatively precise method of defining variability. Provided the values form a normal distribution (as fibre samples generally do), calculating SD gives us a reliable measurement of variation. According to the empirical rule for normal distributions, about 68% of the values lie within one standard deviation either side of the mean. In other words, the higher the SD, the wider the distribution of values and the greater the variation (in fibre diameter).
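The 68% figure for a normal distribution can be checked with a few lines of Python. This is only a sketch using the standard library's NormalDist class; the variable names are my own.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Proportion of values lying within one SD either side of the mean.
within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

print(f"{within_one_sd:.1%}")  # 68.3%
```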

The coefficient of variation (CV) is used when we wish to compare the degree of variability between unrelated sets of data. For example, comparing the degree of variation between the Euro and the US dollar, or comparing the variation in price of shares in two different companies.

CV is calculated by dividing the standard deviation by the mean, then multiplying by 100 (SD / mean x 100).
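As a small sketch, the CV formula can be written as a Python function. The function name and the figures passed to it are hypothetical, chosen only for illustration.

```python
def coefficient_of_variation(sd, mean):
    """Return CV as a percentage: SD divided by the mean, times 100."""
    return sd / mean * 100

# Hypothetical figures, for illustration only: an SD of 2.0 on a mean of 20.0.
example_cv = coefficient_of_variation(2.0, 20.0)
print(f"{example_cv:.1f}%")  # 10.0%
```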

Alpaca fibre samples with the same degree of variation will always have the same SD; the CV, however, will change according to the average (say, the average fibre diameter, or AFD).

I have had many debates on this matter, even with some civil servants in agriculture departments. However, showing how we manually calculate SD and CV on two samples soon proves the point.

Firstly, alpaca breeders use measurements of variation to identify the spread or distribution of fibers around the AFD. In other words, whether the alpaca will have a wide histogram or a narrow one. By the way, the less variation in fiber diameter, the better the fiber is to process, relatively speaking (as well as being regarded as genetically superior).

Having said that, it is appropriate to calculate SD and CV for two imaginary samples of fibers. For ease of calculation, the samples will have a ridiculously small number of fibers. While we use computers to calculate these statistics, I will do it manually.

The first sample has 5 fibers, with the following diameters in microns: 18, 19, 19, 20 & 21. The AFD of this sample is therefore 19.4 microns.

We calculate the SD as follows:

1/ obtain the sum of the squares for each of the data values (eg 324 + 361 + 361 + 400 + 441 = 1887)

2/ square the sum of the data values and divide by the number of values (eg 18 + 19 + 19 + 20 + 21 = 97, thence 97 x 97 divided by 5 = 1881.8)

3/ subtract 2/ from 1/, then divide the answer by the number of values less 1 (eg 1887 - 1881.8 = 5.2, thence 5.2 divided by 4 = 1.3)

4/ obtain the square root of 3/ (eg, the square root of 1.3 = 1.14)

The SD of the sample is therefore 1.14
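The four manual steps above can be sketched in Python as follows. The function name sd_by_hand is my own; the standard library's statistics.stdev is included only as a cross-check, since it also computes the sample SD (dividing by the number of values less 1).

```python
import math
import statistics

def sd_by_hand(values):
    """Sample SD via the four manual steps described above."""
    n = len(values)
    sum_of_squares = sum(v * v for v in values)                   # step 1/
    squared_sum_over_n = sum(values) ** 2 / n                     # step 2/
    variance = (sum_of_squares - squared_sum_over_n) / (n - 1)    # step 3/
    return math.sqrt(variance)                                    # step 4/

sample_one = [18, 19, 19, 20, 21]  # microns, from the text
sd_one = sd_by_hand(sample_one)

print(round(sd_one, 2))                         # 1.14
print(round(statistics.stdev(sample_one), 2))   # 1.14
```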

Now take a second sample of fibers with the same degree of variation (distribution of fibers from the mean). Let's say the microns of the five fibers are 23, 24, 24, 25 & 26 (AFD of 24.4).

The calculations for SD of this second sample are:
1/ 2982
2/ 2976.8
3/ 1.3
4/ 1.14 - The SD is also 1.14

The SD is the same because they are both the same degree of variation from the AFD.
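One way to see why the two SDs match: the second sample is simply the first with 5 microns added to every fiber, so each fiber sits the same distance from its own AFD. A short Python sketch (variable names are my own) makes this visible.

```python
import statistics

sample_one = [18, 19, 19, 20, 21]  # microns, AFD 19.4
sample_two = [23, 24, 24, 25, 26]  # microns, AFD 24.4

# Sample two is sample one with 5 microns added to every fiber.
shifted = [v + 5 for v in sample_one]
print(shifted == sample_two)  # True

# Distance of each fiber from its own sample's AFD (rounded to hide float noise).
deviations_one = [round(v - statistics.mean(sample_one), 1) for v in sample_one]
deviations_two = [round(v - statistics.mean(sample_two), 1) for v in sample_two]
print(deviations_one)  # [-1.4, -0.4, -0.4, 0.6, 1.6]
print(deviations_two)  # [-1.4, -0.4, -0.4, 0.6, 1.6]

sd_one = statistics.stdev(sample_one)
sd_two = statistics.stdev(sample_two)
print(round(sd_one, 2), round(sd_two, 2))  # 1.14 1.14
```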

If we take a sample with a greater spread of fibers, the SD will be higher. For example, fiber microns of 23, 24, 24, 25 & 29 (AFD of 25.0 microns).

1/ 3147
2/ 3125
3/ 5.5
4/ 2.3 - The SD is 2.3
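To see where the extra variation comes from, here is a sketch that uses the deviation-from-the-mean form of the calculation. It is algebraically equivalent to the four steps, but is not the shortcut used in the text. The single 29 micron fiber accounts for 16 of the 22 units that make up the total.

```python
sample_three = [23, 24, 24, 25, 29]  # microns, from the text
afd = sum(sample_three) / len(sample_three)  # 25.0

# Squared distance of each fiber from the AFD.
squared_deviations = [(v - afd) ** 2 for v in sample_three]
print(squared_deviations)  # [4.0, 1.0, 1.0, 0.0, 16.0]

# Their total, 22.0, equals 3147 - 3125 from steps 1/ and 2/.
variance = sum(squared_deviations) / (len(sample_three) - 1)
sd_three = variance ** 0.5
print(variance)            # 5.5
print(round(sd_three, 2))  # 2.35, which the text rounds to 2.3
```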

Now we can calculate CV (SD divided by the AFD x 100)

The first sample, therefore, has a CV of 5.9% and the second sample has a CV of 4.7%. The variation is the same; only the AFD is different.
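These two CVs can be reproduced with a short Python sketch. It relies on the standard library's statistics.mean and statistics.stdev; the dictionary layout and names are my own.

```python
import statistics

samples = {
    "first": [18, 19, 19, 20, 21],   # microns
    "second": [23, 24, 24, 25, 26],  # microns
}

cvs = {}
for name, microns in samples.items():
    afd = statistics.mean(microns)
    sd = statistics.stdev(microns)   # sample SD (divides by n - 1)
    cvs[name] = sd / afd * 100       # CV as a percentage
    print(f"{name}: AFD {afd:.1f}, SD {sd:.2f}, CV {cvs[name]:.1f}%")

# first: AFD 19.4, SD 1.14, CV 5.9%
# second: AFD 24.4, SD 1.14, CV 4.7%
```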

This is where the problem lies.

Let's take two alpacas with identical variation, say, an SD of 4.7. One has an AFD of 22.0 microns, the other 27.0 microns. Their CVs are therefore 21.4% and 17.4%.
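The same arithmetic as a brief Python sketch (variable names are my own): the SD is held fixed and only the AFD changes, yet the coarser alpaca comes out with the lower CV.

```python
sd = 4.7             # microns, identical for both alpacas
afds = (22.0, 27.0)  # microns

# CV as a percentage for each AFD, with the SD held fixed.
cvs = [sd / afd * 100 for afd in afds]

for afd, cv in zip(afds, cvs):
    print(f"AFD {afd:.1f} microns: CV {cv:.1f}%")

# AFD 22.0 microns: CV 21.4%
# AFD 27.0 microns: CV 17.4%
```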

The problem is that many breeders are selecting alpacas whose CV is low only because their AFD is high. In fact they are selecting an alpaca that is not only high in micron, but can also have a wide distribution of fiber diameter. Using CV, therefore, can conceal the fact that an alpaca has a high number of very coarse fibers.

Paul Vallely
