# History of Statistics - Origins in Probability


The use of statistical methods dates back at least to the 5th century BCE. The historian Thucydides in his History of the Peloponnesian War describes how the Athenians calculated the height of the wall of Plataea by counting the number of bricks in an unplastered section of the wall sufficiently near them to be able to count them. The count was repeated several times by a number of soldiers. The most frequent value (in modern terminology, the mode) so determined was taken to be the most likely value of the number of bricks. Multiplying this value by the height of the bricks used in the wall allowed the Athenians to determine the height of the ladders necessary to scale the walls.
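The brick-counting procedure can be sketched in a few lines of Python. This is only an illustration: the individual counts and the brick height below are hypothetical values chosen for the example, not figures reported by Thucydides.

```python
from statistics import mode

def estimate_wall_height(brick_counts, brick_height):
    """Take the most frequent count (the mode) and scale by brick height."""
    return mode(brick_counts) * brick_height

# Hypothetical repeated counts by different soldiers (not historical data).
counts = [41, 40, 40, 39, 40, 42, 40]
# Hypothetical brick height in metres (an assumption for illustration).
brick_height_m = 0.25

wall_height_m = estimate_wall_height(counts, brick_height_m)
print(wall_height_m)
```

Using the mode rather than the mean means that a few miscounts do not shift the estimate, as long as most counters agree.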

In the Indian epic the Mahabharata (Book 3: The Story of Nala), King Rtuparna estimated the number of fruit and leaves (2095 fruit and 50,000,000, or five crores, leaves) on two great branches of a Vibhitaka tree by counting them on a single twig. This number was then multiplied by the number of twigs on the branches. This estimate was later checked and found to be very close to the actual number. With knowledge of this method Nala was subsequently able to regain his kingdom.

The earliest known writing on statistics is found in a 9th-century book entitled "Manuscript on Deciphering Cryptographic Messages", written by Al-Kindi (801–873 CE). In his book, Al-Kindi gave a detailed description of how to use statistics and frequency analysis to decipher encrypted messages. This work marks the birth of both statistics and cryptanalysis.

The Trial of the Pyx is a test of the purity of the coinage of the Royal Mint which has been held on a regular basis since the 12th century. The Trial itself is based on statistical sampling methods. After minting a series of coins (originally from ten pounds of silver), a single coin was placed in the Pyx, a box in Westminster Abbey. After a given period (now once a year) the coins are removed and weighed. A sample of coins removed from the box is then tested for purity.

The Nuova Cronica, a 14th-century history of Florence by the Florentine banker and official Giovanni Villani, includes much statistical information on population, ordinances, commerce and trade, education, and religious facilities and has been described as the first introduction of statistics as a positive element in history, though neither the term nor the concept of statistics as a specific field yet existed. This claim of priority was shown to be incorrect after the rediscovery of Al-Kindi's book on frequency analysis.

The arithmetic mean, although a concept known to the Greeks, was not generalised to more than two values until the 16th century. Simon Stevin's introduction of decimal fractions in 1585 seems likely to have facilitated these calculations. Averaging of repeated observations was first adopted in astronomy by Tycho Brahe, who was attempting to reduce the errors in his estimates of the locations of various celestial bodies.

The idea of the median originated in Edward Wright's book on navigation (Certaine Errors in Navigation) in 1599 in a section concerning the determination of location with a compass. Wright felt that this value was the most likely to be the correct value in a series of observations.

John Graunt in his book Natural and Political Observations Made upon the Bills of Mortality estimated the population of London in 1662 from parish records. He knew that there were around 13,000 funerals per year in London and that three people died per eleven families per year. He estimated from the parish records that the average family size was 8 and calculated that the population of London was about 384,000. Laplace in 1802 estimated the population of France with a similar method.
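Graunt's arithmetic can be retraced directly from the three figures quoted above. The sketch below uses only those figures; the variable names are illustrative.

```python
# Figures quoted in the text above.
burials_per_year = 13_000
deaths_per_family_per_year = 3 / 11   # three deaths per eleven families
average_family_size = 8

# Number of families implied by the burial count.
families = burials_per_year / deaths_per_family_per_year

# Population implied by the family count.
population = families * average_family_size

print(round(families), round(population))
```

The unrounded calculation gives roughly 47,700 families and a population of about 381,000. The quoted figure of 384,000 is consistent with rounding the family count up to 48,000 before multiplying by 8.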

The mathematical methods of statistics emerged from probability theory, which can be dated to the correspondence between Pierre de Fermat and Blaise Pascal (1654). Christiaan Huygens (1657) gave the earliest known scientific treatment of the subject. Jakob Bernoulli's Ars Conjectandi (posthumous, 1713) and Abraham de Moivre's The Doctrine of Chances (1718) treated the subject as a branch of mathematics. In his book Bernoulli introduced the idea of representing complete certainty as one and probability as a number between zero and one.

Galileo struggled with the problem of errors in observations and had vaguely formulated the principle that the most likely values of the unknowns would be those that made the errors in all the equations reasonably small. The formal study of the theory of errors may be traced back to Roger Cotes' Opera Miscellanea (posthumous, 1722). Tobias Mayer, in his study of the libration of the moon (Kosmographische Nachrichten, Nuremberg, 1750), invented the first formal method for estimating the unknown quantities by generalizing the averaging of observations under identical circumstances to the averaging of groups of similar equations.

The first example of what later became known as the normal curve was studied by Abraham de Moivre, who plotted this curve on November 12, 1733. De Moivre was studying the number of heads that occurred when a 'fair' coin was tossed.
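The connection between coin tossing and the normal curve can be illustrated numerically. The sketch below is a modern restatement, not de Moivre's own calculation: it compares the exact probability of k heads in n tosses of a fair coin with a normal density of matching mean n/2 and standard deviation √n/2. The choice n = 100 is arbitrary.

```python
from math import comb, exp, pi, sqrt

n = 100  # number of tosses (arbitrary choice for illustration)

def binomial_pmf(k):
    """Exact probability of k heads in n tosses of a fair coin."""
    return comb(n, k) / 2 ** n

def normal_density(k):
    """Normal density with the same mean and standard deviation."""
    mu = n / 2
    sigma = sqrt(n) / 2
    return exp(-((k - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Largest pointwise gap between the two curves over all possible k.
max_gap = max(abs(binomial_pmf(k) - normal_density(k)) for k in range(n + 1))
print(binomial_pmf(50), normal_density(50), max_gap)
```

At the centre both values are close to 0.08, and the gap between the curves shrinks further as n grows.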

A memoir, An attempt to show the advantage arising by taking the mean of a number of observations in practical astronomy, prepared by Thomas Simpson in 1755 (printed 1756), first applied the theory to the discussion of errors of observation. The reprint (1757) of this memoir lays down the axioms that positive and negative errors are equally probable, and that there are certain assignable limits within which all errors may be supposed to fall; continuous errors are discussed and a probability curve is given. Simpson discussed several possible distributions of error. He first considered the uniform distribution and then the discrete symmetric triangular distribution, followed by the continuous symmetric triangular distribution.

Ruđer Bošković in 1755, based on his work on the shape of the earth, proposed in his book De Litteraria expeditione per pontificiam ditionem ad dimetiendos duos meridiani gradus a PP. Maire et Boscovich that the true value of a series of observations would be that which minimises the sum of absolute errors. In modern terminology this value is the median.
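The link between minimising the sum of absolute errors and the median can be checked by brute force. The following sketch uses hypothetical integer observations and searches a grid of candidate values; it is a numerical illustration, not Bošković's procedure.

```python
from statistics import median

def sum_abs_errors(candidate, observations):
    """Sum of absolute deviations of the observations from a candidate value."""
    return sum(abs(x - candidate) for x in observations)

# Hypothetical observations (one of them an outlier), not historical data.
observations = [98, 101, 100, 104, 129]

# Try every integer candidate in a range covering the data.
candidates = range(90, 141)
best = min(candidates, key=lambda c: sum_abs_errors(c, observations))

print(best, median(observations))
```

The minimising candidate coincides with the median, and the outlier at 129 pulls the arithmetic mean (106.4) but not the median.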

Johann Heinrich Lambert in his 1765 book Anlage zur Architectonic proposed the semicircle as a distribution of errors. In normalized form the density is

$$f(x) = \frac{2}{\pi}\sqrt{1 - x^2}$$

with -1 ≤ x ≤ 1.

Pierre-Simon Laplace (1774) made the first attempt to deduce a rule for the combination of observations from the principles of the theory of probabilities. He represented the law of probability of errors by a curve and deduced a formula for the mean of three observations.

Laplace in 1774 noted that the frequency of an error could be expressed as an exponential function of its magnitude once its sign was disregarded. This distribution is now known as the Laplace distribution.

Lagrange proposed a parabolic distribution of errors in 1776. In normalized form the density is

$$f(x) = \frac{3}{4}\left(1 - x^2\right)$$

with -1 ≤ x ≤ 1.

Laplace in 1778 published his second law of errors, wherein he noted that the frequency of an error was proportional to the exponential of the negative square of its magnitude. This was subsequently rediscovered by Gauss (possibly in 1795) and is now best known as the normal distribution, which is of central importance in statistics. This distribution was first referred to as the normal distribution by Peirce in 1873, who was studying measurement errors when an object was dropped onto a wooden base. He chose the term normal because of its frequent occurrence in naturally occurring variables.

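Lagrange also suggested in 1781 two other distributions for errors. In normalized form these are a cosine distribution

$$f(x) = \frac{\pi}{4}\cos\left(\frac{\pi x}{2}\right)$$

with -1 ≤ x ≤ 1 and a logarithmic distribution

$$f(x) = \frac{1}{2}\ln\frac{1}{|x|}$$

with -1 ≤ x ≤ 1, where |x| is the absolute value of x.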

Laplace gave (1781) a formula for the law of facility of error (a term due to Joseph Louis Lagrange, 1774), but one which led to unmanageable equations. Daniel Bernoulli (1778) introduced the principle of the maximum product of the probabilities of a system of concurrent errors.

Laplace, in an investigation of the motions of Saturn and Jupiter in 1787, generalized Mayer's method by using different linear combinations of a single group of equations.

In 1802 Laplace estimated the population of France to be 28,328,612. He calculated this figure using the number of births in the previous year and census data for three communities. The census data of these communities showed that they had 2,037,615 persons and that the number of births was 71,866. Assuming that these samples were representative of France, Laplace produced his estimate for the entire population.
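This is what would now be called a ratio estimate: the persons-per-birth ratio observed in the sample is applied to the national birth count. The sketch below uses only the sample figures and the final estimate quoted above. The national number of births is not given in the text, so the code back-calculates the value implied by those figures rather than assuming one.

```python
# Sample figures quoted in the text above.
sample_population = 2_037_615
sample_births = 71_866

# Persons per birth observed in the sampled communities.
persons_per_birth = sample_population / sample_births

def estimate_population(national_births):
    """Scale a national birth count by the sample's persons-per-birth ratio."""
    return national_births * persons_per_birth

# Birth count implied by the quoted estimate of 28,328,612
# (a back-calculation for illustration, not a figure from the source).
implied_births = 28_328_612 / persons_per_birth

print(round(persons_per_birth, 3), round(implied_births))
```

The ratio comes out at about 28.35 persons per birth, so the quoted estimate corresponds to a little under one million births per year.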

The method of least squares, which was used to minimize errors in data measurement, was published independently by Adrien-Marie Legendre (1805), Robert Adrain (1808), and Carl Friedrich Gauss (1809). Gauss had used the method in his famous 1801 prediction of the location of the dwarf planet Ceres. The observations that Gauss based his calculations on were made by the Italian monk Piazzi. Further proofs were given by Laplace (1810, 1812), Gauss (1823), Ivory (1825, 1826), Hagen (1837), Bessel (1838), Donkin (1844, 1856), Herschel (1850), Crofton (1870), and Thiele (1880, 1889).
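As an illustration of the idea, the sketch below fits a straight line to a handful of points by minimising the sum of squared residuals, using the standard closed-form solution for one predictor. The data points are hypothetical, and this modern formulation is not taken from Legendre's, Adrain's, or Gauss's presentations.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_residuals(slope, intercept, xs, ys):
    """Sum of squared vertical distances between the line and the points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical noisy measurements lying roughly on a line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

Any other line through the same data, for example one with a slightly different slope, gives a larger sum of squared residuals.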

The term probable error (der wahrscheinliche Fehler), the median deviation from the mean, was introduced in 1815 by the German astronomer Friedrich Wilhelm Bessel.

Antoine Augustin Cournot in 1843 was the first to use the term median (valeur médiane) for the value that divides a probability distribution into two equal halves.

Other contributors to the theory of errors were Ellis (1844), De Morgan (1864), Glaisher (1872), and Giovanni Schiaparelli (1875). Peters's (1856) formula for the "probable error" of a single observation was widely used and inspired early robust statistics (resistant to outliers: see Peirce's criterion).

In the 19th century, authors on statistical theory included Laplace, S. Lacroix (1816), Littrow (1833), Dedekind (1860), Helmert (1872), Laurent (1873), Liagre, Didion, De Morgan, Boole, Edgeworth, and K. Pearson.

Gustav Theodor Fechner used the median (Centralwerth) in sociological and psychological phenomena. It had earlier been used only in astronomy and related fields. Francis Galton used the English term median for the first time in 1881 having earlier used the terms middle-most value in 1869 and the medium in 1880.

Adolphe Quetelet (1796–1874), another important founder of statistics, introduced the notion of the "average man" (l'homme moyen) as a means of understanding complex social phenomena such as crime rates, marriage rates, and suicide rates.

The first tests of the normal distribution were invented by the German statistician Wilhelm Lexis in the 1870s. Of the data sets available to him, the only ones he was able to show to be normally distributed were birth rates.

Francis Galton studied a variety of human characteristics - height, weight, eyelash length among others - and found that many of these could be fitted to a normal curve distribution.

Francis Galton in 1907 submitted a paper to Nature on the usefulness of the median. He examined the accuracy of 787 guesses of the weight of an ox at a country fair. The actual weight was 1198 pounds; the median guess was 1207. The guesses were markedly non-normally distributed.

The Norwegian Anders Nicolai Kiær introduced the concept of stratified sampling in 1895. Arthur Lyon Bowley introduced random sampling in 1906. Jerzy Neyman in 1934 showed that stratified random sampling was in general a better method of estimation than purposive (quota) sampling.

The 5% level of significance appears to have been introduced by Fisher in 1925. Fisher stated that deviations exceeding twice the standard deviation are regarded as significant. Before this, deviations exceeding three times the probable error were considered significant. For a symmetrical distribution the probable error is half the interquartile range. The upper quartile of a standard normal distribution lies between 0.67 and 0.68, so its probable error is approximately 2/3 of a standard deviation, and three probable errors correspond to roughly two standard deviations. It appears that Fisher's 5% criterion was rooted in previous practice.
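The relationship between the two criteria can be verified numerically with Python's standard library. The sketch below assumes a standard normal distribution and computes the probable error, three times that value, and the two-sided tail probabilities at each threshold.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Probable error: the upper quartile of the standard normal distribution.
probable_error = std_normal.inv_cdf(0.75)

# The older criterion, expressed in standard deviations.
three_probable_errors = 3 * probable_error

def two_sided_tail(z):
    """Probability that a standard normal deviate exceeds z in absolute value."""
    return 2 * (1 - std_normal.cdf(z))

p_two_sd = two_sided_tail(2.0)
p_three_pe = two_sided_tail(three_probable_errors)

print(probable_error, three_probable_errors, p_two_sd, p_three_pe)
```

Three probable errors come to just over two standard deviations, and both thresholds leave a two-sided tail probability between 4% and 5%.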

In 1929 Wilson and Hilferty re-examined Peirce's data from 1873 and discovered that they were not actually normally distributed.