So starting with the n = 1,000,000,000 I took 100 samples of size t = 1000 up to n = 1,000,100,000.
For convenience in identifying the number of primes in each sample, I took them in strict sequence with the accompanying values listed below. As the primes are themselves distributed in a random fashion this would seem permissible in this instance.
Alternatively  at least in theory  100 repeated random samples of 1000 (with replacement) could be taken within the same range i.e. 100,000,000 to 100,100,000. However in practice this would be very difficult.
So the first row for example represents the 10 samples in sequence from 100,000,000 to 100,010,000 with the last row, for example representing the final 10 sample values from 100,090,000 to 100,100,000.
54

56

57

55

57

61

57

56

47

51

61

55

57

49

43

54

56

43

58

54

51

56

51

49

43

54

56

43

58

54

60

56

55

57

54

52

59

56

51

56

49

43

59

64

55

63

62

53

49

51

63

54

40

54

51

56

52

54

42

57

53

73

55

50

54

53

61

49

52

56

54

59

44

57

50

56

56

53

52

54

57

56

52

54

63

43

54

52

51

48

49

55

57

52

54

52

56

48

56

66

Now in general terms, the mean number of primes in each sample can be approximated as t/log n
= 1000/18.42 = 54.29 (approx)
This equates well with actual value averaged over the 100 sample results above = 54.11.
Much more problematic however is the provision of a general formula to approximate the standard deviation (for all values of n).
Though I experienced doubts on several occasions with respect to my initial "hunch", repeated empirical testing seems to suggest it as perhaps the simplest and best estimate,
i.e. √{t/(2log n)}
This would give the standard deviation as 5.21 and compares well enough with the estimated standard deviation (based on the 100 sample values) = 5.46. This does not of course constitute a proof, and indeed a much greater degree of sampling would be required to truly establish it as the most likely estimate.
However in principle by now using the normal distribution, we could estimate the probabilities associated for example of sample values lying within any prescribed distance from the mean (on both sides).
For example we would expect for the above a little in excess of 2/3 of sample values to lie within 1 standard deviation of the mean value.
This would suggest therefore the probability that 2/3 of sample values (for frequency of primes occurring) would lie in the range of 49  59 (approx).
Addendum (5/3/2016). Having returned to this issue in recent days, I feel I can bring more clarity to a situation that I did not eally feel had been properly dealt with, first time around.
Though I was hoping that the standard deviation would correspond to √{t/(log n)}, I was led  largely through the empirical evidence of a small sample  to adjust it somewhat to fit the data.
However on reflection this was not warranted. Even just a few stray "outliers" with respect to this data would have a vey large influence on the standard deviation. Therefore it was unrealistic to expect that the empirical example would fit in with theoretical explanations.
The Erdős–Kac Theorem states that if ω(n) is the number of (distinct) prime factors of n, the probability distribution of
is the normal distribution.
In like manner I am suggesting that if ω(n) is the number of primes in a sample of size t. (i.e. where samples are taken in the region of n) the probaibility distribution of
ω(n)  (t/log t)
Addendum (5/3/2016). Having returned to this issue in recent days, I feel I can bring more clarity to a situation that I did not eally feel had been properly dealt with, first time around.
Though I was hoping that the standard deviation would correspond to √{t/(log n)}, I was led  largely through the empirical evidence of a small sample  to adjust it somewhat to fit the data.
However on reflection this was not warranted. Even just a few stray "outliers" with respect to this data would have a vey large influence on the standard deviation. Therefore it was unrealistic to expect that the empirical example would fit in with theoretical explanations.
The Erdős–Kac Theorem states that if ω(n) is the number of (distinct) prime factors of n, the probability distribution of
is the normal distribution.
In like manner I am suggesting that if ω(n) is the number of primes in a sample of size t. (i.e. where samples are taken in the region of n) the probaibility distribution of
ω(n)  (t/log t)
√{(t/logt)}
is the normal distribution.
The Erdős–Kac Theoremwould suggest that where the average number of (distinct) primes is approximately 100 (with standard deviation 10), then we would expect just alittle more than 2/3 of all factors to lie within 1 standard deviation of 100 either side of the mean (i.e. between 90 and 110).
In like manner if the average mean value of the number of primes in each sample is 100 (where samples are taken in the region of n) then we would likewise expect again that in a little more than 2/3 of samples, the number of primes would lie within 1 standard deviation (i.e. 10) of 100 (i.e. between 90 and 110).
No comments:
Post a Comment