In another newsgroup recently, I suggested that the usual procedure, m +- t*s/sqrt[n], be used to get a confidence interval for the mean when the data were on a discrete 1,...,k scale. Now, there is nothing to prevent the lower limit from being < 1 or the upper limit from being > k, so a reasonable person would truncate the interval accordingly to keep it in [1,k].
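A minimal sketch of the truncated t interval just described, in Python. The function name and the `t_crit` argument are illustrative assumptions: the caller supplies the two-sided t critical value for n-1 degrees of freedom, since the standard library has no t quantile function.

```python
import math
import statistics

def truncated_t_interval(data, k, t_crit):
    """Return (lower, upper) for the mean, clipped to the scale range [1, k]."""
    n = len(data)
    m = statistics.mean(data)
    s = statistics.stdev(data)               # sample SD (n - 1 denominator)
    half_width = t_crit * s / math.sqrt(n)
    lower = max(1.0, m - half_width)         # truncate below at 1
    upper = min(float(k), m + half_width)    # truncate above at k
    return lower, upper

# Five responses on a 1..5 scale; 2.776 is the two-sided 95% t value for 4 df.
print(truncated_t_interval([1, 1, 1, 1, 2], k=5, t_crit=2.776))
```

In this example the untruncated lower limit would be about 0.64, so it is pulled up to 1.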
Then it occurred to me that the logic of the Wilson interval for a binomial proportion, which is naturally in [0,1], could be extended to cover this problem and give an interval that is naturally in [1,k] with no need to truncate. That is, we should solve for the probabilities p1,...,pk that maximize (for an upper bound) or minimize (for a lower bound) the corresponding mean, sum j*pj, subject to (1) all pj >= 0; (2) sum pj = 1; and (3) sum ((fj - n*pj)^2 / (n*pj)) <= chi-square[k-1,alpha], where f1,...,fk are the observed frequencies and n = sum fj. How those optimizations are done is a purely numerical problem that (so far) has yielded nicely to current numerical-analytic methods.
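One possible way to carry out the optimization, offered as a sketch and not as the numerical method actually used: the Lagrange conditions for the upper bound give pj proportional to fj/sqrt(v - j) for some v > k, so a one-dimensional bisection on v until constraint 3 holds with equality is enough. The function names and the `crit` argument (the chi-square critical value, supplied by the caller) are assumptions, and the sketch requires f1 > 0 and fk > 0.

```python
import math

def chi2_stat(freqs, probs):
    """Pearson statistic sum (f - n*p)^2 / (n*p); empty cells with p = 0 add 0."""
    n = sum(freqs)
    total = 0.0
    for f, p in zip(freqs, probs):
        if p > 0:
            total += (f - n * p) ** 2 / (n * p)
        elif f > 0:
            return math.inf
    return total

def tilted_probs(freqs, delta):
    """p_j proportional to f_j / sqrt(delta + k - j), i.e. v = k + delta."""
    k = len(freqs)
    w = [f / math.sqrt(delta + k - j) for j, f in enumerate(freqs, start=1)]
    z = sum(w)
    return [x / z for x in w]

def upper_mean_bound(freqs, crit):
    """Largest mean sum j*p_j over p satisfying the chi-square constraint."""
    if freqs[-1] <= 0:
        raise ValueError("this sketch assumes the top category was observed")
    lo, hi = 1e-12, 1e12                  # bracket for delta = v - k
    probs = tilted_probs(freqs, lo)
    if chi2_stat(freqs, probs) > crit:    # otherwise even the extreme tilt is allowed
        for _ in range(200):              # bisection on a log scale
            mid = math.sqrt(lo * hi)
            if chi2_stat(freqs, tilted_probs(freqs, mid)) > crit:
                lo = mid
            else:
                hi = mid
        probs = tilted_probs(freqs, hi)   # hi always satisfies the constraint
    return sum(j * p for j, p in enumerate(probs, start=1))

def mean_interval(freqs, crit):
    """(lower, upper); the lower bound comes from reflecting the scale."""
    freqs = list(freqs)
    k = len(freqs)
    upper = upper_mean_bound(freqs, crit)
    lower = (k + 1) - upper_mean_bound(freqs[::-1], crit)
    return lower, upper

# Frequencies on a 1..5 scale; 9.488 is the 95% chi-square point for 4 df.
print(mean_interval([2, 5, 9, 3, 1], crit=9.488))
```

For k = 2 the constraint is the usual score-test inequality, so the result should agree with the Wilson interval for p2, shifted by 1.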
My question is whether anyone has seen anything comparing this method to the traditional t-based method in terms of coverage probability and small-sample behavior.
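In the absence of a reference, coverage can at least be estimated by simulation. The sketch below does this for the truncated t interval only; the function name, the `probs` vector of category probabilities, and the caller-supplied `t_crit` are illustrative assumptions, and the same loop could wrap any other interval procedure.

```python
import math
import random
import statistics

def estimate_t_coverage(probs, n, t_crit, reps=20000, seed=0):
    """Monte Carlo estimate of how often the truncated t interval covers the true mean."""
    rng = random.Random(seed)
    k = len(probs)
    scores = list(range(1, k + 1))
    true_mean = sum(j * p for j, p in zip(scores, probs))
    hits = 0
    for _ in range(reps):
        sample = rng.choices(scores, weights=probs, k=n)
        m = statistics.mean(sample)
        half = t_crit * statistics.stdev(sample) / math.sqrt(n)
        lower = max(1.0, m - half)        # truncate to [1, k]
        upper = min(float(k), m + half)
        if lower <= true_mean <= upper:
            hits += 1
    return hits / reps

# n = 10 on a 1..5 scale; 2.262 is the two-sided 95% t value for 9 df.
print(estimate_t_coverage([0.2, 0.2, 0.2, 0.2, 0.2], n=10, t_crit=2.262))
print(estimate_t_coverage([0.90, 0.05, 0.03, 0.01, 0.01], n=10, t_crit=2.262))
```

Note that a sample with all values equal has s = 0, so its interval collapses to a single point; that case matters most for small n and skewed distributions.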
Date: Nov 1, 2009 6:33 PM Author: Ray Koopman Subject: CI for the mean of a discrete variable
In another newsgroup recently, I suggested that the usual procedure, m +- t*s/sqrt[n], be used to get a confidence interval for the mean when the data were on a discrete 1,...,k scale. Now, there is nothing to prevent the lower limit from being < 1 or the upper limit from being > k, so a reasonable person would truncate the interval accordingly to keep it in [1,k].
*********************************************
A Dreadful Nightmare,
For a discrete uniform distribution, the mean m of any sample, whatever its size n, is never outside [1, k]. Values:
k=10, n=10: Pr(m<=7.0) = 0.956
k=80, n=100: Pr(m<=44.29) = 0.950
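Probabilities of this kind can also be computed exactly instead of by simulation, by convolving the uniform distribution with itself n times. This is a sketch under that assumption; the function names are illustrative and not taken from the post.

```python
import math

def sum_distribution(k, n):
    """Exact pmf of the sum of n independent draws from uniform {1..k}.

    Returns a list pmf with pmf[s] = Pr(sum == s)."""
    pmf = [1.0]                          # zero draws: the sum is 0 with probability 1
    for _ in range(n):
        new = [0.0] * (len(pmf) + k)
        for s, p in enumerate(pmf):
            if p:
                for j in range(1, k + 1):
                    new[s + j] += p / k  # add one more draw with value j
        pmf = new
    return pmf

def prob_mean_at_most(k, n, m):
    """Pr(sample mean <= m), computed from the exact distribution of the sum."""
    pmf = sum_distribution(k, n)
    limit = math.floor(m * n + 1e-9)     # largest sum whose mean is <= m
    return sum(pmf[: limit + 1])

print(prob_mean_at_most(10, 10, 7.0))
```

The sum only takes values from n to k*n, so every quantile of the mean lies in [1, k] by construction. The k=80, n=100 case runs the same way but takes noticeably longer in pure Python.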
My response
It's absolutely ridiculous and misleading to keep using wrong procedures, as Koopman advises, even if they are of historical interest, when better ones are available. The table below shows what the CIs really are for some values of k and n. The critical values are NEVER outside [1, k].
Table: Confidence intervals for the means of random samples of size n from the discrete uniform distribution on {1, ..., k}
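REM "0-Koopman"
REM Monte Carlo 2.5% and 97.5% points of the mean of n draws
REM from the discrete uniform distribution on 1..k
CLS
PRINT " k*n <=8000 "
INPUT " K = "; k
INPUT " n = "; n
INPUT " all = "; all
DIM w(8001)
DIM p(2)
p(1) = .025: p(2) = .975     ' kept apart from w() so no tally is overwritten
RANDOMIZE TIMER              ' seed once, not on every replication
FOR rpt = 1 TO all
  LOCATE 14, 50: PRINT USING "########"; all - rpt
  s = 0
  FOR i = 1 TO n
    ji = INT(k * RND) + 1
    s = s + ji
  NEXT i
  w(s) = w(s) + 1            ' tally the sample total
NEXT rpt
FOR u = 1 TO 2
  wr = 0                     ' restart the cumulative proportion
  FOR t = 0 TO 8000
    wr = wr + w(t) / all
    IF wr > p(u) THEN EXIT FOR
  NEXT t
  PRINT USING "#####.### .### "; t / n; wr
NEXT u
END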
>Date: Nov 1, 2009 6:33 PM >Author: Ray Koopman >Subject: CI for the mean of a discrete variable
[snip, previous]
>It's absolutely ridiculous and misleading to keep using wrong procedures, as Koopman advises, even if they are of historical interest, when better ones are available. The table below shows what the CIs really are for some values of k and n. The critical values are NEVER outside [1, k].
>Table: Confidence intervals for the means of random samples of size n from the discrete uniform distribution on {1, ..., k}
[snip, rest]
Luis, Ray did not specify that the concern was for *uniform* discrete values. Please re-read the original post.
You use only the 95% CI. Ray did not say that the CI could not be 99.9%, or even more stringent.
Unlike a notorious reader, I'm not STUPID. So I do not disdain to interpret (after carefully reading the post) what the OP's main concern is. This case is the clearest one ever found: confidence intervals for means, when calculated from approximate models (see note* below), can extend beyond the range of the discrete random variable, such as {1, ..., k} in the case under study. I simply showed that for discrete uniform data the anomaly never occurs if an exact model is chosen. OR ARE YOU, Ulrich, persuaded that in general it is acceptable for the C.I. to go below 1 and/or above k? Never, ever!
*Note: Koopman's solution approximates the EXACT distribution (whatever it is) by a normal one, which, worse still, is CONTINUOUS. That is why the treatment uses Student's t distribution.