Density Estimation - Example of Density Estimation - Script For Example

Script For Example

The following commands of the R programming language will create the figures shown above. These commands can be entered at the command prompt by using cut and paste.

library (MASS) data(Pima.tr) data(Pima.te) Pima <- rbind (Pima.tr, Pima.te) glu <- Pima d0 <- Pima == 'No' d1 <- Pima == 'Yes' base.rate.d1 <- sum(d1)/(sum(d1) + sum(d0)) glu.density <- density (glu) glu.d0.density <- density (glu) glu.d1.density <- density (glu) approxfun (glu.d0.density$x, glu.d0.density$y) -> glu.d0.f approxfun (glu.d1.density$x, glu.d1.density$y) -> glu.d1.f p.d.given.glu <- function (glu, base.rate.d1) { p1 <- glu.d1.f(glu) * base.rate.d1 p0 <- glu.d0.f(glu) * (1 - base.rate.d1) p1/(p0+p1) } x <- 1:250 y <- p.d.given.glu (x, base.rate.d1) plot (x, y, type='l', col='red', xlab='glu', ylab='estimated p(diabetes|glu)') plot (density(glu), col='blue', xlab='glu', ylab='estimate p(glu), p(glu|diabetes), p(glu|not diabetes)', main=NA) lines (density(glu), col='red')

Note that the above conditional density estimator uses bandwidths that are optimal for unconditional densities. Alternatively, one could use the method of Hall, Racine and Li (2004) and the R np package for automatic (data-driven) bandwidth selection that is optimal for conditional density estimates; see the np vignette for an introduction to the np package. The following commands of the R programming language use the npcdens function to deliver optimal smoothing. Note that the response "Yes"/"No" is a factor.

library(np) fy.x <- npcdens(type~glu,nmulti=1,data=Pima) Pima.eval <- data.frame(type=factor("Yes"), glu=seq(min(Pima$glu),max(Pima$glu),length=250)) plot (x, y, type='l', lty=2, col='red', xlab='glu', ylab='estimated p(diabetes|glu)') lines(Pima.eval$glu,predict(fy.x,newdata=Pima.eval),col="blue") legend(0,1,c("Unconditional bandwidth", "Conditional bandwidth"), col=c("red","blue"),lty=c(2,1))

The third figure uses optimal smoothing via the method of Hall, Racine, and Li (2004) indicating that the unconditional density bandwidth used in the second figure above yields a conditional density estimate that may be somewhat undersmoothed.

Estimated probability of p(diabetes=1 | glu).

Read more about this topic:  Density Estimation, Example of Density Estimation

Famous quotes containing the word script:

    If it’s a good script I’ll do it. And if it’s a bad script, and they pay me enough, I’ll do it.
    George Burns (b. 1896)