Box Plot Quartile Issue

Viewing 2 posts - 1 through 2 (of 2 total)
  • #1411 Score: 0
    Kathy Shafer
    Participant

    Hi. I had an interesting thing happen today. I entered 11 data points into a table, created the dot plot and added a box plot. The first and third quartiles were not what I expected. It appears that the median is not being excluded when they are calculated.

    Here is an example of two small data sets.

    Thanks,
    Kathy

    #1413
    Bill Finzer
    Keymaster

    Hi Kathy,

    Thanks for this question. It comes up fairly often.

    John Tukey, father of exploratory data analysis, invented box plots in 1970, long before computers with graphical user interfaces were available, let alone commonplace. Datasets were often small enough to be dealt with by hand. Tukey wanted graphical representations that could easily be created and understood with just paper and pencil. Thus his method for constructing a box plot didn’t require much computation: you “remove” the median value from the distribution and take the Q1 and Q3 values as the medians of the lower and upper halves that remain.

    But you probably knew all that! 😉

    Tukey’s method comes up with an approximation to the 25th or 75th percentile of the distribution. CODAP (and Fathom) instead calculate the actual 25th or 75th percentile for their box plots. (See Wikipedia for computational methods.) This seems appropriate in an age when doing things by hand is rarely called for. And, of course, with any reasonably sized dataset, the two methods yield nearly, if not exactly, the same result.
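To make the difference concrete, here is a minimal Python sketch, not CODAP's actual code. The 11 data values are made up for illustration, both function names are hypothetical, and the percentile function uses linear interpolation between closest ranks, which is only one of the several methods listed on Wikipedia. I am assuming that rule for the example; it may not be the exact one CODAP or Fathom uses.

```python
from statistics import median


def quartiles_excluding_median(values):
    """Hand method: Q1 and Q3 are the medians of the lower and upper halves.

    When the number of values is odd, the middle value is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return median(lower), median(upper)


def percentile_linear(values, p):
    """p-th percentile by linear interpolation at position (n - 1) * p / 100.

    Assumed rule for this sketch; other interpolation rules give other answers.
    """
    data = sorted(values)
    pos = (len(data) - 1) * p / 100
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])


# Made-up data set with 11 points, like the one in the question.
small = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(quartiles_excluding_median(small))                          # (3, 9)
print(percentile_linear(small, 25), percentile_linear(small, 75))  # 3.5 8.5

# With 1001 points the two methods differ by only half a unit.
large = list(range(1, 1002))
print(quartiles_excluding_median(large))                          # (250.5, 751.5)
print(percentile_linear(large, 25), percentile_linear(large, 75))  # 251.0 751.0
```

With 11 points the two approaches disagree by half a unit on a range of 10, which is easy to see on a box plot. With 1001 points they still disagree by half a unit, but on a range of 1000, so the boxes look the same.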

    As an aside, I don’t find box plots useful for characterizing a single distribution. They come into their own when comparing two or, especially, many distributions.

    Bill
