Unexpected BoxPlot

Posted: Mon Aug 06, 2007 4:09 pm
by 9348140
Hi world,

I create a box plot based on the following sorted data:

0 (19 times)
6.6
7.1
9.0
14.2
14.2
16.6

This creates a box from 0 to ~1.66. If I understand the source code correctly, the 1.66 is an interpolated value.

What I don't understand is why the upper whisker is drawn from 1.66 DOWN TO 0 (running through the box). Can anyone explain why this should be correct (or what is wrong)?

(using TeeChart Pro 7.12)

Thanks for any helpful hint
Martin

Posted: Tue Aug 07, 2007 8:24 am
by narcis
Hi Martin,

Can you please read this thread about a similar question?

Hope this helps!

Posted: Tue Aug 07, 2007 9:54 am
by 9348140
Hi Narcís,

Similar, but not the same (as far as I understand). I get a box plot like this:

----------
|~~~~|
----------

where ~~~~ is the whisker! Q3 is bigger than the Adjacent3 in this case.

Posted: Tue Aug 07, 2007 11:12 am
by Marjan
Hi, Martin

Looking at your data I get the following results:

median = 0.0
25th percentile = 0.0
75th percentile = 1.65
IQR = 1.65
lower inner fence = 25th PCT - 1.5*IQR = -2.475
upper inner fence = 75th PCT + 1.5*IQR = 4.125
lower adjacent point, defined as smallest value above lower inner fence. In this case, 0.0.
upper adjacent point, defined as largest value below upper inner fence. In this case 0.0.

Now, from the box plot definition (see http://cnx.org/content/m10215/latest/), the plot is constructed as:
1) box: lower limit at the 25th percentile, upper limit at the 75th percentile; in this case the box is drawn from 0.0 to 1.65.
2) median line: drawn at 0.0
3) lower whisker: at the lower adjacent point, in this case 0.0
4) upper whisker: at the upper adjacent point, in this case 0.0

The "problem" is that different programs use different algorithms to calculate percentiles (and therefore the IQR). If I take the data to Excel, I get IQR = 0.0. SPSS in this case returns 3.3 and TeeChart 1.65. I guess we could add different percentile calculation methods to the existing code. We'll log this on our wish list for the next TeeChart release. In the meantime, the best workaround is to calculate the necessary statistics manually outside TeeChart and pass the calculated values to the BoxPlot series.
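The fence and adjacent-point definitions listed above can be sketched in a few lines of Python. This is only an illustration, not TeeChart code: the function name `fences_and_adjacent` is made up for this example, the quartiles are passed in rather than computed, and counting a value that lies exactly on a fence as "inside" is an assumption.

```python
def fences_and_adjacent(data, q1, q3):
    """Return (lower fence, upper fence, lower adjacent, upper adjacent)."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Adjacent points: the most extreme observations still inside the fences.
    # Treating a value exactly on a fence as inside is an assumption.
    lower_adjacent = min(x for x in data if x >= lower_fence)
    upper_adjacent = max(x for x in data if x <= upper_fence)
    return lower_fence, upper_fence, lower_adjacent, upper_adjacent


# Data from the first post: 19 zeros followed by six positive values.
data = [0.0] * 19 + [6.6, 7.1, 9.0, 14.2, 14.2, 16.6]

# Quartiles as reported for TeeChart in this thread (Q1 = 0.0, Q3 = 1.65).
lo_fence, up_fence, lo_adj, up_adj = fences_and_adjacent(data, 0.0, 1.65)
print(lo_fence, up_fence, lo_adj, up_adj)
```

With these quartiles the smallest positive observation (6.6) lies above the upper fence, so the largest value inside the fences is 0.0, which is why the upper whisker ends up below the top of the box.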

Posted: Tue Aug 07, 2007 12:46 pm
by 9348140
Hi Marjan,

I found a couple of discussions about the calculation of Q1 and Q3 (and with them the IQR) on the web before starting this thread (e.g. http://www.maths.murdoch.edu.au/units/s ... smore.html). Even if this situation isn't satisfying at all, I think your proposal should solve the problem from your point of view.

But if you calculate Q1 and Q3 using any type of interpolation, does the given definition of the adjacent points still correspond to the meaning of the whiskers? What is the INTERPRETATION of a box plot where the upper adjacent point is smaller than Q3? Shouldn't Q1 to Q3 be a subset of the lower and the upper adjacent point?

Reading your answer, I guess that you checked with SPSS. How does SPSS draw the whisker? (I currently have no active licence.)

Posted: Tue Aug 07, 2007 1:54 pm
by Marjan
Hi, Martin.

Actually, I am using NCSS. For percentiles it uses the following formula:
The 100pth percentile is computed as
Zp = (1-g)X[k1] + gX[k2]
where k1 equals the integer part of p(n+1), k2=k1+1, g is the fractional part of p(n+1), and X[k] is the kth observation when the data are sorted from lowest to highest.
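As a rough check, the formula quoted above can be written out in Python. The function name `percentile_n_plus_1` is made up for this sketch, and clamping to the first or last observation when the position falls outside 1..n is an assumption, since the quoted formula does not say what happens at the ends.

```python
import math


def percentile_n_plus_1(sorted_data, p):
    """Zp = (1-g)*X[k1] + g*X[k2], with position p*(n+1) and 1-based X[k]."""
    n = len(sorted_data)
    pos = p * (n + 1)
    k1 = math.floor(pos)  # integer part of p(n+1)
    g = pos - k1          # fractional part of p(n+1)
    # Assumption: clamp to the ends when the position is outside the data.
    if k1 < 1:
        return sorted_data[0]
    if k1 >= n:
        return sorted_data[-1]
    # X[k] is 1-based in the formula; Python lists are 0-based.
    return (1 - g) * sorted_data[k1 - 1] + g * sorted_data[k1]


data = [0.0] * 19 + [6.6, 7.1, 9.0, 14.2, 14.2, 16.6]
q1 = percentile_n_plus_1(data, 0.25)
median = percentile_n_plus_1(data, 0.50)
q3 = percentile_n_plus_1(data, 0.75)
print(q1, median, q3)
```

For the 75th percentile, p(n+1) = 0.75 * 26 = 19.5, so k1 = 19 and g = 0.5, and the result is halfway between the 19th observation (0.0) and the 20th (6.6).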

This formula is slightly different from the one TeeChart uses so you get different percentile and IQR values. Here are the results I get in NCSS:

median = 0.0 
25th percentile = 0.0 
75th percentile = 3.3
IQR = 3.3
lower inner fence = 25th PCT - 1.5*IQR = -4.95
upper inner fence = 75th PCT + 1.5*IQR = 8.25
lower adjacent point, defined as smallest value above lower inner fence. In this case, 0.0. 
upper adjacent point, defined as largest value below upper inner fence. In this case 7.1. 

So the box is drawn from 0 to 3.3, the lower whisker is drawn at 0.0 and the upper whisker at 7.1, which is OK, as by definition the upper whisker position is less than or equal to the upper inner fence.

"Shouldn't Q1 to Q3 be a subset of the lower and the upper adjacent point?"

Yes, I could constrain the whiskers so that the lower one never lies above Q1 and the upper one never lies below Q3. But I'll have to check whether this is valid.
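
One way to read that idea is sketched below in Python. It is an illustration only, not what TeeChart does: `clamp_whiskers` is a hypothetical helper, and whether clamping like this is statistically valid is exactly the open question.

```python
def clamp_whiskers(q1, q3, lower_adjacent, upper_adjacent):
    """Hypothetical helper: keep the whisker ends from falling inside the box."""
    lower_whisker = min(lower_adjacent, q1)  # never above Q1
    upper_whisker = max(upper_adjacent, q3)  # never below Q3
    return lower_whisker, upper_whisker


# TeeChart values from this thread: the upper adjacent point (0.0) is below Q3 (1.65).
print(clamp_whiskers(0.0, 1.65, 0.0, 0.0))  # (0.0, 1.65)

# NCSS values from this thread: the whiskers already enclose the box, nothing changes.
print(clamp_whiskers(0.0, 3.3, 0.0, 7.1))   # (0.0, 7.1)
```

With the clamp, the upper whisker in the TeeChart case collapses onto the top edge of the box instead of running back through it.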