Unexpected BoxPlot

TeeChart VCL for Borland/CodeGear/Embarcadero RAD Studio, Delphi and C++ Builder.
martin
Newbie
Posts: 3
Joined: Wed Nov 22, 2006 12:00 am


Post by martin » Mon Aug 06, 2007 4:09 pm

Hi world,

I created a box plot from the following sorted data:

0 (19 times)
6.6
7.1
9.0
14.2
14.2
16.6

This creates a box from 0 to about 1.66. If I understand the source code correctly, the 1.66 is an interpolated value.

What I don't understand is why the upper whisker is drawn from 1.66 DOWN TO 0 (running through the box). Can anyone explain why this should be correct (or what is wrong)?

(using TeeChart Pro 7.12)

Thanks for any helpful hint,
Martin

Narcís
Site Admin
Posts: 14730
Joined: Mon Jun 09, 2003 4:00 am
Location: Banyoles, Catalonia

Post by Narcís » Tue Aug 07, 2007 8:24 am

Hi Martin,

Can you please read this thread about a similar question?

Hope this helps!
Best Regards,
Narcís Calvet / Development & Support
Steema Software
Avinguda Montilivi 33, 17003 Girona, Catalonia
Tel: 34 972 218 797
http://www.steema.com

martin
Newbie
Posts: 3
Joined: Wed Nov 22, 2006 12:00 am

Post by martin » Tue Aug 07, 2007 9:54 am

Hi Narcís,

similar, but not the same (as far as I understand). I get a box plot like this:

----------
|~~~~|
----------

where ~~~~ is the whisker! In this case Q3 is larger than Adjacent3 (the upper adjacent point).

Marjan
Site Admin
Posts: 745
Joined: Fri Nov 07, 2003 5:00 am
Location: Slovenia

Post by Marjan » Tue Aug 07, 2007 11:12 am

Hi, Martin

Looking at your data I get the following results:

Code: Select all

median = 0.0
25th percentile = 0.0
75th percentile = 1.65
IQR = 1.65
lower inner fence = 25th PCT - 1.5*IQR = -2.475
upper inner fence = 75th PCT + 1.5*IQR = 4.125
lower adjacent point, defined as smallest value above lower inner fence. In this case, 0.0.
upper adjacent point, defined as largest value below upper inner fence. In this case 0.0.
Now, from the box plot definition (see http://cnx.org/content/m10215/latest/), the plot is constructed as:
1) box, lower limit 25th pct, upper limit 75th pct, in this case, the box is drawn from 0.0 to 1.65.
2) median line, drawn at 0.0
3) lower whisker at lower adjacent point, in this case 0.0
4) upper whisker at upper adjacent point, in this case 0.0
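The four construction steps above can be reproduced in a few lines. This is a minimal Python sketch, not TeeChart's code: `fences_and_adjacents` is a hypothetical helper, and it assumes the adjacent points are the most extreme observations lying inside the inner fences (inclusive). Fed with TeeChart's quartiles for this data (Q1 = 0.0, Q3 = 1.65), it returns an upper adjacent point of 0.0, below Q3, which is why the upper whisker is drawn back through the box.

```python
# Sorted data from the first post: 19 zeros followed by six positive values.
DATA = [0.0] * 19 + [6.6, 7.1, 9.0, 14.2, 14.2, 16.6]


def fences_and_adjacents(data, q1, q3, k=1.5):
    """Hypothetical helper: inner fences and adjacent points for given quartiles.

    Inner fences: q1 - k*IQR and q3 + k*IQR.
    Lower adjacent point: smallest observation at or above the lower fence.
    Upper adjacent point: largest observation at or below the upper fence.
    """
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    lower_adj = min(x for x in data if x >= lower_fence)
    upper_adj = max(x for x in data if x <= upper_fence)
    return lower_fence, upper_fence, lower_adj, upper_adj


if __name__ == "__main__":
    # Quartiles reported for TeeChart in this thread: Q1 = 0.0, Q3 = 1.65.
    lf, uf, la, ua = fences_and_adjacents(DATA, q1=0.0, q3=1.65)
    print(f"fences {lf:.3f} .. {uf:.3f}, adjacent points {la} .. {ua}")
    # ua (0.0) < Q3 (1.65): a whisker from Q3 to ua runs back through the box.
```

With Q3 = 3.3 instead, the same function gives fences of -4.95 and 8.25 and an upper adjacent point of 7.1, so the whisker lies outside the box.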
The "problem" is that different programs use different algorithms to calculate percentiles (and therefore the IQR). If I take the data to Excel, I get IQR = 0.0; SPSS returns 3.3 in this case, and TeeChart 1.65. I guess we could add different percentile calculation methods to the existing code, and we'll log this on our wish list for the next TeeChart release. In the meantime, the best workaround is to calculate the necessary statistics manually outside TeeChart and pass the calculated values to the BoxPlot series.
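To see how much the percentile definition matters, here is a minimal Python sketch that computes the quartiles of this data under three common linear-interpolation rules. The rule names and the `quantile` helper are hypothetical. The `1 + (n - 1)p` rule is the one Excel's QUARTILE function uses, and the `(n + 1)p` rule is the same as the NCSS formula quoted later in this thread. The `np + 1/2` rule is included only because it happens to reproduce TeeChart's 1.65 for this data; that TeeChart actually uses this rule is an assumption, not something confirmed here.

```python
import math

# Sorted data from the first post: 19 zeros followed by six positive values.
DATA = [0.0] * 19 + [6.6, 7.1, 9.0, 14.2, 14.2, 16.6]

# Hypothetical rule names -> 1-based position h of the p-th quantile (n values).
POSITION_RULES = {
    "excel": lambda p, n: 1 + (n - 1) * p,  # Excel QUARTILE
    "n_plus_1": lambda p, n: (n + 1) * p,   # same as the NCSS formula in this thread
    "np_half": lambda p, n: n * p + 0.5,    # matches 1.65 here; TeeChart use is assumed
}


def quantile(sorted_data, p, rule):
    """Linear interpolation between the two observations bracketing position h."""
    n = len(sorted_data)
    h = min(max(POSITION_RULES[rule](p, n), 1.0), float(n))  # clamp to [1, n]
    k = math.floor(h)                   # integer part of h
    g = h - k                           # fractional part of h
    below = sorted_data[k - 1]          # X[k] (1-based)
    above = sorted_data[min(k, n - 1)]  # X[k+1], or X[n] when h == n
    return (1 - g) * below + g * above


if __name__ == "__main__":
    for rule in POSITION_RULES:
        q1 = quantile(DATA, 0.25, rule)
        q3 = quantile(DATA, 0.75, rule)
        print(f"{rule:>8}: Q1 = {q1:.2f}  Q3 = {q3:.2f}  IQR = {q3 - q1:.2f}")
```

For this data all three rules give Q1 = 0, and the resulting IQR values are 0.00, 3.30 and 1.65, matching the Excel, SPSS/NCSS and TeeChart figures reported in this thread.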
Marjan Slatinek,
http://www.steema.com

martin
Newbie
Posts: 3
Joined: Wed Nov 22, 2006 12:00 am

Post by martin » Tue Aug 07, 2007 12:46 pm

Hi Marjan,

I found a couple of discussions about calculating Q1 and Q3 (and with them the IQR) on the web before starting this thread (e.g. http://www.maths.murdoch.edu.au/units/s ... smore.html). Even if the situation isn't fully satisfying, I think your proposal should solve the problem from your point of view.

But if you calculate Q1 and Q3 using any kind of interpolation, does the given definition of the adjacent points still match the meaning of the whiskers? What is the INTERPRETATION of a box plot where the upper adjacent point is smaller than Q3? Shouldn't Q1 to Q3 be a subset of the lower and the upper adjacent point?

Reading your answer, I guess that you checked SPSS. How does SPSS draw the whiskers? (I currently have no active licence.)

Marjan
Site Admin
Posts: 745
Joined: Fri Nov 07, 2003 5:00 am
Location: Slovenia

Post by Marjan » Tue Aug 07, 2007 1:54 pm

Hi, Martin.

Actually, I am using NCSS. For percentiles it uses the following formula:
The 100pth percentile is computed as
Zp = (1-g)X[k1] + gX[k2]
where k1 equals the integer part of p(n+1), k2=k1+1, g is the fractional part of p(n+1), and X[k] is the kth observation when the data are sorted from lowest to highest.
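As a worked check, here is the formula applied to the data from the first post (n = 25, sorted so that X[1] to X[19] are 0 and X[20] = 6.6):

```latex
\begin{aligned}
p(n+1) &= 0.75 \times 26 = 19.5 \;\Rightarrow\; k_1 = 19,\; k_2 = 20,\; g = 0.5\\
Z_{0.75} &= (1-g)\,X_{[19]} + g\,X_{[20]} = 0.5 \times 0 + 0.5 \times 6.6 = 3.3\\
p(n+1) &= 0.25 \times 26 = 6.5 \;\Rightarrow\; Z_{0.25} = 0.5\,X_{[6]} + 0.5\,X_{[7]} = 0
\end{aligned}
```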

This formula is slightly different from the one TeeChart uses, so you get different percentile and IQR values. Here are the results I get in NCSS:

Code: Select all

median = 0.0 
25th percentile = 0.0 
75th percentile = 3.3
IQR = 3.3
lower inner fence = 25th PCT - 1.5*IQR = -4.95
upper inner fence = 75th PCT + 1.5*IQR = 8.25
lower adjacent point, defined as smallest value above lower inner fence. In this case, 0.0. 
upper adjacent point, defined as largest value below upper inner fence. In this case 7.1. 
So, the box is drawn from 0 to 3.3, the lower whisker is drawn at 0.0 and the upper whisker at 7.1, which is fine, since by definition the upper whisker position is less than or equal to the upper inner fence.
martin wrote:
Shouldn't Q1 to Q3 be a subset of the lower and the upper adjacent point?
Yes. I could limit the lower inner fence so that it never exceeds Q1, and the upper inner fence so that it never falls below Q3, but I'll have to check whether this is valid.
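Since IQR = Q3 - Q1 is never negative, the inner fences as defined in this thread already satisfy lower fence <= Q1 and upper fence >= Q3; the whisker running through the box comes from the upper adjacent point (0.0) lying below Q3 (1.65). A minimal Python sketch of one possible alternative, clamping the whisker ends rather than the fences, follows. `whisker_ends` and its `clamp` flag are hypothetical names, not TeeChart API, and whether clamping is statistically appropriate is exactly the open question here.

```python
# Sorted data from the first post: 19 zeros followed by six positive values.
DATA = [0.0] * 19 + [6.6, 7.1, 9.0, 14.2, 14.2, 16.6]


def whisker_ends(data, q1, q3, k=1.5, clamp=True):
    """Hypothetical helper: return (lower whisker end, upper whisker end).

    The ends start at the adjacent points (most extreme observations inside
    the inner fences). With clamp=True they are pushed out to the box edges
    when needed, so neither whisker can run back through [q1, q3].
    """
    iqr = q3 - q1
    lower_adj = min((x for x in data if x >= q1 - k * iqr), default=q1)
    upper_adj = max((x for x in data if x <= q3 + k * iqr), default=q3)
    if clamp:
        lower_adj = min(lower_adj, q1)
        upper_adj = max(upper_adj, q3)
    return lower_adj, upper_adj


if __name__ == "__main__":
    # TeeChart's quartiles for this data, as reported in this thread.
    print(whisker_ends(DATA, 0.0, 1.65, clamp=False))  # upper end 0.0, inside the box
    print(whisker_ends(DATA, 0.0, 1.65, clamp=True))   # upper end pinned to Q3 = 1.65
```

With clamping, the upper whisker for TeeChart's quartiles collapses to zero length at Q3 instead of being drawn down to 0; with Q3 = 3.3 (the NCSS value) clamping has no effect and the whisker still ends at 7.1.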
Marjan Slatinek,
http://www.steema.com
