Apache OpenOffice (AOO) Bugzilla – Full Text Issue Listing 
Summary:  NORMSDIST, POISSON statistical functions are not accurate  

Product:  Calc  Reporter:  open_me <rtvvupxyay3oa5b>  
Component:  programming  Assignee:  oc  
Status:  CLOSED FIXED  QA Contact:  issues@sc <issues>  
Severity:  Trivial  
Priority:  P3  CC:  ambrogio.de.lorenzo, issues, rb.henschel  
Version:  OOo 2.0.4  
Target Milestone:    
Hardware:  All  
OS:  All  
Issue Type:  PATCH  Latest Confirmation in:    
Developer Difficulty:    
Issue Depends on:  97091, 100119  
Issue Blocks:  18704  
Attachments: 

Description
open_me
20060829 17:27:15 UTC
Created attachment 38835 [details]
Results of the Replication
Confirmed in OOo 2.0.2 with the spreadsheet provided. Hi, There was a small error in the sheet provided. I corrected this, an added a column where i divided the OOo Calc value by the exact value. Where this does not equal one (1) there is a problem (marked with blue). I will attatch this sheet. So, there is a problem with: =POISSON(134;200;1) (should equal 0,000000445617 but gives an error) =POISSON(200;200;1) (should equal 0,518797 but gives an error) =POISSON(250;200;1) (should equal 0,999715 but gives an error) =NORMSDIST(8,2) (should equal 1,20194E016 but is 8% lower) =NORMSDIST(8,32) (should equal 5,20557E017 but is 6% higer) Confirmed this isue... Created attachment 40765 [details]
Enhanced sheet
Hi Eike, please have a look at this one. Maybe something can be changed based on the openformula spec. ? Frank Sorry for "dependency spam" when changing from "depends on" to "blocks" issue 18704. Propably not doable in time frame for 3.0, retargeting to 3.1 *** Issue 91269 has been marked as a duplicate of this issue. *** *** Issue 91269 has been marked as a duplicate of this issue. *** The wrong results in NORMSDIST are due to cancellation for small negative values, where gauss() is near 0.5 The problem can be solved in two ways: (1) Use NORMSDIST(x)= 0.5*ERFC(x/SQRT(2)). Unfortunaly ERFC is only an addin function, see my issue 97091. (2) Use NORMSDIST(x) = 0.5+0.5*GetLowRegIGamma(0.5,0.5*x*x) for x>=0 = 0.5*GetUpRegIGamma(0.5,0.5*x*x) for x<0 I would prefer the solution (1), because GetLowRegIGamma and GetUpRegIGamma use iterations which might be slow in some cases and erfc uses a short polynomial. The improvement of the accuracy of NORMSDIST is included in the patch in issue 100119. The problems in POISSON have two reasons: The implementation calculated the factorial in the formula as separate subterm, which will overflow for large x. It calculated exp(lambda), which will underflow for large lambda. The second reason was not only in the cumulative part but in the mass part too. (The other problem in the mass part had been fixed in issue 26022 already.) To avoid the overflow in factorial, I change the calculation order. To avoid the underflow in exp(lambda), I made a case distinction. For large lambda a different method is used. That is not as accurate as the iteration in the case of small lambda, but it still gives 1213 significant digits. Created attachment 61713 [details]
Avoid overflow and underflow in POISSON
Created attachment 61725 [details]
Comparison with Excel and Gnumeric; example with large and with very large lambda
Thank you! Scheduling for inclusion in CWS odff06. In cws odff06: revision 271396 sc/source/core/tool/interpr3.cxx I'm impressed :) Thanks again! Btw, just changed bool kum to bool bCumulative and used GetBool() instead of GetDouble() for the argument and adapted the condition from (kum == 0.0) to (!bCumulative). Reassigning to QA for verification. verified in internal build cws_odff06 