Validation
Every claim below is enforced by a test in a package’s suite. This page is the map: for each claim it names the source of truth and the test file that pins it. It exists because “the tests pass” is invisible from outside; this is what they pass against. Two known divergences are stated at the end, because a validation record that only reports agreement is advertising.
Published literature
The chain ladder with Mack standard errors reproduces the numbers printed
in Mack (1993) on the Taylor & Ashe dataset: total reserve 18,680,856
(to 1e-6) and total standard error 2,447,095 (to 0.5%), total CV
13.1% (actuarialpy/tests/test_mack.py). The same file transcribes
Mack’s Theorem 3/4 recursion independently and demands agreement to
1e-10, and derives a complete 3×3 triangle by hand (σ² = 2/3 exactly,
mse(o2) = 6016/9 exactly).
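As a rough illustration of the quantities involved, the following sketch computes volume-weighted development factors and Mack's σ² estimates on a small triangle. The triangle and the function names `chain_ladder` and `ultimates` are hypothetical; this is neither the actuarialpy API nor the hand-derived 3×3 case from the test file.

```python
# Hypothetical 4x4 cumulative triangle; row i has 4 - i observed entries.
TRIANGLE = [
    [100.0, 150.0, 165.0, 170.0],
    [110.0, 170.0, 185.0],
    [120.0, 175.0],
    [130.0],
]


def chain_ladder(triangle):
    """Return (factors, sigma2) for a cumulative triangle.

    factors[k] is the volume-weighted age-to-age factor from period k to k+1.
    sigma2[k] is Mack's variance estimate, or None when only one ratio is
    observed (the final period, which needs an extrapolation rule).
    """
    n = len(triangle)
    factors, sigma2 = [], []
    for k in range(n - 1):
        # Rows observed at both development period k and k + 1.
        rows = [r for r in triangle if len(r) > k + 1]
        f_k = sum(r[k + 1] for r in rows) / sum(r[k] for r in rows)
        factors.append(f_k)
        if len(rows) > 1:
            ss = sum(r[k] * (r[k + 1] / r[k] - f_k) ** 2 for r in rows)
            sigma2.append(ss / (len(rows) - 1))
        else:
            sigma2.append(None)
    return factors, sigma2


def ultimates(triangle, factors):
    """Project each row's latest diagonal value to ultimate."""
    out = []
    for row in triangle:
        value = row[-1]
        for f in factors[len(row) - 1:]:
            value *= f
        out.append(value)
    return out


factors, sigma2 = chain_ladder(TRIANGLE)
ults = ultimates(TRIANGLE, factors)
total_reserve = sum(u - row[-1] for u, row in zip(ults, TRIANGLE))
```

The `None` in the last slot of `sigma2` is where the choice of extrapolation rule enters; see Known divergences.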
The on-level parallelogram reproduces the classic textbook case — one
+10% change mid-year, annual policies, calendar-year period — as
1.1/1.0125 to machine precision (ratingmodels/tests/test_onlevel.py),
and the closed-form geometry is pinned against brute-force 20,001-point
numerical integration over random rate histories.
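The textbook case can be checked in a few lines. The sketch below assumes policies written uniformly through time and a change that applies to policies written on or after the effective date; `average_rate_level` is a hypothetical helper, not the ratingmodels API.

```python
def average_rate_level(change_time, change_factor, n_points=20_001):
    """Average earned rate level over calendar year [0, 1].

    Assumes annual policies written uniformly in time and a single rate
    change of `change_factor` effective at `change_time`. Integrates with
    the trapezoid rule on an evenly spaced grid.
    """
    step = 1.0 / (n_points - 1)

    def level(t):
        # Policies in force at t were written in (t - 1, t]; the share
        # written on or after the change is t - change_time, clipped to [0, 1].
        new_share = min(max(t - change_time, 0.0), 1.0)
        return 1.0 + (change_factor - 1.0) * new_share

    total = 0.5 * (level(0.0) + level(1.0))
    for i in range(1, n_points - 1):
        total += level(i * step)
    return total * step


numeric = average_rate_level(0.5, 1.10)
# Closed form: the new-rate region is a triangle of area 1/2 * 1/2 * 1/2.
closed_form = 1.0 + 0.10 * 0.125
on_level_factor = 1.10 / closed_form
```

Here the closed form is 1.0125, so the on-level factor is 1.1/1.0125, matching the figure quoted above.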
Return-level confidence intervals implement Coles (2001) §4.3.3 (GPD,
including the binomial exceedance-rate term) and §3.3.3 (GEV), with the
point estimates pinned to invert the fitted tail exactly
(extremeloss/tests/evt/).
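"Inverting the fitted tail exactly" can be read as follows: the m-observation return level is the point at which the fitted exceedance probability equals 1/m. A minimal sketch of the GPD case under the threshold model, with hypothetical function names and parameter values (not the extremeloss API):

```python
import math


def gpd_return_level(u, sigma, xi, zeta_u, m):
    """Level exceeded on average once every m observations.

    u is the threshold, (sigma, xi) the GPD scale and shape, and zeta_u
    the probability that an observation exceeds u.
    """
    if abs(xi) < 1e-12:
        return u + sigma * math.log(m * zeta_u)
    return u + sigma / xi * ((m * zeta_u) ** xi - 1.0)


def exceedance_prob(x, u, sigma, xi, zeta_u):
    """P(X > x) for x > u under the fitted threshold model."""
    y = x - u
    if abs(xi) < 1e-12:
        return zeta_u * math.exp(-y / sigma)
    return zeta_u * (1.0 + xi * y / sigma) ** (-1.0 / xi)


# Hypothetical fitted values, for illustration only.
u, sigma, xi, zeta_u, m = 10.0, 2.0, 0.2, 0.05, 1000
x_m = gpd_return_level(u, sigma, xi, zeta_u, m)
round_trip = exceedance_prob(x_m, u, sigma, xi, zeta_u)  # should equal 1 / m
```

The confidence interval then comes from the delta method over (ζ_u, σ, ξ), which is where the binomial exceedance-rate variance enters; that step is not shown here.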
Independent libraries
statsmodels. GLM estimation is delegated to statsmodels, and the
adapter contract is tested by fitting statsmodels independently on the
same design and demanding identical coefficients and standard errors;
predict_interval matches get_prediction to 1e-6
(ratingmodels/tests/test_statsmodels_crosscheck.py).
chainladder-python. On the shared canonical dataset (GenIns ==
Taylor & Ashe): development factors and ultimates agree to machine
precision; the total Mack standard error agrees to ~0.2%, with the
residual traced to a documented methodological choice (see Known
divergences)
(actuarialpy/tests/test_chainladder_equivalence.py).
SciPy. genpareto/genextreme are the estimation engines for the
tail fits, and the observed-information covariances are checked against
the GPD’s known Fisher-information asymptotics
(avar(ξ) = (1+ξ)²/n and companions).
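For reference, the "companions" are the other entries of the standard asymptotic covariance of the GPD maximum-likelihood estimator, valid for ξ > −1/2. A small sketch, with the hypothetical helper `gpd_asymptotic_cov` ordering the parameters as (ξ, σ):

```python
def gpd_asymptotic_cov(xi, sigma, n):
    """Asymptotic covariance of the GPD MLE, ordered (xi, sigma).

    Uses the shape convention in which xi > 0 is heavy-tailed. The result
    holds for xi > -0.5; outside that range the usual asymptotics fail.
    """
    if xi <= -0.5:
        raise ValueError("asymptotic covariance requires xi > -0.5")
    a = 1.0 + xi
    return [
        [a * a / n, -sigma * a / n],
        [-sigma * a / n, 2.0 * sigma * sigma * a / n],
    ]


# Hypothetical values: xi = 0.2, sigma = 1, n = 400 exceedances.
cov = gpd_asymptotic_cov(0.2, 1.0, 400)
se_xi = cov[0][0] ** 0.5
```

A check of this kind compares these entries with the inverse observed information from a large simulated sample, within a tolerance that allows for sampling noise.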
Closed forms and hand derivations
Where a closed form exists, the numerical machinery must reproduce it:
lognormal parameter standard errors σ/√n (for μ) and σ/√(2n) (for σ) at 1e-3 (the
observed information is exact there); the exponential rate’s se = r/√n
(the regression test for a finite-difference step-scaling bug the
closed-form check caught); the memoryless mean excess; the GPD’s
closed-form mean excess; the limited-expected-value identity
e(d)·S(d) + LEV(d) = E[X] across four severity families; and the
exponential loss-elimination-ratio delta-method standard error against
its analytic derivative at 1e-5.
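The limited-expected-value identity lends itself to a non-circular check: compute LEV(d) by numerically integrating the survival function, take e(d) and E[X] from their closed forms, and confirm the pieces add up. The sketch below does this for the exponential and the GPD; the parameter values and helper names are illustrative assumptions, not the suite's own.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def identity_gap(sf, mean_excess, mean, d):
    """e(d) * S(d) + LEV(d) - E[X], with LEV(d) as the integral of S on [0, d]."""
    lev = simpson(sf, 0.0, d)
    return mean_excess(d) * sf(d) + lev - mean


# Exponential with rate r: memoryless, so e(d) = 1 / r for every d.
r = 0.5
gap_exp = identity_gap(
    sf=lambda x: math.exp(-r * x),
    mean_excess=lambda d: 1.0 / r,
    mean=1.0 / r,
    d=3.0,
)

# GPD with scale sigma and shape xi < 1: e(d) = (sigma + xi * d) / (1 - xi).
sigma, xi = 1000.0, 0.25
gap_gpd = identity_gap(
    sf=lambda x: (1.0 + xi * x / sigma) ** (-1.0 / xi),
    mean_excess=lambda d: (sigma + xi * d) / (1.0 - xi),
    mean=sigma / (1.0 - xi),
    d=2500.0,
)
```

Both gaps should be zero up to integration error.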
Empirical coverage of confidence intervals
Every interval shipped is validated the gold-standard way — simulate from known truth, count containment — with the measured numbers stated rather than hidden behind wide acceptance bands:
| Interval | Nominal | Measured | Where |
|---|---|---|---|
| GLM | 95% | ≈ nominal | |
| | 95% | ≈ nominal | |
| GPD return level (Wald) | 95% | ≈ 91–92% | |
| GEV return level (Wald) | 95% | ≈ nominal band | |
| Simulation mean / quantile CIs | 95% | ≈ nominal | |
| Mack prediction errors (10×5 trapezoid) | z ~ N(0,1) | mean 0.04, sd 1.08, 93.9% in ±1.96 | |
The GPD line is the honest one: symmetric Wald intervals on a right-skewed return-level sampling distribution run modestly below nominal at moderate exceedance counts; profile likelihood is the known better tool where the last percent matters, and the test documents this rather than widening its bands to hide it.
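The procedure itself is short. As a minimal sketch of "simulate from known truth, count containment", the following measures the coverage of a Wald interval for an exponential rate, using se = r/√n. The sample size, replication count, and seed are arbitrary choices for illustration, and this is not one of the suite's tests.

```python
import math
import random


def wald_coverage(true_rate=2.0, n=200, n_sims=2000, z=1.96, seed=12345):
    """Fraction of simulated Wald intervals that contain the true rate."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample_mean = sum(rng.expovariate(true_rate) for _ in range(n)) / n
        rate_hat = 1.0 / sample_mean          # MLE of the rate
        se = rate_hat / math.sqrt(n)          # closed-form standard error
        if rate_hat - z * se <= true_rate <= rate_hat + z * se:
            hits += 1
    return hits / n_sims


coverage = wald_coverage()
```

With 2,000 replications the Monte Carlo standard error of a coverage estimate near 95% is about half a percentage point, which bounds how tight an acceptance band can be before it fails on noise alone.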
Cross-package conformance
The pooling seam (sf + mean_excess) is cross-validated against fully
independent machinery — lossmodels.Layer(d, ∞).mean() computes
E[(X−d)+] by a different code path and must agree to 1e-6 — and
against raw data: an EmpiricalSeverity charge must equal the plain
sample mean of (x − d)+ times frequency exactly. Every worked example
on this site pins its printed numbers with a regression test in the
hosting package’s suite.
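The raw-data check rests on an identity that holds exactly for an empirical distribution: the survival function at d times the mean excess at d equals the sample mean of (x − d)+. A minimal sketch with made-up losses and hypothetical helper names (not the EmpiricalSeverity or lossmodels API); the frequency multiplier is omitted because it scales both sides equally.

```python
import math


def empirical_sf(data, d):
    """Share of observations strictly above d."""
    return sum(1 for x in data if x > d) / len(data)


def empirical_mean_excess(data, d):
    """Average of x - d over observations above d."""
    excess = [x - d for x in data if x > d]
    return sum(excess) / len(excess)


def direct_expected_excess(data, d):
    """Plain sample mean of (x - d)+."""
    return sum(max(x - d, 0.0) for x in data) / len(data)


# Made-up losses and attachment point, for illustration only.
losses = [120.0, 340.0, 560.0, 75.0, 910.0, 1300.0, 45.0, 480.0]
d = 400.0

via_seam = empirical_sf(losses, d) * empirical_mean_excess(losses, d)
direct = direct_expected_excess(losses, d)
agree = math.isclose(via_seam, direct, rel_tol=1e-12)
```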
Known divergences
Final-period σ² in the Mack chain ladder. chainladder-python’s default extrapolates by log-linear regression; this suite uses Mack’s min-rule from the 1993 paper. The totals differ by ~0.2% on Taylor & Ashe, and this suite’s figure is the one the paper prints. Neither is wrong; the choice is documented in both places.
GPD return-level Wald coverage. Measured ≈ 91–92% at nominal 95% with ~400 exceedances, as reported in the coverage table above.
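To make the first divergence concrete, the sketch below contrasts the two ways of filling in the final-period σ². `mack_min_rule` follows the rule in Mack (1993). `log_linear_extrapolation` is a generic ordinary-least-squares fit of log σ² on development index, offered only as an assumption about the general shape of a log-linear approach; it is not a transcription of chainladder-python's implementation.

```python
import math


def mack_min_rule(sigma2):
    """Final-period sigma^2 from the last two estimates, per Mack (1993):
    min(s1^2 / s2, min(s2, s1)), where s1 is the latest and s2 the one before.
    """
    prev2, prev1 = sigma2[-2], sigma2[-1]
    return min(prev1 ** 2 / prev2, min(prev2, prev1))


def log_linear_extrapolation(sigma2):
    """Fit log(sigma^2) linearly on development index, predict one step ahead."""
    n = len(sigma2)
    xs = list(range(n))
    ys = [math.log(s) for s in sigma2]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return math.exp(y_bar + slope * (n - x_bar))


# Hypothetical sigma^2 sequence for the earlier development periods.
observed = [9.0, 4.0, 3.0]
by_min_rule = mack_min_rule(observed)
by_log_linear = log_linear_extrapolation(observed)
```

On an exactly geometric sequence the two rules coincide; on anything else they generally differ, which is how a small gap in the total standard error can arise.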