Example 4: censored payments to coverage terms
Real claim extracts rarely show ground-up losses: anything below the
deductible was never reported, and anything at the limit is capped. This
example takes exactly that kind of data through lossmodels — back to the
ground-up scale, through a censoring-aware fit, and out to repriced coverage
terms and their aggregate distribution. Every number is the output of this
fixed-seed run, pinned by a regression test in the lossmodels suite.
The data you actually have
Ground-up truth is Lognormal(7.4, 1.1); the policy pays claims net of a 500 deductible up to a 10,000 maximum payment:
import numpy as np
import lossmodels as lm
rng = np.random.default_rng(7)
true = lm.Lognormal(7.4, 1.1)
x = true.sample(6000, rng=rng) # two years of ground-up claims
payments = np.clip(x - 500.0, 0.0, 10_000.0)
obs = payments[payments > 0] # below-deductible: never reported
# 6,000 ground-up -> 5,140 observed payments, 278 of them capped;
# 14.3% of claims are invisible to the data
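As a rough cross-check on those counts, the shares implied by the true law can be computed directly from the lognormal CDF. This is a standard-library sketch that does not use lossmodels; the helper name `lognormal_cdf` is made up for illustration.

```python
from math import log
from statistics import NormalDist

MU, SIGMA = 7.4, 1.1                 # true ground-up parameters from the text
DEDUCTIBLE, MAX_PAYMENT = 500.0, 10_000.0
N_CLAIMS = 6000

std_normal = NormalDist()

def lognormal_cdf(x, mu=MU, sigma=SIGMA):
    """P(X <= x) for a lognormal with log-scale parameters mu and sigma."""
    return std_normal.cdf((log(x) - mu) / sigma)

# Claims at or below the deductible never reach the extract.
p_below = lognormal_cdf(DEDUCTIBLE)
# A payment hits the cap once the ground-up loss reaches deductible + max payment.
p_capped = 1.0 - lognormal_cdf(DEDUCTIBLE + MAX_PAYMENT)

expected_observed = N_CLAIMS * (1.0 - p_below)
expected_capped = N_CLAIMS * p_capped
print(f"below deductible: {p_below:.1%}, capped: {p_capped:.1%}")
print(f"expected observed: {expected_observed:.0f}, expected capped: {expected_capped:.0f}")
```

The theoretical shares come out at roughly 14% invisible and 4.5% capped, so the simulated 5,140 observed and 278 capped sit within ordinary sampling noise of their expectations.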
Back to the ground-up scale
One call restores the estimation-ready triple — ground-up values, per-claim left-truncation points (the deductible), and censoring flags (the capped payments) — per the truncation and censoring conventions:
values, trunc, cens = lm.payments_to_ground_up(obs, deductible=500.0,
                                               max_payment=10_000.0)
A Kaplan–Meier fit on that triple is the nonparametric check that the plumbing is right — it should track the conditional survival \(S(t)/S(500)\) of the true law, and it does:
times, surv = lm.kaplan_meier(values, truncation=trunc, censored=cens)
# S(2,000): KM 0.4854 truth 0.4975
# S(8,000): KM 0.0872 truth 0.0867
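For readers who want to see what a product-limit estimate with left truncation involves, here is a minimal standard-library sketch. It is not the `lm.kaplan_meier` implementation; the function name `km_left_truncated` and the toy data are assumptions made for illustration. An observation counts as at risk at time t if it entered before t (truncation point below t) and has not yet exited (value at least t).

```python
from bisect import bisect_left
from collections import Counter

def km_left_truncated(values, trunc, censored):
    """Product-limit survival estimate with left truncation and right censoring.

    Returns the distinct uncensored times and the estimated survival just
    after each of them. Assumes trunc[i] < values[i] for every observation.
    """
    entries = sorted(trunc)      # when each observation enters the risk set
    exits = sorted(values)       # when each observation leaves it
    deaths = Counter(v for v, c in zip(values, censored) if not c)

    times, surv, s = [], [], 1.0
    for t in sorted(deaths):
        # entered strictly before t, minus those that left strictly before t
        at_risk = bisect_left(entries, t) - bisect_left(exits, t)
        s *= 1.0 - deaths[t] / at_risk
        times.append(t)
        surv.append(s)
    return times, surv

# Toy data: the last observation only becomes visible after 4.0,
# and one of the two values at 3.0 is censored.
toy_values = [2.0, 3.0, 3.0, 5.0]
toy_trunc = [0.0, 0.0, 0.0, 4.0]
toy_censored = [False, False, True, False]
times, surv = km_left_truncated(toy_values, toy_trunc, toy_censored)
print(times, surv)
```

When every truncation point equals the deductible, as in this example, the risk set starts with all reported claims, which is why the estimate targets the conditional survival \(S(t)/S(500)\) rather than \(S(t)\) itself.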
The naive fit versus the right fit
The common mistake is to add the deductible back and fit as if the data were complete. That gets the mean roughly right and the tail badly wrong, because the missing small claims and the capped large ones both squeeze the fitted dispersion:
naive = lm.fit_lognormal(obs + 500.0)
fitc = lm.fit_mle_censored(lm.Lognormal, values, initial_params=[7.0, 1.0],
                           truncation=trunc, censored=cens)
# true (mu, sigma) = (7.400, 1.100)
# naive = (7.649, 0.833) <- sigma crushed by 24%
# censored = (7.370, 1.112) <- recovered
Everything downstream — layer prices, tail quantiles, reinsurance — runs off
sigma, which is exactly the parameter the naive fit destroys.
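The difference between the two fits comes down to the likelihood. An uncensored claim contributes \(f(x)/S(d)\) and a capped claim contributes \(S(x)/S(d)\), where \(d\) is the deductible. The sketch below writes that log-likelihood out with the standard library and evaluates it at the true and the naive parameters on a fresh simulation. It is an illustration under stated assumptions, not the `fit_mle_censored` internals: the function name `loglik` is made up, and the draws come from Python's `random` module, so they differ from the NumPy run above.

```python
import random
from math import log
from statistics import NormalDist

STD = NormalDist()

def loglik(mu, sigma, values, trunc, censored):
    """Lognormal log-likelihood with left truncation and right censoring."""
    total = 0.0
    for x, d, is_censored in zip(values, trunc, censored):
        z = (log(x) - mu) / sigma
        if is_censored:
            total += log(1.0 - STD.cdf(z))           # only know that X >= x
        else:
            total += log(STD.pdf(z) / (sigma * x))   # exact lognormal density
        # condition on the claim having exceeded its truncation point
        total -= log(1.0 - STD.cdf((log(d) - mu) / sigma))
    return total

# Simulate the same data-generating process as the example (different RNG).
rng = random.Random(7)
ground_up = [rng.lognormvariate(7.4, 1.1) for _ in range(6000)]
reported = [x for x in ground_up if x > 500.0]
values = [min(x, 10_500.0) for x in reported]     # capped at deductible + max payment
censored = [x >= 10_500.0 for x in reported]
trunc = [500.0] * len(values)

ll_true = loglik(7.4, 1.1, values, trunc, censored)
ll_naive = loglik(7.649, 0.833, values, trunc, censored)   # naive fit quoted above
print(f"log-likelihood at truth: {ll_true:.1f}, at naive fit: {ll_naive:.1f}")
```

Under the correct likelihood the naive parameters score far worse than the truth, mostly because they assign too little probability to the capped claims.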
Reprice the terms
With a credible ground-up severity, coverage alternatives are closed-form
LEV arithmetic (see coverage semantics:
the second Layer argument is the width). Compare the current terms with a
proposal that doubles the limit and funds it with a higher deductible:
from lossmodels.coverage import Layer, OrdinaryDeductible
sev = fitc
cur = Layer(sev, 500.0, 10_000.0) # current: 10,000 xs 500
prop = Layer(sev, 1000.0, 20_000.0) # proposed: 20,000 xs 1,000
cur.mean(), prop.mean() # -> 2,122.75 vs 1,978.08 per ground-up claim
OrdinaryDeductible(sev, 500.0).loss_elimination_ratio() # -> 16.0%
OrdinaryDeductible(sev, 1000.0).loss_elimination_ratio() # -> 28.8%
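The closed form behind these numbers is the lognormal limited expected value, \(E[X \wedge u] = e^{\mu + \sigma^2/2}\,\Phi\!\left(\frac{\ln u - \mu - \sigma^2}{\sigma}\right) + u\left[1 - \Phi\!\left(\frac{\ln u - \mu}{\sigma}\right)\right]\). A layer of width \(w\) above attachment \(d\) then has mean \(E[X \wedge (d + w)] - E[X \wedge d]\), and the loss elimination ratio is \(E[X \wedge d]/E[X]\). The sketch below uses the rounded fitted parameters quoted earlier, so it reproduces the library output only approximately; the helper names are made up for illustration.

```python
from math import exp, log
from statistics import NormalDist

PHI = NormalDist().cdf
MU, SIGMA = 7.370, 1.112   # rounded censored-fit parameters from the text

def lev(u, mu=MU, sigma=SIGMA):
    """Limited expected value E[min(X, u)] of a lognormal."""
    mean = exp(mu + 0.5 * sigma ** 2)
    return (mean * PHI((log(u) - mu - sigma ** 2) / sigma)
            + u * (1.0 - PHI((log(u) - mu) / sigma)))

def layer_mean(attachment, width):
    """Expected payment per ground-up claim for `width` excess of `attachment`."""
    return lev(attachment + width) - lev(attachment)

def loss_elimination_ratio(deductible):
    """Share of the ground-up mean removed by an ordinary deductible."""
    return lev(deductible) / exp(MU + 0.5 * SIGMA ** 2)

cur_mean = layer_mean(500.0, 10_000.0)      # current: 10,000 xs 500
prop_mean = layer_mean(1000.0, 20_000.0)    # proposed: 20,000 xs 1,000
ler_500 = loss_elimination_ratio(500.0)
ler_1000 = loss_elimination_ratio(1000.0)
print(f"{cur_mean:,.0f} vs {prop_mean:,.0f}; LER {ler_500:.1%} and {ler_1000:.1%}")
```

Small differences from the figures above are expected, since the library works from the unrounded fit.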
The aggregate picture
Discretize each layer (the coverage transforms expose cdf, so
discretize_severity takes them directly, zero-atom and all) and convolve
with the ground-up frequency. One numerical note earns its keep here: at
3,000 expected claims the Panjer recursion's starting mass at zero,
\(e^{-\lambda(1 - f_0)}\) for a Poisson frequency with severity zero-atom
\(f_0\), underflows double precision. panjer_recursion now raises and says
so, and the FFT route is the right tool:
from lossmodels.aggregate import (discretize_severity, fft_aggregate_poisson,
                                  stop_loss_from_pmf, tvar_from_pmf, var_from_pmf)
for lay in (cur, prop):
    pmf = discretize_severity(lay, h=250.0, max_loss=25_000.0)
    agg = fft_aggregate_poisson(lm.Poisson(3000.0), pmf, n_steps=65_536)
| | current | proposed |
|---|---|---|
| payments per year | 2,551 | 1,983 |
| aggregate mean | 6,365,033 | 5,931,669 |
| P99 | 6,803,250 | 6,448,250 |
| TVaR₉₉ | 6,868,594 | 6,525,867 |
| stop-loss at 6.6M | 9,445 | 92 |
(Payment counts follow from Poisson thinning — the payment frequency is exactly \(3{,}000 \cdot S(d)\) — and the aggregate means reconcile with Wald’s identity to within the documented downward discretization bias.)
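The reconciliation can be checked by hand from numbers already quoted. Wald's identity gives \(E[S] = E[N]\,E[Y]\), so the per-claim layer means times 3,000 should sit just above the FFT aggregate means. The sketch below is plain arithmetic on figures from the text and the table; nothing in it calls lossmodels.

```python
LAMBDA = 3000.0   # expected ground-up claims per year

per_claim_mean = {"current": 2122.75, "proposed": 1978.08}          # layer means
fft_mean = {"current": 6_365_033.0, "proposed": 5_931_669.0}        # table values
payments_per_year = {"current": 2551.0, "proposed": 1983.0}         # table values

wald_mean = {}
relative_bias = {}
implied_survival = {}
for name in per_claim_mean:
    # Wald's identity: E[S] = E[N] * E[Y]
    wald_mean[name] = LAMBDA * per_claim_mean[name]
    # negative values mean the discretized aggregate sits below the exact mean
    relative_bias[name] = (fft_mean[name] - wald_mean[name]) / wald_mean[name]
    # Poisson thinning: payment frequency = LAMBDA * S(d), so S(d) is the ratio
    implied_survival[name] = payments_per_year[name] / LAMBDA
    print(f"{name}: Wald {wald_mean[name]:,.0f}, "
          f"bias {relative_bias[name]:.3%}, S(d) {implied_survival[name]:.3f}")
```

Both aggregate means fall short of the Wald figure by about 0.05%, which is the downward discretization bias the note refers to.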
The proposal cuts expected annual cost by 6.8% and the 1-in-100 aggregate loss (P99) by 5.2%: the higher deductible more than funds the doubled limit at this severity. And because the proposed aggregate tail now sits inside 6.6M (both P99 and TVaR₉₉ fall below it), an aggregate stop-loss at that attachment goes from a 9,445 pure premium to essentially free, the kind of statement only the full aggregate distribution can make.
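A closing note on the Panjer underflow mentioned earlier. For a compound Poisson the recursion is seeded with \(P(S = 0) = e^{-\lambda(1 - f_0)}\), and every later term is a multiple of earlier ones, so a seed of zero yields an all-zero result. The sketch below shows this with a hypothetical helper, `panjer_poisson`, which is not the library's panjer_recursion; the three-point severity and the zero-atom of 0.15 (roughly the below-deductible share) are illustrative assumptions.

```python
import math
import sys

def panjer_poisson(lam, f, n_steps):
    """Plain Panjer recursion for a compound Poisson on an integer grid.

    f[j] is the severity mass at j grid steps; returns the first n_steps
    aggregate probabilities. Illustrative helper only.
    """
    g = [math.exp(-lam * (1.0 - f[0]))]          # seed: P(S = 0)
    for k in range(1, n_steps):
        upper = min(k, len(f) - 1)
        total = sum(j * f[j] * g[k - j] for j in range(1, upper + 1))
        g.append(lam / k * total)
    return g

severity = [0.15, 0.45, 0.40]    # ASSUMED toy severity with a zero-atom of 0.15

# Small frequency: the seed is representable and the recursion works.
small = panjer_poisson(2.0, severity, 50)

# Frequency from the example: the seed is exp(-2550), far below double range.
log_seed = -3000.0 * (1.0 - severity[0])
smallest_normal_log = math.log(sys.float_info.min)   # about -708
large = panjer_poisson(3000.0, severity, 50)

print(f"log seed {log_seed:.0f} vs double floor {smallest_normal_log:.0f}")
print(f"small-frequency mass recovered: {sum(small):.6f}")
print(f"large-frequency mass recovered: {sum(large)}")
```

Raising an error in that situation, as the text says panjer_recursion does, is safer than silently returning zeros.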