lossmodels¶
Loss-distribution modeling: fit severity and frequency distributions — including to claims data net of deductibles and limits — and combine them into an aggregate loss distribution.
The package is organized into severity, frequency, aggregate, estimation,
empirical, and coverage modules. Fitted models expose a consistent
pdf/cdf/sf/quantile/mean/sample interface; the estimation layer
fits and compares them; the aggregate layer convolves a frequency and a
severity model into the aggregate loss for a block.
Distributions¶
The continuous severity inventory covers the Appendix A families of Loss
Models: From Data to Decisions (Klugman, Panjer & Willmot) — gamma, lognormal, Weibull, the transformed-beta
family (Burr, inverse Burr, Pareto II/Lomax, loglogistic, paralogistic, …),
inverse distributions, and the (Klugman) generalized Pareto — plus
SplicedSeverity for body–tail constructions. Frequencies cover Poisson,
negative binomial, binomial, geometric, and logarithmic, with ZeroTruncated
and ZeroModified wrappers for the (a, b, 1) class.
Because the literature parameterizes these families several different ways,
the table below is the authority: each constructor’s exact form, each pinned
against its scipy.stats equivalent by the package’s conformance test suite
(CDF/pmf agreement to \(10^{-10}\)). Two constructors do not use the
Appendix A parameters as of 0.6.1 and are flagged.
Class |
Form |
|
Notes |
|---|---|---|---|
|
\(F(x) = 1 - e^{-\lambda x}\), mean \(1/\lambda\) |
|
⚠ rate, not Appendix A’s mean-\(\theta\); pass |
|
shape \(\alpha\), scale \(\theta\) |
|
Appendix A |
|
\(\log X \sim N(\mu, \sigma^2)\) |
|
Appendix A; mind scipy’s \((s, \mathrm{scale})\) form |
|
\(F(x) = 1 - e^{-(x/\mathrm{lam})^{k}}\) |
|
⚠ Appendix A’s family with renamed parameters: \(k \leftrightarrow \tau\), \(\mathrm{lam} \leftrightarrow \theta\) — lam is the scale, not a rate |
|
Type I, \(x \ge \theta\) |
|
tail-only law; see the naming trap in Conventions |
|
Lomax, \(x > 0\) |
|
the distribution Loss Models calls “Pareto” |
|
\(F(x) = \dfrac{(x/\theta)^{\gamma}}{1 + (x/\theta)^{\gamma}}\) |
|
Appendix A |
|
Type XII (Singh–Maddala) |
|
Appendix A; scipy’s \((c, d) = (\gamma, \alpha)\) |
|
Klugman transformed-beta |
|
not the EVT GPD — see Conventions |
|
shape \(\alpha\), scale \(\theta\) |
|
Appendix A |
|
mean \(\lambda\) |
|
|
|
failures before the \(r\)-th success |
|
scipy form, not Loss Models’ \((r, \beta)\); \(\beta = (1-p)/p\), \(p = 1/(1+\beta)\) |
|
support \(\{0, 1, \ldots\}\) |
|
not |
|
|
Fitting¶
Complete-data fitting is one call per family, or one call across families:
import lossmodels as lm
x = lm.Lognormal(mu=8.6, sigma=1.4).sample(20_000, rng=1)
fit = lm.fit_lognormal(x) # closed-form MLE
best = lm.fit_best_severity(x) # nine families, ranked by AIC
best["best_name"] # 'lognormal'
lm.goodness_of_fit(best["best_model"], x, k=2) # loglik, AIC/BIC, KS, AD, CvM
fit_best_severity fits every registered family (exponential, gamma,
lognormal, Weibull, Pareto I, Pareto II, loglogistic, inverse gamma, Burr) by
MLE and ranks by AIC or BIC; candidates that fail to fit are skipped.
Method-of-moments alternatives (method="moments") and frequency selection
(fit_best_frequency) follow the same pattern.
Fitting under deductibles and limits¶
Real claim data is rarely ground-up and complete: a deductible left-truncates (losses below it never enter the data) and a policy limit right-censors (a capped payment says only that the loss was at least the cap). Fitting a complete-data likelihood to such values is biased — often badly. Every severity fitter therefore accepts the individual-data likelihood of Loss Models: with truncation points \(t_i\) and censoring indicators \(\delta_i\),
The practical entry point converts per-payment data into the
(values, truncation, censored) triple:
import lossmodels as lm
# payments net of an 800 deductible, capped at a 20,000 maximum payment
values, truncation, censored = lm.payments_to_ground_up(
payments, deductible=800, max_payment=20_000,
)
fit = lm.fit_lognormal(values, truncation=truncation, censored=censored)
best = lm.fit_best_severity( # selection under the same likelihood
values, truncation=truncation, censored=censored,
)
With both keywords omitted, every fitter runs its original complete-data path
unchanged. The exponential and Pareto Type I keep closed forms in the
censored/truncated case; gamma, lognormal, Weibull, and the transformed-beta
fitters use the generic machinery (fit_mle_censored,
censored_log_likelihood), which works for any model exposing pdf and
cdf.
Diagnostics follow the data: aic/bic/goodness_of_fit accept the same
keywords and score with the individual-data likelihood; pit_values returns
the probability-integral transform \((F(x_i)-F(t_i))/(1-F(t_i))\), exactly
Uniform(0, 1) under the true model for uncensored data with arbitrary
truncation points; ks_statistic uses the PIT under truncation and compares
against the Kaplan–Meier estimate (kaplan_meier, product-limit under
left truncation and right censoring) when censoring is present.
Aggregate losses¶
CollectiveRiskModel composes a frequency and a severity into the aggregate
loss \(S = \sum_{j=1}^{N} X_j\), by simulation or by discretized recursion:
import lossmodels as lm
crm = lm.CollectiveRiskModel(
frequency=lm.Poisson(lam=120),
severity=lm.Lognormal(mu=8.6, sigma=1.4),
)
s = crm.sample(100_000, rng=7) # one generator threads through both draws
crm.mean(), crm.variance() # exact compound moments
Simulation-based measures (var, tvar, stop_loss,
limited_expected_value on the aggregate side) accept the same rng
argument, and the empirical estimators follow the ecosystem VaR/TVaR
convention — see Conventions.
discretize_severity (midpoint by default) prepares severity PMFs for
Panjer/FFT-style recursion.
Coverage modifications¶
The coverage module applies policy terms to a ground-up severity:
OrdinaryDeductible, PolicyLimit, and Layer(d, u) with per-loss payment
\(\min((X-d)_+, u)\) — note u is the maximum payment (layer width), not
the upper attachment; a 5M xs 1M layer is Layer(1_000_000, 5_000_000).
Layer moments are computed deterministically from
\(E[Y^2] = \int 2\,y\,S(y)\,dy\).
See the API reference below for the full surface; each object’s docstring carries its own usage.
API reference¶
lossmodels: actuarial loss distributions, aggregate modeling, and fit diagnostics.
The full public API is surfaced here for convenience, so common entry points are
available directly as lossmodels.Lognormal, lossmodels.fit_best_severity,
lossmodels.goodness_of_fit, etc. Submodule imports
(from lossmodels.severity import Lognormal) continue to work unchanged.
- class SeverityModel[source]¶
Bases:
ABCBase class for severity (loss size) distributions.
All severity models must implement:
sample
mean
variance
- excess_loss(d: float, n_sim: int = 100_000) float[source]¶
E[(X - d)+] computed deterministically from the survival function.
For a nonnegative loss random variable X,
E[(X - d)+] = integral_d^infinity S_X(x) dx
where S_X(x) = 1 - F_X(x).
This remains well defined even when E[X] does not exist, provided the tail integral from d to infinity converges. The
n_simargument is retained for backward compatibility but is no longer used.
- limited_expected_value(d: float, n_sim: int = 100_000) float[source]¶
E[min(X, d)] computed deterministically from the survival function.
For a nonnegative loss random variable X,
E[min(X, d)] = integral_0^d S_X(x) dx
where S_X(x) = 1 - F_X(x).
The
n_simargument is retained for backward compatibility but is no longer used.
- ppf(p)[source]¶
Alias for
quantile()(inverse CDF).
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class Exponential(rate: float)[source]¶
Bases:
SeverityModelExponential severity model.
Parameterization¶
X ~ Exponential(rate)
Support: x >= 0
Mean = 1 / rate Variance = 1 / rate^2
- param rate:
Rate parameter (lambda), with rate > 0.
- type rate:
float
- class Gamma(alpha: float, theta: float)[source]¶
Bases:
SeverityModelGamma severity model.
Parameterization¶
X ~ Gamma(shape=alpha, scale=theta) Support: x > 0
- param alpha:
Shape parameter, with alpha > 0.
- type alpha:
float
- param theta:
Scale parameter, with theta > 0.
- type theta:
float
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class Lognormal(mu: float, sigma: float)[source]¶
Bases:
SeverityModelLognormal severity model.
Parameterization¶
If Y = log(X) ~ Normal(mu, sigma^2), then X is Lognormal(mu, sigma). Support: x > 0
- param mu:
Mean of log(X).
- type mu:
float
- param sigma:
Standard deviation of log(X), with sigma > 0.
- type sigma:
float
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class Pareto(alpha: float, theta: float)[source]¶
Bases:
SeverityModelPareto Type I severity model.
Parameterization¶
X ~ Pareto(alpha, theta) Support: x >= theta
- Density:
f(x) = alpha * theta^alpha / x^(alpha + 1), x >= theta
- param alpha:
Shape parameter, with alpha > 0.
- type alpha:
float
- param theta:
Scale (minimum) parameter, with theta > 0.
- type theta:
float
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class SplicedSeverity(body, tail, threshold: float, weight: float)[source]¶
Bases:
SeverityModelTwo-piece spliced severity: a body below a threshold, a tail above it.
The density is
f(x) = w * f_body(x) / F_body(u), 0 < x <= u f(x) = (1 - w) * f_tail(x), x > u
where
uis the threshold andw = P(X <= u)is the body mass. The body is renormalized onto(0, u]and the tail must be a distribution supported on[u, inf)withF_tail(u) = 0(e.g. a GPD with locationufromextremeloss, or aParetowiththeta = u). The CDF isF(x) = w * F_body(x) / F_body(u), x <= u F(x) = w + (1 - w) * F_tail(x), x > u
which is continuous at
u(both pieces equalwthere). The density may jump atu; this is the ordinary (unconstrained) spliced model, not a smooth splice.The result is an ordinary severity model – it exposes
sample(size)andmean(), so it drops straight back intorisksimandextremelossas a tail-corrected severity.- Parameters:
body (severity model) – Body distribution. Must implement
cdf,pdf,quantileandlimited_expected_value(everylossmodelsseverity qualifies).tail (tail distribution on
[u, inf)) – Must implementcdf(withcdf(u) == 0),pdfandsample(size).quantileenables an exact spliced quantile;mean/varianceenable the spliced moments.threshold (float) – Split point
u > 0.weight (float) – Body mass
w = P(X <= u)in(0, 1).
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class Weibull(k: float, lam: float)[source]¶
Bases:
SeverityModelWeibull severity model.
Parameterization¶
X ~ Weibull(shape=k, scale=lam) Support: x > 0
- param k:
Shape parameter, with k > 0.
- type k:
float
- param lam:
Scale parameter, with lam > 0.
- type lam:
float
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class Burr(alpha: float, theta: float, gamma: float)[source]¶
Bases:
SeverityModelBurr (Type XII, Singh-Maddala) severity.
X ~ Burr(alpha, theta, gamma), support x > 0:
F(x) = 1 - [1 / (1 + (x/theta)^gamma)]^alpha E[X^k] = theta^k Gamma(1 + k/gamma) Gamma(alpha - k/gamma) / Gamma(alpha), for -gamma < k < alpha * gamma.
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class InverseBurr(tau: float, theta: float, gamma: float)[source]¶
Bases:
SeverityModelInverse Burr (Dagum) severity.
X ~ InverseBurr(tau, theta, gamma), support x > 0:
F(x) = [ (x/theta)^gamma / (1 + (x/theta)^gamma) ]^tau E[X^k] = theta^k Gamma(tau + k/gamma) Gamma(1 - k/gamma) / Gamma(tau), for -tau*gamma < k < gamma.
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class GeneralizedPareto(alpha: float, theta: float, tau: float)[source]¶
Bases:
SeverityModelGeneralized Pareto severity (Klugman transformed-beta form; NOT the EVT GPD).
X ~ GeneralizedPareto(alpha, theta, tau), support x > 0:
F(x) = Beta(tau, alpha; x/(x+theta)) (regularized incomplete beta) E[X^k] = theta^k Gamma(tau + k) Gamma(alpha - k) / (Gamma(alpha) Gamma(tau)), for -tau < k < alpha.
The extreme-value GPD (peaks-over-threshold tail) lives in
extremeloss; this is the three-parameter loss-severity distribution from Loss Models.- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class ParetoII(alpha: float, theta: float)[source]¶
Bases:
SeverityModelPareto (Type II, Lomax) severity – the Loss Models two-parameter “Pareto”.
- X ~ Pareto(alpha, theta), support x > 0.
f(x) = alpha theta^alpha / (x + theta)^(alpha+1) F(x) = 1 - [theta / (x + theta)]^alpha E[X^k] = theta^k k! / [(alpha-1)…(alpha-k)] (positive integer k < alpha) E[X] = theta / (alpha - 1), alpha > 1.
Note: this is distinct from
lossmodels.Pareto, which is the Pareto Type I (single-parameter) distribution with support x >= theta. Appendix A lists this Type II / Lomax form under the name “Pareto”; the Type I appears under “Single-Parameter Pareto”.- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class InversePareto(tau: float, theta: float)[source]¶
Bases:
SeverityModelInverse Pareto severity.
- X ~ InversePareto(tau, theta), support x > 0.
f(x) = tau theta x^(tau-1) / (x + theta)^(tau+1) F(x) = [x / (x + theta)]^tau E[X^k] = theta^k Gamma(tau + k) Gamma(1 - k) / Gamma(tau), -tau < k < 1.
The mean (k = 1) lies on the boundary of the moment range and does not exist, so
mean()andvariance()always raise.- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class Loglogistic(gamma: float, theta: float)[source]¶
Bases:
SeverityModelLoglogistic (Fisk) severity.
- X ~ Loglogistic(gamma, theta), support x > 0.
F(x) = (x/theta)^gamma / (1 + (x/theta)^gamma) E[X^k] = theta^k Gamma(1 + k/gamma) Gamma(1 - k/gamma), -gamma < k < gamma.
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class Paralogistic(alpha: float, theta: float)[source]¶
Bases:
SeverityModelParalogistic severity (a Burr distribution with gamma = alpha).
- X ~ Paralogistic(alpha, theta), support x > 0.
- E[X^k] = theta^k Gamma(1 + k/alpha) Gamma(alpha - k/alpha) / Gamma(alpha),
for -alpha < k < alpha^2.
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class InverseParalogistic(tau: float, theta: float)[source]¶
Bases:
SeverityModelInverse paralogistic severity (an inverse Burr with gamma = tau).
- X ~ InverseParalogistic(tau, theta), support x > 0.
- E[X^k] = theta^k Gamma(tau + k/tau) Gamma(1 - k/tau) / Gamma(tau),
for -tau^2 < k < tau.
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class InverseGamma(alpha: float, theta: float)[source]¶
Bases:
SeverityModelInverse gamma (Vinci) severity.
- X ~ InverseGamma(alpha, theta), support x > 0.
F(x) = 1 - Gamma(alpha; theta/x) E[X^k] = theta^k Gamma(alpha - k) / Gamma(alpha), k < alpha.
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class InverseWeibull(theta: float, tau: float)[source]¶
Bases:
SeverityModelInverse Weibull (log-Gompertz) severity.
- X ~ InverseWeibull(theta, tau), support x > 0.
F(x) = exp[-(theta/x)^tau] E[X^k] = theta^k Gamma(1 - k/tau), k < tau.
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class InverseExponential(theta: float)[source]¶
Bases:
SeverityModelInverse exponential severity.
- X ~ InverseExponential(theta), support x > 0.
F(x) = exp(-theta/x) E[X^k] = theta^k Gamma(1 - k), k < 1.
The mean (k = 1) does not exist, so
mean()andvariance()raise.- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class InverseGaussian(mu: float, theta: float)[source]¶
Bases:
SeverityModelInverse Gaussian (Wald) severity, Klugman
(mu, theta)parameterization.- X ~ InverseGaussian(mu, theta), support x > 0.
E[X] = mu, Var[X] = mu^3 / theta.
Mapped to
scipy.stats.invgauss(mu=mu/theta, scale=theta).- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class LogT(r: float, mu: float, sigma: float)[source]¶
Bases:
SeverityModelLog-t severity.
If Y has a Student-t distribution with r degrees of freedom, then X = exp(sigma * Y + mu) has the log-t distribution (support x > 0). It is heavier-tailed than the lognormal; no positive moments exist, so
mean()andvariance()raise.mumay be negative.F(x) = F_r((ln x - mu) / sigma), F_r the Student-t cdf.
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class SingleParameterPareto(alpha: float, theta: float)[source]¶
Bases:
SeverityModelSingle-Parameter Pareto severity (Pareto Type I; theta is a fixed lower bound).
- X ~ SingleParameterPareto(alpha, theta), support x > theta.
f(x) = alpha theta^alpha / x^(alpha+1), x > theta F(x) = 1 - (theta/x)^alpha, x > theta E[X^k] = alpha theta^k / (alpha - k), k < alpha.
This is the same distribution as
lossmodels.Pareto; it is provided under its Loss Models Appendix A name, where onlyalphais a true parameter –thetais the known lower truncation point set in advance.- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class Beta(a: float, b: float, theta: float)[source]¶
Bases:
SeverityModelBeta severity on (0, theta).
- X ~ Beta(a, b, theta), support 0 < x < theta.
E[X^k] = theta^k Gamma(a+b) Gamma(a+k) / (Gamma(a) Gamma(a+b+k)), k > -a. E[X] = theta a / (a + b).
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class GeneralizedBeta(a: float, b: float, theta: float, tau: float)[source]¶
Bases:
SeverityModelGeneralized beta severity on (0, theta).
X ~ GeneralizedBeta(a, b, theta, tau), support 0 < x < theta, defined by U = (X/theta)^tau ~ Beta(a, b). Equivalently X = theta * U^(1/tau):
F(x) = Beta(a, b; (x/theta)^tau) E[X^k] = theta^k Gamma(a+b) Gamma(a + k/tau) / (Gamma(a) Gamma(a + b + k/tau)), k > -a*tau.
- quantile(p)[source]¶
Inverse CDF (quantile / Value-at-Risk).
Numerically inverts
cdf. Subclasses with a closed-form inverse override this for speed and accuracy. Returns a Pythonfloatfor a scalarpand anumpy.ndarrayfor array-likep.
- class FrequencyModel[source]¶
Bases:
ABCBase class for all frequency distributions.
All frequency models must implement: - sample - mean - variance
- class Binomial(n: int, p: float)[source]¶
Bases:
FrequencyModelBinomial frequency model.
- Parameters:
n (int) – Number of trials
p (float) – Probability of success
- class Geometric(p: float)[source]¶
Bases:
FrequencyModelGeometric frequency model.
Support starting at 0: {0, 1, 2, 3, …}
- Parameters:
p (float) – Probability of success
Notes
NumPy and SciPy define the geometric distribution on {1, 2, 3, …} as the number of trials until first success. This implementation shifts that convention by 1 so the support starts at 0, which is more natural for claim counts.
- class NegativeBinomial(r: float, p: float)[source]¶
Bases:
FrequencyModelNegative Binomial frequency model.
Parameterization¶
N = number of failures before the r-th success Support: {0, 1, 2, …}
- param r:
Number of successes, with r > 0.
- type r:
float
- param p:
Probability of success, with 0 < p <= 1.
- type p:
float
Notes
This matches SciPy’s negative binomial parameterization: scipy.stats.nbinom(r, p)
- Under this convention:
E[N] = r(1 - p) / p Var(N) = r(1 - p) / p^2
- class Poisson(lam: float)[source]¶
Bases:
FrequencyModelPoisson frequency model.
Parameterization¶
N ~ Poisson(lam)
Support: {0, 1, 2, …}
- param lam:
Expected claim count, with lam >= 0.
- type lam:
float
- class ZeroTruncated(base: FrequencyModel)[source]¶
Bases:
FrequencyModelZero-truncated version of an (a, b, 0) frequency model.
- Parameters:
base (FrequencyModel) – Any frequency model exposing
pmf,cdf,mean,variance, andsample(e.g.Poisson(2.0)). Its mass at zero is removed and the remaining probabilities are rescaled by1 / (1 - p_0).
- class ZeroModified(base: FrequencyModel, p0_modified: float)[source]¶
Bases:
FrequencyModelZero-modified version of an (a, b, 0) frequency model.
- Parameters:
base (FrequencyModel) – Any frequency model exposing
pmf,cdf,mean,variance, andsample.p0_modified (float) – The (arbitrary) probability mass placed at zero, in [0, 1). Setting it to 0 recovers the zero-truncated distribution.
- class Logarithmic(beta: float)[source]¶
Bases:
FrequencyModelLogarithmic frequency distribution (Loss Models B.3.1.3).
- N ~ Logarithmic(beta), support {1, 2, …}.
p_k = (beta / (1+beta))^k / (k ln(1+beta)), k = 1, 2, … E[N] = beta / ln(1+beta) Var[N] = beta [1 + beta - beta/ln(1+beta)] / ln(1+beta)
It is the r -> 0 limit of the zero-truncated negative binomial.
- class CollectiveRiskModel(frequency: FrequencyModel, severity: SeverityModel)[source]¶
Bases:
AggregateModelCollective risk model for aggregate loss:
S = X1 + X2 + … + XN
where: - N is the claim count random variable (frequency) - Xi are iid claim severities
Assumes: - severities are iid - N is independent of severities
- sample(size: int = 1, rng: Generator | int | None = None) ndarray[source]¶
Generate random samples of aggregate loss.
rngmay beNone(legacy globalnumpy.randomstate), anintseed, or anumpy.random.Generator; a seed or generator is threaded through both the frequency and severity draws so the whole aggregate simulation is reproducible.
- class EmpiricalSeverity(data)[source]¶
Bases:
SeverityModelEmpirical severity model based on observed loss data.
- Parameters:
data (array-like) – Observed severity values. Must be nonempty and nonnegative.
- pdf(x: float) float[source]¶
Empirical severity is discrete, so this returns the empirical point mass at x. For continuous-looking data, this will often be 0 except at exact observed values.
- quantile(p)[source]¶
Empirical quantile: the true inverse CDF
inf{x : F(x) >= p}.Returns an observed data point (
np.quantilewithmethod="inverted_cdf"), consistent with this class’s owncdfand with the ecosystem VaR convention inlossmodels.aggregate.risk_measures,risksim, andextremeloss.
- class EmpiricalFrequency(data)[source]¶
Bases:
FrequencyModelEmpirical frequency model based on observed claim count data.
- Parameters:
data (array-like) – Observed claim counts. Must be nonempty, nonnegative, and integer-valued.
- fit_exponential(data, truncation=None, censored=None) Exponential[source]¶
Fit an Exponential severity model by maximum likelihood.
For complete data the MLE is
rate_hat = 1 / mean(data). Under left truncation and right censoring the MLE remains closed form (Loss Models, ch. 11): withn_uuncensored observations,rate_hat = n_u / sum_i (x_i - t_i)
summed over all observations (censored ones contribute their censoring point).
truncation/censoredfollow the conventions oflossmodels.estimation.censoring.
- fit_gamma(data, truncation=None, censored=None) Gamma[source]¶
Fit a Gamma severity model by maximum likelihood using SciPy.
- Returns:
Fitted Gamma(alpha, theta) model.
- Return type:
Notes
This constrains loc = 0 so the support is x > 0, consistent with the severity model implementation.
- fit_lognormal(data, truncation=None, censored=None) Lognormal[source]¶
Fit a Lognormal severity model by maximum likelihood.
- If log(X) ~ Normal(mu, sigma^2), the MLEs are:
mu_hat = mean(log(data)) sigma_hat = sqrt(mean((log(data) - mu_hat)^2))
Notes
This uses the MLE version of the variance (ddof=0).
- fit_pareto(data, truncation=None, censored=None) Pareto[source]¶
Fit a Pareto Type I severity model by maximum likelihood.
For X_i ~ Pareto(alpha, theta) with support x >= theta, the MLEs are:
theta_hat = min(data) alpha_hat = n / sum(log(data / theta_hat))
- Returns:
Fitted Pareto(alpha, theta) model.
- Return type:
- fit_poisson(data) Poisson[source]¶
Fit a Poisson frequency model by maximum likelihood.
- For N_i ~ Poisson(lam), the MLE is:
lam_hat = mean(data)
Notes
An all-zero dataset is valid and yields lam_hat = 0.
- fit_weibull(data, truncation=None, censored=None) Weibull[source]¶
Fit a Weibull severity model by maximum likelihood using SciPy.
- Returns:
Fitted Weibull(k, lam) model.
- Return type:
Notes
This constrains loc = 0 so the support is x > 0, consistent with the severity model implementation.
- fit_negbinomial(data) NegativeBinomial[source]¶
Fit a Negative Binomial frequency model by numerical maximum likelihood.
Parameterization¶
N = number of failures before the r-th success Support: {0, 1, 2, …} Mean = r(1-p)/p Variance = r(1-p)/p^2
- fit_mle(model_class, data, initial_params, bounds=None, truncation=None, censored=None)[source]¶
Generic numerical maximum likelihood estimation for models with a pdf method.
- Parameters:
model_class (class) – A model class that can be instantiated as
model_class(*params)and provides a pdf(x) method.data (array-like) – Observed data.
initial_params (array-like) – Initial parameter guess for the optimizer.
bounds (list of tuple, optional) – Bounds passed to scipy.optimize.minimize.
- Returns:
Fitted model instance of type model_class.
- Return type:
object
- censored_log_likelihood(model, values, truncation=None, censored=None) float[source]¶
Log-likelihood under left truncation and right censoring.
Implements \(\sum_i [\delta_i \log f(x_i) + (1-\delta_i)\log S(x_i) - \log S(t_i)]\). With no truncation and no censoring this reduces to the ordinary log-likelihood. Requires
model.pdfandmodel.cdf.Returns
-infwhen the model assigns nonpositive density or survival where the data require it (e.g. a support violation).
- kaplan_meier(values, truncation=None, censored=None)[source]¶
Product-limit (Kaplan-Meier) survival estimate under left truncation and right censoring.
With event (uncensored) times \(y_1 < y_2 < \dots\), \(d_j\) events at \(y_j\), and risk set \(R_j = \#\{i : t_i < y_j \le x_i\}\),
\[\hat S(y) = \prod_{y_j \le y} \Big(1 - \frac{d_j}{R_j}\Big).\]Under left truncation the estimator is conditional on survival to the smallest truncation point \(t_{\min}\): it estimates \(S(y)/S(t_{\min})\). With a single common deductible this is exactly the conditional severity the data can identify. Risk sets near \(t_{\min}\) can be small when truncation points are heterogeneous; interpret the left end with care.
- Returns:
(times, survival): distinct event times and the estimated survival immediately after each.- Return type:
tuple of np.ndarray
- fit_mle_censored(model_class, values, initial_params, bounds=None, truncation=None, censored=None)[source]¶
Generic numerical MLE under left truncation and right censoring.
The censored-and-truncated analogue of
lossmodels.estimation.fit_mle():model_class(*params)must providepdfandcdf. Uses L-BFGS-B (withbounds) refined by Nelder-Mead if the gradient-based step fails to improve.- Returns:
Fitted
model_classinstance.- Return type:
object
- payments_to_ground_up(payments, deductible=0.0, max_payment=None, rtol=1e-9)[source]¶
Convert per-payment data to
(values, truncation, censored).The per-payment convention: with deductible
dand maximum paymentm(the policy limit net of the deductible), the recorded payment isY = min(X - d, m)for a ground-up lossX > d.- Parameters:
payments (array-like) – Positive payment amounts.
deductible (float or array-like) – Deductible per observation (scalar broadcasts). Default 0.
max_payment (float or array-like, optional) – Maximum payment per observation. Payments within relative tolerance
rtolof the maximum are flagged as right-censored.rtol (float) – Relative tolerance for detecting capped payments.
- Returns:
(values, truncation, censored)on the ground-up scale, ready for any fitter inlossmodels.estimation.- Return type:
tuple of np.ndarray
- fit_burr(data, truncation=None, censored=None) Burr[source]¶
Fit a Burr (Type XII) severity by numerical maximum likelihood, with optional left truncation and right censoring.
A three-parameter fit: initialized at the loglogistic special case (alpha = 1, gamma from the interquartile ratio, theta at the median).
- fit_inverse_gamma(data, truncation=None, censored=None) InverseGamma[source]¶
Fit an Inverse Gamma severity by numerical maximum likelihood, with optional left truncation and right censoring.
Initialization by moments on the uncensored values: alpha0 = 2 + m1^2/(m2 - m1^2), theta0 = m1 (alpha0 - 1).
- fit_loglogistic(data, truncation=None, censored=None) Loglogistic[source]¶
Fit a Loglogistic severity by numerical maximum likelihood, with optional left truncation and right censoring.
Initialization: theta from the median (the loglogistic median is theta) and gamma from the interquartile ratio, q75/q25 = 9**(1/gamma).
- fit_paretoII(data, truncation=None, censored=None) ParetoII[source]¶
Fit a Pareto Type II (Lomax) severity – the Loss Models “Pareto” – by numerical maximum likelihood, with optional left truncation and right censoring (deductibles and limits).
Initialization uses the method of moments on the uncensored values when the implied second moment exists, else a heavy-tail default.
- pit_values(model, data, truncation=None, censored=None) ndarray[source]¶
Probability-integral-transform values for the uncensored observations.
Each uncensored ground-up value \(x_i\) with truncation point \(t_i\) maps to
\[u_i = \frac{F(x_i) - F(t_i)}{1 - F(t_i)},\]which is Uniform(0, 1) under a correctly specified model, whatever the (possibly heterogeneous) truncation points. Requires fully uncensored data: dropping censored observations would leave the remaining PIT values uniform only on a sub-interval of (0, 1), so for censored data use
ks_statistic(), which compares against the Kaplan-Meier estimate instead.
- fit_best_severity(data, candidates=None, method='mle', criterion='aic', truncation=None, censored=None)[source]¶
Fit a set of severity models and return the best one by AIC or BIC.
- Parameters:
data (array-like) – Severity observations.
candidates (list of str, optional) – Candidate model names. Defaults to all supported severity fitters.
method ({"mle", "moments"}) – Fitting method.
criterion ({"aic", "bic"}) – Selection criterion.
- Returns:
Dictionary with keys: - “best_name” - “best_model” - “criterion” - “method” - “results”
- Return type:
dict
Notes
Models that fail to fit are skipped.
- fit_best_frequency(data, candidates=None, method='mle', criterion='aic')[source]¶
Fit a set of frequency models and return the best one by AIC or BIC.
- log_likelihood(model, data, truncation=None, censored=None) float[source]¶
Compute the log-likelihood of observed data under a fitted model.
- Parameters:
model (object) – Model instance with a pdf(x) method for continuous models or pmf(x) method for discrete models.
data (array-like) – Observed data.
truncation (array-like, optional) – Per-observation left-truncation points and right-censoring flags. When either is given the individual-data likelihood of
lossmodels.estimation.censored_log_likelihood()is used (requiresmodel.cdf).censored (array-like, optional) – Per-observation left-truncation points and right-censoring flags. When either is given the individual-data likelihood of
lossmodels.estimation.censored_log_likelihood()is used (requiresmodel.cdf).
- Returns:
Log-likelihood value.
- Return type:
float
- aic(model, data, k: int, truncation=None, censored=None) float[source]¶
Compute Akaike Information Criterion.
- Parameters:
model (object) – Fitted model.
data (array-like) – Observed data.
k (int) – Number of estimated parameters.
- Returns:
AIC value.
- Return type:
float
- bic(model, data, k: int, truncation=None, censored=None) float[source]¶
Compute Bayesian Information Criterion.
- Parameters:
model (object) – Fitted model.
data (array-like) – Observed data.
k (int) – Number of estimated parameters.
- Returns:
BIC value.
- Return type:
float
- ks_statistic(model, data, truncation=None, censored=None) float[source]¶
Kolmogorov-Smirnov distance
sup_x |F_n(x) - F(x)|(smaller is better).Most sensitive in the body of the distribution. See the module note on estimated-parameter p-values. With
truncationonly, the statistic is computed on the PIT sample ofpit_values()against the Uniform(0, 1) cdf – exact under heterogeneous truncation. With censoring present, it is the sup-distance between the Kaplan-Meier estimate and the model cdf, both conditional on exceeding the smallest truncation point.
- anderson_darling(model, data, truncation=None, censored=None) float[source]¶
Anderson-Darling statistic A^2 (smaller is better).
Weights the tails more than KS, so it is the better whole-distribution statistic for heavy-tailed loss data. With
truncationthe statistic is computed on the (exactly uniform) PIT sample ofpit_values(); censored data are not supported – useks_statistic().
- cramer_von_mises(model, data, truncation=None, censored=None) float[source]¶
Cramer-von Mises statistic W^2 (smaller is better).
With
truncationthe statistic is computed on the (exactly uniform) PIT sample ofpit_values(); censored data are not supported – useks_statistic().
- tail_quantile_table(model, data, probs=(0.90, 0.95, 0.99, 0.995)) list[source]¶
Compare fitted vs empirical high quantiles – the tail-fit check.
Returns a list of dicts with keys
prob,empirical,fitted,abs_error,rel_error. A model can win on AIC and still miss here; for tail risk (VaR / TVaR / stop-loss) this is the diagnostic that matters. Requires the model to implementquantile(p).
- goodness_of_fit(model, data, k: int, truncation=None, censored=None) dict[source]¶
One-call fit report combining relative and absolute measures.
Returns
n,log_likelihood,aic,bic(relative – compare across candidates) andks,anderson_darling,cramer_von_mises(absolute distance-to-empirical – smaller is better). See the module note on the estimated-parameter caveat for the distance statistics.With
truncation/censored, the likelihood-based measures use the individual-data likelihood;kscompares against the PIT sample (truncation only) or the Kaplan-Meier estimate (censoring present), andanderson_darling/cramer_von_misesare reported as NaN under censoring, where they are not defined here.n_uncensoredis added to the report.