actuarialpy
Experience analysis on a single tidy table, plus the shared numerical core the
rest of the ecosystem builds on. You build one DataFrame — claims/expense,
revenue, exposure, by period — and Experience gives you views (by,
rolling, trend, completion, seasonality, credibility, pooling) without
re-pivoting. numpy and pandas only; no scipy.
Quickstart
import pandas as pd
import actuarialpy as ap
df = pd.DataFrame({
    "month": pd.period_range("2024-01", periods=6, freq="M").astype(str),
    "product": ["PPO"] * 6,
    "paid": [120_000, 118_000, 125_000, 130_000, 128_000, 135_000],
    "premium": [150_000] * 6,
    "member_months": [1000, 1005, 1010, 1008, 1012, 1015],
})
exp = ap.Experience(df, expense="paid", revenue="premium",
                    exposure="member_months", date="month")
exp.by("product") # grouped view
exp.loss_ratio # paid / premium
ap.per_exposure(df["paid"], df["member_months"]) # amount per exposure unit
ap.loss_ratio(df["paid"], df["premium"]) # as a free function
Credibility
Credibility primitives live here — ratingmodels and the other packages
delegate to them rather than re-implementing. Limited-fluctuation (classical)
credibility from exposure:
import actuarialpy as ap
z = ap.limited_fluctuation_z(exposure=96_000, full_credibility_standard=120_000)
# -> 0.894
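The figure above is consistent with the classical square-root rule, Z = min(1, sqrt(n / n_full)). Below is a minimal stand-alone sketch of that formula in plain Python. It is not actuarialpy's source. The cap at 1 and the input check are assumptions.

```python
import math


def limited_fluctuation_z(exposure, full_credibility_standard):
    """Square-root rule: Z = min(1, sqrt(exposure / full_credibility_standard))."""
    if exposure < 0 or full_credibility_standard <= 0:
        raise ValueError("need exposure >= 0 and a positive full-credibility standard")
    return min(1.0, math.sqrt(exposure / full_credibility_standard))


print(round(limited_fluctuation_z(96_000, 120_000), 3))  # 0.894
```

Exposure at or above the standard is fully credible, so Z never exceeds 1.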
Greatest-accuracy (Bühlmann–Straub) credibility across risk classes, fit straight from a tidy frame of group / value / weight:
import pandas as pd
import actuarialpy as ap
exp = pd.DataFrame({
    "product": ["PPO", "PPO", "HMO", "HMO"],
    "paid": [125_000, 130_000, 88_000, 91_000],
    "member_months": [1010, 1008, 640, 655],
})
model = ap.BuhlmannStraub.from_frame(
    exp, group="product", value="paid", weight="member_months",
)
model.k # Bühlmann k = EPV / VHM
model.z(weight=1_000) # credibility Z for 1,000 units of exposure
model.premium(risk_mean, weight) then blends a class’s own mean toward the
overall mean at that credibility.
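To show roughly what from_frame estimates, here is a textbook sketch of the nonparametric Bühlmann–Straub estimators in plain Python. It makes several assumptions:

- Each row's observation is value / weight (paid per member month).
- EPV and VHM use the standard unbiased estimators.
- A non-positive VHM raises an error.

actuarialpy may make different choices on any of these. The names fit_buhlmann_straub, credibility_z, and credibility_premium are hypothetical.

```python
from collections import defaultdict


def fit_buhlmann_straub(rows):
    """rows: (group, value, weight) triples; each observation is X = value / weight."""
    obs = defaultdict(list)
    for group, value, weight in rows:
        obs[group].append((value / weight, weight))

    # Total weight and weighted mean per class, then the overall weighted mean.
    m_i = {g: sum(w for _, w in xs) for g, xs in obs.items()}
    mean_i = {g: sum(x * w for x, w in xs) / m_i[g] for g, xs in obs.items()}
    m = sum(m_i.values())
    mean = sum(m_i[g] * mean_i[g] for g in obs) / m

    # EPV: weighted within-class variance, pooled across classes.
    within = sum(w * (x - mean_i[g]) ** 2 for g, xs in obs.items() for x, w in xs)
    epv = within / sum(len(xs) - 1 for xs in obs.values())

    # VHM: between-class variance net of the part explained by process noise.
    between = sum(m_i[g] * (mean_i[g] - mean) ** 2 for g in obs)
    vhm = (between - (len(obs) - 1) * epv) / (m - sum(v * v for v in m_i.values()) / m)
    if vhm <= 0:
        raise ValueError("estimated VHM <= 0: no credible between-class variation")
    return {"k": epv / vhm, "epv": epv, "vhm": vhm,
            "mean": mean, "class_mean": mean_i, "class_weight": m_i}


def credibility_z(k, weight):
    return weight / (weight + k)


def credibility_premium(k, risk_mean, weight, overall_mean):
    # Blend the class mean toward the overall mean at credibility Z.
    z = credibility_z(k, weight)
    return z * risk_mean + (1 - z) * overall_mean


rows = [("PPO", 125_000, 1010), ("PPO", 130_000, 1008),
        ("HMO", 88_000, 640), ("HMO", 91_000, 655)]
fit = fit_buhlmann_straub(rows)
z_1000 = credibility_z(fit["k"], 1_000)
```

Each class needs at least two observations, otherwise the EPV degrees of freedom collapse to zero.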
Financial mathematics
Time value of money on the same numpy/pandas footing — rate conversions, present/future values, annuities-certain, loan amortization, and day-count conventions:
import actuarialpy as ap
ap.present_value(1000, 0.05, 3) # 863.84 — discount at 5% for 3 yrs
ap.future_value(1000, 0.05, 3) # 1157.63
ap.annuity_immediate(0.05, 10) # 7.7217 — PV of 1/yr, 10 yrs @ 5%
ap.annuity_due(0.05, 10) # 8.1078
# level-payment loan: 200k principal, 6% nominal, 30 years monthly
ap.level_payment(200_000, 0.06 / 12, 360) # 1199.10 per month
ap.amortization_schedule(200_000, 0.06 / 12, 360) # full schedule (DataFrame)
ap.year_fraction("2024-01-01", "2024-07-01", convention="30/360") # 0.5
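The closed-form relationships behind those outputs can be sketched with the standard library alone. These scalar functions are illustrative stand-ins, not actuarialpy's implementations, which also accept pandas objects. The 30/360 variant shown is the US bond basis without February end-of-month adjustments; that choice is an assumption about the convention.

```python
from datetime import date


def present_value(amount, rate, n):
    return amount / (1 + rate) ** n


def future_value(amount, rate, n):
    return amount * (1 + rate) ** n


def annuity_immediate(rate, n):
    # PV of 1 paid at the END of each of n periods: (1 - v**n) / i
    v = 1 / (1 + rate)
    return (1 - v ** n) / rate


def annuity_due(rate, n):
    # Payments at the START of each period: each is discounted one period less.
    return annuity_immediate(rate, n) * (1 + rate)


def level_payment(principal, rate, n):
    # Solve principal = payment * annuity_immediate(rate, n) for the payment.
    return principal / annuity_immediate(rate, n)


def amortization_schedule(principal, rate, n):
    """Rows of (period, payment, interest, principal_repaid, closing_balance)."""
    payment = level_payment(principal, rate, n)
    balance = principal
    rows = []
    for t in range(1, n + 1):
        interest = balance * rate
        repaid = payment - interest
        balance -= repaid
        rows.append((t, payment, interest, repaid, balance))
    return rows


def year_fraction_30_360(start, end):
    # 30/360 bond basis: day 31 becomes 30 (end day only if the start day is 30 or 31).
    # February end-of-month adjustments are deliberately left out of this sketch.
    d1 = min(start.day, 30)
    d2 = 30 if (end.day == 31 and d1 == 30) else end.day
    days = 360 * (end.year - start.year) + 30 * (end.month - start.month) + (d2 - d1)
    return days / 360


print(round(present_value(1000, 0.05, 3), 2))                    # 863.84
print(round(annuity_immediate(0.05, 10), 4))                     # 7.7217
print(round(level_payment(200_000, 0.06 / 12, 360), 2))          # 1199.1
print(year_fraction_30_360(date(2024, 1, 1), date(2024, 7, 1)))  # 0.5
```

The amortization loop checks itself: with the level payment, the closing balance after the last period returns to zero up to floating-point error.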
Exposure and age bases
Exact and rounded ages, and exposure-years over a study window — the inputs an actual-to-expected study needs:
import pandas as pd
import actuarialpy as ap
ap.age("1980-06-15", "2026-06-30") # 46.04 — exact
ap.age("1980-06-15", "2026-06-30", basis="last") # 46 — age last birthday
ap.age("1980-06-15", "2026-06-30", basis="nearest") # 46 — age nearest birthday
# fraction of a study window each life is exposed
cohort = pd.DataFrame({
    "entry": ["2024-03-01", "2024-01-15"],
    "term": ["2025-09-01", "2025-12-31"],
})
ap.add_exposure_column(cohort, entry_col="entry", exit_col="term",
                       study_start="2024-01-01", study_end="2025-12-31")
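Below is a plain-Python sketch of the three age bases and of the exposed fraction of a study window. Several conventions are assumptions rather than documented behavior:

- Exact age is whole years plus the elapsed fraction of the current birthday year.
- Age nearest rounds up at exactly half a year.
- Feb 29 birthdays fall back to Feb 28 in non-leap years.
- Exposure is measured in days as a share of the whole window; add_exposure_column may instead report exposure-years.

```python
from datetime import date


def _add_years(d, years):
    # Assumption: a Feb 29 anniversary falls back to Feb 28 in non-leap years.
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def age(birth, as_of, basis="exact"):
    years = as_of.year - birth.year
    if _add_years(birth, years) > as_of:
        years -= 1  # this year's birthday has not been reached yet
    last_birthday = _add_years(birth, years)
    next_birthday = _add_years(birth, years + 1)
    exact = years + (as_of - last_birthday).days / (next_birthday - last_birthday).days
    if basis == "exact":
        return exact
    if basis == "last":
        return years
    if basis == "nearest":
        return years + 1 if exact - years >= 0.5 else years
    raise ValueError(f"unknown basis: {basis!r}")


def exposure_fraction(entry, exit_date, study_start, study_end):
    # Share of the study window (in days) during which the life is exposed.
    start = max(entry, study_start)
    end = min(exit_date, study_end)
    return max((end - start).days, 0) / (study_end - study_start).days


print(round(age(date(1980, 6, 15), date(2026, 6, 30)), 2))  # 46.04
```

A life that enters after the window closes gets zero exposure rather than a negative value.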
Retention primitives
The pooling module includes two general retention-stability primitives:
- retained_cv(outcomes, retention, n_units=1) – coefficient of variation of the retained aggregate of n_units i.i.d. units, each capped at retention.
- retention_for_target_cv(outcomes, n_units, target_cv, ...) – inverts it: the retention at which retained CV hits a target. The basis for a size-graded pooling schedule.
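The arithmetic behind both primitives can be sketched as follows. The sketch treats outcomes as an empirical per-unit distribution and uses the population standard deviation. For n i.i.d. units, the aggregate mean scales by n and the standard deviation by sqrt(n), so the aggregate CV is the per-unit CV divided by sqrt(n).

The inversion uses bisection and assumes retained CV is nondecreasing in the retention over the observed range. The tol and max_iter parameters are hypothetical stand-ins for the elided arguments.

```python
import math
from statistics import fmean, pstdev


def retained_cv(outcomes, retention, n_units=1):
    """CV of the sum of n_units i.i.d. units, each capped at retention."""
    capped = [min(x, retention) for x in outcomes]
    mean = fmean(capped)
    if mean == 0:
        return math.nan
    # Aggregate mean is n * mean and aggregate SD is sqrt(n) * SD.
    return pstdev(capped) / (mean * math.sqrt(n_units))


def retention_for_target_cv(outcomes, n_units, target_cv, tol=1e-9, max_iter=200):
    """Bisection for the retention whose retained CV equals target_cv."""
    positive = [x for x in outcomes if x > 0]
    lo, hi = min(positive), max(positive)
    if not retained_cv(outcomes, lo, n_units) <= target_cv <= retained_cv(outcomes, hi, n_units):
        raise ValueError("target CV is not reachable within the observed outcomes")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        if retained_cv(outcomes, mid, n_units) < target_cv:
            lo = mid
        else:
            hi = mid
        if hi - lo <= tol * hi:
            break
    return (lo + hi) / 2
```

A larger group (higher n_units) tolerates a higher retention for the same target CV. That is what a size-graded pooling schedule encodes.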
Underwriting margin and weighted rollups
The two-tier underwriting income statement — gross margin (revenue less
loss expense, operating expense excluded) and gain/(loss) (gross margin
less operating expense). The ratios mirror the loss_ratio /
expense_ratio / combined_ratio trio in metrics, and denominators are
explicit parameters because real exhibits mix them (loss ratio over net
revenue beside an expense ratio over gross premium). With mixed denominators the identity
gain% = 1 − combined ratio no longer holds exactly, and reconciliation() reports the resulting gap. Domain naming is
a view concern, never a calculation concern: the profile option
renames the loss-ratio column the same way summarize_experience does
("health" → mlr, "life" → benefit_ratio), and labels renames
anything else. The full convention is on the conventions
page.
import actuarialpy as ap
uw = ap.UnderwritingSummary.from_per_exposure(
    revenue_per_exposure={"premium": 400.0, "refund": -1.4},
    loss_per_exposure={"claims": 340.0, "other_loss": 16.4},
    expense_per_exposure=37.4,
    exposure=300_000,
)
uw.loss_ratio, uw.expense_ratio, uw.combined_ratio # explicit denominators
uw.gross_margin_per_exposure, uw.gain_per_exposure # the two tiers
uw.to_frame(profile="health") # loss_ratio -> mlr, math unchanged
# grouped, from a tidy table: components summed first,
# every ratio a ratio of sums
ap.underwriting_summary(
    df, groupby="cohort",
    revenue_cols=["premium", "refund"], loss_cols=["claims"],
    expense_cols="expense", exposure_col="member_months",
    premium_col="premium",
)
# per-exposure outputs are the mechanical {name}_per_{exposure_col};
# domain names (a health shop's _pmpm) are opt-in via labels
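To make the two tiers and the reconciliation gap concrete, the sketch below reworks the per-exposure figures from the from_per_exposure example by hand. The denominators mirror the mixed-denominator case described above and are assumptions: net revenue for the loss ratio and gain%, gross premium for the expense ratio. The helper name underwriting_tiers is also hypothetical, and actuarialpy's defaults may differ.

```python
def underwriting_tiers(revenue, loss, expense, gross_premium):
    """Per-exposure components in, two-tier statement out (hypothetical helper)."""
    net_revenue = sum(revenue.values())
    total_loss = sum(loss.values())
    gross_margin = net_revenue - total_loss   # tier 1: operating expense excluded
    gain = gross_margin - expense             # tier 2: after operating expense
    loss_ratio = total_loss / net_revenue     # denominator: net revenue
    expense_ratio = expense / gross_premium   # denominator: gross premium
    combined_ratio = loss_ratio + expense_ratio
    gain_pct = gain / net_revenue
    return {
        "gross_margin": gross_margin,
        "gain": gain,
        "loss_ratio": loss_ratio,
        "expense_ratio": expense_ratio,
        "combined_ratio": combined_ratio,
        # Zero only when every ratio shares one denominator.
        "reconciliation_gap": gain_pct - (1 - combined_ratio),
    }


uw = underwriting_tiers(
    revenue={"premium": 400.0, "refund": -1.4},
    loss={"claims": 340.0, "other_loss": 16.4},
    expense=37.4,
    gross_premium=400.0,
)
```

Here the gross margin is 42.2 and the gain 4.8 per exposure. The gap is small but nonzero because the expense ratio is taken over 400.0 while the loss ratio is taken over 398.6.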
Quantities that are already rates at the row level — rate actions,
persistency — cannot be summed. weighted_mean and weighted_summary
average them with a required, named weight and report the weight total
beside every average:
ap.weighted_summary(book, value_cols="rate_action",
                    weight_col="premium", groupby="cohort")
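Here is a minimal sketch of the weighted-mean-with-weight-total idea on plain dictionaries. The function shape and the NaN for a zero total weight are assumptions. The column names echo the call above, and the book rows are made up for illustration.

```python
import math
from collections import defaultdict


def weighted_summary(rows, value_col, weight_col, groupby):
    """Weighted mean of value_col per group, reported beside the weight total."""
    num = defaultdict(float)
    den = defaultdict(float)
    for row in rows:
        key = row[groupby]
        num[key] += row[value_col] * row[weight_col]
        den[key] += row[weight_col]
    return {
        key: {"weighted_mean": num[key] / den[key] if den[key] else math.nan,
              "weight_total": den[key]}
        for key in den
    }


book = [
    {"cohort": "2024", "rate_action": 0.05, "premium": 100.0},
    {"cohort": "2024", "rate_action": 0.10, "premium": 300.0},
    {"cohort": "2025", "rate_action": 0.02, "premium": 50.0},
]
summary = weighted_summary(book, "rate_action", "premium", "cohort")
# 2024: (0.05 * 100 + 0.10 * 300) / 400 = 0.0875 on a weight total of 400
```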
API reference
ActuarialPy: tools for actuarial experience analysis.
- class Experience(data: DataFrame, expense: str | list[str], revenue: str | list[str], exposure: str | list[str] | None = None, date: str | None = None, profile: str | None = None, count: str | None = None, copy: bool = False)
Bases: object
Bind an experience dataset to its actuarial column roles.
Experience is the recommended entry point for repeated experience-analysis workflows. It stores common column roles once and delegates calculations to the package’s free functions. The object is immutable: methods return DataFrames or new Experience objects rather than changing stored data in place.
Bind count (a claim or service count) to unlock the frequency-severity views: frequency_severity() and decompose_trend() (frequency x severity, optionally x mix). fit_trend() regresses a developed trend on the bound history.
Grain matters. Experience aggregates by summing the bound columns, so it expects rows at the grain of the exposure unit – one row per member-month, with member_months = 1 (or the eligible fraction). If your data is long (one row per service line, so the same member-month repeats across several rows), summing the exposure column overcounts it, and every per-exposure figure – PMPM, frequency, the loss-ratio denominator – is off by a factor equal to the number of rows per member-month. Experience does not detect this: it has no member key, so it cannot tell a long frame from a wide one. For long or multi-table warehouse data, either aggregate to member-month grain first, or use bind(), which sources exposure from a correctly grained table (e.g. eligibility) via Count and never sums a repeated column.
- actual_vs_expected(expected: str | list[str], *, actual: str | list[str] | None = None, groupby: str | list[str] | None = None, exposure: str | list[str] | None = None, **kwargs: Any) DataFrame
Summarize actual-versus-expected experience.
If actual is omitted, the object’s bound expense columns are used.
- adjust(factors: float | int | Series | DataFrame, *, on: str | list[str] | None = None, columns: str | list[str] | None = None, by: str | list[str] | None = None, how: str = 'multiply', factor_col: str = 'factor', audit_col: str | None = None, default: float | None = None) Experience
Return a new Experience with an expense column restated by a factor.
The general counterpart to complete() and deseasonalize(): joins a factor by the key on (a column already in the frame, optionally within by segments) and multiplies – or, with how="divide", divides – the selected column(s) in place under the same name, so every downstream view composes on the restated series. factors is a scalar (one factor for all rows), a Series indexed by on, or a tidy DataFrame keyed by by + on.
This is the spine of experience-period restatement – trend, benefit / area / demographic relativities, network discounts – where the methodology is supplied as the factors rather than encoded here. Chain freely (exp.complete(...).adjust(trend).adjust(area, on="region")); with audit_col the cumulative restatement multiplier is carried across the chain, one value per row, for a reviewable audit trail. An absent key surfaces as NaN unless default is given (default=1.0 to mean “no adjustment for this key”).
- by(groupby: str | list[str] | None = None, **kwargs: Any) DataFrame
Summarize experience by optional grouping columns.
- by_band(value_col: str, bands: Any, *, labels: Any = None, **kwargs: Any) DataFrame
Summarize experience by a size band on value_col (see summarize_by_band).
- by_status(status_col: str, *, entity_col: str | None = None, **kwargs: Any) DataFrame
Summarize experience by a status column.
- claimant_concentration(claimant_col: str, *, amount_cols: str | list[str] | None = None, groupby: str | list[str] | None = None, **kwargs: Any) DataFrame
Summarize how concentrated experience is among top claimants.
- claimants(claimant_col: str, *, amount_cols: str | list[str] | None = None, groupby: str | list[str] | None = None, exposure_col: str | None = None, **kwargs: Any) DataFrame
Aggregate the experience to claimant/member/risk level.
- cohort(*, entity_col: str, start_date_col: str, duration_months: int = 12, groupby: str | list[str] | None = None, date_col: str | None = None, **kwargs: Any) DataFrame
Summarize each entity’s first N months or cohort-duration window.
- complete(factors: Series, *, valuation_date: Any = None, columns: str | list[str] | None = None, development_col: str | None = None, by: str | list[str] | None = None, date_col: str | None = None) Experience
Return a new Experience with paid amounts developed to ultimate.
Grosses the expense (loss / claims) columns up to estimated ultimate in place under the same names – completed = paid / completion_factor – so downstream views (trend(), rolling(), by(), …) then run on the completed series. Each row’s development period is development_months(date, valuation_date) (the convention make_completion_triangle() uses), or an explicit development_col. The join is by value, so the frame’s index is irrelevant; rows past the triangle’s last development period are taken as fully complete, and only recent, immature months actually move. factors may be a flat Series (one pattern, from completion_factors()) or a tidy per-segment table from completion_factors_by(); with the latter, pass by naming the grouping column(s) to join on group plus development period. Only the numerator is developed – exposure is left untouched. This applies to the latest-diagonal shape (one row per incurred month, claims paid-to-date as of valuation_date); a frame already on an ultimate basis must not be completed again.
- component_summary(component_cols: str | list[str], *, groupby: str | list[str] | None = None, exposure_col: str | None = None, **kwargs: Any) DataFrame
Summarize component amounts, per-exposure values, and shares.
- components(component_cols: str | list[str], *, exposure_col: str | None = None, groupby: str | list[str] | None = None, date_col: str | None = None, **kwargs: Any) DataFrame
Explain component drivers between two periods.
- credibility_weighted(groupby: str | list[str], *, z: Any, metric: str = 'loss_ratio', complement: float | None = None, out_col: str | None = None, **kwargs: Any) DataFrame
Blend each group’s metric with a complement at credibility z.
Computes the grouped summary (by()), then blends metric toward complement using z (see actuarialpy.credibility_weighted_estimate()). z may be a scalar or values aligned to the grouped rows. When complement is omitted, the book-level value of metric is used as the complement of credibility.
- decompose_trend(*, count_col: str | None = None, loss_col: str | None = None, exposure_col: str | None = None, mix_by: str | Iterable[str] | None = None, groupby: str | list[str] | None = None, period_col: str | None = None, prior_period: Any = None, current_period: Any = None, date_col: str | None = None, prior_start: Any = None, prior_end: Any = None, current_start: Any = None, current_end: Any = None, prior_filter: Any = None, current_filter: Any = None) DataFrame
Decompose the per-exposure loss trend between two periods of the bound data.
Splits the bound frame into prior and current with the same comparison modes as trend() – period_col with prior_period / current_period, a date_col with prior/current ranges (the bound date is used when no date_col is passed), or explicit prior_filter / current_filter masks – then decomposes the change via decompose_per_exposure_trend(), using the bound count, expense (as the loss), and exposure roles. Pass mix_by to add the third LMDI mix term; groupby reports one decomposition per group.
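The reference names LMDI (log-mean Divisia index) for the mix term. The two-factor version of that decomposition can be sketched as below. Because per-exposure loss V = frequency x severity, ln(V1/V0) = ln(F1/F0) + ln(S1/S0). Weighting each log change by the logarithmic mean L(V1, V0) therefore makes the effects sum exactly to V1 - V0. This sketch omits the mix term and all period-splitting logic, and the input shape is hypothetical.

```python
import math


def log_mean(a, b):
    # Logarithmic mean L(a, b) = (a - b) / (ln a - ln b), with L(a, a) = a.
    return a if a == b else (a - b) / (math.log(a) - math.log(b))


def decompose_pmpm_change(prior, current):
    """prior / current: dicts with positive count, loss, and exposure."""
    def parts(p):
        freq = p["count"] / p["exposure"]
        sev = p["loss"] / p["count"]
        return freq, sev, freq * sev  # per-exposure loss = frequency x severity

    f0, s0, v0 = parts(prior)
    f1, s1, v1 = parts(current)
    weight = log_mean(v1, v0)
    return {
        "frequency_effect": weight * math.log(f1 / f0),
        "severity_effect": weight * math.log(s1 / s0),
        "total_change": v1 - v0,
    }
```

Unlike a sequential "frequency first, then severity" attribution, the log-mean weighting leaves no residual and does not depend on the order of the factors.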
- deseasonalize(factors: Series, *, columns: str | list[str] | None = None, freq: str = 'M', by: str | list[str] | None = None, date_col: str | None = None) Experience
Return a new Experience with the seasonal pattern divided out.
Each selected column is divided by its row’s seasonal factor (as produced by seasonality_factors()), in place under the same name, so every downstream view – trend(), rolling(), by(), and the rest – then operates on the deseasonalized series. By default the expense (loss / claims) columns are adjusted; pass columns to choose others. Only the numerator is touched: exposure is left alone, so a deseasonalized PMPM is simply deseasonalized claims over unchanged member months. factors may be a flat Series (one pattern) or a tidy per-segment table from seasonality_factors_by(); with the latter, pass by naming the grouping column(s) to join on group plus season. Estimate factors on the broader pool, not on this object’s own (often thin) data. To put the pattern back, apply apply_seasonality() to .data.
- duration(*, entity_col: str, start_date_col: str, max_duration_month: int | None = None, date_col: str | None = None, **kwargs: Any) DataFrame
Summarize experience by duration month since entity start.
- filter(mask: Any | None = None, *, query: str | None = None, copy: bool = True) Experience
Return a new Experience object over a filtered dataset.
Use either a boolean mask or a pandas query string.
- fit_trend(*, value_col: str | None = None, exposure_col: str | None = None, date_col: str | None = None, freq: str = 'M', min_periods: int = 3, confidence: float = 0.95) TrendFit
Fit an exponential trend to the bound experience by log-linear regression.
Defaults to the bound expense (claims) over the bound exposure – the PMPM trend – across the bound date; pass value_col / exposure_col to override, or leave the exposure unbound to trend the raw amount. Returns a TrendFit (see fit_trend()). Run it on completed, deseasonalized history.
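A log-linear trend fit amounts to ordinary least squares of ln(value / exposure) on time. The sketch below shows that core step in plain Python. It assumes equally spaced monthly periods and annualizes by exp(12 x slope) - 1. It does not reproduce the TrendFit object or its confidence intervals, and the min_periods check only echoes the documented default.

```python
import math


def fit_exponential_trend(values, exposures, periods_per_year=12, min_periods=3):
    """OLS of ln(value / exposure) on the period index 0, 1, ..., n - 1."""
    y = [math.log(v / e) for v, e in zip(values, exposures)]
    n = len(y)
    if n < min_periods:
        raise ValueError(f"need at least {min_periods} periods, got {n}")
    t_bar = (n - 1) / 2
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (yt - y_bar) for t, yt in enumerate(y))
    slope = sxy / sxx                           # continuous growth per period
    intercept = y_bar - slope * t_bar
    annual_trend = math.exp(slope * periods_per_year) - 1
    return {"slope": slope, "intercept": intercept, "annual_trend": annual_trend}


# Claims growing exactly 0.5% a month on flat exposure: PMPM trend ~6.17% a year.
claims = [120_000 * 1.005 ** t for t in range(24)]
fit = fit_exponential_trend(claims, [1000] * 24)
```

Seasonality and incomplete recent months bias the slope, which is why the reference asks for completed, deseasonalized history.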
- frequency_severity(*, count_col: str | None = None, loss_col: str | None = None, exposure_col: str | None = None, groupby: str | list[str] | None = None) DataFrame
Per-group claim frequency, severity, and per-exposure loss (see frequency_severity_summary).
Uses the bound count, expense (as the loss), and exposure roles, so the columns are specified once on the object. The identity loss_per_exposure == frequency * severity holds for every row.
- margin(groupby: str | list[str] | None = None, *, margin_col: str = 'margin', ratio_col: str = 'margin_ratio', per_exposure_col: str | None = None, **kwargs: Any) DataFrame
Underwriting margin (revenue net of expense) by optional grouping.
Aggregates the bound expense and revenue roles with by(), then adds the margin (total_revenue - total_expense), the margin ratio, and an optional per-exposure margin.
- pool_claimants(claimant_col: str, pooling_point: float, *, amount_cols: str | list[str] | None = None, groupby: str | list[str] | None = None, amount_name: str = 'total_expense', **kwargs: Any) DataFrame
Aggregate to claimant level and split each claimant into pooled/excess.
Summarizes the experience to claimant grain (claimants()) and caps each claimant’s total at pooling_point (see actuarialpy.pool_losses()), returning pooled and excess columns for capped experience and the excess hand-off to tail modeling.
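Once experience is at claimant grain, the pooled/excess split is a per-claimant cap. Below is a minimal sketch, assuming a {claimant: total} mapping stands in for the claimant-level frame; the function is illustrative, not the library's.

```python
def pool_claimants(totals, pooling_point):
    """totals: {claimant: total amount}. Returns {claimant: (pooled, excess)}."""
    split = {}
    for claimant, amount in totals.items():
        pooled = min(amount, pooling_point)          # retained up to the pooling point
        split[claimant] = (pooled, amount - pooled)  # excess above it (0 if below)
    return split
```

Pooled plus excess always equals the claimant total, so the split loses no dollars.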
- rolling(window: int = 12, *, groupby: str | list[str] | None = None, date_col: str | None = None, **kwargs: Any) DataFrame
Create a rolling-period experience summary.
- top_claimants(claimant_col: str, *, amount_cols: str | list[str] | None = None, amount_col: str | None = None, groupby: str | list[str] | None = None, n: int = 25, **kwargs: Any) DataFrame
Return top claimants by amount.
- trend(*, amount_col: str | None = None, exposure_col: str | None = None, groupby: str | list[str] | None = None, date_col: str | None = None, **kwargs: Any) DataFrame
Compare amount or per-exposure experience between two periods.
- views(views: dict[str, str | Iterable[str] | None], **kwargs: Any) dict[str, DataFrame]
Create several named grouped experience views.
- with_roles(*, data: DataFrame | None = None, expense: str | list[str] | None = None, revenue: str | list[str] | None = None, exposure: str | list[str] | None = None, date: str | None = None, profile: str | None = None, count: str | None = None, copy: bool | None = None) Experience
Return a new Experience object with updated data or roles.
- with_status(*, effective_col: str, as_of: Any, termination_col: str | None = None, first_year_months: int = 12, status_col: str = 'status', labels: dict[str, str] | None = None) Experience
Return a new Experience with a derived lifecycle status column.
Derives active / first-year / termed from effective and termination dates as of a reference date (see actuarialpy.derive_status()). Summarize the result with by_status().
- actual_to_expected(actual: Any, expected: Any) Any
Calculate actual-to-expected: actual divided by expected.
- adjust(df: DataFrame, factors: float | int | Series | DataFrame, *, value_col: str, on: str | list[str] | None = None, by: str | list[str] | None = None, how: str = 'multiply', factor_col: str = 'factor', out_col: str | None = None, audit_col: str | None = None, default: float | None = None, copy: bool = True) DataFrame
Multiply or divide a column by a factor joined on a key.
The general factor-application primitive behind trend, benefit / area / demographic relativities, network discounts – any per-key multiplier. The factor for each row is taken from one of:
  - a scalar factors – one factor for every row (e.g. a single trend factor);
  - a Series indexed by on – one key column (e.g. an area factor by region);
  - a tidy DataFrame keyed by by + on with factor_col – per-segment factors (the shape the *_by estimators return).
The factor is then applied to value_col: how="multiply" gives value * factor (loads, trend), how="divide" gives value / factor (backing a factor out).
The join is by value (the frame’s index never participates); the factor table must be unique on its keys – a duplicate would fan out the data – which is enforced. An absent key gives default (NaN when default is None – a surfaced gap, never silently filled); pass default=1.0 when a key missing from the table should mean “no adjustment”. With audit_col, the cumulative net multiplier applied to value_col is accumulated there (factor for multiply, 1 / factor for divide), so a chain of adjustments leaves a per-row record of total restatement.
- factor_lookup(df: DataFrame, factors: DataFrame, keys: str | Iterable[str], *, factor_col: str, default: float | None = None) ndarray
Join a factor onto df by value on one or more existing key columns.
The single factor-join primitive behind grouped completion, seasonality, and adjust(). factors is a tidy table containing keys and factor_col; each row of df is matched on its keys values. The factor table must be unique on keys – a duplicate would fan rows out on the join – so this raises otherwise. Returns a float array aligned to df’s row order (the frame’s own index never participates). An absent key gives default (NaN when default is None – a surfaced gap, never silently filled).
- combined_ratio(losses: Any, expenses: Any, revenue: Any) Any
Calculate combined ratio: (losses + expenses) divided by revenue.
- expense_ratio(expenses: Any, revenue: Any) Any
Calculate an expense ratio: expenses divided by revenue.
- frequency(claim_count: Any, exposure: Any) Any
Calculate claim frequency: claim count divided by exposure.
- indicated_change(required: Any, current: Any) Any
Indicated change from current to required amount.
- loss_ratio(losses_or_expenses: Any, revenue: Any) Any
Calculate a loss ratio: losses or expenses divided by revenue.
- permissible_loss_ratio(expense_ratio: Any, profit_provision: Any = 0.0) Any
Permissible (target / break-even) loss ratio.
PLR = 1 - expense_ratio - profit_provision, where both loadings are expressed as a fraction of premium. Also called the zero-margin or target loss ratio: the loss ratio at which premium exactly covers losses, expenses, and the profit/contingency provision. Works element-wise on scalars or Series. (Shops that load fixed expenses on a loss basis instead use (1 - V - Q) / (1 + G); this implements the premium-basis form.)
Calculate pure premium: losses divided by exposure.
- ratio(numerator: Any, denominator: Any) Any
Calculate a generic ratio as numerator divided by denominator.
- required_revenue(expense: Any, target_ratio: Any) Any
Revenue needed for an expense amount to hit a target ratio.
- safe_divide(numerator: Any, denominator: Any, *, fill_value: float = np.nan) Any
Safely divide numerator by denominator.
The return type mirrors the input: scalars return scalars, array-likes return NumPy arrays, and pandas inputs return pandas objects with their index (and name) preserved – so results can be assigned straight back onto the source DataFrame. Zero denominators are returned as fill_value.
- severity(losses: Any, claim_count: Any) Any
Calculate severity: losses divided by claim count.
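Below are scalar sketches of several ratio functions above, including the premium-basis permissible loss ratio and the loss-basis alternative quoted in its docstring. The actuarialpy versions work element-wise and preserve pandas indexes; these do not. Two pieces are assumptions:

- indicated_change is taken to be required / current - 1, which the reference does not state.
- permissible_loss_ratio_loss_basis is a hypothetical helper.

```python
import math


def safe_divide(numerator, denominator, fill_value=math.nan):
    # Zero denominators give fill_value instead of raising.
    return fill_value if denominator == 0 else numerator / denominator


def loss_ratio(losses, revenue):
    return safe_divide(losses, revenue)


def combined_ratio(losses, expenses, revenue):
    return safe_divide(losses + expenses, revenue)


def permissible_loss_ratio(expense_ratio, profit_provision=0.0):
    # Premium basis: both loadings are fractions of premium.
    return 1 - expense_ratio - profit_provision


def permissible_loss_ratio_loss_basis(v, q, g):
    # Hypothetical helper for the loss-basis form: variable expense V and profit Q
    # as fractions of premium, fixed expense G as a fraction of losses.
    return (1 - v - q) / (1 + g)


def required_revenue(expense, target_ratio):
    # Revenue at which expense / revenue equals target_ratio.
    return safe_divide(expense, target_ratio)


def indicated_change(required, current):
    # Assumed convention: proportional change, required / current - 1.
    return safe_divide(required, current) - 1
```

For example, an 80 loss on a 0.8 target ratio requires 100 of revenue. Against a current 95, that is an indicated change of about +5.3%.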
- class ChainLadder(age_to_age: Series, cdf: Series, completion_factors: Series, tail: float, method: str)
Bases: object
Chain-ladder development pattern fitted from a cumulative triangle.
Fit with fit() from a cumulative development triangle (for example the output of make_completion_triangle() with cumulative=True):
  - age_to_age – link (age-to-age) factors, indexed by their starting development period.
  - cdf – cumulative development factor to ultimate by development period, including the tail.
  - completion_factors – 1 / cdf by development period: the proportion of ultimate emerged by each development period. These are divide-convention factors in (0, 1] (completed = paid / factor), so they line up with validate_completion_factors() and downstream completion.
Use project() to apply the pattern to a triangle and get per-origin ultimate and IBNR.
- classmethod fit(triangle: DataFrame, *, method: str = 'volume', tail: float = 1.0) ChainLadder
Estimate the development pattern from a cumulative triangle.
method is "volume" (volume-weighted age-to-age factors, the default) or "simple" (straight average of individual link ratios). tail (>= 1) extends development beyond the latest observed development period.
- project(triangle: DataFrame) DataFrame
Project ultimate and IBNR per origin by applying the fitted pattern.
For each origin, takes its latest observed cumulative amount and multiplies by the cumulative development factor at that development period. Returns one row per origin with the latest development period, latest cumulative, development factor applied, ultimate, and IBNR (ultimate minus latest).
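Here is a plain-Python sketch of the volume-weighted fit and the projection step. A ragged dictionary of cumulative amounts stands in for the triangle DataFrame, which is an assumption. The "simple" method, input validation, and segment handling are omitted.

```python
def fit_chain_ladder(triangle, tail=1.0):
    """triangle: {origin: [cumulative amount at development period 0, 1, ...]}."""
    n_dev = max(len(row) for row in triangle.values())
    link = []
    for j in range(n_dev - 1):
        # Volume-weighted: sum over origins observed at both j and j + 1.
        pairs = [(row[j], row[j + 1]) for row in triangle.values() if len(row) > j + 1]
        link.append(sum(b for _, b in pairs) / sum(a for a, _ in pairs))
    # CDF to ultimate: the tail times every later link factor.
    cdf = [float(tail)] * n_dev
    for j in range(n_dev - 2, -1, -1):
        cdf[j] = cdf[j + 1] * link[j]
    completion = [1 / c for c in cdf]  # proportion emerged, divide convention
    return link, cdf, completion


def project_ultimate(triangle, cdf):
    """Latest cumulative times the CDF at its development period, per origin."""
    out = {}
    for origin, row in triangle.items():
        dev = len(row) - 1
        ultimate = row[-1] * cdf[dev]
        out[origin] = {"development": dev, "latest": row[-1], "factor": cdf[dev],
                       "ultimate": ultimate, "ibnr": ultimate - row[-1]}
    return out
```

On the triangle {"2024-01": [100, 150, 165], "2024-02": [110, 165], "2024-03": [120]}, the link factors are 315 / 210 = 1.5 and 165 / 150 = 1.1. The CDFs are then 1.65, 1.1, and 1.0, and the newest origin projects to 120 x 1.65 = 198.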
- exception InsufficientDataWarning
Bases: UserWarning
Emitted when a segment has too little data to fit and is skipped or aggregated.
Filter it with the standard warnings machinery, e.g. warnings.filterwarnings("ignore", category=InsufficientDataWarning).
- chain_ladder_by(df: DataFrame, *, groupby: str | list[str], origin_col: str, valuation_col: str, amount_col: str, cumulative: bool = True, method: str = 'volume', tail: float = 1.0, on_insufficient: str = 'raise', warn: bool = True) dict[Any, ChainLadder]
Fit a chain-ladder development pattern per segment of df.
Groups df by groupby, builds a development triangle for each segment (see make_completion_triangle()), and fits a ChainLadder to each. Returns {segment_key: ChainLadder} – the key is a scalar for a single grouping column, or a tuple for several.
Segments too small to fit (fewer than two origins or development periods, a zero cumulative, and so on) are handled by on_insufficient:
  - "raise" (default): raise a ValueError naming the failing segment.
  - "skip": omit those segments from the result.
  - "aggregate": use the pooled pattern fit on the whole frame for them.
When on_insufficient is "skip" or "aggregate" and warn is true, an InsufficientDataWarning naming the affected segments is emitted; warn=False suppresses it (the standard warnings filters also apply). To ignore thin segments entirely, use on_insufficient="skip", warn=False.
- completion_factors(triangle: DataFrame, *, method: str = 'volume', tail: float = 1.0) Series
Completion factors by development period, via chain-ladder.
Convenience wrapper around ChainLadder: returns the proportion of ultimate emerged by each development period (1 / cdf) estimated from a cumulative triangle. Divide-convention factors in (0, 1] (completed = paid / factor). See ChainLadder for the full pattern and per-origin ultimate/IBNR.
- completion_factors_by(df: DataFrame, *, groupby: str | list[str], origin_col: str, valuation_col: str, amount_col: str, cumulative: bool = True, method: str = 'volume', tail: float = 1.0, on_insufficient: str = 'raise', warn: bool = True, development_name: str = 'development_month') DataFrame
Completion factors per segment as a tidy table.
Convenience over chain_ladder_by(): one row per (segment, development period) with the completion factor, ready to review, pivot, or join. Columns are the grouping column(s), development_name, and completion_factor. on_insufficient and warn behave as in chain_ladder_by().
- apply_completion(df: DataFrame, factors: Series | DataFrame, *, value_col: str, date_col: str | None = None, valuation_date: Any = None, development_col: str | None = None, by: str | list[str] | None = None, factor_col: str = 'completion_factor', development_name: str = 'development_month', out_col: str | None = None, copy: bool = True) DataFrame
Develop a paid amount to estimated ultimate with completion factors.
For each row the development period is taken from development_col if supplied, otherwise computed as development_months(df[date_col], valuation_date) – the convention make_completion_triangle() uses, so factors from completion_factors() or completion_factors_by() join by construction. The completed amount is paid / factor (the divide convention, factors in (0, 1]). factors may be either of:
  - a flat Series indexed by development period (one pattern for the whole frame), or
  - a tidy DataFrame of per-segment factors – grouping column(s), a development-period column (development_name) and a factor column (factor_col), the shape completion_factors_by() returns – joined on by plus development period. The table must be unique on by + [development] (a duplicate would fan out the data); this is checked.
The join is by value, never index alignment, so the frame’s own index is irrelevant. A row past its (group’s) largest development period is taken as fully complete (factor 1.0); a development period inside the fitted range but absent stays NaN – a surfaced gap; a row whose group is absent from the factor table stays NaN; a negative development period (incurred after valuation_date) raises. Supply either development_col, or both date_col and valuation_date.
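The join rules can be sketched for the flat-factor case. The sketch assumes development months are calendar-month differences between the incurred and valuation months, ignoring the day of month. That may not match development_months exactly. The {development_month: factor} mapping and the paid_completed output name are hypothetical.

```python
import math
from datetime import date


def development_months(incurred, valuation):
    # Assumption: whole calendar months between the two months (day ignored).
    return (valuation.year - incurred.year) * 12 + (valuation.month - incurred.month)


def apply_completion(rows, factors, *, value_col, date_col, valuation_date):
    """factors: {development_month: proportion emerged in (0, 1]} (one flat pattern)."""
    last_dev = max(factors)
    out = []
    for row in rows:
        dev = development_months(row[date_col], valuation_date)
        if dev < 0:
            raise ValueError("row incurred after valuation_date")
        if dev > last_dev:
            factor = 1.0                          # past the triangle: fully complete
        else:
            factor = factors.get(dev, math.nan)   # gap inside the range stays NaN
        new = dict(row)
        new[f"{value_col}_completed"] = row[value_col] / factor  # divide convention
        out.append(new)
    return out


factors = {0: 0.6, 1: 0.9, 3: 0.99}  # development month 2 deliberately missing
rows = [
    {"month": date(2025, 6, 1), "paid": 60.0},
    {"month": date(2025, 5, 1), "paid": 90.0},
    {"month": date(2025, 4, 1), "paid": 50.0},
    {"month": date(2024, 12, 1), "paid": 100.0},
]
completed = apply_completion(rows, factors, value_col="paid",
                             date_col="month", valuation_date=date(2025, 6, 30))
```

The April row lands on the missing development month and comes back NaN instead of being silently treated as complete. The December row is past the pattern and passes through unchanged.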
- develop_ultimate(df: DataFrame, factors: Series | DataFrame, *, method: str = 'bornhuetter_ferguson', value_col: str, date_col: str | None = None, valuation_date: Any = None, development_col: str | None = None, apriori_col: str | None = None, exposure_col: str | None = None, by: str | list[str] | None = None, factor_col: str = 'completion_factor', development_name: str = 'development_month', out_col: str | None = None, copy: bool = True) DataFrame
Develop a paid amount to estimated ultimate by a chosen reserving method.
All methods share one input – the proportion emerged at each row’s development period, joined exactly as apply_completion() does (flat Series or per-segment table, beyond-the-triangle rows fully emerged). They differ only in how they combine that with the paid-to-date and an a priori expectation:
  - "chain_ladder" – paid / emerged. Ignores the a priori; equivalent to apply_completion(). Volatile for immature periods (a thin latest diagonal drives the whole tail).
  - "bornhuetter_ferguson" – paid + apriori * (1 - emerged). Takes the unemerged portion from the a priori rather than from the data, so it is stable for green periods. Requires apriori_col (an expected ultimate per row – an input, e.g. a plan, budget, or manual times exposure).
  - "benktander" – one Bornhuetter-Ferguson iteration using the BF ultimate as the a priori: paid + bf * (1 - emerged). A credibility blend sitting between BF and chain ladder (weight emerged on chain ladder). Requires apriori_col.
  - "cape_cod" – Bornhuetter-Ferguson with the a priori derived from the data: a single expected loss ratio per segment, sum(paid) / sum(exposure * emerged), times each row’s exposure. Requires exposure_col (an on-level premium / exposure per row). The loss ratio is mechanical; the exposure base is an input.
The library applies a method; it does not pick the a priori or the exposure base. Supply either development_col or both date_col and valuation_date; pass by with a per-segment factor table (and Cape Cod then derives one loss ratio per segment). Returns df with an out_col (default f"{value_col}_ultimate").
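The four combination rules can be sketched for a single segment on parallel lists, given the proportion emerged per row. This illustrates the formulas listed above, not the library's code. The factor join and per-segment handling are omitted.

```python
def develop_ultimate(paid, emerged, method, apriori=None, exposure=None):
    """Parallel per-row lists for one segment; emerged is the proportion in (0, 1]."""
    if method == "chain_ladder":
        return [p / e for p, e in zip(paid, emerged)]
    if method == "bornhuetter_ferguson":
        # Unemerged portion comes from the a priori, not from the data.
        return [p + a * (1 - e) for p, a, e in zip(paid, apriori, emerged)]
    if method == "benktander":
        # One more BF pass, using the BF ultimate as the a priori.
        bf = develop_ultimate(paid, emerged, "bornhuetter_ferguson", apriori=apriori)
        return [p + b * (1 - e) for p, b, e in zip(paid, bf, emerged)]
    if method == "cape_cod":
        # One expected loss ratio for the segment, applied to each row's exposure.
        elr = sum(paid) / sum(x * e for x, e in zip(exposure, emerged))
        prior = [elr * x for x in exposure]
        return develop_ultimate(paid, emerged, "bornhuetter_ferguson", apriori=prior)
    raise ValueError(f"unknown method: {method!r}")
```

Algebraically, the Benktander ultimate equals emerged x chain ladder + (1 - emerged) x BF. That identity is the credibility blend described above.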
- ibnr(completed, paid)
IBNR as completed minus paid (the completed/paid identity).
Works element-wise on scalars or Series. completed and paid must be on the same basis; the result is the amount bridging paid-to-date to ultimate.
- lag_months(incurred_date, valuation_date)
Whole months of development between incurred (origin) and valuation.
Either argument may be a scalar, a Series, or array-like, in any combination (e.g. a column of incurred dates against a single valuation date). The result is a Series when either argument is a Series, otherwise a scalar.
- development_months(incurred_date, valuation_date)
Whole months of development between incurred (origin) and valuation.
Either argument may be a scalar, a Series, or array-like, in any combination (e.g. a column of incurred dates against a single valuation date). The result is a Series when either argument is a Series, otherwise a scalar.
- make_completion_triangle(df: DataFrame, *, origin_col: str, valuation_col: str, amount_col: str, cumulative: bool = True, index_name: str = 'origin_period', development_name: str = 'development_month') DataFrame
Build a development (completion) triangle by origin period and development period.
Each cell aggregates amount_col for an origin month at a given valuation development period (whole months between origin and valuation, via development_months()). amount_col is treated as the incremental amount in each (origin, development period) cell; with cumulative=True – the default, and the usual basis for estimating development/completion factors – the cells are accumulated across development period. Set cumulative=False to return the incremental triangle, or if your input amounts are already cumulative-to-date snapshots.
This consumes a compact development aggregate (one row per origin x valuation, i.e. months x months); it does not require transaction/line-level data.
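Accumulating incremental cells into a cumulative triangle can be sketched as below. The input is already expressed as (origin, development month, incremental amount) tuples, so the sketch assumes development months were computed from valuation dates beforehand. A missing interior cell is treated as zero incremental, which is also an assumption.

```python
from collections import defaultdict


def make_triangle(rows, cumulative=True):
    """rows: (origin, development_month, incremental_amount) tuples -> {origin: [cells]}."""
    cells = defaultdict(float)
    for origin, dev, amount in rows:
        cells[(origin, dev)] += amount  # several rows may land in one cell
    triangle = {}
    for origin in sorted({o for o, _ in cells}):
        last = max(d for o, d in cells if o == origin)
        row = [cells.get((origin, d), 0.0) for d in range(last + 1)]
        if cumulative:
            running = 0.0
            for i, value in enumerate(row):
                running += value  # accumulate across development period
                row[i] = running
        triangle[origin] = row
    return triangle
```

The result is ragged: each origin row stops at its latest observed development period, which is the shape a chain-ladder fit reads its latest diagonal from.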
- validate_completion_factors(factors: DataFrame, factor_col: str = 'completion_factor', *, method: str = 'divide') -> None
Validate completion-factor values for a selected convention.
divide factors (completed = paid / factor) should satisfy 0 < factor <= 1; multiply factors (completed = paid * factor) should satisfy factor >= 1. Useful as a sanity check on estimated factors before they are applied.
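The two conventions describe the same completion, with one factor the reciprocal of the other. The hypothetical helper completed_from_paid (not part of the library) applies the bounds stated above. It then derives IBNR as completed minus paid, mirroring ibnr().

```python
def completed_from_paid(paid: float, factor: float, method: str = "divide") -> float:
    """Apply a completion factor under either convention (hypothetical helper)."""
    if method == "divide":
        # completed = paid / factor, with 0 < factor <= 1
        if not 0 < factor <= 1:
            raise ValueError("divide factors must satisfy 0 < factor <= 1")
        return paid / factor
    if method == "multiply":
        # completed = paid * factor, with factor >= 1
        if factor < 1:
            raise ValueError("multiply factors must satisfy factor >= 1")
        return paid * factor
    raise ValueError(f"unknown method: {method!r}")


paid = 80_000.0
completed = completed_from_paid(paid, 0.80)                  # paid is 80% complete
completed_alt = completed_from_paid(paid, 1.25, "multiply")   # reciprocal factor, same result
ibnr = completed - paid                                       # bridges paid-to-date to ultimate
```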
- status_summary(df: DataFrame, *, status_col: str, entity_col: str | None = None, expense_cols: str | Iterable[str], revenue_cols: str | Iterable[str], exposure_cols: str | Iterable[str] | None = None, profile: str | None = None) -> DataFrame
Summarize experience by status, optionally adding entity counts.
- summarize_experience(df: DataFrame, *, groupby: str | Iterable[str] | None = None, expense_cols: str | Iterable[str], revenue_cols: str | Iterable[str], exposure_cols: str | Iterable[str] | None = None, ratio_col: str | None = None, ratio_name: str | None = None, total_expense_name: str = 'total_expense', total_revenue_name: str = 'total_revenue', profile: str | None = None, labels: dict[str, str] | None = None) -> DataFrame
Summarize experience by grouping columns.
Amounts and exposures are aggregated first. Ratios and per-exposure metrics are calculated after aggregation, which avoids averaging row-level ratios.
By default the ratio column is named loss_ratio (general across lines of business); the health profile names it mlr and the life profile names it benefit_ratio. profile only supplies light defaults and does not rename total expense or total revenue.
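The figures below are hypothetical: one large group and one tiny group. They show why amounts are aggregated before dividing. The ratio of sums stays close to the large group's experience. An unweighted average of row-level ratios is pulled up to 1.45 by a single row with 500 of premium.

```python
# Hypothetical paid claims and premium for two groups of very different size.
paid = [90_000.0, 1_000.0]
premium = [100_000.0, 500.0]

# Ratio of sums: aggregate first, then divide (the approach described above).
ratio_of_sums = sum(paid) / sum(premium)

# Average of row-level ratios: every row counts equally, regardless of volume.
average_of_ratios = sum(p / r for p, r in zip(paid, premium)) / len(paid)

print(round(ratio_of_sums, 4), round(average_of_ratios, 4))  # 0.9055 1.45
```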
- summarize_views(df: DataFrame, *, views: dict[str, str | Iterable[str] | None], expense_cols: str | Iterable[str], revenue_cols: str | Iterable[str], exposure_cols: str | Iterable[str] | None = None, ratio_col: str | None = None, ratio_name: str | None = None, total_expense_name: str = 'total_expense', total_revenue_name: str = 'total_revenue', profile: str | None = None) -> dict[str, DataFrame]
Create multiple experience summary views from the same input data.
- summarize_actual_vs_expected(df: DataFrame, *, groupby: str | Iterable[str] | None = None, actual_cols: str | Iterable[str], expected_cols: str | Iterable[str], exposure_cols: str | Iterable[str] | None = None, actual_name: str = 'actual', expected_name: str = 'expected', ae_name: str = 'actual_to_expected', variance_name: str = 'variance', variance_pct_name: str = 'variance_pct') -> DataFrame
Summarize actual-versus-expected results by optional grouping columns.
Actual and expected amounts are aggregated before ratios are calculated. This makes the function suitable for claim costs, benefits, expenses, revenue, or any other actual-versus-expected measure.
- summarize_claimants(df: DataFrame, *, claimant_col: str, amount_cols: str | Iterable[str], groupby: str | Iterable[str] | None = None, exposure_col: str | None = None, amount_name: str = 'total_expense') -> DataFrame
Aggregate experience to claimant/member/risk level.
claimant_col can be a member ID, policy ID, claim group ID, or another entity identifier. The function is descriptive; it does not cap, pool, or otherwise adjust the underlying amounts.
- top_claimants(df: DataFrame, *, claimant_col: str, amount_cols: str | Iterable[str] | None = None, amount_col: str | None = None, groupby: str | Iterable[str] | None = None, n: int = 25, amount_name: str = 'total_expense') -> DataFrame
Return the top claimants by amount, optionally within each group.
- large_claimant_flags(df: DataFrame, *, amount_col: str = 'total_expense', thresholds: Sequence[float] = (50_000, 100_000, 250_000)) -> DataFrame
Add boolean flags for claimants above one or more amount thresholds.
- claim_concentration(df: DataFrame, *, amount_col: str = 'total_expense', groupby: str | Iterable[str] | None = None, top_n: Sequence[int] = (10, 25), thresholds: Sequence[float] = (50_000, 100_000, 250_000)) -> DataFrame
Summarize how concentrated total amounts are among top claimants.
The input should generally be one row per claimant within the requested grouping level, such as the output of summarize_claimants.
- cohort_summary(df: DataFrame, *, entity_col: str, date_col: str, start_date_col: str, duration_months: int = 12, groupby: str | Iterable[str] | None = None, expense_cols: str | Iterable[str], revenue_cols: str | Iterable[str], exposure_cols: str | Iterable[str] | None = None, profile: str | None = None) -> DataFrame
Summarize each entity’s first N months or cohort-duration window.
Each entity is clipped to its own first duration_months months of duration (month 1 is the entity’s start month), aligning entities by tenure rather than calendar time. The output also reports how much of that window is actually present, so partial (not-yet-mature) cohorts can be spotted and excluded:
months_observed: count of distinct duration months present (1..N).
last_month: latest experience month observed; with first_month this gives the available range.
complete: whether the full window is present, i.e. months_observed == duration_months.
For example, to keep only cohorts with a full first year:
cohorts = exp.cohort(entity_col="group", start_date_col="effective_date")
mature = cohorts[cohorts["complete"]]
- cohort_summary_by_period(cohort_df: DataFrame, *, cohort_date_col: str = 'first_month', freq: str = 'Q', entity_col: str | None = None, expense_col: str = 'total_expense', revenue_col: str = 'total_revenue', exposure_cols: str | Iterable[str] | None = None) -> DataFrame
Roll entity-level cohort summaries into cohort month/quarter/year buckets.
- frequency_severity_summary(df: DataFrame, *, count_col: str, loss_col: str, exposure_col: str, groupby: str | Iterable[str] | None = None) -> DataFrame
Per-group claim frequency, severity, and per-exposure loss.
Counts, losses, and exposure are aggregated first, and the rates are then derived from the totals (avoiding averages of row-level rates). The identity loss_per_exposure == frequency * severity holds for every row: frequency is claims per exposure unit, severity is loss per claim, and loss_per_exposure is loss per exposure unit (the pure premium).
- decompose_per_exposure_trend(prior: DataFrame, current: DataFrame, *, count_col: str, loss_col: str, exposure_col: str, on: str | Iterable[str] | None = None, mix_by: str | Iterable[str] | None = None) -> DataFrame
Decompose the per-exposure loss change from prior to current.
With mix_by omitted, this is the two-way split. Both frames are summarized with frequency_severity_summary() (optionally by the on keys), aligned, and the change is reported in two exact ways:
Multiplicative trend: loss_per_exposure_trend == frequency_trend * severity_trend, where frequency_trend and severity_trend are the period-over-period ratios of frequency and severity.
Additive dollars: loss_per_exposure_change == frequency_effect + severity_effect via a symmetric (midpoint) split, so the contributions sum exactly to the per-exposure change.
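Here is a sketch of the two-way split on hypothetical prior and current totals, showing that both identities close exactly. The midpoint weighting below is one standard symmetric form. Treat it as an illustration, not as the library's exact code.

```python
import math


def freq_sev(count: float, loss: float, exposure: float):
    """Frequency, severity, and loss per exposure from aggregated totals."""
    frequency = count / exposure   # claims per exposure unit
    severity = loss / count        # loss per claim
    return frequency, severity, loss / exposure


# Hypothetical prior and current totals for one segment.
f0, s0, lpe0 = freq_sev(count=400, loss=2_000_000.0, exposure=10_000)
f1, s1, lpe1 = freq_sev(count=450, loss=2_430_000.0, exposure=10_500)

# Multiplicative: the per-exposure trend factors into frequency and severity trends.
frequency_trend, severity_trend = f1 / f0, s1 / s0
lpe_trend = lpe1 / lpe0

# Additive midpoint split: each effect is weighted by the average of the other factor.
frequency_effect = (f1 - f0) * (s0 + s1) / 2
severity_effect = (s1 - s0) * (f0 + f1) / 2
lpe_change = lpe1 - lpe0

assert math.isclose(lpe_trend, frequency_trend * severity_trend)
assert math.isclose(frequency_effect + severity_effect, lpe_change)
```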
Pass mix_by (a column or list of columns) to add a third component, mix. The per-exposure loss is then decomposed into frequency, severity, and the effect of the exposure composition shifting across the mix_by cells. Frequency and severity are measured within each cell (free of composition). Mix captures the aggregate movement that comes purely from the cell weights changing, which is the piece the two-way split otherwise misattributes to frequency and severity. The split uses the LMDI (logarithmic mean Divisia index) convention, which is order-free and reconciles exactly: loss_per_exposure_trend == frequency_trend * severity_trend * mix_trend and loss_per_exposure_change == frequency_effect + severity_effect + mix_effect.
A list of columns in mix_by defines the cells as their cross product. The result is one blended mix term, not a per-column attribution; to attribute mix to each dimension separately, run the decomposition once per dimension. on and mix_by are orthogonal: on groups the output rows, and mix_by defines the mix cells within each group. Every cell must have positive count, loss, and exposure in both periods.
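The three-way LMDI split can be sketched in plain Python under these assumptions: each cell maps to (count, loss, exposure), mix is measured through exposure shares, and the additive LMDI-I form is used. In that form, each cell's log changes are weighted by the logarithmic mean of its prior and current contribution. lmdi_three_way is a hypothetical helper, not the library's API.

```python
import math


def log_mean(a: float, b: float) -> float:
    """Logarithmic mean L(a, b) = (a - b) / (ln a - ln b), with L(a, a) = a."""
    return a if math.isclose(a, b) else (a - b) / (math.log(a) - math.log(b))


def lmdi_three_way(prior: dict, current: dict):
    """Additive LMDI-I split of aggregate loss per exposure (hypothetical sketch).

    prior/current map cell -> (count, loss, exposure), all positive, same cells.
    Aggregate V = sum_i w_i * f_i * s_i, where w_i is the cell's exposure share.
    """
    def parts(cells: dict) -> dict:
        total_exposure = sum(e for _, _, e in cells.values())
        # (mix weight, frequency, severity) per cell
        return {k: (e / total_exposure, c / e, loss / c)
                for k, (c, loss, e) in cells.items()}

    p0, p1 = parts(prior), parts(current)
    effects = {"mix": 0.0, "frequency": 0.0, "severity": 0.0}
    for cell, (w0, f0, s0) in p0.items():
        w1, f1, s1 = p1[cell]
        weight = log_mean(w1 * f1 * s1, w0 * f0 * s0)  # L(V_i1, V_i0)
        effects["mix"] += weight * math.log(w1 / w0)
        effects["frequency"] += weight * math.log(f1 / f0)
        effects["severity"] += weight * math.log(s1 / s0)

    v0 = sum(w * f * s for w, f, s in p0.values())
    v1 = sum(w * f * s for w, f, s in p1.values())
    # Multiplicative trends: exp(effect / L(V1, V0)); their product equals V1 / V0.
    scale = log_mean(v1, v0)
    trends = {name: math.exp(effect / scale) for name, effect in effects.items()}
    return effects, trends, v1 - v0, v1 / v0


prior = {"PPO": (300, 1_500_000.0, 6_000), "HMO": (100, 400_000.0, 4_000)}
current = {"PPO": (330, 1_700_000.0, 7_000), "HMO": (95, 390_000.0, 3_500)}
effects, trends, change, ratio = lmdi_three_way(prior, current)
```

Each cell's log change splits exactly into mix, frequency, and severity terms. As a result, the additive effects sum to the change in aggregate loss per exposure, and the exponentiated effects multiply to its ratio.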
- duration_summary(df: DataFrame, *, entity_col: str, date_col: str, start_date_col: str, expense_cols: str | Iterable[str], revenue_cols: str | Iterable[str], exposure_cols: str | Iterable[str] | None = None, max_duration_month: int | None = None) -> DataFrame
Summarize experience by duration month since entity start.
- rolling_summary(df: DataFrame, *, date_col: str, window: int = 12, groupby: str | Iterable[str] | None = None, expense_cols: str | Iterable[str], revenue_cols: str | Iterable[str], exposure_cols: str | Iterable[str] | None = None, min_periods: int | None = None, drop_incomplete: bool = True, ratio_col: str = 'loss_ratio') -> DataFrame
Calculate rolling sums and ratios by period and optional grouping.
The output includes period_start and period_end. By default only complete rolling windows are returned; for a 12-month window, the first output row appears once 12 months of data are available.
- annualized_trend(current: Any, prior: Any, months_between: float) -> Any
Annualize the change between two values separated by a number of months.
- midpoint_trend_factor(base_midpoint, projection_midpoint, annual_trend: Any) -> Any
Trend factor between the base and projection midpoints.
- period_change(current: Any, prior: Any) -> Any
Calculate period-over-period change: current / prior - 1.
- project_forward(value: Any, annual_trend: Any, months: float) -> Any
Project a value forward using an annual trend rate.
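The trend helpers reduce to compound-growth arithmetic. The sketch below assumes the standard forms: (current / prior) ** (12 / months) - 1 to annualize, and (1 + trend) ** (months / 12) for a trend factor. annualized and factor are hypothetical stand-ins for annualized_trend() and trend_factor()/project_forward(), not the library code.

```python
from datetime import date


def annualized(current: float, prior: float, months_between: float) -> float:
    """Two-point annualized change (CAGR): (current / prior) ** (12 / months) - 1."""
    return (current / prior) ** (12.0 / months_between) - 1.0


def factor(annual_trend: float, months: float) -> float:
    """Compound trend factor over `months` at a constant annual rate."""
    return (1.0 + annual_trend) ** (months / 12.0)


# PMPM moved from 500 to 540 over 12 months: an 8% annual trend.
trend = annualized(540.0, 500.0, 12)

# Months between a base-period midpoint and a projection-period midpoint.
base_mid, projection_mid = date(2024, 7, 1), date(2026, 1, 1)
months = (projection_mid.year - base_mid.year) * 12 + (projection_mid.month - base_mid.month)

projected_pmpm = 540.0 * factor(trend, months)   # project forward 18 months
```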
- fit_trend(df: DataFrame, *, value_col: str, date_col: str, exposure_col: str | None = None, freq: str = 'M', min_periods: int = 3, confidence: float = 0.95) -> TrendFit
Fit an exponential trend to a rate series by log-linear regression.
Aggregates df to the freq grain (summing value_col and, if given, exposure_col) and forms the rate: value / exposure (e.g. PMPM) when exposure_col is supplied, otherwise value itself. It then fits log(rate) = intercept + slope * t by ordinary least squares, with t in years from the first period. The fitted annual trend is exp(slope) - 1.
Unlike annualized_trend() (a two-point CAGR between a single current and prior value), this uses every period, so one noisy month does not swing the estimate. It also returns goodness of fit and a confidence interval, which is what a developed (rather than received) trend is judged on. It does not select the trend: the window, the rate basis (allowed vs paid), any benefit leveraging, and the blend with external trends remain judgment calls. Run it on completed, deseasonalized history (complete -> deseasonalize -> fit_trend) so runout and seasonality do not contaminate the slope. Apply the result with trend_factor(), TrendFit.factor(), or adjust().
Time is measured from actual period dates, so an occasional missing period is handled correctly. Requires at least min_periods distinct periods with strictly positive rates; non-positive values cannot be logged and raise an error. Returns a TrendFit.
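Here is a stripped-down version of the regression in plain Python. It fits log(rate) against t in years by OLS and returns exp(slope) - 1 and R². fit_log_linear is hypothetical. It omits the aggregation to the freq grain, the standard error, and the confidence interval that fit_trend() reports.

```python
import math


def fit_log_linear(t_years, rates):
    """OLS fit of log(rate) = intercept + slope * t; returns (annual_trend, r_squared)."""
    if any(r <= 0 for r in rates):
        raise ValueError("rates must be strictly positive to take logs")
    y = [math.log(r) for r in rates]
    n = len(y)
    t_bar, y_bar = sum(t_years) / n, sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in t_years)
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in zip(t_years, y))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    ss_res = sum((v - (intercept + slope * t)) ** 2 for t, v in zip(t_years, y))
    ss_tot = sum((v - y_bar) ** 2 for v in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return math.exp(slope) - 1.0, r_squared


# Monthly PMPM growing at exactly 6% a year; t is in years from the first month.
t = [m / 12 for m in range(12)]
pmpm = [400.0 * 1.06 ** ti for ti in t]
annual_trend, r2 = fit_log_linear(t, pmpm)
```

The slope is on the log scale, so the annual trend is exp(slope) - 1 rather than the slope itself.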
- class TrendFit(annual_trend: float, r_squared: float, std_error: float, ci_low: float, ci_high: float, confidence: float, n_periods: int, slope: float, intercept: float)
Bases: object
Result of fit_trend(): an exponential trend fitted to a rate series. annual_trend is the fitted multiplicative annual trend (exp(slope) - 1 on the log scale). r_squared is the goodness of fit, std_error is the delta-method standard error of annual_trend, and (ci_low, ci_high) is its confidence interval. The interval is asymmetric because the endpoints are transformed from the log-scale slope interval. slope and intercept describe the underlying log(value) = intercept + slope * t fit, with t measured in years from the first period.
- property ci: tuple[float, float]
The confidence interval as a (low, high) tuple.
- trend_factor(annual_trend: Any, months: float) -> Any
Convert an annual trend rate into a trend factor over a number of months.
- trend_summary(df: DataFrame, *, period_col: str | None = None, prior_period=None, current_period=None, date_col: str | None = None, prior_start=None, prior_end=None, current_start=None, current_end=None, groupby=None, amount_col: str, exposure_col: str | None = None, prior_filter=None, current_filter=None, prior_label: str = 'prior', current_label: str = 'current') -> DataFrame
Summarize current vs prior trend by optional grouping.
Supported comparison modes:
period_col='year', prior_period=2025, current_period=2026
date_col='incurred_date' with prior/current start and end dates
explicit boolean prior_filter and current_filter masks
- component_driver_analysis(df: DataFrame, *, period_col: str | None = None, prior_period=None, current_period=None, date_col: str | None = None, prior_start=None, prior_end=None, current_start=None, current_end=None, prior_filter=None, current_filter=None, component_cols: str | Iterable[str], exposure_col: str | None = None, groupby: str | Iterable[str] | None = None) -> DataFrame
Explain component drivers of change between two periods.
The primary comparison is based on component totals, or on component amount per exposure when exposure_col is supplied. The API matches trend_summary and supports period-column, date-range, or explicit-filter comparisons.
- component_trend(*args, **kwargs) -> DataFrame
Alias for component_driver_analysis.
The preferred name is component_driver_analysis because the function explains drivers of total component change, not just component-specific trend.
- summarize_components(df: DataFrame, *, groupby: str | Iterable[str] | None = None, component_cols: str | Iterable[str], exposure_col: str | None = None, total_col: str = 'total_expense', include_shares: bool = True) -> DataFrame
Summarize component/category amounts, per-exposure values, and shares.
- class Buhlmann(overall_mean: float, epv: float, vhm: float, n_obs: int)
Bases: object
Bühlmann credibility model.
This implementation assumes each risk has the same number of observations.
- Parameters:
overall_mean (float) – Estimated collective mean.
epv (float) – Estimated expected process variance (EPV).
vhm (float) – Estimated variance of hypothetical means (VHM).
n_obs (int) – Number of observations per risk.
- classmethod fit(data: Any) -> Buhlmann
Fit a Bühlmann credibility model from data.
- Parameters:
data (array-like, shape (m, n)) – Observations for m risks, each with n observations.
- Returns:
Fitted Bühlmann model.
- Return type:
Buhlmann
Notes
Estimators used:
overall_mean = mean of all observations
EPV = average of within-risk sample variances
VHM = sample variance of risk means minus EPV / n, floored at 0
- property k: float
K = EPV / VHM. Returns infinity when VHM = 0.
- premium(risk_mean)
Compute the Bühlmann credibility premium Z * risk_mean + (1 - Z) * overall_mean.
- Parameters:
risk_mean (float or array-like) – Risk-specific sample mean(s).
- Returns:
Credibility-weighted premium(s).
- Return type:
float or numpy.ndarray
- property z: float
Credibility factor Z = n / (n + K). Returns 0 when K is infinite.
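The estimators in the Notes above can be checked by hand on a small equal-size table. buhlmann is a hypothetical helper that follows those formulas using the standard library's statistics module (sample variances with an n - 1 denominator). It is a sketch, not Buhlmann.fit itself.

```python
from statistics import mean, variance


def buhlmann(data):
    """Bühlmann estimators from an m x n table with equal observations per risk.

    EPV = mean within-risk sample variance; VHM = sample variance of the risk
    means minus EPV / n, floored at 0.
    """
    n = len(data[0])
    risk_means = [mean(row) for row in data]
    overall_mean = mean(risk_means)          # equals the mean of all observations here
    epv = mean(variance(row) for row in data)
    vhm = max(variance(risk_means) - epv / n, 0.0)
    k = float("inf") if vhm == 0 else epv / vhm
    z = 0.0 if k == float("inf") else n / (n + k)
    return overall_mean, epv, vhm, z


data = [[3.0, 5.0, 4.0], [8.0, 6.0, 7.0], [5.0, 5.0, 5.0]]
overall_mean, epv, vhm, z = buhlmann(data)
# Credibility premium per risk: Z * risk_mean + (1 - Z) * overall_mean.
premiums = [z * mean(row) + (1 - z) * overall_mean for row in data]
```

For this table EPV = 2/3, VHM = 7/3 - 2/9 = 19/9, K = 6/19, and Z = 3 / (3 + 6/19) = 19/21.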
- class BuhlmannStraub(overall_mean: float, epv: float, vhm: float, weights: Any)
Bases: object
Bühlmann-Straub credibility model.
This implementation allows different exposure weights by risk and period.
- Parameters:
overall_mean (float) – Estimated collective mean.
epv (float) – Estimated expected process variance (EPV).
vhm (float) – Estimated variance of hypothetical means (VHM).
weights (array-like) – Total weight (exposure) for each risk.
- classmethod fit(data: Any, weights: Any) -> BuhlmannStraub
Fit a Bühlmann-Straub model from observations and weights.
Accepts either a 2D array (equal period counts) or a sequence of 1D arrays per risk (unequal period counts). The estimators are the general unbiased forms
\[\hat s^2 = \frac{\sum_i\sum_j w_{ij}(X_{ij}-\bar X_i)^2}{\sum_i(n_i-1)}, \quad \hat a = \frac{\sum_i m_i(\bar X_i-\bar X)^2 - (r-1)\hat s^2}{m - \sum_i m_i^2/m},\]
where \(r\) is the number of risks, \(n_i\) the number of periods for risk \(i\), \(m_i = \sum_j w_{ij}\) its total exposure, \(m = \sum_i m_i\), \(\bar X_i\) its exposure-weighted mean, and \(\bar X = \sum_i m_i \bar X_i / m\). Then \(k=\hat s^2/\hat a\) and \(Z_i=m_i/(m_i+k)\); a negative \(\hat a\) is floored at 0. For equal period counts this reduces to the usual estimator. Unlike a divide-by-mean-weight approximation, it stays unbiased when risks have different period counts or exposures.
- Parameters:
data (array-like) – Either shape (r, n) (equal periods) or a sequence of r 1D arrays X_i whose lengths may differ.
weights (array-like) – Exposure weights matching the shape/structure of data.
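Below is a plain-Python transcription of the unbiased estimators above for ragged data, where risks have different numbers of periods. buhlmann_straub is hypothetical and skips input validation. It is meant for checking the formulas, not as a replacement for BuhlmannStraub.fit.

```python
def buhlmann_straub(data, weights):
    """Bühlmann-Straub estimators for ragged data (hypothetical sketch).

    data[i] lists per-period rates for risk i; weights[i] lists matching exposures.
    Returns (overall weighted mean, s2_hat, a_hat, per-risk Z).
    """
    r = len(data)
    m_i = [sum(w) for w in weights]                      # total exposure per risk
    m = sum(m_i)
    xbar_i = [sum(wj * xj for wj, xj in zip(w, x)) / mi  # exposure-weighted risk means
              for x, w, mi in zip(data, weights, m_i)]
    xbar = sum(mi * xi for mi, xi in zip(m_i, xbar_i)) / m

    # Within-risk (process) variance, pooled over sum(n_i - 1) degrees of freedom.
    s2 = sum(wj * (xj - xi) ** 2
             for x, w, xi in zip(data, weights, xbar_i)
             for xj, wj in zip(x, w)) / sum(len(x) - 1 for x in data)

    # Variance of hypothetical means, floored at zero.
    a_hat = ((sum(mi * (xi - xbar) ** 2 for mi, xi in zip(m_i, xbar_i)) - (r - 1) * s2)
             / (m - sum(mi ** 2 for mi in m_i) / m))
    a_hat = max(a_hat, 0.0)

    if a_hat == 0.0:
        return xbar, s2, a_hat, [0.0] * r                # no credibility when VHM is 0
    k = s2 / a_hat
    return xbar, s2, a_hat, [mi / (mi + k) for mi in m_i]


# Hypothetical PMPM rates: three products with 3, 2, and 4 periods.
data = [[120.0, 128.0, 125.0], [135.0, 140.0], [110.0, 112.0, 109.0, 115.0]]
weights = [[1000, 1010, 1008], [640, 655], [300, 310, 305, 320]]
overall, s2, a_hat, z = buhlmann_straub(data, weights)
```

Because k is common to all risks, Z_i rises with each risk's total exposure m_i.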
- classmethod from_frame(df, *, group: str, value: str, weight: str, period: str | None = None) -> BuhlmannStraub
Fit from long-format data: one row per (risk, period).
- Parameters:
df (pandas.DataFrame) – Long-format observations.
group (str) – Column name of the risk identifier.
value (str) – Column name of the per-unit observation (e.g. loss per member-month).
weight (str) – Column name of the exposure weight.
period (str, optional) – Period column; used only to order observations within a risk. The number of observations per risk may differ.
- Returns:
Fitted model with groups_ (risk labels), risk_means_, and weights (per-risk total exposure), all aligned to groups_.
- Return type:
BuhlmannStraub
- property k: float
K = EPV / VHM. Returns infinity when VHM = 0.
- premium(risk_mean, weight)
Compute the Bühlmann-Straub premium Z_i * risk_mean_i + (1 - Z_i) * overall_mean.
- Parameters:
risk_mean (float or array-like) – Risk-specific weighted mean(s).
weight (float or array-like) – Total exposure weight(s).
- Returns:
Credibility-weighted premium(s).
- Return type:
float or numpy.ndarray
- credibility_weighted_estimate(observed: Any, complement: Any, z: Any) -> Any
Blend an observed estimate with its complement at credibility z.
Returns z * observed + (1 - z) * complement. Scalar inputs return a native float; pandas.Series inputs return a Series with the index preserved; other array-like inputs return a numpy.ndarray. This is the atomic credibility operation. z may come from a credibility model (Buhlmann, BuhlmannStraub, limited_fluctuation_z()), a filed credibility formula, or any other source.
- limited_fluctuation_z(exposure: Any, full_credibility_standard: float) -> Any
Limited-fluctuation (classical) credibility factor: the square-root rule.
Returns Z = min(1, sqrt(exposure / full_credibility_standard)). exposure is the volume credibility is based on (claim counts, member months, life-years, …). full_credibility_standard is the amount of that volume required for full (Z = 1) credibility, often a filed value. Scalars return a native float; pandas.Series inputs return a Series (index preserved); other array-likes return a numpy.ndarray, so credibility can be computed per group. Feed the result to credibility_weighted_estimate() to blend experience with its complement.
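The square-root rule and the blend each fit in a line. These are hypothetical scalar-only stand-ins for limited_fluctuation_z() and credibility_weighted_estimate(). The observed value of 132 and complement of 120 are illustrative.

```python
import math


def sqrt_rule_z(exposure: float, full_standard: float) -> float:
    """Square-root rule: Z = min(1, sqrt(exposure / full_standard))."""
    return min(1.0, math.sqrt(exposure / full_standard))


def blend(observed: float, complement: float, z: float) -> float:
    """Credibility-weighted estimate: z * observed + (1 - z) * complement."""
    return z * observed + (1.0 - z) * complement


z = sqrt_rule_z(96_000, 120_000)                    # sqrt(0.8), about 0.894
estimate = blend(observed=132.0, complement=120.0, z=z)
```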
- full_credibility_claims(*, confidence: float = 0.90, tolerance: float = 0.05, severity_cv: float | None = None) -> float
Classical full-credibility standard, in expected number of claims.
Returns the expected claim count for full credibility under the limited-fluctuation model: (z / k) ** 2 for claim frequency, where z is the standard-normal quantile for two-sided confidence and k is the tolerance. The classic 90% / 5% choice gives about 1082 claims. Supplying severity_cv (the coefficient of variation of individual claim severity) inflates it to (z / k) ** 2 * (1 + severity_cv ** 2) for aggregate losses rather than pure frequency.
Many shops use a filed standard instead; pass that straight to limited_fluctuation_z().
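The standard can be reproduced with the standard library's statistics.NormalDist, whose inv_cdf gives the normal quantile. classical_standard is a hypothetical re-derivation of the formula above, not the library's implementation.

```python
from statistics import NormalDist


def classical_standard(confidence=0.90, tolerance=0.05, severity_cv=None):
    """Expected claims for full credibility: (z / k) ** 2, optionally * (1 + CV ** 2)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # two-sided standard-normal quantile
    n_claims = (z / tolerance) ** 2
    if severity_cv is not None:
        n_claims *= 1.0 + severity_cv ** 2            # aggregate losses, not just frequency
    return n_claims


frequency_standard = classical_standard()                 # about 1082 claims at 90% / 5%
aggregate_standard = classical_standard(severity_cv=2.0)  # five times larger
```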
- add_months_in_force(df: DataFrame, *, effective_col: str, period_start, period_end, termination_col: str | None = None, out_col: str = 'months_in_force', copy: bool = True) -> DataFrame
Add whole months of overlap between each entity’s in-force window and a period.
The in-force window is [effective, termination] (a missing termination means the period end). The result is clipped to [period_start, period_end] and floored at 0. Month counting is inclusive of both endpoint months, so full coverage of an N-month period returns N.
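Here is a sketch of the inclusive month-overlap count. It assumes that any part of a calendar month counts as a whole month in force, since dates are reduced to year-month indices. months_in_force is a hypothetical scalar helper, and the library's day-level rules may differ.

```python
from datetime import date


def month_index(d: date) -> int:
    """Months since year 0, so calendar months can be compared as integers."""
    return d.year * 12 + d.month - 1


def months_in_force(effective, period_start, period_end, termination=None):
    """Inclusive month overlap of [effective, termination] with the period (sketch)."""
    end = termination if termination is not None else period_end
    first = max(month_index(effective), month_index(period_start))
    last = min(month_index(end), month_index(period_end))
    return max(0, last - first + 1)   # +1 counts both endpoint months


start, end = date(2024, 1, 1), date(2024, 12, 31)
full_year = months_in_force(date(2023, 6, 1), start, end)                     # in force all year
partial = months_in_force(date(2024, 4, 15), start, end, date(2024, 9, 30))   # April to September
```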
- add_tenure(df: DataFrame, effective_col: str, as_of, *, tenure_col: str = 'tenure_months', one_based: bool = False, copy: bool = True) -> DataFrame
Add tenure in whole months from each entity’s effective date to as_of.
as_of is a single reference date (e.g. the experience as-of date). With one_based=True, an entity effective in the as-of month has tenure 1 rather than 0, matching “months of experience” conventions.
- derive_status(df: DataFrame, *, effective_col: str, as_of, termination_col: str | None = None, first_year_months: int = 12, status_col: str = 'status', labels: dict[str, str] | None = None, copy: bool = True) -> DataFrame
Derive an active / first-year / termed status as of a reference date.
Classification (in precedence order):
termed: a termination date is present and on/before as_of.
first_year: not termed, and tenure (as_of minus effective) is less than first_year_months. The window is a parameter because “first year” means the first 12 months in some shops and the first policy year in others.
active: in force beyond the first-year window.
labels optionally remaps the three canonical values, e.g. {"first_year": "First Year Account", "termed": "Term"}.
- earned_exposure(df: DataFrame, exposure_col: str, *, effective_col: str, period_start, period_end, termination_col: str | None = None, period_months: int | None = None, out_col: str | None = None, copy: bool = True) -> DataFrame
Prorate a full-period exposure by the fraction of the period in force.
earned = exposure * months_in_force / period_months. Use this when each row carries a full-period exposure (e.g. annualized) that must be reduced for mid-period entry or termination. If your data is already monthly, filtering to in-force months with is_in_force() is usually simpler.
- is_in_force(df: DataFrame, *, effective_col: str, period_start, period_end, termination_col: str | None = None) -> Series
Boolean Series: in force at any point during [period_start, period_end].
An entity is in force when it is effective on/before period_end and had not terminated before period_start (a missing termination date means still in force).
- assign_band(df: DataFrame, value_col: str, bands: Sequence[float], *, labels: Sequence[str] | None = None, band_col: str = 'band', right: bool = False, copy: bool = True) -> DataFrame
Assign each row to an ordered size band based on value_col.
bands are bin edges. For integer counts the natural form is left-closed (right=False), so bands=[0, 51, 76, 151, 251, 501, inf] yields [0, 51), [51, 76), …. A trailing float("inf") captures the open top band. The resulting column is an ordered categorical, so downstream group-bys keep band order.
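Left-closed lookup can be sketched with the standard bisect module. bisect_right places a value equal to an edge in the band that starts at that edge. band_of and the labels are hypothetical; assign_band itself returns an ordered categorical column.

```python
import bisect
import math


def band_of(value, edges, labels):
    """Left-closed band lookup: edges[i] <= value < edges[i + 1], else None."""
    i = bisect.bisect_right(edges, value) - 1
    if i < 0 or i >= len(labels):
        return None                      # below the first edge or at/above the last
    return labels[i]


edges = [0, 51, 76, 151, 251, 501, math.inf]
labels = ["0-50", "51-75", "76-150", "151-250", "251-500", "501+"]
bands = [band_of(v, edges, labels) for v in (50, 51, 10_000)]
```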
- summarize_by_band(df: DataFrame, value_col: str, bands: Sequence[float], *, labels: Sequence[str] | None = None, expense_cols: str | Iterable[str], revenue_cols: str | Iterable[str], exposure_cols: str | Iterable[str] | None = None, band_col: str = 'band', ratio_col: str | None = None, right: bool = False, profile: str | None = None) -> DataFrame
Assign size bands, then summarize experience grouped by band.
Returns one row per band in band order (empty bands included), with the same aggregates, loss ratio, and per-exposure metrics as summarize_experience().
- add_margin(df: DataFrame, *, premium_col: str, expense_cols: str | Iterable[str], out_col: str = 'margin', ratio_col: str | None = None, exposure_col: str | None = None, per_exposure_col: str | None = None, copy: bool = True) -> DataFrame
Add an underwriting-margin column (premium minus the summed expense columns).
expense_cols is summed row-wise and may mix losses and loadings (e.g. medical/claims, retention, commission, allocated overhead). Optionally also add the margin ratio (ratio_col) and a per-exposure margin such as margin PMPM (per_exposure_col, which requires exposure_col).
- margin(premium: Any, expenses: Any) -> Any
Margin = premium - expenses, element-wise.
expenses should already be the total of losses plus any loadings.
- margin_ratio(margin_amount: Any, premium: Any) -> Any
Margin as a fraction of premium: margin / premium.
- class UnderwritingSummary(revenue: Mapping[str, float], losses: Mapping[str, float], expenses: Mapping[str, float] | float = 0.0, exposure: float | None = None, premium_label: str = 'premium', loss_ratio_denominator: str = 'total_revenue', expense_ratio_denominator: str = 'premium', gain_denominator: str = 'total_revenue')
Bases: object
Two-tier underwriting income statement for a single entity or period.
- Parameters:
revenue (Mapping[str, float]) – Labeled revenue components (e.g. {"premium": ..., "refund": ...}). Offsets such as refunds should be signed (negative). The library never interprets the labels; it only sums them.
losses (Mapping[str, float]) – Labeled loss components: claim or benefit expense by whatever categories the caller uses.
expenses (Mapping[str, float] | float) – Operating expense, itemized or as a single amount. Default 0.
exposure (float, optional) – Exposure units (member months, policy months, earned exposures, …) for per-exposure figures. Required only when a *_per_exposure property is accessed.
premium_label (str) – Which revenue component is the gross premium, used when a denominator is "premium". Default "premium".
loss_ratio_denominator (str) – "total_revenue" or "premium". Default "total_revenue".
expense_ratio_denominator (str) – "total_revenue" or "premium". Default "premium".
gain_denominator (str) – "total_revenue" or "premium". Default "total_revenue". The defaults follow the common exhibit convention: loss and gain ratios over total revenue, expense ratio over gross premium.
Examples
>>> uw = UnderwritingSummary(
...     revenue={"premium": 1_200_000.0, "refund": -4_000.0},
...     losses={"claims": 1_090_000.0},
...     expenses=110_000.0,
...     exposure=3_000.0,
... )
>>> round(uw.gross_margin, 0)
106000.0
>>> round(uw.gain, 0)
-4000.0
- property combined_ratio: float
Loss ratio plus expense ratio, each on its own denominator.
- property expense_ratio: float
Operating expense over the expense_ratio_denominator.
- classmethod from_per_exposure(*, revenue_per_exposure: Mapping[str, float], loss_per_exposure: Mapping[str, float], expense_per_exposure: Mapping[str, float] | float = 0.0, exposure: float, **kwargs: Any) -> UnderwritingSummary
Build a summary from per-exposure components and total exposure.
Forecast exhibits are usually stated per exposure unit (PMPM in a health shop, per policy month in life). This multiplies each component by exposure to get amounts, so totals, per-exposure figures, and ratios all come from one set of inputs.
- property gain: float
Tier two: gross margin less operating expense.
- property gain_ratio: float
Gain / (loss) over the gain_denominator.
- property gross_margin: float
Tier one: total revenue less loss expense (operating expense excluded).
- property gross_margin_ratio: float
Gross margin over the loss_ratio_denominator (the complement of loss_ratio).
- property loss_ratio: float
Loss expense over the loss_ratio_denominator.
- reconciliation() -> float
gain_ratio - (1 - combined_ratio): the mixed-denominator gap.
Zero when every denominator is the same series; otherwise it measures the drift introduced by quoting the loss, expense, and gain ratios over different bases. Useful as an exhibit footnote or a data-quality check.
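Working through the doctest figures by hand shows where the gap comes from. The defaults put loss and gain over total revenue and expense over gross premium, so the gap reduces to expenses * (1 / premium - 1 / total_revenue). This is plain arithmetic mirroring the properties above, not a call into UnderwritingSummary.

```python
# Figures from the UnderwritingSummary doctest.
revenue = {"premium": 1_200_000.0, "refund": -4_000.0}
losses = {"claims": 1_090_000.0}
expenses = 110_000.0

total_revenue = sum(revenue.values())                  # 1,196,000 after the refund
gross_margin = total_revenue - sum(losses.values())    # tier one
gain = gross_margin - expenses                         # tier two

# Default denominators: loss and gain over total revenue, expense over gross premium.
loss_ratio = sum(losses.values()) / total_revenue
expense_ratio = expenses / revenue["premium"]
gain_ratio = gain / total_revenue
combined_ratio = loss_ratio + expense_ratio

gap = gain_ratio - (1 - combined_ratio)   # nonzero only because the bases differ
```

Here the gap is slightly negative, because the expense ratio uses gross premium, which is larger than the total revenue used for the other ratios.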
- statement(*, profile: str | None = None, labels: Mapping[str, str] | None = None) -> Series
Exhibit-shaped Series: components, subtotals, tiers, then ratios.
- to_frame(*, profile: str | None = None, labels: Mapping[str, str] | None = None) -> DataFrame
One tidy row of every total and ratio (plus per-exposure figures when exposure is given).
profile renames only the loss-ratio column to the domain’s ratio name ("health" -> mlr, "life" -> benefit_ratio); labels renames any output column. Calculations are unaffected.
- underwriting_summary(df: DataFrame, *, groupby: str | Iterable[str] | None = None, revenue_cols: str | Iterable[str], loss_cols: str | Iterable[str], expense_cols: str | Iterable[str], exposure_col: str | None = None, premium_col: str | None = None, loss_ratio_denominator: str = 'total_revenue', expense_ratio_denominator: str = 'premium', gain_denominator: str = 'total_revenue', profile: str | None = None, labels: dict[str, str] | None = None) -> DataFrame
Grouped two-tier underwriting summary from a tidy table.
Component columns are summed first, and every ratio is computed on the aggregated totals (a ratio of sums, never an average of row-level ratios). This is the same contract as actuarialpy.summarize_experience().
- Parameters:
df (pd.DataFrame) – One row per entity / period at whatever grain is being rolled up.
groupby (str | Iterable[str], optional) – Grouping columns; omit for a single all-rows summary.
revenue_cols (str | Iterable[str]) – Revenue component columns. Revenue offsets (refunds) should be signed.
loss_cols (str | Iterable[str]) – Loss component columns.
expense_cols (str | Iterable[str]) – Operating expense component columns.
exposure_col (str, optional) – Exposure column; adds {amount}_per_{exposure_col} output columns. Domain-style names (a health shop’s _pmpm) are applied via labels, never inferred from the column name.
premium_col (str, optional) – Gross premium column, required when any denominator is "premium".
loss_ratio_denominator (str) – "total_revenue" or "premium". Default "total_revenue".
expense_ratio_denominator (str) – "total_revenue" or "premium". Default "premium".
gain_denominator (str) – "total_revenue" or "premium". Default "total_revenue". See the module docstring for the convention discussion.
profile (str, optional) – Renames only the loss-ratio column to the domain’s ratio name ("health" -> mlr, "life" -> benefit_ratio).
labels (dict, optional) – Explicit output column renames, applied after profile.
- Returns:
Group keys, component sums, total_revenue, total_loss, total_expense, gross_margin, gain, the three ratios plus gross_margin_ratio and gain_ratio, and per-exposure columns when exposure_col is given.
- Return type:
pd.DataFrame
- weighted_mean(values: Any, weights: Any, *, skipna: bool = False) -> float
Weighted mean with validated, explicit weights.
- Parameters:
values (array-like) – Row-level rates or ratios to average.
weights (array-like) – Non-negative, finite weights, the same length as values, with a positive total.
skipna (bool) – When True, pairs where the value is NaN are dropped before averaging. Default False: a NaN value propagates to the result, so missing data surfaces instead of silently shrinking the base.
- weighted_summary(df: DataFrame, *, value_cols: str | Iterable[str], weight_col: str, groupby: str | Iterable[str] | None = None, skipna: bool = False) -> DataFrame
Grouped weighted means of one or more value columns.
Each value column x produces x_weighted = \(\sum wx / \sum w\) per group; the weight total is reported as {weight_col}_total so the base of every average is visible.
Typical use: premium-weighted rate actions by cohort, exposure-weighted persistency by segment.
- excess_over_threshold(df: DataFrame, loss_col: str, threshold: float, *, keep_cols: str | Iterable[str] | None = None, excess_col: str = 'excess') -> DataFrame
Return losses strictly above threshold with their excess amount.
excess = loss - threshold for rows where loss > threshold. This is the excess-over-threshold sample used to fit a tail (e.g. a generalized Pareto distribution in extremeloss) or a severity distribution in lossmodels; the threshold is the EVT exceedance threshold / pooling point. keep_cols carries identifier or covariate columns through.
- pool_losses(df: DataFrame, loss_col: str, pooling_point: float, *, pooled_col: str = 'pooled_loss', excess_col: str = 'excess_loss', copy: bool = True) → DataFrame
Split each loss into a pooled (capped) portion and an excess portion.
pooled = min(loss, pooling_point) is the capped amount retained in the group’s own experience; excess = max(loss - pooling_point, 0) is the portion shared (pooled) across the block. Summing pooled_col by group gives capped experience; summing excess_col gives the pooled excess. The input is typically one row per claimant (e.g. the output of summarize_claimants).
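A short pandas sketch of the split using the two formulas above, followed by the group sums they feed. pool_losses_sketch is hypothetical. Every row satisfies pooled + excess = loss.

```python
import pandas as pd


def pool_losses_sketch(df, loss_col, pooling_point,
                       pooled_col="pooled_loss", excess_col="excess_loss"):
    """Hypothetical stand-in: capped (pooled) and excess portions of each loss."""
    out = df.copy()
    out[pooled_col] = out[loss_col].clip(upper=pooling_point)          # min(loss, pooling_point)
    out[excess_col] = (out[loss_col] - pooling_point).clip(lower=0)    # max(loss - pooling_point, 0)
    return out


claimants = pd.DataFrame({
    "group": ["G1", "G1", "G2"],
    "loss": [80_000, 400_000, 260_000],
})
pooled = pool_losses_sketch(claimants, "loss", pooling_point=250_000)
# Capped experience and pooled excess by group.
print(pooled.groupby("group")[["pooled_loss", "excess_loss"]].sum())
```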
- retained_cv(outcomes, retention, *, n_units=1)
Coefficient of variation of the retained aggregate of n_units iid units.
Each unit’s outcome is retained (capped) at retention, i.e. min(outcome, retention), and n_units such units are summed. For independent units this CV is cv(min(X, retention)) / sqrt(n_units), where X is drawn from the per-unit outcome sample outcomes (array-like). Capping discards the amount above retention, so the shape of the tail beyond retention does not affect the result; only how often retention is reached does.
- Parameters:
outcomes (array-like) – Per-unit outcome sample (e.g. one value per member-year, claim, or risk).
retention (float or array-like) – Cap applied to each unit. Scalar returns a float; an array returns the CV at each retention.
n_units (int, default 1) – Number of independent units in the aggregate.
- Returns:
Coefficient of variation of the retained aggregate.
- Return type:
float or numpy.ndarray
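The formula cv(min(X, retention)) / sqrt(n_units) translates directly to NumPy. This sketch uses the population standard deviation (ddof=0), which is an assumption; the library may use the sample version. retained_cv_sketch is hypothetical, and the lognormal sample is synthetic.

```python
import numpy as np


def retained_cv_sketch(outcomes, retention, n_units=1):
    """Hypothetical stand-in: cv(min(X, u)) / sqrt(n_units) from an outcome sample.

    Uses the population standard deviation (ddof=0), which is an assumption.
    """
    x = np.asarray(outcomes, dtype=float)
    u = np.atleast_1d(np.asarray(retention, dtype=float))
    capped = np.minimum(x[None, :], u[:, None])     # one row of capped outcomes per retention
    cv = capped.std(axis=1) / capped.mean(axis=1) / np.sqrt(n_units)
    return float(cv[0]) if np.ndim(retention) == 0 else cv


rng = np.random.default_rng(0)
sample = rng.lognormal(mean=8.0, sigma=1.5, size=10_000)   # synthetic per-member costs
print(retained_cv_sketch(sample, [5_000, 25_000, 100_000], n_units=500))
```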
- retention_for_target_cv(outcomes, n_units, target_cv, *, bounds=None, n_grid=256)
Retention at which the retained aggregate of n_units units hits a target CV.
Inverts retained_cv(). The single-unit retained CV increases with the retention, so this solves retained_cv(outcomes, u, n_units=n_units) == target_cv for the retention u by interpolation over a grid spanning bounds (default min..max of outcomes). Targets below or above the achievable range clamp to the lower or upper bound. Holding target_cv fixed, a larger n_units yields a higher retention (more independent units stabilize the aggregate, so less needs to be capped); this is the basis for a size-graded retention rule.
- Parameters:
outcomes (array-like) – Per-unit outcome sample.
n_units (int) – Number of independent units in the aggregate.
target_cv (float) – Desired coefficient of variation of the retained aggregate.
bounds (tuple(float, float), optional) – (lo, hi) retention search bounds. Defaults to the min and max of outcomes.
n_grid (int, default 256) – Number of grid points spanning bounds.
- Returns:
The retention level, clamped to bounds.
- Return type:
float
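The grid inversion can be sketched with np.interp. That function returns the end values for targets outside the grid, which gives the clamping behavior described above. The running maximum is an added safeguard, not something the source mentions: it keeps the interpolation x-coordinates non-decreasing if a sample's CV curve dips. All names here are hypothetical.

```python
import numpy as np


def _retained_cv_grid(x, u_grid, n_units):
    capped = np.minimum(x[None, :], u_grid[:, None])
    return capped.std(axis=1) / capped.mean(axis=1) / np.sqrt(n_units)


def retention_for_target_cv_sketch(outcomes, n_units, target_cv, bounds=None, n_grid=256):
    """Hypothetical stand-in: invert the retained CV by interpolating over a retention grid."""
    x = np.asarray(outcomes, dtype=float)
    lo, hi = (x.min(), x.max()) if bounds is None else bounds
    u_grid = np.linspace(lo, hi, n_grid)
    cv_grid = _retained_cv_grid(x, u_grid, n_units)
    # The documented approach relies on CV rising with retention; the running
    # maximum keeps np.interp's x-coordinates non-decreasing if a sample dips.
    cv_grid = np.maximum.accumulate(cv_grid)
    # np.interp returns u_grid[0] / u_grid[-1] outside the range, i.e. clamps to bounds.
    return float(np.interp(target_cv, cv_grid, u_grid))


rng = np.random.default_rng(0)
sample = rng.lognormal(mean=8.0, sigma=1.5, size=10_000)   # synthetic outcomes
for n in (100, 1_000, 10_000):
    print(n, round(retention_for_target_cv_sketch(sample, n, target_cv=0.05)))
```

The loop illustrates the size grading: with the target CV held fixed, larger n_units yields a higher (or clamped) retention.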
- business_days_in_period(periods: Any, *, freq: str = 'M', holidays: Any = 'us_federal', weekmask: str = 'Mon Tue Wed Thu Fri') → Series
Count business days (weekdays minus holidays) in each distinct period.
periods may be any collection of dates; they are mapped to their period (month or quarter) and de-duplicated. holidays is "us_federal" (pandas’ built-in US federal calendar), None (weekdays only), or a list of holiday dates. weekmask controls which weekdays count. Returns a Series indexed by period start timestamp.
- add_business_days(df: DataFrame, date_col: str, *, freq: str = 'M', out_col: str = 'business_days', holidays: Any = 'us_federal', weekmask: str = 'Mon Tue Wed Thu Fri', copy: bool = True) → DataFrame
Add a column with the number of business days in each row’s period.
Divide a paid-amount column by this to get an amount-per-business-day series that is comparable across short and long months.
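The counting step can be sketched with NumPy's busday_count (end date exclusive) and pandas' USFederalHolidayCalendar. This is an assumption about how such a count could be built, not actuarialpy's code. business_days_sketch is hypothetical.

```python
import numpy as np
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar


def business_days_sketch(periods, freq="M", holidays="us_federal",
                         weekmask="Mon Tue Wed Thu Fri"):
    """Hypothetical stand-in: business days per distinct period."""
    # Map every date to its period and de-duplicate.
    per = pd.DatetimeIndex(pd.to_datetime(periods)).to_period(freq).unique().sort_values()
    start = per.start_time                 # first day of each period
    end = (per + 1).start_time             # first day of the next period (exclusive)
    if isinstance(holidays, str) and holidays == "us_federal":
        hol = USFederalHolidayCalendar().holidays(start=start.min(), end=end.max())
    elif holidays is None:
        hol = pd.DatetimeIndex([])
    else:
        hol = pd.DatetimeIndex(pd.to_datetime(holidays))
    counts = np.busday_count(
        start.values.astype("datetime64[D]"),
        end.values.astype("datetime64[D]"),
        weekmask=weekmask,
        holidays=hol.values.astype("datetime64[D]"),
    )
    return pd.Series(counts, index=start)


counts = business_days_sketch(["2024-01-15", "2024-02-10"])
paid = pd.Series([2_100_000.0, 2_000_000.0], index=counts.index)
print(paid / counts)   # paid per business day, comparable across months
```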
- seasonality_factors(df: DataFrame, *, date_col: str, value_col: str, exposure_col: str | None = None, freq: str = 'M', method: str = 'ratio_to_moving_average', aggregate: str = 'mean', exclude: Iterable[int] | None = None, min_years: int = 2) → Series
Estimate seasonal factors: one multiplier per calendar period, with mean 1.0.
The series is first aggregated to the period grain (summing value_col and, if given, exposure_col). With exposure_col the factors are computed on the rate value / exposure (e.g. PMPM), which is the right basis for health seasonality; without it they are computed on the value directly.
Methods:
- "ratio_to_moving_average" (default): classical multiplicative decomposition. Each period is divided by a centered moving average (which removes trend and level), and the seasonal factor for a calendar period is the average of those ratios across years. Robust to trend and membership growth.
- "period_share": each period is expressed as a share of its own year’s average, then averaged by calendar period. Simpler, but assumes little within-year trend.
aggregate is "mean" or "median" (median is more robust to outlier months). exclude drops whole years from the estimate, e.g. exclude=[2020, 2021] keeps COVID-distorted years out of the factors. A warning is raised when fewer than min_years years inform any period. Factors are normalized to average exactly 1.0.
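For intuition, here is the ratio-to-moving-average method on a monthly series that is already on a rate basis (e.g. PMPM). The sketch assumes the classical 2x12 centered moving average; the source does not state the window. It omits the period_share method, median aggregation, exclude, and the min_years warning. seasonal_factors_rma_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def seasonal_factors_rma_sketch(monthly: pd.Series) -> pd.Series:
    """Ratio-to-moving-average factors for a monthly Series on a PeriodIndex (sketch)."""
    # 2x12 centered moving average: a 12-month mean, then a 2-term mean to re-center it,
    # shifted so each value lines up with the month at the centre of its window.
    ma = monthly.rolling(12).mean().rolling(2).mean().shift(-6)
    ratio = (monthly / ma).dropna()
    raw = ratio.groupby(ratio.index.month).mean()    # average ratio per calendar month
    return raw / raw.mean()                           # normalize to average exactly 1.0


# Synthetic PMPM series: 0.5% monthly trend times a known seasonal pattern.
idx = pd.period_range("2020-01", periods=48, freq="M")
pattern = 1 + 0.1 * np.sin(2 * np.pi * np.arange(12) / 12)
pmpm = pd.Series(400 * 1.005 ** np.arange(48) * np.tile(pattern, 4), index=idx)
print(seasonal_factors_rma_sketch(pmpm).round(3))
```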
- seasonality_factors_by(df: DataFrame, *, groupby: str | list[str], date_col: str, value_col: str, exposure_col: str | None = None, freq: str = 'M', method: str = 'ratio_to_moving_average', aggregate: str = 'mean', exclude: Iterable[int] | None = None, min_years: int = 2, season_name: str = 'season', warn: bool = True) → DataFrame
Seasonal factors per segment as a tidy table.
Fits seasonality_factors() within each segment of groupby and stacks the results into one row per (segment, season). The columns are the grouping column(s), season_name, and seasonal_factor, which is the shape deseasonalize() and apply_seasonality() consume via by=. Seasons absent from a segment’s history are omitted for that segment (they surface as NaN on join). Set warn=False to silence the per-segment thin-history InsufficientDataWarning.
- deseasonalize(df: DataFrame, factors: Series | DataFrame, *, date_col: str, value_col: str, freq: str = 'M', by: str | list[str] | None = None, factor_col: str = 'seasonal_factor', season_name: str = 'season', out_col: str | None = None, copy: bool = True) → DataFrame
Divide value_col by each row’s seasonal factor, removing the pattern.
factors is either a flat Series indexed by season (one pattern for the whole frame) or a tidy per-segment DataFrame in the shape seasonality_factors_by() returns, with the grouping column(s), a season column (season_name), and a factor column (factor_col). The tidy table is joined on by plus season. The grouped join is by value (the index is ignored), the factor table must be unique on by + [season], and a row whose (group, season) is absent yields NaN.
- apply_seasonality(df: DataFrame, factors: Series | DataFrame, *, date_col: str, value_col: str, freq: str = 'M', by: str | list[str] | None = None, factor_col: str = 'seasonal_factor', season_name: str = 'season', out_col: str | None = None, copy: bool = True) → DataFrame
Multiply value_col by each row’s seasonal factor, adding the pattern back.
factors may be flat (a Series indexed by season) or a tidy per-segment table joined on by plus season; see deseasonalize() for the grouped-table contract.
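The grouped-table contract can be sketched as a left merge on by plus season, joined by column values rather than by index. The sketch assumes monthly seasons numbered 1 to 12. The hypothetical default output name pmpm_deseasonalized is an illustration only, since the source does not say what out_col=None produces. deseasonalize_sketch is hypothetical.

```python
import pandas as pd


def deseasonalize_sketch(df, factors, date_col, value_col, by, out_col=None):
    """Hypothetical stand-in for the grouped (tidy-table) path of deseasonalize."""
    keys = ([by] if isinstance(by, str) else list(by)) + ["season"]
    if factors.duplicated(keys).any():
        raise ValueError("factor table must be unique on by + [season]")
    work = df.copy()
    work["season"] = work[date_col].dt.month           # monthly seasons assumed
    # Join by column values, not by index; a missing (group, season) pair gives NaN.
    merged = work.merge(factors[keys + ["seasonal_factor"]], on=keys, how="left")
    target = out_col or f"{value_col}_deseasonalized"  # default name is an assumption
    merged[target] = merged[value_col] / merged["seasonal_factor"]
    return merged.drop(columns=["season", "seasonal_factor"])


factors = pd.DataFrame({
    "product": ["PPO", "PPO", "HMO", "HMO"],
    "season": [1, 2, 1, 2],
    "seasonal_factor": [1.10, 0.90, 1.05, 0.95],
})
df = pd.DataFrame({
    "product": ["PPO", "PPO", "HMO", "HMO", "HMO"],
    "month": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-01-01",
                             "2024-02-01", "2024-03-01"]),
    "pmpm": [440.0, 360.0, 420.0, 380.0, 400.0],
})
# HMO has no March factor, so its March row comes back NaN.
print(deseasonalize_sketch(df, factors, "month", "pmpm", by="product"))
```

Multiplying by the same joined factor instead of dividing reverses the step, which is what apply_seasonality() describes.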
- rate_from_force(delta: float) → float
Effective rate from the force of interest: \(i = e^\delta - 1\).
- nominal_interest(i: float, m: int) → float
Nominal interest convertible m times: \(i^{(m)} = m[(1+i)^{1/m}-1]\).
- nominal_discount(i: float, m: int) → float
Nominal discount convertible m times: \(d^{(m)} = m[1-v^{1/m}]\), where \(v = 1/(1+i)\).
- rate_from_nominal_interest(nominal: float, m: int) → float
Effective rate from a nominal interest rate: \((1+i^{(m)}/m)^m - 1\).
- rate_from_nominal_discount(nominal: float, m: int) → float
Effective rate from a nominal discount rate: \((1-d^{(m)}/m)^{-m} - 1\).
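The conversion formulas are easy to check numerically. The functions below are plain-Python restatements with hypothetical short names, not calls into actuarialpy. Each round trip recovers i, and the familiar ordering \(d^{(m)} < \delta < i^{(m)}\) holds.

```python
import math


# Plain-Python restatements of the conversion formulas above (hypothetical names).
def i_m(i, m):
    return m * ((1 + i) ** (1 / m) - 1)          # i^(m) = m[(1+i)^(1/m) - 1]


def d_m(i, m):
    return m * (1 - (1 + i) ** (-1 / m))         # d^(m) = m[1 - v^(1/m)]


def i_from_i_m(nominal, m):
    return (1 + nominal / m) ** m - 1            # (1 + i^(m)/m)^m - 1


def i_from_d_m(nominal, m):
    return (1 - nominal / m) ** (-m) - 1         # (1 - d^(m)/m)^(-m) - 1


def i_from_delta(delta):
    return math.expm1(delta)                     # e^delta - 1


i = 0.05
print(i_m(i, 12), math.log1p(i), d_m(i, 12))   # i^(12), delta, d^(12)
```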
- present_value(amount: float, i: float, t: float) → float
Present value of a single amount due in t years.
- future_value(amount: float, i: float, t: float) → float
Accumulated value of a single amount after t years.
- annuity_immediate(i: float, n: int) → float
Present value of an annuity-immediate \(a_{\overline{n}|}=(1-v^n)/i\), where \(v = 1/(1+i)\).
- annuity_due(i: float, n: int) → float
Present value of an annuity-due \(\ddot a_{\overline{n}|}=(1-v^n)/d\), where \(d = i/(1+i)\).
- accumulated_immediate(i: float, n: int) → float
Accumulated value of an annuity-immediate \(s_{\overline{n}|}=((1+i)^n-1)/i\).
- accumulated_due(i: float, n: int) → float
Accumulated value of an annuity-due \(\ddot s_{\overline{n}|}=((1+i)^n-1)/d\).
- deferred_annuity_immediate(i: float, n: int, defer: int) → float
Present value of an n-year annuity-immediate deferred defer years: \(v^{\text{defer}}\,a_{\overline{n}|}\).
- annuity_continuous(i: float, n: int) → float
Present value of a continuous annuity \(\bar a_{\overline{n}|}=(1-v^n)/\delta\).
- annuity_immediate_mthly(i: float, n: int, m: int) → float
Present value of an m-thly annuity-immediate \(a^{(m)}_{\overline{n}|}=(1-v^n)/i^{(m)}\).
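A standalone sketch that evaluates the level-annuity closed forms side by side. It is useful for checking them against brute-force sums of discounted payments. annuity_values is a hypothetical helper, not part of actuarialpy.

```python
import math


def annuity_values(i, n, m=12, defer=0):
    """Closed forms for the level annuities listed above (standalone sketch)."""
    v = 1 / (1 + i)                      # one-period discount factor
    d = i / (1 + i)                      # effective discount rate
    delta = math.log1p(i)                # force of interest
    i_m = m * ((1 + i) ** (1 / m) - 1)   # nominal interest convertible m times
    a_n = (1 - v**n) / i
    return {
        "a_n": a_n,
        "a_due": (1 - v**n) / d,
        "s_n": ((1 + i) ** n - 1) / i,
        "s_due": ((1 + i) ** n - 1) / d,
        "deferred": v**defer * a_n,
        "a_bar": (1 - v**n) / delta,
        "a_m": (1 - v**n) / i_m,
    }


vals = annuity_values(0.05, 10, m=12, defer=5)
for name, value in vals.items():
    print(f"{name:>8}: {value:.6f}")
```

Because \(i > i^{(m)} > \delta > d\), the present values order as \(a_{\overline{n}|} < a^{(m)}_{\overline{n}|} < \bar a_{\overline{n}|} < \ddot a_{\overline{n}|}\).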
- increasing_annuity_immediate(i: float, n: int) → float
Present value of an increasing annuity \((Ia)_{\overline{n}|}\).
Payments of 1, 2, …, n at times 1, …, n.
- decreasing_annuity_immediate(i: float, n: int) → float
Present value of a decreasing annuity \((Da)_{\overline{n}|}\).
Payments of n, n-1, …, 1 at times 1, …, n.
- geometric_annuity_immediate(i: float, n: int, growth: float) → float
Present value of a geometrically increasing annuity-immediate.
Payments \(1, (1+g), (1+g)^2, \ldots\) at times \(1, \ldots, n\):
\[\frac{1 - \left(\frac{1+g}{1+i}\right)^n}{i - g}, \qquad i \neq g.\]
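The sketch below checks these against direct sums. It uses the standard textbook identities \((Ia)_{\overline{n}|} = (\ddot a_{\overline{n}|} - n v^n)/i\) and \((Da)_{\overline{n}|} = (n - a_{\overline{n}|})/i\), which the source does not state. It also uses the geometric formula above, with the \(i = g\) limit \(n/(1+i)\) added as a fallback. Function names are hypothetical.

```python
def increasing_annuity(i, n):
    """(Ia)_n = (a-due_n - n v^n) / i for payments 1, 2, ..., n at times 1, ..., n."""
    v = 1 / (1 + i)
    a_due = (1 - v**n) / (i / (1 + i))
    return (a_due - n * v**n) / i


def decreasing_annuity(i, n):
    """(Da)_n = (n - a_n) / i for payments n, n-1, ..., 1 at times 1, ..., n."""
    v = 1 / (1 + i)
    return (n - (1 - v**n) / i) / i


def geometric_annuity(i, n, g):
    """Payments 1, (1+g), ... at times 1..n; the i == g limit is n / (1 + i)."""
    if abs(i - g) < 1e-12:
        return n / (1 + i)
    return (1 - ((1 + g) / (1 + i)) ** n) / (i - g)


print(increasing_annuity(0.04, 20), decreasing_annuity(0.04, 20), geometric_annuity(0.04, 20, 0.02))
```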
- net_present_value(rate: float, cashflows: Sequence[float], times: Sequence[float] | None = None) → float
Net present value of cashflows discounted at rate.
If times is omitted, the cash flows are assumed to occur at times 0, 1, 2, ....
- internal_rate_of_return(cashflows: Sequence[float], times: Sequence[float] | None = None, *, low: float = -0.9999, high: float = 1e6, tol: float = 1e-10) → float
Internal rate of return: the rate solving net_present_value == 0.
Uses a bracketed bisection over (low, high), which is robust for the usual cash-flow streams with a single sign change. Raises an error if net_present_value does not change sign over the search range (e.g. all-positive or all-negative flows).
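A minimal bisection sketch with the same default bracket and tolerance as the signature above. The functions are hypothetical stand-ins, and raising ValueError is an assumption about the error type. Signs are compared rather than multiplied, so the very large NPV values near rate = -0.9999 cannot overflow the test.

```python
def npv_sketch(rate, cashflows, times=None):
    """Sum of cf * (1 + rate)^(-t); times default to 0, 1, 2, ..."""
    if times is None:
        times = range(len(cashflows))
    return sum(cf * (1 + rate) ** (-t) for cf, t in zip(cashflows, times))


def irr_sketch(cashflows, times=None, low=-0.9999, high=1e6, tol=1e-10):
    """Bisection on npv_sketch over [low, high]; raises if NPV has no sign change there."""
    f_low = npv_sketch(low, cashflows, times)
    f_high = npv_sketch(high, cashflows, times)
    if f_low == 0:
        return low
    if f_high == 0:
        return high
    if (f_low > 0) == (f_high > 0):
        raise ValueError("NPV does not change sign over [low, high]")
    while high - low > tol:
        mid = 0.5 * (low + high)
        f_mid = npv_sketch(mid, cashflows, times)
        if (f_mid > 0) == (f_low > 0):
            low, f_low = mid, f_mid      # root lies in the upper half
        else:
            high = mid                   # root lies in the lower half
    return 0.5 * (low + high)


print(irr_sketch([-100, 110]))
print(irr_sketch([-1000, 300, 400, 500]))
```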
- level_payment(principal: float, i: float, n: int) → float
Level payment amortizing principal over n periods at rate i:
\(P = L / a_{\overline{n}|}\), where \(L\) is the principal.
- outstanding_balance(principal: float, i: float, n: int, t: int) → float
Prospective outstanding loan balance just after the t-th payment, \(P\,a_{\overline{n-t}|}\).
- amortization_schedule(principal: float, i: float, n: int, payment: float | None = None) → DataFrame
Amortization schedule with the interest/principal split and balance.
Returns one row per period with columns period, payment, interest, principal, and balance.
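A level-payment schedule built retrospectively period by period. Its balances should agree with the prospective value \(P\,a_{\overline{n-t}|}\). amortization_schedule_sketch is hypothetical and ignores the optional payment override.

```python
import pandas as pd


def amortization_schedule_sketch(principal, i, n):
    """Level-payment schedule; payment = principal / a_n (hypothetical stand-in)."""
    v = 1 / (1 + i)
    payment = principal * i / (1 - v**n)       # L / a_n with a_n = (1 - v^n) / i
    rows, balance = [], principal
    for period in range(1, n + 1):
        interest = balance * i                 # interest accrued on the opening balance
        principal_paid = payment - interest    # remainder of the payment reduces the loan
        balance -= principal_paid
        rows.append((period, payment, interest, principal_paid, balance))
    return pd.DataFrame(rows, columns=["period", "payment", "interest", "principal", "balance"])


sched = amortization_schedule_sketch(10_000, 0.01, 12)
print(sched.round(2))
```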
- discount_factors(spot_rates: Sequence[float], times: Sequence[float]) → ndarray
Discount factors \((1+s_t)^{-t}\) from spot rates at times.
- present_value_curve(cashflows: Sequence[float], spot_rates: Sequence[float], times: Sequence[float]) → float
Present value of cashflows discounted on a spot-rate curve.
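A NumPy sketch of discounting on a spot curve: compute one factor \((1+s_t)^{-t}\) per time, then take the dot product with the cash flows. Both function names are hypothetical. On a flat curve this reduces to ordinary NPV, so a par bond prices at 100.

```python
import numpy as np


def discount_factors_sketch(spot_rates, times):
    s = np.asarray(spot_rates, dtype=float)
    t = np.asarray(times, dtype=float)
    return (1 + s) ** (-t)                      # (1 + s_t)^(-t), one factor per time


def pv_on_curve_sketch(cashflows, spot_rates, times):
    cf = np.asarray(cashflows, dtype=float)
    return float(cf @ discount_factors_sketch(spot_rates, times))


# A 3-year 5% annual-coupon bond priced on an upward-sloping spot curve.
print(pv_on_curve_sketch([5, 5, 105], [0.03, 0.035, 0.04], [1, 2, 3]))
```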
- year_fraction(start: object, end: object, convention: str = 'actual/365') → float
Year fraction between two dates under a day-count convention.
Supported conventions: "actual/365", "actual/360", "30/360" (US/NASD), and "actual/actual" (ISDA).
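A standard-library sketch of three of these conventions. Actual/actual (ISDA) splits the period at each 1 January and divides the days in each piece by the length of that calendar year. 30/360 is left out because its variants differ in end-of-month handling. year_fraction_sketch is hypothetical and assumes start <= end.

```python
import calendar
from datetime import date


def year_fraction_sketch(start, end, convention="actual/365"):
    """Year fraction for start <= end under three of the conventions named above."""
    days = (end - start).days
    if convention == "actual/365":
        return days / 365
    if convention == "actual/360":
        return days / 360
    if convention == "actual/actual":
        # ISDA: split at each 1 January; days in a leap year count 1/366, else 1/365.
        total, cur = 0.0, start
        while cur < end:
            nxt = min(date(cur.year + 1, 1, 1), end)
            total += (nxt - cur).days / (366 if calendar.isleap(cur.year) else 365)
            cur = nxt
        return total
    raise ValueError(f"convention not covered by this sketch: {convention}")


start, end = date(2023, 7, 1), date(2024, 7, 1)
for conv in ("actual/365", "actual/360", "actual/actual"):
    print(conv, round(year_fraction_sketch(start, end, conv), 6))
```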
- age(date_of_birth, as_of, basis: str = 'exact') → float
Age of a life at a date on a given basis.
- Parameters:
date_of_birth (date-like) – Date of birth.
as_of (date-like) – Valuation date at which the age is measured.
basis ({"exact", "last", "nearest"}) –
"exact" returns the fractional age; "last" is age last birthday (completed years, ALB); "nearest" is age nearest birthday (ANB).
- Returns:
Fractional age for "exact"; an integer age for "last" and "nearest".
- Return type:
float or int
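A standard-library sketch of the three bases. Several choices here are assumptions rather than documented behavior. The fractional part is measured as days since the last birthday over days between birthdays. A 29 February birthday falls on 1 March in non-leap years. ANB rounds the halfway point up. age_sketch is hypothetical.

```python
from datetime import date


def _birthday(dob, year):
    """Birthday in a given year; 29 February maps to 1 March in non-leap years (assumption)."""
    try:
        return date(year, dob.month, dob.day)
    except ValueError:
        return date(year, 3, 1)


def age_sketch(dob, as_of, basis="exact"):
    alb = as_of.year - dob.year
    if as_of < _birthday(dob, as_of.year):
        alb -= 1                                   # birthday not yet reached this year
    last = _birthday(dob, dob.year + alb)
    nxt = _birthday(dob, dob.year + alb + 1)
    exact = alb + (as_of - last).days / (nxt - last).days
    if basis == "exact":
        return exact
    if basis == "last":
        return alb                                 # ALB: completed years
    if basis == "nearest":
        return int(exact + 0.5)                    # ANB: halfway rounds up (assumption)
    raise ValueError(f"unknown basis: {basis}")


dob = date(1980, 6, 15)
for basis in ("exact", "last", "nearest"):
    print(basis, age_sketch(dob, date(2024, 6, 14), basis))
```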
- exposure_years(entry, exit, study_start, study_end, *, convention: str = 'actual/365') → float
Exposure (in years) a record contributes within a study window.
The exposure is the overlap of [entry, exit] with [study_start, study_end], measured under the given day-count convention. Returns 0 when the record and study window do not overlap.
- add_exposure_column(df: DataFrame, entry_col: str, exit_col: str, study_start, study_end, *, exposure_col: str = 'exposure_years', convention: str = 'actual/365', copy: bool = True) → DataFrame
Add an exposure-years column for each record over a study window.
Useful for building the denominator of an actual-to-expected study.
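A vectorized pandas sketch of the overlap calculation under actual/365. Each record contributes the days between the later of entry and study start and the earlier of exit and study end, floored at zero. add_exposure_sketch is hypothetical. It handles only actual/365 and assumes every record has an exit date.

```python
import pandas as pd


def add_exposure_sketch(df, entry_col, exit_col, study_start, study_end,
                        exposure_col="exposure_years"):
    """Overlap of [entry, exit] with the study window, in actual/365 years (sketch)."""
    start, end = pd.Timestamp(study_start), pd.Timestamp(study_end)
    lo = df[entry_col].where(df[entry_col] > start, start)   # later of entry and study start
    hi = df[exit_col].where(df[exit_col] < end, end)         # earlier of exit and study end
    days = (hi - lo).dt.days.clip(lower=0)                   # no overlap -> 0
    out = df.copy()
    out[exposure_col] = days / 365
    return out


policies = pd.DataFrame({
    "entry": pd.to_datetime(["2023-07-01", "2024-04-01", "2022-01-01"]),
    "exit": pd.to_datetime(["2024-12-31", "2026-01-01", "2023-06-30"]),
})
print(add_exposure_sketch(policies, "entry", "exit", "2024-01-01", "2025-01-01"))
```

Multiplying the resulting exposure by an expected rate and summing gives the expected count for an actual-to-expected comparison.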