actuarialpy

Experience analysis on a single tidy table, plus the shared numerical core the rest of the ecosystem builds on. You build one DataFrame — claims/expense, revenue, exposure, by period — and Experience gives you views (by, rolling, trend, completion, seasonality, credibility, pooling) without re-pivoting. numpy and pandas only; no scipy.

Quickstart

import pandas as pd
import actuarialpy as ap

df = pd.DataFrame({
    "month": pd.period_range("2024-01", periods=6, freq="M").astype(str),
    "product": ["PPO"] * 6,
    "paid":    [120_000, 118_000, 125_000, 130_000, 128_000, 135_000],
    "premium": [150_000] * 6,
    "member_months": [1000, 1005, 1010, 1008, 1012, 1015],
})

exp = ap.Experience(df, expense="paid", revenue="premium",
                    exposure="member_months", date="month")

exp.by("product")                              # grouped view
exp.loss_ratio                                 # paid / premium

ap.per_exposure(df["paid"], df["member_months"])  # amount per exposure unit
ap.loss_ratio(df["paid"], df["premium"])       # as a free function

Credibility

Credibility primitives live here — ratingmodels and the other packages delegate to them rather than re-implementing. Limited-fluctuation (classical) credibility from exposure:

import actuarialpy as ap

z = ap.limited_fluctuation_z(exposure=96_000, full_credibility_standard=120_000)
# -> 0.894
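
The value above is consistent with the classical square-root rule, Z = min(1, sqrt(n / n_full)). The following is a minimal plain-Python sketch of that rule, not the library's implementation; the function name is illustrative and the cap at 1 is the usual textbook convention, assumed here.

```python
import math

def square_root_rule(exposure: float, full_standard: float) -> float:
    """Classical credibility: Z = sqrt(exposure / full standard), capped at 1."""
    if exposure < 0 or full_standard <= 0:
        raise ValueError("exposure must be >= 0 and the full standard > 0")
    return min(1.0, math.sqrt(exposure / full_standard))

z_partial = square_root_rule(96_000, 120_000)   # sqrt(0.8), about 0.894
z_full = square_root_rule(150_000, 120_000)     # above the standard: capped at 1.0
```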

Greatest-accuracy (Bühlmann–Straub) credibility across risk classes, fit straight from a tidy frame of group / value / weight:

import pandas as pd
import actuarialpy as ap

frame = pd.DataFrame({
    "product": ["PPO", "PPO", "HMO", "HMO"],
    "paid":    [125_000, 130_000, 88_000, 91_000],
    "member_months": [1010, 1008, 640, 655],
})

# `frame` is a plain DataFrame here, not an Experience object
model = ap.BuhlmannStraub.from_frame(
    frame, group="product", value="paid", weight="member_months",
)

model.k                    # Bühlmann k = EPV / VHM
model.z(weight=1_000)      # credibility Z for 1,000 units of exposure

model.premium(risk_mean, weight) then blends a class’s own mean toward the overall mean at that credibility.
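
The blend described above can be written out by hand once k is known. This sketch assumes the standard Bühlmann form Z = w / (w + k); the value of k and the means are made-up illustrations (not fitted from the frame above), and the helper names are not part of the package.

```python
def buhlmann_z(weight: float, k: float) -> float:
    """Credibility factor Z = w / (w + k), with k = EPV / VHM."""
    return weight / (weight + k)

def credibility_premium(risk_mean: float, overall_mean: float,
                        weight: float, k: float) -> float:
    """Blend a class's own mean toward the overall mean at credibility Z."""
    z = buhlmann_z(weight, k)
    return z * risk_mean + (1.0 - z) * overall_mean

k = 250.0                                   # illustrative value only
z = buhlmann_z(1_000, k)                    # 1000 / 1250 = 0.8
blended = credibility_premium(risk_mean=130.0, overall_mean=120.0,
                              weight=1_000, k=k)   # 0.8 * 130 + 0.2 * 120
```

As the weight grows relative to k, Z approaches 1 and the class's own experience dominates.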

Financial mathematics

Time value of money on the same numpy/pandas footing — rate conversions, present/future values, annuities-certain, loan amortization, and day-count conventions:

import actuarialpy as ap

ap.present_value(1000, 0.05, 3)          # 863.84  — discount at 5% for 3 yrs
ap.future_value(1000, 0.05, 3)           # 1157.63
ap.annuity_immediate(0.05, 10)           # 7.7217  — PV of 1/yr, 10 yrs @ 5%
ap.annuity_due(0.05, 10)                 # 8.1078

# level-payment loan: 200k principal, 6% nominal, 30 years monthly
ap.level_payment(200_000, 0.06 / 12, 360)          # 1199.10 per month
ap.amortization_schedule(200_000, 0.06 / 12, 360)  # full schedule (DataFrame)

ap.year_fraction("2024-01-01", "2024-07-01", convention="30/360")  # 0.5
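
The figures in the comments above follow from the textbook formulas, sketched here in plain Python for reference. The function names are illustrative, a non-zero rate is assumed, and the 30/360 rule shown is a simplified US bond-basis variant (no February end-of-month handling); the package's own conventions may differ in detail.

```python
from datetime import date

def pv(amount: float, rate: float, periods: float) -> float:
    """Present value of a single payment."""
    return amount / (1.0 + rate) ** periods

def annuity_immediate_factor(rate: float, periods: int) -> float:
    """PV of 1 per period, paid at period end (rate must be non-zero)."""
    return (1.0 - (1.0 + rate) ** -periods) / rate

def level_payment_amount(principal: float, rate: float, periods: int) -> float:
    """Payment that amortizes the principal over the given periods."""
    return principal / annuity_immediate_factor(rate, periods)

def year_fraction_30_360(start: date, end: date) -> float:
    """Simplified 30/360: every month counts as 30 days."""
    d1 = min(start.day, 30)
    d2 = min(end.day, 30) if d1 == 30 else end.day
    days = 360 * (end.year - start.year) + 30 * (end.month - start.month) + (d2 - d1)
    return days / 360.0

pv_3y = pv(1000, 0.05, 3)
a_10 = annuity_immediate_factor(0.05, 10)
a_due_10 = a_10 * 1.05                      # annuity-due = immediate * (1 + i)
payment = level_payment_amount(200_000, 0.06 / 12, 360)
half_year = year_fraction_30_360(date(2024, 1, 1), date(2024, 7, 1))
```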

Exposure and age bases

Exact and rounded ages, and exposure-years over a study window — the inputs an actual-to-expected study needs:

import pandas as pd
import actuarialpy as ap

ap.age("1980-06-15", "2026-06-30")                   # 46.04  — exact
ap.age("1980-06-15", "2026-06-30", basis="last")     # 46     — age last birthday
ap.age("1980-06-15", "2026-06-30", basis="nearest")  # 46     — age nearest birthday

# fraction of a study window each life is exposed
cohort = pd.DataFrame({
    "entry": ["2024-03-01", "2024-01-15"],
    "term":  ["2025-09-01", "2025-12-31"],
})
ap.add_exposure_column(cohort, entry_col="entry", exit_col="term",
                       study_start="2024-01-01", study_end="2025-12-31")
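
For intuition, age last birthday and window clipping can be sketched with the standard library alone. The helpers below are hypothetical, and the half-open interval convention (exit date not counted) is an assumption; the source does not state how the package treats end dates or how it converts days to exposure-years.

```python
from datetime import date

def age_last_birthday(birth: date, as_of: date) -> int:
    """Completed years of age on the as-of date."""
    had_birthday = (as_of.month, as_of.day) >= (birth.month, birth.day)
    return as_of.year - birth.year - (0 if had_birthday else 1)

def exposed_days(entry: date, exit_: date,
                 study_start: date, study_end: date) -> int:
    """Days of [entry, exit_) that fall inside [study_start, study_end)."""
    start = max(entry, study_start)
    end = min(exit_, study_end)
    return max((end - start).days, 0)

alb = age_last_birthday(date(1980, 6, 15), date(2026, 6, 30))
alb_before = age_last_birthday(date(1980, 6, 15), date(2026, 6, 14))

window = (date(2024, 1, 1), date(2025, 12, 31))
days_first = exposed_days(date(2024, 3, 1), date(2025, 9, 1), *window)
days_second = exposed_days(date(2024, 1, 15), date(2025, 12, 31), *window)
days_clipped = exposed_days(date(2023, 6, 1), date(2024, 2, 1), *window)
```

Dividing the day counts by a year length (365, 365.25, or the actual days in each year) gives exposure-years; which divisor applies is a study convention.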

Retention primitives

The pooling module includes two general retention-stability primitives:

retained_cv(outcomes, retention, n_units=1) — coefficient of variation of the retained aggregate of n_units i.i.d. units each capped at retention.
retention_for_target_cv(outcomes, n_units, target_cv, ...) — inverts it: the retention at which retained CV hits a target. The basis for a size-graded pooling schedule.
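
One way to read the first primitive: for n i.i.d. units with capped mean mu and capped standard deviation sigma, the aggregate has mean n * mu and standard deviation sqrt(n) * sigma, so its CV is sigma / (mu * sqrt(n)). The sketch below applies that to an empirical sample; the use of the population standard deviation and the function name are assumptions, not the package's confirmed behavior.

```python
import math
from statistics import fmean, pstdev

def retained_cv_sketch(outcomes, retention, n_units=1):
    """CV of the sum of n_units i.i.d. draws, each capped at retention."""
    capped = [min(x, retention) for x in outcomes]
    mean = fmean(capped)
    if mean == 0:
        raise ValueError("capped outcomes have zero mean; CV is undefined")
    return pstdev(capped) / (mean * math.sqrt(n_units))

sample = [0.0, 0.0, 10.0, 100.0]                 # toy per-unit outcomes
cv_low = retained_cv_sketch(sample, retention=10.0)
cv_high = retained_cv_sketch(sample, retention=50.0)
cv_group = retained_cv_sketch(sample, retention=50.0, n_units=4)
```

In this toy sample a lower retention and a larger group both reduce the retained CV, which is the trade-off a size-graded pooling schedule works with.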

Underwriting margin and weighted rollups

The two-tier underwriting income statement — gross margin (revenue less loss expense, operating expense excluded) and gain/(loss) (gross margin less operating expense). The ratios mirror the loss_ratio / expense_ratio / combined_ratio trio in metrics, and denominators are explicit parameters because real exhibits mix them (loss ratio over net revenue beside an expense ratio over gross premium); reconciliation() reports the resulting gap in gain% = 1 − combined ratio. Domain naming is a view concern, never a calculation concern: the profile option renames the loss-ratio column the same way summarize_experience does ("health" → mlr, "life" → benefit_ratio), and labels renames anything else. The full convention is on the conventions page.

import actuarialpy as ap

uw = ap.UnderwritingSummary.from_per_exposure(
    revenue_per_exposure={"premium": 400.0, "refund": -1.4},
    loss_per_exposure={"claims": 340.0, "other_loss": 16.4},
    expense_per_exposure=37.4,
    exposure=300_000,
)
uw.loss_ratio, uw.expense_ratio, uw.combined_ratio   # explicit denominators
uw.gross_margin_per_exposure, uw.gain_per_exposure   # the two tiers
uw.to_frame(profile="health")                        # loss_ratio -> mlr, math unchanged

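# grouped, from a tidy table (here `df` is assumed to carry cohort, premium,
# refund, claims, expense and member_months columns): components summed first,
# every ratio a ratio of sums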
ap.underwriting_summary(
    df, groupby="cohort",
    revenue_cols=["premium", "refund"], loss_cols=["claims"],
    expense_cols="expense", exposure_col="member_months",
    premium_col="premium",
)
# per-exposure outputs are the mechanical {name}_per_{exposure_col};
# domain names (a health shop's _pmpm) are opt-in via labels
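
The two tiers and the denominator gap can be checked by hand with the per-exposure figures from the first call. This is plain arithmetic under stated assumptions: which denominators the package uses by default, and the exact shape of what reconciliation() returns, are not shown in this page.

```python
revenue = {"premium": 400.0, "refund": -1.4}
loss = {"claims": 340.0, "other_loss": 16.4}
expense = 37.4

net_revenue = sum(revenue.values())          # 398.6
total_loss = sum(loss.values())              # 356.4
gross_margin = net_revenue - total_loss      # tier 1: 42.2
gain = gross_margin - expense                # tier 2: 4.8
gain_pct = gain / net_revenue

# Same denominator for both ratios: gain% equals 1 - combined ratio.
loss_ratio = total_loss / net_revenue
expense_ratio_net = expense / net_revenue
same_denominator_gap = gain_pct - (1 - (loss_ratio + expense_ratio_net))

# Mixed denominators: expense ratio over gross premium instead.
expense_ratio_gross = expense / revenue["premium"]
mixed_gap = gain_pct - (1 - (loss_ratio + expense_ratio_gross))
```

With mixed denominators the identity no longer closes exactly; the residual is the difference between the two expense ratios.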

Quantities that are already rates at the row level — rate actions, persistency — cannot be summed. weighted_mean and weighted_summary average them with a required, named weight and report the weight total beside every average:

# `book` is assumed to be a tidy frame with rate_action, premium and cohort columns
ap.weighted_summary(book, value_cols="rate_action",
                    weight_col="premium", groupby="cohort")
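
The underlying calculation is sum(w * x) / sum(w) per group, with the weight total kept alongside. A small stand-alone sketch with made-up rows (the helper and its output keys are illustrative, not the package's column names):

```python
from collections import defaultdict

rows = [
    {"cohort": "2023", "rate_action": 0.05, "premium": 100_000},
    {"cohort": "2023", "rate_action": 0.10, "premium": 300_000},
    {"cohort": "2024", "rate_action": 0.02, "premium": 200_000},
]

def weighted_summary_sketch(rows, value, weight, group):
    """Per-group weighted mean and weight total (assumes positive weights)."""
    totals = defaultdict(lambda: [0.0, 0.0])   # group -> [sum(w * x), sum(w)]
    for row in rows:
        totals[row[group]][0] += row[weight] * row[value]
        totals[row[group]][1] += row[weight]
    return {g: {"weighted_mean": wx / w, "weight_total": w}
            for g, (wx, w) in totals.items()}

summary = weighted_summary_sketch(rows, "rate_action", "premium", "cohort")
```

The simple average for the 2023 cohort would be 7.5%; weighting by premium gives 8.75% because the larger block took the larger action.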

API reference

ActuarialPy: tools for actuarial experience analysis.

Bases: object

Bind an experience dataset to its actuarial column roles.

Experience is the recommended entry point for repeated experience-analysis workflows. It stores common column roles once and delegates calculations to the package’s free functions. The object is immutable: methods return DataFrames or new Experience objects rather than changing stored data in place.

Bind count (a claim or service count) to unlock the frequency-severity views: frequency_severity() and decompose_trend() (frequency x severity, optionally x mix). fit_trend() fits an exponential trend to the bound history by log-linear regression.

Grain matters. Experience aggregates by summing the bound columns, so it expects rows at the grain of the exposure unit – one row per member-month, with member_months = 1 (or the eligible fraction). If your data is long (one row per service line, so the same member-month repeats across several rows), summing the exposure column overcounts it, and every per-exposure figure – PMPM, frequency, the loss-ratio denominator – is off by a factor of the rows per member-month. Experience does not detect this: it has no member key, so it cannot tell a long frame from a wide one. For long or multi-table warehouse data, either aggregate to member-month grain first, or use bind(), which sources exposure from a correctly-grained table (e.g. eligibility) via Count and never sums a repeated column.
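
The overcount is easy to reproduce with pandas alone. The frame and column names below are hypothetical; the point is only that exposure must be taken once per member-month before dividing.

```python
import pandas as pd

# Long frame: one row per service line, so member M1's single
# member-month appears twice.
long_frame = pd.DataFrame({
    "member": ["M1", "M1", "M2"],
    "month": ["2024-01", "2024-01", "2024-01"],
    "paid": [100.0, 50.0, 30.0],
    "member_months": [1, 1, 1],
})

# Summing the repeated exposure column counts 3 member-months, not 2.
naive_pmpm = long_frame["paid"].sum() / long_frame["member_months"].sum()

# Aggregate to member-month grain first: sum amounts, keep exposure once.
grain = long_frame.groupby(["member", "month"], as_index=False).agg(
    paid=("paid", "sum"),
    member_months=("member_months", "first"),
)
correct_pmpm = grain["paid"].sum() / grain["member_months"].sum()
```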

Summarize actual-versus-expected experience.

If actual is omitted, the object’s bound expense columns are used.

Return a new Experience with an expense column restated by a factor.

The general counterpart to complete() and deseasonalize(): joins a factor by the key on (a column already in the frame, optionally within by segments) and multiplies – or, with how="divide", divides – the selected column(s) in place under the same name, so every downstream view composes on the restated series. factors is a scalar (one factor for all rows), a Series indexed by on, or a tidy DataFrame keyed by by + on.

This is the spine of experience-period restatement – trend, benefit / area / demographic relativities, network discounts – where the methodology is supplied as the factors rather than encoded here. Chain freely (exp.complete(...).adjust(trend).adjust(area, on="region")); with audit_col the cumulative restatement multiplier is carried across the chain, one value per row, for a reviewable audit trail. An absent key surfaces as NaN unless default is given (default=1.0 to mean “no adjustment for this key”).

by(groupby: str | list[str] | None = None, **kwargs: Any) → DataFrame: Summarize experience by optional grouping columns.

by_band(value_col: str, bands: Any, *, labels: Any = None, **kwargs: Any) → DataFrame: Summarize experience by a size band on value_col (see summarize_by_band).

by_status(status_col: str, *, entity_col: str | None = None, **kwargs: Any) → DataFrame: Summarize experience by a status column.

claimant_concentration(claimant_col: str, *, amount_cols: str | list[str] | None = None, groupby: str | list[str] | None = None, **kwargs: Any) → DataFrame: Summarize how concentrated experience is among top claimants.

claimants(claimant_col: str, *, amount_cols: str | list[str] | None = None, groupby: str | list[str] | None = None, exposure_col: str | None = None, **kwargs: Any) → DataFrame: Aggregate the experience to claimant/member/risk level.

cohort(*, entity_col: str, start_date_col: str, duration_months: int = 12, groupby: str | list[str] | None = None, date_col: str | None = None, **kwargs: Any) → DataFrame: Summarize each entity’s first N months or cohort-duration window.

Return a new Experience with paid amounts developed to ultimate.

Grosses the expense (loss / claims) columns up to estimated ultimate in place under the same names – completed = paid / completion_factor – so downstream views (trend(), rolling(), by(), …) then run on the completed series. Each row’s development period is development_months(date, valuation_date) (the convention make_completion_triangle() uses), or an explicit development_col. The join is by value, so the frame’s index is irrelevant; rows past the triangle’s last development period are taken as fully complete, and only recent, immature months actually move.

factors may be a flat Series (one pattern, from completion_factors()) or a tidy per-segment table from completion_factors_by(); with the latter, pass by naming the grouping column(s) to join on group plus development period. Only the numerator is developed – exposure is left untouched. This applies to the latest-diagonal shape (one row per incurred month, claims paid-to-date as of valuation_date); a frame already on an ultimate basis must not be completed again.

component_summary(component_cols: str | list[str], *, groupby: str | list[str] | None = None, exposure_col: str | None = None, **kwargs: Any) → DataFrame: Summarize component amounts, per-exposure values, and shares.

components(component_cols: str | list[str], *, exposure_col: str | None = None, groupby: str | list[str] | None = None, date_col: str | None = None, **kwargs: Any) → DataFrame: Explain component drivers between two periods.

credibility_weighted(groupby: str | list[str], *, z: Any, metric: str = 'loss_ratio', complement: float | None = None, out_col: str | None = None, **kwargs: Any) → DataFrame

Blend each group’s metric with a complement at credibility z.

Computes the grouped summary (by()), then blends metric toward complement using z (see actuarialpy.credibility_weighted_estimate()). z may be a scalar or values aligned to the grouped rows. When complement is omitted the book-level value of metric is used as the complement of credibility.

decompose_trend(*, count_col: str | None = None, loss_col: str | None = None, exposure_col: str | None = None, mix_by: str | Iterable[str] | None = None, groupby: str | list[str] | None = None, period_col: str | None = None, prior_period: Any = None, current_period: Any = None, date_col: str | None = None, prior_start: Any = None, prior_end: Any = None, current_start: Any = None, current_end: Any = None, prior_filter: Any = None, current_filter: Any = None) → DataFrame

Decompose the per-exposure loss trend between two periods of the bound data.

Splits the bound frame into prior and current with the same comparison modes as trend() – period_col with prior_period / current_period, a date_col with prior/current ranges (the bound date is used when no date_col is passed), or explicit prior_filter / current_filter masks – then decomposes the change via decompose_per_exposure_trend(), using the bound count, expense (as the loss), and exposure roles. Pass mix_by to add the third LMDI mix term; groupby reports one decomposition per group.
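
Without the mix term, the decomposition rests on the identity loss per exposure = frequency x severity, so the total trend factor is the product of the frequency and severity trend factors (additive in logs). A hand-rolled sketch with made-up totals; the output keys are illustrative and the LMDI mix term is deliberately left out.

```python
import math

def decompose_sketch(prior: dict, current: dict) -> dict:
    """Each argument holds claim count, loss, and exposure totals for a period."""
    def parts(p):
        frequency = p["count"] / p["exposure"]
        severity = p["loss"] / p["count"]
        return frequency, severity, p["loss"] / p["exposure"]

    f0, s0, l0 = parts(prior)
    f1, s1, l1 = parts(current)
    return {
        "total_trend": l1 / l0 - 1,
        "frequency_trend": f1 / f0 - 1,
        "severity_trend": s1 / s0 - 1,
        "log_total": math.log(l1 / l0),
        "log_frequency": math.log(f1 / f0),
        "log_severity": math.log(s1 / s0),
    }

result = decompose_sketch(
    prior={"count": 500, "loss": 1_000_000.0, "exposure": 10_000},
    current={"count": 550, "loss": 1_210_000.0, "exposure": 10_000},
)
```

Here frequency and severity each rise 10%, compounding to a 21% per-exposure trend.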

Return a new Experience with the seasonal pattern divided out.

Each selected column is divided by its row’s seasonal factor (as produced by seasonality_factors()), in place under the same name, so every downstream view – trend(), rolling(), by(), and the rest – then operates on the deseasonalized series. By default the expense (loss / claims) columns are adjusted; pass columns to choose others. Only the numerator is touched: exposure is left alone, so a deseasonalized PMPM is simply deseasonalized claims over unchanged member months.

factors may be a flat Series (one pattern) or a tidy per-segment table from seasonality_factors_by(); with the latter, pass by naming the grouping column(s) to join on group plus season. Estimate factors on the broader pool, not on this object’s own (often thin) data. To put the pattern back, apply apply_seasonality() to .data.

duration(*, entity_col: str, start_date_col: str, max_duration_month: int | None = None, date_col: str | None = None, **kwargs: Any) → DataFrame: Summarize experience by duration month since entity start.

filter(mask: Any | None = None, *, query: str | None = None, copy: bool = True) → Experience

Return a new Experience object over a filtered dataset.

Use either a boolean mask or a pandas query string.

fit_trend(*, value_col: str | None = None, exposure_col: str | None = None, date_col: str | None = None, freq: str = 'M', min_periods: int = 3, confidence: float = 0.95) → TrendFit

Fit an exponential trend to the bound experience by log-linear regression.

Defaults to the bound expense (claims) over the bound exposure – the PMPM trend – across the bound date; pass value_col / exposure_col to override, or leave the exposure unbound to trend the raw amount. Returns a TrendFit (see fit_trend()). Run on completed, deseasonalized history.
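
Log-linear regression here means ordinary least squares of log(value) on time; the slope, compounded over a year, is the annual trend. The sketch below shows only that point estimate on synthetic monthly data; it is not the package's TrendFit and omits the confidence interval.

```python
import math

def fit_exponential_trend(values, periods_per_year=12):
    """OLS of log(value) on the period index 0..n-1; returns the annualized trend."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 periods")
    t = list(range(n))
    y = [math.log(v) for v in values]
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    slope = sxy / sxx
    return math.exp(slope * periods_per_year) - 1

# Synthetic PMPM series growing exactly 0.5% per month.
pmpm = [100.0 * 1.005 ** m for m in range(24)]
annual_trend = fit_exponential_trend(pmpm)
```

On noise-free exponential data the fit recovers the generating rate, roughly 6.2% a year in this case.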

Per-group claim frequency, severity, and per-exposure loss (see frequency_severity_summary).

Uses the bound count, expense (as the loss), and exposure roles, so the columns are specified once on the object. The identity loss_per_exposure == frequency * severity holds for every row.

margin(groupby: str | list[str] | None = None, *, margin_col: str = 'margin', ratio_col: str = 'margin_ratio', per_exposure_col: str | None = None, **kwargs: Any) → DataFrame

Underwriting margin (revenue net of expense) by optional grouping.

Aggregates the bound expense and revenue roles with by(), then adds the margin (total_revenue - total_expense), the margin ratio, and an optional per-exposure margin.

pool_claimants(claimant_col: str, pooling_point: float, *, amount_cols: str | list[str] | None = None, groupby: str | list[str] | None = None, amount_name: str = 'total_expense', **kwargs: Any) → DataFrame

Aggregate to claimant level and split each claimant into pooled/excess.

Summarizes the experience to claimant grain (claimants()) and caps each claimant’s total at pooling_point (see actuarialpy.pool_losses()), returning pooled and excess columns for capped experience and the excess hand-off to tail modeling.
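
The split itself is a cap at the pooling point. A minimal sketch on claimant totals, reading "pooled" as the capped amount as the description above suggests; the helper name and figures are illustrative.

```python
def pool_claimant(total: float, pooling_point: float) -> tuple[float, float]:
    """Return (pooled, excess): the capped amount and the part above the cap."""
    pooled = min(total, pooling_point)
    excess = max(total - pooling_point, 0.0)
    return pooled, excess

claimant_totals = {"A": 40_000.0, "B": 310_000.0}
split = {c: pool_claimant(t, 250_000.0) for c, t in claimant_totals.items()}
```

Pooled plus excess always equals the claimant's original total, so the split loses no dollars.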

rolling(window: int = 12, *, groupby: str | list[str] | None = None, date_col: str | None = None, **kwargs: Any) → DataFrame: Create a rolling-period experience summary.

top_claimants(claimant_col: str, *, amount_cols: str | list[str] | None = None, amount_col: str | None = None, groupby: str | list[str] | None = None, n: int = 25, **kwargs: Any) → DataFrame: Return top claimants by amount.

trend(*, amount_col: str | None = None, exposure_col: str | None = None, groupby: str | list[str] | None = None, date_col: str | None = None, **kwargs: Any) → DataFrame: Compare amount or per-exposure experience between two periods.

views(views: dict[str, str | Iterable[str] | None], **kwargs: Any) → dict[str, DataFrame]: Create several named grouped experience views.

with_roles(*, data: DataFrame | None = None, expense: str | list[str] | None = None, revenue: str | list[str] | None = None, exposure: str | list[str] | None = None, date: str | None = None, profile: str | None = None, count: str | None = None, copy: bool | None = None) → Experience: Return a new Experience object with updated data or roles.

with_status(*, effective_col: str, as_of: Any, termination_col: str | None = None, first_year_months: int = 12, status_col: str = 'status', labels: dict[str, str] | None = None) → Experience

Return a new Experience with a derived lifecycle status column.

Derives active / first-year / termed from effective and termination dates as of a reference date (see actuarialpy.derive_status()). Summarize the result with by_status().

actual_to_expected(actual: Any, expected: Any) → Any: Calculate actual-to-expected: actual divided by expected.

Multiply or divide a column by a factor joined on a key.

The general factor-application primitive behind trend, benefit / area / demographic relativities, network discounts – any per-key multiplier. The factor for each row is taken from one of:

a scalar factors – one factor for every row (e.g. a single trend factor);
a Series indexed by on – one key column (e.g. an area factor by region);
a tidy DataFrame keyed by by + on with factor_col – per-segment factors (the shape the *_by estimators return).

and applied to value_col: how="multiply" gives value * factor (loads, trend), how="divide" gives value / factor (backing a factor out).

The join is by value (the frame’s index never participates); the factor table must be unique on its keys – a duplicate would fan out the data – which is enforced. An absent key gives default (NaN when default is None – a surfaced gap, never silently filled); pass default=1.0 when a key missing from the table should mean “no adjustment”. With audit_col, the cumulative net multiplier applied to value_col is accumulated there (factor for multiply, 1 / factor for divide), so a chain of adjustments leaves a per-row record of total restatement.
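
The join-by-value behavior can be approximated with a pandas left merge. This is a sketch of the idea rather than the package's code: the function name, the fixed "factor" column, and the multiply-only logic are simplifications, and validate="many_to_one" stands in for the uniqueness check.

```python
import pandas as pd

claims = pd.DataFrame({
    "region": ["north", "south", "west"],
    "paid": [100.0, 200.0, 300.0],
})
area = pd.DataFrame({"region": ["north", "south"], "factor": [1.10, 0.95]})

def apply_factor_sketch(df, factors, on, value_col, default=None, audit_col=None):
    # validate="many_to_one" raises MergeError if the factor table repeats a key
    merged = df.merge(factors[[on, "factor"]], on=on, how="left",
                      validate="many_to_one")
    factor = merged["factor"] if default is None else merged["factor"].fillna(default)
    out = df.copy()
    # Work on raw arrays so the frame's own index never participates.
    out[value_col] = df[value_col].to_numpy() * factor.to_numpy()
    if audit_col is not None:
        prior = out[audit_col].to_numpy() if audit_col in out.columns else 1.0
        out[audit_col] = prior * factor.to_numpy()
    return out

gap = apply_factor_sketch(claims, area, "region", "paid")
filled = apply_factor_sketch(claims, area, "region", "paid",
                             default=1.0, audit_col="restated")
```

In `gap` the "west" row surfaces as NaN because no factor exists for it; in `filled` it passes through unchanged and the audit column records a multiplier of 1.0.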

factor_lookup(df: DataFrame, factors: DataFrame, keys: str | Iterable[str], *, factor_col: str, default: float | None = None) → ndarray

Join a factor onto df by value on one or more existing key columns.

The single factor-join primitive behind grouped completion, seasonality, and adjust(). factors is a tidy table containing keys and factor_col; each row of df is matched on its keys values. The factor table must be unique on keys – a duplicate would fan rows out on the join – so this raises otherwise. Returns a float array aligned to df’s row order (the frame’s own index never participates). An absent key gives default (NaN when default is None – a surfaced gap, never silently filled).

combined_ratio(losses: Any, expenses: Any, revenue: Any) → Any: Calculate combined ratio: (losses + expenses) divided by revenue.

expense_ratio(expenses: Any, revenue: Any) → Any: Calculate an expense ratio: expenses divided by revenue.

frequency(claim_count: Any, exposure: Any) → Any: Calculate claim frequency: claim count divided by exposure.

indicated_change(required: Any, current: Any) → Any: Indicated change from current to required amount.

loss_ratio(losses_or_expenses: Any, revenue: Any) → Any: Calculate a loss ratio: losses or expenses divided by revenue.

per_exposure(amount: Any, exposure: Any) → Any: Calculate amount per exposure unit.

permissible_loss_ratio(expense_ratio: Any, profit_provision: Any = 0.0) → Any

Permissible (target / break-even) loss ratio.

PLR = 1 - expense_ratio - profit_provision where both loadings are expressed as a fraction of premium. Also called the zero-margin or target loss ratio: the loss ratio at which premium exactly covers losses, expenses, and the profit/contingency provision. Works element-wise on scalars or Series. (Shops that load fixed expenses on a loss basis instead use (1 - V - Q) / (1 + G); this implements the premium-basis form.)
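
Both forms mentioned above are one-liners. In the loss-basis variant the symbols are read here as V = variable expense ratio, Q = profit provision, and G = fixed expense as a ratio to losses; that reading is an assumption, since the text does not define them. Function names are illustrative.

```python
def plr_premium_basis(expense_ratio: float, profit_provision: float = 0.0) -> float:
    """PLR = 1 - expense_ratio - profit_provision (loadings as a share of premium)."""
    return 1.0 - expense_ratio - profit_provision

def plr_loss_basis(v: float, q: float, g: float) -> float:
    """(1 - V - Q) / (1 + G): fixed expense loaded as a ratio to losses."""
    return (1.0 - v - q) / (1.0 + g)

plr_a = plr_premium_basis(0.12, 0.03)            # 0.85
plr_b = plr_loss_basis(v=0.10, q=0.03, g=0.05)   # 0.87 / 1.05
```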

pure_premium(losses: Any, exposure: Any) → Any: Calculate pure premium: losses divided by exposure.

ratio(numerator: Any, denominator: Any) → Any: Calculate a generic ratio as numerator divided by denominator.

required_revenue(expense: Any, target_ratio: Any) → Any: Revenue needed for an expense amount to hit a target ratio.

safe_divide(numerator: Any, denominator: Any, *, fill_value: float = np.nan) → Any

Safely divide numerator by denominator.

The return type mirrors the input: scalars return scalars, array-likes return NumPy arrays, and pandas inputs return pandas objects with their index (and name) preserved – so results can be assigned straight back onto the source DataFrame. Where the denominator is zero, the result is fill_value (NaN by default).
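
The zero-denominator handling can be sketched with NumPy's masked division. This version always returns a float array and does not reproduce the type mirroring described above; the function name is illustrative.

```python
import numpy as np

def safe_divide_sketch(numerator, denominator, fill_value=np.nan):
    """Element-wise division; positions with a zero denominator get fill_value."""
    num = np.asarray(numerator, dtype=float)
    den = np.asarray(denominator, dtype=float)
    num, den = np.broadcast_arrays(num, den)
    out = np.full(num.shape, fill_value, dtype=float)
    # Divide only where the denominator is non-zero; other cells keep fill_value.
    np.divide(num, den, out=out, where=den != 0)
    return out

ratios = safe_divide_sketch([10.0, 5.0, 1.0], [2.0, 0.0, 4.0])
zero_filled = safe_divide_sketch([10.0, 5.0], [0.0, 5.0], fill_value=0.0)
```

Because the division is masked rather than computed and patched afterwards, no divide-by-zero warning is raised.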

severity(losses: Any, claim_count: Any) → Any: Calculate severity: losses divided by claim count.

class ChainLadder(age_to_age: Series, cdf: Series, completion_factors: Series, tail: float, method: str)

Bases: object

Chain-ladder development pattern fitted from a cumulative triangle.

Fit with fit() from a cumulative development triangle (for example the output of make_completion_triangle() with cumulative=True):

age_to_age – link (age-to-age) factors, indexed by their starting development period.
cdf – cumulative development factor to ultimate by development period, including the tail.
completion_factors – 1 / cdf by development period: the proportion of ultimate emerged by each development period. These are divide-convention factors in (0, 1] (completed = paid / factor), so they line up with validate_completion_factors() and downstream completion.

Use project() to apply the pattern to a triangle and get per-origin ultimate and IBNR.

classmethod fit(triangle: DataFrame, *, method: str = 'volume', tail: float = 1.0) → ChainLadder

Estimate the development pattern from a cumulative triangle.

method is "volume" (volume-weighted age-to-age factors, the default) or "simple" (straight average of individual link ratios). tail (>= 1) extends development beyond the latest observed development period.

project(triangle: DataFrame) → DataFrame

Project ultimate and IBNR per origin by applying the fitted pattern.

For each origin, takes its latest observed cumulative amount and multiplies by the cumulative development factor at that development period. Returns one row per origin with the latest development period, latest cumulative, development factor applied, ultimate, and IBNR (ultimate minus latest).
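
The fit-then-project flow can be traced on a three-origin toy triangle in plain Python. This is an illustrative sketch of the volume-weighted method with hypothetical helper names; the package works on DataFrames and returns Series.

```python
# Cumulative triangle: one list per origin; shorter rows are less developed.
triangle = {
    "2024-01": [100.0, 150.0, 165.0],
    "2024-02": [110.0, 165.0],
    "2024-03": [120.0],
}

def fit_volume_weighted(triangle, tail=1.0):
    n_dev = max(len(row) for row in triangle.values())
    age_to_age = []
    for d in range(n_dev - 1):
        # Only origins observed at both d and d + 1 contribute to the link.
        rows = [r for r in triangle.values() if len(r) > d + 1]
        age_to_age.append(sum(r[d + 1] for r in rows) / sum(r[d] for r in rows))
    # cdf[d] = product of link factors from d onward, times the tail.
    cdf = [tail] * n_dev
    for d in range(n_dev - 2, -1, -1):
        cdf[d] = age_to_age[d] * cdf[d + 1]
    completion = [1.0 / c for c in cdf]
    return age_to_age, cdf, completion

def project_sketch(triangle, cdf):
    out = {}
    for origin, row in triangle.items():
        latest = row[-1]
        ultimate = latest * cdf[len(row) - 1]
        out[origin] = {"latest": latest, "ultimate": ultimate,
                       "ibnr": ultimate - latest}
    return out

age_to_age, cdf, completion = fit_volume_weighted(triangle)
projection = project_sketch(triangle, cdf)
```

The youngest origin sits at the first development period, so it carries the largest factor (1.5 x 1.1) and the largest IBNR.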

exception InsufficientDataWarning

Bases: UserWarning

Emitted when a segment has too little data to fit and is skipped or aggregated.

Filter it with the standard warnings machinery, e.g. warnings.filterwarnings("ignore", category=InsufficientDataWarning).

chain_ladder_by(df: DataFrame, *, groupby: str | list[str], origin_col: str, valuation_col: str, amount_col: str, cumulative: bool = True, method: str = 'volume', tail: float = 1.0, on_insufficient: str = 'raise', warn: bool = True) → dict[Any, ChainLadder]

Fit a chain-ladder development pattern per segment of df.

Groups df by groupby, builds a development triangle for each segment (see make_completion_triangle()), and fits a ChainLadder to each. Returns {segment_key: ChainLadder} – the key is a scalar for a single grouping column, or a tuple for several.

Segments too small to fit (fewer than two origins or development periods, a zero cumulative, and so on) are handled by on_insufficient:

"raise" (default): raise a ValueError naming the failing segment.
"skip": omit those segments from the result.
"aggregate": use the pooled pattern fit on the whole frame for them.

When on_insufficient is "skip" or "aggregate" and warn is true, an InsufficientDataWarning naming the affected segments is emitted; warn=False suppresses it (the standard warnings filters also apply). To ignore thin segments entirely, use on_insufficient="skip", warn=False.

completion_factors(triangle: DataFrame, *, method: str = 'volume', tail: float = 1.0) → Series

Completion factors by development period, via chain-ladder.

Convenience wrapper around ChainLadder: returns the proportion of ultimate emerged by each development period (1 / cdf) estimated from a cumulative triangle. Divide-convention factors in (0, 1] (completed = paid / factor). See ChainLadder for the full pattern and per-origin ultimate/IBNR.

completion_factors_by(df: DataFrame, *, groupby: str | list[str], origin_col: str, valuation_col: str, amount_col: str, cumulative: bool = True, method: str = 'volume', tail: float = 1.0, on_insufficient: str = 'raise', warn: bool = True, development_name: str = 'development_month') → DataFrame

Completion factors per segment as a tidy table.

Convenience over chain_ladder_by(): one row per (segment, development period) with the completion factor, ready to review, pivot, or join. Columns are the grouping column(s), development_name, and completion_factor. on_insufficient and warn behave as in chain_ladder_by().

apply_completion(df: DataFrame, factors: Series | DataFrame, *, value_col: str, date_col: str | None = None, valuation_date: Any = None, development_col: str | None = None, by: str | list[str] | None = None, factor_col: str = 'completion_factor', development_name: str = 'development_month', out_col: str | None = None, copy: bool = True) → DataFrame

Develop a paid amount to estimated ultimate with completion factors.

For each row the development period is taken from development_col if supplied, otherwise computed as development_months(df[date_col], valuation_date) – the convention make_completion_triangle() uses, so factors from completion_factors() or completion_factors_by() join by construction. The completed amount is paid / factor (the divide convention, factors in (0, 1]).

factors may be either of:

a flat Series indexed by development period (one pattern for the whole frame), or
a tidy DataFrame of per-segment factors – grouping column(s), a development-period column (development_name) and a factor column (factor_col), the shape completion_factors_by() returns – joined on by plus development period. The table must be unique on by + [development] (a duplicate would fan out the data); this is checked.

The join is by value, never index alignment, so the frame’s own index is irrelevant. A row past its (group’s) largest development period is taken as fully complete (factor 1.0); a development period inside the fitted range but absent stays NaN – a surfaced gap; a row whose group is absent from the factor table stays NaN; a negative development period (incurred after valuation_date) raises. Supply either development_col, or both date_col and valuation_date.
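
The four row outcomes described above (developed, fully complete, surfaced gap, error) can be illustrated for a single flat pattern. The month-difference convention and the helper names are assumptions made for this sketch; the factor values are made up.

```python
import math
from datetime import date

def whole_months(incurred: date, valuation: date) -> int:
    # Assumed convention: calendar-month difference, ignoring the day of month.
    return (valuation.year - incurred.year) * 12 + (valuation.month - incurred.month)

def complete_sketch(paid: float, incurred: date, valuation: date, factors: dict) -> float:
    """factors maps development month -> proportion emerged, in (0, 1]."""
    dev = whole_months(incurred, valuation)
    if dev < 0:
        raise ValueError("incurred after the valuation date")
    if dev > max(factors):
        return paid                # past the triangle: treated as fully complete
    factor = factors.get(dev)      # None marks a gap inside the fitted range
    return math.nan if factor is None else paid / factor

factors = {0: 0.5, 1: 0.8, 3: 1.0}            # development month 2 is missing
valuation = date(2024, 6, 30)

recent = complete_sketch(100.0, date(2024, 6, 1), valuation, factors)    # 100 / 0.5
one_back = complete_sketch(100.0, date(2024, 5, 1), valuation, factors)  # 100 / 0.8
gap = complete_sketch(100.0, date(2024, 4, 1), valuation, factors)       # NaN
mature = complete_sketch(100.0, date(2023, 1, 1), valuation, factors)    # unchanged
```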

develop_ultimate(df: DataFrame, factors: Series | DataFrame, *, method: str = 'bornhuetter_ferguson', value_col: str, date_col: str | None = None, valuation_date: Any = None, development_col: str | None = None, apriori_col: str | None = None, exposure_col: str | None = None, by: str | list[str] | None = None, factor_col: str = 'completion_factor', development_name: str = 'development_month', out_col: str | None = None, copy: bool = True) → DataFrame

Develop a paid amount to estimated ultimate by a chosen reserving method.

All methods share one input – the proportion emerged at each row’s development period, joined exactly as apply_completion() does (flat Series or per-segment table, beyond-the-triangle rows fully emerged). They differ only in how they combine that with the paid-to-date and an a priori expectation:

"chain_ladder" – paid / emerged. Ignores the a priori; equivalent to apply_completion(). Volatile for immature periods (a thin latest diagonal drives the whole tail).
"bornhuetter_ferguson" – paid + apriori * (1 - emerged). Takes the unemerged portion from the a priori rather than from the data, so it is stable for green periods. Requires apriori_col (an expected ultimate per row – an input, e.g. a plan, budget, or manual times exposure).
"benktander" – one Bornhuetter-Ferguson iteration using the BF ultimate as the a priori: paid + bf * (1 - emerged). A credibility blend sitting between BF and chain ladder (weight emerged on chain ladder). Requires apriori_col.
"cape_cod" – Bornhuetter-Ferguson with the a priori derived from the data: a single expected loss ratio per segment, sum(paid) / sum(exposure * emerged), times each row’s exposure. Requires exposure_col (an on-level premium / exposure per row). The loss ratio is mechanical; the exposure base is an input.

The library applies a method; it does not pick the a priori or the exposure base. Supply either development_col or both date_col and valuation_date; pass by with a per-segment factor table (and Cape Cod then derives one loss ratio per segment). Returns df with an out_col (default f"{value_col}_ultimate").
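
The four formulas listed above are short enough to write out directly. The functions below are a stand-alone sketch on scalars and tuples with made-up inputs, not the package's DataFrame implementation.

```python
def chain_ladder(paid, emerged):
    return paid / emerged

def bornhuetter_ferguson(paid, emerged, apriori):
    return paid + apriori * (1.0 - emerged)

def benktander(paid, emerged, apriori):
    # One more BF step, using the BF ultimate as the a priori.
    bf = bornhuetter_ferguson(paid, emerged, apriori)
    return paid + bf * (1.0 - emerged)

def cape_cod(rows):
    """rows: (paid, exposure, emerged) tuples for one segment."""
    elr = sum(p for p, _, _ in rows) / sum(x * e for _, x, e in rows)
    return [p + elr * x * (1.0 - e) for p, x, e in rows]

# One immature row: 60 paid, half emerged, a priori ultimate of 100.
cl = chain_ladder(60.0, 0.5)                   # 120.0
bf = bornhuetter_ferguson(60.0, 0.5, 100.0)    # 110.0
bk = benktander(60.0, 0.5, 100.0)              # 115.0

# Cape Cod: expected loss ratio = 120 / (100 * 1.0 + 100 * 0.5) = 0.8
cc = cape_cod([(80.0, 100.0, 1.0), (40.0, 100.0, 0.5)])
```

The Benktander result lands between the other two, weighted by the proportion emerged: 0.5 x 120 + 0.5 x 110 = 115.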

ibnr(completed, paid)

IBNR as completed minus paid (the completed/paid identity).

Works element-wise on scalars or Series. completed and paid must be on the same basis; the result is the amount bridging paid-to-date to ultimate.

lag_months(incurred_date, valuation_date)

Whole months of development between incurred (origin) and valuation.

Either argument may be a scalar, a Series, or array-like, in any combination (e.g. a column of incurred dates against a single valuation date). The result is a Series when either argument is a Series, otherwise a scalar.

development_months(incurred_date, valuation_date)

Whole months of development between incurred (origin) and valuation.

make_completion_triangle(df: DataFrame, *, origin_col: str, valuation_col: str, amount_col: str, cumulative: bool = True, index_name: str = 'origin_period', development_name: str = 'development_month') → DataFrame

Build a development (completion) triangle by origin period and development period.

Each cell aggregates amount_col for an origin month at a given valuation development period (whole months between origin and valuation, via development_months()). amount_col is treated as the incremental amount in each (origin, development period) cell; with cumulative=True – the default, and the usual basis for estimating development/completion factors – the cells are accumulated across development period. Set cumulative=False to return the incremental triangle, or if your input amounts are already cumulative-to-date snapshots.

This consumes a compact development aggregate (one row per origin x valuation, i.e. months x months); it does not require transaction/line-level data.

validate_completion_factors(factors: DataFrame, factor_col: str = 'completion_factor', *, method: str = 'divide') → None

Validate completion-factor values for a selected convention.

divide factors (completed = paid / factor) should satisfy 0 < factor <= 1; multiply factors (completed = paid * factor) should satisfy factor >= 1. Useful as a sanity check on estimated factors before they are applied downstream.

status_summary(df: DataFrame, *, status_col: str, entity_col: str | None = None, expense_cols: str | Iterable[str], revenue_cols: str | Iterable[str], exposure_cols: str | Iterable[str] | None = None, profile: str | None = None) → DataFrame: Summarize experience by status, optionally adding entity counts.

Summarize experience by grouping columns.

Amounts and exposures are aggregated first. Ratios and per-exposure metrics are calculated after aggregation, which avoids averaging row-level ratios.

By default the ratio column is named loss_ratio (general across lines of business); the health profile names it mlr and the life profile names it benefit_ratio. profile only supplies light defaults and does not rename total expense or total revenue.
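The difference between a ratio of sums and an average of row-level ratios is easy to see with two rows of very different size. The figures below are made up for illustration.

```python
# Two hypothetical rows: a small block with a high ratio, a large block with a low one.
paid = [90.0, 450.0]
premium = [100.0, 900.0]

# Aggregate first, then divide: each row counts in proportion to its premium.
ratio_of_sums = sum(paid) / sum(premium)

# Averaging row-level ratios gives the small block the same weight as the large one.
mean_of_ratios = sum(p / r for p, r in zip(paid, premium)) / len(paid)
```

Here the ratio of sums is 0.54 while the unweighted mean of row ratios is 0.70.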

summarize_views(df: DataFrame, *, views: dict[str, str | Iterable[str] | None], expense_cols: str | Iterable[str], revenue_cols: str | Iterable[str], exposure_cols: str | Iterable[str] | None = None, ratio_col: str | None = None, ratio_name: str | None = None, total_expense_name: str = 'total_expense', total_revenue_name: str = 'total_revenue', profile: str | None = None) → dict[str, DataFrame][source]¶: Create multiple experience summary views from the same input data.

summarize_actual_vs_expected(df: DataFrame, *, groupby: str | Iterable[str] | None = None, actual_cols: str | Iterable[str], expected_cols: str | Iterable[str], exposure_cols: str | Iterable[str] | None = None, actual_name: str = 'actual', expected_name: str = 'expected', ae_name: str = 'actual_to_expected', variance_name: str = 'variance', variance_pct_name: str = 'variance_pct') → DataFrame[source]¶

Summarize actual-versus-expected results by optional grouping columns.

Actual and expected amounts are aggregated before ratios are calculated. This makes the function suitable for claim costs, benefits, expenses, revenue, or any other actual-versus-expected measure.

summarize_claimants(df: DataFrame, *, claimant_col: str, amount_cols: str | Iterable[str], groupby: str | Iterable[str] | None = None, exposure_col: str | None = None, amount_name: str = 'total_expense') → DataFrame[source]¶

Aggregate experience to claimant/member/risk level.

claimant_col can be a member ID, policy ID, claim group ID, or another entity identifier. The function is descriptive; it does not cap, pool, or otherwise adjust the underlying amounts.

top_claimants(df: DataFrame, *, claimant_col: str, amount_cols: str | Iterable[str] | None = None, amount_col: str | None = None, groupby: str | Iterable[str] | None = None, n: int = 25, amount_name: str = 'total_expense') → DataFrame[source]¶: Return the top claimants by amount, optionally within each group.

large_claimant_flags(df: DataFrame, *, amount_col: str = 'total_expense', thresholds: Sequence[float] = (50_000, 100_000, 250_000)) → DataFrame[source]¶: Add boolean flags for claimants above one or more amount thresholds.

claim_concentration(df: DataFrame, *, amount_col: str = 'total_expense', groupby: str | Iterable[str] | None = None, top_n: Sequence[int] = (10, 25), thresholds: Sequence[float] = (50_000, 100_000, 250_000)) → DataFrame[source]¶

Summarize how concentrated total amounts are among top claimants.

The input should generally be one row per claimant within the requested grouping level, such as the output of summarize_claimants.

cohort_summary(df: DataFrame, *, entity_col: str, date_col: str, start_date_col: str, duration_months: int = 12, groupby: str | Iterable[str] | None = None, expense_cols: str | Iterable[str], revenue_cols: str | Iterable[str], exposure_cols: str | Iterable[str] | None = None, profile: str | None = None) → DataFrame[source]¶

Summarize each entity’s first N months or cohort-duration window.

Each entity is clipped to its own first duration_months months of duration (month 1 is the entity’s start month), aligning entities by tenure rather than calendar time. The output also reports how much of that window is actually present, so partial (not-yet-mature) cohorts can be spotted and excluded:

months_observed: count of distinct duration months present (1..N).
last_month: latest experience month observed; with first_month this gives the available range.
complete: whether the full window is present, i.e. months_observed == duration_months.

For example, to keep only cohorts with a full first year:

cohorts = exp.cohort(entity_col="group", start_date_col="effective_date")
mature = cohorts[cohorts["complete"]]

cohort_summary_by_period(cohort_df: DataFrame, *, cohort_date_col: str = 'first_month', freq: str = 'Q', entity_col: str | None = None, expense_col: str = 'total_expense', revenue_col: str = 'total_revenue', exposure_cols: str | Iterable[str] | None = None) → DataFrame[source]¶: Roll entity-level cohort summaries into cohort month/quarter/year buckets.

frequency_severity_summary(df: DataFrame, *, count_col: str, loss_col: str, exposure_col: str, groupby: str | Iterable[str] | None = None) → DataFrame[source]¶

Per-group claim frequency, severity, and per-exposure loss.

Counts, losses, and exposure are aggregated first, then the rates are derived after aggregation (avoiding averaging row-level rates). The identity loss_per_exposure == frequency * severity holds for every row: frequency is claims per exposure unit, severity is loss per claim, and loss_per_exposure is loss per exposure unit (the pure premium).
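A short numeric check of the identity, using hypothetical aggregated totals for one group:

```python
# Hypothetical aggregated totals for a single group.
claim_count = 50
loss = 250_000.0
exposure = 1_000.0

frequency = claim_count / exposure    # claims per exposure unit
severity = loss / claim_count         # loss per claim
loss_per_exposure = loss / exposure   # pure premium
```

The count cancels in the product, so frequency * severity reproduces loss_per_exposure (0.05 * 5,000 = 250).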

decompose_per_exposure_trend(prior: DataFrame, current: DataFrame, *, count_col: str, loss_col: str, exposure_col: str, on: str | Iterable[str] | None = None, mix_by: str | Iterable[str] | None = None) → DataFrame[source]¶

Decompose the per-exposure loss change from prior to current.

With mix_by omitted this is the two-way split: both frames are summarized with frequency_severity_summary() (optionally by the on keys), aligned, and the change reported two exact ways:

Multiplicative trend: loss_per_exposure_trend == frequency_trend * severity_trend, where frequency_trend and severity_trend are the period-over-period ratios of frequency and severity.
Additive dollars: loss_per_exposure_change == frequency_effect + severity_effect via a symmetric (midpoint) split, so the contributions sum exactly to the per-exposure change.
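One common way to write a symmetric (midpoint) split is to value each factor's change at the average level of the other factor. The sketch below assumes that form; the library's exact arithmetic is not shown in this reference, and the figures are hypothetical.

```python
# Hypothetical prior and current frequency / severity.
f0, s0 = 0.050, 5_000.0
f1, s1 = 0.055, 5_200.0

# Multiplicative view: the trends multiply to the per-exposure trend.
frequency_trend = f1 / f0
severity_trend = s1 / s0
loss_per_exposure_trend = (f1 * s1) / (f0 * s0)

# Additive view: each change is valued at the midpoint of the other factor.
frequency_effect = (f1 - f0) * (s0 + s1) / 2
severity_effect = (s1 - s0) * (f0 + f1) / 2
loss_per_exposure_change = f1 * s1 - f0 * s0
```

Expanding the two effects shows the cross terms cancel, which is why the contributions sum exactly to the change (25.5 + 10.5 = 36 here).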

Pass mix_by (a column or list of columns) to add a third mix component. The per-exposure loss is then decomposed into frequency, severity, and the effect of the exposure composition shifting across the mix_by cells. Frequency and severity are measured within each cell (free of composition), and mix captures the aggregate movement that comes purely from the cell weights changing – the piece the two-way otherwise misattributes to frequency and severity. The split uses the LMDI (logarithmic mean Divisia index) convention, which is order-free and reconciles exactly: loss_per_exposure_trend == frequency_trend * severity_trend * mix_trend and loss_per_exposure_change == frequency_effect + severity_effect + mix_effect.

A list of columns in mix_by defines the cells as their cross – one blended mix term, not a per-column attribution; to attribute mix to each dimension separately, run the decomposition once per dimension. on and mix_by are orthogonal: on groups the output rows, mix_by defines the mix cells within each group. Every cell must have positive count, loss, and exposure in both periods.
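The three-way split can be illustrated with the standard additive LMDI formula, where each cell's contribution is weighted by the logarithmic mean of its prior and current value. This is a standalone sketch under that assumption, with hypothetical cells and a hypothetical `lmdi_three_way` helper; it is not the library's code.

```python
import math

# Hypothetical cells: name -> (exposure, claim count, loss).
prior = {"A": (600.0, 30.0, 150_000.0), "B": (400.0, 40.0, 120_000.0)}
current = {"A": (500.0, 26.0, 135_200.0), "B": (500.0, 52.0, 161_200.0)}


def logmean(a, b):
    """Logarithmic mean of two positive numbers."""
    return a if a == b else (a - b) / (math.log(a) - math.log(b))


def cell_factors(cells):
    """Per cell: (exposure share, frequency, severity)."""
    total_exposure = sum(e for e, _, _ in cells.values())
    return {c: (e / total_exposure, n / e, l / n) for c, (e, n, l) in cells.items()}


def lmdi_three_way(prior, current):
    f0, f1 = cell_factors(prior), cell_factors(current)
    effects = {"mix": 0.0, "frequency": 0.0, "severity": 0.0}
    for c in f0:
        v0, v1 = math.prod(f0[c]), math.prod(f1[c])  # cell's per-exposure loss
        weight = logmean(v1, v0)
        for name, x0, x1 in zip(effects, f0[c], f1[c]):
            effects[name] += weight * math.log(x1 / x0)
    total0 = sum(math.prod(v) for v in f0.values())
    total1 = sum(math.prod(v) for v in f1.values())
    scale = logmean(total1, total0)
    trends = {name: math.exp(e / scale) for name, e in effects.items()}
    return effects, trends, total0, total1


effects, trends, total0, total1 = lmdi_three_way(prior, current)
```

The logarithm of a zero or negative cell is undefined, which is consistent with the requirement that every cell be positive in both periods.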

duration_summary(df: DataFrame, *, entity_col: str, date_col: str, start_date_col: str, expense_cols: str | Iterable[str], revenue_cols: str | Iterable[str], exposure_cols: str | Iterable[str] | None = None, max_duration_month: int | None = None) → DataFrame[source]¶: Summarize experience by duration month since entity start.

rolling_summary(df: DataFrame, *, date_col: str, window: int = 12, groupby: str | Iterable[str] | None = None, expense_cols: str | Iterable[str], revenue_cols: str | Iterable[str], exposure_cols: str | Iterable[str] | None = None, min_periods: int | None = None, drop_incomplete: bool = True, ratio_col: str = 'loss_ratio') → DataFrame[source]¶

Calculate rolling sums and ratios by period and optional grouping.

The output includes period_start and period_end. By default only complete rolling windows are returned; for a 12-month window, the first output row appears after 12 months of data are available.
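A minimal sketch of complete-window rolling sums with a ratio taken on the windowed totals. The `rolling_rows` helper and the figures are hypothetical; the output field names follow the description above.

```python
periods = ["2024-01", "2024-02", "2024-03", "2024-04", "2024-05", "2024-06"]
paid = [120.0, 118.0, 125.0, 130.0, 128.0, 135.0]
premium = [150.0] * 6


def rolling_rows(periods, expense, revenue, window):
    """One row per complete window; incomplete leading windows are dropped."""
    rows = []
    for end in range(window - 1, len(periods)):
        start = end - window + 1
        total_expense = sum(expense[start:end + 1])
        total_revenue = sum(revenue[start:end + 1])
        rows.append({
            "period_start": periods[start],
            "period_end": periods[end],
            "total_expense": total_expense,
            "total_revenue": total_revenue,
            "loss_ratio": total_expense / total_revenue,  # ratio of windowed sums
        })
    return rows


rows = rolling_rows(periods, paid, premium, window=3)
```

With six months and a three-month window, the first row covers 2024-01 through 2024-03 and four rows are returned.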

annualized_trend(current: Any, prior: Any, months_between: float) → Any[source]¶: Annualize change between two values separated by a number of months.

midpoint_trend_factor(base_midpoint, projection_midpoint, annual_trend: Any) → Any[source]¶: Trend factor between base and projection midpoints.

period_change(current: Any, prior: Any) → Any[source]¶: Calculate period-over-period change: current / prior - 1.

project_forward(value: Any, annual_trend: Any, months: float) → Any[source]¶: Project a value forward using an annual trend rate.
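The scalar helpers above can be sketched with compound-growth arithmetic. The formulas for the annualized trend and the projection are assumptions based on the usual CAGR convention (only period_change is spelled out in this reference), and the `_sketch` names are hypothetical.

```python
def period_change_sketch(current, prior):
    """Period-over-period change: current / prior - 1."""
    return current / prior - 1


def annualized_trend_sketch(current, prior, months_between):
    """Assumed CAGR form: compound the ratio to a 12-month basis."""
    return (current / prior) ** (12 / months_between) - 1


def project_forward_sketch(value, annual_trend, months):
    """Assumed form: apply the annual trend over months / 12 years."""
    return value * (1 + annual_trend) ** (months / 12)


trend = annualized_trend_sketch(110.25, 100.0, 24)   # two years of 5% growth
projected = project_forward_sketch(100.0, trend, 24)
```

Projecting the prior value forward by the same number of months at the annualized trend recovers the current value.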

fit_trend(df: DataFrame, *, value_col: str, date_col: str, exposure_col: str | None = None, freq: str = 'M', min_periods: int = 3, confidence: float = 0.95) → TrendFit[source]¶

Fit an exponential trend to a rate series by log-linear regression.

Aggregates df to the freq grain (summing value_col and, if given, exposure_col), forms the rate – value / exposure (e.g. PMPM) when exposure_col is supplied, otherwise value itself – and fits log(rate) = intercept + slope * t by ordinary least squares, with t in years from the first period. The fitted annual trend is exp(slope) - 1.

Unlike annualized_trend() (a two-point CAGR between a single current and prior value), this uses every period, so one noisy month does not swing the estimate, and it returns goodness of fit and a confidence interval – what a developed (rather than received) trend is judged on. It does not select the trend: the window, the rate basis (allowed vs paid), any benefit leveraging, and the blend with external trends remain judgment. Run it on completed, deseasonalized history (complete -> deseasonalize -> fit_trend) so runout and seasonality do not contaminate the slope; apply the result with trend_factor()/TrendFit.factor() or adjust().

Time is measured from actual period dates, so an occasional missing period is handled correctly. Requires at least min_periods distinct periods with strictly positive rates (non-positive values, which cannot be logged, raise). Returns a TrendFit.
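The regression itself is ordinary least squares on the log of the rate. The following standalone sketch uses only the standard library and omits the confidence interval; `fit_log_linear` and the sample series are hypothetical, and one period is deliberately left out to show that time is taken from the actual dates.

```python
import math


def fit_log_linear(times_years, rates):
    """OLS fit of log(rate) = intercept + slope * t; returns (annual_trend, intercept, r_squared)."""
    logs = [math.log(r) for r in rates]   # math.log raises on non-positive rates
    n = len(logs)
    t_bar = sum(times_years) / n
    y_bar = sum(logs) / n
    sxx = sum((t - t_bar) ** 2 for t in times_years)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times_years, logs))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    ss_res = sum((y - (intercept + slope * t)) ** 2 for t, y in zip(times_years, logs))
    ss_tot = sum((y - y_bar) ** 2 for y in logs)
    return math.exp(slope) - 1, intercept, 1 - ss_res / ss_tot


# Monthly periods measured in years from the first period; month 5 is missing.
times = [i / 12 for i in range(12) if i != 5]
rates = [400.0 * 1.06 ** t for t in times]   # exact 6% annual growth

annual_trend, intercept, r_squared = fit_log_linear(times, rates)
```

Because the sample series grows at exactly 6% a year, the fit recovers 0.06 with an R-squared of essentially 1 despite the gap.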

class TrendFit(annual_trend: float, r_squared: float, std_error: float, ci_low: float, ci_high: float, confidence: float, n_periods: int, slope: float, intercept: float)[source]¶

Bases: object

Result of fit_trend(): an exponential trend fitted to a rate series.

annual_trend is the fitted multiplicative annual trend (exp(slope) - 1 on the log scale). r_squared is the goodness of fit, std_error the delta-method standard error of annual_trend, and (ci_low, ci_high) its confidence interval (asymmetric – the endpoints are transformed from the log-scale slope interval). slope and intercept describe the underlying log(value) = intercept + slope * t fit with t measured in years from the first period.

property ci: tuple[float, float]¶: The confidence interval as a (low, high) tuple.

factor(months: float) → float[source]¶: Trend factor over months at the fitted rate: (1 + annual_trend) ** (months / 12).

trend_factor(annual_trend: Any, months: float) → Any[source]¶: Convert an annual trend rate into a trend factor over a number of months.

trend_summary(df: DataFrame, *, period_col: str | None = None, prior_period=None, current_period=None, date_col: str | None = None, prior_start=None, prior_end=None, current_start=None, current_end=None, groupby=None, amount_col: str, exposure_col: str | None = None, prior_filter=None, current_filter=None, prior_label: str = 'prior', current_label: str = 'current') → DataFrame[source]¶

Summarize current vs prior trend by optional grouping.

Supported comparison modes:

period_col='year', prior_period=2025, current_period=2026
date_col='incurred_date' with prior/current start and end dates
explicit boolean prior_filter and current_filter masks

component_driver_analysis(df: DataFrame, *, period_col: str | None = None, prior_period=None, current_period=None, date_col: str | None = None, prior_start=None, prior_end=None, current_start=None, current_end=None, prior_filter=None, current_filter=None, component_cols: str | Iterable[str], exposure_col: str | None = None, groupby: str | Iterable[str] | None = None) → DataFrame[source]¶

Explain component drivers of change between two periods.

The primary comparison is based on component totals, or component amount per exposure when exposure_col is supplied. The API matches trend_summary and supports period-column, date-range, or explicit-filter comparisons.

component_trend(*args, **kwargs) → DataFrame[source]¶

Alias for component_driver_analysis.

The preferred name is component_driver_analysis because the function explains drivers of total component change, not just component-specific trend.

summarize_components(df: DataFrame, *, groupby: str | Iterable[str] | None = None, component_cols: str | Iterable[str], exposure_col: str | None = None, total_col: str = 'total_expense', include_shares: bool = True) → DataFrame[source]¶: Summarize component/category amounts, per-exposure values, and shares.

class Buhlmann(overall_mean: float, epv: float, vhm: float, n_obs: int)[source]¶

Bases: object

Bühlmann credibility model.

This implementation assumes each risk has the same number of observations.

Parameters:

overall_mean (float) – Estimated collective mean.
epv (float) – Estimated expected process variance (EPV).
vhm (float) – Estimated variance of hypothetical means (VHM).
n_obs (int) – Number of observations per risk.

classmethod fit(data: Any) → Buhlmann[source]¶

Fit a Bühlmann credibility model from data.

Parameters:

data (array-like, shape (m, n)) – Observations for m risks, each with n observations.

Returns:

Fitted Bühlmann model.

Return type:

Buhlmann

Notes

Estimators used:

overall_mean = mean of all observations
EPV = average of within-risk sample variances
VHM = sample variance of risk means minus EPV / n, floored at 0

property k: float¶: K = EPV / VHM. Returns infinity when VHM = 0.

premium(risk_mean: Any) → Any[source]¶

Compute the Bühlmann credibility premium Z * risk_mean + (1 - Z) * overall_mean.

Parameters:

risk_mean (float or array-like) – Risk-specific sample mean(s).

Returns:

Credibility-weighted premium(s).

Return type:

float or numpy.ndarray

property z: float¶: Credibility factor Z = n / (n + K). Returns 0 when K is infinite.
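The estimators listed in the Notes can be reproduced by hand on a tiny balanced data set. This is a standalone sketch using the standard library, not a call into the class; the data are made up.

```python
import math
import statistics

# Hypothetical data: 2 risks, 3 observations each.
data = [
    [10.0, 12.0, 14.0],
    [20.0, 22.0, 24.0],
]
n = len(data[0])

risk_means = [statistics.mean(row) for row in data]
overall_mean = statistics.mean(x for row in data for x in row)

# EPV: average of the within-risk sample variances.
epv = statistics.mean(statistics.variance(row) for row in data)

# VHM: sample variance of the risk means minus EPV / n, floored at 0.
vhm = max(statistics.variance(risk_means) - epv / n, 0.0)

k = epv / vhm if vhm > 0 else math.inf
z = n / (n + k)   # evaluates to 0.0 when k is infinite

premiums = [z * m + (1 - z) * overall_mean for m in risk_means]
```

Here EPV is 4 and VHM is 50 - 4/3, so Z is about 0.973 and each risk's premium sits close to its own mean.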

class BuhlmannStraub(overall_mean: float, epv: float, vhm: float, weights: Any)[source]¶

Bases: object

Bühlmann-Straub credibility model.

This implementation allows different exposure weights by risk and period.

Parameters:

overall_mean (float) – Estimated collective mean.
epv (float) – Estimated expected process variance (EPV).
vhm (float) – Estimated variance of hypothetical means (VHM).
weights (array-like) – Total weight (exposure) for each risk.

classmethod fit(data: Any, weights: Any) → BuhlmannStraub[source]¶

Fit a Bühlmann-Straub model from observations and weights.

Accepts either a 2D array (equal period counts) or a sequence of 1D arrays per risk (unequal period counts). The estimators are the general unbiased forms

\[\hat s^2 = \frac{\sum_i\sum_j w_{ij}(X_{ij}-\bar X_i)^2}{\sum_i(n_i-1)}, \quad \hat a = \frac{\sum_i m_i(\bar X_i-\bar X)^2 - (r-1)\hat s^2} {m - \sum_i m_i^2/m},\]

with \(k=\hat s^2/\hat a\) and \(Z_i=m_i/(m_i+k)\); a negative \(\hat a\) is floored at 0. For equal period counts this reduces to the usual estimator; unlike a divide-by-mean-weight approximation it stays unbiased when risks have different period counts or exposures.

Parameters:

data (array-like) – Either shape (r, n) (equal periods) or a sequence of r 1D arrays X_i whose lengths may differ.
weights (array-like) – Exposure weights matching the shape/structure of data.
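The estimator formulas above translate directly into a few sums. The sketch below is a standalone transcription for risks with unequal period counts; `fit_buhlmann_straub` is a hypothetical helper and the data are made up.

```python
import math


def fit_buhlmann_straub(data, weights):
    """Return (overall_mean, s2, a, k) from per-risk observations and weights."""
    r = len(data)
    m_i = [sum(ws) for ws in weights]                       # total weight per risk
    xbar_i = [sum(w * x for x, w in zip(xs, ws)) / m
              for xs, ws, m in zip(data, weights, m_i)]     # weighted risk means
    m = sum(m_i)
    xbar = sum(mi * xi for mi, xi in zip(m_i, xbar_i)) / m  # weighted overall mean
    s2 = sum(w * (x - xb) ** 2
             for xs, ws, xb in zip(data, weights, xbar_i)
             for x, w in zip(xs, ws)) / sum(len(xs) - 1 for xs in data)
    a = (sum(mi * (xi - xbar) ** 2 for mi, xi in zip(m_i, xbar_i)) - (r - 1) * s2) \
        / (m - sum(mi ** 2 for mi in m_i) / m)
    a = max(a, 0.0)                                         # floor a negative estimate
    k = s2 / a if a > 0 else math.inf
    return xbar, s2, a, k


# Hypothetical per-unit observations: risk A has 2 periods, risk B has 3.
data = [[1.0, 1.2], [2.0, 2.4, 2.2]]
weights = [[100.0, 100.0], [50.0, 50.0, 100.0]]

overall_mean, s2, a, k = fit_buhlmann_straub(data, weights)
z_a = 200.0 / (200.0 + k)   # credibility for a risk with total weight 200
```

For these inputs the process-variance estimate is 2.0 and the variance of hypothetical means is 0.595.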

classmethod from_frame(df, *, group: str, value: str, weight: str, period: str | None = None) → BuhlmannStraub[source]¶

Fit from long-format data: one row per (risk, period).

Parameters:

df (pandas.DataFrame) – Long-format observations.
group (str) – Column name for the risk identifier.
value (str) – Column name for the per-unit observation (e.g. loss per member-month).
weight (str) – Column name for the exposure weight.
period (str, optional) – Period column; used only to order observations within a risk. The number of observations per risk may differ.

Returns:

Fitted model with groups_ (risk labels), risk_means_, and weights (per-risk total exposure), all aligned to groups_.

Return type:

BuhlmannStraub

property k: float¶: K = EPV / VHM. Returns infinity when VHM = 0.

premium(risk_mean: Any, weight: Any) → Any[source]¶

Compute the Bühlmann-Straub premium Z_i * risk_mean_i + (1 - Z_i) * overall_mean.

Parameters:

risk_mean (float or array-like) – Risk-specific weighted mean(s).
weight (float or array-like) – Total exposure weight(s).

Returns:

Credibility-weighted premium(s).

Return type:

float or numpy.ndarray

z(weight: Any) → Any[source]¶

Credibility factor for a given total risk weight: Z_i = w_i / (w_i + K).

Parameters:

weight (float or array-like) – Total exposure weight(s).

Returns:

Credibility factor(s).

Return type:

float or numpy.ndarray

credibility_weighted_estimate(observed: Any, complement: Any, z: Any) → Any[source]¶

Blend an observed estimate with its complement at credibility z.

Returns z * observed + (1 - z) * complement. Scalar inputs return a native float; pandas.Series inputs return a Series with the index preserved; other array-like inputs return a numpy.ndarray. This is the atomic credibility operation; the z may come from one of the credibility models in this module, a filed credibility formula, or any other source.

limited_fluctuation_z(exposure: Any, full_credibility_standard: float) → Any[source]¶

Limited-fluctuation (classical) credibility factor – the square-root rule.

Returns Z = min(1, sqrt(exposure / full_credibility_standard)). exposure is the volume credibility is based on (claim counts, member months, life-years, …) and full_credibility_standard is the amount of that volume required for full (Z = 1) credibility – often a filed value. Scalars return a native float; pandas.Series inputs return a Series (index preserved); other array-likes return a numpy.ndarray, so credibility can be computed per group. Feed the result to credibility_weighted_estimate() to blend experience with its complement.
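The square-root rule and the blend are both one-liners. A scalar-only sketch follows; the `_sketch` names are hypothetical and the observed and complement ratios are made up.

```python
import math


def limited_fluctuation_z_sketch(exposure, full_credibility_standard):
    """Z = min(1, sqrt(exposure / full_credibility_standard))."""
    return min(1.0, math.sqrt(exposure / full_credibility_standard))


def blend(observed, complement, z):
    """z * observed + (1 - z) * complement."""
    return z * observed + (1 - z) * complement


z = limited_fluctuation_z_sketch(96_000, 120_000)
z_full = limited_fluctuation_z_sketch(200_000, 120_000)   # capped at 1
estimate = blend(observed=0.86, complement=0.80, z=z)
```

With 96,000 units against a 120,000 standard, Z is about 0.894, so the blended ratio lies much closer to the observed 0.86 than to the 0.80 complement.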

full_credibility_claims(*, confidence: float = 0.90, tolerance: float = 0.05, severity_cv: float | None = None) → float[source]¶

Classical full-credibility standard, in expected number of claims.

Returns the expected claim count for full credibility under the limited-fluctuation model: (z / k) ** 2 for claim frequency, where z is the standard-normal quantile for two-sided confidence and k is the tolerance. The classic 90% / 5% choice gives about 1082 claims. Supplying severity_cv (the coefficient of variation of individual claim severity) inflates it to (z / k) ** 2 * (1 + severity_cv ** 2) for aggregate losses rather than pure frequency.

Many shops use a filed standard instead; pass that straight to limited_fluctuation_z().
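The standard can be checked with the standard library's normal distribution. This sketch assumes the two-sided quantile is taken at (1 + confidence) / 2; the function name is hypothetical.

```python
from statistics import NormalDist


def full_credibility_claims_sketch(confidence=0.90, tolerance=0.05, severity_cv=None):
    """(z / k) ** 2, inflated by (1 + cv ** 2) when a severity CV is supplied."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # two-sided quantile
    n = (z / tolerance) ** 2
    if severity_cv is not None:
        n *= 1 + severity_cv ** 2
    return n


frequency_standard = full_credibility_claims_sketch()
aggregate_standard = full_credibility_claims_sketch(severity_cv=2.0)
```

The default 90% / 5% choice rounds to 1,082 claims; a severity CV of 2 multiplies the standard by 5.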

add_months_in_force(df: DataFrame, *, effective_col: str, period_start, period_end, termination_col: str | None = None, out_col: str = 'months_in_force', copy: bool = True) → DataFrame[source]¶

Add whole months of overlap between each entity’s in-force window and a period.

The in-force window is [effective, termination] (a missing termination means the period end). The result is clipped to [period_start, period_end] and floored at 0. Month counting is inclusive of both endpoint months, so a full coverage of an N-month period returns N.
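Inclusive month counting can be sketched by converting each date to a running month index. This standalone version assumes the day of the month is ignored, which this reference does not state; the helper names are hypothetical.

```python
from datetime import date


def month_index(d):
    """Running month number, so differences count calendar months."""
    return d.year * 12 + d.month


def months_in_force(effective, termination, period_start, period_end):
    """Whole months of overlap, inclusive of both endpoint months, floored at 0."""
    end = period_end if termination is None else min(termination, period_end)
    start = max(effective, period_start)
    return max(0, month_index(end) - month_index(start) + 1)


period = (date(2024, 1, 1), date(2024, 12, 31))

full_year = months_in_force(date(2023, 6, 1), None, *period)
partial = months_in_force(date(2024, 3, 15), date(2024, 8, 10), *period)
not_yet_effective = months_in_force(date(2025, 2, 1), None, *period)
```

An entity in force for the whole 12-month period returns 12, and one running from March through August returns 6.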

add_tenure(df: DataFrame, effective_col: str, as_of, *, tenure_col: str = 'tenure_months', one_based: bool = False, copy: bool = True) → DataFrame[source]¶

Add tenure in whole months from each entity’s effective date to as_of.

as_of is a single reference date (e.g. the experience as-of date). With one_based=True an entity effective in the as-of month has tenure 1 rather than 0, matching “months of experience” conventions.

derive_status(df: DataFrame, *, effective_col: str, as_of, termination_col: str | None = None, first_year_months: int = 12, status_col: str = 'status', labels: dict[str, str] | None = None, copy: bool = True) → DataFrame[source]¶

Derive an active / first-year / termed status as of a reference date.

Classification (in precedence order):

termed: a termination date is present and on/before as_of.
first_year: not termed and tenure (as_of minus effective) is less than first_year_months. The window is a parameter because “first year” means the first 12 months in some shops and the first policy year in others.
active: in force beyond the first-year window.

labels optionally remaps the three canonical values, e.g. {"first_year": "First Year Account", "termed": "Term"}.
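The precedence order reads naturally as a chain of early returns. In this sketch tenure is measured as a calendar-month difference, which is an assumption, and `classify` is a hypothetical helper.

```python
from datetime import date


def month_index(d):
    return d.year * 12 + d.month


def classify(effective, as_of, termination=None, first_year_months=12):
    """Return 'termed', 'first_year', or 'active', checked in that order."""
    if termination is not None and termination <= as_of:
        return "termed"
    tenure = month_index(as_of) - month_index(effective)
    if tenure < first_year_months:
        return "first_year"
    return "active"


as_of = date(2024, 12, 31)
statuses = [
    classify(date(2024, 7, 1), as_of),                      # 5 months of tenure
    classify(date(2022, 1, 1), as_of),                      # well past the window
    classify(date(2024, 7, 1), as_of, date(2024, 10, 31)),  # terminated before as_of
]
```

The third case shows why precedence matters: the entity is within its first year but is reported as termed.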

earned_exposure(df: DataFrame, exposure_col: str, *, effective_col: str, period_start, period_end, termination_col: str | None = None, period_months: int | None = None, out_col: str | None = None, copy: bool = True) → DataFrame[source]¶

Prorate a full-period exposure by the fraction of the period in force.

earned = exposure * months_in_force / period_months. Use this when each row carries a full-period exposure (e.g. annualized) that must be reduced for mid-period entry or termination. If your data is already monthly, filtering to in-force months with is_in_force() is usually simpler.

is_in_force(df: DataFrame, *, effective_col: str, period_start, period_end, termination_col: str | None = None) → Series[source]¶

Boolean Series: in force at any point during [period_start, period_end].

In force when effective on/before period_end and the entity had not terminated before period_start (a missing termination date means still in force).

assign_band(df: DataFrame, value_col: str, bands: Sequence[float], *, labels: Sequence[str] | None = None, band_col: str = 'band', right: bool = False, copy: bool = True) → DataFrame[source]¶

Assign each row to an ordered size band based on value_col.

bands are bin edges. For integer counts the natural form is left-closed (right=False), so bands=[0, 51, 76, 151, 251, 501, inf] yields [0, 51), [51, 76), …. A trailing float("inf") captures the open top band. The resulting column is an ordered categorical so downstream group-bys keep band order.
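Left-closed bins can be located with a binary search on the edges. The sketch below uses the standard bisect module and returns a text label rather than a categorical; `band_label` is a hypothetical helper.

```python
import bisect
import math

bands = [0, 51, 76, 151, 251, 501, math.inf]


def band_label(value, bands):
    """Label of the left-closed bin [lo, hi) containing value, or None if outside."""
    i = bisect.bisect_right(bands, value) - 1
    if i < 0 or i >= len(bands) - 1:
        return None
    return f"[{bands[i]}, {bands[i + 1]})"


labels = [band_label(v, bands) for v in (50, 51, 10_000)]
```

A group of 50 falls in [0, 51) while a group of 51 starts the next band, and the trailing infinity catches everything from 501 up.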

summarize_by_band(df: DataFrame, value_col: str, bands: Sequence[float], *, labels: Sequence[str] | None = None, expense_cols: str | Iterable[str], revenue_cols: str | Iterable[str], exposure_cols: str | Iterable[str] | None = None, band_col: str = 'band', ratio_col: str | None = None, right: bool = False, profile: str | None = None) → DataFrame[source]¶

Assign size bands then summarize experience grouped by band.

Returns one row per band in band order (empty bands included), with the same aggregates, loss ratio, and per-exposure metrics as summarize_experience().

add_margin(df: DataFrame, *, premium_col: str, expense_cols: str | Iterable[str], out_col: str = 'margin', ratio_col: str | None = None, exposure_col: str | None = None, per_exposure_col: str | None = None, copy: bool = True) → DataFrame[source]¶

Add an underwriting-margin column (premium minus summed expense columns).

expense_cols is summed row-wise and may mix losses and loadings (e.g. medical/claims, retention, commission, allocated overhead). Optionally also add the margin ratio (ratio_col) and a per-exposure margin (per_exposure_col, requires exposure_col) such as margin PMPM.

margin(premium: Any, expenses: Any) → Any[source]¶

Margin = premium - expenses, element-wise.

expenses should already be the total of losses plus any loadings.

margin_ratio(margin_amount: Any, premium: Any) → Any[source]¶: Margin as a fraction of premium = margin / premium.

class UnderwritingSummary(revenue: Mapping[str, float], losses: Mapping[str, float], expenses: Mapping[str, float] | float = 0.0, exposure: float | None = None, premium_label: str = 'premium', loss_ratio_denominator: str = 'total_revenue', expense_ratio_denominator: str = 'premium', gain_denominator: str = 'total_revenue')[source]¶

Bases: object

Two-tier underwriting income statement for a single entity or period.

Parameters:

revenue (Mapping[str, float]) – Labeled revenue components (e.g. {"premium": ..., "refund": ...}). Offsets such as refunds should be signed (negative). The library never interprets the labels; it only sums them.
losses (Mapping[str, float]) – Labeled loss components – claim or benefit expense by whatever categories the caller uses.
expenses (Mapping[str, float] | float) – Operating expense, itemized or as a single amount. Default 0.
exposure (float, optional) – Exposure units (member months, policy months, earned exposures, …) for per-exposure figures. Required only when a *_per_exposure property is accessed.
premium_label (str) – Which revenue component is the gross premium, used when a denominator is "premium". Default "premium".
loss_ratio_denominator (str) – Denominator for the loss ratio: "total_revenue" or "premium". Default "total_revenue". Defaults follow the common exhibit convention: loss and gain ratios over total revenue, expense ratio over gross premium.
expense_ratio_denominator (str) – Denominator for the expense ratio: "total_revenue" or "premium". Default "premium".
gain_denominator (str) – Denominator for the gain ratio: "total_revenue" or "premium". Default "total_revenue".

Examples

>>> uw = UnderwritingSummary(
...     revenue={"premium": 1_200_000.0, "refund": -4_000.0},
...     losses={"claims": 1_090_000.0},
...     expenses=110_000.0,
...     exposure=3_000.0,
... )
>>> round(uw.gross_margin, 0)
106000.0
>>> round(uw.gain, 0)
-4000.0

property combined_ratio: float¶: Loss ratio plus expense ratio, each on its own denominator.

property expense_ratio: float¶: Operating expense over the expense_ratio_denominator.

classmethod from_per_exposure(*, revenue_per_exposure: Mapping[str, float], loss_per_exposure: Mapping[str, float], expense_per_exposure: Mapping[str, float] | float = 0.0, exposure: float, **kwargs: Any) → UnderwritingSummary[source]¶

Build a summary from per-exposure components and total exposure.

Forecast exhibits are usually stated per exposure unit (PMPM in a health shop, per policy month in life); this converts each component to amounts by exposure so totals, per-exposure figures, and ratios all come from one set of inputs.

property gain: float¶

Tier two: gross margin less operating expense.

property gain_ratio: float¶: Gain / (loss) over the gain_denominator.

property gross_margin: float¶

Tier one: total revenue less loss expense (operating expense excluded).

property gross_margin_ratio: float¶: Gross margin over the loss_ratio_denominator (its complement).

property loss_ratio: float¶: Loss expense over the loss_ratio_denominator.

reconciliation() → float[source]¶

gain_ratio - (1 - combined_ratio): the mixed-denominator gap.

Zero when every denominator is the same series; otherwise the size of the drift introduced by quoting the loss, expense, and gain ratios over different bases. Useful as an exhibit footnote or a data-quality check.
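The gap can be worked out by hand from the figures in the Examples block, with the default denominators. This is plain arithmetic rather than a call into the class.

```python
# Figures from the Examples block above.
premium = 1_200_000.0
total_revenue = premium - 4_000.0   # premium plus the signed refund
losses = 1_090_000.0
expenses = 110_000.0
gain = total_revenue - losses - expenses


def reconciliation(loss_denom, expense_denom, gain_denom):
    """gain_ratio - (1 - combined_ratio) for the chosen denominators."""
    combined_ratio = losses / loss_denom + expenses / expense_denom
    return gain / gain_denom - (1 - combined_ratio)


# Defaults: loss and gain over total revenue, expense over gross premium.
mixed = reconciliation(total_revenue, premium, total_revenue)

# Same denominator everywhere: the gap vanishes.
uniform = reconciliation(total_revenue, total_revenue, total_revenue)
```

With mixed denominators the gap reduces to expenses * (1 / premium - 1 / total_revenue), about -0.0003 here.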

statement(*, profile: str | None = None, labels: Mapping[str, str] | None = None) → Series[source]¶: Exhibit-shaped Series: components, subtotals, tiers, then ratios.

to_frame(*, profile: str | None = None, labels: Mapping[str, str] | None = None) → DataFrame[source]¶

One tidy row of every total and ratio (per-exposure when given).

profile renames only the loss-ratio column to the domain’s ratio name ("health" -> mlr, "life" -> benefit_ratio); labels renames any output column. Calculations are unaffected.

underwriting_summary(df: DataFrame, *, groupby: str | Iterable[str] | None = None, revenue_cols: str | Iterable[str], loss_cols: str | Iterable[str], expense_cols: str | Iterable[str], exposure_col: str | None = None, premium_col: str | None = None, loss_ratio_denominator: str = 'total_revenue', expense_ratio_denominator: str = 'premium', gain_denominator: str = 'total_revenue', profile: str | None = None, labels: dict[str, str] | None = None) → DataFrame[source]¶

Grouped two-tier underwriting summary from a tidy table.

Component columns are summed first and every ratio is computed on the aggregated totals (ratio of sums, never an average of row-level ratios) – the same contract as actuarialpy.summarize_experience().

Parameters:

df (pd.DataFrame) – One row per entity / period at whatever grain is being rolled up.
groupby (str | Iterable[str], optional) – Grouping columns; omit for a single all-rows summary.
revenue_cols (str | Iterable[str]) – Revenue component columns. Revenue offsets (refunds) should be signed.
loss_cols (str | Iterable[str]) – Loss component columns (tier one).
expense_cols (str | Iterable[str]) – Operating expense component columns (tier two).
exposure_col (str, optional) – Exposure column; adds {amount}_per_{exposure_col} output columns. Domain-style names (a health shop’s _pmpm) are applied via labels, never inferred from the column name.
premium_col (str, optional) – Gross premium column, required when any denominator is "premium".
loss_ratio_denominator (str) – Denominator for the loss ratio: "total_revenue" or "premium"; see the module docstring for the convention discussion.
expense_ratio_denominator (str) – Denominator for the expense ratio: "total_revenue" or "premium".
gain_denominator (str) – Denominator for the gain ratio: "total_revenue" or "premium".
profile (str, optional) – Renames only the loss-ratio column to the domain’s ratio name ("health" -> mlr, "life" -> benefit_ratio).
labels (dict, optional) – Explicit output column renames, applied after profile.

Returns:

Group keys, component sums, total_revenue, total_loss, total_expense, gross_margin, gain, the three ratios plus gross_margin_ratio and gain_ratio, and per-exposure columns when exposure_col is given.

Return type:

pd.DataFrame

weighted_mean(values: Any, weights: Any, *, skipna: bool = False) → float[source]¶

Weighted mean with validated, explicit weights.

Parameters:

values (array-like) – Row-level rates or ratios to average.
weights (array-like) – Non-negative, finite weights, same length as values, with a positive total.
skipna (bool) – When True, pairs where the value is NaN are dropped before averaging. Default False: a NaN value propagates to the result, so missing data surfaces instead of silently shrinking the base.

weighted_summary(df: DataFrame, *, value_cols: str | Iterable[str], weight_col: str, groupby: str | Iterable[str] | None = None, skipna: bool = False) → DataFrame[source]¶

Grouped weighted means of one or more value columns.

Each value column x produces x_weighted = \(\sum wx / \sum w\) per group; the weight total is reported as {weight_col}_total so the base of every average is visible.

Typical use: premium-weighted rate actions by cohort, exposure-weighted persistency by segment.
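The effect of skipna is easiest to see on a short list with one missing value. A standalone sketch with a hypothetical helper and made-up rates:

```python
import math


def weighted_mean_sketch(values, weights, skipna=False):
    """sum(w * x) / sum(w); NaN values propagate unless skipna is True."""
    pairs = list(zip(values, weights))
    if skipna:
        pairs = [(v, w) for v, w in pairs if not math.isnan(v)]
    total_weight = sum(w for _, w in pairs)
    if total_weight <= 0:
        raise ValueError("weights must have a positive total")
    return sum(v * w for v, w in pairs) / total_weight


rate_actions = [0.05, 0.08, math.nan]
premium = [100.0, 300.0, 50.0]

strict = weighted_mean_sketch(rate_actions, premium)
skipped = weighted_mean_sketch(rate_actions, premium, skipna=True)
```

The strict result is NaN, which flags the missing rate; with skipna=True the base shrinks from 450 to 400 and the mean is 0.0725.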

excess_over_threshold(df: DataFrame, loss_col: str, threshold: float, *, keep_cols: str | Iterable[str] | None = None, excess_col: str = 'excess') → DataFrame[source]¶

Return losses strictly above threshold with their excess amount.

excess = loss - threshold for rows where loss > threshold. This is the excess-over-threshold sample used to fit a tail (e.g. a generalized Pareto distribution in extremeloss) or a severity distribution in lossmodels; the threshold is the EVT exceedance threshold / pooling point. keep_cols carries identifier or covariate columns through.

pool_losses(df: DataFrame, loss_col: str, pooling_point: float, *, pooled_col: str = 'pooled_loss', excess_col: str = 'excess_loss', copy: bool = True) → DataFrame[source]¶

Split each loss into a pooled (capped) portion and an excess portion.

pooled = min(loss, pooling_point) is the retained amount used in the group’s experience; excess = max(loss - pooling_point, 0) is the portion pooled across the block. Summing pooled_col by group gives capped experience; summing excess_col gives the pooled excess. The input is typically one row per claimant (e.g. the output of summarize_claimants).
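A per-claimant illustration of the split, using made-up losses and a 100,000 pooling point:

```python
pooling_point = 100_000.0
losses = [20_000.0, 150_000.0, 400_000.0]   # hypothetical claimant totals

pooled = [min(loss, pooling_point) for loss in losses]
excess = [max(loss - pooling_point, 0.0) for loss in losses]

capped_experience = sum(pooled)   # stays in the group's experience
pooled_excess = sum(excess)       # spread across the block
```

The two portions always add back to the original loss, so the split moves dollars between views without creating or losing any.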

retained_cv(outcomes, retention, *, n_units=1)[source]¶

Coefficient of variation of the retained aggregate of n_units iid units.

Each unit’s outcome is retained (capped) at retention – min(outcome, retention) – and n_units such units are summed. For independent units this CV is cv(min(X, retention)) / sqrt(n_units), where X is drawn from the per-unit outcome sample outcomes (array-like). Capping discards everything above retention, so only the body of outcomes matters.

Parameters:

outcomes (array-like) – Per-unit outcome sample (e.g. one value per member-year, claim, or risk).
retention (float or array-like) – Cap applied to each unit. Scalar returns a float; an array returns the CV at each retention.
n_units (int, default 1) – Number of independent units in the aggregate.

Returns:

Coefficient of variation of the retained aggregate.

Return type:

float or numpy.ndarray
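A minimal NumPy sketch of the formula cv(min(X, retention)) / sqrt(n_units) follows. It handles a scalar retention only and assumes a population standard deviation (ddof=0); the documentation above does not say which estimator the library uses, so treat that as an assumption. The function name and sample are hypothetical.

```python
import numpy as np


def retained_cv_sketch(outcomes, retention, n_units=1):
    """CV of the sum of n_units iid units, each capped at a scalar retention."""
    capped = np.minimum(np.asarray(outcomes, dtype=float), retention)
    # Assumption: population std (ddof=0).
    return capped.std() / capped.mean() / np.sqrt(n_units)


outcomes = [0.0, 0.0, 10.0, 30.0]           # made-up per-unit sample
cv_one = retained_cv_sketch(outcomes, 10.0)  # capped sample is [0, 0, 10, 10]
cv_four = retained_cv_sketch(outcomes, 10.0, n_units=4)
```

With four independent units the CV halves, since sqrt(4) = 2.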

retention_for_target_cv(outcomes, n_units, target_cv, *, bounds=None, n_grid=256)[source]¶

Retention at which the retained aggregate of n_units units hits a target CV.

Inverts retained_cv(). The single-unit retained CV increases with the retention, so this solves retained_cv(outcomes, u, n_units=n_units) == target_cv for the retention u by interpolation over a grid spanning bounds (default min..max of outcomes). Targets below or above the achievable range clamp to the lower or upper bound. Holding target_cv fixed, a larger n_units yields a higher retention: more independent units stabilize the aggregate, so less needs to be capped. This is the basis for a size-graded retention rule.

Parameters:

outcomes (array-like) – Per-unit outcome sample.
n_units (int) – Number of independent units in the aggregate.
target_cv (float) – Desired coefficient of variation of the retained aggregate.
bounds (tuple(float, float), optional) – (lo, hi) retention search bounds. Defaults to the min and max of outcomes.
n_grid (int, default 256) – Number of grid points spanning bounds.

Returns:

The retention level, clamped to bounds.

Return type:

float
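The grid inversion described above can be sketched with np.interp. This is an illustration under stated assumptions, not the library's code: the helper names are hypothetical, the standard deviation is the population one (ddof=0), bounds are fixed at the sample min and max, and the sample is strictly positive so the capped mean is never zero.

```python
import numpy as np


def capped_cv(outcomes, retention, n_units=1):
    """CV of the sum of n_units iid units, each capped at retention."""
    x = np.minimum(np.asarray(outcomes, dtype=float), retention)
    return x.std() / x.mean() / np.sqrt(n_units)


def retention_for_target_cv_sketch(outcomes, n_units, target_cv, n_grid=256):
    x = np.asarray(outcomes, dtype=float)
    grid = np.linspace(x.min(), x.max(), n_grid)   # candidate retentions
    cv = np.array([capped_cv(x, u, n_units) for u in grid])
    # cv does not decrease along the grid, so np.interp can invert it;
    # targets outside [cv[0], cv[-1]] clamp to the ends of the grid.
    return float(np.interp(target_cv, cv, grid))


outcomes = [1.0, 2.0, 5.0, 20.0, 100.0]   # made-up, strictly positive sample
r_small = retention_for_target_cv_sketch(outcomes, n_units=1, target_cv=0.5)
r_large = retention_for_target_cv_sketch(outcomes, n_units=4, target_cv=0.5)
```

Here r_large exceeds r_small: with four independent units the same target CV tolerates a higher cap.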

business_days_in_period(periods: Any, *, freq: str = 'M', holidays: Any = 'us_federal', weekmask: str = 'Mon Tue Wed Thu Fri') → Series[source]¶

Count business days (weekdays minus holidays) in each distinct period.

periods is any set of dates; they are mapped to their period (month or quarter) and de-duplicated. holidays is "us_federal" (pandas’ built-in US federal calendar), None (weekdays only), or a list of holiday dates. weekmask controls which weekdays count. Returns a Series indexed by period start timestamp.

add_business_days(df: DataFrame, date_col: str, *, freq: str = 'M', out_col: str = 'business_days', holidays: Any = 'us_federal', weekmask: str = 'Mon Tue Wed Thu Fri', copy: bool = True) → DataFrame[source]¶

Add a column with the number of business days in each row’s period.

Divide a paid-amount column by this to get an amount-per-business-day series that is comparable across short and long months.
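For intuition, weekday counts per month can be reproduced with NumPy's np.busday_count. The sketch below is independent of the library. Its single hand-written holiday stands in for a real calendar and is not the US federal calendar; the paid amounts are made up.

```python
import numpy as np

# Month starts, and the first day of the following month (end is exclusive).
starts = np.array(["2024-01", "2024-02", "2024-03"], dtype="datetime64[M]")
ends = starts + 1
start_days = starts.astype("datetime64[D]")
end_days = ends.astype("datetime64[D]")

# Weekdays only (Mon-Fri).
weekdays = np.busday_count(start_days, end_days)

# Same count with an illustrative one-date holiday list.
holidays = np.array(["2024-01-01"], dtype="datetime64[D]")
business_days = np.busday_count(start_days, end_days, holidays=holidays)

# Amount per business day, comparable across short and long months.
paid = np.array([120_000.0, 118_000.0, 125_000.0])
paid_per_business_day = paid / business_days
```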

seasonality_factors(df: DataFrame, *, date_col: str, value_col: str, exposure_col: str | None = None, freq: str = 'M', method: str = 'ratio_to_moving_average', aggregate: str = 'mean', exclude: Iterable[int] | None = None, min_years: int = 2) → Series[source]¶

Estimate seasonal factors – one multiplier per calendar period, mean 1.0.

The series is first aggregated to the period grain (summing value_col and, if given, exposure_col). With exposure_col the factors are computed on the rate value / exposure (e.g. PMPM), which is the right basis for health seasonality; without it they are computed on the value directly.

Methods:

"ratio_to_moving_average" (default): classical multiplicative decomposition. Each period is divided by a centered moving average (which removes trend and level), and the seasonal factor for a calendar period is the average of those ratios across years. Robust to trend and membership growth.
"period_share": each period expressed as a share of its own year’s average, then averaged by calendar period. Simpler, but assumes little within-year trend.

aggregate is "mean" or "median" (median is more robust to outlier months). exclude drops whole years from the estimate, e.g. exclude=[2020, 2021] to keep COVID-distorted years out of the factors. A warning is issued when fewer than min_years years inform any period. Factors are normalized to average exactly 1.0.
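The "period_share" method is simple enough to sketch by hand. The example below builds two made-up years of monthly PMPM with a known seasonal pattern and a level shift between years, then recovers the pattern. It illustrates the idea only; the column names are assumptions and the library's handling of dates, exposure and exclusions is not reproduced.

```python
import pandas as pd

# Known monthly pattern (averages 1.0), applied at two different levels.
pattern = [1.2, 1.1, 1.0, 0.9, 0.9, 0.8, 0.8, 0.9, 1.0, 1.1, 1.1, 1.2]
df = pd.DataFrame({
    "year": [2022] * 12 + [2023] * 12,
    "month": list(range(1, 13)) * 2,
    "pmpm": [100.0 * p for p in pattern] + [110.0 * p for p in pattern],
})

# Each month as a share of its own year's average ...
share = df["pmpm"] / df.groupby("year")["pmpm"].transform("mean")
# ... averaged by calendar month, then normalized to mean 1.0.
factors = share.groupby(df["month"]).mean()
factors = factors / factors.mean()
```

Because each year is divided by its own average, the jump from 100 to 110 between years does not distort the factors; a trend within a year would, which is the limitation noted for this method.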

seasonality_factors_by(df: DataFrame, *, groupby: str | list[str], date_col: str, value_col: str, exposure_col: str | None = None, freq: str = 'M', method: str = 'ratio_to_moving_average', aggregate: str = 'mean', exclude: Iterable[int] | None = None, min_years: int = 2, season_name: str = 'season', warn: bool = True) → DataFrame[source]¶

Seasonal factors per segment as a tidy table.

Fits seasonality_factors() within each segment of groupby and stacks the results into one row per (segment, season) – columns are the grouping column(s), season_name, and seasonal_factor – the shape deseasonalize() and apply_seasonality() consume via by=. Seasons absent from a segment’s history are omitted for that segment (they surface as NaN on join). Set warn=False to silence the thin-history InsufficientDataWarning per segment.

deseasonalize(df: DataFrame, factors: Series | DataFrame, *, date_col: str, value_col: str, freq: str = 'M', by: str | list[str] | None = None, factor_col: str = 'seasonal_factor', season_name: str = 'season', out_col: str | None = None, copy: bool = True) → DataFrame[source]¶

Divide value_col by each row’s seasonal factor, removing the pattern.

factors is either a flat Series indexed by season (one pattern for the frame) or a tidy per-segment DataFrame – grouping column(s), a season column (season_name) and a factor column (factor_col), the shape seasonality_factors_by() returns – joined on by plus season. The grouped join is by value (index irrelevant), the factor table must be unique on by + [season], and a row whose (group, season) is absent yields NaN.

apply_seasonality(df: DataFrame, factors: Series | DataFrame, *, date_col: str, value_col: str, freq: str = 'M', by: str | list[str] | None = None, factor_col: str = 'seasonal_factor', season_name: str = 'season', out_col: str | None = None, copy: bool = True) → DataFrame[source]¶

Multiply value_col by each row’s seasonal factor, adding the pattern back.

factors may be flat (Series indexed by season) or a tidy per-segment table joined on by plus season; see deseasonalize() for the grouped-table contract.
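The grouped-table contract can be pictured as a left join on the grouping column(s) plus season. The sketch below uses plain pandas on made-up data and assumes the season key is the calendar month number, which the text above does not state. It is not the library's implementation.

```python
import pandas as pd

# Hypothetical experience rows and a per-segment factor table.
df = pd.DataFrame({
    "product": ["PPO", "PPO", "HMO"],
    "month": pd.to_datetime(["2024-01-01", "2024-07-01", "2024-01-01"]),
    "pmpm": [120.0, 90.0, 55.0],
})
factors = pd.DataFrame({
    "product": ["PPO", "PPO"],
    "season": [1, 7],              # assumption: season = calendar month
    "seasonal_factor": [1.2, 0.9],
})

# Join by value on (product, season); validate enforces a unique factor table.
keyed = df.assign(season=df["month"].dt.month)
joined = keyed.merge(factors, on=["product", "season"],
                     how="left", validate="many_to_one")

# Deseasonalize by dividing; multiplying back restores the original value.
joined["pmpm_deseasonalized"] = joined["pmpm"] / joined["seasonal_factor"]
joined["pmpm_restored"] = joined["pmpm_deseasonalized"] * joined["seasonal_factor"]
```

The HMO row has no matching factor, so its deseasonalized value is NaN rather than being silently passed through.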

discount_factor(i: float, t: float = 1.0) → float[source]¶: Discount factor \(v^t = (1+i)^{-t}\).

accumulation_factor(i: float, t: float = 1.0) → float[source]¶: Accumulation factor \((1+i)^t\).

effective_discount(i: float) → float[source]¶: Effective rate of discount \(d = i/(1+i) = 1 - v\).

force_of_interest(i: float) → float[source]¶: Force of interest \(\delta = \ln(1+i)\).

rate_from_force(delta: float) → float[source]¶: Effective rate from the force of interest: \(i = e^\delta - 1\).

nominal_interest(i: float, m: int) → float[source]¶: Nominal interest convertible m times: \(i^{(m)} = m[(1+i)^{1/m}-1]\).

nominal_discount(i: float, m: int) → float[source]¶: Nominal discount convertible m times: \(d^{(m)} = m[1-v^{1/m}]\).

rate_from_nominal_interest(nominal: float, m: int) → float[source]¶: Effective rate from a nominal interest rate: \((1+i^{(m)}/m)^m - 1\).

rate_from_nominal_discount(nominal: float, m: int) → float[source]¶: Effective rate from a nominal discount rate: \((1-d^{(m)}/m)^{-m} - 1\).
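The rate conversions above are mutually consistent, which a few lines of standard-library Python can confirm. This sketch evaluates the formulas directly instead of calling the library; the variable names are illustrative.

```python
import math

i = 0.06   # effective annual rate
m = 12     # conversions per year

v = 1 / (1 + i)                       # one-year discount factor
d = i / (1 + i)                       # effective rate of discount
delta = math.log(1 + i)               # force of interest
i_m = m * ((1 + i) ** (1 / m) - 1)    # nominal interest convertible m-thly
d_m = m * (1 - v ** (1 / m))          # nominal discount convertible m-thly

# Each inverse recovers the effective rate.
i_from_force = math.exp(delta) - 1
i_from_nominal_interest = (1 + i_m / m) ** m - 1
i_from_nominal_discount = (1 - d_m / m) ** (-m) - 1
```

For m > 1 the rates order as d < d_m < delta < i_m < i, a quick check when a conversion looks suspicious.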

present_value(amount: float, i: float, t: float) → float[source]¶: Present value of a single amount due in t years.

future_value(amount: float, i: float, t: float) → float[source]¶: Accumulated value of a single amount after t years.

annuity_immediate(i: float, n: int) → float[source]¶: Present value of an annuity-immediate \(a_{\overline{n}|}=(1-v^n)/i\).

annuity_due(i: float, n: int) → float[source]¶: Present value of an annuity-due \(\ddot a_{\overline{n}|}=(1-v^n)/d\).

accumulated_immediate(i: float, n: int) → float[source]¶: Accumulated value of an annuity-immediate \(s_{\overline{n}|}\).

accumulated_due(i: float, n: int) → float[source]¶: Accumulated value of an annuity-due \(\ddot s_{\overline{n}|}\).

perpetuity_immediate(i: float) → float[source]¶: Present value of a perpetuity-immediate \(1/i\).

perpetuity_due(i: float) → float[source]¶: Present value of a perpetuity-due \(1/d\).

deferred_annuity_immediate(i: float, n: int, defer: int) → float[source]¶: Present value of an n-year annuity-immediate deferred defer years.

annuity_continuous(i: float, n: int) → float[source]¶: Present value of a continuous annuity \(\bar a_{\overline{n}|}=(1-v^n)/\delta\).

annuity_immediate_mthly(i: float, n: int, m: int) → float[source]¶: Present value of an m-thly annuity-immediate \(a^{(m)}_{\overline{n}|}\).
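The closed forms for level annuities can be checked against a brute-force sum of discounted payments. The sketch below uses plain Python and standard formulas: the accumulated value \(s_{\overline{n}|} = ((1+i)^n - 1)/i\) and the deferred value \(v^{\text{defer}} a_{\overline{n}|}\) are textbook results, not quoted from the library.

```python
i, n, defer = 0.05, 10, 3
v = 1 / (1 + i)
d = i / (1 + i)

# Closed forms.
a_imm = (1 - v**n) / i              # annuity-immediate
a_due = (1 - v**n) / d              # annuity-due
s_imm = ((1 + i) ** n - 1) / i      # accumulated annuity-immediate
a_deferred = v**defer * a_imm       # n-year annuity deferred `defer` years

# Brute force: discount each unit payment individually.
pv_imm = sum(v**t for t in range(1, n + 1))                 # times 1..n
pv_due = sum(v**t for t in range(0, n))                     # times 0..n-1
pv_deferred = sum(v**t for t in range(defer + 1, defer + n + 1))
```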

increasing_annuity_immediate(i: float, n: int) → float[source]¶

Present value of an increasing annuity \((Ia)_{\overline{n}|}\).

Payments of 1, 2, …, n at times 1, …, n.

decreasing_annuity_immediate(i: float, n: int) → float[source]¶

Present value of a decreasing annuity \((Da)_{\overline{n}|}\).

Payments of n, n-1, …, 1 at times 1, …, n.
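The payment patterns for the increasing and decreasing annuities translate directly into sums. The sketch below compares them with the standard closed forms \((Ia)_{\overline{n}|} = (\ddot a_{\overline{n}|} - n v^n)/i\) and \((Da)_{\overline{n}|} = (n - a_{\overline{n}|})/i\), which are textbook identities and are not stated in this reference.

```python
i, n = 0.05, 10
v = 1 / (1 + i)
d = i / (1 + i)
a_imm = (1 - v**n) / i
a_due = (1 - v**n) / d

# Closed forms (standard identities).
Ia = (a_due - n * v**n) / i
Da = (n - a_imm) / i

# Brute force from the payment patterns.
Ia_sum = sum(t * v**t for t in range(1, n + 1))            # 1, 2, ..., n
Da_sum = sum((n - t + 1) * v**t for t in range(1, n + 1))  # n, n-1, ..., 1
```

The two streams together pay n + 1 at every time, so Ia + Da equals (n + 1) times the level annuity-immediate.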

geometric_annuity_immediate(i: float, n: int, growth: float) → float[source]¶

Present value of a geometrically increasing annuity-immediate.

Payments \(1, (1+g), (1+g)^2, \ldots\) at times \(1, \ldots, n\):

\[\frac{1 - \left(\frac{1+g}{1+i}\right)^n}{i - g}, \qquad i \neq g.\]
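The formula can be verified by summing the discounted payments directly. The sketch below is plain Python with a hypothetical function name. It also handles \(i = g\), which the formula excludes, with the standard limit \(n/(1+i)\): each payment then has present value \(1/(1+i)\). How the library treats that case is not stated here.

```python
def geometric_annuity_sketch(i, n, g):
    """PV of payments 1, (1+g), ..., (1+g)**(n-1) at times 1..n."""
    if i == g:
        # Limit case: every payment discounts to 1 / (1 + i).
        return n / (1 + i)
    return (1 - ((1 + g) / (1 + i)) ** n) / (i - g)


i, n, g = 0.06, 15, 0.02
closed_form = geometric_annuity_sketch(i, n, g)
brute_force = sum((1 + g) ** (t - 1) / (1 + i) ** t for t in range(1, n + 1))
```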

net_present_value(rate: float, cashflows: Sequence[float], times: Sequence[float] | None = None) → float[source]¶

Net present value of cashflows discounted at rate.

If times is omitted the cash flows are assumed to occur at times 0, 1, 2, ....

internal_rate_of_return(cashflows: Sequence[float], times: Sequence[float] | None = None, *, low: float = -0.9999, high: float = 1e6, tol: float = 1e-10) → float[source]¶

Internal rate of return: the rate solving net_present_value == 0.

Uses a bracketed bisection over (low, high), which is robust for the usual single-sign-change cash-flow streams. Raises if no sign change is found in the search range (e.g. all-positive or all-negative flows).
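A bracketed bisection of this kind can be sketched in a few lines. The functions below are illustrative and not the library's code: the names are hypothetical, the defaults mirror the signature above, and raising ValueError on a missing sign change is an assumption, since the exception type is not documented here.

```python
def npv_sketch(rate, cashflows, times=None):
    """NPV at `rate`; flows default to times 0, 1, 2, ..."""
    if times is None:
        times = range(len(cashflows))
    return sum(cf / (1 + rate) ** t for cf, t in zip(cashflows, times))


def irr_bisect_sketch(cashflows, times=None, low=-0.9999, high=1e6, tol=1e-10):
    f_low = npv_sketch(low, cashflows, times)
    f_high = npv_sketch(high, cashflows, times)
    if (f_low > 0) == (f_high > 0):
        # Assumed exception type; e.g. all-positive or all-negative flows.
        raise ValueError("NPV does not change sign on (low, high)")
    while high - low > tol:
        mid = (low + high) / 2
        # Keep the half-interval that still brackets the sign change.
        if (npv_sketch(mid, cashflows, times) > 0) == (f_low > 0):
            low = mid
        else:
            high = mid
    return (low + high) / 2


irr = irr_bisect_sketch([-100.0, 110.0])   # invest 100, receive 110 a year later
```

Bisection halves the bracket on every step, so it takes a few dozen iterations even from a very wide range and never diverges, at the cost of being slower than Newton-type methods.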

level_payment(principal: float, i: float, n: int) → float[source]¶

Level payment amortizing principal over n periods at rate i.

\(P = L / a_{\overline{n}|}\), where \(L\) is the principal and \(P\) the level payment.

outstanding_balance(principal: float, i: float, n: int, t: int) → float[source]¶: Prospective outstanding loan balance just after the t-th payment.

amortization_schedule(principal: float, i: float, n: int, payment: float | None = None) → DataFrame[source]¶

Amortization schedule with the interest/principal split and balance.

Returns one row per period with columns period, payment, interest, principal, and balance.
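The interest/principal split follows a simple recursion: interest is the rate times the opening balance, and the rest of the payment repays principal. Below is a plain-Python sketch with the same column names as listed above; it is an illustration, not the library's implementation.

```python
principal, i, n = 10_000.0, 0.05, 3
v = 1 / (1 + i)
payment = principal * i / (1 - v**n)   # level payment: principal / a_n

rows = []
balance = principal
for period in range(1, n + 1):
    interest = balance * i             # interest on the opening balance
    repaid = payment - interest        # remainder repays principal
    balance -= repaid
    rows.append({
        "period": period,
        "payment": payment,
        "interest": interest,
        "principal": repaid,
        "balance": balance,
    })

# Prospective check: balance after payment t equals payment * a_(n - t).
prospective_after_1 = payment * (1 - v ** (n - 1)) / i
```

The closing balance of the final row is zero up to floating-point rounding, and the principal column sums to the original loan.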

discount_factors(spot_rates: Sequence[float], times: Sequence[float]) → ndarray[source]¶: Discount factors \((1+s_t)^{-t}\) from spot rates at times.

present_value_curve(cashflows: Sequence[float], spot_rates: Sequence[float], times: Sequence[float]) → float[source]¶: Present value of cashflows discounted on a spot-rate curve.
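Discounting on a spot curve applies a different rate to each cash flow. A short plain-Python sketch of the formula \((1+s_t)^{-t}\), using a made-up three-year curve and bond-like cash flows:

```python
spot_rates = [0.030, 0.035, 0.040]   # hypothetical annual spot rates
times = [1, 2, 3]
cashflows = [5.0, 5.0, 105.0]

# One discount factor per maturity, then a weighted sum of the cash flows.
factors = [(1 + s) ** (-t) for s, t in zip(spot_rates, times)]
pv_curve = sum(cf * f for cf, f in zip(cashflows, factors))

# With a flat curve this collapses to ordinary discounting at one rate.
flat = 0.04
pv_flat = sum(cf * (1 + flat) ** (-t) for cf, t in zip(cashflows, times))
```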

year_fraction(start: object, end: object, convention: str = 'actual/365') → float[source]¶

Year fraction between two dates under a day-count convention.

Supported conventions: "actual/365", "actual/360", "30/360" (US/NASD), and "actual/actual" (ISDA).

age(date_of_birth, as_of, basis: str = 'exact') → float[source]¶

Age of a life at a date on a given basis.

Parameters:

date_of_birth (date-like) – Date of birth.
as_of (date-like) – Valuation date at which the age is measured.
basis ({"exact", "last", "nearest"}) – "exact" returns the fractional age; "last" is age last birthday (completed years, ALB); "nearest" is age nearest birthday (ANB).

Returns:

Fractional age for "exact"; an integer age for "last" and "nearest".

Return type:

float or int
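The two integer bases can be sketched with the standard library. The helper names are hypothetical, and two conventions here are assumptions not confirmed by the text above: a 29 February birthday is observed on 28 February in non-leap years, and an exact tie between the two surrounding birthdays rounds up.

```python
from datetime import date


def birthday_in(year, dob):
    """Birthday falling in `year` (assumption: 29 Feb -> 28 Feb if not leap)."""
    try:
        return dob.replace(year=year)
    except ValueError:
        return date(year, 2, 28)


def age_last_sketch(dob, as_of):
    """Age last birthday: completed years."""
    return as_of.year - dob.year - (as_of < birthday_in(as_of.year, dob))


def age_nearest_sketch(dob, as_of):
    """Age nearest birthday (assumption: ties round up)."""
    alb = age_last_sketch(dob, as_of)
    last_bd = birthday_in(dob.year + alb, dob)
    next_bd = birthday_in(dob.year + alb + 1, dob)
    return alb + ((as_of - last_bd) >= (next_bd - as_of))


dob, as_of = date(1980, 6, 15), date(2024, 3, 1)
alb = age_last_sketch(dob, as_of)
anb = age_nearest_sketch(dob, as_of)
```

On 1 March 2024 the 44th birthday is 106 days away while the 43rd was 260 days ago, so the two bases differ by one year.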

exposure_years(entry, exit, study_start, study_end, *, convention: str = 'actual/365') → float[source]¶

Exposure (in years) a record contributes within a study window.

The exposure is the overlap of [entry, exit] with [study_start, study_end], measured under the given day-count convention. Returns 0 when the record and study window do not overlap.
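The overlap rule can be sketched with standard-library dates under the "actual/365" convention. The function name is hypothetical. The sketch counts days as end minus start, so the closing day is not included; whether the library counts it is not stated here.

```python
from datetime import date


def exposure_years_sketch(entry, exit, study_start, study_end):
    """Overlap of [entry, exit] with the study window, in actual/365 years."""
    start = max(entry, study_start)
    end = min(exit, study_end)
    if end <= start:
        return 0.0                      # no overlap with the study window
    return (end - start).days / 365     # assumption: end day not counted


# A record active mid-2023 to March 2024, studied over calendar 2024.
exposure = exposure_years_sketch(
    date(2023, 7, 1), date(2024, 3, 31),
    date(2024, 1, 1), date(2024, 12, 31),
)

# A record that lapsed before the study window contributes nothing.
no_overlap = exposure_years_sketch(
    date(2022, 1, 1), date(2023, 6, 30),
    date(2024, 1, 1), date(2024, 12, 31),
)
```

Only the 90 days from 1 January to 31 March 2024 fall inside the window, so the first record contributes 90/365 of a year.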

add_exposure_column(df: DataFrame, entry_col: str, exit_col: str, study_start, study_end, *, exposure_col: str = 'exposure_years', convention: str = 'actual/365', copy: bool = True) → DataFrame[source]¶

Add an exposure-years column for each record over a study window.

Useful for building the denominator of an actual-to-expected study.