API reference
Auto-generated from the in-source NumPy-style docstrings. For task-oriented
usage see the Quickstart and the runnable notebooks in
examples/.
Forest
CompetingRiskForest
CompetingRiskForest(n_estimators: int = 100, max_depth: int = 15, min_samples_split: int = 6, min_samples_leaf: int = 3, max_features: str | int | float | None = 'sqrt', samptype: Literal['swor', 'swr'] = 'swor', sampsize: int | float | Callable[[int], int] | None = None, random_state: int | None = None, mode: str = 'default', n_bins: int = 256, time_grid: int = 200, n_jobs: int = -1, splitrule: str = 'logrankCR', cause: int = 1, cause_weights: ndarray | None = None, nsplit: int | None = None, split_ntime: int | None = DEFAULT_SPLIT_NTIME, rng_mode: str = 'numpy', equivalence: str | None = None, device: Literal['auto', 'cpu', 'cuda'] = 'auto')
Bases: BaseEstimator
Competing risks random forest.
Both modes store compact per-cause event / at-risk counts at leaves
and materialize CIF (Aalen-Johansen) / CHF (Nelson-Aalen) tables
lazily on first predict. mode="default" uses histogram split
search with uint8-binned features; mode="reference" uses
pure-NumPy exact splitting with raw (unbinned) thresholds.
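To make the leaf representation concrete, the following is a minimal, dependency-free sketch of how Aalen-Johansen CIF and Nelson-Aalen CHF values can be derived from per-cause event counts and at-risk counts on a time grid. The function name leaf_tables and the list-based layout are assumptions made for illustration; they are not the library's internal storage format.

```python
def leaf_tables(events, at_risk):
    """Aalen-Johansen CIF and Nelson-Aalen CHF from grid counts.

    events[k][t]: number of cause-k events at grid time t (hypothetical layout).
    at_risk[t]:   number of subjects at risk just before grid time t.
    """
    n_causes, n_times = len(events), len(at_risk)
    cif = [[0.0] * n_times for _ in range(n_causes)]
    chf = [[0.0] * n_times for _ in range(n_causes)]
    surv = 1.0  # all-cause survival just before the current grid time
    for t in range(n_times):
        y = at_risk[t]
        total = 0
        for k in range(n_causes):
            hazard = events[k][t] / y if y > 0 else 0.0
            prev_cif = cif[k][t - 1] if t > 0 else 0.0
            prev_chf = chf[k][t - 1] if t > 0 else 0.0
            cif[k][t] = prev_cif + surv * hazard  # Aalen-Johansen increment
            chf[k][t] = prev_chf + hazard         # Nelson-Aalen increment
            total += events[k][t]
        if y > 0:
            surv *= 1.0 - total / y  # update all-cause survival after time t
    return cif, chf


# Two causes, three grid times, five subjects at risk at the start.
events = [[1, 0, 1], [0, 1, 0]]
at_risk = [5, 4, 2]
cif, chf = leaf_tables(events, at_risk)
```

In this toy leaf the final cause-1 and cause-2 CIF values plus the remaining all-cause survival add up to one, which is the property that distinguishes the Aalen-Johansen estimate from a naive per-cause Kaplan-Meier complement.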
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n_estimators | int | Number of trees. | 100 |
| max_depth | int | Maximum tree depth. | 15 |
| min_samples_split | int | Minimum samples required to attempt a split at an internal node. | 6 |
| min_samples_leaf | int | Minimum samples in each child node after a split. | 3 |
| max_features | ('sqrt', 'log2', None) | Number of features considered at each split. "sqrt" and "log2" are rounded up; a float is interpreted as a fraction of | "sqrt" |
| samptype | ('swor', 'swr') | Per-tree sampling scheme. | "swor" |
| sampsize | int, float in (0, 1], callable, or None | Per-tree sample size. Out-of-bag rows are those not drawn for a tree; OOB-dependent methods (:meth: | None |
| random_state | int or None | Seed for per-tree sampling and mtry draws. If None, results are nondeterministic. | None |
| mode | ('default', 'reference') | | "default" |
| n_bins | int | Number of histogram bins per feature; ignored in reference mode. Must be in [2, 256]. | 256 |
| time_grid | int | Max points on the shared event-time grid for compact leaf storage; ignored in reference mode. | 200 |
| n_jobs | int or None | Number of threads for parallel tree building. Speedup applies only to Joblib dispatch has sub-millisecond overhead per tree. For | -1 |
| splitrule | ('logrankCR', 'logrank') | Split criterion. | "logrankCR" |
| cause | int | 1-based cause index for | 1 |
| cause_weights | array-like of float or None | Per-cause weight vector of length | None |
| nsplit | int or None | Number of random split-point draws per feature at each node. | None |
| split_ntime | int or None | Coarse time bins for split-search log-rank evaluation in | 10 |
| equivalence | (None, 'rfsrc') | Preset for cross-library predictive alignment. To achieve bit-identical trees vs rfSRC under a full-data fit, use these parameter mappings:: rfSRC's Known limitation: under resampling ( | None |
| device | ('auto', 'cpu', 'cuda') | Compute backend for the flat-tree path. In v0.1, | "auto" |
feature_importances_ (property)
Cached result of the last compute_importance call.
Raises:
| Type | Description |
|---|---|
| AttributeError | If |
predict_cif
Predict cause-specific cumulative incidence (Aalen-Johansen), averaged across trees.
Returns:
| Name | Type | Description |
|---|---|---|
| cif | ndarray, shape (n_samples, n_causes, n_times), float64 | Ensemble mean of per-tree leaf Aalen-Johansen CIFs. When |
predict_chf
Predict cause-specific cumulative hazard (Nelson-Aalen), averaged across trees.
Returns:
| Name | Type | Description |
|---|---|---|
| chf | ndarray, shape (n_samples, n_causes, n_times), float64 | Ensemble mean of per-tree leaf Nelson-Aalen CHFs. When |
predict_risk
Per-subject risk scalar for cause-specific concordance scoring.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like, shape (n_samples, n_features) | | required |
| cause | int | Cause of interest (1..n_causes_). | 1 |
| kind | ('integrated_chf', 'cif_last') | Risk scalar derived from the per-subject CIF/CHF curve: For Uno IPCW C-index neither scalar dominates. | "integrated_chf" |
Returns:
| Name | Type | Description |
|---|---|---|
| risk | ndarray, shape (n_samples,), float64 | |
predict_oob_risk
Per-row OOB ensemble integrated-CHF risk on the training set.
For each training row, averages cause-specific integrated CHF over
only the trees where that row was out-of-bag. Mirrors rfSRC's
predict$predicted.oob[, cause] convention. Requires an OOB set,
i.e. samptype="swr" or samptype="swor" with sampsize < n.
Rows in-bag for every tree (probability ~0.63**n_estimators, i.e. vanishingly small for n_estimators >= 100) get a risk of 0.
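As a rough sanity check on how rare such rows are, the sketch below computes the chance that a single row is drawn into every tree. It assumes bootstrap sampling (samptype="swr") with a per-tree sample size equal to n and independent trees; the helper name prob_always_in_bag is hypothetical.

```python
def prob_always_in_bag(n, n_estimators):
    """Probability that one row lands in every tree's bootstrap sample.

    Assumes sampling with replacement, sample size n, independent trees.
    """
    # A row is missed by one bootstrap draw of size n with probability
    # (1 - 1/n)**n, which approaches 1/e ~ 0.368, so it is in-bag with
    # probability ~ 0.632.
    p_in_bag = 1.0 - (1.0 - 1.0 / n) ** n
    return p_in_bag ** n_estimators


p_single_tree = prob_always_in_bag(n=1000, n_estimators=1)
p_all_trees = prob_always_in_bag(n=1000, n_estimators=100)
```

With 100 trees the probability is far below any practical sample size's resolution, so a zero risk from this fallback should essentially never be observed with the default forest size.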
oob_score
OOB Harrell C-index on the training set for cause.
Computed against the cached training outcomes using the OOB
integrated-CHF risk from predict_oob_risk. Requires an OOB
set (samptype="swr" or "swor" with sampsize < n).
score
Cause-specific Harrell C-index. kind forwards to predict_risk.
Accepts either the three-positional legacy form score(X, time, event)
or the sklearn-friendly score(X, y) where y is a structured
array with time and event fields (see comprisk.Surv).
predict
sklearn-style alias for predict_risk(X, cause=1).
Returned shape (n_samples,). The cause-1 default lets comprisk
slot into Pipeline / cross_val_predict without a wrapper;
for cause-k risk or for CIF / CHF curves, call
predict_risk / predict_cif / predict_chf
directly.
compute_importance
compute_importance(X_eval=None, y_eval=None, *, causes: list[int] | None = None, n_repeats: int = 5, random_state: int | None = None, n_jobs: int | None = None)
Compute per-cause and composite permutation variable importance.
Two flavours, dispatched by whether an evaluation set is supplied:
- OOB Breiman (X_eval=None and y_eval=None): scored on the cached training data using the Uno IPCW C-index over each tree's out-of-bag rows. Requires an OOB set (samptype="swr" or "swor" with sampsize < n).
- Held-out: standard sklearn permutation importance with a per-cause Wolbers-C-index scorer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X_eval | array-like, shape (n_samples, n_features) | Held-out features. If both | None |
| y_eval | structured array with fields time and event | Held-out survival outcomes. | None |
| causes | list of int | Causes to score. Defaults to | None |
| n_repeats | int | | 5 |
| random_state | int | Seed for permutation draws (reproducibility). | None |
| n_jobs | int | Override | None |
Returns:
| Type | Description |
|---|---|
| pd.DataFrame | Columns feature, cause_{k}_vimp for each fitted cause in numeric order, and composite_vimp. |
Raises:
| Type | Description |
|---|---|
| ValueError | OOB mode requires an OOB set ( |
| TypeError | Held-out mode requires |
Notes
VIMP scales as n_features * n_repeats * n_causes calls to
predict_risk in held-out mode (each call walks every tree in
Python). For wide cohorts the wall time can be material; downsample
X_eval or use OOB mode if cost is a concern.
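The scaling rule above can be turned into a quick back-of-the-envelope estimate before launching a held-out run. The helper below is a hypothetical convenience, not part of the library; it simply multiplies the three factors named in the note.

```python
def heldout_vimp_calls(n_features, n_repeats=5, n_causes=1):
    """Number of predict_risk calls implied by held-out permutation VIMP."""
    return n_features * n_repeats * n_causes


# A cohort with 200 features, the default 5 repeats, and 2 fitted causes.
calls = heldout_vimp_calls(n_features=200, n_repeats=5, n_causes=2)
```

Multiplying the result by the measured wall time of a single predict_risk call on X_eval gives an approximate total cost.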
minimal_depth
Ishwaran-style minimal-depth variable selection.
A variable's minimal depth in a tree is the depth of the highest
split that uses it (root = depth 0). Variables never split on get
a sentinel depth of D_T where D_T is the tree's max depth
(Ishwaran et al. 2010, JASA, Eq. (2)). Smaller mean minimal depth
across the forest indicates a more important variable.
The selection threshold is E[Dv] computed once from the forest-averaged node-count vector l_bar_d and average tree depth D_bar, per Ishwaran et al. (2010, JASA, Theorem 1 + Section 3).
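The per-tree definition can be illustrated with a small sketch. The nested-dict tree layout and the function name minimal_depths are assumptions made for this example (the library's flat-tree storage is not shown here), and D_T is taken to be the depth of the deepest leaf.

```python
def minimal_depths(tree, n_features):
    """Minimal depth of each feature in one tree (root split = depth 0).

    tree is a nested dict {"feature": j, "left": ..., "right": ...};
    a leaf is None. This layout is hypothetical, chosen for illustration.
    """
    first_use = {}
    max_depth = 0
    stack = [(tree, 0)]
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        if node is None:
            continue
        j = node["feature"]
        # Keep the shallowest depth at which feature j is split on.
        first_use[j] = min(first_use.get(j, depth), depth)
        stack.append((node["left"], depth + 1))
        stack.append((node["right"], depth + 1))
    # Features never split on receive the sentinel D_T (the tree's max depth).
    return [first_use.get(j, max_depth) for j in range(n_features)]


inner = {"feature": 0, "left": None, "right": None}
tree = {
    "feature": 2,
    "left": {"feature": 0, "left": None, "right": None},
    "right": {"feature": 2, "left": inner, "right": None},
}
depths = minimal_depths(tree, n_features=3)
```

Here feature 2 splits at the root (depth 0), feature 0 first appears at depth 1, and feature 1 is never used, so it receives the sentinel depth 3.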
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| threshold | 'md' | Selection threshold rule. Only mean-minimal-depth | "md" |
| aggregation | ('forest', 'tree') | How the null threshold is aggregated across trees. | "forest" |
| return_extra | bool | If True, additionally include | False |
Returns:
| Type | Description |
|---|---|
| DataFrame | Sorted ascending by |
Raises:
| Type | Description |
|---|---|
| NotFittedError | If the forest has not been fitted. |
| ValueError | If |
Notes
The default aggregation="forest" follows Ishwaran et al. 2010 JASA
Section 3 (also 2011 SADM Definition 2): average the geometry across
trees, then compute E[md] once. The 2010 paper uses only this; the
tree-averaged variant is a later randomForestSRC software addition and
its default (conservative=FALSE). Pass aggregation="tree" to
switch. Numeric threshold values may differ from a default rfSRC run
even with equivalence='rfsrc'.
shap_values
TreeSHAP values for cause-specific CIF.
Uses exact polynomial-time TreeSHAP (Lundberg et al. 2018), adapted
to competing-risk forests where each leaf value is a
(n_causes, n_times) CIF tensor.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like, shape (n_samples, n_features) | Samples to explain. | required |
| times | array-like of float or None | Time points at which to evaluate SHAP. If | None |
| time_aggregate | (None, 'sum', 'trapezoid') | Collapse the time axis to a single scalar per cause before the attribution, returning a "risk-score" SHAP whose values already sum (over features) to the aggregated CIF. Because SHAP is linear in the leaf value, | None |
| n_jobs | int | Threads for the parallel-over-samples TreeSHAP kernel ( | -1 |
Returns:
| Name | Type | Description |
|---|---|---|
| shap_values | ndarray | Cause-specific CIF SHAP attributions. Shape |
| base_value | ndarray | Expected (aggregated) CIF for the empty conditioning set (training-distribution baseline), averaged across trees. Shape Additivity holds point-wise: .. math:: |
Subdistribution-hazard regression
FineGrayRegression
FineGrayRegression(*, cause: int = 1, cencode: int = 0, max_iter: int = 10, gtol: float = 1e-06, robust_se: bool = False)
Fine-Gray subdistribution-hazard regression for competing risks.
Fits the proportional subdistribution-hazards model of Fine & Gray
(1999) via Newton-Raphson on the IPCW-weighted Breslow partial
likelihood. Targets parity with R cmprsk::crr() defaults.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cause | int | Cause-of-interest event code (cmprsk's | 1 |
| cencode | int | Censoring event code (cmprsk's | 0 |
| max_iter | int | Maximum Newton iterations (cmprsk default). | 10 |
| gtol | float | Convergence tolerance on | 1e-6 |
| robust_se | bool | If | False |
Attributes:
| Name | Type | Description |
|---|---|---|
| coef_ | ndarray, shape (n_features,) | |
| se_ | ndarray, shape (n_features,) | |
| var_ | ndarray, shape (n_features, n_features) | |
| n_iter_ | int | |
| converged_ | bool | |
| log_likelihood_ | float | Maximised partial log-likelihood at |
| log_likelihood_null_ | float | Partial log-likelihood at |
Examples:
>>> import numpy as np
>>> from comprisk import FineGrayRegression, Surv
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> X = rng.normal(size=(n, 3))
>>> time = rng.exponential(1.0, size=n) + 0.1
>>> event = rng.choice([0, 1, 2], size=n, p=[0.3, 0.5, 0.2])
>>> y = Surv.from_arrays(event=event, time=time)
>>> fg = FineGrayRegression().fit(X, y)
>>> fg.coef_.shape
(3,)
fit
fit(X, y=None, time=None, event=None, *, cengroup=None) -> FineGrayRegression
Fit the Fine-Gray model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like, shape (n_samples, n_features) | | required |
| y | structured array | | None |
| time | array-like, shape (n_samples,) | Legacy three-argument form | None |
| event | array-like, shape (n_samples,) | Legacy three-argument form | None |
| cengroup | array-like of int, shape (n_samples,) | Censoring stratum for each subject ( | None |
Returns:
| Type | Description |
|---|---|
| self | |
predict_cumulative_incidence
Predicted cumulative incidence F(t | x).
Uses the cmprsk formula
F(t|x) = 1 - exp(-Λ̂_0(t) * exp(x' β)) where Λ̂_0 is the
cumulative baseline subdistribution hazard.
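The formula can be evaluated directly once a baseline cumulative subdistribution hazard and a coefficient vector are available. The sketch below uses made-up baseline values and coefficients purely for illustration; the function name fine_gray_cif is hypothetical and is not the estimator's own method.

```python
import math


def fine_gray_cif(baseline_cumhaz, x, beta):
    """F(t|x) = 1 - exp(-Lambda0(t) * exp(x'beta)) on a grid of times."""
    # Relative subdistribution hazard for this covariate vector.
    risk = math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
    return [1.0 - math.exp(-lam0 * risk) for lam0 in baseline_cumhaz]


# Illustrative (not fitted) baseline values at four increasing times.
baseline = [0.0, 0.05, 0.12, 0.30]
beta = [0.7, -0.5]
cif_ref = fine_gray_cif(baseline, x=[0.0, 0.0], beta=beta)
cif_high = fine_gray_cif(baseline, x=[1.0, 0.0], beta=beta)
```

A positive coefficient raises the curve at every time point, and the result stays within [0, 1) as long as the baseline cumulative hazard is finite and non-negative.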
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like, shape (n_samples, n_features) | | required |
| times | array-like | Times at which to evaluate | None |
Returns:
| Name | Type | Description |
|---|---|---|
| F | ndarray, shape (n_samples, n_times) | |
PenalizedFineGrayRegression
PenalizedFineGrayRegression(*, penalty: str = 'lasso', l1_ratio: float = 1.0, gamma: float | None = None, n_lambda: int = 100, lambda_min_ratio: float = 0.001, lambdas=None, standardize: bool = True, cause: int = 1, cencode: int = 0, cv: int | None = None, cv_random_state: int | None = None, n_jobs: int | None = None, max_iter: int = 1000, tol: float = 0.0001)
Bases: BaseEstimator
Penalized Fine-Gray subdistribution-hazard regression.
Fits the proportional subdistribution-hazards model with a sparsity- or
shrinkage-inducing penalty by cyclic coordinate descent on the
IPCW-weighted partial likelihood, warm-started along a lambda path.
Mirrors the algorithm of Fu et al. (2017) / Kawaguchi et al. (2021).
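For readers unfamiliar with coordinate descent, the core of a lasso update is the soft-thresholding operator. The sketch below shows the generic textbook form for a one-dimensional quadratic approximation; it is not the library's actual update, which additionally involves the IPCW-weighted partial likelihood, and both function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; return 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_update(gradient_term, curvature, lam):
    """Minimise 0.5*curvature*b**2 - gradient_term*b + lam*abs(b) over b."""
    return soft_threshold(gradient_term, lam) / curvature


b_kept = lasso_coordinate_update(gradient_term=3.0, curvature=2.0, lam=1.0)
b_zeroed = lasso_coordinate_update(gradient_term=0.5, curvature=2.0, lam=1.0)
```

Cycling this update over the coordinates and decreasing lam along a path, starting each fit from the previous solution, is the general pattern that the warm-started lambda path refers to.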
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| penalty | ('lasso', 'ridge', 'elasticnet', 'mcp', 'scad') | Penalty family. | "lasso" |
| l1_ratio | float | Elastic-net mixing | 1.0 |
| gamma | float or None | Concavity parameter for MCP (> 1; default 2.7) and SCAD (> 2; default 3.7). Ignored for the convex penalties. | None |
| n_lambda | int | Number of | 100 |
| lambda_min_ratio | float | Smallest | 0.001 |
| lambdas | array-like or None | Explicit | None |
| standardize | bool | Center and scale covariates to unit variance before fitting; the penalty then acts on the standardized coefficients (coefficients are reported on the original scale). | True |
| cause | int | Cause-of-interest event code. | 1 |
| cencode | int | Censoring event code. | 0 |
| cv | int or None | If a positive integer | None |
| cv_random_state | int or None | Seed for the cross-validation fold split. | None |
| n_jobs | int or None | Parallelism for the cross-validation folds ( | None |
| max_iter | int | Maximum coordinate-descent sweeps per | 1000 |
| tol | float | Relative-change convergence tolerance. | 1e-4 |
Attributes:
| Name | Type | Description |
|---|---|---|
| coef_ | ndarray, shape (n_features,) | Coefficients at the selected |
| se_ | ndarray, shape (n_features,) | Sandwich standard errors at the selected |
| lambda_ | float | Selected penalty value. |
| lambda_index_ | int | Index of the selected |
| coef_path_ | ndarray, shape (n_features, n_lambda) | Coefficients along the full path (original scale). |
| se_path_ | ndarray, shape (n_features, n_lambda) | Sandwich SEs along the path. |
| lambdas_ | ndarray, shape (n_lambda,) | The |
| deviance_path_ | ndarray, shape (n_lambda,) | |
| null_deviance_ | float | Deviance at |
| bic_path_ | ndarray, shape (n_lambda,) | |
| n_iter_path_ | ndarray of int, shape (n_lambda,) | |
| converged_path_ | ndarray of bool, shape (n_lambda,) | |
| lambda_min_ | float or None | CV-deviance-minimizing |
| lambda_1se_ | float or None | Largest |
| cv_deviance_ | ndarray or None | Mean CV deviance per |
| cv_deviance_se_ | ndarray or None | Standard error of the CV deviance per |
Examples:
>>> import numpy as np
>>> from comprisk import PenalizedFineGrayRegression, Surv
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> X = rng.normal(size=(n, 8))
>>> eta = 0.7 * X[:, 0] - 0.5 * X[:, 1]
>>> time = rng.exponential(np.exp(-eta)) + 0.05
>>> event = rng.choice([0, 1, 2], size=n, p=[0.3, 0.5, 0.2])
>>> y = Surv.from_arrays(event=event, time=time)
>>> fit = PenalizedFineGrayRegression(penalty="lasso", cv=5,
... cv_random_state=0).fit(X, y)
>>> fit.coef_.shape
(8,)
fit
fit(X, y=None, *, time=None, event=None, cengroup=None) -> PenalizedFineGrayRegression
Fit the penalized Fine-Gray model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like, shape (n_samples, n_features) | | required |
| y | structured array | | None |
| time | array-like, shape (n_samples,) | Legacy three-argument form | None |
| event | array-like, shape (n_samples,) | Legacy three-argument form | None |
| cengroup | array-like of int, shape (n_samples,) | Censoring stratum for each subject. | None |
Returns:
| Type | Description |
|---|---|
| self | |
predict_cumulative_incidence
Predicted cumulative incidence F(t | x) at the selected lambda.
F(t|x) = 1 - exp(-Lambda_0(t) exp(x' beta)) with Lambda_0
the Breslow cumulative baseline subdistribution hazard.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like, shape (n_samples, n_features) | | required |
| times | array-like | Times at which to evaluate; defaults to the training cause-of-interest event-time grid. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| F | ndarray, shape (n_samples, n_times) | |
coef_at
Coefficients at a specific path index or (nearest) lambda value.
Cause-specific hazard regression
CauseSpecificCox
Cause-specific Cox proportional-hazards regression.
Fits a Cox PH model with the cause-specific censoring rule: subjects
experiencing a competing event are censored at that event time. Parity
target: survival::coxph(Surv(time, event == cause) ~ X, method="breslow").
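The censoring rule amounts to recoding the multi-state event vector into a binary status before an ordinary Cox fit, mirroring the event == cause expression in the R call above. The following sketch shows that recoding in isolation; the helper name cause_specific_status is hypothetical.

```python
def cause_specific_status(event, cause=1):
    """Binary status for a cause-specific Cox fit.

    Returns 1 where the event equals the cause of interest and 0 otherwise,
    so competing events (other positive codes) are treated like censoring (0)
    at their own event times. Observed times are left unchanged.
    """
    return [1 if e == cause else 0 for e in event]


event = [0, 1, 2, 1, 2, 0]
status = cause_specific_status(event, cause=1)
```

Fitting the same data with cause=2 would flip which positive code counts as the event, which is how one obtains a separate cause-specific model per cause.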
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cause | int | Cause-of-interest event code. All other positive event codes are competing events; | 1 |
| max_iter | int | | 25 |
| gtol | float | | 1e-9 |
Attributes:
| Name | Type | Description |
|---|---|---|
| coef_ | ndarray, shape (n_features,) | |
| se_ | ndarray, shape (n_features,) | |
| var_ | ndarray, shape (n_features, n_features) | |
| n_iter_ | int | |
| converged_ | bool | |
| log_likelihood_ | float | |
| log_likelihood_null_ | float | |
Examples:
>>> import numpy as np
>>> from comprisk import CauseSpecificCox, Surv
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> X = rng.normal(size=(n, 3))
>>> time = rng.exponential(1.0, size=n)
>>> event = rng.choice([0, 1, 2], size=n, p=[0.3, 0.5, 0.2])
>>> y = Surv.from_arrays(event=event, time=time)
>>> cs = CauseSpecificCox(cause=1).fit(X, y)
>>> cs.coef_.shape
(3,)
fit
fit(X, y=None, time=None, event=None) -> CauseSpecificCox
Fit cause-specific Cox PH.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | array-like, shape (n_samples, n_features) | | required |
| y | structured array | | None |
| time | array-like | Legacy three-argument form. | None |
| event | array-like | Legacy three-argument form. | None |
Non-parametric estimation & testing
CumulativeIncidence
Aalen-Johansen cumulative incidence estimator with Pepe variance.
Estimates the cause-specific cumulative incidence function and pointwise variance for one or more competing event types, optionally stratified by a grouping variable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cause_codes | list of int | Positive integer event codes to fit. If | None |
Attributes:
| Name | Type | Description |
|---|---|---|
| curves_ | dict | Maps |
Notes
Event coding: 0 denotes censoring; positive integers index the
competing causes. The estimator is consistent under independent
right-censoring with discrete or continuous event-time distributions
(Aalen & Johansen, 1978).
fit
fit(time: ndarray | None = None, event: ndarray | None = None, *, group: ndarray | None = None) -> CumulativeIncidence
Fit the estimator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| time | array-like of shape (n,) | Observed times. | None |
| event | array-like of shape (n,) | Event indicators ( | None |
| group | array-like of shape (n,) | Grouping variable. When supplied, a separate CIF is fit per | None |
Returns:
| Name | Type | Description |
|---|---|---|
| self | CumulativeIncidence | Fitted estimator. |
timepoints
Evaluate every fitted curve at a common set of query times.
Curves are returned in a deterministic order: sorted by
(str(group), cause) so that single-group fits and grouped fits
share the same key convention.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| t | array-like | Query times. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| est | np.ndarray of shape (n_curves, n_t) | Cumulative incidence at the query times. |
| var | np.ndarray of shape (n_curves, n_t) | Pointwise variance at the query times. |
gray_test
Gray's K-sample test for equality of cumulative incidence functions.
Gray's test (Gray, 1988) is the cumulative-incidence analogue of the log-rank
test for survival data. It compares the cause-specific cumulative incidence
functions F_{1,g}(t) across K >= 2 groups under right-censored
competing-risks observation. The null hypothesis is that the CIF for the
cause of interest is the same across all groups.
The test statistic is built from a (K-1)-dimensional score vector S and
a (K-1) x (K-1) covariance matrix V accumulated over the unique observed
times. S measures, for each non-pivot group, the weighted divergence of
the group's cause-1 hazard from the pooled subdistribution-hazard prediction.
V is the covariance estimator obtained from counting-process martingale
theory by tracking how each group's cumulative-influence row contributes to
score variance through cause-1 events and through the censoring induced by
competing events. The Wald-type quadratic form T = S^T V^{-1} S is
asymptotically chi-square with K-1 degrees of freedom under the null.
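For the two-group case (K = 2) the score and covariance are scalars, so the quadratic form and its p-value can be written without any linear algebra. The sketch below assumes S and V have already been computed; the numbers are illustrative, not output from a real dataset, and the function name is hypothetical.

```python
import math


def wald_chi2_two_groups(score, variance):
    """Wald statistic T = S**2 / V and its chi-square(1) upper-tail p-value."""
    stat = score * score / variance
    # For one degree of freedom, P(chi2 > T) = P(|Z| > sqrt(T)) = erfc(sqrt(T / 2)).
    pvalue = math.erfc(math.sqrt(stat / 2.0))
    return stat, pvalue


stat, pvalue = wald_chi2_two_groups(score=3.92, variance=4.0)
```

With more than two groups the same idea applies, but V must be inverted as a (K-1) x (K-1) matrix and the p-value taken from a chi-square distribution with K-1 degrees of freedom.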
This module is a clean-room implementation written directly from the mathematical statement of Gray's procedure plus standard counting-process martingale theory; no GPL-licensed third-party source code (Fortran, R, or otherwise) was consulted while writing it. Variable names follow the statistical literature.
References
Gray, R.J. (1988). "A class of K-sample tests for comparing the cumulative incidence of a competing risk." Annals of Statistics 16(3):1141-1154.
Andersen, P.K., Borgan, O., Gill, R.D., Keiding, N. (1993). Statistical Models Based on Counting Processes. Springer.
GrayTestResult (dataclass)
GrayTestResult(stat: float, pvalue: float, df: int, score: ndarray, var: ndarray, n_groups: int, rho: float)
Outcome of gray_test.
Attributes:
| Name | Type | Description |
|---|---|---|
| stat | float | Wald-type chi-square statistic |
| pvalue | float | Upper-tail p-value from a chi-square distribution with |
| df | int | Degrees of freedom, equal to |
| score | np.ndarray of shape (K-1,) | Score vector for the non-pivot groups. |
| var | np.ndarray of shape (K-1, K-1) | Estimated covariance matrix of |
| n_groups | int | Number of distinct groups |
| rho | float | Weight exponent |
gray_test
gray_test(time, event, group, *, cause: int = 1, rho: float = 0.0) -> GrayTestResult
Gray's K-sample test for equality of cumulative incidence functions.
Tests the null hypothesis that the cumulative incidence function for the cause of interest is the same across all groups, against the alternative that at least one group differs. Censored observations and competing events are handled via the subdistribution-hazard formulation of Gray (1988).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| time | array-like of shape (n,) | Observed times (event or censoring), non-negative. | required |
| event | array-like of shape (n,) | Status code per subject. | required |
| group | array-like of shape (n,) | Group labels. May be integer- or string-typed; labels are sorted and recoded to | required |
| cause | int | Event code identifying the cause of interest. | 1 |
| rho | float | Power-of-pooled-survival weight exponent. The per-time weight is | 0.0 |
Returns:
| Type | Description |
|---|---|
| GrayTestResult | Statistic, p-value, degrees of freedom, score vector, score covariance, group count, and the |
Raises:
| Type | Description |
|---|---|
| ValueError | If inputs have inconsistent lengths, are empty, or fewer than 2 distinct groups are present. |
References
Gray, R.J. (1988). "A class of K-sample tests for comparing the cumulative incidence of a competing risk." Annals of Statistics 16(3):1141-1154.
Metrics & evaluation
concordance_index_cr
Cause-specific concordance index for competing risks (Wolbers, 2009).
A pair (i, j) with event[i] == cause is comparable iff
time[j] > time[i] and subject j did not experience a competing
event at or before time[i]. For each comparable pair the estimate
of subject i is compared to the estimate of subject j: a higher
estimate at i is concordant, a lower one is discordant, equal
estimates count as a half-concordance (tie).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| event | array-like of int | Event/cause code per subject. | required |
| time | array-like of float | Observed time (event or censoring) per subject. | required |
| estimate | array-like of float | Predicted risk score for the cause of interest. Higher values should indicate higher risk. | required |
| cause | int | The cause of interest. | 1 |
Returns:
| Type | Description |
|---|---|
| float | The cause-specific concordance index. Returns |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
concordance_index_uno_cr
Cause-specific Uno IPCW concordance index for competing risks.
Combines the Wolbers (2009) cause-specific pair structure with
inverse-probability-of-censoring weighting (Uno, 2011). Returns the
raw ratio numerator / denominator (not 1 - num/denom).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| event | array-like | Per-subject event code, observed time, and predicted risk score for the cause of interest. | required |
| time | array-like | Per-subject event code, observed time, and predicted risk score for the cause of interest. | required |
| estimate | array-like | Per-subject event code, observed time, and predicted risk score for the cause of interest. | required |
| cause | int | Cause of interest (keyword-only). | required |
| weights | ndarray | IPCW weights, e.g. as produced by :func: | required |
Returns:
| Type | Description |
|---|---|
| float | Concordance value, or |
compute_uno_weights
compute_uno_weights(time, event, *, gmin: float | str = 'auto', ess_frac: float = 0.2, ess_min: int = 20, eps: float = 1e-12, eps_keep: float | None = None) -> ndarray
Per-observation IPCW weights using the KM censoring estimator.
For each subject i the weight is 1 / G(time[i]^-)^2 (Uno,
2011) where G is the KM-of-censoring under a competing-risks
events-first tie convention. Subjects whose left-limit G falls
below the chosen lower clip gmin keep the data row alive with a
tiny eps_keep weight rather than being silently dropped.
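The weighting rule can be sketched in a few lines under simplifying assumptions: all observed times are distinct (so no tie convention is needed), event code 0 marks censoring, and gmin is a plain numeric clip rather than the "auto" rule. The function name uno_weights_sketch and its defaults are hypothetical; this is not the library's implementation.

```python
def uno_weights_sketch(time, event, gmin=0.0, eps_keep=1e-12):
    """IPCW weights 1 / G(t_i^-)**2, with G the Kaplan-Meier censoring survivor.

    Simplified sketch: assumes distinct observed times; event == 0 is censoring.
    """
    order = sorted(range(len(time)), key=lambda i: time[i])
    weights = [0.0] * len(time)
    g_left = 1.0              # G just before the current subject's time
    n_at_risk = len(time)
    for i in order:
        # Rows whose left-limit G falls below the clip keep a tiny weight.
        weights[i] = eps_keep if g_left < gmin else 1.0 / (g_left * g_left)
        if event[i] == 0:     # a censoring lowers G for all later times
            g_left *= 1.0 - 1.0 / n_at_risk
        n_at_risk -= 1
    return weights


time = [1.0, 2.0, 3.0, 4.0]
event = [1, 0, 2, 1]
w = uno_weights_sketch(time, event)
```

In this toy input the single censoring at time 2 (three subjects still at risk) drops G to 2/3, so the two later subjects receive weight (3/2)**2 = 2.25 while the earlier ones keep weight 1.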
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| time | array-like of float | Observed time per subject. | required |
| event | array-like of int | Event/cause code per subject. | required |
| gmin | float or {'auto', 'none'} | Lower clip for the censoring survivor. | "auto" |
| ess_frac | float | Effective-sample-size fraction target for | 0.20 |
| ess_min | int | Effective-sample-size minimum target for | 20 |
| eps | float | Floor used when squaring | 1e-12 |
| eps_keep | float | Weight assigned to gated-out rows. Defaults to | None |
Returns:
| Type | Description |
|---|---|
| ndarray | |
score_cr
score_cr(predictions: Mapping[str, ndarray], test_time: ndarray, test_event: ndarray, eval_times: ndarray, *, cause: int = 1, metrics: Sequence[str] = ('auc', 'brier'), n_bootstrap: int = 0, confidence_level: float = 0.95, n_jobs: int = -1, random_state: int | None = None, calibration_at: Sequence[float] | None = None, calibration_n_bins: int = 10) -> ScoreResult
Competing-risks time-dependent AUC, Brier score, IBS, and iAUC.
One-call replacement for the AUC/Brier block of R
riskRegression::Score in CR mode. Accepts an arbitrary number of
candidate models as a dict of name to (n_test, n_eval_times) CIF
probability matrix at the cause of interest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| predictions | Mapping[str, ndarray] | | required |
| test_time | array-like of float | Observed time per test subject. | required |
| test_event | array-like of int | Event code per test subject. | required |
| eval_times | array-like of float | Times at which the metrics are evaluated. Must align with the columns of every prediction matrix. | required |
| cause | int | Cause of interest. | 1 |
| metrics | sequence of str | Subset of | ("auc", "brier") |
| n_bootstrap | int | Number of bootstrap resamples for 95% CIs. | 0 |
| confidence_level | float | | 0.95 |
| n_jobs | int | Number of parallel workers for the bootstrap loop. | -1 |
| random_state | int or None | Seed for the bootstrap. | None |
| calibration_at | sequence of float | When supplied, populates | None |
| calibration_n_bins | int | Number of quantile bins for the calibration block. | 10 |
Returns:
| Type | Description |
|---|---|
| ScoreResult | |
calibration_cr
calibration_cr(predictions: Mapping[str, ndarray], test_time: ndarray, test_event: ndarray, eval_times: ndarray, *, cause: int = 1, n_bins: int = 10, confidence_level: float = 0.95) -> DataFrame
Quantile-decile calibration plot data with per-bin Wilson CI.
Tidy / long-form one-call replacement for the R
riskRegression::plotCalibration(method="quantile", q=10) block.
For every (model, eval_time) pair the predicted CIF values are
partitioned into n_bins quantile bins; per bin the predicted
midpoint is the bin mean of predicted CIF, the observed frequency is
the Aalen-Johansen empirical cumulative incidence (cause of interest)
fit on the bin's subjects and evaluated at the eval time, and the
confidence interval is a textbook Wilson score interval.
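For reference, the textbook Wilson score interval mentioned above can be written as follows. This is a generic sketch for a proportion p_hat observed on n subjects; how the library chooses the effective n for an Aalen-Johansen estimate within a bin is not specified here, and the function name wilson_interval is hypothetical.

```python
import math
from statistics import NormalDist


def wilson_interval(p_hat, n, confidence_level=0.95):
    """Wilson score interval for a proportion p_hat based on n subjects."""
    # Two-sided normal quantile, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2.0)
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


# A bin of 50 subjects with an observed frequency of 0.2.
lower, upper = wilson_interval(p_hat=0.2, n=50)
```

Unlike the simple Wald interval, the Wilson interval is not symmetric around p_hat and stays inside [0, 1], which matters for bins with few events.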
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| predictions | Mapping[str, ndarray] | | required |
| test_time | array-like | Held-out fold time and event code ( | required |
| test_event | array-like | Held-out fold time and event code ( | required |
| eval_times | array-like of float | Times at which calibration is evaluated. The same eval-time column index is used across all models. | required |
| cause | int | | 1 |
| n_bins | int | Number of quantile bins per (model, time). Mirrors R's | 10 |
| confidence_level | float | Confidence level for the Wilson score interval. | 0.95 |
Returns:
| Type | Description |
|---|---|
| DataFrame | Long-form, one row per (model, eval_time, bin). Columns: |
Data helpers
Surv
Build the structured survival y array sklearn workflows want.
Mirrors sksurv.util.Surv. The returned array has named
fields event and time and shape (n,): it is sliceable by
sklearn cross-validation utilities, picklable, and copy-equivalent to
a pair of 1-D arrays.
Examples:
>>> import numpy as np
>>> from comprisk import Surv
>>> y = Surv.from_arrays(event=[0, 1, 2, 0], time=[1.0, 2.0, 3.0, 0.5])
>>> y.dtype.names
('event', 'time')
from_arrays (staticmethod)
Pack event and time 1-D arrays into a structured array.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| event | array-like, shape (n,) | Event indicator. | required |
| time | array-like, shape (n,) | Observed time-to-event or censoring. | required |
| name_event | str | Field names in the structured array. Defaults match sksurv. | 'event' |
| name_time | str | Field names in the structured array. Defaults match sksurv. | 'time' |
Returns:
| Name | Type | Description |
|---|---|---|
| y | structured ndarray, shape (n,) | Fields |