
Validation

Held-out evaluation, active-learning curves, and uncertainty diagnostics. Use these to quantify how much covariance transfer helps on your data and to compare observation-selection strategies.

import skillinfer
from skillinfer import validation

pop = skillinfer.datasets.onet()
results = validation.held_out_evaluation(pop, frac_observed=0.3, n_splits=10)
results.groupby("method")[["rmse", "pearson_r", "ndcg_at_5", "crps"]].mean()

Functions

held_out_evaluation

def held_out_evaluation(
    pop: Population,
    frac_observed: float | list[float] = 0.3,
    n_splits: int = 10,
    obs_noise: float = 0.02,
    seed: int = 42,
) -> pd.DataFrame

Holds out 20% of entities, observes a noisy fraction of each held-out entity's features, and predicts the remaining features. The mean and covariance are re-estimated from the training 80% on every split, so nothing from the test entities leaks into the predictions. Three methods are compared:

  • kalman — full-covariance Gaussian conditioning (with transfer).
  • knn — k-nearest-neighbour regression in observed-feature space (k=10, inverse-distance weighted).
  • prior — population mean (no observations used).
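The kalman method above is described as full-covariance Gaussian conditioning. The following is a minimal NumPy sketch of that idea, not the library's implementation. The function name `condition_gaussian` is hypothetical, and treating `obs_noise` as a standard deviation is an assumption.

```python
import numpy as np

def condition_gaussian(mu, Sigma, obs_idx, y_obs, obs_noise=0.02):
    """Posterior mean and covariance of all features given noisy observations.

    Hypothetical helper for illustration; obs_noise is assumed to be a std.
    """
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    obs_idx = np.asarray(obs_idx)
    y_obs = np.asarray(y_obs, dtype=float)

    # Covariance of the noisy observations: Sigma_oo + sigma^2 * I.
    S = Sigma[np.ix_(obs_idx, obs_idx)] + obs_noise**2 * np.eye(len(obs_idx))
    # Cross-covariance between every feature and the observed ones.
    C = Sigma[:, obs_idx]
    # Gain K = C S^{-1}, via a linear solve rather than an explicit inverse
    # (S is symmetric, so solve(S, C.T).T equals C @ inv(S)).
    K = np.linalg.solve(S, C.T).T

    mu_post = mu + K @ (y_obs - mu[obs_idx])
    Sigma_post = Sigma - K @ C.T
    return mu_post, Sigma_post

# Two correlated features; observing the first shifts the estimate of the second.
mu_post, Sigma_post = condition_gaussian(
    mu=[0.0, 0.0],
    Sigma=[[1.0, 0.8], [0.8, 1.0]],
    obs_idx=[0],
    y_obs=[1.0],
)
print(mu_post)
print(np.diag(Sigma_post))
```

The transfer comes from the off-diagonal entries of the covariance: with a diagonal covariance the unobserved features would stay at the prior mean.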

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pop | Population | (required) | Population providing the prior mean and covariance. |
| frac_observed | float \| list[float] | 0.3 | Fraction(s) of features to observe per held-out entity. |
| n_splits | int | 10 | Independent train/test splits. |
| obs_noise | float | 0.02 | Gaussian noise added to each observation. |
| seed | int | 42 | RNG seed. |

Returns: DataFrame with one row per (split, entity, frac_observed, method) and columns:

| Column | Description |
| --- | --- |
| cosine_similarity | Directional alignment between predicted and true profile. |
| rmse, mae, mse | Point error on unobserved features. |
| r_squared | Coefficient of determination (1.0 = perfect, 0.0 = no better than the mean). |
| pearson_r | Pearson correlation between predicted and true profile. Shape agreement up to a linear rescaling. |
| spearman_rho | Spearman rank correlation. Captures rank fidelity. |
| precision_at_5 | \|top-5 predicted ∩ top-5 true\| / 5. Top-strengths recovery. |
| ndcg_at_5 | Normalised Discounted Cumulative Gain at 5, true values as gains. |
| calibration_coverage | Fraction of true values inside the posterior 90% CI. ~0.90 is well-calibrated. Kalman only. |
| mean_log_likelihood | Mean Gaussian log-likelihood under the posterior. Kalman only. |
| crps | Mean Continuous Ranked Probability Score under the posterior Gaussian. Lower is better; in the same units as the data. Kalman only. |

The last three are NaN for knn and prior because they do not produce a posterior covariance.
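To make the probabilistic columns concrete, here is a small sketch of two of them for a univariate Gaussian posterior, using the standard closed form for Gaussian CRPS. The helper names are hypothetical. Using per-feature marginal standard deviations and a central 90% interval are assumptions about how the library computes these columns.

```python
import math

def gaussian_crps(mu, sigma, y):
    """Closed-form CRPS of N(mu, sigma^2) evaluated at the true value y."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

def coverage_90(mus, sigmas, ys):
    """Fraction of true values inside the central 90% Gaussian interval."""
    z90 = 1.6448536269514722  # 95th percentile of the standard normal
    hits = [abs(y - m) <= z90 * s for m, s, y in zip(mus, sigmas, ys)]
    return sum(hits) / len(hits)

# Illustrative values only, not output from the library.
mus = [0.0, 0.0, 0.0, 0.0]
sigmas = [1.0, 1.0, 1.0, 1.0]
ys = [0.0, 1.0, 2.0, -3.0]
print(coverage_90(mus, sigmas, ys))
print(sum(gaussian_crps(m, s, y) for m, s, y in zip(mus, sigmas, ys)) / len(ys))
```

CRPS rewards posteriors that are both accurate and sharp: widening `sigma` to improve coverage raises the score when the mean is already close to the truth.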


active_learning_curve

def active_learning_curve(
    pop: Population,
    true_vector: np.ndarray,
    n_steps: int = 20,
    strategies: tuple[str, ...] = ("uncertainty", "random"),
    obs_noise: float = 0.05,
    n_trials: int = 10,
    seed: int = 42,
) -> pd.DataFrame

Compares observation-selection strategies on a single true profile. For each strategy and trial, the function cold-starts a profile from the population prior; at each step it observes a noisy value at the feature index chosen by the strategy and records recovery metrics over all features.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| pop | Population | (required) | Provides the prior mean and covariance. |
| true_vector | np.ndarray | (required) | Ground-truth profile to recover, shape (K,). |
| n_steps | int | 20 | Observations per trial. |
| strategies | tuple[str, ...] | ("uncertainty", "random") | Any of "uncertainty" (pick the highest-std unobserved feature) or "random". |
| obs_noise | float | 0.05 | Gaussian noise added to each observation. |
| n_trials | int | 10 | Independent trials per strategy. |
| seed | int | 42 | Base RNG seed. |

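Returns: DataFrame with columns [trial, strategy, step, mae, rmse, recovery], where recovery = 1 - ‖μ - true‖² / ‖prior - true‖². Here μ is the current posterior mean and prior is the prior mean, so 1.0 means the true profile is fully recovered and 0.0 means no improvement over the prior.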
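The recovery formula can be checked by hand with a few lines of NumPy. This is a sketch of the stated definition only; the function name is hypothetical, and reading "prior" as the prior mean vector is an assumption.

```python
import numpy as np

def recovery(mu_post, mu_prior, true_vector):
    """1 - ||mu_post - true||^2 / ||mu_prior - true||^2.

    Undefined (division by zero) when the prior mean already equals the truth.
    """
    mu_post = np.asarray(mu_post, dtype=float)
    mu_prior = np.asarray(mu_prior, dtype=float)
    true_vector = np.asarray(true_vector, dtype=float)
    num = np.sum((mu_post - true_vector) ** 2)
    den = np.sum((mu_prior - true_vector) ** 2)
    return float(1.0 - num / den)

prior = [0.0, 0.0]
true = [1.0, 1.0]
print(recovery([1.0, 1.0], prior, true))  # posterior equals the truth
print(recovery([0.0, 0.0], prior, true))  # posterior has not moved from the prior
print(recovery([0.5, 0.5], prior, true))  # halfway there in each coordinate
```

Because the errors are squared, moving halfway to the truth in every coordinate recovers 75%, not 50%. Values below zero are possible when noisy observations push the estimate further from the truth than the prior was.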

Example

true = pop.matrix.iloc[0].values
df = validation.active_learning_curve(pop, true, n_steps=20, n_trials=5)
df.groupby(["strategy", "step"])["recovery"].mean().unstack("strategy")

Note

Whether "uncertainty" beats "random" is dataset-dependent: uncertainty sampling chases high-variance dimensions, which on highly correlated populations are not always the highest-leverage ones through the covariance. Run the comparison on your population before assuming a winner.
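To see why the outcome depends on the covariance structure, the selection loop can be sketched on synthetic data. This is an illustrative re-implementation under assumptions, not the library's code: `scalar_update` and `run_trial` are hypothetical names, `obs_noise` is treated as a standard deviation, and each feature is observed at most once per trial.

```python
import numpy as np

def scalar_update(mu, Sigma, j, y, obs_noise):
    """Rank-one Gaussian update after observing feature j with noisy value y."""
    s = Sigma[j, j] + obs_noise**2        # predictive variance of the observation
    k = Sigma[:, j] / s                   # gain for every feature
    mu_new = mu + k * (y - mu[j])
    Sigma_new = Sigma - np.outer(k, Sigma[j, :])
    return mu_new, Sigma_new

def run_trial(mu0, Sigma0, true_vector, strategy, n_steps, obs_noise, rng):
    """Return the RMSE over all features after each observation step."""
    mu, Sigma = mu0.copy(), Sigma0.copy()
    unobserved = list(range(len(mu0)))    # requires n_steps <= number of features
    errors = []
    for _ in range(n_steps):
        if strategy == "uncertainty":
            # Highest posterior variance is also the highest posterior std.
            j = max(unobserved, key=lambda i: Sigma[i, i])
        else:
            j = int(rng.choice(unobserved))
        unobserved.remove(j)
        y = true_vector[j] + rng.normal(0.0, obs_noise)
        mu, Sigma = scalar_update(mu, Sigma, j, y, obs_noise)
        errors.append(float(np.sqrt(np.mean((mu - true_vector) ** 2))))
    return errors

# Synthetic population statistics, for illustration only.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
Sigma0 = A.T @ A / 6 + 0.1 * np.eye(4)
mu0 = np.zeros(4)
true_vector = rng.multivariate_normal(mu0, Sigma0)

for strategy in ("uncertainty", "random"):
    errs = run_trial(mu0, Sigma0, true_vector, strategy, 3, 0.05, rng)
    print(strategy, [round(e, 3) for e in errs])
```

The "uncertainty" rule looks only at the diagonal of the covariance, while the benefit of an observation to the other features flows through the off-diagonal column `Sigma[:, j]`, which is one way to read the caveat in the note above.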


transfer_delta

def transfer_delta(results: pd.DataFrame, metric: str = "cosine_similarity") -> pd.DataFrame

Compute the per-frac_observed advantage of kalman over the knn baseline on the chosen metric. Takes the output of held_out_evaluation and returns one row per observation fraction with columns [frac_observed, kalman, baseline, delta].
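A rough pandas equivalent of this summary is sketched below for readers who want to adapt it. It assumes the results frame has `frac_observed` and `method` columns, that values are averaged per group, and that delta is kalman minus baseline; none of these details are confirmed by the description above. The function name and the toy numbers are made up for illustration.

```python
import pandas as pd

def transfer_delta_sketch(results, metric="cosine_similarity", baseline="knn"):
    """Mean metric per observation fraction for kalman and a baseline, plus the gap."""
    means = (
        results.groupby(["frac_observed", "method"])[metric]
        .mean()
        .unstack("method")
    )
    out = pd.DataFrame({
        "frac_observed": means.index.to_numpy(),
        "kalman": means["kalman"].to_numpy(),
        "baseline": means[baseline].to_numpy(),
    })
    out["delta"] = out["kalman"] - out["baseline"]  # assumed sign convention
    return out

# Toy rows, not real evaluation output.
toy = pd.DataFrame({
    "frac_observed": [0.3, 0.3, 0.3, 0.3],
    "method": ["kalman", "kalman", "knn", "knn"],
    "cosine_similarity": [0.9, 0.7, 0.6, 0.6],
})
print(transfer_delta_sketch(toy))
```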


uncertainty_shrinkage

def uncertainty_shrinkage(
    state_or_Sigma: Profile | np.ndarray,
    Sigma_0: np.ndarray,
) -> float

Posterior uncertainty as a fraction of prior uncertainty: tr(Σ) / tr(Σ₀). A value of 1.0 means no reduction, 0.5 means total variance has halved relative to the prior, and 0.0 means full collapse. Accepts either a Profile or a posterior covariance matrix.
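The trace ratio is simple enough to reproduce directly. The sketch below handles only the covariance-matrix case; the function name is hypothetical and the matrices are illustrative.

```python
import numpy as np

def shrinkage(Sigma, Sigma_0):
    """Trace ratio tr(Sigma) / tr(Sigma_0) for two covariance matrices."""
    return float(np.trace(np.asarray(Sigma)) / np.trace(np.asarray(Sigma_0)))

Sigma_0 = np.eye(2)                   # prior: unit variance on both features
Sigma = np.diag([0.5, 0.5])           # posterior: each variance halved
print(shrinkage(Sigma, Sigma_0))
```

Because the trace sums variances, the ratio is dominated by high-variance features and ignores off-diagonal correlations.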