Profile¶
A skill profile for one entity. Created via Population.profile() and updated via observe() calls. Each observation reduces the posterior uncertainty.
Observation¶
observe¶
Observe one feature value. Runs a Kalman update, updating the full profile in place. Returns self for chaining.
If feature is a Skill with a score, value can be omitted.
Example
profile.observe("BBH", 32.7).observe("IFEval", 47.1)
# Or using Skill objects
profile.observe(Skill("BBH", score=32.7))
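The update behind observe can be pictured with the standard Kalman formulas for a noisy reading of one coordinate. The sketch below is a plain-Python illustration, not the library's code: the function name kalman_observe, the example numbers, and the choice of observation variance equal to noise squared are all assumptions.

```python
def kalman_observe(mu, sigma, j, value, noise_std):
    """Condition a Gaussian N(mu, sigma) on a noisy reading of coordinate j.

    Hypothetical helper for illustration; returns new objects instead of
    mutating in place.
    """
    k = len(mu)
    # Variance of the predicted reading: prior variance plus observation noise.
    innovation_var = sigma[j][j] + noise_std ** 2
    # Kalman gain: column j of the covariance, scaled by the innovation variance.
    gain = [sigma[i][j] / innovation_var for i in range(k)]
    residual = value - mu[j]
    new_mu = [mu[i] + gain[i] * residual for i in range(k)]
    new_sigma = [
        [sigma[i][l] - gain[i] * sigma[j][l] for l in range(k)]
        for i in range(k)
    ]
    return new_mu, new_sigma


# Made-up prior over three features.
mu = [30.0, 45.0, 10.0]
sigma = [
    [25.0, 10.0, 5.0],
    [10.0, 16.0, 4.0],
    [5.0, 4.0, 9.0],
]
new_mu, new_sigma = kalman_observe(mu, sigma, j=0, value=32.7, noise_std=1.0)
print(new_mu)
print([new_sigma[i][i] for i in range(3)])
```

Because the off-diagonal covariance entries are non-zero, a single reading moves the means of the unobserved features and shrinks their variances as well, which is what "updating the full profile" refers to.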
observe_many¶
Observe multiple features at once. Accepts a {feature: value} dict or a list of Skill objects with scores. Returns self for chaining.
Example
profile.observe_many({"BBH": 32.7, "IFEval": 47.1, "MATH Lvl 5": 18.0})
# Or using Skill objects
profile.observe_many([Skill("BBH", score=32.7), Skill("IFEval", score=47.1)])
Prediction¶
predict¶
def predict(
self,
feature: str | int | None = None,
level: float = 0.95,
detail: bool = False,
) -> pd.DataFrame | dict
Predict skill values with confidence intervals.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| feature | str \| int \| None | None | Predict one skill, or all if None. |
| level | float | 0.95 | Confidence level for the interval. |
| detail | bool | False | Include confidence (0-1) and source ("observed" / "predicted") columns. |
Returns
- Single feature: dict with keys feature, mean, std, ci_lower, ci_upper
- All features: pd.DataFrame with those columns
Example
profile.predict("GPQA")
# {'feature': 'GPQA', 'mean': 8.13, 'std': 2.78, 'ci_lower': 2.68, 'ci_upper': 13.58}
profile.predict()
# feature mean std ci_lower ci_upper
# 0 IFEval 47.10 1.00 45.14 49.06
# ...
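The interval bounds in the sample output above are consistent with a symmetric Gaussian interval around the mean. A minimal sketch of that calculation, assuming this is how level is applied (gaussian_interval is a hypothetical helper, not part of the library):

```python
from statistics import NormalDist


def gaussian_interval(mean, std, level=0.95):
    """Symmetric Gaussian interval that contains `level` of the probability mass."""
    # Two-sided quantile: for level=0.95 this is roughly 1.96.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * std, mean + z * std


lower, upper = gaussian_interval(8.13, 2.78)
print(round(lower, 2), round(upper, 2))  # 2.68 13.58
```

The printed bounds match the ci_lower and ci_upper values shown for GPQA.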
most_uncertain¶
Top-k features with highest posterior uncertainty. Returns DataFrame with columns [feature, mean, std].
Active learning
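Use most_uncertain() to decide which feature to observe next: measuring the feature with the largest posterior std removes the most variance from that feature, which makes it a sensible default choice for the next observation.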
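The selection itself amounts to ranking features by the square root of the covariance diagonal. A small sketch under that assumption, with a hypothetical standalone function and made-up numbers:

```python
import math


def most_uncertain(feature_names, sigma, k=3):
    """Return the k (feature, std) pairs with the largest posterior std."""
    stds = [math.sqrt(sigma[i][i]) for i in range(len(feature_names))]
    ranked = sorted(zip(feature_names, stds), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


names = ["BBH", "IFEval", "GPQA"]
sigma = [
    [1.0, 0.5, 0.2],
    [0.5, 16.0, 3.0],
    [0.2, 3.0, 9.0],
]
top = most_uncertain(names, sigma, k=2)
print(top)  # [('IFEval', 4.0), ('GPQA', 3.0)]
```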
Task Matching¶
match_score¶
def match_score(
self,
task_vector: dict[str, float] | np.ndarray | Task,
threshold: float | None = None,
level: float = 0.95,
) -> MatchResult
Score this agent against a task. Computes expected weighted-average performance (normalised by weight sum) and propagates uncertainty.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| task_vector | dict \| np.ndarray \| Task | required | Skill importance weights. Normalised internally. |
| threshold | float \| None | None | If given, compute P(score > threshold). |
| level | float | 0.95 | Confidence level for the interval. |
Returns: MatchResult (named tuple) with fields: score, std, ci_lower, ci_upper, p_above_threshold.
p_above_threshold is computed as \(1 - \Phi\!\left(\frac{\text{threshold} - \text{score}}{\text{std}}\right)\) where \(\Phi\) is the standard normal CDF, assuming the weighted score is Gaussian.
Example
from skillinfer import Task
task = Task({"MATH Lvl 5": 1.0, "GPQA": 0.5})
result = profile.match_score(task, threshold=50.0)
print(f"Expected: {result.score:.1f} ± {result.std:.1f}")
print(f"P(score > 50): {result.p_above_threshold:.1%}")
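The score, its uncertainty, and p_above_threshold can be reproduced by hand from a mean vector and covariance. The sketch below assumes the variance of the weighted score is the quadratic form of the normalised weights with the covariance; match_score_sketch and all numbers are illustrative, not the library's implementation.

```python
import math
from statistics import NormalDist


def match_score_sketch(weights, mu, sigma, threshold=None, level=0.95):
    """Weighted-average score with Gaussian uncertainty propagation."""
    total = sum(weights)
    w = [x / total for x in weights]  # normalise by the weight sum
    k = len(w)
    score = sum(w[i] * mu[i] for i in range(k))
    # Variance of a linear combination: w^T Sigma w.
    variance = sum(w[i] * sigma[i][j] * w[j] for i in range(k) for j in range(k))
    std = math.sqrt(variance)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    p_above = None
    if threshold is not None:
        # Assumes std > 0; a zero std would need special handling.
        p_above = 1.0 - NormalDist().cdf((threshold - score) / std)
    return {
        "score": score,
        "std": std,
        "ci_lower": score - z * std,
        "ci_upper": score + z * std,
        "p_above_threshold": p_above,
    }


# Made-up posterior for two skills, weighted 1.0 and 0.5.
result = match_score_sketch(
    weights=[1.0, 0.5],
    mu=[60.0, 45.0],
    sigma=[[16.0, 4.0], [4.0, 9.0]],
    threshold=50.0,
)
print(f"Expected: {result['score']:.1f} ± {result['std']:.1f}")
print(f"P(score > 50): {result['p_above_threshold']:.1%}")
```

Positive covariance between the weighted skills widens the interval compared with treating them as independent, so the off-diagonal terms matter for the threshold probability.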
skillinfer.rank_agents¶
skillinfer.rank_agents(
task_vector: dict[str, float] | np.ndarray | Task,
profiles: dict[str, Profile],
threshold: float | None = None,
) -> pd.DataFrame
Rank a pool of agents by expected task performance. Calls match_score on each profile and sorts descending.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| task_vector | dict \| np.ndarray \| Task | required | Skill importance weights. |
| profiles | dict[str, Profile] | required | Map of agent name → Profile. |
| threshold | float \| None | None | If given, include P(score > threshold). |
Returns: DataFrame with columns [agent, expected_score, std, p_above_threshold].
Example
import skillinfer
from skillinfer import Task

# alice and gpt4o are existing Profile objects
task = Task({"math": 1.0, "reasoning": 0.5})
ranking = skillinfer.rank_agents(task, {"alice": alice, "gpt-4o": gpt4o})
print(ranking)
# agent expected_score std p_above_threshold
# 0 alice 0.91 0.03 None
# 1 gpt-4o 0.85 0.03 None
Evaluation¶
summary¶
Summary statistics for this profile.
Returns: dict with keys:
| Key | Type | Description |
|---|---|---|
| n_features | int | Total feature count |
| n_observed | int | Number of observed features |
| n_predicted | int | Number of predicted features |
| mean_std | float | Average posterior standard deviation |
| uncertainty_reduction | float | Fraction of prior uncertainty removed (0-1) |
| top_predicted | list[dict] | Top 3 unobserved features by predicted mean |
| most_uncertain | list[dict] | Top 3 features by posterior std |
If true_vector is given, also includes:
| Key | Type | Description |
|---|---|---|
| mae | float | Mean absolute error |
| rmse | float | Root mean squared error |
| max_error | float | Largest single prediction error |
| cosine_similarity | float | Cosine similarity to ground truth |
| coverage_95 | float | Fraction of true values inside 95% CIs |
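For reference, the ground-truth metrics listed above can be computed from a posterior mean, a posterior std, and a truth vector as follows. This is a sketch using the textbook definitions; evaluate is a hypothetical helper, the data is made up, and the 95% coverage check assumes Gaussian intervals of plus or minus 1.96 std.

```python
import math


def evaluate(mu, std, truth, z=1.96):
    """Error and calibration metrics of a posterior against known true values."""
    errors = [abs(m - t) for m, t in zip(mu, truth)]
    n = len(errors)
    dot = sum(m * t for m, t in zip(mu, truth))
    norms = math.sqrt(sum(m * m for m in mu)) * math.sqrt(sum(t * t for t in truth))
    # A true value is "covered" when it lies within z standard deviations of the mean.
    inside = sum(1 for e, s in zip(errors, std) if e <= z * s)
    return {
        "mae": sum(errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "max_error": max(errors),
        "cosine_similarity": dot / norms,
        "coverage_95": inside / n,
    }


metrics = evaluate(
    mu=[30.0, 45.0, 10.0, 20.0],
    std=[1.0, 2.0, 3.0, 1.0],
    truth=[31.0, 44.0, 18.0, 20.0],
)
print(metrics)
```

A coverage_95 well below 0.95 on a large feature set suggests the posterior stds are too small; a value near 1.0 suggests they are too wide.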
mae¶
Mean absolute error between posterior mean and a ground truth vector.
rmse¶
Root mean squared error between posterior mean and a ground truth vector.
metrics_by_category¶
def metrics_by_category(
self,
true_vector: np.ndarray,
categories: dict[str, str] | None = None,
sep: str = ":",
) -> pd.DataFrame
Per-category prediction metrics on a known truth vector. Splits features into categories — by sep in feature names by default (e.g. Skill:Programming → Skill) — and reports MAE, RMSE, and recovery per group. Recovery is the share of squared error eliminated relative to predicting the prior mean.
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| true_vector | np.ndarray | required | Ground-truth profile, shape (K,). |
| categories | dict[str, str] \| None | None | Optional {feature_name: category} mapping. If None, derived by splitting feature names on sep. |
| sep | str | ":" | Separator used when deriving categories from feature names. Features without sep go into "uncategorised". |
Returns: DataFrame with columns [category, n_features, n_observed, mae, rmse, recovery].
Example
profile.observe("Skill:Programming", 0.92)
profile.metrics_by_category(true_vec)
# category n_features n_observed mae rmse recovery
# 0 Ability 52 0 0.131 0.171 0.082
# 1 Knowledge 33 0 0.142 0.184 0.421
# 2 Skill 35 1 0.103 0.140 0.290
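One way to read the recovery column is as one minus the ratio of posterior squared error to prior-mean squared error within each category. The sketch below implements that reading; the formula, the function name recovery_by_category, and the data are assumptions made for illustration and may differ from the library's exact computation.

```python
def recovery_by_category(names, prior_mu, post_mu, truth, sep=":"):
    """Share of prior-mean squared error removed by the posterior, per category."""
    groups = {}
    for i, name in enumerate(names):
        # Category is the text before the first separator, if there is one.
        category = name.split(sep, 1)[0] if sep in name else "uncategorised"
        groups.setdefault(category, []).append(i)

    recovery = {}
    for category, idx in groups.items():
        sse_prior = sum((prior_mu[i] - truth[i]) ** 2 for i in idx)
        sse_post = sum((post_mu[i] - truth[i]) ** 2 for i in idx)
        # Undefined when the prior mean is already exact for the whole group.
        recovery[category] = 1.0 - sse_post / sse_prior if sse_prior > 0 else float("nan")
    return recovery


names = ["Skill:Programming", "Skill:Writing", "Knowledge:Physics", "Stamina"]
rec = recovery_by_category(
    names,
    prior_mu=[0.5, 0.5, 0.5, 0.5],
    post_mu=[0.9, 0.6, 0.55, 0.5],
    truth=[0.92, 0.7, 0.7, 0.4],
)
print(rec)
```

Under this reading, a recovery of 0 means the posterior does no better than the prior mean for that category, and a negative value means it does worse.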
similarity¶
Cosine similarity between the posterior mean and a target vector. Returns float in [-1, 1].
uncertainty_ratio¶
Fraction of prior uncertainty remaining: tr(Sigma) / tr(Sigma_0). A value of 0.5 means uncertainty has halved since the prior. Useful for deciding when enough observations have been collected.
Parameters
| Parameter | Type | Description |
|---|---|---|
| Sigma_0 | np.ndarray | (K, K) prior covariance, typically pop.covariance. |
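The ratio is a direct trace computation. A minimal sketch with made-up matrices (the standalone function and the numbers are illustrative only):

```python
def uncertainty_ratio(sigma, sigma_0):
    """tr(Sigma) / tr(Sigma_0): share of the prior's total variance still remaining."""
    trace = sum(sigma[i][i] for i in range(len(sigma)))
    trace_0 = sum(sigma_0[i][i] for i in range(len(sigma_0)))
    return trace / trace_0


sigma_0 = [
    [25.0, 10.0, 5.0],
    [10.0, 16.0, 4.0],
    [5.0, 4.0, 9.0],
]
sigma = [
    [1.0, 0.4, 0.2],
    [0.4, 12.0, 2.0],
    [0.2, 2.0, 7.0],
]
ratio = uncertainty_ratio(sigma, sigma_0)
print(f"{ratio:.2f}")  # 0.40
```

A simple stopping rule is to keep observing until the ratio drops below a target such as 0.2; the target is a design choice, not something the source prescribes.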
Query Methods¶
mean¶
Posterior mean. Returns a scalar if feature is given, the full (K,) vector otherwise.
std¶
Posterior standard deviation (square root of the covariance diagonal).
confidence_interval¶
Gaussian confidence interval for a single feature. Returns (lower, upper).
to_dataframe¶
Full posterior as a DataFrame with columns [feature, mean, std]. With detail=True, adds confidence and source columns.
copy¶
Deep copy. The returned Profile has independent arrays and preserves all state.
Export / Import¶
to_dict¶
Export the profile as a plain dict (JSON-serialisable). Contains feature_names, mean, std, n_observations, observed_features, noise.
to_json¶
Export the profile as JSON. If path is given, writes to file. Always returns the JSON string.
Profile.from_dict¶
Reconstruct a Profile from the output of to_dict(). Restores the mean vector, observed features, and metadata. The covariance is reconstructed as a diagonal matrix from the exported std values.
Profile.from_json¶
Reconstruct a Profile from a JSON string or file path.
Example
# Save
profile.to_json("agent_profile.json")
# Load (from file)
restored = Profile.from_json("agent_profile.json")
# Or round-trip via dict
d = profile.to_dict()
restored = Profile.from_dict(d)
Note
Export/import preserves the mean vector and metadata but uses a diagonal covariance approximation. For full covariance fidelity, re-create the profile from a Population and re-apply observations.
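The practical consequence of the diagonal approximation is that a restored profile no longer carries cross-feature correlations, so a new observation of one feature stops informing the others. The sketch below illustrates this with a standard single-coordinate Kalman mean update; the helper observe_mean and all numbers are hypothetical, not the library's code.

```python
import math


def observe_mean(mu, sigma, j, value, noise_std):
    """Posterior mean after a noisy reading of coordinate j (standard Kalman update)."""
    innovation_var = sigma[j][j] + noise_std ** 2
    gain = [sigma[i][j] / innovation_var for i in range(len(mu))]
    return [mu[i] + gain[i] * (value - mu[j]) for i in range(len(mu))]


mu = [30.0, 45.0]
full = [[25.0, 10.0], [10.0, 16.0]]

# What survives a round-trip through exported std values: the diagonal only.
std = [math.sqrt(full[i][i]) for i in range(2)]
diagonal = [[std[i] ** 2 if i == j else 0.0 for j in range(2)] for i in range(2)]

from_full = observe_mean(mu, full, j=0, value=35.0, noise_std=1.0)
from_diag = observe_mean(mu, diagonal, j=0, value=35.0, noise_std=1.0)
print(from_full)  # the second feature's mean moves
print(from_diag)  # the second feature's mean stays at 45.0
```

If further observations are planned after loading, rebuilding the profile from a Population keeps this cross-feature inference intact.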
Properties¶
| Property | Type | Description |
|---|---|---|
| agent_vector | pd.Series | Posterior mean as a labeled Series |
| covariance_matrix | pd.DataFrame | Posterior covariance as a labeled DataFrame |
Attributes¶
| Attribute | Type | Description |
|---|---|---|
| mu | np.ndarray | (K,) posterior mean vector |
| Sigma | np.ndarray | (K, K) posterior covariance matrix |
| feature_names | list[str] | Feature names (from Population) |
| n_observations | int | Number of observe() calls applied |
| noise | float | Observation noise standard deviation |