Profile¶

A skill profile for one entity. Create it with Population.profile() and update it with observe() calls; each observation narrows the posterior uncertainty.

# pop is an existing Population
profile = pop.profile()
profile.observe("BBH", 32.7)
print(profile.predict())

Observation¶

`observe`¶

def observe(self, feature: str | int | Skill, value: float | None = None) -> Profile

Observe one feature value. Runs a Kalman update, updating the full profile in place. Returns self for chaining.

If feature is a Skill with a score, value can be omitted.
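
The Kalman update mentioned above can be sketched in plain Python. This is a minimal illustration of conditioning a Gaussian belief on one noisy scalar reading, not skillinfer's actual implementation; the helper name `kalman_observe`, the list-of-lists representation, and the use of `noise ** 2` as the observation variance are assumptions.

```python
def kalman_observe(mu, Sigma, j, value, noise=0.0):
    """Condition a Gaussian belief (mu, Sigma) on a noisy reading of feature j.

    Hypothetical helper for illustration; it returns new copies instead of
    updating in place.
    """
    k = len(mu)
    # Innovation variance: prior variance of feature j plus observation noise variance.
    s = Sigma[j][j] + noise ** 2
    # Kalman gain: how far each feature moves per unit of surprise in feature j.
    gain = [Sigma[i][j] / s for i in range(k)]
    residual = value - mu[j]
    new_mu = [mu[i] + gain[i] * residual for i in range(k)]
    # The covariance shrinks by the outer product of the gain and row j of Sigma.
    new_Sigma = [[Sigma[a][b] - gain[a] * Sigma[j][b] for b in range(k)]
                 for a in range(k)]
    return new_mu, new_Sigma


# Two correlated features with unit prior variance and covariance 0.5 (made-up numbers).
mu0 = [0.0, 0.0]
Sigma0 = [[1.0, 0.5], [0.5, 1.0]]
mu1, Sigma1 = kalman_observe(mu0, Sigma0, j=0, value=1.0, noise=1.0)
print(mu1)     # [0.5, 0.25]
print(Sigma1)  # [[0.5, 0.25], [0.25, 0.875]]
```

Observing feature 0 also moves the estimate of feature 1 and lowers its variance, which is why a single observation updates the full profile.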

Example

profile.observe("BBH", 32.7).observe("IFEval", 47.1)

# Or using Skill objects
profile.observe(Skill("BBH", score=32.7))

`observe_many`¶

def observe_many(self, observations: dict[str | int, float] | list[Skill]) -> Profile

Observe multiple features at once. Accepts a {feature: value} dict or a list of Skill objects with scores. Returns self for chaining.

Example

profile.observe_many({"BBH": 32.7, "IFEval": 47.1, "MATH Lvl 5": 18.0})

# Or using Skill objects
profile.observe_many([Skill("BBH", score=32.7), Skill("IFEval", score=47.1)])

Prediction¶

`predict`¶

def predict(
    self,
    feature: str | int | None = None,
    level: float = 0.95,
    detail: bool = False,
) -> pd.DataFrame | dict

Predict skill values with confidence intervals.

Parameters

Parameter	Type	Default	Description
`feature`	`str | int | None`	`None`	Predict one skill, or all if `None`.
`level`	`float`	`0.95`	Confidence level for the interval.
`detail`	`bool`	`False`	Include `confidence` (0-1) and `source` (`"observed"` / `"predicted"`) columns.

Returns

Single feature: dict with keys feature, mean, std, ci_lower, ci_upper
All features: pd.DataFrame with those columns

Example

profile.predict("GPQA")
# {'feature': 'GPQA', 'mean': 8.13, 'std': 2.78, 'ci_lower': 2.68, 'ci_upper': 13.58}

profile.predict()
#      feature   mean    std  ci_lower  ci_upper
# 0     IFEval  47.10   1.00    45.14     49.06
# ...

`most_uncertain`¶

def most_uncertain(self, k: int = 10) -> pd.DataFrame

Top-k features with highest posterior uncertainty. Returns DataFrame with columns [feature, mean, std].

Active learning

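Use most_uncertain() to decide which feature to observe next. Under a Gaussian model with the same observation noise for every feature, the feature with the largest posterior variance yields the largest expected information gain.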
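
The selection rule can be sketched without the library. The helper `rank_by_uncertainty` below is hypothetical, and the information-gain expression assumes a Gaussian posterior with a shared, non-zero noise standard deviation. The GPQA and IFEval standard deviations are taken from the predict() example; the BBH value is made up for illustration.

```python
from math import log


def rank_by_uncertainty(feature_names, stds, noise, k=10):
    """Return up to k (feature, std, info_gain) rows, most uncertain first."""
    rows = []
    for name, s in zip(feature_names, stds):
        # Expected information gain (in nats) from one noisy reading of this feature.
        gain = 0.5 * log(1.0 + (s / noise) ** 2)
        rows.append((name, s, gain))
    # Sorting by std also sorts by information gain, since gain grows with std.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:k]


ranked = rank_by_uncertainty(["IFEval", "GPQA", "BBH"], [1.0, 2.78, 3.0], noise=1.0, k=2)
for name, s, gain in ranked:
    print(f"{name}: std={s:.2f}, info gain={gain:.3f} nats")
```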

Task Matching¶

`match_score`¶

def match_score(
    self,
    task_vector: dict[str, float] | np.ndarray | Task,
    threshold: float | None = None,
    level: float = 0.95,
) -> MatchResult

Score this agent against a task. Computes expected weighted-average performance (normalised by weight sum) and propagates uncertainty.

Parameters

Parameter	Type	Default	Description
`task_vector`	`dict | np.ndarray | Task`	—	Skill importance weights. Normalised internally.
`threshold`	`float | None`	`None`	If given, compute P(score > threshold).
`level`	`float`	`0.95`	Confidence level for the interval.

Returns: MatchResult (named tuple) with fields: score, std, ci_lower, ci_upper, p_above_threshold.

p_above_threshold is computed as \(1 - \Phi\!\left(\frac{\text{threshold} - \text{score}}{\text{std}}\right)\) where \(\Phi\) is the standard normal CDF, assuming the weighted score is Gaussian.
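
The calculation can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: the function name `match_score_sketch` is hypothetical, and the variance is propagated as w^T Σ w, the standard result for a linear combination of jointly Gaussian variables. The input numbers are made up.

```python
from math import sqrt
from statistics import NormalDist


def match_score_sketch(weights, mu, Sigma, threshold=None, level=0.95):
    """Weighted-average score with Gaussian uncertainty (hypothetical helper)."""
    total = sum(weights)
    w = [x / total for x in weights]  # normalise so the score is a weighted average
    n = len(w)
    score = sum(w[i] * mu[i] for i in range(n))
    # Variance of a linear combination of jointly Gaussian variables: w^T Sigma w.
    var = sum(w[i] * Sigma[i][j] * w[j] for i in range(n) for j in range(n))
    std = sqrt(var)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    p_above = None
    if threshold is not None and std > 0:
        # 1 - Phi((threshold - score) / std), matching the formula above.
        p_above = 1.0 - NormalDist().cdf((threshold - score) / std)
    return {
        "score": score,
        "std": std,
        "ci_lower": score - z * std,
        "ci_upper": score + z * std,
        "p_above_threshold": p_above,
    }


result = match_score_sketch(
    weights=[1.0, 0.5],
    mu=[60.0, 40.0],
    Sigma=[[4.0, 0.0], [0.0, 16.0]],
    threshold=50.0,
)
print(f"Expected: {result['score']:.1f} ± {result['std']:.1f}")
print(f"P(score > 50): {result['p_above_threshold']:.1%}")
```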

Example

from skillinfer import Task

task = Task({"MATH Lvl 5": 1.0, "GPQA": 0.5})
result = profile.match_score(task, threshold=50.0)
print(f"Expected: {result.score:.1f} ± {result.std:.1f}")
print(f"P(score > 50): {result.p_above_threshold:.1%}")

`skillinfer.rank_agents`¶

skillinfer.rank_agents(
    task_vector: dict[str, float] | np.ndarray | Task,
    profiles: dict[str, Profile],
    threshold: float | None = None,
) -> pd.DataFrame

Rank a pool of agents by expected task performance. Calls match_score on each profile and sorts descending.

Parameters

Parameter	Type	Default	Description
`task_vector`	`dict | np.ndarray | Task`	—	Skill importance weights.
`profiles`	`dict[str, Profile]`	—	Map of agent name → Profile.
`threshold`	`float | None`	`None`	If given, include P(score > threshold).

Returns: DataFrame with columns [agent, expected_score, std, p_above_threshold].

Example

import skillinfer
from skillinfer import Task

# alice and gpt4o are existing Profile objects
task = Task({"math": 1.0, "reasoning": 0.5})
ranking = skillinfer.rank_agents(task, {"alice": alice, "gpt-4o": gpt4o})
print(ranking)
#     agent  expected_score    std  p_above_threshold
# 0   alice            0.91   0.03               None
# 1  gpt-4o            0.85   0.03               None

Evaluation¶

`summary`¶

def summary(self, true_vector: np.ndarray | None = None) -> dict

Summary statistics for this profile.

Returns: dict with keys:

Key	Type	Description
`n_features`	`int`	Total feature count
`n_observed`	`int`	Number of observed features
`n_predicted`	`int`	Number of predicted features
`mean_std`	`float`	Average posterior standard deviation
`uncertainty_reduction`	`float`	Fraction of prior uncertainty removed (0-1)
`top_predicted`	`list[dict]`	Top 3 unobserved features by predicted mean
`most_uncertain`	`list[dict]`	Top 3 features by posterior std

If true_vector is given, also includes:

Key	Type	Description
`mae`	`float`	Mean absolute error
`rmse`	`float`	Root mean squared error
`max_error`	`float`	Largest single prediction error
`cosine_similarity`	`float`	Cosine similarity to ground truth
`coverage_95`	`float`	Fraction of true values inside 95% CIs
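
The ground-truth metrics in the table above can be sketched as follows. This is a plain-Python illustration with a hypothetical helper name (`evaluation_metrics`) and made-up inputs; it assumes the 95% intervals are the Gaussian intervals mean ± z·std.

```python
from math import sqrt
from statistics import NormalDist


def evaluation_metrics(mu, std, truth):
    """Compare a posterior (mu, std) with a ground-truth vector (illustrative)."""
    n = len(mu)
    errors = [abs(m - t) for m, t in zip(mu, truth)]
    dot = sum(m * t for m, t in zip(mu, truth))
    norms = sqrt(sum(m * m for m in mu)) * sqrt(sum(t * t for t in truth))
    # Half-width multiplier of a two-sided 95% Gaussian interval (about 1.96).
    z = NormalDist().inv_cdf(0.975)
    covered = sum(1 for m, s, t in zip(mu, std, truth) if abs(t - m) <= z * s)
    return {
        "mae": sum(errors) / n,
        "rmse": sqrt(sum(e * e for e in errors) / n),
        "max_error": max(errors),
        "cosine_similarity": dot / norms,
        "coverage_95": covered / n,
    }


metrics = evaluation_metrics(mu=[1.0, 2.0, 3.0], std=[1.0, 1.0, 0.5], truth=[1.0, 3.0, 5.0])
for key, value in metrics.items():
    print(f"{key}: {value:.3f}")
```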

`mae`¶

def mae(self, true_vector: np.ndarray) -> float

Mean absolute error between posterior mean and a ground truth vector.

`rmse`¶

def rmse(self, true_vector: np.ndarray) -> float

Root mean squared error between posterior mean and a ground truth vector.

`metrics_by_category`¶

def metrics_by_category(
    self,
    true_vector: np.ndarray,
    categories: dict[str, str] | None = None,
    sep: str = ":",
) -> pd.DataFrame

Per-category prediction metrics on a known truth vector. Features are grouped into categories, by default using the part of each feature name before sep (e.g. Skill:Programming → Skill), and MAE, RMSE, and recovery are reported per group. Recovery is the share of squared error eliminated relative to predicting the prior mean.

Parameters

Parameter	Type	Default	Description
`true_vector`	`np.ndarray`	—	Ground-truth profile, shape `(K,)`.
`categories`	`dict[str, str] | None`	`None`	Optional `{feature_name: category}` mapping. If `None`, derived by splitting feature names on `sep`.
`sep`	`str`	`":"`	Separator used when deriving categories from feature names. Features without `sep` go into `"uncategorised"`.

Returns: DataFrame with columns [category, n_features, n_observed, mae, rmse, recovery].
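
The grouping and the recovery metric can be sketched in plain Python. The helper names and the input values are hypothetical, and recovery is read here as 1 - SSE(posterior) / SSE(prior), which is one direct interpretation of "share of squared error eliminated".

```python
from math import sqrt


def category_of(name, sep=":"):
    # Text before the first separator; names without one share a catch-all bucket.
    return name.split(sep, 1)[0] if sep in name else "uncategorised"


def metrics_by_category_sketch(names, prior_mean, post_mean, truth, sep=":"):
    """Per-category MAE, RMSE and recovery (illustrative, returns a dict)."""
    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(category_of(name, sep), []).append(i)
    out = {}
    for cat, idx in groups.items():
        abs_err = [abs(post_mean[i] - truth[i]) for i in idx]
        sse_post = sum(e * e for e in abs_err)
        sse_prior = sum((prior_mean[i] - truth[i]) ** 2 for i in idx)
        out[cat] = {
            "n_features": len(idx),
            "mae": sum(abs_err) / len(idx),
            "rmse": sqrt(sse_post / len(idx)),
            # Share of the prior's squared error that the posterior removed.
            "recovery": 1.0 - sse_post / sse_prior if sse_prior > 0 else float("nan"),
        }
    return out


names = ["Skill:Programming", "Skill:Writing", "Knowledge:Math", "misc"]
report = metrics_by_category_sketch(
    names,
    prior_mean=[0.5, 0.5, 0.5, 0.5],
    post_mean=[0.9, 0.6, 0.5, 0.5],
    truth=[0.92, 0.7, 0.8, 0.4],
)
for cat, row in report.items():
    print(cat, row)
```

A category whose posterior mean has not moved from the prior has recovery 0, and a perfect prediction has recovery 1.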

Example

profile.observe("Skill:Programming", 0.92)
profile.metrics_by_category(true_vec)
#    category  n_features  n_observed    mae   rmse  recovery
# 0   Ability          52           0  0.131  0.171     0.082
# 1 Knowledge          33           0  0.142  0.184     0.421
# 2     Skill          35           1  0.103  0.140     0.290

`similarity`¶

def similarity(self, other: np.ndarray) -> float

Cosine similarity between the posterior mean and a target vector. Returns float in [-1, 1].

`uncertainty_ratio`¶

def uncertainty_ratio(self, Sigma_0: np.ndarray) -> float

Fraction of prior uncertainty remaining: tr(Sigma) / tr(Sigma_0). A value of 0.5 means uncertainty has halved since the prior. Useful for deciding when enough observations have been collected.

Parameters

Parameter	Type	Description
`Sigma_0`	`np.ndarray`	(K, K) prior covariance, typically `pop.covariance`.
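
The ratio tr(Sigma) / tr(Sigma_0) is simple enough to sketch by hand. The helper names below are hypothetical and the matrices are made-up two-feature examples stored as lists of lists.

```python
def trace(matrix):
    # Sum of the diagonal entries, i.e. the total marginal variance.
    return sum(matrix[i][i] for i in range(len(matrix)))


def uncertainty_ratio_sketch(Sigma, Sigma_0):
    """Fraction of prior total variance that remains in the posterior."""
    return trace(Sigma) / trace(Sigma_0)


Sigma_0 = [[1.0, 0.5], [0.5, 1.0]]       # prior covariance
Sigma = [[0.5, 0.25], [0.25, 0.875]]     # posterior after one noisy observation
ratio = uncertainty_ratio_sketch(Sigma, Sigma_0)
print(ratio)  # 0.6875
```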

Query Methods¶

`mean`¶

def mean(self, feature: str | int | None = None) -> float | np.ndarray

Posterior mean. Returns a scalar if feature is given, the full (K,) vector otherwise.

`std`¶

def std(self, feature: str | int | None = None) -> float | np.ndarray

Posterior standard deviation (square root of the covariance diagonal).

`confidence_interval`¶

def confidence_interval(self, feature: str | int, level: float = 0.95) -> tuple[float, float]

Gaussian confidence interval for a single feature. Returns (lower, upper).
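
A Gaussian interval of this kind is mean ± z·std, where z is the standard normal quantile for the chosen level. The sketch below uses a hypothetical helper name and reproduces the GPQA numbers from the predict() example.

```python
from statistics import NormalDist


def gaussian_interval(mean, std, level=0.95):
    """Two-sided interval for a Gaussian posterior (illustrative helper)."""
    # z leaves (1 - level) / 2 probability in each tail; about 1.96 for level=0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return mean - z * std, mean + z * std


lower, upper = gaussian_interval(8.13, 2.78)
print(round(lower, 2), round(upper, 2))  # 2.68 13.58
```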

`to_dataframe`¶

def to_dataframe(self, detail: bool = False) -> pd.DataFrame

Full posterior as a DataFrame with columns [feature, mean, std]. With detail=True, adds confidence and source columns.

`copy`¶

def copy(self) -> Profile

Deep copy. The returned Profile has independent arrays and preserves all state.

Export / Import¶

`to_dict`¶

def to_dict(self) -> dict

Export the profile as a plain dict (JSON-serialisable). Contains feature_names, mean, std, n_observations, observed_features, noise.

`to_json`¶

def to_json(self, path: str | None = None) -> str

Export the profile as JSON. If path is given, writes to file. Always returns the JSON string.

`Profile.from_dict`¶

@classmethod
def from_dict(cls, data: dict) -> Profile

Reconstruct a Profile from the output of to_dict(). Restores the mean vector, observed features, and metadata. The covariance is reconstructed as a diagonal matrix from the exported std values.

`Profile.from_json`¶

@classmethod
def from_json(cls, source: str) -> Profile

Reconstruct a Profile from a JSON string or file path.

Example

# Save
profile.to_json("agent_profile.json")

# Load (from file)
restored = Profile.from_json("agent_profile.json")

# Or round-trip via dict
d = profile.to_dict()
restored = Profile.from_dict(d)

Note

Export/import preserves the mean vector and metadata but uses a diagonal covariance approximation. For full covariance fidelity, re-create the profile from a Population and re-apply observations.
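
The effect of the diagonal approximation can be illustrated without the library. The sketch below uses only a subset of the exported keys, with made-up values and a hypothetical helper name; it shows that a covariance rebuilt from std alone has zero off-diagonal entries, so correlations between features are not restored.

```python
import json


def diagonal_covariance(std):
    """Rebuild a covariance matrix from per-feature std values only."""
    k = len(std)
    return [[std[i] ** 2 if i == j else 0.0 for j in range(k)] for i in range(k)]


# Subset of the keys listed for to_dict(); the values are illustrative.
exported = {"feature_names": ["BBH", "GPQA"], "mean": [32.7, 8.13], "std": [1.0, 2.78]}

payload = json.dumps(exported)   # plain lists and floats serialise directly
restored = json.loads(payload)
Sigma = diagonal_covariance(restored["std"])

print(restored == exported)  # True
print(Sigma[0][1])           # 0.0
```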

Properties¶

Property	Type	Description
`agent_vector`	`pd.Series`	Posterior mean as a labeled Series
`covariance_matrix`	`pd.DataFrame`	Posterior covariance as a labeled DataFrame

Attributes¶

Attribute	Type	Description
`mu`	`np.ndarray`	(K,) posterior mean vector
`Sigma`	`np.ndarray`	(K, K) posterior covariance matrix
`feature_names`	`list[str]`	Feature names (from Population)
`n_observations`	`int`	Number of `observe()` calls applied
`noise`	`float`	Observation noise standard deviation
