Quickstart

This guide walks through the core workflow: build a population, observe a few skills, and predict the full profile.

1. Build a population

A Population is a matrix of entities (rows) described by skills (columns). skillinfer learns how skills co-vary across the population.

import skillinfer

# Built-in O*NET dataset: 894 occupations x 120 skills
pop = skillinfer.datasets.onet()
print(pop)

Population(894 entities x 120 skills, shrinkage=0.0054)
  Condition number: 1884.2
  Effective dimensions: ~5 (90% variance)

  Top skill correlations:
    Skill:Equipment Maintenance <-> Skill:Repairing                r = +0.972
    Ability:Arm-Hand Steadiness <-> Ability:Manual Dexterity       r = +0.961
                  Skill:Writing <-> Ability:Written Expression     r = +0.950
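
One way to read the "Effective dimensions" and "Condition number" lines is through the eigenvalues of the skill covariance matrix. The sketch below is an illustration in plain NumPy on synthetic data. The helper `effective_dimensions` is a hypothetical name, and skillinfer may compute its summary differently.

```python
import numpy as np

def effective_dimensions(cov, threshold=0.90):
    """Smallest number of principal components whose variance share reaches `threshold`."""
    eigvals = np.linalg.eigvalsh(cov)[::-1]        # eigenvalues, largest first
    eigvals = np.clip(eigvals, 0.0, None)          # guard against tiny negative round-off
    share = np.cumsum(eigvals) / eigvals.sum()     # cumulative share of total variance
    count = int(np.searchsorted(share, threshold)) + 1
    return min(count, len(eigvals))

# Synthetic population: 500 entities x 20 skills driven by 3 hidden factors (toy data)
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
loadings = rng.normal(size=(3, 20))
scores = latent @ loadings + 0.05 * rng.normal(size=(500, 20))

cov = np.cov(scores, rowvar=False)
n_dims = effective_dimensions(cov)   # few components carry most of the variance
cond = np.linalg.cond(cov)           # ratio of largest to smallest singular value
print(n_dims, cond)
```

A small effective dimension relative to the number of skills means that a handful of observations can constrain most of the profile.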

You can also build from your own data:

import pandas as pd

df = pd.read_csv("scores.csv", index_col=0)  # rows=entities, columns=skills
pop = skillinfer.Population.from_dataframe(df)

By default, each column is normalized to [0, 1]. Pass normalize=False if your data is already on a meaningful scale.
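
Scaling a column to [0, 1] is commonly done with min-max scaling. The sketch below assumes that this is roughly what the default normalization does. The source does not specify the method, and `minmax_normalize` is a hypothetical helper, not a skillinfer function.

```python
import pandas as pd

def minmax_normalize(df: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column to [0, 1]; constant columns map to 0."""
    col_min = df.min()
    col_range = (df.max() - col_min).replace(0, 1)  # avoid division by zero
    return (df - col_min) / col_range

# Toy scores on two different raw scales
raw = pd.DataFrame(
    {"Skill:Writing": [2.0, 3.5, 5.0], "Skill:Programming": [10.0, 40.0, 70.0]},
    index=["entity_a", "entity_b", "entity_c"],
)
scaled = minmax_normalize(raw)
print(scaled)
```

If your columns already share a meaningful scale (for example, all are percentages), rescaling them independently would discard that information, which is the case normalize=False is meant for.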

2. Observe and predict

Create a Profile for a new entity. Its initial estimate is the population mean, and its initial uncertainty is the full population covariance.

profile = pop.profile()

# Observe one skill
profile.observe("Skill:Programming", 0.92)

# Predict another skill that was never observed; it is inferred via covariance
profile.predict("Skill:Mathematics")
# {'feature': 'Skill:Mathematics', 'mean': 0.81, 'std': 0.11, ...}

Each call to observe() runs a Kalman update: the observation shifts the estimate of every other skill in proportion to its covariance with the observed skill, and reduces the uncertainty of correlated skills accordingly.
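
To make the update concrete, here is a minimal NumPy sketch of a scalar Kalman update on a Gaussian belief over three skills. It illustrates the technique rather than skillinfer's internals: the function name `kalman_observe` and the observation-noise value `noise_var` are assumptions.

```python
import numpy as np

def kalman_observe(mean, cov, idx, value, noise_var=1e-4):
    """Condition a Gaussian belief on a noisy observation of skill `idx`."""
    innovation = value - mean[idx]           # surprise relative to current estimate
    s = cov[idx, idx] + noise_var            # innovation variance
    gain = cov[:, idx] / s                   # Kalman gain: one weight per skill
    new_mean = mean + gain * innovation      # correlated skills move with the observation
    new_cov = cov - np.outer(gain, cov[idx, :])  # uncertainty shrinks where skills co-vary
    return new_mean, new_cov

# Toy prior: skills 0 and 1 are strongly correlated, skill 2 is independent
mean = np.array([0.5, 0.5, 0.5])
cov = np.array([[0.04, 0.03, 0.00],
                [0.03, 0.04, 0.00],
                [0.00, 0.00, 0.04]])

new_mean, new_cov = kalman_observe(mean, cov, idx=0, value=0.9)
print(new_mean)
print(np.sqrt(np.diag(new_cov)))
```

In this toy setup the estimate for skill 1 rises along with skill 0, while skill 2, which has zero covariance with the observed skill, keeps its prior mean and variance.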

You can chain observations or observe many at once:

profile.observe("Skill:Critical Thinking", 0.85).observe("Skill:Writing", 0.70)

# or equivalently:
profile.observe_many({"Skill:Critical Thinking": 0.85, "Skill:Writing": 0.70})
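
Chaining and batching can agree because, in a linear-Gaussian model with independent observation noise, applying scalar updates one at a time yields the same posterior as a single joint update. The NumPy sketch below checks this numerically on a random covariance. `update_one`, `update_batch`, and `noise_var` are hypothetical names used for illustration, and the library's own implementation may differ.

```python
import numpy as np

def update_one(mean, cov, idx, value, noise_var=1e-4):
    """Scalar Kalman update for a single observed skill."""
    gain = cov[:, idx] / (cov[idx, idx] + noise_var)
    return mean + gain * (value - mean[idx]), cov - np.outer(gain, cov[idx, :])

def update_batch(mean, cov, idxs, values, noise_var=1e-4):
    """Joint Kalman update for several observed skills at once."""
    H = np.eye(len(mean))[idxs]                        # rows select the observed skills
    S = H @ cov @ H.T + noise_var * np.eye(len(idxs))  # innovation covariance
    K = cov @ H.T @ np.linalg.inv(S)                   # Kalman gain matrix
    return mean + K @ (np.asarray(values) - H @ mean), cov - K @ H @ cov

# Random positive-definite covariance over 4 toy skills
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
cov = A @ A.T + 0.1 * np.eye(4)
mean = np.full(4, 0.5)

m_seq, c_seq = update_one(mean, cov, 0, 0.85)
m_seq, c_seq = update_one(m_seq, c_seq, 2, 0.70)
m_bat, c_bat = update_batch(mean, cov, [0, 2], [0.85, 0.70])

same = bool(np.allclose(m_seq, m_bat) and np.allclose(c_seq, c_bat))
print(same)
```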

3. Get the full profile

print(profile.predict())

                        feature  mean   std  ci_lower  ci_upper
          Skill:Active Learning  0.94  0.12      0.71      1.17
         Skill:Active Listening  0.74  0.10      0.55      0.93
  Skill:Complex Problem Solving  1.02  0.09      0.83      1.20
        Skill:Critical Thinking  0.85  0.01      0.83      0.87  ← observed
              Skill:Programming  0.92  0.01      0.91      0.93  ← observed
...
[120 rows]

# What should we assess next? (highest remaining uncertainty)
profile.most_uncertain(k=3)

                      feature  mean   std
         Knowledge:Mechanical  0.24  0.24
    Skill:Equipment Selection  0.28  0.24
  Ability:Arm-Hand Steadiness  0.01  0.24
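
Ranking by remaining uncertainty amounts to sorting the unobserved skills by their posterior standard deviation. The sketch below shows that idea in NumPy on a toy covariance. The standalone function and its arguments are assumptions for illustration and do not reflect how the method is implemented in the library.

```python
import numpy as np

def most_uncertain(names, cov, observed, k=3):
    """Return the k unobserved skills with the largest posterior standard deviation."""
    std = np.sqrt(np.diag(cov))
    candidates = [i for i in range(len(names)) if names[i] not in observed]
    ranked = sorted(candidates, key=lambda i: std[i], reverse=True)
    return [(names[i], float(std[i])) for i in ranked[:k]]

# Toy posterior over three skills; only the variances matter for the ranking
names = ["Skill:Programming", "Skill:Writing", "Knowledge:Mechanical"]
cov = np.diag([0.0001, 0.01, 0.0576])

top = most_uncertain(names, cov, observed={"Skill:Programming"}, k=2)
print(top)
```

Assessing the most uncertain skill next is a simple heuristic. It does not account for how much an observation would reduce uncertainty in other skills.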

4. Use a specific entity as prior

If you know the new entity is similar to an existing one, use it as the starting point:

# "This person's background is similar to a Software Developer"
profile = pop.profile(prior_entity="Software Developers")
profile.observe("Skill:Writing", 0.90)  # but stronger at writing
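
Conceptually, an entity prior changes where the estimate starts: the initial mean comes from one row of the population matrix instead of the column averages. The pandas sketch below illustrates that reading on toy data. The helper `prior_mean` and the two extra row labels are made up for the example, and the source does not say how the prior covariance is chosen in this case.

```python
import pandas as pd

# Toy population: rows are entities, columns are skills (values are illustrative)
scores = pd.DataFrame(
    {"Skill:Programming": [0.9, 0.2, 0.4], "Skill:Writing": [0.5, 0.8, 0.6]},
    index=["Software Developers", "Editors", "Technicians"],
)

def prior_mean(scores: pd.DataFrame, prior_entity=None) -> pd.Series:
    """Starting estimate: the population mean, or one entity's row if given."""
    if prior_entity is None:
        return scores.mean()
    return scores.loc[prior_entity]

default_prior = prior_mean(scores)
developer_prior = prior_mean(scores, "Software Developers")
print(default_prior)
print(developer_prior)
```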

Next steps

Core Concepts — understand the model and when it works best
LLM Benchmarks tutorial — full end-to-end example with real data
API Reference — complete method documentation