O*NET: Profiling US occupations
Infer a worker's full skill profile from a few task observations using the U.S. Department of Labor's O*NET taxonomy.
What you'll learn
- Working with large feature spaces (894 occupations x 120 features)
- Block structure in covariance (cognitive vs. physical skills)
- Cross-domain transfer: observing one skill type predicts others
- Anti-correlation: cognitive skills negatively predict physical skills
About O*NET
O*NET 30.2 describes 894 occupations across 120 features:
- 35 skills (Programming, Writing, Mathematics, ...)
- 33 knowledge areas (Computers & Electronics, Engineering, ...)
- 52 abilities (Deductive Reasoning, Static Strength, ...)
Each feature has a continuous importance rating normalised to [0, 1].
Step 1: Load the population
Population(894 entities x 120 skills, shrinkage=0.0054)
Condition number: 1884.2
Effective dimensions: ~5 (90% variance)
Higher shrinkage
With 120 features and 894 entities, the Ledoit-Wolf estimator applies more shrinkage (0.0054) than in the LLM example (0.0006 with only 6 features). This regularisation matters for numerical stability as the number of features K grows relative to the number of entities N.
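The effect of shrinkage on conditioning can be sketched with NumPy alone. This is a minimal illustration on synthetic data, not skillinfer's implementation: the helper names `shrink_covariance` and `effective_dimensions` are hypothetical, the shrinkage intensity is passed in by hand rather than estimated by Ledoit-Wolf, and the latent-factor data is an assumed stand-in for the real population.

```python
import numpy as np


def shrink_covariance(sample_cov, alpha):
    """Blend the sample covariance with a scaled-identity target."""
    k = sample_cov.shape[0]
    mu = np.trace(sample_cov) / k  # average variance across features
    return (1.0 - alpha) * sample_cov + alpha * mu * np.eye(k)


def effective_dimensions(cov, variance_fraction=0.90):
    """Number of leading eigenvalues needed to reach the variance fraction."""
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # descending order
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative, variance_fraction) + 1)


# Synthetic stand-in (assumption): 894 entities, 120 features,
# driven by a handful of latent factors plus a little noise.
rng = np.random.default_rng(0)
n_entities, n_features, n_factors = 894, 120, 5
loadings = rng.normal(size=(n_features, n_factors))
factors = rng.normal(size=(n_entities, n_factors))
data = factors @ loadings.T + 0.1 * rng.normal(size=(n_entities, n_features))

sample_cov = np.cov(data, rowvar=False)
shrunk_cov = shrink_covariance(sample_cov, alpha=0.0054)

cond_before = np.linalg.cond(sample_cov)
cond_after = np.linalg.cond(shrunk_cov)
n_effective = effective_dimensions(shrunk_cov)
print(f"condition number: {cond_before:.1f} -> {cond_after:.1f}")
print(f"effective dimensions (90% variance): {n_effective}")
```

Shrinking toward a scaled identity pulls every eigenvalue toward the average variance, so the ratio of the largest to the smallest eigenvalue (the condition number) can only go down.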
Step 2: Explore the covariance structure
for _, row in pop.top_correlations(k=8).iterrows():
    a = row["feature_a"]
    b = row["feature_b"]
    print(f" {a:>35} <-> {b:<35} r = {row['correlation']:+.3f}")
Skill:Equipment Maintenance <-> Skill:Repairing r = +0.972
Ability:Arm-Hand Steadiness <-> Ability:Manual Dexterity r = +0.961
Ability:Gross Body Coordination <-> Ability:Stamina r = +0.957
Skill:Writing <-> Ability:Written Expression r = +0.950
Skill:Reading Comprehension <-> Ability:Written Comprehension r = +0.948
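For readers who want to see what such a ranking involves, here is a small NumPy sketch. It is an assumption about the general technique, not the library's code: `top_correlated_pairs` is a hypothetical standalone helper, it ranks pairs by absolute correlation (the library may rank differently), and the four features are synthetic.

```python
import numpy as np


def top_correlated_pairs(data, feature_names, k=5):
    """Return the k feature pairs with the largest absolute correlation."""
    corr = np.corrcoef(data, rowvar=False)
    rows, cols = np.triu_indices(corr.shape[0], k=1)  # each pair once, no diagonal
    values = corr[rows, cols]
    order = np.argsort(-np.abs(values))[:k]
    return [(feature_names[rows[i]], feature_names[cols[i]], float(values[i])) for i in order]


# Synthetic features (assumption): one near-duplicate, one anti-correlated, one unrelated.
rng = np.random.default_rng(0)
n = 500
writing = rng.normal(size=n)
written_expression = writing + 0.1 * rng.normal(size=n)  # near-duplicate of writing
static_strength = -writing + 0.5 * rng.normal(size=n)    # anti-correlated with writing
unrelated = rng.normal(size=n)

names = ["writing", "written_expression", "static_strength", "unrelated"]
data = np.column_stack([writing, written_expression, static_strength, unrelated])

pairs = top_correlated_pairs(data, names, k=3)
for a, b, r in pairs:
    print(f"{a:>20} <-> {b:<20} r = {r:+.3f}")
```

Ranking by absolute value keeps strong negative correlations in view, which is how cross-block anti-correlation would surface in a listing like this.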
The top correlations reveal block structure:
- Cognitive block: Writing ↔ Written Expression (r=0.95), Reading Comprehension ↔ Written Comprehension (r=0.95)
- Physical block: Equipment Maintenance ↔ Repairing (r=0.97), Arm-Hand Steadiness ↔ Manual Dexterity (r=0.96)
- Cross-block anti-correlation: Writing ↔ Static Strength (r ≈ -0.55)
This means observing high Writing skill simultaneously:
- Increases predictions for Written Expression, Critical Thinking
- Decreases predictions for Static Strength, Manual Dexterity
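The mechanism behind this can be sketched as a single Gaussian conditioning step on a three-feature toy model. This is an illustration under stated assumptions, not skillinfer's internals: the `observe` function here is a hypothetical standalone helper, the prior mean (0.5) and spread (0.2) are made up, and the Written Expression vs. Static Strength correlation of -0.50 is assumed. Only the 0.95 and -0.55 correlations come from the text above.

```python
import numpy as np


def observe(mean, cov, index, value, noise_var=0.0):
    """Condition a Gaussian belief on one (optionally noisy) feature observation."""
    gain = cov[:, index] / (cov[index, index] + noise_var)
    new_mean = mean + gain * (value - mean[index])
    new_cov = cov - np.outer(gain, cov[index, :])
    return new_mean, new_cov


names = ["Writing", "Written Expression", "Static Strength"]
# 0.95 and -0.55 are quoted above; -0.50 is an assumed value.
corr = np.array([
    [1.00, 0.95, -0.55],
    [0.95, 1.00, -0.50],
    [-0.55, -0.50, 1.00],
])
std = np.full(3, 0.2)         # assumed prior spread of each feature
prior_mean = np.full(3, 0.5)  # assumed prior mean of each feature
prior_cov = corr * np.outer(std, std)

# Observe a high Writing score without noise.
post_mean, post_cov = observe(prior_mean, prior_cov, index=0, value=0.8)
for name, before, after in zip(names, prior_mean, post_mean):
    print(f"{name:<20} prior={before:.3f} posterior={after:.3f}")
```

With these assumed numbers, observing Writing = 0.8 moves Written Expression up to 0.785 and Static Strength down to 0.335: the same observation raises one prediction and lowers the other, in proportion to each correlation.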
Step 3: Hold out and predict
Hold out the Software Developers occupation and observe 3 of its 120 features:
true_vec = pop.entity("Software Developers")
profile = pop.profile()
profile.observe("Skill:Programming", true_vec[pop.feature_names.index("Skill:Programming")])
profile.observe("Skill:Mathematics", true_vec[pop.feature_names.index("Skill:Mathematics")])
profile.observe("Knowledge:Computers and Electronics", true_vec[pop.feature_names.index("Knowledge:Computers and Electronics")])
# Check predictions on selected features
check = [
    "Skill:Complex Problem Solving",
    "Skill:Critical Thinking",
    "Knowledge:Mathematics",
    "Ability:Deductive Reasoning",
    "Ability:Written Comprehension",
    "Ability:Static Strength",
    "Ability:Manual Dexterity",
    "Ability:Stamina",
]
for feat in check:
    idx = pop.feature_names.index(feat)
    pred = profile.mean(feat)
    std = profile.std(feat)
    true = true_vec[idx]
    print(f" {feat:<42} true={true:.3f} pred={pred:.3f} ± {std:.3f} err={abs(true-pred):.3f}")
Skill:Complex Problem Solving true=0.781 pred=0.769 ± 0.036 err=0.012
Skill:Critical Thinking true=0.714 pred=0.701 ± 0.042 err=0.013
Knowledge:Mathematics true=0.626 pred=0.645 ± 0.051 err=0.019
Ability:Deductive Reasoning true=0.714 pred=0.698 ± 0.039 err=0.016
Ability:Written Comprehension true=0.627 pred=0.614 ± 0.044 err=0.013
Ability:Static Strength true=0.143 pred=0.178 ± 0.062 err=0.035 ← correctly low
Ability:Manual Dexterity true=0.286 pred=0.312 ± 0.058 err=0.026 ← correctly low
Ability:Stamina true=0.143 pred=0.189 ± 0.065 err=0.046 ← correctly low
From just 3 observations, skillinfer correctly predicts that a software developer has:
- High cognitive skills (Complex Problem Solving, Critical Thinking, Deductive Reasoning)
- Low physical skills (Static Strength, Manual Dexterity, Stamina)
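A quick way to sanity-check the reported uncertainties is to compare each error with its predicted standard deviation. The sketch below copies the eight rows from the output above into plain arrays; it uses only NumPy and makes no calls into skillinfer.

```python
import numpy as np

# (feature, true, predicted mean, predicted std) copied from the output above
rows = [
    ("Skill:Complex Problem Solving", 0.781, 0.769, 0.036),
    ("Skill:Critical Thinking", 0.714, 0.701, 0.042),
    ("Knowledge:Mathematics", 0.626, 0.645, 0.051),
    ("Ability:Deductive Reasoning", 0.714, 0.698, 0.039),
    ("Ability:Written Comprehension", 0.627, 0.614, 0.044),
    ("Ability:Static Strength", 0.143, 0.178, 0.062),
    ("Ability:Manual Dexterity", 0.286, 0.312, 0.058),
    ("Ability:Stamina", 0.143, 0.189, 0.065),
]
true = np.array([r[1] for r in rows])
pred = np.array([r[2] for r in rows])
std = np.array([r[3] for r in rows])

abs_err = np.abs(true - pred)
z_scores = abs_err / std  # error measured in predicted standard deviations

print(f"mean absolute error: {abs_err.mean():.4f}")
print(f"largest |error| / std: {z_scores.max():.2f}")
print(f"within one std: {int((z_scores <= 1).sum())} of {len(rows)}")
```

All eight errors fall within one predicted standard deviation. Eight features from a single occupation are too few to judge calibration in general, so treat this as a spot check only.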
Step 4: Validate transfer helps
results = skillinfer.validation.held_out_evaluation(
    pop, frac_observed=[0.1, 0.3, 0.5], n_splits=10, obs_noise=0.05
)
summary = results.groupby(["frac_observed", "method"])["cosine_similarity"].mean()
print(summary)
The Kalman filter (full covariance) consistently outperforms the diagonal baseline (no cross-feature transfer), especially when few features are observed.
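The comparison can be reproduced in miniature with NumPy. This is a hedged sketch of the general idea, not the code behind `held_out_evaluation`: the synthetic latent-factor population, the train/test split, and the helper names `posterior_mean` and `cosine_similarity` are all assumptions made for illustration. The diagonal baseline is the same update with off-diagonal covariance entries zeroed, so unobserved features stay at the population mean.

```python
import numpy as np


def posterior_mean(mean, cov, obs_idx, obs_values, noise_var):
    """Posterior mean of every feature given noisy observations of a subset."""
    s_oo = cov[np.ix_(obs_idx, obs_idx)] + noise_var * np.eye(len(obs_idx))
    s_ao = cov[:, obs_idx]
    # s_oo is symmetric, so solve(s_oo, s_ao.T).T equals s_ao @ inv(s_oo)
    gain = np.linalg.solve(s_oo, s_ao.T).T
    return mean + gain @ (obs_values - mean[obs_idx])


def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
n_train, n_test, n_features, n_factors = 250, 50, 40, 3
obs_noise = 0.05

# Synthetic population with shared latent factors (assumed structure).
loadings = rng.normal(size=(n_features, n_factors))
factors = rng.normal(size=(n_train + n_test, n_factors))
data = factors @ loadings.T + 0.1 * rng.normal(size=(n_train + n_test, n_features))
train, test = data[:n_train], data[n_train:]

mean = train.mean(axis=0)
full_cov = np.cov(train, rowvar=False)
diag_cov = np.diag(np.diag(full_cov))  # baseline: no cross-feature transfer

scores = {"full": [], "diagonal": []}
n_observed = max(1, int(0.1 * n_features))  # observe roughly 10% of the features
for true_vec in test:
    obs_idx = rng.choice(n_features, size=n_observed, replace=False)
    obs_values = true_vec[obs_idx] + obs_noise * rng.normal(size=n_observed)
    for label, cov in (("full", full_cov), ("diagonal", diag_cov)):
        pred = posterior_mean(mean, cov, obs_idx, obs_values, obs_noise**2)
        scores[label].append(cosine_similarity(pred, true_vec))

mean_scores = {label: float(np.mean(vals)) for label, vals in scores.items()}
print(mean_scores)
```

On data with strong shared structure, the full-covariance update should score well above the diagonal baseline, because only the former propagates information from observed to unobserved features.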
Full example
See examples/onet.py for the complete script including validation.
Key takeaway
With 120 features, observing just 12 (10%) gives a cosine similarity of ~0.95 to the true profile. The block structure in human skills (cognitive vs. physical) means that a few observations from one domain predict the entire profile, including anti-correlated features in other domains.