ESCO: Profiling EU occupations

Cross-taxonomy validation with ESCO, the European skills taxonomy: an independently curated data source used to test whether skillinfer generalises beyond any single taxonomy's design.

What you'll learn

Working with binary (0/1) data instead of continuous ratings
Cross-taxonomy validation: does the method generalise?
Handling sparse, independently curated skill assignments

About ESCO

ESCO v1.2.1 (European Skills, Competences, Qualifications and Occupations) is curated by the European Commission. It differs from O*NET in three key ways:

	O*NET	ESCO
Source	U.S. Department of Labor surveys	EU expert panel curation
Feature type	Continuous ratings (1–5)	Binary assignments (has/doesn't have)
Scale	894 occupations x 120 features	2,999 occupations x 134 skill groups

These differences make ESCO a strong cross-taxonomy test: if skillinfer works on both, its performance is unlikely to depend on one taxonomy's design choices.

Step 1: Load the population

import numpy as np
import skillinfer

pop = skillinfer.datasets.esco()
print(pop)

Population(2999 entities x 134 skills, shrinkage=0.0211)
  Condition number: 468.3
  Effective dimensions: ~5 (90% variance)

The binary matrix has a density of about 10%: on average, an occupation is assigned roughly 13 of the 134 skill groups.
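
The summary figures above (density, condition number, effective dimensions) can be reproduced on any binary matrix with plain NumPy. The sketch below is illustrative only: it uses a synthetic matrix rather than ESCO, and the definitions it applies (eigenvalue ratio for the condition number, cumulative explained variance for effective dimensions) are common conventions assumed here, not confirmed details of how skillinfer computes them, for example before or after shrinkage.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a binary occupation x skill-group matrix
# (hypothetical data, not ESCO): 500 rows, 40 columns, ~10% ones.
X = (rng.random((500, 40)) < 0.10).astype(float)

# Density: fraction of cells equal to 1.
density = X.mean()

# Sample covariance between columns and its eigenvalues (ascending order).
cov = np.cov(X, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)

# Condition number (assumed definition): largest / smallest eigenvalue.
condition_number = eigvals[-1] / eigvals[0]

# Effective dimensions (assumed definition): smallest number of leading
# eigenvalues whose cumulative share of total variance reaches 90%.
explained = np.cumsum(eigvals[::-1]) / eigvals.sum()
effective_dims = int(np.searchsorted(explained, 0.90) + 1)

print(f"density={density:.3f}")
print(f"condition number={condition_number:.1f}")
print(f"effective dimensions={effective_dims}")
```

Because the synthetic columns are independent, the effective dimension count comes out close to the number of columns. A taxonomy with strongly correlated skill groups concentrates its variance in far fewer directions, which is what the ~5 reported for ESCO indicates.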

Step 2: Explore the covariance structure

for _, row in pop.top_correlations(k=5).iterrows():
    a = row["feature_a"]
    b = row["feature_b"]
    print(f"  {a:<30} <-> {b:<30}  r = {row['correlation']:+.3f}")

  assisting and caring           <-> making decisions                r = +0.771
  assisting and caring           <-> counselling                     r = +0.697
  counselling                    <-> making decisions                r = +0.646
  teaching and training          <-> applying civic skills           r = +0.551
  welfare                        <-> assisting and caring            r = +0.548
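
For intuition about what a correlation between two binary columns measures, the sketch below ranks column pairs of a small synthetic 0/1 matrix by Pearson correlation, which on binary data equals the phi coefficient. The column names and data are hypothetical, and this is not a claim about how `top_correlations` is implemented; skillinfer may, for instance, derive correlations from a shrunk covariance estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
names = ["teaching", "counselling", "welding", "driving"]  # hypothetical labels

# Hypothetical binary data: "counselling" copies "teaching" 80% of the time;
# the other two columns are independent of everything else.
n = 1000
teaching = rng.random(n) < 0.3
counselling = np.where(rng.random(n) < 0.8, teaching, rng.random(n) < 0.3)
welding = rng.random(n) < 0.2
driving = rng.random(n) < 0.2
X = np.column_stack([teaching, counselling, welding, driving]).astype(float)

# Pearson correlation between columns; on 0/1 data this is the phi coefficient.
corr = np.corrcoef(X, rowvar=False)

# Rank each unordered pair (upper triangle, diagonal excluded) by correlation.
i_idx, j_idx = np.triu_indices(len(names), k=1)
order = np.argsort(corr[i_idx, j_idx])[::-1]
top_pairs = [
    (names[i_idx[k]], names[j_idx[k]], float(corr[i_idx[k], j_idx[k]]))
    for k in order
]

for a, b, r in top_pairs[:3]:
    print(f"  {a:<15} <-> {b:<15}  r = {r:+.3f}")
```

The pair that was constructed to co-occur ranks first, while the independent pairs sit near zero.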

Step 3: Observe and predict

With binary data, observations are 0.0 or 1.0:

profile = pop.profile()

# Observe 3 skill groups
profile.observe("education", 1.0)
profile.observe("teaching and training", 1.0)
profile.observe("counselling", 1.0)

# Show top predicted skill groups
df = profile.predict()
df_sorted = df.sort_values("mean", ascending=False).head(10)
for _, row in df_sorted.iterrows():
    observed = " ← observed" if row["std"] < 0.01 else ""
    print(f"  {row['feature']:<45} mean={row['mean']:.3f} ± {row['std']:.3f}{observed}")

Even with binary data, the covariance structure captures meaningful skill relationships: observing education-related skills raises the predictions for related groups such as welfare and social sciences.
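
The mechanism behind this transfer can be sketched as a single Kalman-style update of a Gaussian belief. The function below is a minimal illustration under stated assumptions, not skillinfer's implementation: `kalman_observe` and its `noise_var` parameter are hypothetical names, the prior mean stands in for population base rates, and the three-feature covariance is made up.

```python
import numpy as np

def kalman_observe(mean, cov, idx, value, noise_var=0.01):
    """Update a Gaussian belief (mean, cov) after observing one feature.

    Hypothetical helper: a scalar observation of coordinate `idx` with
    observation noise variance `noise_var`.
    """
    # Kalman gain: how strongly each feature responds to a surprise in idx.
    gain = cov[:, idx] / (cov[idx, idx] + noise_var)
    new_mean = mean + gain * (value - mean[idx])
    new_cov = cov - np.outer(gain, cov[idx, :])
    return new_mean, new_cov

# Made-up prior: features 0 and 1 are correlated, feature 2 is independent.
mean = np.array([0.1, 0.1, 0.1])
cov = np.array([
    [0.09, 0.06, 0.00],
    [0.06, 0.09, 0.00],
    [0.00, 0.00, 0.09],
])

# Observe feature 0 as present (1.0).
new_mean, new_cov = kalman_observe(mean, cov, idx=0, value=1.0)

print(np.round(new_mean, 2))          # roughly 0.91, 0.64, 0.10
print(np.round(np.diag(new_cov), 3))  # variance shrinks for features 0 and 1
```

The correlated feature moves toward the observation and becomes more certain, while the independent feature is unchanged. Since the belief is Gaussian, predictions are not constrained to the [0, 1] range of the binary data.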

Step 4: Validate that transfer helps

results = skillinfer.validation.held_out_evaluation(
    pop, frac_observed=[0.1, 0.3, 0.5], n_splits=10, obs_noise=0.1
)
summary = results.groupby(["frac_observed", "method"])["cosine_similarity"].mean()
print(summary)

The Kalman filter outperforms the diagonal baseline on ESCO just as it does on O*NET, despite the fundamentally different data characteristics (binary vs. continuous, EU curation vs. U.S. surveys).
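
To make the comparison concrete, here is a self-contained sketch of a held-out evaluation on synthetic binary data. It is not the code behind `held_out_evaluation`: the data, the `predict` and `cosine` helpers, and the reading of "diagonal baseline" as the same update with all off-diagonal covariance set to zero are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical binary population: 3 latent "job families", each switching on
# its own block of 10 skill groups with high probability.
n, d, k = 600, 30, 3
family = rng.integers(0, k, size=n)
probs = np.full((k, d), 0.05)
for f in range(k):
    probs[f, f * 10:(f + 1) * 10] = 0.7
X = (rng.random((n, d)) < probs[family]).astype(float)

train, test = X[:500], X[500:]
mean = train.mean(axis=0)
cov_full = np.cov(train, rowvar=False)
cov_diag = np.diag(np.diag(cov_full))  # assumed baseline: no cross-skill covariance

def predict(x, observed, cov, noise_var=0.01):
    """Posterior mean of all features given the observed entries of x."""
    S = cov[np.ix_(observed, observed)] + noise_var * np.eye(len(observed))
    gain = cov[:, observed] @ np.linalg.inv(S)
    return mean + gain @ (x[observed] - mean[observed])

def cosine(a, b):
    """Cosine similarity, returning 0 when either vector is all zeros."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

scores = {"full": [], "diagonal": []}
for x in test:
    observed = rng.choice(d, size=9, replace=False)  # 30% of features observed
    hidden = np.setdiff1d(np.arange(d), observed)
    for name, cov in (("full", cov_full), ("diagonal", cov_diag)):
        pred = predict(x, observed, cov)
        # Score only the features the model did not see.
        scores[name].append(cosine(pred[hidden], x[hidden]))

summary = {name: float(np.mean(vals)) for name, vals in scores.items()}
print(summary)
```

With a diagonal covariance the hidden features stay at the population mean, so any gain of the full-covariance model on held-out features comes entirely from cross-skill transfer.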

Full example

See examples/esco.py for the complete script including hierarchy traversal and detailed validation.

Key takeaway

skillinfer works across data types (binary and continuous), curation methodologies (expert panels and surveys), and scales (134 and 120 features). The covariance transfer mechanism is robust to these variations: it is a property of the math, not of the data format.