Constraints/distributions used in synthesis manifest
Constraint or distribution ID
CORR_OCC_EMPLOYMENT
D01
D12
EMPLOYMENT_CODE_LINK
FIRM
GUIDE
HARD
HARMONIZATION
HH_CHILD_ADULT
HH_COUPLE_TWO_ADULTS
HH_SINGLE_SIZE_ONE
HH_SIZE_PLAUSIBLE
HMN_AGE_RANGE
HMN_BIRTH_DATE
HMN_BUILDING_SCHEMA
HMN_DWELLING_BUILDING_REF
HMN_DWELLING_SCHEMA
HMN_EDUCATION
HMN_EDUCATION_AGE
HMN_EDUCATION_GROUP
HMN_EMPLOYMENT
HMN_HOUSEHOLD_DWELLING_REF
HMN_HOUSEHOLD_SCHEMA
HMN_HOUSEHOLD_TYPE
HMN_INDUSTRY
HMN_MARITAL
HMN_OCCUPATION
HMN_ORIGIN
HMN_PERSON_HOUSEHOLD_REF
HMN_PERSON_SCHEMA
HMN_RETIRED_AGE
HMN_SEX
INFORMATIONAL
MODEL_FALLBACK_RATE
MODEL_REGISTRY_PROFILE
SPATIAL
SPT_BUILDING_COORDS
SPT_DWELLING_BUILDING_REF
SPT_DWELLING_HOUSEHOLD_REF
SPT_HH_DWELLING_REF
SPT_PERSON_HOUSEHOLD_REF
STRUCTURAL
XCN_COMPARABILITY
model_notes.md
# SI population review bundle — cycle 1
Run ID: `si_population_review_cycle1_794aa0a6_seed420987`
Bundle path: `/home/synthestat/output/runs/SI/si_population_review_cycle1_794aa0a6_seed420987`
Created at: 2026-05-19T17:06:42Z
Release mode: internal research review.
## What this bundle is
This is the best current Slovenia review bundle that can be produced from the existing Synthestat source/code layer without fabricating precision. It packages the seeded Slovenia population slice: 8 synthetic persons in 8 households, linked to 8 inferred/seeded dwellings and 3 GURS-style seeded buildings across 2 naselje-style test zones (`NAS_SI_TEST_001`, `NAS_SI_TEST_002`).
## HARD residual status
HARD constraints: PASS exact; no HARD residual rows failed.
Validation summary: 68 pass, 4 warn, 2 skip across 74 rows. Warning rows are preserved in `constraint_residuals.json`; no constraint relaxation was performed for this review bundle.
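A reviewer can re-derive this summary from the residual rows. The sketch below is illustrative only: the `status` and `constraint_type` field names are assumptions about the row layout of `constraint_residuals.json`, and the toy rows are placeholders that merely reproduce the reported counts.

```python
from collections import Counter


def summarize_statuses(rows):
    """Count rows by status and collect any HARD rows that failed."""
    counts = Counter(row["status"] for row in rows)
    hard_failures = [
        row
        for row in rows
        if row.get("constraint_type") == "HARD" and row["status"] == "fail"
    ]
    return dict(counts), hard_failures


# Placeholder rows (not real bundle data); field names are assumed.
rows = (
    [{"constraint_type": "HARD", "status": "pass"} for _ in range(68)]
    + [{"constraint_type": "GUIDE", "status": "warn"} for _ in range(4)]
    + [{"constraint_type": "INFORMATIONAL", "status": "skip"} for _ in range(2)]
)

counts, hard_failures = summarize_statuses(rows)
total_rows = sum(counts.values())
```

With real rows loaded from the file, `total_rows` should equal 74 and `hard_failures` should be empty if the status reported above holds.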
## Uncertainty and modelled layers
Uncertainty and provenance are first-class outputs. Registry, modelled, and transfer inputs are listed in `distribution_diagnostics.json`, `uncertainty_summary.json`, and `source_provenance.json`. `source_provenance.json` now emits, per entry, `source_id`, a structured `source` object, `retrieved_at`, `reference_period`, data and schema checksums, licence, a structured `geography` object, and quality-flag metadata. Values that cannot be supplied remain explicit structured unavailable objects because no live SI retrieval artifact is integrated. Hidden populations are explicitly unavailable because the current SI path lacks separate uncertainty-aware small-area sources. Work, school, and facility assignments are also unavailable; the bundle does not infer them from weak evidence.
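Downstream readers need to tell a structured unavailable object apart from a real value. The following is a minimal sketch, assuming the `{"status": "unavailable", "reason": ...}` shape that the provenance entries in this bundle use. The helper names are hypothetical, and the sample entry is abridged.

```python
def is_unavailable(value):
    """True for structured unavailable objects, including 'unavailable_overlay'."""
    return isinstance(value, dict) and str(value.get("status", "")).startswith(
        "unavailable"
    )


def unavailable_fields(entry, fields):
    """Map each unavailable field name to its stated reason."""
    return {
        name: entry[name].get("reason")
        for name in fields
        if name in entry and is_unavailable(entry[name])
    }


# Abridged provenance entry in the shape emitted by this bundle.
entry = {
    "checksum": {"algorithm": "sha256", "status": "available"},
    "licence": {
        "reason": "licence/terms not encoded in current SI seeded registry/source inventory",
        "status": "unavailable",
    },
    "reference_period": {
        "reason": "reference period not present in current SI seeded registry entry",
        "status": "unavailable",
    },
    "retrieved_at": {
        "reason": "no live SI retrieval/download artifact is integrated for this seeded bundle",
        "status": "unavailable",
    },
}

missing = unavailable_fields(
    entry, ("checksum", "licence", "reference_period", "retrieved_at")
)
```

Checking `status` rather than testing for `null` keeps the reason attached to the gap, which is the point of emitting these objects in the first place.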
## Constraint/status metadata fixes in this cycle
- Evidence-depth labels in emitted bundle tables and `geography_quality_tiers.json` use the canonical vocabulary `measured`, `constrained`, `modelled`, or `unavailable`; prior `partially_constrained` detail is retained only in `evidence_depth_detail`/original-count metadata.
- `synthetic_dwellings.household_id` is populated from household dwelling links and `occupancy_link_status` records the backref source, so downstream validators do not read occupied dwellings as all vacant.
- SOFT registry evidence is separated from residual constraints in `constraint_residuals.json`: current SI SOFT entries are registry evidence/priors unless emitted as validation residual rows.
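The canonical evidence-depth vocabulary can be enforced with a small check. This is a sketch under stated assumptions: the field name `evidence_depth` and the record layout are hypothetical, and only the four canonical labels come from the notes above.

```python
CANONICAL_EVIDENCE_DEPTH = {"measured", "constrained", "modelled", "unavailable"}


def non_canonical_labels(records, field="evidence_depth"):
    """Return the sorted set of labels that fall outside the canonical vocabulary."""
    return sorted(
        {
            str(record.get(field))
            for record in records
            if record.get(field) not in CANONICAL_EVIDENCE_DEPTH
        }
    )


# Hypothetical records; the second one carries a pre-cycle-1 label.
records = [
    {"zone_id": "NAS_SI_TEST_001", "evidence_depth": "modelled"},
    {"zone_id": "NAS_SI_TEST_002", "evidence_depth": "partially_constrained"},
]

offending = non_canonical_labels(records)
```

An empty result on the emitted tables and on `geography_quality_tiers.json` would be consistent with the first fix in the list above.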
## Privacy and release-risk note
This run is internal-only and fixture-sized, but fine geography plus tiny seeded zones can create unique, fixture-like records if misread as real microdata. Do not release this bundle externally or expose row-level records as representative Slovenia people/households. Any broader release needs k-anonymity/uniqueness checks, geography coarsening or suppression for risky cells, clear synthetic-data labelling, and confirmation that building/dwelling assignments are not presented as measured residence locations unless supported by licensed register evidence.
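The uniqueness check mentioned above can be sketched as a count over quasi-identifier combinations. The column names, the toy records, and the threshold `k` are all illustrative assumptions, not values taken from the bundle.

```python
from collections import Counter


def risky_cells(records, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k records."""
    counts = Counter(
        tuple(record[name] for name in quasi_identifiers) for record in records
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Toy records with hypothetical columns; not rows from synthetic_persons.parquet.
records = [
    {"zone_id": "NAS_SI_TEST_001", "sex": "F", "age_band": "30-39"},
    {"zone_id": "NAS_SI_TEST_001", "sex": "F", "age_band": "30-39"},
    {"zone_id": "NAS_SI_TEST_001", "sex": "M", "age_band": "30-39"},
    {"zone_id": "NAS_SI_TEST_002", "sex": "F", "age_band": "60-69"},
]

flagged = risky_cells(records, ("zone_id", "sex", "age_band"), k=2)
```

Cells returned by such a check are candidates for geography coarsening or suppression before any broader release. With only 8 persons across 2 zones, most cells in this bundle would be flagged at any reasonable `k`.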
## Quality caveats for reviewer
- Scope is seeded/internal, not nationwide Slovenia 1:1 synthesis.
- Current finest supported geography is seeded naselje-style test zones, not all Slovenian settlements.
- Building/dwelling realism is seeded GURS-style fixture plus dwelling inference, not live national building-register assignment.
- Occupation and industry at fine geography are modelled; 3-digit ISCO detail is unavailable, so occupation falls back to 1 digit and is flagged as `fallback_1digit`.
- No live SURS retrieval adapter is implemented yet; current bundle relies on existing seeded/manual catalogue artifacts.
## Expected routing
The bundle is contract-complete for synth-reviewer re-inspection after the cycle-1 metadata/coherence fixes. The remaining non-mechanical improvements require upstream SI source research and download integration: live SURS/GURS retrieval, hidden populations, assignments, and stronger evidence for the GUIDE distributions that are currently low-confidence.
build_manifest.json
{
"assignment_scope": {
"dwelling_building": "available_seeded",
"facility": "unavailable",
"school": "unavailable",
"work": "unavailable"
},
"classification_crosswalk_versions": {
"education": "ISCED-2011 seeded mapping",
"industry": "NACE Rev.2 seeded/modelled mapping",
"occupation": "ISCO-08 seeded/modelled fallback to 1 digit"
},
"constraints_relaxed": [],
"constraints_used": [
"CORR_OCC_EMPLOYMENT",
"D01",
"D12",
"EMPLOYMENT_CODE_LINK",
"FIRM",
"GUIDE",
"HARD",
"HARMONIZATION",
"HH_CHILD_ADULT",
"HH_COUPLE_TWO_ADULTS",
"HH_SINGLE_SIZE_ONE",
"HH_SIZE_PLAUSIBLE",
"HMN_AGE_RANGE",
"HMN_BIRTH_DATE",
"HMN_BUILDING_SCHEMA",
"HMN_DWELLING_BUILDING_REF",
"HMN_DWELLING_SCHEMA",
"HMN_EDUCATION",
"HMN_EDUCATION_AGE",
"HMN_EDUCATION_GROUP",
"HMN_EMPLOYMENT",
"HMN_HOUSEHOLD_DWELLING_REF",
"HMN_HOUSEHOLD_SCHEMA",
"HMN_HOUSEHOLD_TYPE",
"HMN_INDUSTRY",
"HMN_MARITAL",
"HMN_OCCUPATION",
"HMN_ORIGIN",
"HMN_PERSON_HOUSEHOLD_REF",
"HMN_PERSON_SCHEMA",
"HMN_RETIRED_AGE",
"HMN_SEX",
"INFORMATIONAL",
"MODEL_FALLBACK_RATE",
"MODEL_REGISTRY_PROFILE",
"SPATIAL",
"SPT_BUILDING_COORDS",
"SPT_DWELLING_BUILDING_REF",
"SPT_DWELLING_HOUSEHOLD_REF",
"SPT_HH_DWELLING_REF",
"SPT_PERSON_HOUSEHOLD_REF",
"STRUCTURAL",
"XCN_COMPARABILITY"
],
"contract_files": [
"synthetic_persons.parquet",
"synthetic_households.parquet",
"synthetic_dwellings.parquet",
"synthetic_building_assignments.parquet",
"hidden_population_overlays.unavailable.json",
"work_school_assignments.unavailable.json",
"build_manifest.json",
"constraint_residuals.json",
"distribution_diagnostics.json",
"household_diagnostics.json",
"dwelling_building_diagnostics.json",
"assignment_diagnostics.json",
"geography_quality_tiers.json",
"uncertainty_summary.json",
"source_provenance.json",
"model_notes.md",
"unavailable.json"
],
"country": "SI",
"created_at": "2026-05-19T17:06:42Z",
"geography_version": {
"seeded_test_zones": [
"NAS_SI_TEST_001",
"NAS_SI_TEST_002"
],
"target": "SI_NASELJE_SEEDED_REVIEW"
},
"git_commit": "a5ad12d74bcf64a2c256e1fe83d99cc700e02bba-dirty",
"git_dirty": true,
"hard_constraint_status": "pass_exact",
"hidden_population_scope": {
"homelessness": {
"reason": "No Slovenia naselje/municipality-level measured homelessness distribution with uncertainty bounds is integrated in the current source layer.",
"status": "unavailable"
},
"institutional_populations": {
"reason": "No institution/person group-quarter layer integrated for SI in current seeded path.",
"status": "unavailable"
},
"refugees_asylum_seekers": {
"reason": "No integrated SI age/sex/household/small-area refugee/asylum distribution with uncertainty bounds is available in current bundle inputs.",
"status": "unavailable"
},
"students": {
"reason": "Education status exists only as modelled/constrained person attribute; no school enrolment/location overlay or assignment layer is available in the current bundle.",
"status": "unavailable_overlay"
},
"syrian_refugees": {
"reason": "No SI-specific measured small-area source with bounds is integrated; a model-only overlay would violate uncertainty guardrails.",
"status": "unavailable"
},
"ukrainian_displaced_people": {
"reason": "Policy-relevant group, but no separate uncertainty-aware Slovenia small-area overlay source is wired into the current seeded synthesis path.",
"status": "unavailable"
},
"undocumented_seasonal_populations": {
"reason": "No measured Slovenia distribution with uncertainty bounds in current repo inputs.",
"status": "unavailable"
}
},
"known_limitations": [
"Small seeded SI review slice only: 2 naselje-style test zones, 8 persons/households; not nationwide 1:1 Slovenia synthesis.",
"No live SURS retrieval adapter is implemented yet for SI Task 01; current bundle relies on seeded/manual source layer.",
"Buildings are GURS-style seeded fixtures; dwellings may be inferred and are not full national register integration.",
"Hidden populations and work/school assignments unavailable rather than modelled without bounds.",
"Fine occupation/industry at naselje geography are modelled/partially pooled; ISCO-3 unavailable and flagged as fallback_1digit."
],
"population_counts": {
"buildings": 3,
"dwellings": 8,
"households": 8,
"persons": 8
},
"project_root": "/home/synthestat",
"random_seed": 420987,
"release_mode": "internal_research_review",
"run_id": "si_population_review_cycle1_794aa0a6_seed420987",
"source_catalogue_version": {
"readiness_status": "pass",
"registry": "output/catalogue/distribution_registry_SI.json",
"source_inventory_report": "output/SI/source_inventory_report.json"
},
"zones_degraded": []
}
{
"best_distribution_sources": {
"D01_demographics_finest": "SI_SURS_population",
"D05_education": "SI_SURS_education",
"D12_household_type": "SI_SURS_households",
"building_stock": "SI_GURS_seeded_buildings",
"employment_occupation_industry": "SI_SURS_employment",
"geography_boundaries": "SI_SURS_boundaries",
"income": "SI_SURS_income"
},
"catalogue_sources": {
"coverage": "output/catalogue/distribution_coverage_SI.json",
"readiness": "output/catalogue/distribution_readiness_SI.json",
"registry": "output/catalogue/distribution_registry_SI.json"
},
"checksums": {
"note": "Each registry entry carries data checksum status and schema checksum status.",
"status": "entry_level_explicit"
},
"country": "SI",
"created_at": "2026-05-19T17:06:42Z",
"geography_levels": [
"NUTS-1",
"NUTS-2",
"NUTS-3",
"national",
"unknown"
],
"licence_terms": {
"note": "Each registry entry carries licence; unavailable values are represented as structured unavailable objects.",
"status": "entry_level_explicit"
},
"live_download": {
"enabled": false,
"path": null,
"summary": null
},
"live_probe": {
"enabled": false,
"path": null,
"summary": null
},
"manual_sources": [
"SI_SURS_population",
"SI_SURS_households",
"SI_SURS_education",
"SI_SURS_employment",
"SI_SURS_income",
"SI_SURS_boundaries",
"SI_GURS_seeded_buildings",
"SI_ADMIN_address_context"
],
"quality_flags": {
"readiness_status": "pass",
"source_gaps": [
"No live SURS retrieval adapter is implemented yet for SI Task 01.",
"Building integration is still fixture-backed and requires dwelling inference.",
"Current Slovenia execution is an uncertainty-aware seeded slice, not a production national extraction path."
],
"warning_issues": []
},
"reference_periods": {
"note": "Each registry entry carries reference_period; unavailable values are represented as structured unavailable objects.",
"status": "entry_level_explicit"
},
"registry_entries": [
{
"catalogue_id": "literature:de-c01_education_occupation_coupling__transfer_from_de",
"checksum": {
"algorithm": "sha256",
"path": "data/literature/seeded_occupation_priors.yaml",
"status": "available",
"value": "c8de6a03dea87665f5f0a7beb65a8d7ea0466dd593a84b8f8b8914d49a890239"
},
"confidence": 0.6,
"constraint_type": "GUIDE",
"country": "SI",
"data_uri": "data/literature/seeded_occupation_priors.yaml",
"dataset_variant": "comparable_country",
"evidence_quality": "academic_literature",
"finest_geography_status": "modelled",
"geo_level": "national",
"geo_version": "EU_NUTS_2021",
"geography": {
"country": "SI",
"finest_geography_status": "modelled",
"geo_level": "national",
"geo_version": "EU_NUTS_2021",
"region_id": {
"reason": "registry entry is not scoped to a specific region_id",
"status": "unavailable"
},
"source_fields": [
"country",
"geo_level",
"geo_version",
"region_id",
"finest_geography_status"
],
"status": "available",
"zone_scope": "countrywide_or_unspecified_region"
},
"licence": {
"reason": "licence/terms not encoded in current SI seeded registry/source inventory",
"status": "unavailable"
},
"pooling_level": "comparable_country",
"priority_weight": "low",
"provenance_status": "seeded_metadata_complete_with_explicit_geography_source_and_unavailable_fields",
"quality_flag": {
"confidence": 0.6,
"constraint_type": "GUIDE",
"dataset_variant": "comparable_country",
"evidence_quality": "academic_literature",
"finest_geography_status": "modelled"
},
"reference_period": {
"reason": "reference period not present in current SI seeded registry entry",
"status": "unavailable"
},
"region_id": null,
"retrieved_at": {
"reason": "no live SI retrieval/download artifact is integrated for this seeded bundle",
"status": "unavailable"
},
"schema_checksum": {
"algorithm": "sha256",
"status": "available",
"value": "37dd178fbfc096717d310f69212adab59f44c786ab0a7ba3a619cd14a1ad25a8"
},
"schema_hash": "37dd178fbfc096717d310f69212adab59f44c786ab0a7ba3a619cd14a1ad25a8",
"source": {
"best_distribution_mapping_keys": [],
"catalogue_id": "literature:de-c01_education_occupation_coupling__transfer_from_de",
"data_uri": "data/literature/seeded_occupation_priors.yaml",
"ingestion_mode": "seeded_manual_catalogue",
"manual_source_catalogue_membership": false,
"provider": "seeded comparable-country literature prior",
"source_family": "academic_literature_transfer_prior",
"source_id": "literature:de-c01_education_occupation_coupling__transfer_from_de",
"source_inventory_report": "output/SI/source_inventory_report.json",
"source_record_id": "literature:de-c01_education_occupation_coupling__transfer_from_de",
"status": "available"
},
"source_id": "literature:de-c01_education_occupation_coupling__transfer_from_de",
"spec_id": "C01_education_occupation_coupling",
"spec_label": "Education-occupation coupling strength",
"uncertainty": {
"bounds_uri": null,
"credible_level": 0.9,
"mean_cell_cv": 0.2,
"method": "literature_regression"
}
},
{
"catalogue_id": "literature:de-c02_assortative_mating_education__transfer_from_de",
"checksum": {
"algorithm": "sha256",
"path": "data/literature/seeded_occupation_priors.yaml",
"status": "available",
"value": "c8de6a03dea87665f5f0a7beb65a8d7ea0466dd593a84b8f8b8914d49a890239"
},
"confidence": 0.61,
"constraint_type": "GUIDE",
"country": "SI",
"data_uri": "data/literature/seeded_occupation_priors.yaml",
"dataset_variant": "comparable_country",
"evidence_quality": "academic_literature",
"finest_geography_status": "modelled",
"geo_level": "NUTS-1",
"geo_version": "EU_NUTS_2021",
"geography": {
"country": "SI",
"finest_geography_status": "modelled",
"geo_level": "NUTS-1",
"geo_version": "EU_NUTS_2021",
"region_id": {
"reason": "registry entry is not scoped to a specific region_id",
"status": "unavailable"
},
"source_fields": [
"country",
"geo_level",
"geo_version",
"region_id",
"finest_geography_status"
],
"status": "available",
"zone_scope": "countrywide_or_unspecified_region"
},
"licence": {
"reason": "licence/terms not encoded in current SI seeded registry/source inventory",
"status": "unavailable"
},
"pooling_level": "comparable_country",
"priority_weight": "low",
"provenance_status": "seeded_metadata_complete_with_explicit_geography_source_and_unavailable_fields",
"quality_flag": {
"confidence": 0.61,
"constraint_type": "GUIDE",
"dataset_variant": "comparable_country",
"evidence_quality": "academic_literature",
"finest_geography_status": "modelled"
},
"reference_period": {
"reason": "reference period not present in current SI seeded registry entry",
"status": "unavailable"
},
"region_id": null,
"retrieved_at": {
"reason": "no live SI retrieval/download artifact is integrated for this seeded bundle",
"status": "unavailable"
},
"schema_checksum": {
"algorithm": "sha256",
"status": "available",
"value": "eaee1a327700e2d610d66250874c4a02384c91acf0978dc0da7068f160c72ccb"
},
"schema_hash": "eaee1a327700e2d610d66250874c4a02384c91acf0978dc0da7068f160c72ccb",
"source": {
"best_distribution_mapping_keys": [],
"catalogue_id": "literature:de-c02_assortative_mating_education__transfer_from_de",
"data_uri": "data/literature/seeded_occupation_priors.yaml",
"ingestion_mode": "seeded_manual_catalogue",
"manual_source_catalogue_membership": false,
"provider": "seeded comparable-country literature prior",
"source_family": "academic_literature_transfer_prior",
"source_id": "literature:de-c02_assortative_mating_education__transfer_from_de",
"source_inventory_report": "output/SI/source_inventory_report.json",
"source_record_id": "literature:de-c02_assortative_mating_education__transfer_from_de",
"status": "available"
},
"source_id": "literature:de-c02_assortative_mating_education__transfer_from_de",
"spec_id": "C02_assortative_mating_education",
"spec_label": "Assortative mating by education",
"uncertainty": {
"bounds_uri": null,
"credible_level": 0.9,
"mean_cell_cv": 0.19,
"method": "literature_transition"
}
},
{
"catalogue_id": "literature:de-c03_assortative_mating_age__transfer_from_de",
"checksum": {
"algorithm": "sha256",
"path": "data/literature/seeded_occupation_priors.yaml",
"status": "available",
"value": "c8de6a03dea87665f5f0a7beb65a8d7ea0466dd593a84b8f8b8914d49a890239"
},
"confidence": 0.68,
"constraint_type": "GUIDE",
"country": "SI",
"data_uri": "data/literature/seeded_occupation_priors.yaml",
"dataset_variant": "comparable_country",
"evidence_quality": "academic_literature",
"finest_geography_status": "modelled",
"geo_level": "NUTS-1",
"geo_version": "EU_NUTS_2021",
"geography": {
"country": "SI",
"finest_geography_status": "modelled",
"geo_level": "NUTS-1",
"geo_version": "EU_NUTS_2021",
"region_id": {
"reason": "registry entry is not scoped to a specific region_id",
"status": "unavailable"
},
"source_fields": [
"country",
"geo_level",
"geo_version",
"region_id",
"finest_geography_status"
],
"status": "available",
"zone_scope": "countrywide_or_unspecified_region"
},
"licence": {
"reason": "licence/terms not encoded in current SI seeded registry/source inventory",
"status": "unavailable"
},
"pooling_level": "comparable_country",
"priority_weight": "low",
"provenance_status": "seeded_metadata_complete_with_explicit_geography_source_and_unavailable_fields",
"quality_flag": {
"confidence": 0.68,
"constraint_type": "GUIDE",
"dataset_variant": "comparable_country",
"evidence_quality": "academic_literature",
"finest_geography_status": "modelled"
},
"reference_period": {
"reason": "reference period not present in current SI seeded registry entry",
"status": "unavailable"
},
"region_id": null,
"retrieved_at": {
"reason": "no live SI retrieval/download artifact is integrated for this seeded bundle",
"status": "unavailable"
},
"schema_checksum": {
"algorithm": "sha256",
"status": "available",
"value": "3eb83fc2d668330c0741a9ecbc79396899723ab82c36e6ea65145422ab21298e"
},
"schema_hash": "3eb83fc2d668330c0741a9ecbc79396899723ab82c36e6ea65145422ab21298e",
"source": {
"best_distribution_mapping_keys": [],
"catalogue_id": "literature:de-c03_assortative_mating_age__transfer_from_de",
"data_uri": "data/literature/seeded_occupation_priors.yaml",
"ingestion_mode
… truncated after 12,000 characters …
unavailable.json
{
"categories": {
"homelessness": {
"reason": "No Slovenia naselje/municipality-level measured homelessness distribution with uncertainty bounds is integrated in the current source layer.",
"status": "unavailable"
},
"institutional_populations": {
"reason": "No institution/person group-quarter layer integrated for SI in current seeded path.",
"status": "unavailable"
},
"refugees_asylum_seekers": {
"reason": "No integrated SI age/sex/household/small-area refugee/asylum distribution with uncertainty bounds is available in current bundle inputs.",
"status": "unavailable"
},
"students": {
"reason": "Education status exists only as modelled/constrained person attribute; no school enrolment/location overlay or assignment layer is available in the current bundle.",
"status": "unavailable_overlay"
},
"syrian_refugees": {
"reason": "No SI-specific measured small-area source with bounds is integrated; a model-only overlay would violate uncertainty guardrails.",
"status": "unavailable"
},
"ukrainian_displaced_people": {
"reason": "Policy-relevant group, but no separate uncertainty-aware Slovenia small-area overlay source is wired into the current seeded synthesis path.",
"status": "unavailable"
},
"undocumented_seasonal_populations": {
"reason": "No measured Slovenia distribution with uncertainty bounds in current repo inputs.",
"status": "unavailable"
}
},
"country": "SI",
"created_at": "2026-05-19T17:06:42Z",
"files": {
"hidden_population_overlays.parquet": "hidden_population_overlays.unavailable.json",
"work_school_assignments.parquet": "work_school_assignments.unavailable.json"
},
"principle": "Unavailable/weak layers are explicit and do not alter de jure/core HARD constraints.",
"run_id": "si_population_review_cycle1_794aa0a6_seed420987"
}