Build the cycle-1 Synthestat population synthesis review bundle for Kosovo (XK).
Project root: /home/synthestat
Country: XK
Target geography: finest available official geography; a seeded slice is acceptable only if it is explicitly labelled and its uncertainty/degradation is visible.
Release mode: internal research review
Contract: /home/synthestat/docs/contracts/population_review_bundle.md
Goal:
Generate the best 1:1 synthetic population review bundle currently possible for XK: persons in households, households in dwellings, and dwellings in houses/buildings where available. Where evidence supports them, include separate uncertainty-aware overlays for homelessness, refugees/asylum seekers, Ukrainian displaced people, Syrian refugees, undocumented/seasonal populations, students, and institutional populations. Where evidence supports it, also include family composition, parent/child age gaps, school attendance, work/school assignment, and dwelling/building realism.
Required context to read before modelling:
- /home/synthestat/docs/SOUL.md
- /home/synthestat/docs/contracts/population_review_bundle.md
- /home/synthestat/docs/dashboard/countries/XK.md
- /home/synthestat/config/production/countries/XK.yaml
- /home/synthestat/docs/tasks/learnings/2026-04-11_XK_cross_border_seeded_review.md if relevant
- Existing XK artifacts under /home/synthestat/output/XK/
Allowed write paths:
- /home/synthestat/output/runs/XK/<deterministic_run_id>/
- /home/synthestat/workspace/manager_handoffs/modeller/ for a concise build handoff and missing-requirements note
- Optional durable memo under /home/synthestat/docs/wiki/outputs/ if it materially helps review
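The brief asks for a `<deterministic_run_id>` but does not say how it is derived. A minimal sketch follows. It assumes the ID is a hash of the build inputs. The input fields (`country`, `cycle`, `random_seed`, `git_commit`, `source_catalogue_version`), the function name and the `XK_c1_<hash>` layout are all hypothetical and are not taken from the contract.

```python
import hashlib
import json


def deterministic_run_id(country: str, cycle: int, random_seed: int,
                         git_commit: str, source_catalogue_version: str) -> str:
    """Derive a run ID that is identical whenever the build inputs are identical.

    Hypothetical scheme: the field set and the ID layout are illustrative
    assumptions, not the contract's definition.
    """
    inputs = {
        "country": country,
        "cycle": cycle,
        "random_seed": random_seed,
        "git_commit": git_commit,
        "source_catalogue_version": source_catalogue_version,
    }
    # sort_keys plus fixed separators give a canonical string, so the
    # order in which fields were supplied can never change the hash.
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return f"{country}_c{cycle}_{digest}"


run_id = deterministic_run_id("XK", 1, 20260411, "abc1234", "catalogue-v1")
print(run_id)  # identical inputs always map to the same output/runs/XK/<run_id>/
```

Hashing the inputs, rather than using a timestamp, sends a rerun with unchanged inputs to the same directory. Wall-clock time still belongs in `created_at` in build_manifest.json.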
Required bundle files (follow the contract exactly where possible):
- synthetic_persons.parquet or .csv
- synthetic_households.parquet or .csv
- synthetic_dwellings.parquet or .csv, or unavailable.json with reason
- synthetic_building_assignments.parquet or .csv, or unavailable.json with reason
- hidden_population_overlays.parquet or .csv, or unavailable.json with reason
- work_school_assignments.parquet or .csv, or unavailable.json with reason
- build_manifest.json
- constraint_residuals.json
- distribution_diagnostics.json
- household_diagnostics.json
- dwelling_building_diagnostics.json
- assignment_diagnostics.json
- geography_quality_tiers.json
- uncertainty_summary.json
- source_provenance.json
- model_notes.md
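The list above supports a presence check before handoff. The sketch below assumes one `unavailable.json` in the run directory that maps each missing layer's base name to a non-empty reason. The contract may specify a different layout, so treat that file layout and the function names as hypothetical.

```python
import json
from pathlib import Path

# Layers that must always exist as data (.parquet or .csv).
ALWAYS_DATA = ["synthetic_persons", "synthetic_households"]
# Layers that may instead be declared unavailable with a reason.
DATA_OR_UNAVAILABLE = [
    "synthetic_dwellings",
    "synthetic_building_assignments",
    "hidden_population_overlays",
    "work_school_assignments",
]
# Fixed-name diagnostic and metadata files.
FIXED_FILES = [
    "build_manifest.json", "constraint_residuals.json",
    "distribution_diagnostics.json", "household_diagnostics.json",
    "dwelling_building_diagnostics.json", "assignment_diagnostics.json",
    "geography_quality_tiers.json", "uncertainty_summary.json",
    "source_provenance.json", "model_notes.md",
]


def has_data(run_dir: Path, layer: str) -> bool:
    """True if the layer exists as either a .parquet or a .csv file."""
    return any((run_dir / f"{layer}{ext}").is_file() for ext in (".parquet", ".csv"))


def missing_bundle_items(run_dir: Path) -> list[str]:
    """Return presence problems; an empty list means every required item is there."""
    problems = []
    for layer in ALWAYS_DATA:
        if not has_data(run_dir, layer):
            problems.append(f"{layer}: no .parquet or .csv")
    # Assumed layout: {"layer_name": "reason", ...} in a single file.
    unavailable_path = run_dir / "unavailable.json"
    unavailable = (json.loads(unavailable_path.read_text(encoding="utf-8"))
                   if unavailable_path.is_file() else {})
    for layer in DATA_OR_UNAVAILABLE:
        reason = str(unavailable.get(layer, "")).strip()
        if not has_data(run_dir, layer) and not reason:
            problems.append(f"{layer}: neither data nor an unavailable reason")
    for name in FIXED_FILES:
        if not (run_dir / name).is_file():
            problems.append(f"{name}: missing")
    return problems
```

This checks presence only, not schemas or content, so an empty result is necessary but not sufficient for contract completeness.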
Non-negotiables:
- HARD constraints must not break.
- Weak Kosovo evidence means relaxed constraints / wider uncertainty / explicit C-tier or degraded zones, never fake precision.
- Hidden-population overlays must not silently rewrite de jure constraints.
- Fine geography, occupation, industry, school/work assignment, and hidden-population quantities are model-driven unless directly measured; flag them as modelled or unavailable.
- Record source IDs/URLs, retrieval timestamps where available, geography levels, reference periods, quality flags, relaxed constraints, missing data, and failed/degraded layers.
- Do not promote quarantined other_branch artifacts or live-source claims unless they already satisfy provenance/metadata gates; if useful but not validated, mention as blocked/unavailable evidence.
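The first two rules can be made mechanical. Record every constraint with its strength and evidence tier. Require exact agreement for HARD constraints, and let weaker tiers widen the tolerance so the gap stays visible instead of hidden. A minimal sketch follows. Only "C-tier" appears in the brief; tiers A and B, the tolerance values and the record field names are hypothetical placeholders, not values from the contract or from any Kosovo source.

```python
from dataclasses import dataclass

# Hypothetical relative tolerances by evidence tier: weaker evidence gets a
# wider band instead of a falsely precise target.
TIER_TOLERANCE = {"A": 0.01, "B": 0.05, "C": 0.20}


@dataclass
class Constraint:
    name: str
    target: float
    achieved: float
    hard: bool
    tier: str  # "A", "B" or "C" (A and B are assumed by analogy with C)


def residual_record(c: Constraint) -> dict:
    """Build one illustrative entry for constraint_residuals.json."""
    if c.target != 0:
        rel = abs(c.achieved - c.target) / abs(c.target)
    else:
        # No meaningful relative residual against a zero target.
        rel = 0.0 if c.achieved == 0 else None
    if c.hard:
        # HARD constraints are treated as exact counts: any deviation breaks them.
        tolerance = 0.0
        status = "ok" if c.achieved == c.target else "BROKEN"
    else:
        tolerance = TIER_TOLERANCE[c.tier]
        status = "ok" if rel is not None and rel <= tolerance else "relaxed_exceeded"
    return {"name": c.name, "hard": c.hard, "tier": c.tier,
            "target": c.target, "achieved": c.achieved,
            "relative_residual": rel, "tolerance": tolerance, "status": status}


def assert_no_hard_breaks(records: list[dict]) -> None:
    """Stop the build if any HARD constraint is broken."""
    broken = [r["name"] for r in records if r["status"] == "BROKEN"]
    if broken:
        raise ValueError(f"HARD constraints broken: {broken}")
```

In this design an exceeded SOFT tolerance stays visible in the residual file, and only a broken HARD constraint stops the build.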
Definition of done:
- A contract-complete review bundle exists under /home/synthestat/output/runs/XK/<run_id>/.
- build_manifest.json includes country, run_id, created_at, project_root, git commit/dirty marker, random_seed, source catalogue/geography/crosswalk versions, constraints used/relaxed, zones degraded, hidden-population scope, assignment scope, and known limitations.
- The handoff names the bundle path, run_id, counts, constraints status, unavailable layers, and exact review focus for synth-reviewer.
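The manifest requirements above can be checked before the handoff is written. A minimal sketch follows. The key names are hypothetical spellings of the items listed in the definition of done, so align them with the contract before use.

```python
import json

# Hypothetical key names, one per item required in build_manifest.json.
REQUIRED_MANIFEST_KEYS = [
    "country", "run_id", "created_at", "project_root",
    "git_commit", "git_dirty", "random_seed",
    "source_catalogue_version", "geography_version", "crosswalk_version",
    "constraints_used", "constraints_relaxed", "zones_degraded",
    "hidden_population_scope", "assignment_scope", "known_limitations",
]


def manifest_problems(manifest: dict, expected_country: str = "XK") -> list[str]:
    """List missing keys and obvious inconsistencies; an empty list passes."""
    problems = [f"missing key: {k}" for k in REQUIRED_MANIFEST_KEYS if k not in manifest]
    # A missing country is already reported above, so only check a present one.
    if manifest.get("country") not in (None, expected_country):
        problems.append(f"country is {manifest['country']!r}, expected {expected_country!r}")
    if "git_dirty" in manifest and not isinstance(manifest["git_dirty"], bool):
        problems.append("git_dirty should be a boolean marker")
    return problems


def load_and_check(path: str) -> list[str]:
    """Read a manifest from disk and run manifest_problems on it."""
    with open(path, encoding="utf-8") as fh:
        return manifest_problems(json.load(fh))
```

Passing this check confirms that the keys exist. It does not confirm that `known_limitations` or `zones_degraded` actually describe the run.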