Build the cycle-1 Synthestat population synthesis review bundle for Bulgaria (BG).
Parent manager task: t_2a07ba7d
Project root: /home/synthestat
Country: BG
Target geography: finest available official geography
Release mode: internal research review
Read before work:
- /home/synthestat/docs/SOUL.md
- /home/synthestat/docs/contracts/population_review_bundle.md
- /home/synthestat/docs/specs/research_knowledge_base.md
- relevant BG source inventory / registry / country-output artifacts already in the repo, if present
Goal:
Generate the best responsible 1:1 synthetic population review bundle for BG: persons in households, households in dwellings, dwellings in houses/buildings where available. Include hidden/weakly measured population overlays where evidence supports them: homelessness, refugees/asylum seekers, Ukrainian displaced people, Syrian refugees, undocumented/seasonal populations, students, and institutional populations. Include family composition, parent/child age gaps, school attendance, work/school assignment, and dwelling/building realism only where supported by evidence or explicitly modelled with uncertainty.
Required output contract:
Create a complete review bundle under output/runs/BG/<run_id>/ following docs/contracts/population_review_bundle.md, including at minimum:
- synthetic_persons.parquet or .csv
- synthetic_households.parquet or .csv
- synthetic_dwellings.parquet or .csv, or unavailable.json with reason
- synthetic_building_assignments.parquet or .csv, or unavailable.json with reason
- hidden_population_overlays.parquet or .csv, or unavailable.json with reason
- work_school_assignments.parquet or .csv, or unavailable.json with reason
- build_manifest.json
- constraint_residuals.json
- distribution_diagnostics.json
- household_diagnostics.json
- dwelling_building_diagnostics.json
- assignment_diagnostics.json
- geography_quality_tiers.json
- uncertainty_summary.json
- source_provenance.json
- model_notes.md
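A minimal sketch of a pre-submission check for the file list above. The file names come from this task; the function names and the `<stem>.unavailable.json` naming for unavailable components are assumptions, since this task does not say how unavailable.json is named per component. The contract in docs/contracts/population_review_bundle.md takes precedence.

```python
from pathlib import Path

# Stems and file names are taken from the required output list.
TABULAR_REQUIRED = ("synthetic_persons", "synthetic_households")
TABULAR_OR_UNAVAILABLE = (
    "synthetic_dwellings",
    "synthetic_building_assignments",
    "hidden_population_overlays",
    "work_school_assignments",
)
FIXED_FILES = (
    "build_manifest.json",
    "constraint_residuals.json",
    "distribution_diagnostics.json",
    "household_diagnostics.json",
    "dwelling_building_diagnostics.json",
    "assignment_diagnostics.json",
    "geography_quality_tiers.json",
    "uncertainty_summary.json",
    "source_provenance.json",
    "model_notes.md",
)


def missing_bundle_files(present):
    """Return the list entries not satisfied by the file names in `present`."""
    present = set(present)
    missing = []
    for stem in TABULAR_REQUIRED:
        if not {f"{stem}.parquet", f"{stem}.csv"} & present:
            missing.append(f"{stem}.parquet or .csv")
    for stem in TABULAR_OR_UNAVAILABLE:
        # ASSUMPTION: an unavailable component is marked by <stem>.unavailable.json.
        accepted = {f"{stem}.parquet", f"{stem}.csv", f"{stem}.unavailable.json"}
        if not accepted & present:
            missing.append(f"{stem}.parquet or .csv, or unavailable.json")
    missing.extend(name for name in FIXED_FILES if name not in present)
    return missing


def check_bundle_dir(bundle_dir):
    """Apply the check to the files directly inside a bundle directory."""
    names = (p.name for p in Path(bundle_dir).iterdir() if p.is_file())
    return missing_bundle_files(names)
```

An empty return value only means the files exist; it says nothing about whether their contents satisfy the contract.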
Non-negotiable guardrails:
- HARD constraints must not break. If a HARD input is unavailable, do not invent it; downgrade scope explicitly or emit invalid-output blockers.
- FIRM/SOFT/GUIDE precedence must be explicit in diagnostics.
- Model-based estimates without uncertainty bounds are invalid.
- Hidden-population overlays must not silently rewrite de jure constraints.
- Weak evidence means wider uncertainty / relaxed constraints, not fake precision.
- Every missing source, relaxed constraint, degraded zone, failed download dependency, and modelled estimate must be logged.
- Reuse existing Synthestat modules and generators; do not duplicate country-specific synthesis logic.
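The rule that model-based estimates without uncertainty bounds are invalid can be checked mechanically. The sketch below assumes each estimate is a dict with hypothetical keys `name`, `method`, `lower`, and `upper`; the real schema for uncertainty_summary.json is defined by the bundle contract, not by this example.

```python
def estimates_missing_bounds(estimates):
    """Return names of modelled estimates that lack usable uncertainty bounds.

    ASSUMPTION: records use the keys name/method/lower/upper and mark
    model-based values with method == "modelled". These are illustrative.
    """
    invalid = []
    for est in estimates:
        if est.get("method") != "modelled":
            continue  # measured or constrained values are out of scope here
        lower, upper = est.get("lower"), est.get("upper")
        if lower is None or upper is None or lower > upper:
            invalid.append(est["name"])
    return invalid


example = [
    {"name": "census_total", "method": "measured"},
    {"name": "homeless_count", "method": "modelled", "lower": 1.0, "upper": 3.0},
    {"name": "seasonal_workers", "method": "modelled"},
]
print(estimates_missing_bounds(example))
```

Any name returned here would make the bundle invalid under the guardrails, so it belongs in the blockers rather than being patched with an invented interval.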
Allowed write paths:
- /home/synthestat/output/runs/BG/<run_id>/
- /home/synthestat/docs/wiki/outputs/ only for a short implementation note if needed
- /home/synthestat/workspace/ only for scratch/handoff notes if needed
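A write guard for the allowed paths above could look like the following sketch. The function name and the example run ID are hypothetical. The check is purely lexical: it rejects relative paths and any `..` component but does not resolve symlinks.

```python
from pathlib import PurePosixPath

PROJECT_ROOT = PurePosixPath("/home/synthestat")


def is_write_allowed(path, run_id, country="BG"):
    """Return True if `path` lies inside one of the allowed write roots."""
    allowed_roots = (
        PROJECT_ROOT / "output" / "runs" / country / run_id,
        PROJECT_ROOT / "docs" / "wiki" / "outputs",
        PROJECT_ROOT / "workspace",
    )
    target = PurePosixPath(path)
    # Refuse relative paths and parent-directory escapes outright.
    if not target.is_absolute() or ".." in target.parts:
        return False
    return any(target == root or root in target.parents for root in allowed_roots)


# "r001" is a made-up run ID used only for illustration.
print(is_write_allowed("/home/synthestat/output/runs/BG/r001/model_notes.md", "r001"))
print(is_write_allowed("/home/synthestat/docs/SOUL.md", "r001"))
```

The guard covers location only; the "only if needed" conditions on the wiki and workspace paths still need human judgement.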
Definition of done:
- Bundle directory exists and satisfies the population_review_bundle.md file contract, or the task blocks with a precise reason why a valid bundle cannot be produced.
- build_manifest.json records country, run_id, created_at, project_root, git commit/dirty marker, deterministic seed, source catalogue/geography/classification versions when available, constraints used/relaxed, zones degraded, hidden-population scope, assignment scope, and known limitations.
- source_provenance.json includes source IDs/URLs/retrieval timestamps/reference periods/geography levels/quality flags for every material input.
- uncertainty_summary.json covers all modelled estimates and overlays.
- model_notes.md states which quantities are measured, constrained, modelled, or unknown.
- Complete via kanban with the absolute bundle path and any blockers/relaxations in metadata. Do not claim PASS yourself; the reviewer decides.
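As a rough illustration of the build_manifest.json requirement above, the sketch below checks the always-required fields. The keys `country`, `run_id`, `created_at`, and `project_root` are named in this task; `git_commit`, `git_dirty`, and `seed` are assumed key names for the git commit/dirty marker and deterministic seed, and the example values are made up. Fields qualified with "when available" are left out.

```python
REQUIRED_MANIFEST_KEYS = (
    # Named explicitly in the definition of done.
    "country",
    "run_id",
    "created_at",
    "project_root",
    # ASSUMPTION: hypothetical key names for the git marker and seed.
    "git_commit",
    "git_dirty",
    "seed",
)


def manifest_gaps(manifest):
    """Return required keys that are absent or empty in the manifest dict."""
    # False and 0 are legitimate values (clean tree, seed 0), so only
    # None and the empty string count as missing.
    return [k for k in REQUIRED_MANIFEST_KEYS if manifest.get(k) in (None, "")]


example_manifest = {
    "country": "BG",
    "run_id": "r001",  # made-up run ID
    "created_at": "2025-01-01T00:00:00Z",  # made-up timestamp
    "project_root": "/home/synthestat",
    "git_commit": "",
    "git_dirty": False,
    "seed": 0,
}
print(manifest_gaps(example_manifest))
```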