Build cycle 1 review bundle for NL population synthesis QA.
Country: NL
Project root: /home/synthestat
Target geography: finest available official geography
Release mode: internal research review
Parent manager task: t_c692bc83
Read before work:
- /home/synthestat/docs/SOUL.md (if the repo copy is stale or wrong, follow the injected Synthestat constitution instead: uncertainty-first, HARD/FIRM/SOFT/GUIDE precedence, no silent degradation)
- /home/synthestat/docs/contracts/population_review_bundle.md
- /home/synthestat/docs/specs/research_knowledge_base.md
- /home/synthestat/docs/dashboard/infrastructure.md
- /home/synthestat/docs/tasks/infrastructure/16_cto_delivery_execution_queue.md
Goal:
Generate the best currently responsible 1:1 synthetic population review bundle for NL, covering:
- persons in households, households in dwellings, and dwellings in real houses/buildings where available
- separate uncertainty-aware overlays for hidden or weakly measured populations where evidence supports them (homelessness, refugees/asylum seekers, Ukrainian displaced people, Syrian refugees, undocumented/seasonal populations, students, institutional populations)
- family composition, parent/child age gaps, school attendance, work/school assignment, and dwelling/building realism where evidence supports it
Required output:
A review bundle under output/runs/NL/<run_id>/ matching docs/contracts/population_review_bundle.md, including at minimum:
- synthetic_persons.parquet or .csv
- synthetic_households.parquet or .csv
- synthetic_dwellings.parquet or .csv, OR unavailable.json with reason
- synthetic_building_assignments.parquet or .csv, OR unavailable.json with reason
- hidden_population_overlays.parquet or .csv, OR unavailable.json with reason
- work_school_assignments.parquet or .csv, OR unavailable.json with reason
- build_manifest.json
- constraint_residuals.json
- distribution_diagnostics.json
- household_diagnostics.json
- dwelling_building_diagnostics.json
- assignment_diagnostics.json
- geography_quality_tiers.json
- uncertainty_summary.json
- source_provenance.json
- model_notes.md
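A minimal sketch of a completeness check for the list above, assuming Python 3.9+. The contract in docs/contracts/population_review_bundle.md stays authoritative. Two details here are assumptions and not taken from it: that a substitute for a missing table is named `<stem>.unavailable.json`, and that its explanation sits under a `reason` key. The function names are hypothetical.

```python
import json
from pathlib import Path

# File names copied from the required-output list.
ALWAYS_REQUIRED = [
    "build_manifest.json",
    "constraint_residuals.json",
    "distribution_diagnostics.json",
    "household_diagnostics.json",
    "dwelling_building_diagnostics.json",
    "assignment_diagnostics.json",
    "geography_quality_tiers.json",
    "uncertainty_summary.json",
    "source_provenance.json",
    "model_notes.md",
]
# Tables that must exist as .parquet or .csv.
TABLES_REQUIRED = ["synthetic_persons", "synthetic_households"]
# Tables that may be replaced by an unavailable.json substitute.
TABLES_OR_UNAVAILABLE = [
    "synthetic_dwellings",
    "synthetic_building_assignments",
    "hidden_population_overlays",
    "work_school_assignments",
]


def has_table(bundle: Path, stem: str) -> bool:
    """True if the table exists in either accepted format."""
    return any((bundle / f"{stem}{ext}").is_file() for ext in (".parquet", ".csv"))


def has_unavailable_note(bundle: Path, stem: str) -> bool:
    """True if a substitute exists and carries a non-empty reason."""
    # ASSUMPTION: per-table naming "<stem>.unavailable.json" and a "reason" key.
    note = bundle / f"{stem}.unavailable.json"
    if not note.is_file():
        return False
    payload = json.loads(note.read_text(encoding="utf-8"))
    return isinstance(payload, dict) and bool(str(payload.get("reason", "")).strip())


def missing_artifacts(bundle: Path) -> list[str]:
    """Return the names of required artifacts that are absent from the bundle."""
    missing = [name for name in ALWAYS_REQUIRED if not (bundle / name).is_file()]
    missing += [stem for stem in TABLES_REQUIRED if not has_table(bundle, stem)]
    missing += [
        stem
        for stem in TABLES_OR_UNAVAILABLE
        if not (has_table(bundle, stem) or has_unavailable_note(bundle, stem))
    ]
    return missing
```

A substitute with a blank reason counts as missing in this sketch, so an unexplained gap cannot pass the check.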
Non-negotiables:
- HARD constraints must not break. If the code cannot satisfy a declared HARD constraint, stop and make the violation explicit rather than hiding it.
- FIRM/SOFT/GUIDE semantics must be explicit; GUIDE sources shape priors only.
- Hidden-population overlays must not silently rewrite de jure official constraints; represent them as overlays with separate evidence status and uncertainty.
- Every missing source, relaxed constraint, degraded zone, failed download/input, and modelled estimate must be explicit in manifest/diagnostics/model_notes.
- Occupation, industry, and other fine-grained variables at sparse geographies are model-driven unless measured, and must be flagged with uncertainty.
- Reuse existing project modules and algorithms; do not fork country-specific synthesis logic.
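The tier rules above can be illustrated with a small sketch. It is not the project's implementation, which should come from the existing modules. The `Residual` record, its fields, the tolerance semantics, and the `enforce` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HARD = "HARD"
    FIRM = "FIRM"
    SOFT = "SOFT"
    GUIDE = "GUIDE"


@dataclass(frozen=True)
class Residual:
    """Hypothetical record of one constraint after synthesis."""
    constraint_id: str
    tier: Tier
    target: float
    achieved: float
    tolerance: float  # ASSUMPTION: absolute tolerance on the residual

    @property
    def residual(self) -> float:
        return self.achieved - self.target

    @property
    def violated(self) -> bool:
        return abs(self.residual) > self.tolerance


class HardConstraintViolation(RuntimeError):
    """Raised so that a broken HARD constraint stops the run visibly."""


def enforce(residuals: list[Residual]) -> dict:
    """Stop on HARD violations; report FIRM/SOFT relaxations; never test GUIDE."""
    broken = [r for r in residuals if r.tier is Tier.HARD and r.violated]
    if broken:
        details = ", ".join(f"{r.constraint_id} (residual {r.residual:+g})" for r in broken)
        raise HardConstraintViolation(f"HARD constraints violated: {details}")

    report = {"relaxed": [], "guide_only": []}
    for r in residuals:
        if r.tier is Tier.GUIDE:
            # GUIDE sources shape priors only, so they are listed, not checked.
            report["guide_only"].append(r.constraint_id)
        elif r.violated:
            report["relaxed"].append(
                {"constraint_id": r.constraint_id, "tier": r.tier.value, "residual": r.residual}
            )
    return report
```

Raising instead of returning a flag means a caller cannot continue past a HARD violation by forgetting to inspect the result. The returned report is the kind of content that would feed the manifest's relaxed-constraints entry.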
Allowed write paths:
- output/runs/NL/<run_id>/
- docs/wiki/outputs/ for a short implementation note if useful
- temporary files under /tmp or project-local scratch paths, only if cleaned up afterwards or documented
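One way to keep writes inside these paths is a small guard called before any file is opened for writing. The sketch below assumes a POSIX filesystem. The function name `is_allowed_write` is hypothetical. Project-local scratch paths are left out because the brief does not name them.

```python
from pathlib import Path


def is_allowed_write(path: str, project_root: str, run_id: str) -> bool:
    """Return True if `path` falls under one of the allowed write locations."""
    root = Path(project_root).resolve()
    # Relative paths are interpreted against the project root; absolute paths
    # stay absolute. resolve() also collapses ".." segments, so a path cannot
    # escape an allowed directory by traversal.
    target = (root / path).resolve()
    allowed = [
        root / "output" / "runs" / "NL" / run_id,
        root / "docs" / "wiki" / "outputs",
        Path("/tmp").resolve(),
    ]
    return any(target == base or base in target.parents for base in allowed)
```

For example, `is_allowed_write("output/runs/NL/r1/build_manifest.json", "/home/synthestat", "r1")` is accepted, while a path into another country's run directory or into the source tree is rejected.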
Definition of done:
- Bundle exists at output/runs/NL/<run_id>/ and contains all required files or explicit unavailable.json substitutes where allowed by contract.
- build_manifest.json includes required fields: country, run_id, created_at, project_root, git_commit plus dirty marker, random_seed, source catalogue, geography, and classification versions, constraints used and relaxed, degraded zones, hidden population scope, assignment scope, known limitations.
- Diagnostics include uncertainty bounds and residuals, not just point estimates.
- Handoff metadata names bundle_path, run_id, tests/commands run, limitations, and any reviewer attention flags.
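The manifest and diagnostics checks in this list could be automated along the following lines. This is a sketch under stated assumptions. Apart from `country`, `run_id`, `created_at`, `project_root`, `git_commit`, and `random_seed`, the snake_case key names are guesses at how the fields might be spelled. The diagnostics entry shape (`name`, `estimate`, `lower`, `upper`, `residual`) is also hypothetical. The bundle contract defines the real schema.

```python
# ASSUMPTION: key spellings below are illustrative; the contract is authoritative.
REQUIRED_MANIFEST_FIELDS = (
    "country",
    "run_id",
    "created_at",
    "project_root",
    "git_commit",
    "git_dirty",
    "random_seed",
    "source_catalogue_version",
    "geography_version",
    "classification_version",
    "constraints_used",
    "constraints_relaxed",
    "degraded_zones",
    "hidden_population_scope",
    "assignment_scope",
    "known_limitations",
)


def missing_manifest_fields(manifest: dict) -> list[str]:
    """Return required keys that are absent from the manifest.

    Presence is tested, not truthiness: an empty constraints_relaxed list is a
    valid, explicit statement that nothing was relaxed.
    """
    return [key for key in REQUIRED_MANIFEST_FIELDS if key not in manifest]


def entries_missing_uncertainty(entries: list[dict]) -> list[str]:
    """Return names of diagnostic entries that carry only a point estimate."""
    needed = ("estimate", "lower", "upper", "residual")
    return [
        str(entry.get("name", "<unnamed>"))
        for entry in entries
        # `is None` rather than falsiness, so a residual of 0.0 still counts.
        if any(entry.get(key) is None for key in needed)
    ]
```

Both functions return lists and do not raise, so their output can be copied directly into the handoff metadata as reviewer attention flags.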