
SE population QA cycle 2 reviewer: inspect non-toy/source-upgraded bundle


Task metadata

id: t_66bdf062
title: SE population QA cycle 2 reviewer: inspect non-toy/source-upgraded bundle
assignee: synth-reviewer
status: done
tenant: synthestat
priority: 105
workspace_kind: dir
workspace_path: /home/synthestat
created_by: synth-manager
created_at: 2026-05-19 20:00:51 CEST
started_at: 2026-05-19 21:01:11 CEST
completed_at: 2026-05-19 21:10:34 CEST

Latest summary

Reviewed SE cycle-2 bundle at /home/synthestat/output/runs/SE/se_population_review_cycle2_3a9d999a_seed420987 and returned NEEDS_MODEL_FIX. Cycle 2 materially improves scope/provenance over the 8-person fixture and passes selected-zone age-sex HARD residuals, but invalid child-only households, missing FIRM/SOFT residual reporting, inadequate modelled-attribute uncertainty, and overoptimistic geography/layer quality metadata block PASS.

Body

Country: SE (Sweden)
Project root / allowed write root: /home/synthestat
Parent manager task: t_0c611b3b
Depends on modeler rerun task: t_29a0c9c4

Mission:
Review the SE population QA cycle-2 bundle produced by t_29a0c9c4. Do not repair or regenerate the bundle yourself. Produce a routeable QA verdict for the manager.

Mandatory context to read before review:
- Parent modeler handoff from t_29a0c9c4, especially bundle_path and run_id.
- Prior reviewer handoff t_33ff07f7 and its NEEDS_MORE_SOURCES findings.
- Local methodology ingest: /home/synthestat/workspace/manager_handoffs/SE_other_synthesis_ingest.md
- Source/downloader handoff t_1bbf9f63.
- /home/synthestat/docs/contracts/population_review_bundle.md
- /home/synthestat/docs/SOUL.md (if stale/misaligned, use injected Synthestat SOUL rules: uncertainty first-class, HARD/FIRM/SOFT/GUIDE precedence, no silent degradation, provenance required)
- /home/synthestat/docs/specs/research_knowledge_base.md

Review requirements:
1. Validate bundle completeness against docs/contracts/population_review_bundle.md.
2. Check HARD constraints are never broken; FIRM/SOFT residuals have declared tolerances/reasons; GUIDE sources are not represented as measurements.
3. Check every modelled/weakly measured estimate has uncertainty bounds and provenance/quality flags.
4. Check hidden-population overlays remain separate, uncertainty-aware, and do not silently rewrite de jure constraints.
5. Check fine-geography sparse attributes and school/work/building/hidden layers are labelled measured/constrained/modelled/unknown.
6. Check source_provenance includes source IDs, retrieval timestamps where available, geography levels, reference periods, and quality flags; note any licensed/proxy/scaffold inherited layers.
7. Check geography_quality_tiers/model_notes honestly label sliced/test/degraded zones, missing data, unavailable components, and limitations.
8. Run lightweight deterministic checks where possible.
9. Explicitly compare against cycle-1 findings. This is the second SE review cycle after a source-acquisition branch; if findings are materially similar without concrete new source/model path, recommend human review per loop guard.
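
Requirement 2 above can be sketched as a small deterministic check over residual rows. This is a minimal illustration, not the project's validator: the row fields `tier`, `residual`, `tolerance`, `reason`, and `evidence` are assumed names for the example and may not match the actual schema of constraint_residuals.json.

```python
def check_residual_rows(rows):
    """Return a list of problems found in residual rows.

    Assumed (hypothetical) row fields: tier, residual, tolerance, reason, evidence.
    """
    problems = []
    for i, row in enumerate(rows):
        tier = row.get("tier")
        residual = row.get("residual", 0)
        if tier == "HARD":
            # HARD constraints must be matched exactly.
            if residual != 0:
                problems.append(f"row {i}: HARD residual {residual} must be 0")
        elif tier in ("FIRM", "SOFT"):
            tolerance = row.get("tolerance")
            if tolerance is None:
                problems.append(f"row {i}: {tier} row has no declared tolerance")
            elif abs(residual) > tolerance and not row.get("reason"):
                problems.append(
                    f"row {i}: {tier} residual exceeds tolerance without a reason"
                )
        elif tier == "GUIDE":
            # GUIDE sources must not be presented as measurements.
            if row.get("evidence") == "measured":
                problems.append(f"row {i}: GUIDE source labelled as measured")
        else:
            problems.append(f"row {i}: unknown tier {tier!r}")
    return problems


if __name__ == "__main__":
    example_rows = [
        {"tier": "HARD", "residual": 0},
        {"tier": "FIRM", "residual": 0.2},  # no tolerance declared
    ]
    for problem in check_residual_rows(example_rows):
        print(problem)
```

An empty result means no rule in this sketch was violated; it does not show that FIRM/SOFT constraints were exercised at all, which has to be checked separately by confirming that such rows exist.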

Required completion metadata:
- verdict: exactly PASS, NEEDS_MODEL_FIX, NEEDS_MORE_SOURCES, BLOCKED_INVALID_OUTPUT, EVIDENCE_EXHAUSTED_HUMAN_REVIEW, or MODEL_IMPROVEMENT_EXHAUSTED_HUMAN_REVIEW
- bundle_path
- run_id
- blocking_findings
- non_blocking_findings
- recommended_next_branch
- checks_run
- cycle_comparison_to_t_33ff07f7
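
A handoff with these fields could be sanity-checked before completion with something like the sketch below. The key names and verdict values are taken from the list above; the function name `validate_completion_metadata` and the dict-based representation are assumptions for illustration only.

```python
ALLOWED_VERDICTS = {
    "PASS",
    "NEEDS_MODEL_FIX",
    "NEEDS_MORE_SOURCES",
    "BLOCKED_INVALID_OUTPUT",
    "EVIDENCE_EXHAUSTED_HUMAN_REVIEW",
    "MODEL_IMPROVEMENT_EXHAUSTED_HUMAN_REVIEW",
}

REQUIRED_KEYS = (
    "verdict",
    "bundle_path",
    "run_id",
    "blocking_findings",
    "non_blocking_findings",
    "recommended_next_branch",
    "checks_run",
    "cycle_comparison_to_t_33ff07f7",
)


def validate_completion_metadata(meta):
    """Return a list of errors; an empty list means the metadata is routeable."""
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in meta]
    verdict = meta.get("verdict")
    # The verdict must match one allowed value exactly (case-sensitive).
    if verdict is not None and verdict not in ALLOWED_VERDICTS:
        errors.append(f"invalid verdict: {verdict!r}")
    return errors


if __name__ == "__main__":
    print(validate_completion_metadata({"verdict": "needs_model_fix"}))
```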

Verdict guidance:
- PASS only if satisfactory for declared country tier/release mode with limitations explicit.
- NEEDS_MODEL_FIX if sources are adequate but model/build logic can concretely improve artifacts/residuals/metadata.
- NEEDS_MORE_SOURCES only if concrete additional source acquisition is likely available and not just repetition of cycle 1.
- Exhaustion verdicts if source/model progress plateaued or evidence cannot responsibly support further improvement.
- BLOCKED_INVALID_OUTPUT for missing/invalid bundle, hard violations, absent uncertainty for modelled estimates, or unusable provenance.

Parents

[
  "t_29a0c9c4"
]

Children

[]

Runs

ID | Profile | Status | Outcome | Started | Ended | Summary/error
146 | synth-reviewer | crashed | crashed | 2026-05-19 21:01:11 CEST | 2026-05-19 21:06:37 CEST | pid 1703786 not alive
151 | synth-reviewer | done | completed | 2026-05-19 21:06:37 CEST | 2026-05-19 21:10:34 CEST | Reviewed SE cycle-2 bundle at /home/synthestat/output/runs/SE/se_population_review_cycle2_3a9d999a_seed420987 and returned NEEDS_MODEL_FIX. Cycle 2 materially improves scope/provenance over the 8-person fixture and passes selected-zone age-sex HARD residuals, but invalid child-only households, missing FIRM/SOFT residual reporting, inadequate modelled-attribute uncertainty, and overoptimistic geography/layer quality metadata block PASS.

Events

Time | Kind | Payload
2026-05-19 20:00:51 CEST | created | { "assignee": "synth-reviewer", "status": "todo", "parents": [ "t_29a0c9c4" ], "tenant": "synthestat", "skills": [ "kanban-worker" ] }
2026-05-19 21:00:45 CEST | promoted | null
2026-05-19 21:01:11 CEST | claimed | { "lock": "vmi3188806:1590352", "expires": 1779218171, "run_id": 146 }
2026-05-19 21:01:11 CEST | spawned | { "pid": 1703786 }
2026-05-19 21:06:37 CEST | crashed | { "pid": 1703786, "claimer": "vmi3188806:1590352" }
2026-05-19 21:06:37 CEST | claimed | { "lock": "vmi3188806:1706016", "expires": 1779218497, "run_id": 151 }
2026-05-19 21:06:37 CEST | spawned | { "pid": 1706049 }
2026-05-19 21:10:12 CEST | commented | { "author": "synth-reviewer", "len": 10935 }
2026-05-19 21:10:34 CEST | completed | { "result_len": 0, "summary": "Reviewed SE cycle-2 bundle at /home/synthestat/output/runs/SE/se_population_review_cycle2_3a9d999a_seed420987 and returned NEEDS_MODEL_FIX. Cycle 2 materially improves scope/provenance over the 8-person fixture and passes selected-zone age-sex HARD residuals, but invalid child-only households, missing FIRM/SOFT residual reporting, inadequate modelled-attribute uncertainty, and overoptimistic geogr" }

Comments

2026-05-19 21:10:12 CEST · synth-reviewer
QA report — SE population QA cycle 2 reviewer

country: SE
run_id: se_population_review_cycle2_3a9d999a_seed420987
artifact_reviewed: /home/synthestat/output/runs/SE/se_population_review_cycle2_3a9d999a_seed420987
verdict: NEEDS_MODEL_FIX
confidence_in_verdict: high
summary: Cycle 2 is a material improvement over cycle 1: it is contract-complete and has moved from an 8-person/two-test-zone fixture to a 91,030-person, 43,739-household, 42-DeSO stratified slice with exact selected-zone SCB age-sex HARD residuals. It still cannot PASS because household/family construction is structurally invalid, FIRM/SOFT residuals are not exercised despite newly frozen source data, modelled attributes do not have attribute-specific uncertainty bounds, and unavailable building/hidden/work-school layers remain correctly explicit but unresolved.
constraint_fit:
  hard: PASS for declared selected-zone age-sex scope. constraint_residuals.json reports hard_constraint_status=pass_exact, 84/84 HARD residual rows pass, selected-zone official target population 91,030 equals synthetic person count 91,030, residual 0. Independent parquet count check found 91,030 persons and 42 selected DeSO zones in diagnostics.
  firm: NOT ADEQUATELY EXERCISED. build_manifest lists national household-size prior and modelled/fallback metadata, but constraint_residuals.json contains only HARD rows. Frozen cycle-2 sources include DeSO household type, education, labour, housing/tenure, income, etc.; the bundle does not report FIRM residuals/tolerances for these, so fit cannot be accepted beyond age-sex.
  soft: NOT ADEQUATELY EXERCISED. Household/family realism, occupation/industry, origin/categorical attributes, and dwelling shells are labelled modelled/fallback, but no SOFT residual summaries or numeric tolerance checks are present.
household_family_checks: FAIL. Deterministic parquet check found 11,526 households in which every member is a child under 18, with samples such as household 240 containing only three age-2 persons, all labelled child. Household type counts are implausible and internally inconsistent: HH_COUPLE_CHILDREN=10 while 23,973 persons are labelled child and 6,659 households are HH_SINGLE_PARENT, so child allocation appears not to be linked to adult guardian/couple shells. Household sizes match reported member counts, but member composition violates the rule against impossible child-only households and breaks coherence between household type and member composition.
dwelling_building_checks: PARTIAL/UNAVAILABLE. synthetic_dwellings.parquet is present with 43,739 shell dwellings and household backlinks now consistent; building_id is null for all dwellings and synthetic_building_assignments.unavailable.json honestly states no official residential building/address/dwelling anchor is frozen. This is acceptable as explicit unavailability for a slice, but it remains non-PASS for real-house grounding.
hidden_population_checks: UNAVAILABLE BUT HONEST. hidden_population_overlays.unavailable.json states hidden overlays are unavailable and not folded into de jure private households. This preserves HARD de jure constraints, but homelessness/irregular/seasonal/student/institutional/refugee overlays remain unresolved and should not be represented as covered.
work_school_assignment_checks: UNAVAILABLE BUT HONEST. work_school_assignments.unavailable.json and assignment_diagnostics.json state OD commuters are future-prior provenance only and no individual work/school/facility assignments are emitted. This avoids hallucinated assignments, but the layer is not reviewable.
distribution_checks: FAIL FOR CURRENT MODEL PASS. distribution_diagnostics confirms only national household-size prior use plus modelled fine attributes. Source acquisition produced stronger SCB tables, including HushallDesoTyp, but the model did not convert them into reported FIRM/SOFT residuals or coherent household-family generation. Occupation/industry are fallback_1digit/not_applicable and correctly not measured.
geography_checks: MIXED. The bundle clearly declares stratified_multi_DeSO_not_full_national and 42 selected zones, which fixes the cycle-1 toy-scope issue. However geography_quality_tiers reports degraded_zone_count=0 and every zone quality_tier=B/degraded=false even though every zone lacks building assignment and non-age-sex attributes are modelled. That is less severe than cycle-1 A-tier overclaiming, but still overstates zone quality; zones with unavailable buildings/assignments and national-prior household construction should be degraded or explicitly tier-C for those layers.
uncertainty_provenance_checks: FAIL/PARTIAL. source_provenance has 20 frozen records with source IDs, retrieval timestamps, table IDs, geography levels, reference periods, checksums, source systems, and license_access_notes. Per-row provenance/fallback columns exist. But modelled attributes do not have attribute-specific numeric uncertainty bounds: synthetic_persons has uncertainty_low=uncertainty_high=1.0 for all rows while many columns are modelled; uncertainty_summary is qualitative and says wide categorical uncertainty without exposing bounds. Source records use evidence_tier/quality_flags but lack normalized candidate_use/quality_flag fields expected by the task wording.
privacy_release_checks: Fine-geography DeSO slice with unique household/person records and modelled sensitive attributes has material re-identification and misinterpretation risk. Internal review only; do not treat as anonymous or production-release safe.
critical_failures:
  - Household/family graph is structurally invalid: 11,526 child-only households with no adult/guardian, including age-2 child-only households.
  - Household type/member composition is incoherent: only 10 HH_COUPLE_CHILDREN households despite 23,973 child-labelled persons and 6,659 HH_SINGLE_PARENT households.
  - FIRM/SOFT residuals are not reported for available newly frozen SCB household/person attribute sources; only age-sex HARD residuals are exercised.
  - Modelled attributes lack attribute-specific uncertainty bounds; constant person-level uncertainty_low/high=1.0 is misleading for modelled origin/education/labour/income/occupation/industry fields.
  - Geography quality still overclaims: degraded_zone_count=0 and zone degraded=false despite unavailable buildings/assignments and national-prior household construction for every selected zone.
model_fix_requests:
  - Rebuild household generation so children are placed only with adult guardians/parents or explicitly sourced institutional/exceptional placements; enforce household_type/member-role coherence and parent/guardian age-gap checks.
  - Consume frozen HushallDesoTyp and other SCB cycle-2 tables as FIRM/SOFT constraints where appropriate, and emit residual rows with tolerances/reasons for household type, education/labour/housing/income/origin where claimed.
  - Replace constant per-person uncertainty_low/high=1.0 with attribute-level uncertainty/status fields or a diagnostics table covering each modelled attribute/zone; do not imply exact certainty for modelled categorical assignments.
  - Mark zones/layers degraded honestly: at minimum layer-specific tier C for building/work-school/hidden and household-family layers when only national priors or unavailable anchors are used.
  - Keep unavailable building/hidden/work-school artifacts explicit unless approved source anchors exist; do not silently infer them.
source_gap_requests:
  marginals:
    - Residential building/address/dwelling anchors remain blocked on Lantmäteriet/contract credentials or a human-approved proxy/scaffold decision.
    - Hidden-population overlays remain evidence-exhausted/partial for DeSO-ready homelessness, undocumented, seasonal, institutional, student-dormitory and refugee/Ukrainian/Syrian resident-stock semantics.
    - Current workplace/school destination evidence remains insufficient for individual assignment; OD source is stale/GUIDE for assignments.
  distributions:
    - The claimed Sweden household-composition prior bundle remains absent and should be rebuilt/mirrored if the model needs richer parent-child/couple/guardian priors beyond SCB household tables.
stopping_condition_assessment: Do not PASS. Do not use NEEDS_MORE_SOURCES as the primary verdict: cycle 2 source acquisition materially improved the evidence base, and the most immediate blockers are model logic/diagnostic/uncertainty failures using already-frozen inputs. Findings are not merely a repeat of cycle 1: the toy-scope/source-freeze issues improved, but household-family and residual-reporting failures now block. Human review may be needed for licensed building/proxy and hidden-overlay scope decisions, but model improvement is not exhausted because concrete fixes exist.
recommended_next_cards:
  - assignee: synth-modeler
    title: SE cycle-2 household-family/residual/uncertainty model fix rerun
    reason: Fix child-only households, household-type coherence, FIRM/SOFT residual reporting, attribute uncertainty, and layer-specific degradation metadata using frozen SCB sources.
    depends_on: t_66bdf062
  - assignee: synth-manager
    title: SE building-anchor and hidden-overlay scope decision
    reason: Decide whether to pause production building/hidden layers, obtain licensed Lantmäteriet/official sources, or permit labelled proxy/scaffold anchors for internal review only.
    depends_on: t_66bdf062
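
The uncertainty finding in the report above (uncertainty_low equal to uncertainty_high on every row) is the kind of defect a short deterministic check can surface. The sketch below assumes person rows are available as plain dicts carrying the `uncertainty_low` and `uncertainty_high` columns named in the report; the function name and the in-memory representation are hypothetical, and the real check would read synthetic_persons from parquet.

```python
def degenerate_uncertainty_share(persons, low="uncertainty_low", high="uncertainty_high"):
    """Return the share of rows whose uncertainty interval has zero width.

    A share of 1.0 means no row carries a usable bound, which is misleading
    when many of the underlying attributes are modelled rather than measured.
    """
    rows = list(persons)
    if not rows:
        return 0.0
    degenerate = sum(1 for row in rows if row[low] == row[high])
    return degenerate / len(rows)


if __name__ == "__main__":
    # Illustrative rows only, not data from the bundle.
    sample = [
        {"person_id": 1, "uncertainty_low": 1.0, "uncertainty_high": 1.0},
        {"person_id": 2, "uncertainty_low": 1.0, "uncertainty_high": 1.0},
    ]
    print(degenerate_uncertainty_share(sample))
```

A zero-width interval is not wrong in itself for an exactly constrained attribute such as age-sex; the concern is a zero-width interval applied uniformly to rows that also hold modelled attributes.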

Checks run:
- Read current task t_66bdf062 and parent modeler handoff t_29a0c9c4 via kanban_show.
- Read prior cycle-1 reviewer t_33ff07f7 and source/downloader handoff t_1bbf9f63 via kanban_show.
- Read /home/synthestat/workspace/manager_handoffs/SE_other_synthesis_ingest.md.
- Read /home/synthestat/docs/contracts/population_review_bundle.md, /home/synthestat/docs/SOUL.md, and /home/synthestat/docs/specs/research_knowledge_base.md.
- Read model_notes.md and downloader freeze handoff.
- Listed bundle files and validated required contract artifacts/unavailable artifacts exist.
- Parsed all JSON diagnostics/provenance with Python json.
- Read parquet files with pyarrow and checked row counts, schemas, IDs, quality/evidence/fallback fields, person-household joins, household-dwelling joins, age/sex counts, and household age composition.
- Parsed source_provenance fields/timestamps/checksums and unavailable artifact reasons.

Cycle comparison to t_33ff07f7: Materially improved, not a flat repeat. Cycle 1 was an 8-person/two-zone fixture with seeded buildings and inconsistent A-tier geography. Cycle 2 is a non-toy 42-DeSO/91,030-person slice with frozen SCB provenance and exact selected-zone age-sex HARD controls. The old building/hidden/work-school source gaps remain explicit, but the decisive new blocker is model/build quality: invalid household-family construction, absent FIRM/SOFT residual reporting despite new sources, inadequate modelled-attribute uncertainty, and still-overoptimistic geography/layer quality metadata.