Architecture
Review pipeline
Daydream runs in two modes. Deep review (the default) is a staged pipeline
implemented in
daydream/deep/orchestrator.py.
The user-visible stages are:
- Exploration pre-scan: tree-sitter import resolution and convention detection across changed files, implemented in daydream/exploration_runner.py.
- Intent analysis: the agent reads the diff and commit history to understand what the change is trying to do (phase_understand_intent).
- Alternative review: identifies potential improvements as numbered findings, each with file, line, severity, and rationale (phase_alternative_review).
- Per-stack reviews: parallel Beagle skill invocations, one per detected stack (Python, React, Elixir, Go, Rust, iOS). Each stack gets its own review agent that runs the matching review-* skill (phase_per_stack_reviews).
- Per-stack parse and dedup: each stack's raw output is parsed into structured findings and cross-checked for duplicates.
- Cross-stack merge: deduplicates per-stack findings into a unified report (phase_cross_stack_merge).
After the merge, three sub-stages run:
- Arbiter (phase_arbiter_review): resolves high-severity and contested findings by re-reading the code. The arbiter fires only when qualifying findings exist; the pre-flight estimate always includes it so the extra model call is never a surprise. An arbiter verdict can revise severity, confidence, description, or rationale. An explicit keep: false drops a finding. A missing or ambiguous verdict fails open: the original finding is retained.
- Verification (phase_verify_recommendations): adjudicates each finding against the actual code, attaching a per-finding verdict of consistent, uncertain, or contradicts, with evidence. A contradicts verdict blocks the fixer from applying that recommendation.
- Fix gate (phase_fix_parallel then phase_test_and_heal): applies the surviving fixes one at a time and validates the result against the project's test suite, with a bounded fix-and-retry heal loop on failure.
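The arbiter's fail-open rule and the verification gate can be sketched as follows. This is a minimal illustration, not Daydream's implementation: the Finding dataclass, the dict-shaped verdict, and the function names apply_arbiter_verdict and fixable are all assumptions made for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical finding shape; the real model may carry more fields.
@dataclass(frozen=True)
class Finding:
    id: int
    severity: str
    confidence: str
    description: str
    rationale: str

REVISABLE = ("severity", "confidence", "description", "rationale")

def apply_arbiter_verdict(finding: Finding, verdict: Optional[dict]) -> Optional[Finding]:
    """Return the (possibly revised) finding, or None if the arbiter dropped it."""
    # Fail open: a missing or malformed verdict keeps the original finding.
    if not isinstance(verdict, dict):
        return finding
    # Only an explicit keep: false drops the finding.
    if verdict.get("keep") is False:
        return None
    # Copy over only the fields the arbiter is allowed to revise.
    changes = {k: verdict[k] for k in REVISABLE if verdict.get(k) is not None}
    return replace(finding, **changes)

def fixable(finding_ids, verification):
    """Keep findings whose verification verdict is not 'contradicts'."""
    return [fid for fid in finding_ids if verification.get(fid) != "contradicts"]
```

In this sketch an ambiguous value such as keep: "maybe" is treated like a missing verdict, so the finding survives unchanged.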
The pre-flight agent count for a deep run is
2 + 2N + 2, where N is the number of detected stacks: two TTT agents (intent
and alternative review), N per-stack review agents, N per-stack parse passes,
one merge agent, and one conditional arbiter agent
(total_agent_count).
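The arithmetic behind that estimate is small enough to write out. The function below is a hypothetical stand-in named preflight_agent_count; the signature of the real total_agent_count is not shown here, so treat this as a sketch of the formula only.

```python
def preflight_agent_count(num_stacks: int) -> int:
    """Pre-flight agent estimate for a deep run: 2 + 2N + 2."""
    if num_stacks < 0:
        raise ValueError("num_stacks must be non-negative")
    intent_and_alternative = 2        # intent analysis + alternative review
    per_stack_reviews = num_stacks    # one review agent per detected stack
    per_stack_parses = num_stacks     # one parse pass per detected stack
    merge_and_arbiter = 2             # merge agent + conditional arbiter
    return (intent_and_alternative + per_stack_reviews
            + per_stack_parses + merge_and_arbiter)
```

Under this formula a single-stack project is estimated at 6 agents and a three-stack project at 10, whether or not the arbiter ends up firing.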
Shallow review (--shallow) collapses this to a single-skill loop: review,
parse, fix, test. It is useful for single-stack projects or when forcing a
specific Beagle skill with --skill.
Tiny-diff short-circuit (issue #172): when a deep-mode diff has at most 2
changed files (configurable via
shallow_fanout_threshold),
the per-language fan-out collapses to a single combined assignment and the merge
and arbiter stages are skipped. A single-file single-language diff still gets a
per-language Beagle skill; only diffs with two or more distinct language stacks
fall back to the generic reviewer.
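One way to read the short-circuit rule is as a routing decision over the changed files. The sketch below is an interpretation, not the orchestrator's code: plan_fanout, the path-to-stack mapping, the review-<stack> skill naming, and the "generic" reviewer label are all assumptions made for illustration.

```python
def plan_fanout(file_stacks, shallow_fanout_threshold=2):
    """Decide review assignments from a {path: stack} mapping.

    Returns the reviewer assignments and whether the merge and arbiter
    stages are skipped.
    """
    stacks = sorted(set(file_stacks.values()))
    # Above the threshold: normal per-stack fan-out, merge and arbiter run.
    if len(file_stacks) > shallow_fanout_threshold:
        return {
            "assignments": [f"review-{stack}" for stack in stacks],
            "skip_merge_and_arbiter": False,
        }
    # Tiny diff: one combined assignment. A single stack keeps its own
    # skill; two or more distinct stacks fall back to the generic reviewer.
    reviewer = f"review-{stacks[0]}" if len(stacks) == 1 else "generic"
    return {"assignments": [reviewer], "skip_merge_and_arbiter": True}
```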
Trajectory recording
Every run produces an ATIF v1.6 trajectory capturing prompts, responses, tool calls, and per-step token and cost metrics. The recorder is the sole owner of ATIF Pydantic model construction; other modules import only its public surface.
Each agent invocation (run_agent) opens one Invocation scope on the shared
recorder. Backends emit a unified AgentEvent stream
(daydream/backends/__init__.py)
that the invocation buffers into ATIF Steps. Event types: TextEvent,
ThinkingEvent, ToolStartEvent, ToolResultEvent, CostEvent,
MetricsEvent, TurnEndEvent, and ResultEvent. A TurnEndEvent closes the
open Step, so a multi-turn invocation produces N Steps rather than one collapsed
Step.
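The turn-boundary behavior can be illustrated with a reduced event model. The classes below are simplified stand-ins for two of the event types named above, and buffer_into_steps is a hypothetical helper; flushing a trailing open Step at the end of the stream is an assumption of this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class TextEvent:          # stand-in: carries only the text payload
    text: str

@dataclass
class TurnEndEvent:       # stand-in: marks the end of one agent turn
    pass

@dataclass
class Step:
    messages: list = field(default_factory=list)

def buffer_into_steps(events):
    """Group a flat event stream into Steps, closing one at each TurnEndEvent."""
    steps = []
    open_step = None
    for event in events:
        if isinstance(event, TurnEndEvent):
            # Close the open Step so the next turn starts a new one.
            if open_step is not None:
                steps.append(open_step)
                open_step = None
        elif isinstance(event, TextEvent):
            if open_step is None:
                open_step = Step()
            open_step.messages.append(event.text)
    # Assumption: content left open at stream end is kept as a final Step.
    if open_step is not None:
        steps.append(open_step)
    return steps
```

A stream with two TurnEndEvents therefore yields two Steps, each holding only the text emitted during its own turn.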
Parallel fan-outs fork sibling trajectories under the run directory. Each fork writes a separate trajectory file alongside the parent.
Daydream redacts secrets before writing. Ordered rules cover URL credentials,
PEM private key blocks, secret-bearing environment variable assignments, bare
API keys (sk-, ghp_, ghs_, xoxb-, AKIA), JWTs, and local username paths. The
redactor applies to all four ATIF text surfaces: step messages, reasoning
content, tool call arguments, and observation results. On internal failure, the
redactor degrades to [REDACTION_FAILED] rather than letting the raw value
through.
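A minimal sketch of ordered-rule redaction with a fail-closed fallback follows. The two regular expressions are illustrative approximations covering only URL credentials and prefixed API keys; they are not Daydream's actual rules, and the names RULES and redact are assumptions.

```python
import re

# Ordered rules: earlier entries run first. Patterns are illustrative only.
RULES = [
    # user:password embedded in a URL authority
    (re.compile(r"(?<=://)[^/\s:@]+:[^/\s@]+(?=@)"), "[REDACTED]"),
    # bare API keys with well-known prefixes
    (re.compile(r"\b(?:sk-|ghp_|ghs_|xoxb-)[A-Za-z0-9_-]{8,}|\bAKIA[A-Z0-9]{16}\b"),
     "[REDACTED]"),
]

def redact(text, rules=RULES):
    """Apply each rule in order; on any internal error, emit a placeholder."""
    try:
        for pattern, replacement in rules:
            text = pattern.sub(replacement, text)
        return text
    except Exception:
        # Never let the raw value through if redaction itself fails.
        return "[REDACTION_FAILED]"
```

Returning the placeholder on failure trades a readable trajectory field for the guarantee that a bug in a rule cannot leak the unredacted value.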
On SIGINT or SIGTERM, the signal handler
(daydream/cli.py)
flushes a .partial trajectory file with extra.partial set, so interrupted
runs are not lost. The handler uses a module-level recorder stack rather than
the async ContextVar, because signal handlers fire in the main thread outside
the asyncio task context.
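The module-level stack pattern can be sketched as below. Everything here is hypothetical: the Recorder class, the .partial suffix appended to the trajectory path, the JSON layout, and the exit code are assumptions chosen to make the example self-contained, not a description of daydream/cli.py.

```python
import json
import signal

# Module-level stack: visible to the signal handler, unlike a ContextVar
# that is bound to the asyncio task that set it.
_RECORDER_STACK = []

class Recorder:
    """Hypothetical recorder that can dump what it has buffered so far."""
    def __init__(self, path):
        self.path = path
        self.steps = []

    def flush_partial(self):
        partial_path = self.path + ".partial"
        with open(partial_path, "w", encoding="utf-8") as fh:
            json.dump({"steps": self.steps, "extra": {"partial": True}}, fh)
        return partial_path

def push_recorder(recorder):
    _RECORDER_STACK.append(recorder)

def flush_partials():
    """Flush every active recorder, innermost first; return written paths."""
    return [rec.flush_partial() for rec in reversed(_RECORDER_STACK)]

def _handle_signal(signum, frame):
    flush_partials()
    # Conventional shell exit status for termination by a signal.
    raise SystemExit(128 + signum)

def install_handlers():
    # signal.signal must be called from the main thread.
    signal.signal(signal.SIGINT, _handle_signal)
    signal.signal(signal.SIGTERM, _handle_signal)
```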
Archive
Each run is archived to ~/.daydream/archive/runs/<session_id>/ with its
manifest, trajectory (including forks), review output, evaluation results (when
--eval is set), deep artifacts, and the diff patch. Archiving is non-fatal:
if it fails, the run still completes and the failure is logged.
A SQLite index at ~/.daydream/archive/index.db
(daydream/archive/_schema.py)
supports querying across projects. Two tables:
- runs: 39 columns including repo_slug, backend, total_cost_usd, grounding_rate, outcome_labels, composite_reward, changed_files, head_sha, base_sha, and per-phase backend columns.
- label_observations: the bitemporal label table. Tracks transaction time (observed_at), valid time (valid_at), evidence SHA, reward version, reward JSON, composite reward, reviewer logins, posterior flag, and source (auto or human). Human labels take precedence over automated labels in every projection.
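The human-over-automated precedence rule might look like this when projecting a single effective label from a set of observations. The dict shape, the project_label name, and the choice to break ties by the latest observed_at are assumptions of this sketch.

```python
def project_label(observations):
    """Pick the effective observation: human beats auto, then latest wins.

    Each observation is a dict with at least 'source' ('auto' or 'human')
    and 'observed_at' (an ISO 8601 timestamp string).
    """
    if not observations:
        return None
    humans = [o for o in observations if o["source"] == "human"]
    # Any human observation outranks every automated one.
    pool = humans or observations
    # Assumption: within the chosen pool, the most recent observation wins.
    return max(pool, key=lambda o: o["observed_at"])
```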
The index runs in WAL mode with a 5-second busy timeout. Five indexes cover
repo_slug, archived_at, outcome_labels, observed_at, and session_id.
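Opening a SQLite database with those two settings takes two PRAGMA statements. The helper name open_index is hypothetical; the sketch shows only the connection settings, not the schema or indexes.

```python
import sqlite3

def open_index(path):
    """Open a SQLite database in WAL mode with a 5-second busy timeout."""
    conn = sqlite3.connect(path)
    # WAL lets readers proceed while a single writer appends to the log.
    conn.execute("PRAGMA journal_mode=WAL")
    # Wait up to 5000 ms for a lock instead of failing immediately.
    conn.execute("PRAGMA busy_timeout=5000")
    return conn
```

WAL mode is persistent for a database file once set, while busy_timeout applies per connection, so a helper like this would need to run on every open.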