Architecture

Review pipeline

Daydream runs in two modes. Deep review (the default) is a staged pipeline implemented in daydream/deep/orchestrator.py. The user-visible stages are:

  1. Exploration pre-scan: tree-sitter import resolution and convention detection across changed files, implemented in daydream/exploration_runner.py.
  2. Intent analysis: the agent reads the diff and commit history to understand what the change is trying to do (phase_understand_intent).
  3. Alternative review: identifies potential improvements as numbered findings, each with file, line, severity, and rationale (phase_alternative_review).
  4. Per-stack reviews: parallel Beagle skill invocations, one per detected stack (Python, React, Elixir, Go, Rust, iOS). Each stack gets its own review agent that runs the matching review-* skill (phase_per_stack_reviews).
  5. Per-stack parse and dedup: each stack's raw output is parsed into structured findings and cross-checked for duplicates.
  6. Cross-stack merge: deduplicates per-stack findings into a unified report (phase_cross_stack_merge).

After the merge, three sub-stages run:

  • Arbiter (phase_arbiter_review): resolves high-severity and contested findings by re-reading the code. The arbiter fires only when qualifying findings exist; the pre-flight estimate always includes it so the extra model call is never a surprise. An arbiter verdict can revise severity, confidence, description, or rationale. An explicit keep: false drops a finding. A missing or ambiguous verdict fails open: the original finding is retained.
  • Verification (phase_verify_recommendations): adjudicates each finding against the actual code, attaching a per-finding verdict of consistent, uncertain, or contradicts, with evidence. A contradicts verdict blocks the fixer from applying that recommendation.
  • Fix gate (phase_fix_parallel then phase_test_and_heal): applies the surviving fixes one at a time and validates the result against the project's test suite, with a bounded fix-and-retry heal loop on failure.
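
The arbiter's fail-open rule and the verification gate can be sketched as below. This is a minimal illustration, not Daydream's actual code: the Finding dataclass, the verdict dictionary shape, and the helper names apply_arbiter_verdict and fixable are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Fields an arbiter verdict is allowed to revise (per the description above).
REVISABLE = ("severity", "confidence", "description", "rationale")


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; the real schema may differ."""
    id: int
    severity: str
    description: str
    confidence: str = "medium"
    rationale: str = ""
    verification: str = "uncertain"  # consistent | uncertain | contradicts


def apply_arbiter_verdict(finding: Finding, verdict: Optional[dict]) -> Optional[Finding]:
    """Return the (possibly revised) finding, or None if explicitly dropped."""
    if not isinstance(verdict, dict):
        return finding  # missing or malformed verdict: fail open
    if verdict.get("keep") is False:
        return None  # only an explicit keep: false drops the finding
    revisions = {k: verdict[k] for k in REVISABLE if verdict.get(k) is not None}
    return replace(finding, **revisions)


def fixable(findings: list) -> list:
    """Findings the fixer may act on: anything not marked contradicts."""
    return [f for f in findings if f.verification != "contradicts"]
```

Treating anything other than a literal False as "keep" is what makes the sketch fail open: an ambiguous value such as None or "maybe" leaves the original finding in place.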

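The pre-flight agent count for a deep run (total_agent_count) is 2 + 2N + 2, where N is the number of detected stacks. The leading 2 covers the two TTT agents (intent and alternative review); 2N covers the N per-stack review agents and N per-stack parse passes; the trailing 2 covers one merge agent and one conditional arbiter agent. With three detected stacks, for example, the estimate is 2 + 6 + 2 = 10.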
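
The arithmetic behind the pre-flight estimate can be written out term by term. The function name estimate_agent_count is hypothetical and only mirrors the formula stated above; it is not the signature of total_agent_count.

```python
def estimate_agent_count(num_stacks: int) -> int:
    """Pre-flight agent estimate for a deep run with `num_stacks` detected stacks."""
    if num_stacks < 0:
        raise ValueError("num_stacks must be non-negative")
    intent_and_alternative = 2       # intent analysis + alternative review
    per_stack_reviews = num_stacks   # one review agent per stack
    per_stack_parses = num_stacks    # one parse pass per stack
    merge = 1
    arbiter = 1                      # conditional at run time, always counted up front
    return (intent_and_alternative + per_stack_reviews
            + per_stack_parses + merge + arbiter)


print(estimate_agent_count(3))  # 2 + 2*3 + 2 = 10
```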

Shallow review (--shallow) collapses this to a single-skill loop: review, parse, fix, test. It is useful for single-stack projects or when forcing a specific Beagle skill with --skill.

Tiny-diff short-circuit (issue #172): when a deep-mode diff has at most 2 changed files (configurable via shallow_fanout_threshold), the per-language fan-out collapses to a single combined assignment and the merge and arbiter stages are skipped. A single-file single-language diff still gets a per-language Beagle skill; only diffs with two or more distinct language stacks fall back to the generic reviewer.
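
One way to picture the short-circuit decision is the sketch below. It assumes the caller already knows which stack each changed file belongs to; ReviewPlan, plan_review, and the "generic" reviewer label are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewPlan:
    """Hypothetical plan: which reviewers run and whether later stages fire."""
    reviewers: tuple
    run_merge: bool
    run_arbiter: bool


def plan_review(file_stacks: dict, shallow_fanout_threshold: int = 2) -> ReviewPlan:
    """`file_stacks` maps each changed file path to its detected stack name."""
    stacks = sorted(set(file_stacks.values()))
    if len(file_stacks) <= shallow_fanout_threshold:
        # Tiny diff: one combined assignment, no merge, no arbiter.
        reviewer = f"review-{stacks[0]}" if len(stacks) == 1 else "generic"
        return ReviewPlan((reviewer,), run_merge=False, run_arbiter=False)
    # Normal deep run: one reviewer per detected stack, then merge and arbiter.
    return ReviewPlan(tuple(f"review-{s}" for s in stacks),
                      run_merge=True, run_arbiter=True)
```

For instance, plan_review({"a.py": "python"}) yields a single review-python assignment, while a two-file diff spanning Python and Go falls back to the generic reviewer.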

Trajectory recording

Every run produces an ATIF v1.6 trajectory capturing prompts, responses, tool calls, and per-step token and cost metrics. The recorder is the sole owner of ATIF Pydantic model construction; other modules import only its public surface.

Each agent invocation (run_agent) opens one Invocation scope on the shared recorder. Backends emit a unified AgentEvent stream (daydream/backends/__init__.py) that the invocation buffers into ATIF Steps. Event types: TextEvent, ThinkingEvent, ToolStartEvent, ToolResultEvent, CostEvent, MetricsEvent, TurnEndEvent, and ResultEvent. A TurnEndEvent closes the open Step, so an invocation with N turns produces N Steps rather than one collapsed Step.
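
The turn-boundary behaviour can be illustrated with a small buffer loop. The classes below are simplified stand-ins that borrow the event names from the list above; their fields, the Step shape, and the events_to_steps helper are assumptions, and flushing a trailing open Step at end of stream is a choice made for this sketch only.

```python
from dataclasses import dataclass, field


@dataclass
class TextEvent:
    text: str


@dataclass
class TurnEndEvent:
    pass


@dataclass
class Step:
    messages: list = field(default_factory=list)


def events_to_steps(events) -> list:
    """Buffer events into Steps, closing the open Step at each TurnEndEvent."""
    steps, current = [], None
    for event in events:
        if isinstance(event, TurnEndEvent):
            if current is not None:
                steps.append(current)
                current = None
            continue
        if current is None:
            current = Step()  # first event of a new turn opens a Step
        if isinstance(event, TextEvent):
            current.messages.append(event.text)
    if current is not None:
        steps.append(current)  # assumption: keep a Step left open at stream end
    return steps
```

A stream of two turns therefore yields two Steps, each holding only the text emitted during its own turn.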

Parallel fan-outs fork sibling trajectories under the run directory. Each fork writes a separate trajectory file alongside the parent.

Daydream redacts secrets before writing. Ordered rules cover URL credentials, PEM private key blocks, secret-bearing environment variable assignments, bare API keys (sk-, ghp_, ghs_, xoxb-, AKIA), JWTs, and local username paths. The redactor applies to all four ATIF text surfaces: step messages, reasoning content, tool call arguments, and observation results. On internal failure, the redactor degrades to [REDACTION_FAILED] rather than letting the raw value through.
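
A rough sketch of ordered rules with a fail-closed fallback follows. Only two of the rule families are shown, the regular expressions are illustrative guesses rather than Daydream's actual patterns, and the [REDACTED:...] placeholder format is an assumption; only the [REDACTION_FAILED] marker comes from the description above.

```python
import re

# Ordered rules: earlier rules run first. Patterns are illustrative only.
RULES = [
    # user:password embedded in a URL, e.g. https://user:pass@host/
    ("url_credentials", re.compile(r"(?<=://)[^/\s:@]+:[^/\s@]+(?=@)")),
    # bare API keys identified by a well-known prefix
    ("api_key", re.compile(r"\b(?:sk-|ghp_|ghs_|xoxb-|AKIA)[A-Za-z0-9_-]{8,}")),
]


def redact(text):
    """Apply each rule in order; never return the raw value on internal failure."""
    try:
        for name, pattern in RULES:
            text = pattern.sub(f"[REDACTED:{name}]", text)
        return text
    except Exception:
        return "[REDACTION_FAILED]"


print(redact("clone https://user:hunter2@example.com/repo.git"))
```

Catching the exception inside redact, rather than at the call site, is what keeps a redactor bug from leaking the original string into the trajectory.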

On SIGINT or SIGTERM, the signal handler (daydream/cli.py) flushes a .partial trajectory file with extra.partial set, so interrupted runs are not lost. The handler uses a module-level recorder stack rather than the async ContextVar, because signal handlers fire in the main thread outside the asyncio task context.

Archive

Each run is archived to ~/.daydream/archive/runs/<session_id>/ with its manifest, trajectory (including forks), review output, evaluation results (when --eval is set), deep artifacts, and the diff patch. Archiving is non-fatal: if it fails, the run still completes and the failure is logged.

A SQLite index at ~/.daydream/archive/index.db (daydream/archive/_schema.py) supports querying across projects. Two tables:

  • runs: 39 columns including repo_slug, backend, total_cost_usd, grounding_rate, outcome_labels, composite_reward, changed_files, head_sha, base_sha, and per-phase backend columns.
  • label_observations: the bitemporal label table. Tracks transaction time (observed_at), valid time (valid_at), evidence SHA, reward version, reward JSON, composite reward, reviewer logins, posterior flag, and source (auto or human). Human labels take precedence over automated labels in every projection.
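
The precedence rule for label_observations can be sketched as a projection over observation rows. The LabelObservation dataclass carries only a few of the columns listed above, current_label is a hypothetical helper, and breaking ties by the latest observed_at is an assumption. Timestamps are taken to be ISO-8601 strings in a single format so that string comparison matches time order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LabelObservation:
    """Cut-down, hypothetical view of one label_observations row."""
    session_id: str
    observed_at: str   # transaction time
    valid_at: str      # valid time
    source: str        # "auto" or "human"
    composite_reward: float


def current_label(observations, as_of: Optional[str] = None) -> Optional[LabelObservation]:
    """Pick the label known as of `as_of`, preferring human over auto."""
    known = [o for o in observations if as_of is None or o.observed_at <= as_of]
    if not known:
        return None
    # Human labels win outright; among equals, the latest observation wins.
    return max(known, key=lambda o: (o.source == "human", o.observed_at))
```

Passing as_of replays what the archive believed at an earlier transaction time, which is the main reason to keep observed_at separate from valid_at.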

The index runs in WAL mode with a 5-second busy timeout. Five indexes cover repo_slug, archived_at, outcome_labels, observed_at, and session_id.
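
With Python's standard sqlite3 module, the connection settings described above correspond to two PRAGMA statements. The open_index helper is a hypothetical name, and the example uses a throwaway temporary database rather than the real index path.

```python
import os
import sqlite3
import tempfile


def open_index(path: str) -> sqlite3.Connection:
    """Open a SQLite database in WAL mode with a 5-second busy timeout."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")  # milliseconds
    return conn


# Demonstrate on a temporary file; WAL is not available for in-memory databases.
with tempfile.TemporaryDirectory() as tmp:
    conn = open_index(os.path.join(tmp, "index.db"))
    journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    busy_timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
    conn.close()
```

WAL mode lets readers proceed while a writer is active, and the busy timeout makes a second writer wait for the lock instead of failing immediately.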
