refactor: decompose buildGraph() into pipeline stages (ROADMAP 3.9) #434

carlos-alm merged 5 commits into main from
Conversation
Split the 1,487-line monolithic buildGraph() into independently testable
pipeline stages communicating through a shared PipelineContext object.
New directory structure:

```
src/builder/
  context.js      — PipelineContext class (shared mutable state)
  helpers.js      — Extracted utilities (collectFiles, fileHash, batch inserts)
  incremental.js  — Single-file rebuild for watch mode
  pipeline.js     — Orchestrator: setup → stages → return
  stages/
    collect-files.js   — File collection + scoped rebuild
    detect-changes.js  — Three-tier change detection + reverse-dep cascade
    parse-files.js     — parseFilesAuto dispatch
    insert-nodes.js    — Batch node/edge insertion in SQLite transaction
    resolve-imports.js — Batch import resolution + barrel map
    build-edges.js     — Import/call/receiver/extends/implements edges
    build-structure.js — Directory structure + node role classification
    run-analyses.js    — Unified AST analysis engine dispatch
    finalize.js        — Metadata, drift detection, cleanup, registry
```
src/builder.js is now a barrel re-export (11 lines) following the
established src/db.js pattern. All existing imports continue working.
src/watcher.js now delegates to builder/incremental.js instead of
duplicating 140 lines of node insertion and edge building logic.
Impact: 28 functions changed, 27 affected
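The shared-context pattern described above can be sketched in a few lines. This is a hypothetical illustration, not the actual `src/builder/context.js` API; all field and function names below are invented for the example:

```javascript
// Hypothetical sketch of the pipeline-context pattern: each stage is an async
// function that reads what it needs from the context and writes its outputs
// back, so stages stay independently testable. Field names are illustrative.
class PipelineContext {
  constructor() {
    this.db = null;               // opened during setup, closed in finalize
    this.allFiles = [];           // populated by a collect-files stage
    this.fileSymbols = new Map(); // populated before node insertion
    this.timing = {};             // each stage records its own duration
    this.earlyExit = false;       // set when change detection finds no work
  }
}

// A stage is just an async function over the context:
async function collectFiles(ctx) {
  const t0 = performance.now();
  ctx.allFiles = ['src/a.js', 'src/b.js']; // stand-in for real traversal
  ctx.timing.collectMs = performance.now() - t0;
}

const ctx = new PipelineContext();
collectFiles(ctx).then(() => console.log(ctx.allFiles.length)); // 2
```

Because each stage only touches the context, a unit test can construct a `PipelineContext`, pre-seed the inputs a stage expects, run the stage, and assert on the outputs without running the rest of the pipeline.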
Claude finished @carlos-alm's task

PR Review: Decompose buildGraph() into pipeline stages

Comprehensive review of this large refactoring PR that decomposes the monolithic `buildGraph()` function.

Root Cause Analysis

Problem Being Solved: 1,487-line monolithic `buildGraph()`. Why This Approach: Pipeline decomposition with shared `PipelineContext`.
Risk Assessment: Low risk — well-structured refactoring with comprehensive test coverage.

Configuration Changes ✅

Test Integrity ✅

Code Quality Assessment ⭐⭐⭐⭐

Strengths:

Technical Observations:

Minor Concerns:

Backlog Compliance

Zero-dep: ✓ — No new runtime dependencies added; builds on the existing tree-sitter/SQLite/AST foundation.

Foundation-aligned: ✓ — Fully compliant with FOUNDATION.md principles:

Problem-fit: 3/5 — Improves maintainability and testability but doesn't directly address the core "lost AI" problem. However, better code structure indirectly helps by making the codebase more navigable.

Breaking: No — All existing imports continue working.

Tier: 1 (Zero-dep ✓ + Foundation-aligned ✓)

Critical Concerns

None identified. This is a well-executed refactoring with appropriate safeguards.

Final Recommendation
Minor suggestions for follow-up (not blocking):
Greptile Summary

This PR decomposes the 1,487-line monolithic `buildGraph()`. The refactoring is architecturally sound and all prior review-thread issues have been addressed in the current HEAD commit:
Two minor cleanup items remain:
Confidence Score: 5/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant C as Caller
    participant P as pipeline.js<br/>buildGraph()
    participant CTX as PipelineContext
    participant DB as SQLite DB
    C->>P: buildGraph(rootDir, opts)
    P->>CTX: new PipelineContext()
    P->>DB: openDb()
    P->>DB: initSchema()
    P->>CTX: setup (config, engine, aliases, timing.setupMs)
    P->>P: collectFiles(ctx)
    Note over P: allFiles + discoveredDirs → ctx
    P->>P: detectChanges(ctx)
    Note over P: Three-tier detection (journal→mtime→hash)<br/>reverse-dep cascade<br/>earlyExit=true → closeDb + return
    alt earlyExit
        P-->>C: return undefined
    end
    P->>P: parseFiles(ctx)
    Note over P: parseFilesAuto → allSymbols, fileSymbols
    P->>P: insertNodes(ctx)
    Note over P: fileSymbols populated before tx<br/>batch INSERT nodes + edges (contains/param)
    P->>P: resolveImports(ctx)
    Note over P: batch resolve + barrel map<br/>load unchanged barrels from DB
    P->>P: buildEdges(ctx)
    Note over P: import / call / receiver /<br/>extends / implements edges
    P->>P: buildStructure(ctx)
    Note over P: directory nodes, metrics,<br/>node role classification
    P->>P: runAnalyses(ctx)
    Note over P: AST / complexity / CFG / dataflow<br/>failures logged at warn level
    P->>P: finalize(ctx)
    Note over P: drift detection, metadata,<br/>closeDb, journal header
    P-->>C: return { phases: { setupMs, parseMs, ... } }
```
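The three-tier detection called out in the diagram (journal → mtime → hash) can be illustrated with a minimal standalone sketch. The function shape and field names here are assumptions for illustration, not the actual detect-changes.js logic:

```javascript
// Illustrative three-tier change check: consult the cheap signals first and
// only hash content when the mtime disagrees. All names here are hypothetical.
import crypto from 'node:crypto';

const sha1 = (s) => crypto.createHash('sha1').update(s).digest('hex');

function isChanged(file, journal, prev) {
  if (journal.has(file.path)) return true;      // tier 1: journal says touched
  const old = prev.get(file.path);
  if (!old) return true;                        // new file: always a change
  if (old.mtime === file.mtime) return false;   // tier 2: mtime unchanged
  return sha1(file.content) !== old.hash;       // tier 3: confirm via hash
}

const prev = new Map([['a.js', { mtime: 1, hash: sha1('x') }]]);
// mtime bumped by a no-op save, content identical → not a real change
console.log(isChanged({ path: 'a.js', mtime: 2, content: 'x' }, new Set(), prev)); // false
```

The ordering matters: the journal and mtime checks avoid reading file contents at all in the common unchanged case, and hashing is only paid when the cheap tiers disagree.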
src/builder/helpers.js
```js
try {
  entries = fs.readdirSync(dir, { withFileTypes: true });
} catch (err) {
  warn(`Cannot read directory ${dir}: ${err.message}`);
  return trackDirs ? { files, directories } : files;
}

// Merge config ignoreDirs with defaults
const extraIgnore = config.ignoreDirs ? new Set(config.ignoreDirs) : null;

// Detect symlink loops
let realDir;
try {
  realDir = fs.realpathSync(dir);
} catch {
  return trackDirs ? { files, directories } : files;
}
if (_visited.has(realDir)) return trackDirs ? { files, directories } : files;
_visited.add(realDir);
```
Symlink loop warning silently dropped + readdirSync hoisted before loop check
The refactored collectFiles contains two subtle regressions compared to the original in builder.js:
- ``warn()`` removed: The original emitted ``warn(`Symlink loop detected, skipping: ${dir}`)`` when `_visited.has(realDir)` was true. The new code returns silently, making symlink loop problems invisible to users.
- `readdirSync` called before the loop guard: In the original, the `realpathSync`/`_visited` check happened before `readdirSync`. Now the directory is always read first — so when a loop is detected on line 76, we've already done the I/O for nothing.
The original ordering:

```js
// 1. realpathSync → fail → return early
// 2. _visited check → loop → warn + return early
// 3. readdirSync (only if no loop)
```

Suggested restoration of the warning (at minimum):
```js
// Merge config ignoreDirs with defaults
const extraIgnore = config.ignoreDirs ? new Set(config.ignoreDirs) : null;

// Detect symlink loops
let realDir;
try {
  realDir = fs.realpathSync(dir);
} catch {
  return trackDirs ? { files, directories } : files;
}
if (_visited.has(realDir)) {
  warn(`Symlink loop detected, skipping: ${dir}`);
  return trackDirs ? { files, directories } : files;
}
_visited.add(realDir);

let entries;
try {
  entries = fs.readdirSync(dir, { withFileTypes: true });
} catch (err) {
  warn(`Cannot read directory ${dir}: ${err.message}`);
  return trackDirs ? { files, directories } : files;
}
```
Fixed in d270073. Moved symlink loop detection (realpathSync + _visited check) before readdirSync to avoid wasted I/O, and restored the warn() call so loop detection is visible to users.
src/builder/incremental.js
```js
export async function rebuildFile(db, rootDir, filePath, stmts, engineOpts, cache, options = {}) {
  const { diffSymbols } = options;
  const relPath = normalizePath(path.relative(rootDir, filePath));
  const fs = await import('node:fs');
```
Dynamic fs import inside hot-path function
rebuildFile is called for every file change event in watch mode. Using a dynamic await import('node:fs') here creates and resolves a new Promise on each invocation — while Node.js caches the module, the async overhead (microtask scheduling, Promise allocation) still accumulates across many rapid changes.
All other files in src/builder/ import fs as a top-level static import. This should follow the same pattern:
```diff
- const fs = await import('node:fs');
+ import fs from 'node:fs';
```
Then remove the `const fs = await import('node:fs');` line inside the function body, and drop the `async` from the function signature if the dynamic import was the only reason it's async (it isn't — `parseFileIncremental` is awaited too, so `async` stays).
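The per-call overhead being discussed is easy to see in isolation. This standalone demo (not project code) shows that each dynamic `import()` call returns a fresh Promise even though the module itself is cached:

```javascript
import fs from 'node:fs'; // static: one binding, resolved at module load

// Dynamic form: allocates and schedules a new Promise on every call,
// even though Node caches the underlying module.
function dynamicFs() {
  return import('node:fs');
}

const p1 = dynamicFs();
const p2 = dynamicFs();
console.log(p1 === p2); // false: a distinct Promise object per call

p1.then((mod) => {
  console.log(mod.default === fs); // true: same cached module underneath
});
```

In a watch-mode hot path this means one extra Promise allocation plus a microtask hop per file event, which the static import avoids entirely.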
Fixed in d270073. Replaced the dynamic `await import('node:fs')` with a static top-level `import fs from 'node:fs'`, eliminating per-call async overhead in the watch mode hot path.
- Restore symlink loop warning and move detection before readdirSync to avoid wasted I/O (greptile review)
- Replace dynamic fs import with static top-level import in incremental.js to avoid per-call async overhead in watch mode (greptile review)

Impact: 2 functions changed, 0 affected
src/builder/pipeline.js
```js
astMs: +ctx.timing.astMs.toFixed(1),
complexityMs: +ctx.timing.complexityMs.toFixed(1),
```
Crash if analysis engine fails — timing properties may be undefined
run-analyses.js wraps runAnalysesFn in a try-catch and leaves ctx.timing.astMs / ctx.timing.complexityMs unset when the engine throws. The return statement then calls .toFixed(1) on those undefined values, which throws a TypeError.
This pattern is preserved from the original monolithic builder.js (where _t.astMs.toFixed(1) had the same risk), but the refactoring is a good opportunity to close it.
```diff
- astMs: +ctx.timing.astMs.toFixed(1),
- complexityMs: +ctx.timing.complexityMs.toFixed(1),
+ astMs: +(ctx.timing.astMs ?? 0).toFixed(1),
+ complexityMs: +(ctx.timing.complexityMs ?? 0).toFixed(1),
```
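The crash and the suggested fix can be checked with a standalone snippet (not project code):

```javascript
// timing.astMs was never set because the analysis stage threw and was caught.
const timing = {};

let crashed = false;
try {
  +timing.astMs.toFixed(1); // TypeError: astMs is undefined
} catch (err) {
  crashed = err instanceof TypeError;
}
console.log(crashed); // true

// With the nullish-coalescing fallback the expression degrades to 0 instead:
console.log(+(timing.astMs ?? 0).toFixed(1)); // 0
```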
Fixed in d7355e4 — added a `?? 0` fallback for all timing properties (`astMs`, `complexityMs`, and all others) so the return statement won't crash if a stage didn't populate its timing field.
src/builder/pipeline.js
```js
await parseFiles(ctx);
await insertNodes(ctx);
await resolveImports(ctx);
await buildEdges(ctx);
await buildStructure(ctx);
await runAnalyses(ctx);
await finalize(ctx);
```
DB left open on unhandled stage errors
ctx.db is opened during setup but only closed inside finalize() (or inside detectChanges() on the early-exit path). If any stage between parseFiles and runAnalyses throws an uncaught exception — e.g. an out-of-memory error during a large parse, an unexpected SQLite error, or a missing barrel file — the database connection is leaked.
The original monolithic function had the same issue, but the decomposition into discrete stages makes it more likely that future stage additions will introduce additional failure paths that bypass finalize. A try/catch in buildGraph that closes the DB and re-throws would catch all of them:
```js
try {
  await parseFiles(ctx);
  await insertNodes(ctx);
  await resolveImports(ctx);
  await buildEdges(ctx);
  await buildStructure(ctx);
  await runAnalyses(ctx);
  await finalize(ctx);
} catch (err) {
  closeDb(ctx.db);
  throw err;
}
```

…ings

- Wrap pipeline stages in try-catch to close the DB if a stage throws between parseFiles and finalize.
- Add `?? 0` fallback for all timing properties that may be undefined if a stage didn't run.

Impact: 1 functions changed, 0 affected
src/builder/pipeline.js
```js
// ── Pipeline stages ───────────────────────────────────────────────
await collectFiles(ctx);
await detectChanges(ctx);

if (ctx.earlyExit) return;

try {
  await parseFiles(ctx);
  await insertNodes(ctx);
  await resolveImports(ctx);
  await buildEdges(ctx);
  await buildStructure(ctx);
  await runAnalyses(ctx);
  await finalize(ctx);
} catch (err) {
  closeDb(ctx.db);
  throw err;
}
```
DB leaked if collectFiles or detectChanges throws
The previous review thread addressed the DB leak for stages between parseFiles and finalize (now wrapped in the try/catch at line 101). However, collectFiles (line 96) and detectChanges (line 97) are called outside that block, so ctx.db — opened at line 43 — is still leaked if either of them throws an uncaught exception.
detectChanges in particular executes multiple SQLite queries: SELECT MAX(mtime) from file_hashes, reverse-dependency cascade joins, and purgeFilesFromGraph. Any of these can throw if the DB file is corrupt, locked, or the schema is in an unexpected state.
The current fix only covers 7 out of the 9 stages:
```js
// Before the try block — db opened but unprotected against throws in these two stages
await collectFiles(ctx);   // line 96 — no SQLite, but can OOM
await detectChanges(ctx);  // line 97 — many SQLite queries, CAN throw
if (ctx.earlyExit) return;
try {
  // Only these stages are protected
  await parseFiles(ctx);
  ...
} catch (err) {
  closeDb(ctx.db);
  throw err;
}
```

Extending the guard to include detectChanges (and, for consistency, collectFiles) closes the remaining gap:
```js
try {
  await collectFiles(ctx);
  await detectChanges(ctx);
  if (ctx.earlyExit) return;
  await parseFiles(ctx);
  await insertNodes(ctx);
  await resolveImports(ctx);
  await buildEdges(ctx);
  await buildStructure(ctx);
  await runAnalyses(ctx);
  await finalize(ctx);
} catch (err) {
  // Only close if not already closed by detectChanges early-exit path
  if (!ctx.earlyExit) closeDb(ctx.db);
  throw err;
}
```

Note: detectChanges already calls closeDb on its own early-exit paths (lines 323 and 332 of detect-changes.js), so a guard is needed to avoid double-closing.
Fixed in d229c87 — extended the try-catch to cover all stages including collectFiles and detectChanges. The catch guard checks !ctx.earlyExit to avoid double-closing, since detectChanges already calls closeDb on its early-exit paths.
```js
const insertAll = db.transaction(() => {
  // Phase 1: Batch insert all file nodes + definitions + exports
  const phase1Rows = [];
  for (const [relPath, symbols] of allSymbols) {
    ctx.fileSymbols.set(relPath, symbols);
    phase1Rows.push([relPath, 'file', relPath, 0, null, null]);
    for (const def of symbols.definitions) {
      phase1Rows.push([def.name, def.kind, relPath, def.line, def.endLine || null, null]);
    }
    for (const exp of symbols.exports) {
      phase1Rows.push([exp.name, exp.kind, relPath, exp.line, null, null]);
    }
  }
  batchInsertNodes(db, phase1Rows);
```
ctx.fileSymbols mutated inside a SQLite transaction — partial population on rollback
ctx.fileSymbols.set(relPath, symbols) on line 48 is a side effect that happens inside the db.transaction() callback. If the transaction is rolled back mid-way (e.g., a SQLite `UNIQUE constraint failed` or `database is locked` error during one of the batch inserts), the transaction's DB writes are atomically rolled back — but ctx.fileSymbols is a plain JS Map and is not rolled back.
The result: subsequent stages (resolveImports, buildEdges, buildStructure) would consume a fileSymbols map that is partially populated — containing symbols for files that were processed before the failure point, but not for files after it. This means edge resolution would silently skip files that were mid-insert, producing an incomplete and inconsistent graph instead of a clean error.
The fix is to populate ctx.fileSymbols before the transaction, so it is a pure input to (rather than a side-effect of) the DB write:
```js
// Populate fileSymbols before the transaction
for (const [relPath, symbols] of allSymbols) {
  ctx.fileSymbols.set(relPath, symbols);
}

const insertAll = db.transaction(() => {
  const phase1Rows = [];
  for (const [relPath, symbols] of allSymbols) {
    // No longer mutates ctx inside the transaction
    phase1Rows.push([relPath, 'file', relPath, 0, null, null]);
    ...
  }
  ...
});
```

If the pipeline catch block re-throws, no downstream stage runs anyway — but this also makes the transaction a pure DB operation, which is easier to reason about and test.
Fixed in d229c87 — moved ctx.fileSymbols population before the db.transaction() call so it is a pure input rather than a side-effect of the DB write. On rollback, the JS Map is fully populated (consistent) while the DB is cleanly rolled back.
…nsaction

- Extend try-catch in pipeline.js to cover collectFiles and detectChanges (which run SQLite queries that can throw), with a guard to avoid double-closing on the early-exit path.
- Move ctx.fileSymbols population before the db.transaction() in insert-nodes.js so a rollback doesn't leave partial JS state.

Impact: 2 functions changed, 1 affected
All Greptile feedback on #434 addressed in d229c87:
src/builder/pipeline.js
```js
 */
export async function buildGraph(rootDir, opts = {}) {
  const ctx = new PipelineContext();
  ctx.buildStart = performance.now();
  ctx.opts = opts;

  // ── Setup (creates DB, loads config, selects engine) ──────────────
  ctx.rootDir = path.resolve(rootDir);
  ctx.dbPath = path.join(ctx.rootDir, '.codegraph', 'graph.db');
  ctx.db = openDb(ctx.dbPath);
  initSchema(ctx.db);

  ctx.config = loadConfig(ctx.rootDir);
  ctx.incremental =
    opts.incremental !== false && ctx.config.build && ctx.config.build.incremental !== false;

  ctx.engineOpts = {
    engine: opts.engine || 'auto',
    dataflow: opts.dataflow !== false,
    ast: opts.ast !== false,
  };
  const { name: engineName, version: engineVersion } = getActiveEngine(ctx.engineOpts);
  ctx.engineName = engineName;
  ctx.engineVersion = engineVersion;
  info(`Using ${engineName} engine${engineVersion ? ` (v${engineVersion})` : ''}`);

  // Engine/schema mismatch detection
  ctx.schemaVersion = MIGRATIONS[MIGRATIONS.length - 1].version;
  ctx.forceFullRebuild = false;
  if (ctx.incremental) {
    const prevEngine = getBuildMeta(ctx.db, 'engine');
    if (prevEngine && prevEngine !== engineName) {
```
DB leaked if setup phase throws before the try block
ctx.db is opened on line 34, but the try/catch that calls closeDb doesn't begin until line 65. If any of the following lines throw, the database file handle is leaked:
- `initSchema(ctx.db)` (line 35) — runs migrations, can fail on a corrupted/locked DB
- `getBuildMeta(ctx.db, 'engine')` (line 47) — live SQLite query
- `getBuildMeta(ctx.db, 'schema_version')` (line 49) — live SQLite query
- `loadPathAliases(ctx.rootDir)` (line 56) — filesystem I/O
The previous review threads described the fix as "complete" and wrapping "ALL stages including collectFiles and detectChanges", but the setup phase between openDb and the try block is not a stage and remains unprotected.
A minimal fix is to start the try block immediately after openDb:
```js
ctx.db = openDb(ctx.dbPath);
try {
  initSchema(ctx.db);
  // ... all setup code and stages ...
} catch (err) {
  if (!ctx.earlyExit) closeDb(ctx.db);
  throw err;
}
```
Fixed in 027af48 — moved the try block to start immediately after openDb, so initSchema, getBuildMeta, loadPathAliases, and all other setup code is now covered. The earlyExit guard still prevents double-closing on the detectChanges early-exit path.
```js
} catch (err) {
  debug(`Unified analysis engine failed: ${err.message}`);
}
```
Analysis engine failures silently swallowed at debug level
When runAnalysesFn throws, the catch block only logs at debug level (line 42). This means failures in AST analysis, complexity metrics, CFG construction, or dataflow analysis are completely invisible to users unless they have debug logging explicitly enabled.
Unlike the pipeline's stage-level catch which re-throws (letting the caller decide), this catch silently discards the error and leaves all timing fields unset (which is why the ?? 0 fallbacks in pipeline.js were needed).
Consider raising the log level to warn so users know when analysis data is missing from their graph:
Should be warn so the failure is surfaced without aborting the build:

```js
} catch (err) {
  warn(`Unified analysis engine failed (AST/complexity/CFG/dataflow may be incomplete): ${err.message}`);
}
```
Fixed in 027af48 — promoted the log from debug to warn so users see when analysis data is incomplete. The message now reads: "Analysis engine failed (AST/complexity/CFG/dataflow may be incomplete): ..."
```js
const absPath = path.join(rootDir, relPath);
try {
  const content = fs.readFileSync(absPath, 'utf-8');
  ctx.lineCountMap.set(relPath, content.split('\n').length);
} catch {
  ctx.lineCountMap.set(relPath, 0);
}
```
Direct fs.readFileSync instead of readFileSafe
Lines 26 and 73 use fs.readFileSync directly to read file contents for line counting, bypassing the readFileSafe retry logic that handles transient errors (EBUSY, EACCES, EPERM) common on Windows or with editors doing non-atomic saves. All other file reads in the pipeline stages use readFileSafe.
```diff
-    const content = fs.readFileSync(absPath, 'utf-8');
+    const content = readFileSafe(absPath);
```
This applies to both line 26 and line 73. readFileSafe is already imported in helpers.js and used throughout the other stages for consistency.
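Since readFileSafe itself isn't shown in this thread, here is a rough sketch of what such a retry wrapper might look like; this is a hypothetical stand-in, and the real helper in src/builder/helpers.js may differ in signature and behavior:

```javascript
import fs from 'node:fs';

// Hypothetical retry wrapper: retries the transient error codes mentioned
// above (EBUSY/EACCES/EPERM) with a short synchronous backoff, and returns
// null when the read ultimately fails. Not the actual helpers.js code.
function readFileSafeSketch(absPath, retries = 3, backoffMs = 10) {
  for (let attempt = 0; ; attempt++) {
    try {
      return fs.readFileSync(absPath, 'utf-8');
    } catch (err) {
      const transient = ['EBUSY', 'EACCES', 'EPERM'].includes(err.code);
      if (!transient || attempt >= retries) return null; // give up quietly
      // Synchronous backoff keeps the helper usable from sync call sites.
      const until = Date.now() + backoffMs * (attempt + 1);
      while (Date.now() < until) { /* spin a few ms */ }
    }
  }
}

console.log(readFileSafeSketch('no-such-file.txt')); // null: ENOENT is not transient
```

Returning null (rather than throwing) matches the fallback pattern in the hunk above, where a failed read simply records a line count of 0.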
Fixed in 027af48 — replaced both fs.readFileSync calls (lines 26 and 73) with readFileSafe for consistent transient-error retry (EBUSY/EACCES/EPERM) on Windows. Also removed the unused fs import.
…eadFileSafe

- Move try-catch to start immediately after openDb so initSchema, getBuildMeta, and loadPathAliases are all covered.
- Promote analysis engine failure log from debug to warn so users know when AST/complexity/CFG/dataflow data is incomplete.
- Replace direct fs.readFileSync with readFileSafe in build-structure.js for consistent transient-error retry on Windows.

Impact: 3 functions changed, 1 affected
Summary
- `buildGraph()` decomposed into 9 independently testable pipeline stages communicating through a shared `PipelineContext` object
- `src/builder.js` reduced from a mega-file to an 11-line barrel re-export (following the established `src/db.js` pattern)
- `src/watcher.js` now delegates to `builder/incremental.js` instead of duplicating 140 lines of node insertion and edge building logic

New directory structure
Backward compatibility
All existing `import { buildGraph, collectFiles, ... } from './builder.js'` statements continue working unchanged. The barrel re-export is transparent to consumers.

Test plan
- `build.test.js` integration tests pass
- `scoped-rebuild.test.js` tests pass (including reverse-dep cascade)
- `incremental-parity.test.js` tests pass
- `builder.test.js` unit tests pass (collectFiles, loadPathAliases, readFileSafe)
- `db` param in `rebuildFile` kept for API consistency