Silent process termination after long-running child_process.spawn() on Windows (v24.13.0, works on v22.22.1) #62125

@tsunosekai

Description

Node.js v24.13.0: Silent process crash after long-running child_process operations on Windows

Summary

Node.js v24.13.0 on Windows 11 silently terminates (no error, no exit handler, no diagnostic report) after ~25 minutes of child process execution. The crash occurs consistently at a specific point in the program flow. Downgrading to Node.js v22.22.1 LTS completely resolves the issue with no code changes.

Environment

  • Node.js: v24.13.0 (crashes) / v22.22.1 (works)
  • OS: Windows 11 Home 10.0.22621 (x64)
  • CPU: AMD Ryzen 5 5600G (AMD64 Family 25 Model 33)
  • RAM: 32 GB
  • Shell: Git Bash (MINGW64)
  • Node version manager: nvm-windows

Reproduction scenario

The application is a CLI tool that distributes tasks to remote cloud servers via SSH:

  1. Phase 1: Connects to a remote server via ssh2 (npm) and uploads files via SFTP
  2. Phase 2: Disconnects ssh2, then spawns a system ssh child process (child_process.spawn) to run a long-running remote command (~25 minutes)
  3. Phase 3: Spawns another ssh + tar child process pair (piped) to download results
  4. Phase 4: Cleans up remote files via another spawned ssh process

The crash occurs at the transition from Phase 2 to Phase 3 — after the long-running child process completes and before (or during) the next spawn() call.
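The piped download in Phase 3 follows a common spawn-and-pipe pattern. Below is a minimal CommonJS sketch of it, not the application's code. `spawnPipe` is a hypothetical helper, and the ssh/tar arguments in the usage comment (host, remote path, local directory) are placeholders.

```javascript
// Minimal sketch (CommonJS): run `producer | consumer` as two spawned
// processes and resolve once both have closed. spawnPipe is a hypothetical
// helper, not part of the original application.
const { spawn } = require('node:child_process');

function spawnPipe([prodCmd, prodArgs], [consCmd, consArgs]) {
  return new Promise((resolve, reject) => {
    const producer = spawn(prodCmd, prodArgs, { stdio: ['ignore', 'pipe', 'inherit'] });
    const consumer = spawn(consCmd, consArgs, { stdio: ['pipe', 'pipe', 'inherit'] });

    // In the real application this is ssh stdout -> tar stdin
    producer.stdout.pipe(consumer.stdin);
    // If the consumer exits early, writes to its stdin fail with EPIPE;
    // handle that instead of letting an unhandled 'error' event crash the parent.
    consumer.stdin.on('error', (err) => {
      if (err.code !== 'EPIPE') reject(err);
    });

    let output = '';
    consumer.stdout.setEncoding('utf8');
    consumer.stdout.on('data', (chunk) => { output += chunk; });

    const codes = {};
    let pending = 2;
    const onClose = (name) => (code) => {
      codes[name] = code;
      if (--pending === 0) resolve({ ...codes, output });
    };
    producer.on('error', reject);
    consumer.on('error', reject);
    producer.on('close', onClose('producer'));
    consumer.on('close', onClose('consumer'));
  });
}

// Hypothetical usage mirroring Phase 3 (placeholders, not real values):
// spawnPipe(['ssh', ['user@host', 'tar -cf - -C /remote/results .']],
//           ['tar', ['-xf', '-', '-C', 'results']]);
```

The helper waits for `'close'` instead of `'exit'`. This ensures the consumer's stdout has been fully drained before the promise resolves.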

Crash characteristics

The crash is not a JavaScript-level error. All of the following were in place, and none of them fired:

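// All registered; none triggered on crash.
// (Illustrative handler body; the original handler is not shown.)
const handler = (...args) => console.error('[handler fired]', ...args);

process.on('uncaughtException', handler);   // ✗ not called
process.on('unhandledRejection', handler);  // ✗ not called
process.on('exit', handler);                // ✗ not called
process.on('SIGTERM', handler);             // ✗ not called
process.on('SIGHUP', handler);              // ✗ not called
process.on('SIGBREAK', handler);            // ✗ not called

// Diagnostic reports enabled; no report generated
process.report.reportOnFatalError = true;   // ✗ no report file
process.report.reportOnSignal = true;       // ✗ no report file
// Note: reportOnSignal listens for process.report.signal (default 'SIGUSR2'),
// and signal-triggered reports are documented as unsupported on Windows.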

Additionally:

  • No Windows Event Log entries related to the crash
  • No crash dump files generated
  • No stderr output at the moment of crash
  • The process simply vanishes — the terminal returns to the prompt with no message

Debugging performed

1. Synchronous breadcrumb logging

Added fs.appendFileSync() calls before and after every operation to trace the exact crash point. This confirmed the crash happens after Phase 2 completes and before Phase 3's spawn() returns.

Example breadcrumb log (last entries before crash on v24):

9:06:32 [worker-1] _sshExec: completed with code 0    ← Phase 2 done
9:06:32 [worker-1] GC complete before Phase 3
9:06:32 [worker-1] Phase 3 START (heapUsed=24MB, rss=99MB)
9:06:32 [worker-1] Phase 3a: before rmSync              ← CRASH HERE
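Below is a minimal CommonJS sketch of the breadcrumb technique. The helper name, log path, and worker label are assumptions, and the line format was chosen only to resemble the excerpt above. `fs.appendFileSync` hands each line to the operating system before returning, so the line survives even an abrupt process exit.

```javascript
// Minimal sketch (CommonJS) of a synchronous breadcrumb logger.
// makeBreadcrumb and the line format are assumptions for illustration.
const fs = require('node:fs');

function makeBreadcrumb(logPath, worker) {
  return function breadcrumb(message, { withMemory = false } = {}) {
    const now = new Date();
    // H:MM:SS, matching the style of the log excerpt (hour not zero-padded)
    const time = [now.getHours(), now.getMinutes(), now.getSeconds()]
      .map((n, i) => (i === 0 ? String(n) : String(n).padStart(2, '0')))
      .join(':');
    let line = `${time} [${worker}] ${message}`;
    if (withMemory) {
      const { heapUsed, rss } = process.memoryUsage();
      const mb = (bytes) => Math.round(bytes / 1024 / 1024);
      line += ` (heapUsed=${mb(heapUsed)}MB, rss=${mb(rss)}MB)`;
    }
    // Synchronous append: the line reaches the OS before the next statement runs
    fs.appendFileSync(logPath, line + '\n');
    return line;
  };
}

// Usage:
// const crumb = makeBreadcrumb('breadcrumbs.log', 'worker-1');
// crumb('Phase 3 START', { withMemory: true });
```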

2. Memory analysis (ruled out OOM)

At the moment of crash:

  • heapUsed: 24 MB
  • rss: 99 MB
  • System free memory: ~9 GB (of 32 GB total)

OOM is definitively ruled out.
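The figures above combine process-level and system-level measurements. Both can be captured in one call, as in the minimal CommonJS sketch below. `memorySnapshot` is a hypothetical helper name, and the rounding to MB and GB is an arbitrary choice.

```javascript
// Minimal sketch (CommonJS): process heap/RSS plus system free/total memory.
// memorySnapshot is a hypothetical helper name.
const os = require('node:os');

function memorySnapshot() {
  const toMB = (bytes) => Math.round(bytes / 1024 / 1024);
  const toGB = (bytes) => Math.round((bytes / 1024 ** 3) * 10) / 10;
  const { heapUsed, heapTotal, rss } = process.memoryUsage();
  return {
    heapUsedMB: toMB(heapUsed),
    heapTotalMB: toMB(heapTotal),
    rssMB: toMB(rss),
    systemFreeGB: toGB(os.freemem()),
    systemTotalGB: toGB(os.totalmem()),
  };
}

console.log(memorySnapshot());
```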

3. ssh2 library ruled out

Initially suspected the ssh2 npm library's native crypto bindings. However:

  • ssh2 was disconnected and set to null 29 minutes before the crash
  • The native crypto build directory was removed via postinstall script (JS fallback only)
  • global.gc() was explicitly called twice after disconnecting ssh2
  • The crash still occurred even with ssh2 completely out of the picture
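Below is a minimal CommonJS sketch of the teardown described above. `conn` stands in for the ssh2 `Client` from Phase 1, whose `end()` method closes the connection and which emits `'close'` afterwards. The helper names and the 5-second timeout are hypothetical. `global.gc` exists only when Node.js is started with `--expose-gc`.

```javascript
// Minimal sketch (CommonJS); releaseConnection/collectGarbage are
// hypothetical helper names, not the application's code.
function releaseConnection(conn, timeoutMs = 5000) {
  return new Promise((resolve) => {
    if (!conn) return resolve(false);
    // Don't wait forever if the socket is already gone and 'close' never fires
    const timer = setTimeout(() => resolve(false), timeoutMs);
    conn.once('close', () => {
      clearTimeout(timer);
      resolve(true);
    });
    conn.end();
  });
}

function collectGarbage(times = 2) {
  // globalThis.gc is only defined under `node --expose-gc`
  if (typeof globalThis.gc !== 'function') return 0;
  for (let i = 0; i < times; i++) globalThis.gc();
  return times;
}

// Usage inside an async function (client = ssh2 Client from Phase 1):
//   await releaseConnection(client);
//   client = null;        // drop the last reference
//   collectGarbage(2);    // two explicit collections, as described above
```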

4. Systematic elimination

| Hypothesis | Test | Result |
|---|---|---|
| ssh2 native crypto | Removed native build, used JS fallback | Still crashes |
| ssh2 connection state | Disconnected ssh2 29 min before crash | Still crashes |
| OOM | Measured: 24 MB heap, 99 MB RSS | Not OOM |
| System memory | 9 GB free of 32 GB | Not system OOM |
| SFTP download | Replaced with system ssh + tar pipe | Still crashes |
| rmSync with Unicode paths | Removed rmSync before crash point | Still crashes |
| GC timing | Explicit global.gc() before Phase 3 | Still crashes |
| Event loop starvation | Added 50 ms yield before Phase 3 | Still crashes |
| Node.js v24.13.0 | Switched to v22.22.1 | Fixed |

5. Successful run on v22.22.1

With identical code (no changes between test runs other than the Node.js version):

=== Summary ===
Completed: 3/3
Elapsed: 57.4 min
Process exiting with code 0

All three tasks completed, including two that involve ~25 minutes of child process execution followed by download. The crash never occurred.

Minimal reproduction pattern

While I haven't created a minimal reproducer, the pattern is:

  1. Run on Windows 11 with Node.js v24.13.0
  2. Use child_process.spawn() to run a long-lived process (~25 min)
  3. After the spawned process exits, immediately spawn new processes
  4. The process silently terminates at or around the second spawn() call

The application uses:

  • child_process.spawn() for SSH and tar processes
  • stream.pipe() between spawned processes (ssh stdout → tar stdin)
  • setInterval (1s dashboard timer running throughout)
  • --expose-gc flag with explicit global.gc() calls
  • ESM modules ("type": "module")

Possibly related

  • CVE-2025-59466: async_hooks stack overflow causing unrecoverable process termination. Fixed in v24.13.0, but the symptoms (process death bypassing all error handlers) are identical. This application does not explicitly use async_hooks, but dependencies or Node.js internals may.
  • The crash only manifests after long-running operations (~25 min), suggesting a time-dependent or GC-cycle-dependent trigger.

Dependencies

{
  "dotenv": "^16.5.0",
  "js-yaml": "^4.1.0",
  "ssh2": "^1.17.0",
  "yargs": "^17.7.2"
}

Note: ssh2's native crypto build is removed via postinstall (rm -rf node_modules/ssh2/lib/protocol/crypto/build), forcing the pure JS fallback.

Expected behavior

The process should either:

  1. Continue running normally, or
  2. If an error occurs, trigger uncaughtException/unhandledRejection/exit handlers and produce a diagnostic report

Actual behavior

The process silently vanishes with no output, no error handlers triggered, and no diagnostic report generated.

Workaround

Use Node.js v22 LTS instead of v24.
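Until the cause is found, the tool could warn at startup when it detects the affected combination. Below is a minimal CommonJS sketch. The helper name, the warning text, and the choice to match every v24.x release (only v24.13.0 was actually tested) are assumptions.

```javascript
// Minimal sketch (CommonJS): warn when running on Windows under Node.js 24.
// isAffectedRuntime is a hypothetical helper; the version match is an assumption.
function isAffectedRuntime(version = process.version, platform = process.platform) {
  const major = Number(/^v(\d+)\./.exec(version)?.[1]);
  return platform === 'win32' && major === 24;
}

if (isAffectedRuntime()) {
  console.warn(
    `Warning: Node.js ${process.version} on Windows has been observed to exit silently ` +
    'after long-running child processes; Node.js v22 LTS is recommended.'
  );
}
```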
