Node.js v24.13.0: Silent process crash after long-running child_process operations on Windows
Summary
Node.js v24.13.0 on Windows 11 silently terminates (no error, no exit handler, no diagnostic report) after ~25 minutes of child process execution. The crash occurs consistently at a specific point in the program flow. Downgrading to Node.js v22.22.1 LTS completely resolves the issue with no code changes.
Environment
- Node.js: v24.13.0 (crashes) / v22.22.1 (works)
- OS: Windows 11 Home 10.0.22621 (x64)
- CPU: AMD Ryzen 5 5600G (AMD64 Family 25 Model 33)
- RAM: 32 GB
- Shell: Git Bash (MINGW64)
- Node version manager: nvm-windows
Reproduction scenario
The application is a CLI tool that distributes tasks to remote cloud servers via SSH:
- Phase 1: Connects to a remote server via ssh2 (npm) and uploads files via SFTP
- Phase 2: Disconnects ssh2, then spawns a system ssh child process (child_process.spawn) to run a long-running remote command (~25 minutes)
- Phase 3: Spawns another ssh + tar child process pair (piped) to download results
- Phase 4: Cleans up remote files via another spawned ssh process
The crash occurs at the transition from Phase 2 to Phase 3 — after the long-running child process completes and before (or during) the next spawn() call.
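To make the Phase 2 to Phase 3 transition concrete, here is a minimal sketch of that control flow. It is not the application's code: `waitForClose`, `runPhases`, and the command arrays are hypothetical names, and the real tool runs `ssh` and `tar` where this sketch accepts arbitrary commands. It is written as CommonJS for brevity, whereas the application itself uses ESM.

```javascript
const { spawn } = require('node:child_process');

// Resolve with the exit code (or signal) once a child has fully closed.
function waitForClose(child) {
  return new Promise((resolve, reject) => {
    child.once('error', reject);
    child.once('close', (code, signal) => resolve({ code, signal }));
  });
}

// Phase 2: one long-running child.
// Phase 3: a producer whose stdout is piped into a consumer's stdin.
// Each command is an array: [executable, ...args] (placeholder values).
async function runPhases({ longCmd, producerCmd, consumerCmd }) {
  const phase2 = await waitForClose(
    spawn(longCmd[0], longCmd.slice(1), { stdio: 'ignore' })
  );

  // The reported crash happens around here, right after Phase 2 closes.
  const producer = spawn(producerCmd[0], producerCmd.slice(1), {
    stdio: ['ignore', 'pipe', 'inherit'],
  });
  const consumer = spawn(consumerCmd[0], consumerCmd.slice(1), {
    stdio: ['pipe', 'ignore', 'inherit'],
  });
  producer.stdout.pipe(consumer.stdin);

  const [producerResult, consumerResult] = await Promise.all([
    waitForClose(producer),
    waitForClose(consumer),
  ]);
  return { phase2, producerResult, consumerResult };
}
```

Passing `process.execPath` with `-e` scripts in place of `ssh` and `tar` lets the same flow run without a remote server.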
Crash characteristics
The crash is not a JavaScript-level error. All of the following were in place, and none of them fired:
// All registered — none triggered on crash
process.on('uncaughtException', handler); // ✗ not called
process.on('unhandledRejection', handler); // ✗ not called
process.on('exit', handler); // ✗ not called
process.on('SIGTERM', handler); // ✗ not called
process.on('SIGHUP', handler); // ✗ not called
process.on('SIGBREAK', handler); // ✗ not called
// Diagnostic reports enabled — no report generated
process.report.reportOnFatalError = true; // ✗ no report file
process.report.reportOnSignal = true; // ✗ no report file
Additionally:
- No Windows Event Log entries related to the crash
- No crash dump files generated
- No stderr output at the moment of crash
- The process simply vanishes — the terminal returns to the prompt with no message
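Since nothing inside the process fires, one further data point could come from outside it: the exit code seen by a parent process. This was not collected for this report. The sketch below is a suggestion under that assumption; `describeExitCode` and `superviseSync` are hypothetical helpers, and the table lists only a few well-known NTSTATUS values that Windows uses as the exit code when a process dies from an unhandled native exception.

```javascript
const { spawnSync } = require('node:child_process');

// A few NTSTATUS values that can appear as a Windows process exit code.
const NTSTATUS_NAMES = new Map([
  [0xc0000005, 'STATUS_ACCESS_VIOLATION'],
  [0xc00000fd, 'STATUS_STACK_OVERFLOW'],
  [0xc0000409, 'STATUS_STACK_BUFFER_OVERRUN'],
  [0xc000013a, 'STATUS_CONTROL_C_EXIT'],
]);

// Format an exit code as 8-digit hex and name it if it is a known NTSTATUS.
function describeExitCode(code) {
  if (code === null) return 'terminated by signal (no exit code)';
  const unsigned = code >>> 0; // normalise negative 32-bit values
  const hex = '0x' + unsigned.toString(16).toUpperCase().padStart(8, '0');
  const name = NTSTATUS_NAMES.get(unsigned);
  return name ? `${hex} (${name})` : hex;
}

// Run a Node.js script under a parent process and report how it ended.
// scriptArgs is whatever would normally follow `node` on the command line.
function superviseSync(scriptArgs) {
  const result = spawnSync(process.execPath, scriptArgs, { stdio: 'inherit' });
  return {
    status: result.status,
    signal: result.signal,
    summary: describeExitCode(result.status),
  };
}
```

A nonzero code in the `0xC0000000` range would point to a native fault rather than a normal exit, and would explain why no JavaScript-level handler ran.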
Debugging performed
1. Synchronous breadcrumb logging
Added fs.appendFileSync() calls before and after every operation to trace the exact crash point. This confirmed the crash happens after Phase 2 completes and before Phase 3's spawn() returns.
Example breadcrumb log (last entries before crash on v24):
9:06:32 [worker-1] _sshExec: completed with code 0 ← Phase 2 done
9:06:32 [worker-1] GC complete before Phase 3
9:06:32 [worker-1] Phase 3 START (heapUsed=24MB, rss=99MB)
9:06:32 [worker-1] Phase 3a: before rmSync ← CRASH HERE
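For readers who want to apply the same technique, a breadcrumb helper along these lines would produce entries of that shape. This is a minimal sketch, not the code used in the application: `BREADCRUMB_FILE`, `formatBreadcrumb`, and `breadcrumb` are hypothetical names, and the log location is an assumption.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical log location; the issue does not say where breadcrumbs were written.
const BREADCRUMB_FILE = path.join(os.tmpdir(), 'breadcrumbs.log');

// Build one log line: local HH:MM:SS, worker tag, message.
function formatBreadcrumb(worker, message, now = new Date()) {
  const time = now.toTimeString().slice(0, 8);
  return `${time} [${worker}] ${message}`;
}

// Append synchronously so the line has been handed to the OS
// before the next statement runs, even if the process dies right after.
function breadcrumb(worker, message, file = BREADCRUMB_FILE) {
  fs.appendFileSync(file, formatBreadcrumb(worker, message) + os.EOL);
}
```

Calling `breadcrumb('worker-1', 'Phase 3 START')` immediately before and after each suspect operation narrows the crash to the gap between the last line written and the first line missing.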
2. Memory analysis (ruled out OOM)
At the moment of crash:
- heapUsed: 24 MB
- rss: 99 MB
- System free memory: ~9 GB (of 32 GB total)
OOM is definitively ruled out.
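Figures like these can be captured with `process.memoryUsage()` and the `os` module. The following is a small sketch of one way to do it; `memorySnapshot` and `formatSnapshot` are hypothetical helper names, not the application's.

```javascript
const os = require('node:os');

const toMB = (bytes) => Math.round(bytes / (1024 * 1024));

// Collect process-level and system-level memory figures in megabytes.
function memorySnapshot() {
  const { heapUsed, rss, external } = process.memoryUsage();
  return {
    heapUsedMB: toMB(heapUsed),
    rssMB: toMB(rss),
    externalMB: toMB(external),
    systemFreeMB: toMB(os.freemem()),
    systemTotalMB: toMB(os.totalmem()),
  };
}

// Render a snapshot in the same style as the breadcrumb log entries.
function formatSnapshot(s) {
  return `heapUsed=${s.heapUsedMB}MB, rss=${s.rssMB}MB, ` +
    `free=${s.systemFreeMB}MB of ${s.systemTotalMB}MB`;
}
```

Logging `formatSnapshot(memorySnapshot())` at each phase boundary shows whether memory grows over the long-running phase or stays flat, as it did here.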
3. ssh2 library ruled out
Initially suspected the ssh2 npm library's native crypto bindings. However:
- ssh2 was disconnected and set to null 29 minutes before the crash
- The native crypto build directory was removed via postinstall script (JS fallback only)
- global.gc() was explicitly called twice after disconnecting ssh2
- The crash still occurred even with ssh2 completely out of the picture
4. Systematic elimination
| Hypothesis | Test | Result |
|---|---|---|
| ssh2 native crypto | Removed native build, used JS fallback | Still crashes |
| ssh2 connection state | Disconnected ssh2 29 min before crash | Still crashes |
| OOM | Measured: 24MB heap, 99MB rss | Not OOM |
| System memory | 9GB free of 32GB | Not system OOM |
| SFTP download | Replaced with system ssh+tar pipe | Still crashes |
| rmSync with Unicode paths | Removed rmSync before crash point | Still crashes |
| GC timing | Explicit global.gc() before Phase 3 | Still crashes |
| Event loop starvation | Added 50ms yield before Phase 3 | Still crashes |
| Node.js v24.13.0 | Switched to v22.22.1 | Fixed |
5. Successful run on v22.22.1
With identical code (no changes between test runs other than the Node.js version):
=== Summary ===
Completed: 3/3
Elapsed: 57.4 min
Process exiting with code 0
All three tasks completed, including two that involve ~25 minutes of child process execution followed by download. The crash never occurred.
Minimal reproduction pattern
While I haven't created a minimal reproducer, the pattern is:
- Run on Windows 11 with Node.js v24.13.0
- Use child_process.spawn() to run a long-lived process (~25 min)
- After the spawned process exits, immediately spawn new processes
- The process silently terminates at or around the second spawn() call
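As a possible starting point for a standalone reproducer, the pattern above can be sketched as follows. This is untested against the crash itself and is not confirmed to trigger it. `spawnAndWait`, `reproduce`, `holdMs`, and `tickMs` are hypothetical names; an idle Node.js child stands in for the long-running ssh command, and the interval stands in for the application's 1-second dashboard timer. It is written as CommonJS, whereas the application uses ESM.

```javascript
const { spawn } = require('node:child_process');

// Spawn a child and resolve with its exit code (or signal) once it closes.
function spawnAndWait(command, args) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: 'ignore' });
    child.once('error', reject);
    child.once('close', (code, signal) => resolve({ code, signal }));
  });
}

// holdMs stands in for the ~25 minute remote command.
async function reproduce({ holdMs = 25 * 60 * 1000, tickMs = 1000 } = {}) {
  let ticks = 0;
  const timer = setInterval(() => { ticks += 1; }, tickMs);
  try {
    // Long-lived child: an idle Node.js process that exits after holdMs.
    const idleScript = `setTimeout(() => {}, ${holdMs})`;
    const first = await spawnAndWait(process.execPath, ['-e', idleScript]);

    // Only available when started with --expose-gc.
    if (typeof global.gc === 'function') global.gc();

    // Second spawn immediately after the first child exits.
    const second = await spawnAndWait(process.execPath, ['-e', '0']);
    return { first, second, ticks };
  } finally {
    clearInterval(timer);
  }
}
```

Running `reproduce()` with the defaults on the affected machine, once under v24.13.0 and once under v22.22.1, would show whether the long wait followed by a spawn is sufficient on its own or whether ssh, tar, or the pipe is also required.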
The application uses:
- child_process.spawn() for SSH and tar processes
- stream.pipe() between spawned processes (ssh stdout → tar stdin)
- setInterval (1s dashboard timer running throughout)
- --expose-gc flag with explicit global.gc() calls
- ESM modules ("type": "module")
Possibly related
- CVE-2025-59466: async_hooks stack overflow causing unrecoverable process termination. Fixed in v24.13.0, but the symptoms (process death bypassing all error handlers) are identical. This application does not explicitly use async_hooks, but dependencies or Node.js internals may.
- The crash only manifests after long-running operations (~25 min), suggesting a time-dependent or GC-cycle-dependent trigger.
Dependencies
{
"dotenv": "^16.5.0",
"js-yaml": "^4.1.0",
"ssh2": "^1.17.0",
"yargs": "^17.7.2"
}
Note: ssh2's native crypto build is removed via postinstall (rm -rf node_modules/ssh2/lib/protocol/crypto/build), forcing the pure JS fallback.
Expected behavior
The process should either:
- Continue running normally, or
- If an error occurs, trigger uncaughtException / unhandledRejection / exit handlers and produce a diagnostic report
Actual behavior
The process silently vanishes with no output, no error handlers triggered, and no diagnostic report generated.
Workaround
Use Node.js v22 LTS instead of v24.