Why we replaced xterm.js with a native terminal
After months of patching WebGL atlas corruption, scroll jump races, and scrollbar visibility hacks, we replaced xterm.js with an Alacritty-based terminal rendered to Canvas. The result: fewer workaround files than we had bug fixes, and a rendering pipeline we actually control.
The xterm.js era
TUICommander launched with xterm.js. It's the industry standard: VS Code uses it, Hyper uses it, practically every browser-based terminal uses it. For a v1, it was the right call. We got a working terminal fast and could focus on everything else — agent detection, split panes, status dots, the whole orchestration layer.
But xterm.js is designed for browsers. We're a native desktop app wrapping a WebView. That mismatch started small and grew into a constant maintenance burden. Every month brought a new edge case that was technically our problem but practically xterm's.
Death by a thousand workarounds
Here's a partial list of xterm.js issues we patched over the course of six months:
- WebGL atlas corruption. After font changes or DPR changes (moving between monitors), glyphs turned into garbage. We had to detect the condition and do a full atlas rebuild: not clearTextureAtlas(), which doesn't work reliably, but a complete teardown and reconstruction of the WebGL addon. We shipped a dedicated "Refresh terminal" shortcut (Cmd+Shift+L) just for this.
- Scroll position jumps. Seven distinct root causes for the viewport jumping to line 0: escape-sequence jumps, buffer contraction drift, baseY staleness on idle sessions, alternate buffer corruption, hidden terminal viewportY drift, hidden-to-visible transition guards, and WebGL atlas rebuild timing. Each one required a separate fix. We ended up building a ViewportLock system that intercepts xterm's programmatic scrolls during writes and restores position via scrollToLine().
- Scrollbar visibility. xterm v6 auto-fades the scrollbar. That's fine for a chat widget, but useless for a terminal where you need to see at a glance whether there's scrollback. CSS overrides didn't work because xterm's _hide() early-returns before adding the .fade class on terminals that haven't been interacted with. We wrote a MutationObserver that watches xterm's internal DOM and forces inline opacity. A MutationObserver. For a scrollbar.
- Double scrollbar jank. xterm v6 renders its own custom scrollbar widget, but a native scrollbar also appeared on the viewport div, updating out of sync. The result was a visible thumb-resize flash on every write.
- Copy trailing whitespace. Every copy path (Cmd+C, copy-on-select, keyboard handler) had to strip trailing spaces that xterm pads to the terminal width. Four copy paths, four places to strip.
- PTY write coalescing. Calling terminal.write() on every PTY event (hundreds per second during burst output) caused WebGL texture uploads to stack up and drop frames. We built a requestAnimationFrame accumulator to coalesce writes to ~60/sec.
- Bell handling. xterm's built-in bellStyle doesn't integrate with our notification system. We rewired it to an onBell handler.
- Fit dimensions. fit() crashes on zero-size terminals. Guard added.
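To make the PTY write coalescing item above concrete, here is a minimal sketch of a per-frame accumulator. The names WriteCoalescer, sink, and schedule are hypothetical and not taken from the TUICommander codebase. The scheduler is injected so the logic can run outside a browser.

```typescript
// Sketch only: buffers PTY chunks and emits one combined write per scheduled frame.
// WriteCoalescer, sink, and schedule are hypothetical names used for illustration.
type Schedule = (cb: () => void) => void;

class WriteCoalescer {
  private pending: string[] = [];
  private scheduled = false;
  private sink: (data: string) => void;
  private schedule: Schedule;

  constructor(sink: (data: string) => void, schedule: Schedule) {
    this.sink = sink;
    this.schedule = schedule;
  }

  // Called for every PTY event. It only queues data and never touches the renderer.
  push(chunk: string): void {
    this.pending.push(chunk);
    if (this.scheduled) return; // a flush is already queued for this frame
    this.scheduled = true;
    this.schedule(() => this.flush());
  }

  // Runs once per scheduled frame and forwards everything queued so far.
  private flush(): void {
    this.scheduled = false;
    if (this.pending.length === 0) return;
    const data = this.pending.join("");
    this.pending = [];
    this.sink(data);
  }
}
```

In a browser, the scheduler would be `(cb) => requestAnimationFrame(cb)` and the sink would call `terminal.write(data)`, which caps renderer writes at the display refresh rate regardless of how many PTY events arrive.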
Each fix was individually reasonable. But the trend was clear: we were spending more time fighting the terminal renderer than building features. And every fix lived in a workaround file (webglLifecycle.ts, scrollbarFix.ts, scrollTracker.ts) that only existed because of xterm.
The real problem: we didn't own the grid
xterm.js owns the terminal grid. When you call terminal.write(data), the bytes go into xterm's parser, into xterm's grid model, and out through xterm's renderer. You get callbacks (onData, onRender, onScroll), but you never see the grid state directly. Want to know what's on row 15, column 30? You can ask for a buffer range, but it's a serialized copy — not the live state.
For a tool that builds features on top of terminal content — agent detection, output parsing, suggestion chips, intent extraction, scrollback search — this is backwards. We were feeding bytes to xterm so it could build a grid, then reading the grid back out through a serialization API to do our actual work. Two parsers, two grids, two sources of truth.
Worse: we already had the grid. Our Rust backend runs alacritty_terminal for PTY management and output parsing. The Alacritty grid is the authoritative source for everything — shell state detection, parsed events, VT log buffer, the scrollback overlay. xterm.js was a second grid that we fed the same bytes to, just for rendering.
CanvasTerminal: one grid, one renderer
The replacement is CanvasTerminal — a Canvas 2D renderer that reads directly from the Alacritty grid via Tauri IPC. No JavaScript terminal emulator in the loop. The Rust backend owns the grid, the frontend just paints it.
The rendering pipeline is simple:
- PTY bytes arrive in Rust, feed into alacritty_terminal::Processor
- Alacritty's damage tracker marks which rows changed
- Frontend requests only the damaged rows via IPC
- Canvas 2D draws the cells: text, colors, decorations, cursor
No WebGL. No atlas. No texture uploads. No requestAnimationFrame coalescing of writes. The renderer draws what the grid says, and the grid is always right because there's only one.
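The frontend half of that pipeline can be sketched roughly as follows. This is an illustration under assumptions, not the actual CanvasTerminal code: DamagedRow, RowPainter, and DamageRenderer are hypothetical names, and a real payload would carry per-cell colors and flags rather than plain text.

```typescript
// Hypothetical wire format for one damaged row returned by the backend over IPC.
interface DamagedRow {
  row: number;  // zero-based row index within the visible viewport
  text: string; // row contents; per-cell attributes are omitted in this sketch
}

// Hypothetical painter. A Canvas 2D implementation would clear the row's
// rectangle and draw its cells here.
interface RowPainter {
  paintRow(row: number, text: string): void;
}

class DamageRenderer {
  private rows: string[];
  private painter: RowPainter;

  constructor(visibleRows: number, painter: RowPainter) {
    this.rows = new Array<string>(visibleRows).fill("");
    this.painter = painter;
  }

  // Apply one batch of damage: update the local cache and repaint only those rows.
  // Returns the number of rows actually painted.
  apply(damage: DamagedRow[]): number {
    let painted = 0;
    for (const { row, text } of damage) {
      if (row < 0 || row >= this.rows.length) continue; // stale damage, e.g. after a resize
      this.rows[row] = text;
      this.painter.paintRow(row, text);
      painted++;
    }
    return painted;
  }

  // Read-only view of the cached rows, useful for a forced full redraw.
  snapshot(): readonly string[] {
    return this.rows;
  }
}
```

The point of the design is that the frontend holds no terminal state of its own beyond a paint cache: every row it draws is a copy of what the backend grid reported.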
What we gained
Eliminated an entire class of bugs. Scroll jump races are gone — there's no xterm viewport state to desync from. Atlas corruption is gone — there's no atlas. Scrollbar hacks are gone — we draw our own scrollbar from the grid's scroll offset. The three workaround files (webglLifecycle.ts, scrollbarFix.ts, scrollTracker.ts) are deleted.
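Drawing a scrollbar from the grid's scroll offset is mostly arithmetic. The sketch below assumes the backend reports a display offset counted in lines from the bottom (0 means following live output) and the current scrollback size, which is how Alacritty's grid models scrolling. The function name, the Thumb type, and the minimum thumb size are assumptions made for illustration.

```typescript
interface Thumb {
  top: number;    // distance from the top of the track, in pixels
  height: number; // thumb height, in pixels
}

// Sketch: compute scrollbar thumb geometry from grid scroll state.
// Assumes visibleRows >= 1.
function scrollbarThumb(
  trackHeight: number,
  visibleRows: number,
  historySize: number,   // lines currently held in scrollback
  displayOffset: number, // lines scrolled up from the bottom
  minThumb = 16,
): Thumb {
  const totalRows = historySize + visibleRows;
  // The thumb covers the visible fraction of all rows, clamped to a usable size.
  const height = Math.min(
    trackHeight,
    Math.max(minThumb, (trackHeight * visibleRows) / totalRows),
  );
  const offset = Math.min(Math.max(displayOffset, 0), historySize);
  // 0 at the oldest line of history, 1 when following live output.
  const fraction = historySize === 0 ? 1 : 1 - offset / historySize;
  return { top: (trackHeight - height) * fraction, height };
}
```

With no scrollback the thumb fills the whole track, so whether there is history to scroll through is visible at a glance.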
Single source of truth. The output parser, the suggestion system, the scrollback overlay, and the renderer all read from the same Alacritty grid. No more cross-checking between two terminal emulators.
In-band signalling. We patched VTE (the ANSI parser) to handle OSC 7770 — a custom escape sequence for state/suggest/intent signalling. The PTY stream carries structured events that never hit the grid, never need concealment, and never suffer from cross-chunk parsing issues. This was impossible with xterm.js because we couldn't touch its parser.
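The real handler lives in the patched VTE parser on the Rust side, and the post does not specify the payload format. Purely to illustrate the idea, the TypeScript sketch below assumes a hypothetical `kind;payload` body terminated by BEL or ST, and shows how such sequences can be pulled out of a chunked stream so they never reach the grid. Osc7770Filter and SignalEvent are made-up names.

```typescript
// Hypothetical event shape; the actual OSC 7770 payload format is not documented here.
interface SignalEvent {
  kind: string;
  payload: string;
}

const OSC_PREFIX = "\x1b]7770;";

// Find the earliest OSC terminator: BEL (0x07) or ST (ESC backslash).
function findTerminator(s: string, from: number): { index: number; length: number } | null {
  const bel = s.indexOf("\x07", from);
  const st = s.indexOf("\x1b\\", from);
  if (bel === -1 && st === -1) return null;
  if (st === -1 || (bel !== -1 && bel < st)) return { index: bel, length: 1 };
  return { index: st, length: 2 };
}

// Length of the longest proper prefix of OSC_PREFIX that `s` ends with.
function partialPrefixLength(s: string): number {
  const max = Math.min(OSC_PREFIX.length - 1, s.length);
  for (let k = max; k > 0; k--) {
    if (s.endsWith(OSC_PREFIX.slice(0, k))) return k;
  }
  return 0;
}

class Osc7770Filter {
  private carry = ""; // incomplete sequence held over from the previous chunk

  // Feed one PTY chunk. Returns text destined for the grid plus completed events.
  feed(chunk: string): { text: string; events: SignalEvent[] } {
    const input = this.carry + chunk;
    this.carry = "";
    const events: SignalEvent[] = [];
    let text = "";
    let pos = 0;
    while (pos < input.length) {
      const start = input.indexOf(OSC_PREFIX, pos);
      if (start === -1) {
        // No full prefix left: emit the rest, holding back a possibly split prefix.
        const rest = input.slice(pos);
        const keep = partialPrefixLength(rest);
        text += rest.slice(0, rest.length - keep);
        this.carry = rest.slice(rest.length - keep);
        break;
      }
      text += input.slice(pos, start);
      const bodyStart = start + OSC_PREFIX.length;
      const end = findTerminator(input, bodyStart);
      if (end === null) {
        this.carry = input.slice(start); // sequence continues in the next chunk
        break;
      }
      const body = input.slice(bodyStart, end.index);
      const sep = body.indexOf(";");
      events.push(
        sep === -1
          ? { kind: body, payload: "" }
          : { kind: body.slice(0, sep), payload: body.slice(sep + 1) },
      );
      pos = end.index + end.length;
    }
    return { text, events };
  }
}
```

This sketch does not bound the carried buffer, so an unterminated sequence would grow without limit. Doing the work inside the parser, as the patch does, avoids a second scanning pass like this one entirely.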
Native search. Alacritty has a built-in DFA regex engine that searches across grid + scrollback. We get regex search without building a search index or extracting text into JavaScript.
Native selection. selection_to_string() returns the selected text without trailing whitespace padding. One API call, correct output, no per-copy-path stripping.
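For contrast, the stripping that each xterm copy path needed looked roughly like the helper below. This is a sketch of the general technique with a hypothetical function name, not the code that was removed.

```typescript
// Hypothetical helper: remove the padding added up to the terminal width,
// line by line, while preserving interior spacing and line breaks.
function stripTrailingPadding(selection: string): string {
  return selection
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, ""))
    .join("\n");
}
```

Every copy path had to remember to call something like this, which is the duplication the single backend call removes.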
Simpler codebase. We removed xterm.js, its WebGL addon, the fit addon, and all the lifecycle code around them. The CanvasTerminal component is self-contained — no addon initialization, no renderer fallback chains, no disposal races.
What we patched in Alacritty
We maintain a local fork of alacritty_terminal with a small number of patches. This sounds scary, but the patches are minimal and targeted:
- Resize without reflow. Ink-based TUIs (Claude Code, Codex) use cursor positioning that breaks when reflow merges or splits lines. We added a reflow flag to resize().
- Public damage tracking. Made mark_fully_damaged() public so we can force full-frame redraws when needed.
- OSC 7770 handler. Our custom protocol for in-band state signalling. Fires an event from VTE through to the application layer without touching the grid.
- OSC 7 and OSC 133 routing. Current working directory and shell integration markers, routed from VTE to application events.
- Named color index mapping. A utility function that eliminates a 30-line match statement in our color serializer.
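Why reflow matters for the first patch is easiest to see on a toy model. The sketch below is not Alacritty's implementation, and resizeRows is a hypothetical function. It treats the grid as an array of strings and compares a rewrapping resize with one that keeps exactly one output row per input row.

```typescript
// Toy model: resize text rows to `cols` columns, with or without reflow.
function resizeRows(rows: string[], cols: number, reflow: boolean): string[] {
  if (cols < 1) throw new RangeError("cols must be at least 1");
  if (!reflow) {
    // One output row per input row: long rows are truncated, indices stay stable.
    return rows.map((row) => row.slice(0, cols));
  }
  // Reflow: long rows wrap onto extra rows, shifting everything below them.
  const out: string[] = [];
  for (const row of rows) {
    if (row.length === 0) {
      out.push("");
      continue;
    }
    for (let i = 0; i < row.length; i += cols) {
      out.push(row.slice(i, i + cols));
    }
  }
  return out;
}
```

With reflow enabled, narrowing the grid moves a prompt that was on row 1 down to row 2, so a TUI that addresses the cursor by absolute row now draws in the wrong place. Without reflow the row indices are unchanged, at the cost of clipped content until the application repaints.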
Five patches in alacritty_terminal, three in VTE. Each is a few lines. We can rebase on upstream releases in minutes. Compare that to the workaround surface area we had with xterm.js — it's not even close.
The tradeoff we accepted
Canvas 2D is slower than WebGL for pure text throughput. A terminal that does nothing but cat a 100MB file will scroll faster in xterm.js with its WebGL renderer. In practice, this doesn't matter for our use case — AI agents produce text at LLM speed (hundreds of tokens per second), not disk speed. The bottleneck is never rendering.
We also took on the responsibility of rendering text correctly. Ligatures, emoji, wide characters, bidirectional text — xterm.js handled all of this. We handle it now. For the subset our users need (code output, no bidi), this is straightforward. If someone needs Arabic terminal output, we'll cross that bridge.
Was it worth it?
Unambiguously yes. The migration took two weeks of focused work. In return, we eliminated a category of bugs that had been generating a new issue every month, removed an entire rendering layer from the architecture, and gained direct access to the grid for features we were already building. The codebase is simpler, the bugs are fewer, and the features we want to build next (OSC 133 semantic zones, prompt-aware rendering, structured output overlays) are now possible without fighting a JavaScript terminal emulator that wasn't designed for what we're doing.
If you're building a terminal-based desktop app and you're using xterm.js because "everyone does" — take a hard look at whether you actually need a browser-based terminal emulator. We didn't, and finding that out was the best engineering decision we've made this year.