How we keep it fast with 10 repos and 8 terminals open

A desktop app that manages 10 repos, 8 terminals, and constant git operations can't afford to be slow. Here's the performance work we did under the hood.

The challenge

TUICommander isn't a simple terminal emulator. At any given moment it might be tracking 10 repositories with file watchers, running 8 PTY sessions with real-time output parsing, polling GitHub for PR status, maintaining MCP connections to upstream servers, and rendering a WebGL-accelerated terminal — all in a desktop app built on Tauri.

When we first started scaling to multiple repos and agents, things got sluggish. Branch switching would hang for seconds. Terminal output would stutter when agents streamed in bursts. The sidebar would freeze while git commands ran. We had to rethink how every layer of the stack handles concurrency and I/O.

Async git: stop blocking the event loop

The single biggest improvement came from converting all git operations to async. TUICommander runs about 25 different git commands — status, diff, log, branch list, remote info, stash list, and more. Originally these ran synchronously, which meant a slow git diff on a large repo would block Tokio worker threads and freeze the entire UI.

Now every git command runs via tokio::task::spawn_blocking, keeping the async runtime free for IPC events and UI updates. We also merged sequential subprocess calls where possible — get_changed_files used to spawn two git processes in sequence; now it spawns one. The result: git operations never block the UI, even on repos with thousands of changed files.
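To see how two calls can collapse into one: a single `git status --porcelain` invocation reports staged and unstaged paths together. The sketch below is illustrative TypeScript (the real implementation is Rust); the function name and return shape are hypothetical, and rename entries are ignored for brevity.

```typescript
// Parse `git status --porcelain` v1 output: each line is "XY path",
// where X is the index (staged) column and Y is the worktree column.
// Untracked files show up as "??" and are handled separately.
function parsePorcelain(output: string): { staged: string[]; unstaged: string[] } {
  const staged: string[] = [];
  const unstaged: string[] = [];
  for (const line of output.split("\n")) {
    if (line.length < 4) continue; // skip blank/short lines
    const x = line[0]; // index status column
    const y = line[1]; // worktree status column
    const path = line.slice(3);
    if (x !== " " && x !== "?") staged.push(path);
    if (y !== " " && y !== "?") unstaged.push(path);
  }
  return { staged, unstaged };
}
```

One subprocess at ~20-30ms each instead of two in sequence roughly halves the latency of a refresh.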

Watcher-driven cache: git results in 0.2ms

Most git information doesn't change between keystrokes. But the old approach was polling with a 5-second TTL — every IPC call would check if the cache was stale, and if so, spawn a git subprocess (~20-30ms). With 10 repos polling independently, that's a lot of unnecessary git processes.

The new approach is event-driven. A file system watcher (repo_watcher) monitors each repo and invalidates the cache immediately when files change. The TTL was raised to 60 seconds as a safety net for missed watcher events. In practice, most IPC calls now hit cache and return in ~0.2ms instead of spawning git. The difference is dramatic — the sidebar and git panel feel instant because they usually are.

PTY write coalescing: 60 renders/sec, not 600

AI agents produce output in bursts — hundreds of write events per second during streaming. The naive approach calls terminal.write() for every PTY event, which triggers an xterm.js render pass and a WebGL texture upload each time. At 600 events/sec, the terminal becomes a bottleneck.

We switched to write coalescing via requestAnimationFrame. PTY data accumulates in a buffer and gets flushed once per animation frame (~60/sec). The terminal renders the same content but with dramatically fewer render passes and texture uploads. The visual result is identical — smoother, actually — but CPU and GPU usage during burst output dropped significantly.

Process detection without fork-bombing

TUICommander detects which process is running in each terminal (to show the agent name in the tab). The old approach used ps — literally forking a new process to check each terminal. With 5 terminals polling every few seconds, that's ~100 fork+exec operations per minute, each spawning a shell to parse process tables.

Now we use direct syscalls: proc_pidpath on macOS, /proc/pid/comm on Linux. Zero forks, zero subprocess overhead. The process name comes from the kernel in microseconds instead of milliseconds.
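The Linux side of this is small enough to sketch. Below is the idea in TypeScript using Node's fs (the real code is Rust making the equivalent syscalls; the helper name is hypothetical):

```typescript
import { readFileSync } from "node:fs";

// Read a process's name straight from the kernel: no fork, no exec.
// On Linux, /proc/<pid>/comm holds the executable name. Returns null
// if the process is gone or the platform has no procfs (macOS instead
// uses proc_pidpath from libproc).
function processName(pid: number): string | null {
  try {
    return readFileSync(`/proc/${pid}/comm`, "utf8").trim();
  } catch {
    return null;
  }
}
```

A single file read replaces a ps fork+exec plus output parsing per poll.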

Concurrency done right

Several smaller optimizations compound. The MCP upstream client switched from Mutex to RwLock — tool calls (the common case) now take a read lock and run concurrently, with only reconnects requiring a write lock. PTY parsed events eliminated double serialization — serde_json::to_value runs once and the result is reused for both Tauri IPC and the event bus. The PTY read buffer increased from 4KB to 64KB for natural batching of burst output.
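The double-serialization fix is worth spelling out, since it is easy to miss in review. A language-agnostic sketch in TypeScript (the real code does this once in Rust with serde_json::to_value; names here are hypothetical):

```typescript
type Sink = (json: string) => void;

// Serialize the parsed event once and hand the same payload to both
// consumers, instead of each consumer re-serializing the event.
function emitParsedEvent(event: unknown, ipcSink: Sink, busSink: Sink): string {
  const json = JSON.stringify(event); // one serialization pass
  ipcSink(json);
  busSink(json);
  return json;
}
```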

The StatusBar timer (which updates rate limit and PR merge countdowns) only runs when there's something to count down. During normal operation, this eliminates ~60 unnecessary signal writes per minute. The ActivityDashboard removed { equals: false } from its snapshot signal, letting SolidJS's default equality check skip unnecessary re-renders of the session list every 10 seconds.

Frontend: load only what you need

On the frontend side, Vite's manualChunks splits the bundle into separate chunks: xterm, CodeMirror, diff-view, and markdown each load independently. Heavy panels like Settings, ActivityDashboard, and Help are lazy-loaded with SolidJS lazy() and Suspense — they don't add to initial load time unless you open them.
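For readers unfamiliar with manualChunks, the config looks roughly like this. This is an illustrative sketch, not the project's actual config; the package names are assumptions about what each chunk maps to:

```typescript
// vite.config.ts (sketch)
import { defineConfig } from "vite";

export default defineConfig({
  build: {
    rollupOptions: {
      output: {
        // Each entry becomes its own chunk, fetched only when imported.
        manualChunks: {
          xterm: ["@xterm/xterm"],
          codemirror: ["codemirror"],
        },
      },
    },
  },
});
```

Combined with lazy() on the heavy panels, the initial bundle stays small and the rest streams in on demand.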

The branch select operation — which used to cause multi-second freezes after hours of use — was optimized in three places: getDiffStats runs fire-and-forget instead of blocking, terminal adoption uses a pre-computed Set for O(N) instead of O(N×M), and saved terminal resume commands run in parallel with Promise.all instead of sequential awaits.
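Two of those fixes are compact enough to sketch (function and variable names are hypothetical):

```typescript
// Membership checks go through a Set built once, so each lookup is O(1),
// instead of Array.includes inside a loop, which is O(N×M) overall.
function adoptTerminals(openIds: string[], savedIds: string[]): string[] {
  const saved = new Set(savedIds); // built once
  return openIds.filter((id) => saved.has(id));
}

// Resume commands run concurrently instead of awaiting one at a time.
async function resumeAll(commands: Array<() => Promise<void>>): Promise<void> {
  await Promise.all(commands.map((run) => run()));
}
```

With N terminals and M saved entries, the Set version does one pass over each list; the old version re-scanned the saved list for every terminal.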

GPU-accelerated rendering

The terminal renderer uses xterm.js with WebGL acceleration. On macOS, Tauri's WKWebView composites through Metal — terminal glyphs are rendered as GPU textures, not CPU-painted bitmaps. This matters when you have 6 split panes all receiving burst output simultaneously. The GPU handles the rendering pipeline while the CPU stays free for git operations and PTY I/O.

When WebGL isn't available (some Linux configurations, remote browser sessions), the renderer falls back gracefully to a canvas-based mode. Performance is lower but still usable — the app adapts instead of crashing. Tab switches trigger clearTextureAtlas followed by a full refresh to prevent gradual recomposition artifacts that would otherwise accumulate over long sessions.

Cross-platform without compromise

TUICommander runs on macOS, Windows, and Linux — and each platform has its own performance characteristics. Process detection uses libproc on macOS and CreateToolhelp32Snapshot on Windows — both native APIs, no shell forks. Shell escaping adapts to cmd.exe on Windows. IDE detection probes .app bundles on macOS, registry entries on Windows, and PATH on Linux.

Local MCP connections use Unix domain sockets on macOS/Linux and named pipes (\\.\pipe\tuicommander-mcp) on Windows — both faster than TCP for local IPC. TCP ports are reserved for remote access only, avoiding unnecessary network stack overhead for local operations.

Credentials are stored in the OS keyring — macOS Keychain, Windows Credential Manager, Linux Secret Service — instead of plaintext files. Tailscale TLS certificate provisioning uses the Local API (Unix socket) on macOS/Linux and falls back to CLI on Windows. Every platform-specific path was chosen for the lowest overhead available on that OS.

Remote access: keeping it fast over the network

When you access TUICommander remotely (via Tailscale or direct connection), performance over the network becomes critical. Terminal output streams through WebSocket with a structured event format — parsed events (status changes, suggestions, questions) are extracted server-side so the client doesn't need to re-parse raw terminal output. This reduces bandwidth and client-side CPU.

The Activity Dashboard throttles its snapshot to every 10 seconds, preventing constant re-renders and WebSocket messages when multiple agents are active. New items and removals trigger immediate refreshes, but steady-state updates batch efficiently. The PTY read buffer at 64KB means burst output is sent in fewer, larger WebSocket frames instead of hundreds of tiny ones.
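That throttle policy, sketched in TypeScript with hypothetical names and an injected clock for testability: steady-state updates are rate-limited to one per interval, while structural changes bypass the limit.

```typescript
class SnapshotThrottle {
  private lastEmit = -Infinity; // first call always emits
  constructor(
    private intervalMs: number,
    private now: () => number = Date.now,
  ) {}

  // Returns true when a snapshot should be sent now.
  shouldEmit(structuralChange = false): boolean {
    const t = this.now();
    if (structuralChange || t - this.lastEmit >= this.intervalMs) {
      this.lastEmit = t;
      return true;
    }
    return false;
  }
}
```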

For the mobile PWA specifically, the VT100-extracted clean line format strips escape codes server-side, sending only the visible text. The log view uses a 500-line rolling buffer — enough to see what's happening without sending the entire scrollback over a cellular connection.
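The rolling buffer is a few lines (a sketch with hypothetical names): old lines fall off the front so the PWA never holds, or transfers, the full scrollback.

```typescript
class RollingBuffer {
  private lines: string[] = [];
  constructor(private max = 500) {}

  push(line: string): void {
    this.lines.push(line);
    if (this.lines.length > this.max) {
      // Drop the oldest lines; keep only the newest `max`.
      this.lines.splice(0, this.lines.length - this.max);
    }
  }

  snapshot(): string[] {
    return [...this.lines];
  }
}
```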

The result

TUICommander today runs comfortably with 10+ repos and 8+ terminals open for full workdays. Branch switching is near-instant. Terminal output stays smooth during agent bursts. The sidebar and git panel respond within a frame. And it stays that way after 8 hours of continuous use — no progressive degradation, no memory creep, no accumulating event listeners.

Performance isn't a feature you ship once. It's a constraint you maintain every day. Every new feature gets measured against the baseline: does the sidebar still respond in one frame? Does branch switching still feel instant? If not, we optimize before we merge.