# Debugging Workflow

> **Core principle**: "The key to fixing a bug is locating the problem." The user cannot, or does not want to, describe bugs verbally. **Never ask the user to describe a bug.** Instead, run automated diagnostics.

## When a bug is reported

1. **Run `debug-snapshot.sh`** immediately (it exists on the server at `~/files/debug-snapshot.sh` and is symlinked to `~/debug-snapshot.sh`)
2. The script generates `/tmp/hermes-debug-*.tar.gz` containing:
   - System state (CPU, memory, disk)
   - Process health (Caddy, search API PID, port listening)
   - API live checks (health, stats, search test)
   - Config snapshots (Caddyfile, search_api.py, indexer.py — auto-redacted)
   - ChromaDB status (size, structure, collection count)
   - Network state (ports, connectivity to github/baidu)
   - Recent file changes (last 24h)
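
Once the script has run, the newest archive can be located and inspected without unpacking it by hand. The following is a minimal sketch, not part of the existing tooling: the helper names `latest_snapshot` and `list_members` are hypothetical, and it assumes only what is stated above, namely that archives land in `/tmp` and match `hermes-debug-*.tar.gz`.

```python
import glob
import os
import tarfile


def latest_snapshot(directory="/tmp", pattern="hermes-debug-*.tar.gz"):
    """Return the most recently modified snapshot archive, or None if there is none."""
    candidates = glob.glob(os.path.join(directory, pattern))
    if not candidates:
        return None
    # Pick by modification time rather than by name, since the part of the
    # filename matched by "*" is not specified in this document.
    return max(candidates, key=os.path.getmtime)


def list_members(archive_path):
    """Return the sorted names of the regular files inside the archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(m.name for m in tar.getmembers() if m.isfile())


if __name__ == "__main__":
    path = latest_snapshot()
    if path is None:
        print("no snapshot found; run debug-snapshot.sh first")
    else:
        print(path)
        for name in list_members(path):
            print("  " + name)
```

Selecting by modification time avoids guessing at the archive naming scheme; if the script embeds a sortable timestamp in the name, sorting by name would work equally well.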

## When to use debug-snapshot vs lighter-weight checks

| Level | Tool | When |
|-------|------|------|
| 🔵 Quick health | `curl :8900/health` | Just want to know if API is alive |
| 🟢 Index status | `curl :8900/api/stats` | Check collection count after reindex |
| 🟡 Process check | `pgrep -a caddy && ss -tlnp \| grep 8900` | Is the stack even running? |
| 🔴 Full diagnostic | `bash ~/debug-snapshot.sh` | Something is wrong and you don't know what |

## API endpoints (added for monitoring)

```
GET /health  → {"status":"ok","collection":11074,"model":true}
GET /api/stats → {"total_chunks":11074,"status":"ready"}
```
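
For scripted monitoring, the `/health` response can be turned into a short list of findings. This is a rough sketch under stated assumptions: the base URL `http://localhost:8900` is a guess based on the `curl :8900/health` shorthand, the function names are hypothetical, and it assumes `collection` is the indexed chunk count and `model` indicates whether the embedding model is loaded.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url="http://localhost:8900", timeout=3):
    """Fetch the raw /health body. The base URL is an assumption."""
    with urlopen(base_url + "/health", timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def interpret_health(body):
    """Turn a /health JSON body into a list of human-readable findings."""
    data = json.loads(body)
    findings = []
    if data.get("status") != "ok":
        findings.append("status is not ok")
    # Assumption: "collection" is the number of indexed chunks.
    if not data.get("collection"):
        findings.append("collection is empty")
    # Assumption: "model" is true once the embedding model has loaded.
    if not data.get("model"):
        findings.append("model not loaded")
    return findings
```

An empty list means nothing looked wrong; for the sample response shown above, `interpret_health` returns `[]`.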

## Common failure patterns identified via snapshot

### Pattern A: API running but collection = 0
→ Indexer hasn't run, or index was deleted. Check `05-chroma-status.txt` for DB file size.

### Pattern B: Port 8900 not listening but python process exists
→ The script crashed during model load. Check `03-api-live-check.txt` or `api-journal.txt` for a traceback.

### Pattern C: Caddy running, search returns 404
→ Route ordering issue. Check `04-caddy-config-live.txt` — more specific `handle_path` blocks must come before catch-all `file_server browse`.

### Pattern D: API responding but search returns empty results
→ Embedding dimension mismatch between indexer and API. Both must use `all-MiniLM-L6-v2` (384-dim); compare the model name in the `search_api.py` and `indexer.py` config snapshots.

### Pattern E: ChromaDB "readonly database" error
→ Two indexer processes ran concurrently. `pkill -f indexer.py` then restart.
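
Patterns A through E can be sketched as a small rule table that maps observations to candidate patterns. This is an illustration only, not existing tooling: the `Observation` fields and `match_patterns` are hypothetical names, and the extra condition on pattern D (a non-empty collection, to tell it apart from pattern A) is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Observation:
    python_process: bool                  # a python process for the API exists
    port_listening: bool                  # something is listening on port 8900
    caddy_running: bool = True            # Caddy process is up
    collection: Optional[int] = None      # count from /health, if reachable
    search_status: Optional[int] = None   # HTTP status of a search test
    search_hits: Optional[int] = None     # number of results the search returned
    error_text: str = ""                  # error text seen in logs, if any


def match_patterns(obs: Observation) -> List[str]:
    """Return the letters of every failure pattern the observation matches."""
    matches = []
    if obs.port_listening and obs.collection == 0:
        matches.append("A")  # API running but collection = 0
    if obs.python_process and not obs.port_listening:
        matches.append("B")  # process exists, port not listening
    if obs.caddy_running and obs.search_status == 404:
        matches.append("C")  # Caddy running, search returns 404
    # Assumption: require a non-empty collection so D does not overlap with A.
    if obs.search_status == 200 and obs.search_hits == 0 and obs.collection:
        matches.append("D")  # API responding, empty search results
    if "readonly database" in obs.error_text.lower():
        matches.append("E")  # ChromaDB "readonly database" error
    return matches
```

The function returns a list because more than one pattern can apply at once, for example a crashed API (B) alongside a ChromaDB lock error (E).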

## Debugging preferences (user: 小天)

- **Never ask "Can you describe the bug?"** Run `debug-snapshot.sh` first, then analyze.
- **For visual layout bugs**: ask the user for a screenshot, or use Excalidraw/p5js to recreate the layout.
- **If the API returns wrong data**: compare the config snapshots against the expected behavior.
- **If a process won't start**: check the logs directly; don't ask what the error message says.
