Architecture
How ProxMenux Monitor is packaged, what runs inside the AppImage, and how requests flow from the browser through the Flask backend to the host's tooling and SQLite store.
One process, many responsibilities
Request flow
From the browser to the kernel, every dashboard view follows the same path:
Each request is authenticated by JWT (when auth is enabled), dispatched to a blueprint, and answered with data collected on demand from host tooling. If Fail2Ban is installed and the proxmenux jail is active, the middleware also checks the request against the jail's banned IP list. The optional reverse proxy is transparent to Flask — it forwards X-Forwarded-* headers and the app recovers the real client IP from them. State that needs to outlive a request lives in SQLite.
The same process also runs four background threads started at boot — they don't serve HTTP, they push state into SQLite or into the notification queue while the host is up:
| Thread | Cadence | Job |
|---|---|---|
| _temperature_collector_loop | 60 s | Records CPU temperature and a network-latency sample into the history DB so the dashboard graphs have data even when no client is connected. |
| _health_collector_loop | 5 min | Runs the full Health Monitor cycle (10 categories), persists active errors, dismissals and disk observations, and feeds new events into the notification engine. |
| _vital_signs_sampler | ~1 s | High-frequency CPU + temperature sampler used for live widgets in the Overview panel. |
| notification_manager.start() | event-driven | Spawns the journal / task / hook watchers (JournalWatcher, TaskWatcher, ProxmoxHookWatcher) and dispatches to configured channels with optional AI rewriting. |
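The collector loops in the table share one pattern: a daemon thread that runs a job, then sleeps for its cadence. The sketch below illustrates that pattern only; `start_periodic` and the demo job are hypothetical names, not the Monitor's actual functions, and the real loops write to SQLite rather than to a list.

```python
import threading
import time


def start_periodic(name, interval_s, job, stop_event):
    """Run job() every interval_s seconds on a daemon thread until stop_event is set."""

    def loop():
        while not stop_event.is_set():
            try:
                job()
            except Exception as exc:
                # One failing sample should not kill the whole collector.
                print(f"{name}: {exc}")
            # Event.wait() doubles as an interruptible sleep.
            stop_event.wait(interval_s)

    thread = threading.Thread(target=loop, name=name, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    stop = threading.Event()
    samples = []
    # Hypothetical stand-in for a collector job.
    worker = start_periodic("demo_collector", 0.05, lambda: samples.append(time.time()), stop)
    time.sleep(0.2)
    stop.set()
    worker.join(timeout=1)
    print(f"collected {len(samples)} samples")
```

Marking the thread as a daemon means it never blocks process shutdown, which fits workers that only push state and serve no requests.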
systemd unit
The installer drops a unit at /etc/systemd/system/proxmenux-monitor.service. Default content:
[Unit]
Description=ProxMenux Monitor - Web Dashboard
After=network.target
[Service]
Type=simple
User=root
WorkingDirectory=/opt/proxmenux-monitor
ExecStart=/opt/proxmenux-monitor/ProxMenux-Monitor.AppImage
Restart=on-failure
RestartSec=10
Environment="PORT=8008"
[Install]
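WantedBy=multi-user.target
- User=root: required, because SMART, pvesh, journal scopes, ZFS commands and the web terminal all need root.
- Restart=on-failure with a 10-second back-off (RestartSec=10): non-zero exits relaunch automatically.
- After=network.target: orders startup after the host's network management stack is up. It does not guarantee that interfaces are already configured; that is the job of network-online.target.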
Inspect the live unit
systemctl cat proxmenux-monitor.service      # show the unit content
systemctl status proxmenux-monitor.service   # state + recent log
journalctl -u proxmenux-monitor.service -f   # follow live
What the AppImage contains
The AppImage is a self-mounting filesystem. AppRun at the root sets up the environment and execs flask_server.py:
AppDir/
├── AppRun # entrypoint: sets PATH/LD_LIBRARY_PATH, exec flask_server
├── usr/
│ ├── bin/
│ │ ├── flask_server.py # main process
│ │ ├── flask_*_routes.py # Flask blueprints (auth, health, terminal, …)
│ │ ├── auth_manager.py # JWT + TOTP + API tokens
│ │ ├── health_monitor.py # 10-category checker
│ │ ├── health_persistence.py # SQLite layer
│ │ ├── notification_manager.py # orchestrator
│ │ ├── notification_channels.py # Telegram, Discord, Email, …
│ │ ├── notification_templates.py # message rendering + AI hook
│ │ ├── notification_events.py # JournalWatcher, TaskWatcher, …
│ │ ├── ai_providers/ # OpenAI · Anthropic · Gemini · Groq · Ollama · OpenRouter
│ │ ├── proxmox_storage_monitor.py # storage pool inspection
│ │ ├── hardware_monitor.py # CPU/PCIe/GPU enumeration
│ │ ├── ipmitool, sensors, upsc # vendored hardware tools
│ │ └── …
│ ├── lib/python3/ # vendored Python deps (Flask, JWT, psutil, …)
│ └── share/ # icons + .desktop file
└── web/ # Next.js static export
    ├── index.html
    ├── _next/ # JS / CSS chunks
    └── manifest.json # PWA manifest
Two consequences of this layout:
- No host Python pollution. The vendored interpreter and packages are isolated inside the AppImage — upgrading the host's system Python doesn't affect the Monitor and vice-versa.
- Hardware tools are bundled too. ipmitool, lm-sensors and upsc ship inside the AppImage so the dashboard can read out-of-band sensors and UPS state without forcing the user to install Debian packages.
Flask app structure
flask_server.py creates a single Flask(__name__) instance, enables CORS, and registers the seven blueprints listed below plus a WebSocket initializer:
| Blueprint / module | Routes prefix | Owns |
|---|---|---|
| flask_server.py | /api/system /api/storage /api/network /api/vms /api/hardware /api/logs /api/prometheus | Core data endpoints + static dashboard serving + optional Fail2Ban app-level check (active only when Fail2Ban is installed on the host with the proxmenux jail). |
| flask_auth_routes.py | /api/auth/* | Login, JWT issuing, TOTP setup/verify, password change, API token generation. |
| flask_health_routes.py | /api/health/* | Public health probe, detailed status, active / dismissed errors, suppression settings. |
| flask_terminal_routes.py | /api/terminal/* + WS | PTY allocation per session and WebSocket pipe to xterm.js in the browser. |
| flask_notification_routes.py | /api/notifications/* | Channel CRUD, test-send, AI provider config, history, manual sends. |
| flask_security_routes.py | /api/security/* | Authentication failures and, when Fail2Ban is installed, jail status, ban events and manual unban. |
| flask_proxmenux_routes.py | /api/proxmenux/* | Reads which ProxMenux post-install optimizations are installed on the host. |
| flask_oci_routes.py | /api/oci/* | OCI / container app deployment helpers (Proxmox VE 9.1+). |
The full endpoint list with request / response shapes is in API Reference.
Data sources
Nothing is collected from a custom agent — the Monitor reads the same files and runs the same commands a human admin would:
| Source | Used for |
|---|---|
| psutil | CPU load, memory, swap, mountpoint usage, NIC counters, process list. |
| pvesh / qm / pct | Proxmox node info, VM and CT inventory and config, storage pools, task history. |
| smartctl | SATA / NVMe attributes, SMART health, wear / lifetime, model and serial. |
| zpool / zfs | Pool state (ONLINE / DEGRADED / FAULTED / UNAVAIL), scrub progress, dataset usage. |
| journalctl | System logs, OOM kills, ATA / NVMe / dm errors, security events, custom service units. |
| ip / iproute2 | Interfaces, addresses, bridges, bonds, OVS-managed devices. |
| nvidia-smi · intel_gpu_top | GPU utilisation, VRAM, temperature, encoder / decoder load. |
| lspci · lscpu · dmidecode | PCIe topology, CPU model and topology, board and BIOS info. |
| ipmitool · sensors | Out-of-band sensors, fan speeds, board temperatures (when supported). |
| upsc (NUT) | UPS battery state, load, runtime — when a NUT server is configured on the host. |
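Several of these sources are slow to query, and the Monitor keeps their output in time-bound, in-process caches with a TTL per source. A minimal sketch of such a cache follows; the `TTLCache` class, the key format and the `read_smart` stand-in are assumptions made for illustration, not the Monitor's actual code.

```python
import time


class TTLCache:
    """Cache the result of an expensive call for ttl_s seconds."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, ttl_s, producer):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the expensive call
        value = producer()
        self._entries[key] = (now + ttl_s, value)
        return value


cache = TTLCache()
calls = []


def read_smart():
    # Hypothetical stand-in for running `smartctl -a` on a disk.
    calls.append(1)
    return {"health": "PASSED"}


cache.get("smart:/dev/sda", 300, read_smart)
cache.get("smart:/dev/sda", 300, read_smart)  # served from the cache
print(len(calls))  # prints 1
```

The sketch is not thread-safe; a process that also runs background collectors would likely guard `_entries` with a lock.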
Output is cached — not every request hits the host
Expensive calls (smartctl -a, pvesh get) are wrapped in time-bound caches inside the Flask process so a busy dashboard tab doesn't hammer the disk or the cluster API. The cache TTLs are tuned per source (a few seconds for live metrics, several minutes for SMART).
Persistence
State is split across the following paths by sensitivity:
| Path | Owner | Contents |
|---|---|---|
| /usr/local/share/proxmenux/health_monitor.db | root:root | SQLite DB. Tables: errors, events, disk_registry, disk_observations, user_settings, notification_history, excluded_storages, excluded_interfaces. WAL journal mode. |
| /usr/local/share/proxmenux/.notification_key | root 0600 | 32-byte XOR key used to encrypt sensitive notification settings before storing them in the DB (Telegram tokens, AI API keys, etc.). |
| /root/.config/proxmenux-monitor/auth.json | root:root | Authentication state: enabled flag, username, SHA-256 password hash, TOTP secret, backup codes, list of issued API tokens, list of revoked token hashes. |
| /var/log/proxmenux-auth.log | root:root | Plain-text auth event log. Always written. If Fail2Ban is installed with the [proxmenux] jail, the jail reads this file to ban brute-force attempts; if not, the file simply accumulates the log entries. |
Back up auth.json before reinstalling
/root/.config/proxmenux-monitor/auth.json and /usr/local/share/proxmenux/health_monitor.db intact. If you restore from a host backup, keep both files together: the API tokens stored in auth.json are validated against JWT_SECRET, and if the DB and auth.json get out of sync, dismissed errors and stored tokens may misbehave.
Health Monitor cycle
Every 5 minutes health_monitor.py runs a deterministic cycle across the ten categories shown on the dashboard:
- Critical PVE services (pveproxy, pvedaemon, pvestatd, pve-cluster).
- Proxmox storage pools (pvesh get /storage + per-storage availability).
- Disks and filesystems: SMART, dmesg I/O errors, ZFS pool health, mountpoint capacity.
- VMs and CTs: failed starts, crashed guests, QMP errors, shutdown failures.
- Network: bridge / bond status, link state, latency to the gateway.
- Updates: pending package upgrades and security patches.
- Logs: persistent / spike / cascade pattern detection in the system journal.
- Memory: OOM killer activity, sustained high pressure.
- Temperature: CPU / chassis sensors against vendor thresholds.
- Security: authentication failures, ban events, Fail2Ban jail status.
Each finding is normalised into a stable error_key + category + severity. The persistence layer deduplicates against the existing errors table — repeated events update last_seen and the occurrence counter without spamming notifications.
The cycle also auto-resolves stale errors using the per-category Suppression Duration setting, cleans up errors for resources that no longer exist (deleted VMs / removed disks / unmounted storages), and prunes the events log older than 30 days. The full catalogue of categories and the dashboard view that surfaces them is documented in Dashboard → Health Monitor.
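The deduplication step can be sketched with the standard sqlite3 module. Only error_key, category, severity and last_seen come from the description above; the first_seen and occurrences columns, the schema and the `record_finding` helper are assumptions made for illustration.

```python
import sqlite3
import time

# Hypothetical schema: the real errors table may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS errors (
    error_key   TEXT PRIMARY KEY,
    category    TEXT NOT NULL,
    severity    TEXT NOT NULL,
    first_seen  REAL NOT NULL,
    last_seen   REAL NOT NULL,
    occurrences INTEGER NOT NULL DEFAULT 1
)
"""


def record_finding(conn, error_key, category, severity, now=None):
    """Insert a new finding or bump an existing one. Returns True only if new."""
    now = time.time() if now is None else now
    cur = conn.execute(
        "UPDATE errors SET last_seen = ?, severity = ?, occurrences = occurrences + 1 "
        "WHERE error_key = ?",
        (now, severity, error_key),
    )
    if cur.rowcount:
        return False  # repeat event: counters updated, nothing to announce
    conn.execute(
        "INSERT INTO errors (error_key, category, severity, first_seen, last_seen) "
        "VALUES (?, ?, ?, ?, ?)",
        (error_key, category, severity, now, now),
    )
    return True


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
print(record_finding(conn, "disk_sda_smart", "disks", "CRITICAL"))  # prints True
print(record_finding(conn, "disk_sda_smart", "disks", "CRITICAL"))  # prints False
```

A caller would hand the finding to the notification engine only when the function returns True, which is one way to get the "update without spamming" behaviour.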
Notification engine
notification_manager.py is the orchestrator. It loads the configured channels, owns the delivery queue, and exposes both a Python API (for Flask routes and the Health Monitor cycle) and a CLI entrypoint (for the .sh hook scripts shipped with ProxMenux).
- Watchers push events: JournalWatcher tails the system journal, TaskWatcher polls the Proxmox task list, ProxmoxHookWatcher reacts to backup / replication / snapshot hooks, and PollingCollector handles slow data sources.
- Templates turn an event into a (title, body) pair. The same template can run through the configured AI provider (OpenAI / Anthropic / Gemini / Groq / Ollama / OpenRouter) to produce a plain-language rewrite; both versions are stored in notification_history.
- Channels deliver messages: Telegram, Discord, Email, Gotify and Apprise (multi-channel). Each is implemented in notification_channels.py behind the same create_channel() / send() interface, so adding a new channel is a single class.
- Encryption. Sensitive settings (telegram.token, discord.webhook_url, ai_api_key_*, email.password) are XOR-encrypted with the key in .notification_key before being written to the DB. Plaintext never touches disk.
Per-event toggles, channel overrides and AI configuration are surfaced in Settings → Notifications and Settings → AI Assistant.
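As an illustration of the XOR step, the sketch below applies a repeating-key XOR and base64-encodes the result so it can be stored as text. The base64 wrapping and the function names are assumptions; the section only states that a 32-byte key from .notification_key is used.

```python
import base64
import os
from itertools import cycle


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with the key, repeating the key as often as needed."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def encrypt_setting(plaintext: str, key: bytes) -> str:
    # base64 makes the XOR output safe to store in a text column (assumption).
    return base64.b64encode(xor_bytes(plaintext.encode("utf-8"), key)).decode("ascii")


def decrypt_setting(stored: str, key: bytes) -> str:
    # XOR is its own inverse, so decryption reuses xor_bytes.
    return xor_bytes(base64.b64decode(stored), key).decode("utf-8")


key = os.urandom(32)  # stand-in for the contents of .notification_key
stored = encrypt_setting("example-bot-token", key)
print(decrypt_setting(stored, key))  # prints example-bot-token
```

With this kind of scheme the stored value is only as safe as the key file, which is why the key's 0600 permissions matter.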
WebSocket terminal
The Terminal tab in the dashboard is a thin xterm.js client wired to a server-side PTY through a WebSocket. Two transport modes:
- HTTP mode (default): Flask's development server with flask-sock handles upgrade requests. Good enough for LAN / direct access.
- HTTPS / WSS mode: when an SSL certificate is configured, the process switches to gevent.pywsgi.WSGIServer with geventwebsocket.handler.WebSocketHandler, so WebSockets work over TLS without polyfills.
The PTY is a child of the Flask process, so it inherits User=root from the unit. Every terminal request goes through JWT auth; the user must already be logged in to the dashboard before a PTY is allocated.
If you access the Monitor through a reverse proxy, make sure WebSocket forwarding is enabled (the Upgrade and Connection headers). Without it the terminal won't work.
Reverse proxy & Fail2Ban
Two safeguards make sure security works the same way whether the dashboard is hit directly or through a reverse proxy:
- Real client IP recovery. A before_request hook reads X-Forwarded-For and X-Real-IP in that order, falling back to request.remote_addr. The recovered address is what auth logging and rate limiting see. This is always on.
- Application-level Fail2Ban check (optional). When the dashboard sits behind a proxy, the kernel firewall can't block the real attacker IP, because the connection always comes from the proxy. To plug that gap, the same hook above queries the proxmenux Fail2Ban jail every 30 seconds, caches the banned IP set, and short-circuits requests from those IPs with HTTP 403 inside Flask.
Fail2Ban is not bundled
If the fail2ban-client binary or the proxmenux jail is absent, the call fails silently and requests are not gated: auth still applies, but there is no IP-level banning.
Reverse-proxy snippets (Nginx / Caddy / Traefik) and the Fail2Ban jail walkthrough are in Access & Authentication and Security → Fail2Ban.
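The two safeguards in this section can be sketched without Flask as a header lookup plus a cached ban set. `client_ip`, `BanList` and the injected `fetch` callable are hypothetical names; how the Monitor actually queries the jail is not shown here, so the fetcher is left as a parameter.

```python
import time


def client_ip(headers, remote_addr):
    """X-Forwarded-For first, then X-Real-IP, then the socket peer address."""
    forwarded = headers.get("X-Forwarded-For", "")
    if forwarded.strip():
        # The left-most entry is the original client when proxies append themselves.
        return forwarded.split(",")[0].strip()
    real_ip = headers.get("X-Real-IP", "").strip()
    return real_ip or remote_addr


class BanList:
    """Cache the banned-IP set and refresh it at most every ttl_s seconds."""

    def __init__(self, fetch, ttl_s=30, clock=time.monotonic):
        self._fetch = fetch  # callable returning an iterable of banned IPs
        self._ttl_s = ttl_s
        self._clock = clock
        self._banned = frozenset()
        self._expires_at = None

    def is_banned(self, ip):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            try:
                self._banned = frozenset(self._fetch())
            except Exception:
                # Fail2Ban missing or jail absent: fail open, do not gate requests.
                self._banned = frozenset()
            self._expires_at = now + self._ttl_s
        return ip in self._banned


bans = BanList(lambda: ["203.0.113.7"])
ip = client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.1"}, "10.0.0.1")
print(ip, bans.is_banned(ip))  # prints 203.0.113.7 True
```

In a Flask before_request hook, a True result would translate into an HTTP 403 response. The plain dict used here is case-sensitive, unlike Flask's request headers.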
Where to next
- Access & Authentication — first-launch setup, password + TOTP 2FA, reverse-proxy snippets, Fail2Ban jail.
- API Reference — every endpoint, token management, security best-practices.
- Settings → ProxMenux Monitor — the in-menu service toggle and status verification flow inside the ProxMenux TUI.