Incident Intelligence

Trovix Trace©

Automated root cause analysis for your infrastructure.

When something breaks — a site goes down, CPU spikes, a PHP error floods the log — Trace investigates autonomously. It collects the evidence, reasons through it with AI, and delivers a structured root cause analysis with immediate remediation steps. No manual log-hunting. No guesswork.

Built for: SREs · DevOps engineers · Platform teams · CTOs who are also SREs
Deployment: Python agent on your Linux server · Apache · No cloud account required
Pricing: From £150/month · ~2p per investigation
Setup time: 20 minutes · One command · One API key

The problem Trace solves

When your site goes down at 2am, the investigation is manual, sequential and slow. You check the error log. You check CPU. You check the database. You check Apache. You cross-reference timestamps. By the time you know what happened, you have spent forty minutes and the incident has already resolved itself — leaving you with no documented root cause and no prevention plan.

Trace runs that investigation automatically, in parallel, the moment a trigger fires. It collects evidence from every relevant source simultaneously, sends it to an AI model that reasons across the whole picture, and produces a structured report — severity rating, root cause, immediate commands to run, and a prevention recommendation — in under 60 seconds.

How Trovix Trace© works

01 Detect

Trace monitors HTTP endpoints, server metrics (CPU, memory, disk), Apache error logs, PHP fatal errors, MySQL connectivity and service health. Any threshold breach or error pattern triggers an investigation.

02 Collect

In parallel: tail error logs, snapshot top processes, check all HTTP endpoints, query service status, read recent syslog entries. Everything timestamped and correlated.
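
A minimal sketch of parallel collection, assuming each evidence source can be wrapped in a plain function. The name `collect_evidence` and the two stand-in collectors are hypothetical and are not Trace's actual internals.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone
from typing import Callable, Dict


def collect_evidence(collectors: Dict[str, Callable[[], str]]) -> Dict[str, dict]:
    """Run every collector in its own thread and timestamp each result."""

    def run(name: str) -> dict:
        try:
            output, ok = collectors[name](), True
        except Exception as exc:  # one failing source must not abort the rest
            output, ok = repr(exc), False
        return {
            "ok": ok,
            "output": output,
            "collected_at": datetime.now(timezone.utc).isoformat(),
        }

    with ThreadPoolExecutor(max_workers=max(1, len(collectors))) as pool:
        futures = {name: pool.submit(run, name) for name in collectors}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins; real collectors would tail logs, query services, etc.
evidence = collect_evidence({
    "top_processes": lambda: "placeholder process snapshot",
    "syslog": lambda: "placeholder syslog tail",
})
```

Threads suit this kind of work because the collectors mostly wait on files, sockets and subprocesses rather than on the CPU.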

03 Reason

The collected evidence is structured into a prompt and sent to an AI model. The model reasons across metrics, logs, service states and timing to identify the most probable root cause — not just the loudest symptom.
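
One way the evidence could be turned into a prompt, shown as a sketch only. The section layout, the `build_prompt` name and the requested JSON keys are assumptions made for illustration; the real prompt format is not documented here.

```python
def build_prompt(trigger: str, evidence: dict) -> str:
    """Lay out the trigger and each evidence source as labelled sections."""
    lines = [f"TRIGGER: {trigger}", ""]
    for source in sorted(evidence):  # stable order keeps prompts comparable
        lines.append(f"## {source}")
        lines.append(str(evidence[source]).strip())
        lines.append("")
    lines.append(
        "Identify the most probable root cause, not just the loudest symptom. "
        "Reply as JSON with keys: severity, summary, root_cause, "
        "remediation, prevention."
    )
    return "\n".join(lines)


prompt = build_prompt(
    "HTTP 503 on homepage",
    {"syslog": "placeholder syslog tail", "apache_error_log": "placeholder log tail"},
)
```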

04 Report

A structured RCA report is written to your dashboard: severity (CRITICAL / HIGH / MEDIUM / LOW / HEALTHY), summary, root cause with evidence references, immediate remediation commands, and a prevention recommendation.
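
The report fields listed above map naturally onto a small typed record. The sketch below assumes the model's reply has already been decoded from JSON into a dict; the `RcaReport` class, its field names and `parse_report` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Severity(Enum):
    CRITICAL = "CRITICAL"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"
    HEALTHY = "HEALTHY"


@dataclass
class RcaReport:
    severity: Severity
    summary: str
    root_cause: str
    evidence_refs: List[str] = field(default_factory=list)
    remediation: List[str] = field(default_factory=list)
    prevention: str = ""


def parse_report(raw: dict) -> RcaReport:
    """Validate a decoded model reply; unknown severities raise ValueError."""
    return RcaReport(
        severity=Severity(str(raw["severity"]).upper()),
        summary=raw["summary"],
        root_cause=raw["root_cause"],
        evidence_refs=list(raw.get("evidence_refs", [])),
        remediation=list(raw.get("remediation", [])),
        prevention=raw.get("prevention", ""),
    )
```

Rejecting an unknown severity at parse time means a malformed model reply fails loudly instead of reaching the dashboard.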

05 Learn

Every investigation is stored. Over time, Trace builds a record of every incident your infrastructure has experienced — what happened, why, and what was done about it.

What Trace monitors

HTTP Endpoints

Checks your URLs every 60 seconds. Detects 4xx/5xx errors, timeouts and slow responses (>5s). Fires an investigation the moment an endpoint returns an error.
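
The classification rules can be sketched as a single pure function. The 5-second limit comes from the description above; the function name, the labels it returns and the convention that `None` means "timed out before any status arrived" are assumptions.

```python
from typing import Optional

SLOW_THRESHOLD_SECONDS = 5.0  # responses slower than this count as "slow"


def classify_check(status: Optional[int], elapsed_seconds: float) -> str:
    """Classify one endpoint check as 'timeout', 'error', 'slow' or 'ok'."""
    if status is None:  # no status received before the request timed out
        return "timeout"
    if 400 <= status <= 599:  # 4xx and 5xx responses
        return "error"
    if elapsed_seconds > SLOW_THRESHOLD_SECONDS:
        return "slow"
    return "ok"
```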

Server Metrics

CPU percentage, memory usage, disk usage. Configurable thresholds. CPU is checked with a streak requirement (2 consecutive readings above threshold) to avoid noise.
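
The streak requirement is easy to picture in code. This sketch assumes a reading counts only when it is strictly above the threshold; the `StreakTrigger` class and the 90% threshold in the example are illustrative, not Trace's shipped defaults.

```python
class StreakTrigger:
    """Fire only after `required` consecutive readings above `threshold`."""

    def __init__(self, threshold: float, required: int = 2) -> None:
        self.threshold = threshold
        self.required = required
        self.streak = 0

    def observe(self, value: float) -> bool:
        if value > self.threshold:
            self.streak += 1
        else:
            self.streak = 0  # any normal reading resets the count
        return self.streak >= self.required


cpu = StreakTrigger(threshold=90.0, required=2)
readings = [95.0, 40.0, 93.0, 97.0]
fired = [cpu.observe(r) for r in readings]  # only the final reading fires
```

A single spike is ignored because the normal reading that follows it resets the streak.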

Apache Error Log

Tails the Apache error log in real time. Detects PHP Fatal errors, PHP Parse errors, segmentation faults, permission denials and disk quota errors.
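
Pattern detection of this kind usually comes down to a table of regular expressions. The sketch below matches on the message text that PHP and Linux commonly emit for each condition; the exact patterns Trace uses are not published, so treat these, and the category names, as assumptions.

```python
import re
from typing import Optional

# Category name -> pattern. Order matters only if a line matches several.
PATTERNS = {
    "php_fatal": re.compile(r"PHP Fatal error"),
    "php_parse": re.compile(r"PHP Parse error"),
    "segfault": re.compile(r"Segmentation fault", re.IGNORECASE),
    "permission_denied": re.compile(r"Permission denied", re.IGNORECASE),
    "disk_quota": re.compile(r"Disk quota exceeded", re.IGNORECASE),
}


def classify_line(line: str) -> Optional[str]:
    """Return the first matching category for a log line, or None."""
    for name, pattern in PATTERNS.items():
        if pattern.search(line):
            return name
    return None
```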

PHP Errors

Separate tracking of PHP fatal and parse errors from the Joomla/PHP layer. Timestamp-correlated with HTTP failures to identify cause vs symptom.
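
Timestamp correlation can be as simple as asking which PHP errors occurred shortly before an HTTP failure. In this sketch the 30-second window and the function name are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from typing import List


def errors_preceding(
    failure_at: datetime,
    error_times: List[datetime],
    window: timedelta = timedelta(seconds=30),
) -> List[datetime]:
    """Return errors logged at most `window` before the HTTP failure."""
    return [
        t for t in error_times
        if timedelta(0) <= failure_at - t <= window
    ]
```

An error that precedes the failure is a candidate cause; one logged afterwards is more likely a symptom and is excluded here.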

MySQL / MariaDB

Connectivity check on every investigation. Log tail for query errors, replication failures and connection exhaustion.

Service Health

Checks `systemctl is-active` for Apache, MySQL/MariaDB and PHP-FPM. A stopped service is always an immediate CRITICAL trigger.
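
A sketch of the service check using Python's `subprocess` module. `systemctl is-active` prints the unit state and exits with status 0 only when the unit is active. The helper names are hypothetical, and systemd unit names vary by distribution (for example `apache2` versus `httpd`), so any unit list you pass in is your own assumption.

```python
import subprocess
from typing import Callable, List


def systemctl_state(unit: str) -> str:
    """Return the state printed by `systemctl is-active`, e.g. 'active'."""
    result = subprocess.run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit simply means "not active"
    )
    return result.stdout.strip() or "unknown"


def stopped_services(
    units: List[str],
    probe: Callable[[str], str] = systemctl_state,
) -> List[str]:
    """Return every unit whose state is anything other than 'active'."""
    return [unit for unit in units if probe(unit) != "active"]
```

Passing the probe as a parameter lets the logic be exercised on a machine without systemd.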

Kubernetes / Docker

Optional. If configured, checks pod health, crash loops and recent events across namespaces.

Cloud Providers

Optional. AWS CloudWatch, GCP Monitoring, Azure Monitor can be connected via environment variables for cross-layer correlation.

The Trace dashboard

Every investigation is stored and accessible from a secure web dashboard on your own domain. No third-party SaaS. No data leaving your server. The dashboard auto-refreshes every 60 seconds and shows the full AI analysis for each incident.

Severity timeline

Every investigation rated CRITICAL / HIGH / MEDIUM / LOW / HEALTHY. At a glance, see whether your infrastructure health is trending better or worse.

Full AI analysis

Click any incident to read the complete structured root cause analysis — what broke, why, what to do now, and how to prevent recurrence.

Live metrics

Latest CPU, memory and disk readings from the most recent investigation displayed at the top of the dashboard.

Manual investigations

Trigger an investigation on demand from the command line whenever you suspect a problem — even if automated thresholds haven't fired yet.

Trace in the Trovix product family

AI Governance

Trovix Audit©

Where Trace investigates infrastructure incidents, Audit monitors the performance and governance of your AI systems. Together they cover the full operational stack.

Open Audit© →

Regulatory Intelligence

Trovix Watch©

Watch monitors regulators and policy changes at the firm level. Trace monitors your infrastructure at the server level. Both alert you before a problem becomes a crisis.

Open Watch© →

AI Products

OpsPilot & AgentOps

Trace is built on the same agent architecture as the OpsPilot and AgentOps modules in the Trovix AI catalogue. Those modules extend the concept to enterprise Kubernetes and multi-cloud environments.

Browse AI catalogue →

Stop hunting through logs at 2am.

Trace installs in 20 minutes and is ready to investigate your next incident the moment a trigger fires. A setup call is not required, but one is available if you want it.