The audit log as evidence

The audit log is the central piece of evidence the system produces. Every API call lands as one JSON record, with no bearer in plain text and no PHI by default. This page explores which claims the log can support and which it can’t. That framing is useful whether you’re the one writing the log, the one reading it, or the one auditing the team that does both.

Every invocation of api_call writes one record on the mcsinglewire.audit logger at INFO. The shape and the field semantics live in Read the audit log; this page is about what those fields, taken together, demonstrate.

A complete record demonstrates four things:

  1. An attempt was made. The fact that a record exists — at all — is evidence that the LLM (and through it, the human user) tried to invoke this operation.
  2. The attempt was authorised. A bearer_fp other than "anonymous" shows that a bearer was held; an attempt without one would log bearer_fp: "anonymous", and the OAuth Proxy wouldn’t have routed the call to api_call in the first place.
  3. The attempt either succeeded or was refused — and which. status: 200 with error: null is success. status: "rejected" with a populated error field names the gate that caught it (e.g. "operation_denylisted"). status: "write_blocked" means the transport-layer last line of defence fired.
  4. It happened at a specific time. The container log line prefix carries the timestamp; the JSON record itself is the payload.

These four together are enough to answer the straightforward compliance questions: who tried what, when, what came back.
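As a minimal sketch of how a reader might apply points 2 and 3 to a single record, the helpers below map a record to an outcome. The field names status, error, and bearer_fp come from this page; the function names and the "other" bucket are assumptions made for illustration, not part of the server.

```python
from typing import Any, Mapping


def has_bearer(record: Mapping[str, Any]) -> bool:
    """True when the record carries a real fingerprint rather than "anonymous"."""
    return record.get("bearer_fp") not in (None, "", "anonymous")


def classify(record: Mapping[str, Any]) -> str:
    """Map one audit record to the outcome described in point 3 (hypothetical helper)."""
    status = record.get("status")
    error = record.get("error")
    if status == 200 and error is None:
        return "success"
    if status == "rejected":
        # The error field names the gate, e.g. "operation_denylisted".
        return f"rejected ({error})"
    if status == "write_blocked":
        return "write_blocked"
    # Any combination this page does not describe is left unclassified.
    return "other"
```

A record that classifies as "other" is worth reading by hand against the field reference rather than guessing at its meaning.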

The list of things the log explicitly omits is as important as the list of things it includes. None of these are accidental gaps; each is deliberate.

  • The bearer token. Only a SHA-256 fingerprint (first 16 hex chars) appears. The token never reaches the log. This means an audit log file is not a credential: leaking it doesn’t hand anyone a usable token.
  • Response bodies. The log says “operation X returned status 200, took 138ms” — not “and the response was {...}”. Response data may include patient identifiers, device-specific config, and similar; logging it would turn the audit log into PHI by default, which is the opposite of what an audit log should be.
  • Any PHI in query_params (when redaction is enabled). Setting SINGLEWIRE_AUDIT_REDACT_QUERY=1 keeps the parameter keys but replaces the values with "<redacted>". None of the bundled spec’s GET endpoints accept PHI as a query parameter, but the redaction is there for deployments that customise the spec.
  • The LLM’s interpretation. The audit log shows that getIpSpeakers was called and returned 87 devices. It does not show that the LLM then summarised them as “everything is green”. The summary is the LLM’s output, not ours.
  • The user’s question. The log records the operationId chosen by the LLM, not the natural-language question that prompted it. “Find offline speakers” and “show me devices that haven’t checked in” both result in the same log entry if the LLM picks the same operation.

The last two together are the most important to internalise: the audit log evidences the API surface, not the conversation surface.
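The fingerprinting and query redaction described in the list above can be sketched in a few lines. This is an illustration under stated assumptions, not the server's code: the function names are hypothetical, the token is assumed to be UTF-8 encoded before hashing, and redaction is assumed to switch on only when the variable is exactly "1".

```python
import hashlib
import os
from typing import Any, Dict, Mapping


def bearer_fingerprint(token: str) -> str:
    """First 16 hex chars of the SHA-256 digest, or "anonymous" when no token is held."""
    if not token:
        return "anonymous"
    # Assumption: the token is hashed as UTF-8 bytes.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:16]


def redact_query(params: Mapping[str, Any], redact: bool) -> Dict[str, Any]:
    """Keep the parameter keys; replace every value when redaction is on."""
    if not redact:
        return dict(params)
    return {key: "<redacted>" for key in params}


def redaction_enabled() -> bool:
    """Assumption: only the exact value "1" enables redaction."""
    return os.environ.get("SINGLEWIRE_AUDIT_REDACT_QUERY") == "1"
```

Because the fingerprint is a truncated one-way hash, two records with the same bearer_fp can be tied to the same bearer without the log ever holding the token itself.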

Treating the audit log as evidence makes it useful for some compliance questions and inadequate for others. Knowing the difference saves arguments later.

Whether any write reached the upstream API is a question the log can answer definitively. A complete record set with no error: "method_not_allowed" and no status: "write_blocked" lines is empirical evidence that no write was even attempted, so none reached the upstream API.

If those error codes do appear, that’s also useful information: it means the LLM tried a write and the system caught it. The attempts are bounded; the success rate is zero.
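A check along these lines can be sketched as a filter over already-parsed records. The two markers are the ones named above; the function name and the sample records are hypothetical, and the sketch assumes the record set is complete.

```python
from typing import Any, Iterable, List, Mapping


def write_attempts(records: Iterable[Mapping[str, Any]]) -> List[Mapping[str, Any]]:
    """Return every record showing that a write was attempted and caught."""
    return [
        record
        for record in records
        if record.get("error") == "method_not_allowed"
        or record.get("status") == "write_blocked"
    ]


# Illustrative records only; real records carry more fields.
sample = [
    {"operation_id": "getIpSpeakers", "status": 200, "error": None},
    {"operation_id": "health", "status": 200, "error": None},
]
caught = write_attempts(sample)
print("no write attempted" if not caught else f"{len(caught)} write attempt(s) caught")
```

An empty result supports the "no write was attempted" claim only as far as the record set is complete, which is why log retention matters.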

Whether each call was appropriate is a question the log answers only partially. It tells you which operations were called and by which bearer fingerprint. It does not tell you whether each call was necessary, justified, or responsive to a real ops question.

To get from “this call happened” to “this call was appropriate”, you need a second source — the chat transcript on the client side, the operator’s notes, or the case the audit accompanies. The log is one piece of evidence, not the only piece.

This is why Trusting the interpreter is a separate page: the cultural and procedural surface around the log matters as much as the log itself.

The server writes records to stdout. What happens after that depends on your container logging pipeline. The default json-file driver works for development but doesn’t rotate aggressively, doesn’t ship anywhere, and isn’t a credible long-term archive.

For production, you want one of two patterns:

  • Ship to a SIEM. Use a Docker logging driver that forwards to Splunk, Datadog, Elastic, or whatever your security team already reads. Configured per-service in docker-compose.yml.
  • Periodic export to immutable storage. A daily cron job that pulls the previous day’s records, writes them to an append-only bucket (S3 with object lock, etc.), and timestamps the result. Cheap and forensically credible.

The default deployment doesn’t do either. That’s a deliberate choice — production logging pipelines vary too much to ship a default that’s right for everyone — but it’s a gap the deployer needs to close before the audit log can serve as long-term evidence.
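As a rough sketch of the periodic-export pattern, the function below selects one day's lines, writes them to a file that is never overwritten, and records a SHA-256 digest next to it. Everything here is an assumption for illustration: the function name, the file naming, and the premise that each raw line begins with an ISO date such as 2024-05-14. The upload to an object-locked bucket is left out because it depends on the storage provider.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Tuple


def export_day(lines: Iterable[str], day: str, out_dir: str) -> Tuple[Path, str]:
    """Write the lines for `day` to a new file and return (path, sha256 hex digest)."""
    # Assumption: the container log prefix starts with the date, e.g. "2024-05-14T...".
    selected = [line.rstrip("\n") for line in lines if line.startswith(day)]
    payload = "".join(line + "\n" for line in selected).encode("utf-8")

    target = Path(out_dir) / f"audit-{day}.log"
    # Mode "xb" fails with FileExistsError rather than overwriting an earlier export.
    with open(target, "xb") as handle:
        handle.write(payload)

    digest = hashlib.sha256(payload).hexdigest()
    # Store the digest alongside the export so later tampering is detectable.
    target.with_name(target.name + ".sha256").write_text(f"{digest}  {target.name}\n")
    return target, digest
```

Refusing to overwrite is a small local stand-in for append-only storage; the real guarantee still has to come from the bucket's object lock or equivalent.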

Suppose a clinical-engineering tech says they audited the IP speaker fleet last Tuesday. You want to verify.

  • Did they make any calls? grep mcsinglewire.audit over the Tuesday window. Records present? Yes/no.
  • From which session? Pick one record. The bearer_fp is stable per Singlewire user; the mcp_client_id distinguishes Claude Code installations.
  • What did they look at? jq -r '.operation_id' on the filtered set. You’ll see things like getIpSpeakers, openapi_describe, health — the discovery workflow plus the actual reads.
  • Did anything fail? Filter to select(.error != null). If the result is empty, no call logged an error.
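The four checks above can also be run in one pass. The sketch below is a Python equivalent under stated assumptions: each raw line is assumed to be a prefix that starts with an ISO date, followed by the JSON record, and the function names are hypothetical. The field names bearer_fp, mcp_client_id, operation_id, and error are the ones used on this page.

```python
import json
from collections import Counter
from typing import Any, Dict, Iterable, Optional, Tuple


def parse_line(line: str) -> Optional[Tuple[str, Dict[str, Any]]]:
    """Split a raw log line into (prefix, record); None when no JSON object is found."""
    start = line.find("{")
    if start == -1:
        return None
    try:
        record = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    return line[:start].strip(), record


def summarise(lines: Iterable[str], day: str) -> Dict[str, Any]:
    """Answer the four questions for one day, e.g. day="2024-05-14"."""
    parsed = (parse_line(line) for line in lines)
    # Assumption: the prefix begins with the timestamp, so a date prefix selects the day.
    records = [item[1] for item in parsed if item and item[0].startswith(day)]
    return {
        "calls": len(records),
        "sessions": {(r.get("bearer_fp"), r.get("mcp_client_id")) for r in records},
        "operations": Counter(r.get("operation_id") for r in records),
        "failures": [r for r in records if r.get("error") is not None],
    }
```

Lines that do not parse are skipped silently here; a stricter audit script would count them, since unparseable lines in the window are themselves worth explaining.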

What you cannot answer from the log alone:

  • What did they conclude? That’s in the chat transcript on their side, or in the report they wrote.
  • Did they look at all 87 IP speakers, or just the first page? You see the parameters passed; you don’t see what the LLM did with the response.

For most compliance flows, “the log shows what API surface was read, by whom, at what time” is enough. For deeper audits, pair the log with the human’s notes. The two together are the picture.