The audit log as evidence
The audit log is the central piece of evidence the system produces. Every API call lands as one JSON record, with no bearer in plain text and no PHI by default. This page explores what claims the log can support and which it can’t — useful framing whether you’re the one writing it, the one reading it, or the one auditing the team that does both.
What the log records
Every invocation of api_call writes one record on the
mcsinglewire.audit logger at INFO. The shape and the field
semantics live in
Read the audit log; this page is
about what those fields, taken together, demonstrate.
A complete record demonstrates four things:
- An attempt was made. The mere existence of a record is evidence that the LLM (and through it, the human user) tried to invoke this operation.
- The attempt was authorised. A bearer_fp other than "anonymous" proves a bearer was held; an attempt without one would log bearer_fp: "anonymous", and the OAuth Proxy wouldn’t have routed the call to api_call in the first place.
- The attempt either succeeded or was refused, and the record says which. status: 200 with error: null is success. status: "rejected" with a populated error field names the gate that caught it (e.g. "operation_denylisted"). status: "write_blocked" means the transport-layer last line of defence fired.
- It happened at a specific time. The container log line prefix carries the timestamp; the JSON record itself is the payload.
These four together are enough to answer the straightforward compliance questions: who tried what, when, what came back.
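A minimal Python sketch of how a reader might turn those claims into checks on a single record. It assumes a record has already been parsed into a dict and uses only the field names and values mentioned above (status, error, bearer_fp); the function names and the "other" fallback label are illustrative, not part of the server.

```python
from typing import Any


def classify_outcome(record: dict[str, Any]) -> str:
    """Map one audit record to a coarse outcome label."""
    status = record.get("status")
    error = record.get("error")
    if status == "write_blocked":
        return "write_blocked"  # transport-layer guard fired
    if status == "rejected":
        return f"rejected:{error}"  # the error field names the gate
    if status == 200 and error is None:
        return "success"
    return "other"  # anything this sketch does not recognise


def held_bearer(record: dict[str, Any]) -> bool:
    """True when the record carries a real fingerprint, not "anonymous"."""
    fingerprint = record.get("bearer_fp")
    return bool(fingerprint) and fingerprint != "anonymous"
```

Records that fall into "other" (for example a non-200 HTTP status) are worth reading by hand rather than bucketing automatically.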
What the log does not record
The list of things the log explicitly omits is as important as the list of things it includes. None of these are accidental gaps; each is deliberate.
- The bearer token. Only a SHA-256 fingerprint (first 16 hex chars) appears. The token never reaches the log. This means an audit log file is not a credential: losing it doesn’t compromise the system.
- Response bodies. The log says “operation X returned status 200, took 138ms”, not “and the response was {...}”. Response data may include patient identifiers, device-specific config, and similar; logging it would turn the audit log into PHI by default, which is the opposite of what an audit log should be.
- Anything PHI in query_params (when redaction is enabled). Setting SINGLEWIRE_AUDIT_REDACT_QUERY=1 keeps the parameter keys but replaces the values with "<redacted>". None of the bundled spec’s GET endpoints accept PHI as a query parameter, but the redaction is there for deployments that customize.
- The LLM’s interpretation. The audit log shows that getIpSpeakers was called and returned 87 devices. It does not show that the LLM then summarised them as “everything is green”. The summary is the LLM’s output, not ours.
- The user’s question. The log records the operationId chosen by the LLM, not the natural-language question that prompted it. “Find offline speakers” and “show me devices that haven’t checked in” both result in the same log entry if the LLM picks the same operation.
The last two together are the most important to internalise: the audit log evidences the API surface, not the conversation surface.
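The first and third omissions are mechanical enough to sketch. The Python below is an illustration, not the server's code: it assumes the fingerprint is taken over the UTF-8 bytes of the raw token (the page states only "SHA-256, first 16 hex chars"), and both function names are hypothetical.

```python
import hashlib


def bearer_fingerprint(token: str) -> str:
    """First 16 hex chars of the SHA-256 digest of the token."""
    # Assumption: the digest input is the UTF-8 encoding of the raw token.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:16]


def redact_query_params(params: dict[str, str]) -> dict[str, str]:
    """Keep every parameter key, replace every value with a placeholder."""
    return {key: "<redacted>" for key in params}
```

Because the fingerprint is a truncated one-way hash, two records with the same bearer_fp can be attributed to the same bearer without the log ever holding the token itself.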
Two kinds of compliance question
Treating the audit log as evidence makes it useful for some compliance questions and inadequate for others. Knowing the difference saves arguments later.
“Did anything mutate?”
This question the log can answer definitively. A complete
record set with no error: "method_not_allowed" and no
status: "write_blocked" lines is empirical evidence that no
write reached the upstream API.
If those error codes do appear, that’s also useful information: it means the LLM tried a write and the system caught it. The attempts are bounded; the success rate is zero.
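As a sketch of that check in Python, assuming the records have already been parsed into dicts: the function below (a hypothetical name) returns every record that shows a caught write attempt, using only the two markers named above. An empty result over a complete record set is the "nothing mutated" evidence.

```python
from typing import Any, Iterable


def caught_write_attempts(records: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the records in which a write was attempted and stopped."""
    return [
        record
        for record in records
        if record.get("error") == "method_not_allowed"
        or record.get("status") == "write_blocked"
    ]
```

The check is only as strong as the completeness of the record set, so it should run over the archived log for the whole period in question, not a sample.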
“Was every read appropriate?”
This question the log answers partially. It tells you which operations were called and by which bearer fingerprint. It does not tell you whether each call was necessary, justified, or responsive to a real ops question.
To get from “this call happened” to “this call was appropriate”, you need a second source — the chat transcript on the client side, the operator’s notes, or the case the audit accompanies. The log is one piece of evidence, not the only piece.
This is why Trusting the interpreter is a separate page: the cultural and procedural surface around the log matters as much as the log itself.
Persistence is your problem
The server writes records to stdout. What happens after that
depends on your container logging pipeline. The default
json-file driver works for development but doesn’t rotate
aggressively, doesn’t ship anywhere, and isn’t a credible
long-term archive.
For production, you want one of two patterns:
- Ship to a SIEM. Use a Docker logging driver that forwards to Splunk, Datadog, Elastic, or whatever your security team already reads. This is configured per-service in docker-compose.yml.
- Periodic export to immutable storage. A daily cron job that pulls the previous day’s records, writes them to an append-only bucket (S3 with object lock, etc.), and timestamps the result. Cheap and forensically credible.
The default deployment doesn’t do either. That’s a deliberate choice — production logging pipelines vary too much to ship a default that’s right for everyone — but it’s a gap the deployer needs to close before the audit log can serve as long-term evidence.
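For the export pattern, one possible shape for the "timestamps the result" step is a small manifest stored next to each day's export. The Python below is a sketch under assumptions: the export is one record per newline-terminated line, the manifest field names are invented for illustration, and the upload to the append-only bucket is left out entirely.

```python
import hashlib
from datetime import datetime, timezone


def build_manifest(export_bytes: bytes, day: str) -> dict[str, object]:
    """Describe one day's export so later tampering is detectable."""
    return {
        "day": day,  # the day the records cover, e.g. "2024-05-07"
        "sha256": hashlib.sha256(export_bytes).hexdigest(),
        # Assumption: one record per newline-terminated line.
        "record_count": export_bytes.count(b"\n"),
        "exported_at": datetime.now(timezone.utc).isoformat(),
    }
```

An auditor can later recompute the digest of the archived file and compare it with the manifest; a mismatch means the export changed after it was written.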
A worked example
Suppose a clinical-engineering tech says they audited the IP speaker fleet last Tuesday. You want to verify.
- Did they make any calls? grep mcsinglewire.audit over the Tuesday window. Records present? Yes/no.
- From which session? Pick one record. The bearer_fp is stable per Singlewire user; the mcp_client_id distinguishes Claude Code installations.
- What did they look at? jq -r '.operation_id' on the filtered set. You’ll see things like getIpSpeakers, openapi_describe, health: the discovery workflow plus the actual reads.
- Did anything fail? Filter to select(.error != null). If the result is empty, every call succeeded.
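The same four questions can be answered in one pass with a short script. The Python below is a sketch, not a supplied tool: it assumes each log line contains the logger name mcsinglewire.audit somewhere and carries the JSON record as the trailing part of the line, starting at the first "{". The exact prefix format depends on your logging pipeline, and the function names are hypothetical.

```python
import json
from collections import Counter
from typing import Any, Iterable, Optional

AUDIT_MARKER = "mcsinglewire.audit"


def parse_audit_line(line: str) -> Optional[dict[str, Any]]:
    """Extract the JSON record from one log line, or None if absent."""
    if AUDIT_MARKER not in line:
        return None
    start = line.find("{")  # assumption: the prefix contains no "{"
    if start == -1:
        return None
    try:
        record = json.loads(line[start:])
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def summarise(lines: Iterable[str]) -> dict[str, Any]:
    """Answer: any calls, which sessions, which operations, any failures."""
    records = [r for r in map(parse_audit_line, lines) if r is not None]
    return {
        "calls": len(records),
        "sessions": {(r.get("bearer_fp"), r.get("mcp_client_id")) for r in records},
        "operations": Counter(r.get("operation_id") for r in records),
        "failures": [r for r in records if r.get("error") is not None],
    }
```

Feed it the lines from the Tuesday window (for example, the contents of the exported file for that day) and read the four keys in order.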
What you cannot answer from the log alone:
- What did they conclude? That’s in the chat transcript on their side, or in the report they wrote.
- Did they look at all 87 IP speakers, or just the first page? You see the parameters passed; you don’t see what the LLM did with the response.
For most compliance flows, “the log shows what API surface was read, by whom, at what time” is enough. For deeper audits, pair the log with the human’s notes. The two together are the picture.