Normalization (v1)

FailMemory caches failure patterns by a SHA-256 hash of a canonical form of the request. Two logically-identical requests must hash to the same value; two different requests must not. The normalization rules below are the contract that makes this work.

Normalization is versioned. The current version is norm_v1. Every pattern record stores its norm_version so a future v2 can coexist with v1 entries during migration without nuking the cache.

Why normalize at all

Two forces are in tension:

  1. Agents retrying the "same" request often vary trivial details — parameter order, cookie headers, auth tokens, a trailing slash — and the cache has to recognize them as the same thing or the hit rate tanks.
  2. PII and secrets must never enter the database. Once they do, GDPR obligations attach and the whole service turns into a compliance liability. PII is stripped at write time, not read time (D-038) — it never enters the system.

The rules (v1)

Applied in this order to every request before hashing:

  1. Lowercase the HTTP method. GET, get, Get all become get.
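  2. Canonical URL encoding (RFC 3986). Percent-encoded characters are normalized to their canonical form; under RFC 3986 that means uppercase hex digits in percent-escapes and no escaping of unreserved characters.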
  3. Strip trailing slash from the path. The root path / is preserved as a special case.
  4. Strip PII query parameters. Any query parameter whose name or value matches one of the PII patterns below is removed entirely:
    • email — values that look like an email address, such as user@example.com
    • phone — phone number values, bracketed or unbracketed, with or without country code
    • UUID — values matching the UUID v1–v5 pattern, case-insensitive
    • JWT — three base64url segments separated by dots
    • session tokens — param names like session_id, sessionToken, sess, or any value that is 32+ hex characters regardless of the param name
    • API keys — param names like api_key, access_token, apiToken, apikey
  5. Sort remaining query parameters alphabetically by name.
  6. Strip sensitive headers. Authorization, Cookie, and X-API-Key are removed from any captured header set before hashing.
  7. Sort JSON object keys recursively. Objects, nested objects, and objects inside arrays all have their keys sorted alphabetically at every level.
  8. Non-JSON payloads become the empty string. A raw text body, a binary blob, a null, and an unparseable JSON string are all coerced to "" before hashing, so the cache key for an unknown payload is stable.
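
The PII check in rule 4 can be sketched as a single predicate over a parameter's name and value. This is a minimal Python sketch, not FailMemory's implementation: the regular expressions, the SENSITIVE_NAMES set, and the function name is_pii_param are all assumptions chosen to match the categories listed above.

```python
import re

# Assumed patterns: the rules name the categories, not the exact expressions.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?\d{0,3}[\s.-]?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$")
UUID_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$",
    re.IGNORECASE,
)
# Follows "three base64url segments separated by dots" literally,
# so it also matches short dotted values such as 1.2.3.
JWT_RE = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+$")
LONG_HEX_RE = re.compile(r"^[0-9a-fA-F]{32,}$")

VALUE_PATTERNS = (EMAIL_RE, PHONE_RE, UUID_RE, JWT_RE, LONG_HEX_RE)

# Lowercased parameter names treated as session tokens or API keys.
SENSITIVE_NAMES = {
    "session_id", "sessiontoken", "sess",
    "api_key", "access_token", "apitoken", "apikey",
}


def is_pii_param(name: str, value: str) -> bool:
    """Return True if the parameter should be removed before hashing."""
    if name.lower() in SENSITIVE_NAMES:
        return True
    return any(pattern.match(value) for pattern in VALUE_PATTERNS)
```

Under these assumed patterns, a parameter such as session_ref is caught by its UUID value even though its name is not on the list, while api_key is caught by name regardless of its value.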

The canonical form is then method + "\n" + url + "\n" + payload, and the hash is sha256(canonical).
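
One possible shape for the whole pipeline is sketched below in Python. The function names (canonical_url, canonical_payload, canonical_form, request_hash) and the name-only default_is_pii predicate are hypothetical. parse_qsl and urlencode stand in for rule 2 rather than implementing full RFC 3986 normalization, header stripping (rule 6) is left out because headers are not part of the canonical form, and treating only JSON objects and arrays as valid payloads, UTF-8 encoding, and a hex digest are assumptions.

```python
import hashlib
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical name list; value-based patterns (email, UUID, JWT, ...) are omitted here.
SENSITIVE_NAMES = {
    "session_id", "sessiontoken", "sess",
    "api_key", "access_token", "apitoken", "apikey",
}


def default_is_pii(name, value):
    """Placeholder predicate: checks the parameter name only."""
    return name.lower() in SENSITIVE_NAMES


def canonical_url(url, is_pii=default_is_pii):
    parts = urlsplit(url)
    path = parts.path
    # Rule 3: strip the trailing slash, but keep the root path "/".
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"
    # Rule 4: drop PII params. Rule 5: sort the rest by name (stable sort).
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not is_pii(name, value)
    ]
    params.sort(key=lambda pair: pair[0])
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(params), parts.fragment)
    )


def canonical_payload(body):
    # Rule 8: anything that is not parseable JSON becomes "".
    try:
        parsed = json.loads(body)
    except (TypeError, ValueError):
        return ""
    # Assumption: JSON scalars (including null) also count as "unknown payload".
    if not isinstance(parsed, (dict, list)):
        return ""
    # Rule 7: sort_keys=True sorts keys at every nesting level.
    return json.dumps(parsed, sort_keys=True, separators=(",", ":"))


def canonical_form(method, url, body=None):
    # Rule 1 lowercases the method; the three parts are joined with newlines.
    return method.lower() + "\n" + canonical_url(url) + "\n" + canonical_payload(body)


def request_hash(method, url, body=None):
    return hashlib.sha256(canonical_form(method, url, body).encode("utf-8")).hexdigest()
```

With this sketch, GET https://api.example.com/api/v2/products/?page=1&limit=10 and get https://api.example.com/api/v2/products?limit=10&page=1 produce the same hash, which is the property the rules exist to guarantee.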

Worked examples

Example 1 — email in query value is stripped

Input:

GET https://api.example.com/users?email=user@example.com&page=2

Canonical form:

get
https://api.example.com/users?page=2

The email param matches the email PII pattern and is removed entirely. With page=2 as the only remaining param, no re-sort is needed. The method is lowercased, and because the request has no body the payload line is empty.

Example 2 — trailing slash stripped and params sorted

Input:

GET https://api.example.com/api/v2/products/?page=1&limit=10

Canonical form:

get
https://api.example.com/api/v2/products?limit=10&page=1

Trailing slash on products/ is removed. page=1&limit=10 is re-sorted alphabetically to limit=10&page=1.

Example 3 — mixed PII and non-PII params

Input:

GET https://api.example.com/search?q=cats&api_key=secret&page=1&sort=asc

Canonical form:

get
https://api.example.com/search?page=1&q=cats&sort=asc

The api_key param is stripped because its name matches the API-key pattern. The remaining three params are sorted alphabetically.

Example 4 — UUID param stripped entirely

Input:

DELETE https://api.example.com/sessions?session_ref=123e4567-e89b-12d3-a456-426614174000

Canonical form:

delete
https://api.example.com/sessions

The UUID value triggers the UUID PII pattern. Since it was the only query param, the normalized URL has no query string at all.

Example 5 — recursive JSON key sort in the payload

Input:

POST https://api.example.com/nested
Body: { "z_top": { "z_inner": 1, "a_inner": 2 }, "a_top": { "b": 3, "a": 4 } }

Canonical form:

post
https://api.example.com/nested
{"a_top":{"a":4,"b":3},"z_top":{"a_inner":2,"z_inner":1}}

Keys are sorted at every level of the object tree. Arrays of objects get the same treatment — each element's keys are sorted in place, but array order is preserved.
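
The recursion behind rule 7 can be written out explicitly. The Python sketch below is illustrative only; sort_keys_recursive is a hypothetical helper name, and the compact separators are an assumption taken from the canonical form shown in this example.

```python
import json


def sort_keys_recursive(value):
    """Return a copy of a parsed JSON value with object keys sorted at every level."""
    if isinstance(value, dict):
        # Rebuild the object with keys in alphabetical order, recursing into values.
        return {key: sort_keys_recursive(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Array order is preserved; only the elements themselves are normalized.
        return [sort_keys_recursive(item) for item in value]
    return value


body = '{ "z_top": { "z_inner": 1, "a_inner": 2 }, "a_top": { "b": 3, "a": 4 } }'
canonical = json.dumps(sort_keys_recursive(json.loads(body)), separators=(",", ":"))
print(canonical)
# {"a_top":{"a":4,"b":3},"z_top":{"a_inner":2,"z_inner":1}}
```

In Python the same output comes from json.dumps(..., sort_keys=True); the explicit function is shown only to make the per-level behaviour visible.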

Why these particular rules

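The rules were picked to satisfy two tests, one for each of the forces described under "Why normalize at all":

  1. Does the rule collapse trivially different versions of the same request into one cache key, so the hit rate holds up?
  2. Does the rule keep PII and secrets out of the database?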

Rules that don't clearly help one of these tests are out of scope for v1. Domain-specific normalizers (/foo/:id style placeholders) and user-configurable rules have been considered and rejected — they push complexity into the hot path without a commensurate hit-rate win.

norm_version and migrations

Every pattern record stores its norm_version (e.g. norm_v1). Lookups are scoped to the current version. When a rule changes, the version bumps and new reports land under norm_v2. Old norm_v1 entries continue to serve traffic during a migration window until they either expire naturally (via the 24-hour TTL) or are re-confirmed under the new rules. Callers never see a partially-migrated cache.
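
One way to picture version-scoped lookups during a migration window is the in-memory Python sketch below. Everything in it is an assumption made for illustration: the PatternRecord and PatternStore names, the confirmed_at field, and in particular the fallback order (current version first, then versions still being migrated). Because each version normalizes differently, the caller passes one hash per version.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple

TTL_SECONDS = 24 * 60 * 60  # the 24-hour TTL mentioned above


@dataclass
class PatternRecord:
    request_hash: str
    norm_version: str
    confirmed_at: float  # Unix timestamp of the last confirmation (assumed field)


class PatternStore:
    def __init__(self, current_version: str, migrating_from: Sequence[str] = ()):
        # Versions are tried in this order: current first, then older ones.
        self.versions = [current_version, *migrating_from]
        self._records: Dict[Tuple[str, str], PatternRecord] = {}

    def put(self, record: PatternRecord) -> None:
        # Records are keyed by (norm_version, hash), so versions never collide.
        self._records[(record.norm_version, record.request_hash)] = record

    def lookup(self, hashes: Dict[str, str], now: float) -> Optional[PatternRecord]:
        """hashes maps norm_version -> hash of the request under that version's rules."""
        for version in self.versions:
            request_hash = hashes.get(version)
            if request_hash is None:
                continue
            record = self._records.get((version, request_hash))
            # Expired records are ignored, which is how old entries age out.
            if record is not None and now - record.confirmed_at < TTL_SECONDS:
                return record
        return None
```

Once the migration window closes, constructing the store without migrating_from makes lookups see only current-version records, and any remaining old entries simply expire.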