Normalization (v1)
FailMemory caches failure patterns by a SHA-256 hash of a canonical form of the request. Two logically identical requests must hash to the same value; two different requests must not. The normalization rules below are the contract that makes this work.
Normalization is versioned. The current version is norm_v1. Every
pattern record stores its norm_version so a future v2 can coexist with
v1 entries during migration without nuking the cache.
Why normalize at all
Two forces are in tension:
- Agents retrying the "same" request often vary trivial details — parameter order, cookie headers, auth tokens, a trailing slash — and the cache has to recognize them as the same thing or the hit rate tanks.
- PII and secrets must never enter the database. Once they do, GDPR obligations attach and the whole service turns into a compliance liability. PII is stripped at write time, not read time (D-038) — it never enters the system.
The rules (v1)
Applied in this order to every request before hashing:
- Lowercase the HTTP method. GET, get, and Get all become get.
- Canonical URL encoding (RFC 3986). Percent-encoded characters are normalized to their canonical form.
- Strip trailing slash from the path. The root path / is preserved as a special case.
- Strip PII query parameters. Any query parameter whose name or value matches one of the PII patterns below is removed entirely:
  - email: values that look like an email address (name@domain)
  - phone: phone number values, bracketed or unbracketed, with or without country code
  - UUID: values matching the UUID v1–v5 pattern, case-insensitive
  - JWT: three base64url segments separated by dots
  - session tokens: param names like session_id, sessionToken, or sess, or any value that is 32+ hex characters regardless of the param name
  - API keys: param names like api_key, access_token, apiToken, or apikey
- Sort remaining query parameters alphabetically by name.
- Strip sensitive headers. Authorization, Cookie, and X-API-Key are removed from any captured header set before hashing.
- Sort JSON object keys recursively. Objects, nested objects, and objects inside arrays all have their keys sorted alphabetically at every level.
- Non-JSON payloads become the empty string. A raw text body, a binary blob, a null, or an unparseable JSON string are all coerced to "" before hashing, so the cache key for "unknown payload" is stable.
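The strip rules can be approximated with a handful of regular expressions. This is an illustrative sketch only: the regexes, the helper names is_pii_param and strip_sensitive_headers, and the name-matching scheme (lowercase, then drop "_" and "-") are assumptions, not FailMemory's published patterns. The classifier takes an already percent-decoded parameter name and value and reports whether the parameter should be dropped.

```python
import re
from typing import Dict

# Illustrative approximations of the v1 PII classes, not the published patterns.
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")
UUID_V1_V5 = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[1-5][0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}",
    re.IGNORECASE,
)
JWT = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")
HEX_32_PLUS = re.compile(r"[0-9a-fA-F]{32,}")
PHONE_CHARS = re.compile(r"\+?[\d\s().-]+")

# Assumed name matching: lowercase and drop "_" and "-", so
# session_id, sessionId and session-id all become "sessionid".
SESSION_NAMES = {"sessionid", "sessiontoken", "sess"}
API_KEY_NAMES = {"apikey", "accesstoken", "apitoken"}
SENSITIVE_HEADERS = {"authorization", "cookie", "x-api-key"}


def _squash(name: str) -> str:
    return name.lower().replace("_", "").replace("-", "")


def _looks_like_phone(value: str) -> bool:
    # Assumption: only phone punctuation plus 10 to 15 digits. This also
    # matches bare 10-15 digit numeric IDs; the real rule may be stricter.
    if not PHONE_CHARS.fullmatch(value):
        return False
    return 10 <= sum(ch.isdigit() for ch in value) <= 15


def is_pii_param(name: str, value: str) -> bool:
    """Return True if a percent-decoded query parameter should be dropped."""
    squashed = _squash(name)
    if squashed in SESSION_NAMES or squashed in API_KEY_NAMES:
        return True
    return bool(
        EMAIL.fullmatch(value)
        or UUID_V1_V5.fullmatch(value)
        or JWT.fullmatch(value)
        or HEX_32_PLUS.fullmatch(value)
        or _looks_like_phone(value)
    )


def strip_sensitive_headers(headers: Dict[str, str]) -> Dict[str, str]:
    # HTTP header names are case-insensitive, so compare lowercased names.
    return {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}
```

Read literally, the JWT rule matches any three dot-separated base64url runs, so values such as 1.2.3 or www.example.com are stripped too; a stricter detector might also require the eyJ prefix that a base64url-encoded JSON header typically starts with.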
The canonical form is then method + "\n" + url + "\n" + payload, and
the hash is sha256(canonical).
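A minimal sketch of this final step, assuming the URL and payload have already been normalized by the rules above. The function names canonical_form and request_hash are hypothetical, and encoding the canonical string as UTF-8 and returning a lowercase hex digest are assumptions; the page only specifies sha256(canonical).

```python
import hashlib


def canonical_form(method: str, url: str, payload: str) -> str:
    # url and payload are assumed to be normalized already; only the
    # method-lowercasing rule is applied here.
    return method.lower() + "\n" + url + "\n" + payload


def request_hash(method: str, url: str, payload: str) -> str:
    # Assumption: UTF-8 bytes in, lowercase hex digest out.
    canonical = canonical_form(method, url, payload)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

For a request with no body the payload is "", so the canonical string ends in a newline; the two-line canonical forms in the worked examples carry an implicit empty third line. Because the method is lowercased before hashing, request_hash("GET", url, "") and request_hash("get", url, "") produce the same key.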
Worked examples
Example 1 — email in query value is stripped
Input:
GET https://api.example.com/users?email=user@example.com&page=2
Canonical form:
get
https://api.example.com/users?page=2
The email param matches the email PII pattern and is removed entirely.
The remaining page=2 is already the only param, so no re-sort is
required. Method is lowercased.
Example 2 — trailing slash stripped and params sorted
Input:
GET https://api.example.com/api/v2/products/?page=1&limit=10
Canonical form:
get
https://api.example.com/api/v2/products?limit=10&page=1
Trailing slash on products/ is removed. page=1&limit=10 is re-sorted
alphabetically to limit=10&page=1.
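Example 2 can be reproduced with the standard library's urllib.parse. This sketch covers the percent-encoding, trailing-slash, and sort rules and delegates PII filtering to an is_pii callback, so any detector can be plugged in. The helper names, reading "canonical form" as RFC 3986 §6.2.2.1 and §6.2.2.2 (uppercase hex digits, decode percent-encoded unreserved characters), and passing scheme, host, and fragment through unchanged are all assumptions.

```python
import re
from typing import Callable
from urllib.parse import unquote, urlsplit, urlunsplit

UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PCT_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent(text: str) -> str:
    """RFC 3986 sec. 6.2.2.1-6.2.2.2: uppercase hex, decode unreserved chars."""
    def fix(m: "re.Match[str]") -> str:
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else "%" + m.group(1).upper()

    return PCT_TRIPLET.sub(fix, text)


def normalize_url(
    url: str, is_pii: Callable[[str, str], bool] = lambda name, value: False
) -> str:
    parts = urlsplit(normalize_percent(url))

    # Drop one trailing slash, but keep the root path "/" as-is.
    path = parts.path
    if path.endswith("/") and path != "/":
        path = path[:-1]

    # Filter PII params (the callback sees decoded text), then sort by name.
    kept = []
    for pair in parts.query.split("&"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        if not is_pii(unquote(name), unquote(value)):
            kept.append((name, pair))
    kept.sort(key=lambda item: item[0])
    query = "&".join(pair for _, pair in kept)

    # Scheme, host and fragment are passed through unchanged (assumption).
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

Python's sort is stable, so repeated parameter names keep their original relative order; the v1 rules as written don't say how duplicates should be ordered. When every parameter is filtered out, urlunsplit omits the "?" entirely, which matches the behavior described in Example 4.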
Example 3 — mixed PII and non-PII params
Input:
GET https://api.example.com/search?q=cats&api_key=secret&page=1&sort=asc
Canonical form:
get
https://api.example.com/search?page=1&q=cats&sort=asc
The api_key param is stripped (PII: API key). The remaining three
params are sorted alphabetically.
Example 4 — UUID param stripped entirely
Input:
DELETE https://api.example.com/sessions?session_ref=123e4567-e89b-12d3-a456-426614174000
Canonical form:
delete
https://api.example.com/sessions
The UUID value triggers the UUID PII pattern. Since it was the only query param, the normalized URL has no query string at all.
Example 5 — recursive JSON key sort in the payload
Input:
POST https://api.example.com/nested
Body: { "z_top": { "z_inner": 1, "a_inner": 2 }, "a_top": { "b": 3, "a": 4 } }
Canonical form:
post
https://api.example.com/nested
{"a_top":{"a":4,"b":3},"z_top":{"a_inner":2,"z_inner":1}}
Keys are sorted at every level of the object tree. Arrays of objects get the same treatment — each element's keys are sorted in place, but array order is preserved.
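The payload rules can be sketched as a recursive rebuild followed by compact serialization. The names sort_keys_deep and normalize_payload are hypothetical. Using Python's default string order (Unicode code point) for "alphabetically" is an assumption, as is treating only a missing body (None) as the null case; a body consisting of the literal text null is valid JSON and is kept as "null".

```python
import json
from typing import Any, Optional, Union


def sort_keys_deep(value: Any) -> Any:
    """Rebuild dicts with sorted keys at every level; lists keep their order."""
    if isinstance(value, dict):
        return {key: sort_keys_deep(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        return [sort_keys_deep(item) for item in value]
    return value


def normalize_payload(body: Optional[Union[str, bytes]]) -> str:
    if body is None:  # missing body, treated as the "null" case
        return ""
    try:
        parsed = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return ""  # raw text, binary blobs, unparseable JSON
    # Compact separators reproduce the canonical form shown in Example 5.
    return json.dumps(sort_keys_deep(parsed), separators=(",", ":"))
```

Python dicts preserve insertion order and json.dumps writes keys in that order, so rebuilding each dict from its sorted keys is enough; json.dumps(parsed, sort_keys=True, separators=(",", ":")) would produce the same string in one call.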
Why these particular rules
The rules were picked to satisfy two tests:
- The PII test. Every rule in the "strip" category eliminates a common class of secret or personal data that an agent might otherwise paste into a URL without thinking. The header strip catches Authorization and Cookie because those are the top two sources of accidental auth leakage.
- The hit-rate test. Every rule in the "canonicalize" category (method case, param order, trailing slash, JSON key sort) eliminates a source of false cache misses where the agent sent "the same" request and the cache failed to recognize it.
Rules that don't clearly help one of these tests are out of scope for
v1. Domain-specific normalizers (/foo/:id style placeholders) and
user-configurable rules have been considered and rejected — they push
complexity into the hot path without a commensurate hit-rate win.
norm_version and migrations
Every pattern record stores its norm_version (e.g. norm_v1).
Lookups are scoped to the current version. When a rule changes, the
version bumps and new reports land under norm_v2. Old norm_v1
entries continue to serve traffic during a migration window until they
either expire naturally (via the 24-hour TTL) or are re-confirmed
under the new rules. Callers never see a partially migrated cache.
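A toy in-memory store illustrating version-scoped records and the 24-hour TTL. PatternStore, PatternRecord, the details field, and keying records on (norm_version, request_hash) are assumptions for illustration. The page does not describe how norm_v1 entries are consulted during the migration window, so this sketch simply defaults lookups to the current version and lets a caller pass another version explicitly.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

CURRENT_NORM_VERSION = "norm_v1"
TTL_SECONDS = 24 * 60 * 60  # the 24-hour TTL


@dataclass
class PatternRecord:
    request_hash: str
    norm_version: str
    created_at: float
    details: str  # hypothetical stand-in for the stored failure pattern


class PatternStore:
    """In-memory sketch keyed by (norm_version, request_hash)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._records: Dict[Tuple[str, str], PatternRecord] = {}

    def put(self, request_hash: str, details: str,
            norm_version: str = CURRENT_NORM_VERSION) -> None:
        record = PatternRecord(request_hash, norm_version, self._clock(), details)
        self._records[(norm_version, request_hash)] = record

    def get(self, request_hash: str,
            norm_version: str = CURRENT_NORM_VERSION) -> Optional[PatternRecord]:
        key = (norm_version, request_hash)
        record = self._records.get(key)
        if record is None:
            return None
        if self._clock() - record.created_at >= TTL_SECONDS:
            del self._records[key]  # expired: drop lazily on read
            return None
        return record
```

Because the version is part of the key, the same request hash written under norm_v1 and norm_v2 occupies two separate slots, so a rule change never overwrites or mixes old entries with new ones.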