Field notes

What changed after testing a prompt secret scanner against realistic false positives.

Prompt Leak Guard started as a small local scanner for API keys in AI prompts. The hard part was not adding more patterns; it was deciding which matches should stay quiet.

The useful version is conservative.

A prompt leak scanner lives in an uncomfortable spot. If it is too quiet, it misses the exact mistake it exists to catch. If it is too loud, people stop trusting it and paste around it.

The current build is intentionally pattern-based and local. It checks text in the browser for common high-risk shapes: AI provider keys, cloud credentials, package tokens, webhook URLs, private key blocks, validated JWTs, credential-bearing database URLs, signed URLs, payment cards, emails, phone numbers, and dashed US SSNs.
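
To make "pattern-based" concrete, here is a minimal sketch of a detector table covering a few of the shapes listed above. It is not the product's rule set: the names `DETECTORS`, `Detector`, `Finding`, and `scanText` are hypothetical, and the regexes are simplified illustrations (the `AKIA` prefix is the publicly documented shape of AWS access key IDs).

```typescript
// Hypothetical detector table: names and regexes are illustrative only,
// not the rules Prompt Leak Guard actually ships.
interface Detector {
  kind: string;
  pattern: RegExp;
}

interface Finding {
  kind: string;
  match: string;
  index: number;
}

const DETECTORS: Detector[] = [
  // PEM header that opens a private key block (RSA, EC, OPENSSH, or unlabelled).
  { kind: "private-key-block", pattern: /-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----/ },
  // AWS access key ID shape: "AKIA" followed by 16 uppercase alphanumerics.
  { kind: "aws-access-key-id", pattern: /\bAKIA[0-9A-Z]{16}\b/ },
  // Database URL that carries user:password@ before the host.
  {
    kind: "db-url-with-credentials",
    pattern: /\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?):\/\/[^\s:@\/]+:[^\s@\/]+@[^\s\/]+/,
  },
  // US SSN in dashed form only.
  { kind: "us-ssn-dashed", pattern: /\b\d{3}-\d{2}-\d{4}\b/ },
];

function scanText(text: string): Finding[] {
  const findings: Finding[] = [];
  for (const detector of DETECTORS) {
    // Fresh global copy per scan so lastIndex state never leaks between calls.
    const re = new RegExp(detector.pattern.source, "g");
    let m: RegExpExecArray | null;
    while ((m = re.exec(text)) !== null) {
      findings.push({ kind: detector.kind, match: m[0], index: m.index });
    }
  }
  // Report findings in the order they appear in the text.
  return findings.sort((a, b) => a.index - b.index);
}
```

In this sketch a bare URL such as `postgres://localhost:5432/app` produces no finding, because the pattern requires a `user:password@` part before the host. Everything runs on the text already in memory, which is the property that matters for a local scanner.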

The product claim is deliberately narrow: it reduces obvious accidental prompt leaks. It is not formal DLP, not a secret inventory, and not a substitute for rotating exposed credentials.

The false positives mattered more than the pattern count.

Early versions were too eager around generic token-looking strings. That sounds harmless until you paste a normal trace ID, UUID, documentation example, placeholder key, or known test card and the scanner treats it as a production risk.

What got suppressed

  • Placeholder values like your_api_key_here, <token>, ${TOKEN}, changeme, and masked passwords.
  • Official documentation examples and synthetic sample values that are useful for testing but should not scare a user.
  • UUID-shaped request IDs and trace IDs that can look token-ish but are not secrets by themselves.
  • Invalid JWT-shaped strings and incomplete fragments that only resemble SAS (shared access signature) tokens.
  • Known Stripe test card values and bare database URLs that do not include credentials.
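
A suppression pass for the categories above could look roughly like the sketch below. The names `PLACEHOLDER_PATTERNS`, `UUID_SHAPE`, `KNOWN_TEST_CARDS`, and `suppressionReason` are hypothetical, and the lists are deliberately tiny. It assumes a candidate string has already been pulled out of the text by some detector.

```typescript
// Hypothetical suppression pass; the lists are illustrative, not the product's rules.
const PLACEHOLDER_PATTERNS: RegExp[] = [
  /^your[_-]?api[_-]?key(?:[_-]?here)?$/i, // your_api_key_here
  /^<[^<>]+>$/,                            // <token>
  /^\$\{[A-Za-z_][A-Za-z0-9_]*\}$/,        // ${TOKEN}
  /^changeme$/i,                           // changeme
  /^[*xX]{4,}$/,                           // masked values such as ******** or xxxxxxxx
];

// Canonical 8-4-4-4-12 hex layout used by request IDs and trace IDs.
const UUID_SHAPE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// 4242 4242 4242 4242 is Stripe's documented Visa test card number.
const KNOWN_TEST_CARDS: string[] = ["4242424242424242"];

type SuppressReason = "placeholder" | "uuid-shaped-id" | "known-test-card";

// Returns why a candidate should stay quiet, or null if it should still be reported.
function suppressionReason(candidate: string): SuppressReason | null {
  const value = candidate.trim();
  if (PLACEHOLDER_PATTERNS.some((p) => p.test(value))) return "placeholder";
  if (UUID_SHAPE.test(value)) return "uuid-shaped-id";
  // Compare card numbers with spaces and dashes removed.
  const digits = value.replace(/[ -]/g, "");
  if (/^\d+$/.test(digits) && KNOWN_TEST_CARDS.indexOf(digits) !== -1) return "known-test-card";
  return null;
}
```

Returning a reason instead of a plain boolean is a design choice in this sketch: it makes it easy to log why a match stayed quiet, which helps when someone reports a string that should have fired.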

Conservative does not mean silent. Credential-bearing database URLs still flag. Valid JWTs still flag. Private keys still flag. The goal is fewer noisy warnings without hiding the obvious bad cases.
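
One way to separate a valid JWT from a merely JWT-shaped string is a structural check, sketched below. Here "valid" is my assumption of what validation means locally: three dot-separated base64url segments, a header that decodes to a JSON object with a string `alg`, and a payload that decodes to a JSON object. The signature is not verified, since that would need a key. The function names are hypothetical, and the decoder is hand-rolled only to keep the sketch free of platform APIs.

```typescript
// Hypothetical structural JWT check; it does not verify signatures.
const B64URL_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Decodes unpadded base64url into a string of byte values (enough for ASCII JSON).
function decodeBase64Url(segment: string): string | null {
  if (segment.length === 0 || segment.length % 4 === 1) return null;
  let bits = 0;
  let bitCount = 0;
  let out = "";
  for (let i = 0; i < segment.length; i++) {
    const value = B64URL_ALPHABET.indexOf(segment.charAt(i));
    if (value === -1) return null; // character outside the base64url alphabet
    bits = ((bits << 6) | value) & 0xfff; // keep only the bits still needed
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      out += String.fromCharCode((bits >> bitCount) & 0xff);
    }
  }
  return out;
}

// Decodes a segment and returns it only if it is a JSON object.
function decodeJsonObject(segment: string): Record<string, unknown> | null {
  const decoded = decodeBase64Url(segment);
  if (decoded === null) return null;
  try {
    const value: unknown = JSON.parse(decoded);
    if (typeof value === "object" && value !== null && !Array.isArray(value)) {
      return value as Record<string, unknown>;
    }
    return null;
  } catch {
    return null; // decoded bytes were not JSON
  }
}

function looksLikeValidJwt(candidate: string): boolean {
  const parts = candidate.split(".");
  if (parts.length !== 3) return false;
  const header = decodeJsonObject(parts[0] ?? "");
  const payload = decodeJsonObject(parts[1] ?? "");
  const signatureOk = /^[A-Za-z0-9_-]+$/.test(parts[2] ?? "");
  return header !== null && typeof header["alg"] === "string" && payload !== null && signatureOk;
}
```

Under this check a string like `aaa.bbb.ccc` stays quiet because its first segment does not decode to JSON, while a token with a real header such as `{"alg":"HS256","typ":"JWT"}` still flags.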

The scanner stays local on purpose.

The core tradeoff is privacy versus intelligence. A remote model or cloud API could reason about more context, but the text being scanned may already contain the sensitive value. Sending that text somewhere else would undermine the point.

Prompt Leak Guard uses static JavaScript and offline license activation. The scanner itself has no backend service, no analytics SDK, no remote model call, and no upload of prompt text. The free web demo runs the same style of local detection, so people can test the workflow before reaching checkout.

What I would still like tested.

The useful feedback is specific: a real-looking pattern that should have fired, or a normal string that should have stayed clear. Broad requests to "detect everything" are not actionable, because secret formats are messy and many random strings look suspicious out of context.

  • Cloud/provider tokens that have stable prefixes or reliable surrounding context.
  • Common false positives from logs, CI output, support tickets, and AI debugging prompts.
  • Cases where sanitized output keeps too little or too much structure for the prompt to remain useful.
  • Supported-site warning behavior in real prompt boxes across ChatGPT, Claude, Gemini, and Perplexity.
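
For the sanitized-output question, it helps to have a baseline to compare against. The sketch below is one possible sanitizer, not the product's: it swaps each flagged span for a typed placeholder so the surrounding prompt keeps its shape. `RedactionSpan` and `redactSpans` are hypothetical names, and the sketch assumes spans do not overlap.

```typescript
// Hypothetical sanitizer: replaces flagged spans with typed placeholders.
interface RedactionSpan {
  kind: string;   // label shown in the placeholder
  index: number;  // start offset of the flagged text
  length: number; // number of characters to replace
}

function redactSpans(text: string, spans: RedactionSpan[]): string {
  // Work right to left so earlier offsets stay valid after each replacement.
  const ordered = spans.slice().sort((a, b) => b.index - a.index);
  let out = text;
  for (const span of ordered) {
    out =
      out.slice(0, span.index) +
      "[REDACTED:" + span.kind + "]" +
      out.slice(span.index + span.length);
  }
  return out;
}
```

With this baseline, `DATABASE_URL=postgres://app:s3cret@db/prod` becomes `DATABASE_URL=[REDACTED:db-url-with-credentials]` when the whole URL is the flagged span. That keeps the variable name but drops the host and database name, which is exactly the kind of "too little structure" case worth reporting.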

The demo is here: Prompt Leak Guard demo. The installable launch build is here: Prompt Leak Guard product page.