OpenAI Privacy Filter
OpenAI's Apache 2.0 open-weight model for local PII detection and redaction workflows.
Overview
Freshness note: Model capabilities, limits, and availability can change quickly. This profile is a point-in-time snapshot last verified on April 24, 2026.
OpenAI Privacy Filter is an open-weight model for detecting and redacting personally identifiable information (PII) in text. OpenAI released it on April 22, 2026 under the Apache 2.0 license for teams that want privacy filtering to run locally before data enters training, indexing, logging, support review, or other AI pipelines.
This is not a general chat model. It is a specialized token-classification model for privacy infrastructure.
Capabilities
Privacy Filter is designed for high-throughput PII detection in messy unstructured text. OpenAI highlights context-aware detection, local execution, long inputs, single-pass processing, and configurable operating points for recall and precision.
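The sources used for this snapshot do not describe how operating points are exposed. As a minimal sketch, assume the model yields a per-token PII score and that an operating point is simply a threshold on that score. The scores, gold labels, and the precision_recall helper below are hypothetical. Sweeping the threshold shows the trade-off: a lower threshold flags more tokens, which raises recall at the cost of precision.

```python
from typing import List, Tuple


def precision_recall(scores: List[float], gold: List[int], threshold: float) -> Tuple[float, float]:
    """Flag tokens whose score meets the threshold and compare against gold labels (1 = PII)."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(predicted, gold) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(predicted, gold) if p == 0 and g == 1)
    # Convention: with no predictions (or no positives), report 1.0 instead of dividing by zero.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall


# Hypothetical per-token scores and gold labels, for illustration only.
scores = [0.95, 0.80, 0.40, 0.30, 0.10, 0.65]
gold = [1, 1, 1, 0, 0, 0]

for t in (0.9, 0.5, 0.25):
    p, r = precision_recall(scores, gold, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

With these toy values, thresholds of 0.9, 0.5, and 0.25 give precision/recall of about 1.00/0.33, 0.67/0.67, and 0.60/1.00.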
The model predicts spans across categories such as private persons, addresses, emails, phone numbers, URLs, private dates, account numbers, and secrets. That makes it relevant for log scrubbing, dataset preparation, support-ticket review, enterprise search preprocessing, and guardrails around AI ingestion.
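Once spans are available, the redaction step is mostly bookkeeping. The sketch below assumes spans arrive as hypothetical (start, end, label) character offsets, with end exclusive and no overlaps. The label strings are illustrative and not necessarily the model's actual label names.

```python
from typing import List, Tuple

# Hypothetical span format: (start, end, label) character offsets, end exclusive.
Span = Tuple[int, int, str]


def redact(text: str, spans: List[Span]) -> str:
    """Replace each (non-overlapping) span with a [LABEL] placeholder."""
    # Work from the last span backwards so earlier offsets stay valid after each replacement.
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        text = text[:start] + f"[{label.upper()}]" + text[end:]
    return text


message = "Contact Jane Doe at jane@example.com or 555-0100."
spans = [(8, 16, "private_person"), (20, 36, "email"), (40, 48, "phone_number")]
print(redact(message, spans))
```

With the example spans, this prints "Contact [PRIVATE_PERSON] at [EMAIL] or [PHONE_NUMBER]." Typed placeholders keep the redacted text readable for downstream review while removing the values themselves.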
Technical Details
OpenAI describes Privacy Filter as a bidirectional token-classification model with span decoding. It begins from an autoregressive pretrained checkpoint, then is adapted into a token classifier over a fixed privacy-label taxonomy. It labels tokens in one pass and decodes spans with a constrained Viterbi procedure.
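The sources used for this snapshot do not spell out the tag scheme or the exact constraints. The sketch below assumes a BIO tagging scheme with hard constraints: an I-X tag may only follow B-X or I-X, and a sequence may not start with I-X. It also assumes no learned transition scores. The tag names, toy scores, and helpers (allowed, viterbi, to_spans, row) are hypothetical. The sketch shows how constrained Viterbi decoding turns per-token scores into well-formed spans.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical BIO tag set for illustration; the real label taxonomy and scheme may differ.
TAGS = ["O", "B-EMAIL", "I-EMAIL", "B-PERSON", "I-PERSON"]


def allowed(prev: str, curr: str) -> bool:
    """BIO constraint: an I-X tag may only follow B-X or I-X."""
    if curr.startswith("I-"):
        return prev in (f"B-{curr[2:]}", curr)
    return True


def viterbi(log_scores: List[Dict[str, float]]) -> List[str]:
    """Highest-scoring tag path (sum of per-token log-scores) that obeys the BIO constraint."""
    # A sequence may not begin with an I-X tag.
    best = {t: -math.inf if t.startswith("I-") else log_scores[0][t] for t in TAGS}
    backptrs: List[Dict[str, str]] = []
    for scores in log_scores[1:]:
        new_best: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for curr in TAGS:
            # Only predecessors permitted by the constraint compete for this tag.
            score, prev = max((best[p], p) for p in TAGS if allowed(p, curr))
            new_best[curr] = score + scores[curr]
            ptr[curr] = prev
        best = new_best
        backptrs.append(ptr)
    # Trace back from the best final tag.
    tag = max(TAGS, key=lambda t: best[t])
    path = [tag]
    for ptr in reversed(backptrs):
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]


def to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group B-X, I-X, ... runs into (start_token, end_token_exclusive, X) spans."""
    spans: List[Tuple[int, int, str]] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes a span at the end
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def row(probs: Dict[str, float], floor: float = 0.01) -> Dict[str, float]:
    """Toy per-token log-scores; tags not listed get a small floor value."""
    return {t: math.log(probs.get(t, floor)) for t in TAGS}


emissions = [
    row({"O": 0.9}),                        # "Ping"
    row({"I-EMAIL": 0.5, "B-EMAIL": 0.4}),  # "jane@example": per-token argmax would be invalid
    row({"I-EMAIL": 0.8}),                  # ".com"
]
tags = viterbi(emissions)
print(tags)
print(to_spans(tags))
```

On token 1 the highest single score is I-EMAIL, which cannot follow O. The constrained decoder picks B-EMAIL instead and returns O, B-EMAIL, I-EMAIL, which yields one EMAIL span covering tokens 1 and 2.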
The released model supports a context of up to 128,000 tokens and has 1.5B total parameters, of which 50M are active. In this repository, maxOutput is set to 0 intentionally, because the model outputs labeled spans rather than free-form generated text.
Pricing & Access
Privacy Filter is available under Apache 2.0 on Hugging Face and GitHub. It is intended for experimentation, customization, fine-tuning, and commercial deployment.
The sources used for this snapshot list no hosted OpenAI API pricing for this model. Treat it as a local or self-managed open-weight privacy component unless OpenAI later exposes a hosted endpoint.
Best Use Cases
Use Privacy Filter when the primary requirement is reducing personal-data exposure before documents, logs, or datasets move into broader AI systems. It is strongest as one layer in a privacy-by-design pipeline.
Do not treat it as a compliance certificate, a full anonymization system, or a substitute for domain policy. OpenAI explicitly notes that high-sensitivity legal, medical, and financial workflows still need human review, evaluation, and often domain-specific tuning.
Comparisons
- Rule-based PII detection: Simpler and predictable for known formats, but weaker on context-dependent private references.
- General LLM redaction prompts: More flexible, but usually slower, costlier, and harder to run before data leaves a local environment.
- Self-hosted open-weight filters: Privacy Filter is notable because it combines open weights, long context, and a narrow privacy taxonomy from OpenAI.
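These trade-offs suggest a layered setup, consistent with treating Privacy Filter as one layer of a pipeline. A rule-based pass handles well-known formats, and model-predicted spans handle context-dependent references. The minimal sketch below assumes both detectors return (start, end, label) character spans. EMAIL_RE is a deliberately simple, hypothetical pattern, and model_spans is a hard-coded stand-in rather than real Privacy Filter output. When spans overlap, they are merged into one span that keeps the label of the earliest.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]

# Hypothetical, non-exhaustive email pattern standing in for a rule-based layer.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def rule_spans(text: str) -> List[Span]:
    """Rule-based detector: every regex match becomes an 'email' span."""
    return [(m.start(), m.end(), "email") for m in EMAIL_RE.finditer(text)]


def merge_spans(*sources: List[Span]) -> List[Span]:
    """Union spans from several detectors; overlapping spans collapse into one."""
    merged: List[Span] = []
    for start, end, label in sorted(s for src in sources for s in src):
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged


text = "Ticket from Jane Doe, reply to jane@example.com."
# Hard-coded stand-in for model output: a name plus a partial email span.
model_spans = [(12, 20, "private_person"), (31, 43, "email")]
print(merge_spans(model_spans, rule_spans(text)))
```

Here the regex recovers the full address that the stand-in model span only partly covered. The merged result is [(12, 20, 'private_person'), (31, 47, 'email')].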