OpenAI Privacy Filter

OpenAI's Apache 2.0 open-weight model for local PII detection and redaction workflows.

Type: language
Context: 128K tokens
Max Output: N/A
Status: preview
API Access: No
License: Apache 2.0
Tags: privacy, pii-detection, redaction, open-weights, token-classification, security
Released April 2026 · Updated April 24, 2026

Overview

Freshness note: Model capabilities, limits, and availability can change quickly. This profile is a point-in-time snapshot last verified on April 24, 2026.

OpenAI Privacy Filter is an open-weight model for detecting and redacting personally identifiable information in text. OpenAI released it on April 22, 2026 under the Apache 2.0 license for teams that want privacy filtering to run locally before data enters training, indexing, logging, support review, or other AI pipelines.

This is not a general chat model. It is a specialized token-classification model for privacy infrastructure.

Capabilities

Privacy Filter is designed for high-throughput PII detection in messy unstructured text. OpenAI highlights context-aware detection, local execution, long inputs, single-pass processing, and configurable operating points for recall and precision.
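One common way to expose an operating point is a confidence threshold on predicted spans. The sketch below assumes each span carries a score in [0, 1]; the ScoredSpan type, the select_spans helper, the label strings, and the threshold values are all hypothetical illustrations, not the released model's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredSpan:
    start: int    # character offset where the span begins
    end: int      # character offset just past the span
    label: str    # hypothetical category name
    score: float  # assumed model confidence in [0, 1]


def select_spans(spans, threshold):
    """Keep only spans whose confidence meets the threshold."""
    return [s for s in spans if s.score >= threshold]


candidates = [
    ScoredSpan(0, 10, "private_person", 0.97),
    ScoredSpan(25, 37, "phone_number", 0.62),
    ScoredSpan(50, 58, "private_date", 0.31),
]

# A low threshold favors recall: more spans are redacted, including doubtful ones.
high_recall = select_spans(candidates, threshold=0.3)

# A high threshold favors precision: only confident spans are redacted.
high_precision = select_spans(candidates, threshold=0.9)
```

A pipeline feeding a training set might prefer the high-recall setting, while one that must preserve readability for support agents might accept lower recall in exchange for fewer false redactions.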

The model predicts spans across categories such as private persons, addresses, emails, phone numbers, URLs, private dates, account numbers, and secrets. That makes it relevant for log scrubbing, dataset preparation, support-ticket review, enterprise search preprocessing, and guardrails around AI ingestion.
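Once spans are predicted, redaction reduces to replacing each span with a placeholder. The following is a minimal sketch assuming spans arrive as (start, end, label) character offsets; the redact function, the tuple layout, and the label strings are hypothetical and do not reflect the model's actual output format.

```python
def redact(text, spans):
    """Replace each (start, end, label) span with a [LABEL] placeholder."""
    out = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            continue  # skip a span that overlaps one already redacted
        out.append(text[cursor:start])    # keep the text before the span
        out.append(f"[{label.upper()}]")  # substitute the placeholder
        cursor = end
    out.append(text[cursor:])             # keep the trailing text
    return "".join(out)


text = "Contact Jane Doe at jane@example.com."
spans = [(8, 16, "private_person"), (20, 36, "email")]  # assumed model output
print(redact(text, spans))
```

Keeping the category in the placeholder preserves some structure for downstream consumers, such as a log reader who needs to know that an email address was present without seeing it.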

Technical Details

OpenAI describes Privacy Filter as a bidirectional token-classification model with span decoding. It starts from an autoregressive pretrained checkpoint and is then adapted into a token classifier over a fixed privacy-label taxonomy. It labels all tokens in a single pass and decodes spans with a constrained Viterbi procedure.
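To illustrate what constrained Viterbi decoding does in general, the sketch below assumes a BIO tagging scheme in which an "I-" label may only continue a span of the same type. The label set, the constraint, and the scores are hypothetical; the source does not specify the model's actual taxonomy encoding or transition rules.

```python
import math

LABELS = ["O", "B-EMAIL", "I-EMAIL"]  # hypothetical BIO label set


def allowed(prev, cur):
    """Assumed BIO constraint: I-X may only follow B-X or I-X."""
    if cur.startswith("I-"):
        return prev in ("B-" + cur[2:], "I-" + cur[2:])
    return True


def constrained_viterbi(scores):
    """scores[t][k] is the log-score of LABELS[k] at token t."""
    if not scores:
        return []
    n = len(LABELS)
    # A sequence may not start inside a span.
    best = [
        -math.inf if LABELS[k].startswith("I-") else scores[0][k]
        for k in range(n)
    ]
    back = []
    for t in range(1, len(scores)):
        new_best, pointers = [], []
        for k in range(n):
            # Consider only predecessors the constraint permits.
            prev_score, prev_idx = max(
                (best[j], j) for j in range(n) if allowed(LABELS[j], LABELS[k])
            )
            new_best.append(prev_score + scores[t][k])
            pointers.append(prev_idx)
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    k = max(range(n), key=lambda i: best[i])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    return [LABELS[i] for i in reversed(path)]


# Per-token argmax would yield O, I-EMAIL, I-EMAIL, which is not a valid span.
scores = [
    [-0.1, -3.0, -4.0],
    [-2.0, -1.0, -0.5],
    [-2.0, -3.0, -0.2],
]
decoded = constrained_viterbi(scores)
```

The point of the constraint is that the decoder returns the highest-scoring label sequence that forms well-defined spans, rather than the highest-scoring label at each token independently.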

The released model supports up to 128,000 tokens of context and has 1.5B total parameters, of which 50M are active. In this repository, the maxOutput: 0 setting is intentional: the model outputs labeled spans rather than free-form generated text.

Pricing & Access

Privacy Filter is available under Apache 2.0 on Hugging Face and GitHub. It is intended for experimentation, customization, fine-tuning, and commercial deployment.

The sources used for this snapshot list no hosted OpenAI API pricing for this model. Treat it as a local or self-managed open-weight privacy component unless OpenAI later exposes a hosted endpoint.

Best Use Cases

Use Privacy Filter when the primary requirement is reducing personal-data exposure before documents, logs, or datasets move into broader AI systems. It is strongest as one layer in a privacy-by-design pipeline.

Do not treat it as a compliance certificate, a full anonymization system, or a substitute for domain policy. OpenAI explicitly notes that high-sensitivity legal, medical, and financial workflows still need human review, evaluation, and often domain-specific tuning.

Comparisons

  • Rule-based PII detection: Simpler and more predictable for known formats, but weaker on context-dependent private references.
  • General LLM redaction prompts: More flexible, but usually slower, costlier, and harder to run before data leaves a local environment.
  • Self-hosted open-weight filters: Privacy Filter is notable because it combines open weights, long context, and a narrow privacy taxonomy from OpenAI.
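
For contrast with the rule-based approach in the first comparison, here is a minimal regex baseline. The pattern and the find_emails helper are illustrative assumptions, not part of Privacy Filter or any specific library. The pattern finds the fixed-format email address but cannot flag the context-dependent reference to a neighbor's unit.

```python
import re

# A deliberately simple email pattern; production rules are usually stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text):
    """Return (start, end, label) tuples for every email-like match."""
    return [(m.start(), m.end(), "email") for m in EMAIL_RE.finditer(text)]


text = "Mail jane@example.com or ask the neighbor in unit 4B."
matches = find_emails(text)
print(matches)
```

Rules like this remain useful alongside a learned filter as a cheap, auditable check for well-defined formats.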