PII Redaction Pipeline
by Microsoft · unknown · Last verified 2026-03-17
An automated pipeline that uses Microsoft Presidio to detect and remove personally identifiable information (PII) from text and structured data. It supports configurable entity recognizers for GDPR and HIPAA compliance, and offers reversible pseudonymization backed by a secure vault for authorized re-identification.
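As a rough illustration of the detect-then-redact flow described above, the sketch below uses only Python's standard library. It is not Presidio's API: `RECOGNIZERS`, `Finding`, `detect`, and `redact` are hypothetical names, and the two regexes are simplified stand-ins for real recognizers, which also rely on NER models and context. In Presidio itself the comparable steps are `AnalyzerEngine.analyze` and `AnonymizerEngine.anonymize`.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class Finding(NamedTuple):
    """A detected PII span (hypothetical type, for illustration only)."""
    entity_type: str
    start: int
    end: int


# Hypothetical recognizer table: entity label -> simplified regex.
# These patterns are assumptions, not production-grade detectors.
RECOGNIZERS: Dict[str, re.Pattern] = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect(text: str, entities: Optional[List[str]] = None) -> List[Finding]:
    """Return spans for the requested entity types (all types by default)."""
    wanted = entities if entities is not None else list(RECOGNIZERS)
    findings = [
        Finding(name, m.start(), m.end())
        for name in wanted
        for m in RECOGNIZERS[name].finditer(text)
    ]
    return sorted(findings, key=lambda f: f.start)


def redact(text: str, findings: List[Finding]) -> str:
    """Replace each span with a <ENTITY_TYPE> placeholder.

    Spans are processed right to left so earlier offsets stay valid.
    Overlapping spans are not handled in this sketch.
    """
    for f in sorted(findings, key=lambda f: f.start, reverse=True):
        text = text[:f.start] + f"<{f.entity_type}>" + text[f.end:]
    return text


sample = "Contact jane.doe@example.com, SSN 123-45-6789."
redacted_all = redact(sample, detect(sample))
redacted_ssn_only = redact(sample, detect(sample, entities=["US_SSN"]))
```

Passing `entities=["US_SSN"]` restricts detection to one entity type, which is one simple way to picture a configurable recognizer set.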
https://github.com/microsoft/presidio
B (Above Average)
Adoption: B+ · Quality: A · Freshness: A · Citations: B · Engagement: F
Specifications
- License
- MIT
- Pricing
- unknown
- Capabilities
- PII detection in unstructured text, PII detection in structured data, Data redaction (replacement with a placeholder), Data pseudonymization (replacement with a fake but consistent value), Reversible de-identification via a secure vault, Configurable entity recognition (e.g., names, addresses, SSN), Support for GDPR and HIPAA specific entities, Secure key management for re-identification, Batch and stream processing of data, Customizable redaction and masking rules
- API Available
- Yes
- Language
- Python
- Dependencies
- presidio-analyzer, presidio-anonymizer, spacy, fastapi, psycopg2
- Environment
- Python 3.10+
- Est. Runtime
- 1-5 minutes per 100k records
- Tags
- pii-redaction, data-masking, data-anonymization, pseudonymization, microsoft-presidio, privacy-enhancing-technology, pet, gdpr-compliance, hipaa-compliance, nlp, entity-recognition, data-pipeline
- Added
- 2026-03-17
- Completeness
- 85%
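The reversible de-identification capability listed above can be pictured with a minimal sketch. The listing does not say how the pipeline implements it, so everything here is an assumption: `PseudonymVault` is a hypothetical class, tokens are derived with keyed HMAC-SHA256 so the same input always maps to the same pseudonym, an in-memory dictionary stands in for the secure vault, and a boolean `authorized` flag stands in for real access control.

```python
import hashlib
import hmac
from typing import Dict


class PseudonymVault:
    """In-memory stand-in for a secure vault (illustrative assumption only)."""

    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key
        self._reverse: Dict[str, str] = {}  # token -> original value

    def pseudonymize(self, entity_type: str, value: str) -> str:
        """Return a consistent token for (entity_type, value) and record it."""
        message = f"{entity_type}:{value}".encode("utf-8")
        digest = hmac.new(self._key, message, hashlib.sha256).hexdigest()
        token = f"{entity_type}_{digest[:12]}"
        self._reverse[token] = value
        return token

    def reidentify(self, token: str, authorized: bool) -> str:
        """Return the original value, but only for an authorized caller."""
        if not authorized:
            raise PermissionError("re-identification requires authorization")
        return self._reverse[token]


vault = PseudonymVault(secret_key=b"demo-key-not-for-production")
token_a = vault.pseudonymize("PERSON", "Jane Doe")
token_b = vault.pseudonymize("PERSON", "Jane Doe")
token_c = vault.pseudonymize("PERSON", "John Roe")
original = vault.reidentify(token_a, authorized=True)
```

A real deployment would keep the key and the reverse mapping in a managed secret store or encrypted database rather than in process memory, and would log every re-identification request.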
Index Score
61.7
Adoption
72
Quality
87
Freshness
88
Citations
62
Engagement
0