PRIVACY-SAFE DATA FOR AI
Make sensitive data safe to use for AI
GraphReplica replaces the sensitive PII entities in your data with realistic synthetic data.
The same person or entity gets the same replacement across your tables, documents, and images in one pass. One replacement, matched across all data formats.
| Field | Original | Replica |
|---|---|---|
| Name | Sofia Martinez | Elena Cruz |
| Email | sofia.martinez@example.edu | elena.cruz@example.edu |
| Zip | 94107 | 94110 |
| School | Bayview University | Pacifica University |
| Employer | Northstar Robotics | Orion Robotics |
| Role | ML Infrastructure Intern | Data Infrastructure Intern |
Proven privacy
Zero leakage you can prove
Train a model on raw enterprise data and it memorizes what it sees. In stress tests, unprotected models leaked about 27.5% of injected sensitive values. Data built with GraphReplica leaked none. Every run ships an audit-ready report you can hand to your legal and security teams.
See how it works
PII leakage stress test
Inject unique values, then measure leakage
0%
PII leaked across stress tests
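The "inject unique values, then measure leakage" idea can be sketched in a few lines. This is a minimal illustration, not GraphReplica's actual test harness: the function names, the canary format and the sample model outputs are all assumptions made for the example.

```python
import secrets


def make_canaries(n: int, prefix: str = "CANARY") -> list[str]:
    """Generate n random marker strings that will not occur naturally in data."""
    return [f"{prefix}-{secrets.token_hex(8)}" for _ in range(n)]


def leakage_rate(canaries: list[str], model_outputs: list[str]) -> float:
    """Fraction of injected canaries that appear verbatim in any model output."""
    if not canaries:
        return 0.0
    corpus = "\n".join(model_outputs)
    leaked = sum(1 for canary in canaries if canary in corpus)
    return leaked / len(canaries)


canaries = make_canaries(4)
# Hypothetical outputs: the model regurgitates exactly one injected canary.
outputs = [f"The member ID is {canaries[0]}.", "No sensitive values here."]
rate = leakage_rate(canaries, outputs)
```

Here one of four canaries resurfaces, so `rate` is 0.25. A verbatim substring match is the simplest possible detector; a production test would likely also look for partial or reformatted values.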
Use cases
Put safe data to work
GraphReplica unblocks the work that real data used to block. One safe replica that stays realistic and usable.
Train and evaluate AI agents
Build realistic environments to train, evaluate and red-team agents on data that behaves like production.
Unblock coding agents and BI
Point coding tools and BI at a safe replica instead of waiting on privacy review.
Safe demos, QA and staging
Stand up demo, test and staging data without exposing real customers.
License and sell data
Sell or license datasets to AI labs and partners without exposing PII, PHI or IP.
Keep joins across tables
Keep foreign keys and relationships intact across many tables and files.
Resolve identity across docs
Match the same real entity across email, PDFs, spreadsheets and notes.
Test for PII and PHI leakage
Prove whether sensitive data leaked with membership inference and canary tests.
Move past privacy review
Ship AI work in days instead of waiting on long privacy reviews.
Try GraphReplica
Step through a graph-preserved replica, then move the slider to see why masking breaks data and replicas keep it usable.
Input formats
Sofia Martinez · sofia.martinez@example.edu · 94107
Sofia Martinez · Bayview University · Northstar Robotics
{ candidate: "Sofia Martinez", zip: "94107" }
Sofia Martinez, strong infra skills. ID cand_0192
Privacy-aware entity graph
Entity graph appears after extraction
Original vs replica
| Original | Replica |
|---|---|
| Sofia Martinez | pending |
| sofia.martinez@example.edu | pending |
| 94107 | pending |
| Bayview University | pending |
| Northstar Robotics | pending |
| ML Infrastructure Intern | pending |
Why GraphReplica
Why teams choose GraphReplica
Flagship
Same entity, same stand-in, everywhere
GraphReplica finds the sensitive entities in your data and replaces only those. The same real person, customer, employee or account gets the same stand-in across every file, table and document. This holds across millions of records and years of history. Random replacement breaks this. Masking leaves nothing usable.
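One common way to get "same entity, same stand-in" is to derive the replacement from a keyed hash of the normalized entity. The sketch below illustrates that general technique only; it is not a description of how GraphReplica assigns stand-ins, and the name pool, the `normalize` rule and the secret are assumptions for the example.

```python
import hashlib
import hmac

# Hypothetical pool of stand-in names; a real system would need a far larger one.
STAND_INS = ["Elena Cruz", "Maya Ortiz", "Lucia Vega", "Clara Reyes"]


def normalize(value: str) -> str:
    """Collapse case and whitespace so trivial variants map to the same key."""
    return " ".join(value.lower().split())


def stand_in(entity: str, secret: bytes) -> str:
    """Pick a stand-in deterministically from a keyed hash of the entity."""
    digest = hmac.new(secret, normalize(entity).encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(STAND_INS)
    return STAND_INS[index]


secret = b"per-deployment secret (assumed)"
a = stand_in("Sofia Martinez", secret)
b = stand_in("  sofia   MARTINEZ ", secret)
```

Both calls return the same stand-in because they normalize to the same key. With a pool this small, two different entities can collide on one stand-in, which is one reason random or purely hash-based replacement is not enough on its own.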
Joins and relationships stay intact after replacement
Joins survive
Foreign keys and relationships stay intact across many tables and documents. Your downstream joins and queries still work.
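To see why consistent replacement keeps joins working, consider a toy example with two tables that share a key. The table contents, the `cust_` ID format and the helper names are hypothetical, and only the key column is replaced here to keep the sketch short.

```python
def build_mapping(ids):
    """Assign each distinct original ID one synthetic ID, reused on every occurrence."""
    mapping = {}
    for original in ids:
        if original not in mapping:
            mapping[original] = f"cust_{len(mapping) + 1:04d}"
    return mapping


def join(customers, orders):
    """Inner join on customer_id, returning (name, item) pairs."""
    by_id = {c["customer_id"]: c["name"] for c in customers}
    return [(by_id[o["customer_id"]], o["item"]) for o in orders if o["customer_id"] in by_id]


customers = [
    {"customer_id": "A17", "name": "Sofia Martinez"},
    {"customer_id": "B42", "name": "Dana Kim"},
]
orders = [
    {"customer_id": "A17", "item": "laptop"},
    {"customer_id": "A17", "item": "monitor"},
    {"customer_id": "B42", "item": "keyboard"},
]

# One mapping is applied to both tables, so every foreign key still resolves.
mapping = build_mapping(c["customer_id"] for c in customers)
safe_customers = [{**c, "customer_id": mapping[c["customer_id"]]} for c in customers]
safe_orders = [{**o, "customer_id": mapping[o["customer_id"]]} for o in orders]

before = join(customers, orders)
after = join(safe_customers, safe_orders)
```

The join returns the same three rows before and after replacement. If each table had been given its own random IDs instead, `after` would be empty.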
Runs in your environment
A container that runs in your cloud, data center or Databricks. Your data never leaves. Every run is air-gapped and audit-ready.
100M+
records held consistent in testing, with zero false identity merges.
Works on free text too
Toggle between the source and the safe replica. Hover any value to trace the same entity across the whole conversation.
Hi, this is Sofia Martinez. My member ID is MEM-4471 and my date of birth is 1986-04-12.
Thanks Sofia. I found your plan under sofia.martinez@example.com. How can I help?
I have a question about claim CLM-22817 for my cardiology follow-up with Dr. Anya Sharma.
I see the visit on 2025-03-09. Is your phone still +1 555-010-2211?
Yes. And I moved to 42 Pine Ave, 94107.
Same entity, same stand-in. Non-sensitive text like the cardiology follow-up stays exactly as it was.
Deployment
Runs inside your environment
Your data never leaves
GraphReplica ships as a container that runs in your cloud, data center or Databricks. Your data never reaches Secludy. Every run is air-gapped. Input is read-only and output is yours. Scale it with Kubernetes.
    # every run ships this gate
    { "gate_passed": true, "total_leak_count": 0 }
Release blocked if any original value survives
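The gate can be pictured as a simple check over the output. The sketch below is an assumption about the general shape of such a check, not GraphReplica's implementation: it reuses the `gate_passed` and `total_leak_count` field names shown above, while the `leak_check` function and the exact-match rule are illustrative.

```python
import json


def leak_check(original_values, output_texts):
    """Count original sensitive values that still appear verbatim in the output."""
    leaks = [
        value
        for value in original_values
        if any(value in text for text in output_texts)
    ]
    return {"gate_passed": not leaks, "total_leak_count": len(leaks)}


originals = ["Sofia Martinez", "sofia.martinez@example.edu", "94107"]
clean = ["Elena Cruz · elena.cruz@example.edu · 94110"]
dirty = ["Elena Cruz · sofia.martinez@example.edu · 94110"]

clean_report = leak_check(originals, clean)
dirty_report = leak_check(originals, dirty)
print(json.dumps(clean_report))  # {"gate_passed": true, "total_leak_count": 0}
```

The `dirty` output still contains the original email, so its report has `gate_passed` set to false and a count of 1. Exact substring matching misses reformatted values such as a differently cased name, so a real gate would need normalization on top of this.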
- GDPR: Ready
- CCPA: Ready
- HIPAA: Ready
Compliance
Built to meet your requirements
Meet data protection requirements across the EU, US and APAC with one integration. Built to meet GDPR, CCPA and HIPAA requirements. No customer data is retained after processing. Inputs stay read-only and outputs stay in your environment.
Talk to our team
Quick start
Run it where your data lives
Mount a folder. Run one command. Get safe, source-shaped files plus audit reports. Setup takes about an hour.
One command in your environment
Mount an input folder and an output folder. GraphReplica writes safe, source-shaped files plus audit-ready reports. It runs as a batch job. No web server. No data leaves.
Key points
- Runs in your cloud, data center or Databricks
- Supports CSV, JSON, Parquet, XLSX, SQLite, DOCX and email
- Source-shaped output keeps your rows, columns and formats
- Every run ships an audit-ready report
    # run GraphReplica on your data
    docker run --rm \
      -v ./input:/input:ro \
      -v ./output:/output \
      secludy/graphreplica:latest \
      --input-dir /input \
      --output-dir /output \
      --profile pilot_safe

    # leak-check gate
    { "gate_passed": true, "total_leak_count": 0 }
How it works
From messy data to a safe replica
GraphReplica runs five stages and gates every release on a leak check. No release ships if an original value survives.
Detect
Find sensitive entities across messy multi-format sources.
Resolve
Group the records that refer to the same real entity. Surface conflicts.
Replace
Swap only the sensitive entities for consistent realistic stand-ins.
Validate
Run a leak check and a consistency check on the output.
Report
Produce audit-ready detection, replacement and risk reports.
0% PII leaked in stress tests
100M+ records held consistent
detection and replacement
of real-data utility
Start here
Pilot
Run GraphReplica on your own data. Get your first safe dataset in under a week. Setup takes about an hour.
Start a pilot
- Runs in your environment
- Source-shaped outputs
- Audit-ready reports
- Hands-on setup support
Production scale
Enterprise
Production scale across your data estate. Holds consistency across 100M+ records. Scale with Kubernetes.
Talk to sales
- 100M+ record scale
- Databricks and Kubernetes
- Custom business-ID policies
- Priority support
Support
Frequently asked questions
Everything you need to know about GraphReplica and how it keeps your data safe and usable
GraphReplica finds the sensitive entities in your data and replaces only those with realistic stand-ins. Everything that is not sensitive stays exactly as it was. The same real entity gets the same stand-in across every file, table and document.
Masking removes values and leaves your data unusable. GraphReplica swaps sensitive values for realistic stand-ins so the data still reads naturally and your downstream work still runs.
Random replacement turns one person into different people across files and breaks your joins. GraphReplica keeps the same stand-in for the same entity everywhere so relationships survive.
No. GraphReplica runs as a container in your cloud, data center or Databricks. Your data never reaches Secludy. Every run is air-gapped.
CSV, JSON, JSONL, TXT, Markdown, Parquet, XLSX, SQLite, DOCX, EML and MBOX. PDFs are read for detection but are not rewritten in place.
GraphReplica resolves which records refer to the same real entity and builds an entity graph. It merges only on strong evidence and surfaces conflicts. The same entity then receives the same stand-in across long context and many file types.
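Merging only on strong evidence and surfacing conflicts can be sketched with a union-find over shared identifiers. This illustrates the general idea rather than GraphReplica's resolver: the choice of `email` and `member_id` as strong keys, the record layout and the conflict rule are all assumptions for the example.

```python
from collections import defaultdict


def resolve(records, strong_keys=("email", "member_id")):
    """Group record indices that share a strong identifier; flag name conflicts."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}
    for i, rec in enumerate(records):
        for key in strong_keys:
            value = rec.get(key)
            if not value:
                continue
            token = (key, value.lower())
            if token in first_seen:
                parent[find(i)] = find(first_seen[token])  # merge the two clusters
            else:
                first_seen[token] = i

    groups = defaultdict(list)
    for i in range(len(records)):
        groups[find(i)].append(i)

    clusters = sorted(groups.values())
    # A cluster whose records disagree on the name is surfaced for review.
    conflicts = [
        c for c in clusters
        if len({records[i]["name"].lower() for i in c if records[i].get("name")}) > 1
    ]
    return clusters, conflicts


records = [
    {"name": "Sofia Martinez", "email": "sofia.martinez@example.com"},
    {"name": "S. Martinez", "email": "sofia.martinez@example.com", "member_id": "MEM-4471"},
    {"name": "Sofia Martinez", "member_id": "MEM-4471"},
    {"name": "Sofia Martinez"},  # same name only: weak evidence, not merged
]
clusters, conflicts = resolve(records)
```

The first three records end up in one cluster because they are chained by a shared email and member ID, and that cluster is flagged because the names differ. The fourth record matches on name alone, so it stays separate.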
Every run includes a leak check that blocks the release if any original value survives. You also get membership inference and canary tests plus audit-ready reports for your legal and security teams.
GraphReplica holds consistency across 100M+ records and many file types. It scales with Kubernetes on CPU or GPU.
GraphReplica is built to meet GDPR, CCPA and HIPAA requirements. No customer data is retained after processing.
Book a demo and we will run GraphReplica on a sample of your data. Setup takes about an hour and you get your first safe dataset in under a week.
Still have questions? We are here to help.
Book a demo
See GraphReplica on your data
Book a demo and we will run a safe replica on a sample. Setup takes about an hour.
