PRIVACY-SAFE DATA FOR AI

Make sensitive data safe to use for AI

GraphReplica replaces the sensitive PII entities in your data with realistic synthetic data.

The same person or entity gets the same replacement across your tables, documents, and images in one pass. One replacement, matched across all data formats.

Tables · Documents · Images
GraphReplica
0% leakage

Field      Original                      Replica
Name       Sofia Martinez                Elena Cruz
Email      sofia.martinez@example.edu    elena.cruz@example.edu
Zip        94107                         94110
School     Bayview University            Pacifica University
Employer   Northstar Robotics            Orion Robotics
Role       ML Infrastructure Intern      Data Infrastructure Intern

PII Leakage: 0%
Join Consistency: 98%
Task Utility: 94%

Proven privacy

Zero leakage you can prove

Train a model on raw enterprise data and it memorizes what it sees. In stress tests, unprotected models leaked about 27.5% of injected sensitive values. Data built with GraphReplica leaked none. Every run ships an audit-ready report you can hand to your legal and security teams.

See how it works

PII leakage stress test

Inject unique values, then measure leakage

Unprotected model: 27.5% leaked
GraphReplica: 0% leaked

0%

PII leaked across stress tests
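
The stress test described above (inject unique values, then measure leakage) can be sketched as a simple canary test. This is a minimal illustration, not GraphReplica's actual test harness: the function names `make_canaries` and `leak_rate`, the canary format, and the choice to count only verbatim matches in model outputs are all assumptions.

```python
import secrets

def make_canaries(n: int, prefix: str = "CANARY") -> list[str]:
    """Generate n unique marker values to inject into the training data."""
    canaries: set[str] = set()
    while len(canaries) < n:
        canaries.add(f"{prefix}-{secrets.token_hex(8)}")
    return sorted(canaries)

def leak_rate(canaries: list[str], model_outputs: list[str]) -> float:
    """Fraction of injected canaries that appear verbatim in any model output."""
    if not canaries:
        return 0.0
    combined = "\n".join(model_outputs)
    leaked = sum(1 for canary in canaries if canary in combined)
    return leaked / len(canaries)
```

Under these assumptions, a result of 0.275 would correspond to the 27.5% figure quoted for unprotected models, and 0.0 to a run with no leaked canaries.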

Use cases

Put safe data to work

GraphReplica unblocks the work that real data used to block. One safe replica that stays realistic and usable.

Train and evaluate AI agents

Build realistic environments to train, evaluate and red-team agents on data that behaves like production.

Unblock coding agents and BI

Point coding tools and BI at a safe replica instead of waiting on privacy review.

Safe demos, QA and staging

Stand up demo, test and staging data without exposing real customers.

License and sell data

Sell or license datasets to AI labs and partners without exposing PII, PHI or IP.

Keep joins across tables

Keep foreign keys and relationships intact across many tables and files.

Resolve identity across docs

Match the same real entity across email, PDFs, spreadsheets and notes.

Test for PII and PHI leakage

Prove whether sensitive data leaked with membership inference and canary tests.

Move past privacy review

Ship AI work in days instead of waiting on long privacy reviews.

Interactive demo

Try GraphReplica

Step through a graph-preserved replica, then move the slider to see why masking breaks data and replicas keep it usable.

Input formats

XLSX · Excel / CSV

Sofia Martinez · sofia.martinez@example.edu · 94107

PDF · Resume

Sofia Martinez · Bayview University · Northstar Robotics

JSONL · Recruiter notes

{ "candidate": "Sofia Martinez", "zip": "94107" }

TXT · Interview feedback

Sofia Martinez, strong infra skills. ID cand_0192

[Interactive demo panels: privacy-aware entity graph; original vs replica comparison with PII leakage, join consistency and task utility.]

Why GraphReplica

Why teams choose GraphReplica

Flagship

Same entity, same stand-in, everywhere

GraphReplica finds the sensitive entities in your data and replaces only those. The same real person, customer, employee or account gets the same stand-in across every file, table and document. This holds across millions of records and years of history. Random replacement breaks this. Masking leaves nothing usable.
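
One common way to get "same entity, same stand-in" is to derive the replacement deterministically from a resolved entity ID with a keyed hash. The sketch below illustrates that general technique only. It is not GraphReplica's documented method, and the name pools, the `stand_in_name` function and the secret key are hypothetical.

```python
import hashlib
import hmac

# Hypothetical pools of replacement names, for illustration only.
FIRST_NAMES = ["Elena", "Marcus", "Priya", "Jonas", "Amara", "Liu"]
LAST_NAMES = ["Cruz", "Bennett", "Okafor", "Lindqvist", "Rao", "Moreau"]

def stand_in_name(entity_id: str, key: bytes) -> str:
    """Map a resolved entity ID to a stable stand-in name.

    The same (entity_id, key) pair always yields the same name, so every
    file that mentions the entity receives the same replacement.
    """
    digest = hmac.new(key, entity_id.encode("utf-8"), hashlib.sha256).digest()
    first = FIRST_NAMES[digest[0] % len(FIRST_NAMES)]
    last = LAST_NAMES[digest[1] % len(LAST_NAMES)]
    return f"{first} {last}"
```

With pools this small, two different entities can land on the same stand-in, so a real system would need larger pools or explicit collision handling. Purely random replacement has no such stable mapping, which is why it turns one person into several.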

[Entity graph: the person Elena Cruz linked to School, Employer, ZIP and Candidate ID. Joins and relationships stay intact after replacement.]

Joins survive

Foreign keys and relationships stay intact across many tables and documents. Your downstream joins and queries still work.
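
To see why a shared mapping keeps joins working, consider two small tables keyed on a candidate ID. The sketch below applies one ID mapping to both tables and compares the join before and after. The tables, the stand-in IDs and the helper names are hypothetical, and only the key column is replaced to keep the example short.

```python
def replace_ids(rows, column, mapping):
    """Return copies of rows with the key column swapped via the shared mapping."""
    return [{**row, column: mapping[row[column]]} for row in rows]

def inner_join(left, right, key):
    """Inner join of two lists of dicts on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

# Hypothetical tables and stand-in IDs, for illustration only.
candidates = [
    {"candidate_id": "cand_0192", "school": "Bayview University"},
    {"candidate_id": "cand_0451", "school": "Bayview University"},
]
feedback = [
    {"candidate_id": "cand_0192", "note": "strong infra skills"},
    {"candidate_id": "cand_0192", "note": "second round scheduled"},
]
id_map = {"cand_0192": "cand_8830", "cand_0451": "cand_1277"}

original_join = inner_join(candidates, feedback, "candidate_id")
replica_join = inner_join(
    replace_ids(candidates, "candidate_id", id_map),
    replace_ids(feedback, "candidate_id", id_map),
    "candidate_id",
)
```

Because both tables use the same mapping, the replica join returns the same number of rows with the same notes attached. If each table were mapped independently, the keys would no longer line up and the join would come back empty.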

Runs in your environment

A container that runs in your cloud, data center or Databricks. Your data never leaves. Every run is air-gapped and audit-ready.

100M+ records held consistent in testing, with zero false identity merges.

Unstructured data

Works on free text too

Toggle between the source and the safe replica. Hover any value to trace the same entity across the whole conversation.

11 PII and PHI values detected
Customer

Hi, this is Sofia Martinez. My member ID is MEM-4471 and my date of birth is 1986-04-12.

Agent

Thanks Sofia. I found your plan under sofia.martinez@example.com. How can I help?

Customer

I have a question about claim CLM-22817 for my cardiology follow-up with Dr. Anya Sharma.

Agent

I see the visit on 2025-03-09. Is your phone still +1 555-010-2211?

Customer

Yes. And I moved to 42 Pine Ave, 94107.

Same entity, same stand-in. Non-sensitive text like the cardiology follow-up stays exactly as it was.
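
The replacement step on free text can be sketched as a single-pass substitution over the detected values. This is an illustration under assumptions, not the product's implementation: detection is taken as already done, the `replace_entities` function is hypothetical, and the stand-in values in the mapping are invented for the example.

```python
import re

def replace_entities(text: str, mapping: dict[str, str]) -> str:
    """Swap each detected value for its stand-in and leave all other text as is."""
    if not mapping:
        return text
    # Longest values first, so "Sofia Martinez" is matched before "Sofia".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # One pass over the text, so a stand-in is never replaced a second time.
    return pattern.sub(lambda m: mapping[m.group(0)], text)

# Hypothetical stand-ins for values from the conversation above.
mapping = {
    "Sofia Martinez": "Elena Cruz",
    "Sofia": "Elena",
    "MEM-4471": "MEM-9038",
}
```

For example, `replace_entities("Thanks Sofia. How can I help?", mapping)` returns `"Thanks Elena. How can I help?"`, while a sentence with no detected values comes back unchanged.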

Deployment

Runs inside your environment

Your data never leaves

GraphReplica ships as a container that runs in your cloud, data center or Databricks. Your data never reaches Secludy. Every run is air-gapped. Input is read-only and output is yours. Scale it with Kubernetes.

release_leak_check.json
# every run ships this gate
{
  "gate_passed": true,
  "total_leak_count": 0
}

Release blocked if any original value survives
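
A gate of this shape can be sketched as a scan of the output for any surviving original value. The field names `gate_passed` and `total_leak_count` come from the report above. Everything else here is an assumption, including the `leak_check` function name and the use of exact, case-sensitive matching.

```python
import json

def leak_check(original_values: list[str], output_texts: list[str]) -> dict:
    """Count verbatim occurrences of original values in the output texts."""
    total = 0
    for text in output_texts:
        for value in original_values:
            if value:  # skip empty strings, which would match everywhere
                total += text.count(value)
    return {"gate_passed": total == 0, "total_leak_count": total}

report = leak_check(
    ["Sofia Martinez", "sofia.martinez@example.edu"],
    ["Elena Cruz · elena.cruz@example.edu · 94110"],
)
report_json = json.dumps(report, indent=2)
```

A release script could then refuse to publish the output whenever `report["gate_passed"]` is false.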

GDPR: Ready
CCPA: Ready
HIPAA: Ready

Compliance

Built to meet your requirements

Meet data protection requirements across the EU, US and APAC with one integration. Built to meet GDPR, CCPA and HIPAA requirements. No customer data is retained after processing. Inputs stay read-only and outputs stay in your environment.

Talk to our team

Quick start

Run it where your data lives

Mount a folder. Run one command. Get safe, source-shaped files plus audit reports. Setup takes about an hour.

One command in your environment

Mount an input folder and an output folder. GraphReplica writes safe, source-shaped files plus audit-ready reports. It runs as a batch job. No web server. No data leaves.

Key points

  • Runs in your cloud, data center or Databricks
  • Supports CSV, JSON, JSONL, TXT, Markdown, Parquet, XLSX, SQLite, DOCX and email (EML, MBOX)
  • Source-shaped output keeps your rows, columns and formats
  • Every run ships an audit-ready report
# run GraphReplica on your data
# (absolute host paths, since older Docker versions reject relative bind mounts)
docker run --rm \
  -v "$(pwd)/input:/input:ro" \
  -v "$(pwd)/output:/output" \
  secludy/graphreplica:latest \
  --input-dir /input \
  --output-dir /output \
  --profile pilot_safe

# leak-check gate (release_leak_check.json)
# { "gate_passed": true, "total_leak_count": 0 }

How it works

From messy data to a safe replica

GraphReplica runs five stages and gates every release on a leak check. No release ships if an original value survives.

01 Detect

Find sensitive entities across messy multi-format sources.

02 Resolve

Group the records that refer to the same real entity. Surface conflicts.

03 Replace

Swap only the sensitive entities for consistent realistic stand-ins.

04 Validate

Run a leak check and a consistency check on the output.

05 Report

Produce audit-ready detection, replacement and risk reports.

0% PII leaked in stress tests

100M+ records held consistent

0.9 F1 detection and replacement

Within 5% of real-data utility

Start here

Pilot

Run GraphReplica on your own data. Get your first safe dataset in under a week. Setup takes about an hour.

Start a pilot
  • Runs in your environment
  • Source-shaped outputs
  • Audit-ready reports
  • Hands-on setup support

Production scale

Enterprise

Production scale across your data estate. Holds consistency across 100M+ records. Scale with Kubernetes.

Talk to sales
  • 100M+ record scale
  • Databricks and Kubernetes
  • Custom business-ID policies
  • Priority support

Support

Frequently asked questions

Everything you need to know about GraphReplica and how it keeps your data safe and usable

What does GraphReplica do?

GraphReplica finds the sensitive entities in your data and replaces only those with realistic stand-ins. Everything that is not sensitive stays exactly as it was. The same real entity gets the same stand-in across every file, table and document.

How is this different from masking?

Masking removes values and leaves your data unusable. GraphReplica swaps sensitive values for realistic stand-ins so the data still reads naturally and your downstream work still runs.

How is this different from random replacement?

Random replacement turns one person into different people across files and breaks your joins. GraphReplica keeps the same stand-in for the same entity everywhere so relationships survive.

Does my data leave my environment?

No. GraphReplica runs as a container in your cloud, data center or Databricks. Your data never reaches Secludy. Every run is air-gapped.

Which file formats are supported?

CSV, JSON, JSONL, TXT, Markdown, Parquet, XLSX, SQLite, DOCX, EML and MBOX. PDFs are read for detection but are not rewritten in place.

How does GraphReplica keep the same entity consistent?

GraphReplica resolves which records refer to the same real entity and builds an entity graph. It merges only on strong evidence and surfaces conflicts. The same entity then receives the same stand-in across long context and many file types.
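
The idea of merging only on strong evidence and surfacing conflicts can be sketched with a union-find over shared identifiers. This is a generic entity-resolution illustration, not GraphReplica's algorithm. The choice of `email` and `candidate_id` as strong keys, the record layout and the function names are assumptions.

```python
from collections import defaultdict

STRONG_KEYS = ("email", "candidate_id")  # assumed strong identifiers

def resolve(records: list[dict]) -> list[list[int]]:
    """Cluster record indices that share at least one strong identifier."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    seen: dict[tuple[str, str], int] = {}
    for i, rec in enumerate(records):
        for key in STRONG_KEYS:
            value = rec.get(key)
            if not value:
                continue
            token = (key, value.strip().lower())
            if token in seen:
                union(seen[token], i)
            else:
                seen[token] = i

    clusters = defaultdict(list)
    for i in range(len(records)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())

def conflicts(records: list[dict], clusters: list[list[int]]) -> list[list[int]]:
    """Return clusters whose records disagree on the name, for human review."""
    flagged = []
    for cluster in clusters:
        names = {records[i]["name"].strip().lower()
                 for i in cluster if records[i].get("name")}
        if len(names) > 1:
            flagged.append(cluster)
    return flagged
```

In this sketch a record that shares only a name with another record is left unmerged, since a name alone is treated as weak evidence.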

How do I know nothing leaked?

Every run includes a leak check that blocks the release if any original value survives. You also get membership inference and canary tests plus audit-ready reports for your legal and security teams.

How far does it scale?

GraphReplica holds consistency across 100M+ records and many file types. It scales with Kubernetes on CPU or GPU.

Which regulations does it address?

GraphReplica is built to meet GDPR, CCPA and HIPAA requirements. No customer data is retained after processing.

How do I get started?

Book a demo and we will run GraphReplica on a sample of your data. Setup takes about an hour and you get your first safe dataset in under a week.

Still have questions? We are here to help.

Book a demo

See GraphReplica on your data

Book a demo and we will run GraphReplica on a sample of your data. Setup takes about an hour.