Redaction

PDFDancer provides secure redaction capabilities for permanently removing sensitive content from PDFs. Unlike simple overlays that can be removed, redaction actually replaces the underlying content with placeholder text or shapes.


Redacting Text

Paragraphs

Redact paragraphs to replace text content with a placeholder string.

from pdfdancer import PDFDancer

with PDFDancer.open("document.pdf") as pdf:
    # Find sensitive content
    paragraphs = pdf.page(1).select_paragraphs_starting_with("SSN:")

    if paragraphs:
        # Redact with the default replacement "[REDACTED]"
        paragraphs[0].redact()

        # Or, instead of the call above, use custom replacement text
        # paragraphs[0].redact(replacement="[CONFIDENTIAL]")

    pdf.save("redacted.pdf")

Text Lines

Individual text lines can also be redacted.

from pdfdancer import PDFDancer

with PDFDancer.open("document.pdf") as pdf:
    # Find text lines matching a pattern
    lines = pdf.page(1).select_text_lines_matching(r"\d{3}-\d{2}-\d{4}")

    for line in lines:
        line.redact(replacement="XXX-XX-XXXX")

    pdf.save("redacted.pdf")
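
Before running a pattern against a real document, it can help to try it on sample strings with Python's standard `re` module. The sketch below assumes the selector accepts regular-expression syntax comparable to `re`, which this page does not confirm. The sample strings are made up, and the `STRICT` variant is an illustration rather than part of the original example.

```python
import re

# Pattern from the example above: 3 digits, 2 digits, 4 digits, dash-separated.
LOOSE = r"\d{3}-\d{2}-\d{4}"
# Stricter variant (illustration only): reject matches that sit inside
# a longer run of digits.
STRICT = r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)"


def find_matches(pattern: str, text: str) -> list[str]:
    """Return every non-overlapping substring of text matching pattern."""
    return re.findall(pattern, text)


# Made-up sample lines for illustration only.
samples = ["SSN: 123-45-6789", "Phone: 555-0100", "Order 1234-56-7890"]

for sample in samples:
    print(f"{sample!r}: loose={find_matches(LOOSE, sample)} "
          f"strict={find_matches(STRICT, sample)}")
```

The loose pattern also matches inside the longer order number, so a line like that would be selected for redaction. Whether the stricter lookaround form is supported by the selector is something to verify before relying on it.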

Redacting Images

When redacting images, the image is replaced with a solid color placeholder rectangle in the same position and size.

from pdfdancer import PDFDancer

with PDFDancer.open("document.pdf") as pdf:
    images = pdf.page(1).select_images()

    for image in images:
        # Redact image (replaced with a black rectangle by default)
        image.redact()

    pdf.save("redacted.pdf")

Redacting Vector Paths

Vector graphics and paths can be redacted similarly to images.

from pdfdancer import PDFDancer

with PDFDancer.open("document.pdf") as pdf:
    paths = pdf.page(1).select_paths()

    for path in paths:
        path.redact()

    pdf.save("redacted.pdf")

Redacting Form Fields

Form fields containing sensitive data can also be redacted.

from pdfdancer import PDFDancer

with PDFDancer.open("form.pdf") as pdf:
    # Find form field by name
    fields = pdf.select_form_fields_by_name("social_security")

    if fields:
        fields[0].redact(replacement="[REMOVED]")

    pdf.save("redacted.pdf")

Batch Redaction

For redacting multiple objects at once, use the document-level redact() method. This is more efficient than redacting objects one by one.

from pdfdancer import Color, PDFDancer

with PDFDancer.open("document.pdf") as pdf:
    # Collect objects to redact
    objects_to_redact = []

    # Add sensitive paragraphs
    ssn_paragraphs = pdf.select_paragraphs_matching(r"SSN.*\d{3}-\d{2}-\d{4}")
    objects_to_redact.extend(ssn_paragraphs)

    # Add sensitive images
    page_images = pdf.page(1).select_images()
    objects_to_redact.extend(page_images)

    # Batch redact all objects
    result = pdf.redact(
        objects_to_redact,
        replacement="[REDACTED]",
        placeholder_color=Color(0, 0, 0),  # Black for images/paths
    )

    print(f"Redacted {result.count} objects")
    print(f"Success: {result.success}")

    pdf.save("redacted.pdf")
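
Because a failed batch leaves sensitive content in place, it may be worth wrapping the call so that a failure cannot be silently ignored. The helper below is a minimal sketch, not part of the library: `batch_redact_checked` is a hypothetical name, and it assumes only what the example above shows, namely that `pdf.redact(objects, replacement=...)` returns an object with `success` and `count`.

```python
def batch_redact_checked(pdf, objects, replacement="[REDACTED]"):
    """Redact objects in one call and return how many were redacted.

    Hypothetical helper. Assumes pdf.redact(objects, replacement=...)
    returns an object exposing .success and .count, as in the example above.
    """
    objects = list(objects)
    if not objects:
        # Nothing was selected, so skip the call entirely.
        return 0

    result = pdf.redact(objects, replacement=replacement)
    if not result.success:
        # Fail loudly so the caller does not save or publish the file.
        raise RuntimeError("Batch redaction failed; original content may remain")
    return result.count
```

Raising instead of returning a flag is a design choice: it stops a script before `pdf.save(...)` runs on a document that still contains the original data.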

Redaction Response

The redaction methods return a response object with information about the operation:

Property    Description
success     Whether the redaction completed successfully
count       Number of objects that were redacted
warnings    Any warnings generated during redaction
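
As an illustration of reading these properties together, the sketch below formats a response for logging. `describe_redaction` is a hypothetical helper, and it assumes `warnings` is an iterable of strings that may be empty or `None`; this page does not specify its exact type.

```python
def describe_redaction(result) -> str:
    """Build a short, log-friendly summary of a redaction response.

    Hypothetical helper. Assumes result has .success, .count and .warnings,
    and that .warnings is an iterable of strings (possibly empty or None).
    """
    lines = [f"success: {result.success}", f"count: {result.count}"]
    for warning in (result.warnings or []):
        lines.append(f"warning: {warning}")
    return "\n".join(lines)
```

Logging warnings even when `success` is true can surface partially handled objects that would otherwise go unnoticed.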

Important Notes

  • Permanent removal: Redaction permanently removes the original content from the PDF. Once the operation succeeds, the redacted data cannot be recovered. Always check the result of the call before assuming data was redacted:

    # Python
    result = paragraph.redact()
    if result:
        print("Content permanently redacted")
    else:
        print("Redaction failed - original content preserved")

    // TypeScript
    const result = await paragraph.redact();
    if (result.success) {
        console.log("Content permanently redacted");
    } else {
        console.log("Redaction failed - original content preserved");
    }

    // Java
    boolean success = paragraph.redact();
    if (success) {
        System.out.println("Content permanently redacted");
    } else {
        System.out.println("Redaction failed - original content preserved");
    }

  • Text replacement: For text objects, the original text is replaced with the placeholder string.
  • Image/path replacement: For non-text objects, a solid color rectangle replaces the original content.
  • Save required: Remember to save the document after redacting to persist changes.
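
Since redaction is meant to be permanent and only takes effect once saved, one way to gain confidence is to reopen the saved file and confirm the sensitive pattern no longer matches. The sketch below uses a hypothetical helper, `count_remaining_matches`, and assumes that `pdf.page(n).select_text_lines_matching(pattern)` returns a sized collection, as the loops in the examples above suggest.

```python
def count_remaining_matches(pdf, pattern, page_numbers):
    """Count text lines that still match pattern on the given pages.

    Hypothetical helper. Assumes pdf.page(n).select_text_lines_matching()
    returns a list-like collection, as in the Text Lines example.
    """
    remaining = 0
    for number in page_numbers:
        remaining += len(pdf.page(number).select_text_lines_matching(pattern))
    return remaining


# Intended use (not executed here), with the file saved in the examples above:
#
#   with PDFDancer.open("redacted.pdf") as pdf:
#       leftover = count_remaining_matches(pdf, r"\d{3}-\d{2}-\d{4}", [1])
#       if leftover:
#           raise RuntimeError(f"{leftover} matching lines remain")
```

A zero count only shows that the selector no longer finds the pattern in text lines; it says nothing about images, paths, or form fields, which need their own checks.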