What types of keywords work best for splitting?

Consistent, unique phrases that appear exactly once per section work best. Examples include 'Invoice Number', 'Page 1 of', 'EXHIBIT', 'Dear', or document ID prefixes. Avoid very common words that appear many times per section.

Does the tool work on scanned PDFs?

Split by Text requires machine-readable text. Scanned image PDFs must be processed with OCR first to extract text. Use dokk.ai's OCR tool on the scanned PDF before applying Split by Text.

Can I use a regular expression as the split keyword?

Yes. Enable the regex option and enter a pattern such as 'Invoice #\d+' to match any invoice number, or '^(January|February|March)' to split at month names at the start of a line.
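
The two patterns above can be tried out locally before uploading. The sketch below uses Python's re module and a made-up page text; whether dokk.ai uses the same regex flavour, or treats '^' as a line anchor by default, is an assumption rather than documented behaviour.

```python
import re

# Hypothetical page text, invented for illustration only.
page_text = "ACME Ltd\nInvoice #10482\nMarch 3, 2024\nTotal due: 120.00"

# 'Invoice #\d+' matches the literal prefix followed by one or more digits.
invoice = re.search(r"Invoice #\d+", page_text, flags=re.IGNORECASE)
print(invoice.group(0))

# In Python, '^' anchors to the start of every line only when MULTILINE is set.
month = re.search(r"^(January|February|March)", page_text, flags=re.MULTILINE)
print(month.group(1))

# Without MULTILINE, '^' matches only at the very start of the text,
# so the month on the third line is not found.
month_strict = re.search(r"^(January|February|March)", page_text)
print(month_strict)
```

If a pattern behaves differently in the tool than it does locally, the line-anchor handling is the first thing worth checking.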

What happens if the keyword does not appear in the document?

If no matches are found, the tool returns the original PDF unchanged and shows a warning indicating that the keyword was not detected. Check spelling and ensure the PDF contains machine-readable text.

What is the difference between Split by Text and Split by Bookmarks?

Split by Bookmarks uses the structural outline embedded in the PDF (created by the document author). Split by Text uses the actual page content to find split points dynamically. Use Bookmarks for structured documents with a formal outline; use Split by Text for auto-generated batch exports where consistent keywords are present but bookmarks may be absent.

Can I include or exclude the keyword page from the output?

Yes. You can configure whether the page containing the keyword becomes the first page of the next output file or the last page of the previous output file. You can also choose to discard separator pages entirely if they contain no meaningful content.
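
To make the three placement options concrete, here is a minimal sketch of how page ranges could be derived from the matching page numbers. The function name split_ranges and the mode names 'start_next', 'end_previous', and 'discard' are hypothetical labels chosen for this example, not dokk.ai settings.

```python
from typing import List, Tuple


def split_ranges(total_pages: int, match_pages: List[int], mode: str) -> List[Tuple[int, int]]:
    """Return inclusive 1-based (first, last) page ranges, one per output file."""
    matches = sorted(set(match_pages))
    ranges: List[Tuple[int, int]] = []

    if mode == "start_next":
        # Each keyword page opens a new file; pages before the first match
        # form a leading file of their own.
        starts = sorted(set([1] + matches))
        ends = [s - 1 for s in starts[1:]] + [total_pages]
        ranges = list(zip(starts, ends))
    elif mode == "end_previous":
        # Each keyword page closes the current file; pages after the last
        # match form a trailing file.
        first = 1
        for end in sorted(set(matches + [total_pages])):
            ranges.append((first, end))
            first = end + 1
    elif mode == "discard":
        # Keyword pages are dropped; only the pages between them are kept.
        first = 1
        for m in matches:
            if m > first:
                ranges.append((first, m - 1))
            first = m + 1
        if first <= total_pages:
            ranges.append((first, total_pages))
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ranges


# A 10-page document with the keyword on pages 1, 4 and 8.
print(split_ranges(10, [1, 4, 8], "start_next"))    # [(1, 3), (4, 7), (8, 10)]
print(split_ranges(10, [1, 4, 8], "end_previous"))  # [(1, 1), (2, 4), (5, 8), (9, 10)]
print(split_ranges(10, [1, 4, 8], "discard"))       # [(2, 3), (5, 7), (9, 10)]
```

The same match list produces a different number of output files under each mode, so it helps to decide on the placement before counting the results.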

How many split points can the tool handle?

There is no hard limit on split points. The tool has been tested on documents with over 1,000 keyword occurrences, producing over 1,000 output files in a single ZIP archive.

Are the output files named automatically?

Output files are named sequentially by default (e.g., 'split_001.pdf', 'split_002.pdf'). If the keyword match contains a unique identifier (such as an invoice number), that value can optionally be used in the filename.
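
As an illustration of the naming scheme, the sketch below builds a sequential name in the 'split_001.pdf' style and, when an identifier is supplied, a filename based on that identifier. The helper output_name, the character whitelist, and the exact identifier-based format are assumptions; the tool's actual naming rules may differ.

```python
import re
from typing import Optional


def output_name(index: int, identifier: Optional[str] = None) -> str:
    """Build an output filename from a 1-based index or an optional identifier."""
    if identifier:
        # Replace anything that is not a letter, digit, dot, dash or
        # underscore so the result is safe on common file systems.
        safe = re.sub(r"[^A-Za-z0-9._-]+", "_", identifier).strip("_")
        if safe:
            return f"{safe}.pdf"
    # Fall back to the sequential, zero-padded default.
    return f"split_{index:03d}.pdf"


print(output_name(1))                  # split_001.pdf
print(output_name(2, "INV 2024/017"))  # INV_2024_017.pdf
print(output_name(3, "///"))           # split_003.pdf
```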

Can I extract pages from only some keyword matches?

The standard workflow splits at every match. For selective extraction, use Extract Pages after identifying the page ranges you need from the split preview, or use Split by Page Range for manual control.

Is there a file size limit?

dokk.ai accepts PDFs up to 200 MB. For larger batch exports, consider splitting the source file in half first and then applying Split by Text to each half.

Split by Text

Split when the text changes between pages

Key features

Splits PDF at every page containing a specified keyword or phrase
Case-insensitive text matching by default
Optional regular expression pattern support for variable markers
Choice of placing the keyword page in the preceding or following output file, or excluding it entirely
Works with native text PDFs and OCR-processed scanned documents
Outputs sequentially numbered files or a ZIP archive
Handles PDFs with hundreds of split points
Preserves all content including images, fonts, and annotations
Browser-based with no installation required
Secure TLS upload and automatic deletion within 60 minutes

Use cases

Splitting a batch invoice export into individual invoice PDFs
Dividing a bulk form scan at each 'Form ID' separator page
Splitting a daily report bundle at each 'Date:' header
Extracting individual patient letters from a mail-merge export
Dividing a legal transcript at each 'EXHIBIT' marker
Splitting a training manual at each 'Module' heading
Isolating individual shipment records from a logistics manifest PDF
Splitting a scanned bank statement batch at each account number
Dividing a merged test result PDF at each student name
Extracting individual policies from a combined insurance document batch

How to use

1. Upload the PDF that contains repeated text markers you want to use as split points.
2. Enter the keyword or phrase to split on. Keep case-insensitive matching on if the capitalisation varies, or enter a regular expression for variable patterns.
3. Choose whether the page containing the keyword starts the next output file or ends the previous one; this determines where separator pages land.
4. Click Process. dokk.ai scans every page, identifies all matches, and splits the document at each occurrence.
5. Download the individual split files or a ZIP archive. Files are named sequentially and each corresponds to one section between keyword occurrences.
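
The scan-and-split steps above can be sketched as follows. This is a simplified illustration, not dokk.ai's implementation: it assumes the text of each page has already been extracted into a list of strings, and the helper names find_split_pages and group_sections are hypothetical. The keyword page is treated as the first page of each new section.

```python
import re
from typing import List


def find_split_pages(page_texts: List[str], keyword: str,
                     use_regex: bool = False, ignore_case: bool = True) -> List[int]:
    """Return the 1-based numbers of the pages whose text contains the keyword."""
    # A plain keyword is escaped so characters like '#' or '.' match literally.
    pattern = keyword if use_regex else re.escape(keyword)
    flags = re.MULTILINE | (re.IGNORECASE if ignore_case else 0)
    compiled = re.compile(pattern, flags)
    return [n for n, text in enumerate(page_texts, start=1) if compiled.search(text)]


def group_sections(total_pages: int, split_pages: List[int]) -> List[List[int]]:
    """Group page numbers into sections, starting a new one at each split page."""
    if not split_pages:
        # No match: the document stays in one piece.
        return [list(range(1, total_pages + 1))]
    starts = sorted(set([1] + split_pages))
    ends = [s - 1 for s in starts[1:]] + [total_pages]
    return [list(range(first, last + 1)) for first, last in zip(starts, ends)]


# Invented page texts standing in for an extracted batch export.
pages = [
    "INVOICE NUMBER 1001\nItem A",
    "continued",
    "Invoice Number 1002\nItem B",
    "continued",
    "invoice number 1003",
]
hits = find_split_pages(pages, "invoice number")
sections = group_sections(len(pages), hits)
print(hits)      # [1, 3, 5]
print(sections)  # [[1, 2], [3, 4], [5]]
```

Each inner list corresponds to one output file. A keyword that never appears yields a single section covering the whole document.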

When a PDF is a batch export containing multiple documents concatenated together (hundreds of invoices in a single file, a day's worth of scanned forms, or an auto-generated report where each section starts with a known heading), splitting it by a fixed keyword is far faster than manually identifying page ranges. Split by Text scans each page for a phrase you specify and creates a new output file every time that phrase appears, effectively using the document's own content as its split map.

The tool fits into accounts payable automation, form processing pipelines, and bulk document distribution workflows. An accounting system exports 500 invoices as a single PDF; Split by Text finds 'INVOICE NUMBER' on each separator page and produces 500 individual invoice files. A medical records system batches patient letters; the tool splits at 'Dear Patient' to produce one letter per patient. A logistics company receives daily manifests where each shipment begins with a barcode label containing 'SHIPMENT ID'; the tool isolates each shipment into its own file for downstream processing.

You can choose whether the keyword page itself is included in the output file or discarded, which is useful for separator pages that carry no meaningful content of their own. Case-insensitive matching means capitalisation variations in auto-generated documents do not affect the result. Regular expression patterns are supported for advanced use cases where the split marker is variable, such as 'Invoice #\d+' matching any invoice number.

Split by Text complements Split by Bookmarks for documents that lack a formal outline but have consistent textual markers instead. If your documents have both, bookmarks are usually more reliable since they are structural rather than content-based. For maximum flexibility, combine the two approaches: split by bookmarks at the chapter level, then split by text within chapters to isolate individual records.

All file processing occurs on dokk.ai's secure infrastructure. Files are deleted within 60 minutes and never used for machine learning or shared with third parties. The output files are standard PDFs compatible with every reader, printer, and document management system.

Frequently asked questions

Security and privacy

The document text is scanned only to find the split keyword and is not stored or indexed. All files are transferred over TLS and deleted within 60 minutes of processing. dokk.ai is GDPR compliant and never uses document content for training or analytics.