
Split by Text

Split where the text changes between pages

Accepts PDF, Word, Excel, PowerPoint, and image files up to 25 MB.

Key features

  • Splits PDF at every page containing a specified keyword or phrase
  • Case-insensitive text matching by default
  • Optional regular expression pattern support for variable markers
  • Choice to include or exclude the keyword page in the preceding or following output file
  • Works with native text PDFs and OCR-processed scanned documents
  • Outputs sequentially numbered files or a ZIP archive
  • Handles PDFs with hundreds of split points
  • Preserves all content, including images, fonts, and annotations
  • Browser-based with no installation required
  • Secure TLS upload and automatic deletion within 60 minutes
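The matching behaviour in the list above can be sketched in Python. This is a minimal illustration, not dokk.ai's implementation. It assumes each page's text has already been extracted (from the native text layer or from OCR) into a list of strings, one per page, and the function `find_split_pages` and its parameters are hypothetical. Literal keywords are escaped with `re.escape` so that characters such as `#` or `.` match verbatim, while `use_regex=True` passes the pattern through unchanged.

```python
import re
from typing import List


def find_split_pages(page_texts: List[str], marker: str,
                     use_regex: bool = False,
                     ignore_case: bool = True) -> List[int]:
    """Return 0-based indices of pages whose text contains the marker.

    page_texts: one string per page (hypothetical pre-extracted text).
    marker: a literal keyword or phrase, or a regex if use_regex is True.
    """
    # Escape literal keywords so regex metacharacters are matched verbatim.
    pattern = marker if use_regex else re.escape(marker)
    # Case-insensitive matching is the default, mirroring the feature list.
    flags = re.IGNORECASE if ignore_case else 0
    compiled = re.compile(pattern, flags)
    return [i for i, text in enumerate(page_texts) if compiled.search(text)]


pages = [
    "INVOICE NUMBER 1001\nBilled to ACME",
    "Line items continued",
    "Invoice Number 1002\nBilled to Globex",
    "invoice #1003 summary",
]

print(find_split_pages(pages, "invoice number"))                 # [0, 2]
print(find_split_pages(pages, r"Invoice #\d+", use_regex=True))  # [3]
```

Note that a case-insensitive literal search finds both "INVOICE NUMBER" and "Invoice Number", while only the regular expression catches the variable "#1003" marker.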

Use cases

  • Splitting a batch invoice export into individual invoice PDFs
  • Dividing a bulk form scan at each 'Form ID' separator page
  • Splitting a daily report bundle at each 'Date:' header
  • Extracting individual patient letters from a mail-merge export
  • Dividing a legal transcript at each 'EXHIBIT' marker
  • Splitting a training manual at each 'Module' heading
  • Isolating individual shipment records from a logistics manifest PDF
  • Splitting a scanned bank statement batch at each account number
  • Dividing a merged test result PDF at each student name
  • Extracting individual policies from a combined insurance document batch

How to use

  1. Upload the PDF that contains the repeated text markers you want to use as split points.
  2. Enter the keyword or phrase to split on. Matching is case-insensitive by default, so varying capitalisation is handled automatically; enter a regular expression instead if the marker itself varies.
  3. Choose whether the page containing the keyword starts the next output file or ends the previous one. This determines where separator pages land.
  4. Click Process. dokk.ai scans every page, identifies all matches, and splits the document at each occurrence.
  5. Download the individual split files or a ZIP archive. Files are named sequentially, and each one corresponds to one section between keyword occurrences.
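Steps 3 to 5 amount to turning the list of matching pages into page ranges, one per output file. The sketch below is an illustration under stated assumptions rather than the tool's actual code: `split_ranges`, its `mode` values, and the `part_001.pdf` naming scheme are all hypothetical, and pages are 0-indexed. The `"discard"` mode models the option to exclude separator pages entirely.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # inclusive 0-based (first_page, last_page)


def split_ranges(num_pages: int, match_pages: List[int],
                 mode: str = "start") -> List[Range]:
    """Turn keyword page indices into page ranges for each output file.

    mode="start":   a keyword page begins the next output file.
    mode="end":     a keyword page closes the current output file.
    mode="discard": keyword pages are separators and are dropped entirely.
    """
    if mode not in ("start", "end", "discard"):
        raise ValueError(f"unknown mode: {mode!r}")
    matches = set(match_pages)
    ranges: List[Range] = []
    current_start: Optional[int] = None  # first page of the file being built

    for page in range(num_pages):
        is_match = page in matches
        # In "start"/"discard" mode, close the open file just before the keyword page.
        if is_match and mode != "end" and current_start is not None:
            ranges.append((current_start, page - 1))
            current_start = None
        if is_match and mode == "discard":
            continue  # the separator page is not written to any file
        if current_start is None:
            current_start = page
        # In "end" mode, the keyword page is the last page of the current file.
        if is_match and mode == "end":
            ranges.append((current_start, page))
            current_start = None

    # Flush whatever remains after the last keyword page.
    if current_start is not None:
        ranges.append((current_start, num_pages - 1))
    return ranges


def output_names(ranges: List[Range], prefix: str = "part") -> List[str]:
    # Hypothetical sequential naming scheme: part_001.pdf, part_002.pdf, ...
    return [f"{prefix}_{n:03d}.pdf" for n in range(1, len(ranges) + 1)]


matches = [0, 3, 5]  # pages where the keyword was found (7-page document)
print(split_ranges(7, matches, "start"))    # [(0, 2), (3, 4), (5, 6)]
print(split_ranges(7, matches, "end"))      # [(0, 0), (1, 3), (4, 5), (6, 6)]
print(split_ranges(7, matches, "discard"))  # [(1, 2), (4, 4), (6, 6)]
print(output_names(split_ranges(7, matches, "start")))
# ['part_001.pdf', 'part_002.pdf', 'part_003.pdf']
```

Writing each range to its own file would then be a matter of copying those pages with a PDF library; that step is left out here to keep the example dependency-free.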

When a PDF is a batch export containing multiple documents concatenated together (hundreds of invoices in a single file, a day's worth of scanned forms, or an auto-generated report where each section starts with a known heading), splitting it by a fixed keyword is far faster than manually identifying page ranges. Split by Text scans each page for a phrase you specify and creates a new output file every time that phrase appears, effectively using the document's own content as its split map.

This makes it a natural final step in accounts payable automation, form processing pipelines, and bulk document distribution workflows. An accounting system exports 500 invoices as a single PDF; Split by Text finds 'INVOICE NUMBER' on each separator page and produces 500 individual invoice files. A medical records system batches patient letters; the tool splits at 'Dear Patient' to produce one letter per patient. A logistics company receives daily manifests where each shipment begins with a barcode label containing 'SHIPMENT ID'; the tool isolates each shipment into its own file for downstream processing.

You can choose whether the keyword page itself is included in an output file or discarded, which is useful for separator pages that carry no meaningful content of their own. Case-insensitive matching means capitalisation variations in auto-generated documents do not cause missed splits. Regular expression patterns are supported for advanced use cases where the split marker varies, such as 'Invoice #\d+' matching any invoice number.

Split by Text complements Split by Bookmarks for documents that lack a formal outline but have consistent textual markers instead. If your documents have both, bookmarks are usually more reliable because they are structural rather than content-based. For maximum flexibility, combine the two approaches: split by bookmarks at the chapter level, then split by text within chapters to isolate individual records.
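Combining bookmark-level and text-level splitting might look like the rough sketch below. It assumes the chapter page ranges have already been read from the PDF's bookmarks (that step is not shown), and `two_level_split` and `split_chapter_by_text` are hypothetical names. Within each chapter, every page matching the regular expression `Invoice #\d+` starts a new record, and a record never spans two chapters.

```python
import re
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive 0-based (first_page, last_page)


def split_chapter_by_text(page_texts: List[str], chapter: Range,
                          pattern: re.Pattern) -> List[Range]:
    """Split one chapter into records, starting a new record at each match.

    The chapter's first page always opens a record, so chapter (bookmark)
    boundaries are never crossed by a text-based split.
    """
    first, last = chapter
    records: List[Range] = []
    start = first
    for page in range(first + 1, last + 1):
        if pattern.search(page_texts[page]):
            records.append((start, page - 1))
            start = page
    records.append((start, last))
    return records


def two_level_split(page_texts: List[str], chapters: List[Range],
                    marker_regex: str) -> List[List[Range]]:
    # Assumption: `chapters` has already been derived from the PDF's bookmarks.
    pattern = re.compile(marker_regex, re.IGNORECASE)
    return [split_chapter_by_text(page_texts, ch, pattern) for ch in chapters]


pages = [
    "Invoice #101", "continued", "Invoice #102",               # chapter 1
    "Invoice #201", "continued", "continued", "Invoice #202",  # chapter 2
]
chapters = [(0, 2), (3, 6)]
print(two_level_split(pages, chapters, r"Invoice #\d+"))
# [[(0, 1), (2, 2)], [(3, 5), (6, 6)]]
```

Treating chapter starts as hard boundaries reflects the point above that bookmarks are structural and therefore the more reliable of the two signals.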
All file processing occurs on dokk.ai's secure infrastructure. Files are deleted within 60 minutes and never used for machine learning or shared with third parties. The output files are standard PDFs, so they work with ordinary PDF readers, printers, and document management systems.

Frequently asked questions

Security and privacy

The document text is scanned only to find the split keyword and is not stored or indexed. All files are transferred over TLS and deleted within 60 minutes of processing. dokk.ai is GDPR compliant and never uses document content for training or analytics.