Adobe Tech Blog

News, updates, and thoughts related to Adobe, developers, and technology.

Split PDFs Based on Content with Adobe PDF Extract Service with Microsoft Power Automate

Ben Vanderberg
Published in Adobe Tech Blog
12 min read · Mar 21, 2022

The Adobe PDF Services connector in Microsoft Power Automate lets you extract content from PDFs and split PDF documents.

What you are going to need

Scenario

Starting trigger

Using the "Manually trigger a flow" trigger in Microsoft Power Automate with a File Content upload prompt.

Getting PDF content

The PDF Extract action in Microsoft Power Automate lets you extract content from your PDF documents.
The Parse JSON action takes the output of the PDF Extract action and makes its fields easily accessible later in the flow.
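Outside Power Automate, the effect of the Parse JSON step can be sketched in a few lines of Python. This is only an illustration, not part of the flow: it assumes the extract output is JSON text with a top-level `elements` array whose items carry fields such as `Path`, `Text`, and `Page`. The sample string and the `parse_elements` helper are hypothetical.

```python
import json

# Hypothetical, trimmed stand-in for the PDF Extract output; a real
# response carries many more fields per element.
SAMPLE_EXTRACT_JSON = """
{
  "elements": [
    {"Path": "//Document/H1", "Page": 0, "Text": "Halliby: Invoice "},
    {"Path": "//Document/P", "Page": 0, "Text": "Amount due"}
  ]
}
"""


def parse_elements(raw: str) -> list:
    """Parse the JSON text and return the list of content elements."""
    document = json.loads(raw)
    # Assumption: the elements live under a top-level "elements" key.
    return document.get("elements", [])


elements = parse_elements(SAMPLE_EXTRACT_JSON)
for element in elements:
    # Once parsed, each field can be read by name, much like the
    # dynamic content that Parse JSON exposes to later flow steps.
    print(element["Path"], element["Page"], repr(element["Text"]))
```

Once the text is parsed, every later step can refer to a field by name instead of digging through a raw string.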

Using Search Array to find content in PDF

The Array Filter action in Microsoft Power Automate lets you filter an array's contents based on conditions you define.
{
    "Path": "//Document/H1",
    "HasClip": false,
    "Bounds": [
        36,
        727.099853515625,
        197.7562255859375,
        758.5664367675781
    ],
    "TextSize": 22.020004272460938,
    "Lang": "en",
    "Page": 0,
    "Text": "Halliby: Invoice ",
    "attributes": {
        "LineHeight": 26.375,
        "SpaceAfter": 18
    },
    "CharBounds": [
        [
            36,
            727.099853515625,
            50.42308044433594,
            758.5664367675781
        ],
        ...
    ],
    "Font": {
        "name": "MCHVUF+CenturyGothic-Bold",
        "weight": 700,
        "alt_family_name": "Century Gothic",
        "monospaced": false,
        "encoding": "WinAnsiEncoding",
        "embedded": true,
        "family_name": "Century Gothic",
        "font_type": "TrueType",
        "italic": false,
        "subset": true
    }
}

Path

"Path": "//Document/H1"

Text

"Text": "Halliby: Invoice "
Filtering an array in Microsoft Power Automate lets you find the elements in the extracted PDF content that match specific criteria.
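The filtering idea can be sketched in Python as follows. This is a rough illustration under stated assumptions: the elements that mark the start of each document are taken to have a `Path` beginning with `//Document/H1` and a `Text` containing "Halliby: Invoice", and the criteria in an actual flow may differ. The `filter_elements` helper and the sample list are hypothetical.

```python
def filter_elements(elements, path_prefix, text_contains):
    """Keep elements whose Path starts with path_prefix and whose Text
    contains text_contains."""
    return [
        element
        for element in elements
        if element.get("Path", "").startswith(path_prefix)
        and text_contains in element.get("Text", "")
    ]


# Hypothetical sample data shaped like the extracted elements.
sample_elements = [
    {"Path": "//Document/H1", "Page": 0, "Text": "Halliby: Invoice "},
    {"Path": "//Document/P", "Page": 0, "Text": "Invoice details"},
    {"Path": "//Document/H1[2]", "Page": 2, "Text": "Halliby: Invoice "},
]

matches = filter_elements(sample_elements, "//Document/H1", "Halliby: Invoice")
print([match["Page"] for match in matches])  # prints [0, 2]
```

Matching on both `Path` and `Text` is a deliberate choice in this sketch: the path alone would pick up every top-level heading, and the text alone would also match body paragraphs that mention the same words.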

Font and styling

"Font": {
    "name": "MCHVUF+CenturyGothic-Bold",
    "weight": 700,
    "alt_family_name": "Century Gothic",
    "monospaced": false,
    "encoding": "WinAnsiEncoding",
    "embedded": true,
    "family_name": "Century Gothic",
    "font_type": "TrueType",
    "italic": false,
    "subset": true
}
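Font properties can serve as a filter criterion too. The sketch below is a hedged Python illustration that treats an element as a heading when its `Font.family_name` and `Font.weight` match the values in the sample above. The `uses_heading_font` helper, its default thresholds, and the second sample element are assumptions made for the example.

```python
def uses_heading_font(element, family="Century Gothic", min_weight=700):
    """Return True when the element's font is in the given family and
    at least as heavy as min_weight."""
    font = element.get("Font", {})
    return (
        font.get("family_name") == family
        and font.get("weight", 0) >= min_weight
    )


sample_elements = [
    {"Text": "Halliby: Invoice ", "Page": 0,
     "Font": {"family_name": "Century Gothic", "weight": 700, "italic": False}},
    # Hypothetical body-text element with a lighter weight.
    {"Text": "Thank you for your business", "Page": 0,
     "Font": {"family_name": "Century Gothic", "weight": 400, "italic": False}},
]

headings = [e for e in sample_elements if uses_heading_font(e)]
print([heading["Text"] for heading in headings])  # prints ['Halliby: Invoice ']
```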

Creating Variables for Loop

Variables help the flow keep track of different pieces of data as it runs.

Loop to find page ranges

PDF Extract returns all of the content in the PDF as elements in an array. After you filter that array, you can loop through the filtered elements to determine the page ranges.
The Body output of the Array Filter action provides an array of all the elements that matched the filter criteria.
Overview of the loop that extracts the page ranges used to split the PDF.
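The page-range loop can be approximated in Python as follows. This is a sketch under several assumptions rather than the flow's actual expressions: `Page` values are zero-based (as in the sample element, where the first page is `0`), page ranges are one-based, the start pages are strictly increasing, and the total page count is supplied through a hypothetical `total_pages` parameter. The `"1-2,3,4-6"` output format is also assumed for illustration.

```python
def page_ranges(start_pages, total_pages):
    """Turn zero-based start pages into one-based (start, end) ranges."""
    ranges = []
    for i, page in enumerate(start_pages):
        start = page + 1  # zero-based Page -> one-based page number
        if i + 1 < len(start_pages):
            # The next section starts on zero-based page start_pages[i + 1],
            # so the page before it has that same number in one-based terms.
            end = start_pages[i + 1]
        else:
            # Last element: the range runs to the end of the document.
            end = total_pages
        ranges.append((start, end))
    return ranges


def format_ranges(ranges):
    """Render ranges as text, writing single pages as one number."""
    return ",".join(
        str(start) if start == end else f"{start}-{end}"
        for start, end in ranges
    )


ranges = page_ranges([0, 2, 3], total_pages=6)
print(ranges)                 # prints [(1, 2), (3, 3), (4, 6)]
print(format_ranges(ranges))  # prints 1-2,3,4-6
```

The two special cases in the sketch mirror the steps a loop like this has to handle: a section that occupies a single page, and the last element, which has no following element to mark where it ends.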

Expression for startPoint

Expression for endPoint

Increment the iterator

Compensating for single pages

If the last element

Split PDF

Saving files to storage

Give the flow a try

Final thoughts
