Tagging PDF Documents for Accessibility at Scale with Adobe Acrobat PDF Services

Creating and sharing accessible documents is challenging, but it is also legally required in many cases. Because PDFs are created in so many different ways, ensuring they are properly tagged for accessibility is vitally important. With this in mind, Adobe’s Acrobat PDF Services now include a new API that helps with this work. The Adobe PDF Accessibility Auto-Tag API (now generally available to developers) can speed up tagging and be a great partner to your existing accessibility teams. Do note, though, that the API by itself will not instantly create PDF documents that are compliant with Section 508 or PDF/UA. However, the API’s tagging can help speed up the process when used along with human review. Let’s take a look at how the API operates.

The Auto-Tag API

As an API, the Auto-Tag feature is fairly simple and can be used with just a source PDF file. There are two optional arguments that can be used:

shiftHeadings: The API will automatically look for the title of the document and tag it as an H1. If this argument is set to true, all additional headings beneath the title will automatically be set one level lower, so an H1 will be changed to H2. This argument defaults to false.
generateReport: By default, the result of the operation is a tagged PDF. If this argument is set to true, the result will contain both the tagged PDF as well as an Excel report documenting the operation. (You will see an example of this shortly.) This argument defaults to false.
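
To make the two options concrete, here is a minimal sketch of how a request body for the job might be assembled. The option names and their defaults come from the list above; the `assetID` field name and the `buildAutoTagBody` helper are assumptions for illustration, so check the REST reference before relying on them.

```javascript
// Builds the JSON body for an Auto-Tag job.
// ASSUMPTION: the uploaded asset is referenced with a field named `assetID`.
// `buildAutoTagBody` is a hypothetical helper, not part of any Adobe SDK.
function buildAutoTagBody(assetID, { shiftHeadings = false, generateReport = false } = {}) {
  if (!assetID) {
    throw new Error('assetID is required');
  }
  // Both optional arguments default to false, matching the API's defaults.
  return { assetID, shiftHeadings, generateReport };
}

// Example: ask for the Excel report, leave heading levels alone.
const body = buildAutoTagBody('urn:example:asset', { generateReport: true });
console.log(JSON.stringify(body, null, 2));
```

Sending the defaults explicitly is not strictly necessary, but it keeps the request self-describing when you read logs later.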

Let’s take a look at an example of this using the REST APIs. (Our SDKs have been updated to support the new operation as well.)

Our input document

We’ll begin by looking at a PDF document that is not tagged. We can see this by opening the document in Adobe Acrobat and clicking the “Tags” icon in the left-hand gutter, which shows that no tags exist for the PDF at all.

You can download this source PDF document here.

Now let’s look at how to run this through the Auto-Tag API.

Sample code

Using the REST API for any of our services follows a similar pattern:

Authenticate with Adobe to get an access token.
Upload your input as an asset.
Kick off the job, in this case, Auto-Tag.
Poll for the result.
When complete, download and enjoy.
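
The five steps above can be sketched as a single function that calls each step in turn. This is only an outline of the flow: every step is passed in as a function, and all of the names here (`runAutoTag`, `getAccessToken`, `uploadAsset`, `startJob`, `pollJob`, `download`) are hypothetical placeholders rather than SDK functions.

```javascript
// Runs the five steps in order. Each step is injected so the overall flow
// can be read (and tested) without touching the network.
// ASSUMPTION: all step names are illustrative placeholders.
async function runAutoTag(steps, inputPath, outputPath) {
  const token = await steps.getAccessToken();              // 1. authenticate
  const asset = await steps.uploadAsset(token, inputPath); // 2. upload the input
  const jobUrl = await steps.startJob(token, asset);       // 3. kick off Auto-Tag
  const result = await steps.pollJob(token, jobUrl);       // 4. poll for the result
  await steps.download(result, outputPath);                // 5. download
  return outputPath;
}
```

Keeping the steps separate like this also makes it easy to swap Auto-Tag for another operation, since only the third step changes.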

We first covered our REST APIs last year (in “Announcing the New Adobe Document Services REST APIs”), so we’ll be brief in this example. Let’s walk through it piece by piece. (We’ll share the entire sample at the end.)

The code below will be in Node.js, but remember that as a REST API, you can use any language you wish.

First, we’ll require some libraries, define some variables, and so forth:

https://gist.github.com/cfjedimaster/417d6dd8678342891e7551678009b0b2#file-script-js

For the above, the only really important bits are inputFile and outputFile.

Next, let’s get an access token. This, like all the operations in our demo, makes use of utility functions that you are free to copy.

https://gist.github.com/cfjedimaster/6ddce10416cdd85efea55862321df62e#file-script-js

As a quick aside, this makes use of our shiny new, simpler authentication system. You can read more about it in “Important Update for Developers Using Adobe APIs”.
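
If you’d like a feel for what a token request can look like without opening the gist, here is a rough sketch, assuming Node.js 18+ (for the built-in `fetch`). The endpoint URL, the form field names, and the `access_token` response field are assumptions here; confirm them against the current documentation. `buildTokenRequest` is a hypothetical helper.

```javascript
// ASSUMPTION: token endpoint; confirm the host and path in the docs.
const TOKEN_URL = 'https://pdf-services.adobe.io/token';

// Builds the request separately so it can be inspected without a network call.
function buildTokenRequest(clientId, clientSecret) {
  // ASSUMPTION: credentials are sent as form-encoded client_id / client_secret.
  const form = new URLSearchParams({ client_id: clientId, client_secret: clientSecret });
  return {
    url: TOKEN_URL,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
      body: form.toString(),
    },
  };
}

// `fetchImpl` defaults to the global fetch available in Node.js 18+.
async function getAccessToken(clientId, clientSecret, fetchImpl = fetch) {
  const { url, options } = buildTokenRequest(clientId, clientSecret);
  const resp = await fetchImpl(url, options);
  if (!resp.ok) {
    throw new Error(`Token request failed with status ${resp.status}`);
  }
  const data = await resp.json();
  return data.access_token; // ASSUMPTION: token is returned in this field
}
```

In a real script the two credentials would come from environment variables rather than being hard-coded.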

The next two steps handle uploading our asset. This is done by first creating an asset and then actually uploading it. Here are the calls:

https://gist.github.com/cfjedimaster/74633eaf94de2aed5587c5123a2662d6#file-script-js

And here are their supporting functions:

https://gist.github.com/cfjedimaster/9d40ac7b42c12ab758596481f28e66fd#file-script-js
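
For readers skimming without the gists, the two-step upload could look roughly like the sketch below, again assuming Node.js 18+. The assets URL, the header names, and the `assetID` / `uploadUri` response fields are assumptions, and `createAsset` and `uploadFile` are illustrative names only.

```javascript
const fs = require('fs');

// ASSUMPTION: assets endpoint; confirm in the REST reference.
const ASSETS_URL = 'https://pdf-services.adobe.io/assets';

// Step 1: ask the service for an asset ID and a URL to upload to.
async function createAsset(token, clientId, mediaType, fetchImpl = fetch) {
  const resp = await fetchImpl(ASSETS_URL, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'X-API-Key': clientId, // ASSUMPTION: client ID is sent in this header
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ mediaType }),
  });
  if (!resp.ok) throw new Error(`Create asset failed with status ${resp.status}`);
  return resp.json(); // ASSUMPTION: shape is { assetID, uploadUri }
}

// Step 2: send the file bytes to the upload URL from step 1.
async function uploadFile(uploadUri, filePath, mediaType, fetchImpl = fetch) {
  const bytes = await fs.promises.readFile(filePath);
  const resp = await fetchImpl(uploadUri, {
    method: 'PUT',
    headers: { 'Content-Type': mediaType },
    body: bytes,
  });
  if (!resp.ok) throw new Error(`Upload failed with status ${resp.status}`);
}
```

Splitting the work this way mirrors the description above: the first call only registers the asset, and the second call moves the actual bytes.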

Ok, at this point we’ve created an asset and uploaded our source PDF. Now it’s time to kick off the job. This time, let’s start with the utility function:

https://gist.github.com/cfjedimaster/532536696ee06378ea3ab541f25f2b74#file-script-js

This function takes the asset and authentication information as required parameters, while the two optional arguments match the optional arguments in the API. This will start the process and return a URL we can check to see how it’s doing. Here’s our call:

https://gist.github.com/cfjedimaster/9f600bf607c17c8bcbd06afc5c319c9a#file-script-js
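
As a rough illustration of what “start the process and return a URL” can mean in practice, here is a sketch assuming Node.js 18+. The operation URL, the header names, and the idea that the status URL arrives in a `Location` response header are all assumptions to verify against the REST reference; `startAutoTagJob` is a hypothetical name.

```javascript
// ASSUMPTION: Auto-Tag operation endpoint; confirm in the REST reference.
const AUTOTAG_URL = 'https://pdf-services.adobe.io/operation/autotag';

// Submits the job and returns the URL to poll for its status.
// `body` is the JSON payload, e.g. { assetID, shiftHeadings, generateReport }.
async function startAutoTagJob(token, clientId, body, fetchImpl = fetch) {
  const resp = await fetchImpl(AUTOTAG_URL, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'X-API-Key': clientId, // ASSUMPTION: client ID is sent in this header
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  });
  if (!resp.ok) throw new Error(`Could not start job: status ${resp.status}`);
  // ASSUMPTION: the status URL is returned in the Location header.
  const statusUrl = resp.headers.get('location');
  if (!statusUrl) throw new Error('Response did not include a status URL');
  return statusUrl;
}
```

Failing loudly when the status URL is missing avoids a confusing error later, when the polling step would otherwise try to fetch `null`.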

Now we need to poll the job and wait for it to complete. This is fairly boilerplate, so for now we’ll show the call and save the actual implementation for the complete source code listing at the end:

https://gist.github.com/cfjedimaster/6cbaf247b0db8d0786f26926f38c482a#file-script-js

As an aside, this demo does not have great error handling, so that’s something to keep in mind if you use this in production.
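
If you do want something a little sturdier, one possible shape for the polling loop is sketched below. It is not the implementation from the sample: `pollUntilDone` is a hypothetical helper, `checkStatus` stands in for whatever function fetches the job’s status JSON, and the status strings (`'in progress'`, `'done'`, `'failed'`) are assumptions.

```javascript
// Small promise-based delay.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls until the job is done, fails, or we run out of attempts.
// `checkStatus` is any async function that returns the job's status JSON.
// ASSUMPTION: status values are 'in progress', 'done', and 'failed'.
async function pollUntilDone(checkStatus, { intervalMs = 2000, maxAttempts = 60 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await checkStatus();
    if (job.status === 'done') return job;
    if (job.status === 'failed') {
      throw new Error(`Job failed: ${JSON.stringify(job.error ?? job)}`);
    }
    await sleep(intervalMs); // wait before asking again
  }
  throw new Error(`Job did not finish after ${maxAttempts} attempts`);
}
```

Capping the number of attempts means a stuck job surfaces as an error instead of a script that never exits.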

Finally, we can use our result. The result from the function contains the URL for the tagged PDF:

https://gist.github.com/cfjedimaster/4c9174d18e2469d3a4d81da63b6ae2e4#file-script-js

As before, downloadFile isn’t anything special, so we’ll hold off showing it for now.
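
In the meantime, a download helper along these lines would do the job on Node.js 18+. This is a sketch rather than the version in the full listing; the signature shown here is an assumption.

```javascript
const fs = require('fs');

// Fetches `url` and writes the response body to `filePath`.
// Returns the number of bytes written.
// ASSUMPTION: this signature is illustrative, not the sample's own.
async function downloadFile(url, filePath, fetchImpl = fetch) {
  const resp = await fetchImpl(url);
  if (!resp.ok) throw new Error(`Download failed with status ${resp.status}`);
  // Read the whole body into memory; fine for typical PDFs, but consider
  // streaming for very large files.
  const bytes = Buffer.from(await resp.arrayBuffer());
  await fs.promises.writeFile(filePath, bytes);
  return bytes.length;
}
```

The same helper works for the Excel report, since it only cares about a URL and a destination path.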

After running our script on the PDF, the result is evident: the document has now been tagged. While it still needs to go through human review, a good amount of work (and time) has been saved!

While we didn’t generate a report in our example, one can be downloaded as well if it’s requested. Here’s the report generated by our input PDF.

Next steps

If you want to get started, be sure to check our docs first (including the REST endpoint) and sign up for a free set of credentials to start working with it now. Finally, be sure to hit up our forums with your questions — we’re here to help!

Here’s the complete source of the sample application. It will require credentials stored in a .env file or set as environment variables.

https://gist.github.com/cfjedimaster/966835334b12411c46ec2a7c913dfb3b#file-script-js