Summary: Clients ask us often enough how they can optimize PDF documents for search engines that I felt the topic warranted a dedicated blog post. We also wanted to touch on what you can do if Google indexes a PDF you intended to keep private.
Can you search engine optimize PDF documents?
It's a common misconception that search engines ignore the content contained within PDFs, but a somewhat understandable one - a few years ago they weren't that good at crawling PDFs, but they've gotten a lot better and continue to improve all the time.
PDF SEO is an often overlooked strategy with great potential. I can't count the number of times I've seen, for example, a menu link on a restaurant website open up to a PDF and noticed that the PDF simply wasn't optimized for search - a huge missed opportunity given that people often search Google for specific restaurant menus. If that's exactly what they're looking for, you don't want them to have to hunt around your site to find a link to it - you want it to be #1 in search results.
Alex put together a detailed YouMoz tutorial on Moz.com, which you need to read beginning to end, but his PDF SEO checklist should get you started:
Checklist for PDF Optimization:
- Search-friendly filenames
- Keyword-optimized titles
- Informative, concise descriptions
- Company name in "Author" field
- Use several relevant keywords in "Keyword" field
- Make sure to fill out all available fields - there is an option to view "Additional Metadata" (in Adobe Acrobat)
- Add tags and accessibility options to your document
- Don't forget about Alt tags for images
- Add links back to relevant pages on the main website
- Write-protect the document
- Offer HTML version of the document
Alex breaks each item down in detail, which is why I recommend you read the entire piece. Keep in mind that he uses the full, paid version of Adobe Acrobat for many of these edits. You can download any number of free PDF editors, and most should allow you to make the same changes. Just be aware that you don't edit these SEO elements with the same HTML tags used on normal web pages; they live in the PDF's own document properties and metadata fields.
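The first checklist item, search-friendly filenames, needs no PDF editor at all. Here is a minimal Python sketch of how you might turn a document title into a lowercase, hyphen-separated filename. The function name `seo_filename`, the six-word cap, and the restaurant title in the usage note are my own assumptions for illustration; none of them come from Alex's tutorial.

```python
import re
import unicodedata


def seo_filename(title: str, max_words: int = 6) -> str:
    """Build a lowercase, hyphen-separated PDF filename from a title.

    Hypothetical helper: the word cap is an arbitrary choice, not an SEO rule.
    """
    # Drop straight apostrophes so "Joe's" becomes "joes" rather than "joe-s".
    cleaned = title.replace("'", "")
    # Fold accented letters to plain ASCII (e.g. "é" becomes "e").
    ascii_title = (
        unicodedata.normalize("NFKD", cleaned)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Keep only runs of letters and digits; everything else is a separator.
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    if not words:
        raise ValueError("title contains no usable letters or digits")
    return "-".join(words[:max_words]) + ".pdf"
```

For a made-up title like "Joe's Pizza: Dinner Menu (2014)", this produces `joes-pizza-dinner-menu-2014.pdf`, which reads far better in a search result than something like `scan0042.pdf`.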
The basic rule when performing SEO for PDFs is to follow the same strategies you would when optimizing web pages - you just have to go about editing titles, descriptions, etc. a different way. This includes, to my surprise, image alt tags:
Alt Tags for Images
Yes, it's possible to include "alt" tags on images within your PDF, so don't forget about those. Approach them the same way you would any on-site image tagging.
I guess it only stands to reason that if you're mirroring your SEO strategy on everything else, you'd include images as well, particularly as Google keeps getting better at crawling and indexing PDFs.
Links are another item that might be easy to overlook when optimizing a PDF:
Don't Forget About Links
Remember that you should be treating your PDF document much like you would treat any HTML page. Make sure to add a link back to your website for crawlers and users alike. Another benefit is that other websites may host copies of your PDF too, if it's useful, and a link in the document will give you a nice backlink. This can be especially useful for deep links, since PDFs are often relevant to specific pages or products on your site.
For more information on SEO for PDFs, check out this LunaMetrics post and SEJ's 8 Tips to Make Your PDF Page SEO Friendly. There's certainly some overlap, but if getting your PDFs indexed and ranking is a critical part of your online marketing strategy, this is all vital information.
Hiding PDFs from Search Engines
Even though the purpose of this post is to help you better understand SEO for PDFs, I'd be remiss if I didn't tell you how to prevent Google and the other search engines from indexing PDFs as well. Why would anyone want to hide a PDF document from the search engines, you ask? Well, there could be any number of reasons, but the scenario we most commonly see is one where PDFs are only meant to be accessible after a user has converted (filled out a form of some kind).
A good example would be an educational material supplier that offers free samples to interested parties, but requires users to fill out a form first. Once the form has been submitted, users are taken to a "thank you" page that contains links to the PDFs. This is a somewhat unsophisticated way to protect these documents, but also a rather common one. The problem is, if Google somehow stumbles on these documents (because they aren't locked behind a "members only" login system), it will index them. This can happen any number of ways, but often it's as simple as a webmaster mistakenly including ALL documents in the XML sitemap they submit to Webmaster Tools.
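To make the sitemap mistake concrete, here is a rough Python sketch of a sitemap builder that skips anything under a gated path. The function name `build_sitemap`, the `example.com` URLs, and the idea of a `/samples/` folder for form-gated files are all hypothetical. Keep in mind that leaving a PDF out of the sitemap only avoids advertising it; it does not stop a search engine from indexing the file if it finds a link some other way.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, gated_prefixes):
    """Return sitemap XML for `urls`, skipping any URL under a gated prefix.

    Hypothetical helper: `gated_prefixes` lists the URL prefixes that should
    only be reachable after a form submission.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        if any(url.startswith(prefix) for prefix in gated_prefixes):
            continue  # do not advertise gated documents to crawlers
        loc = ET.SubElement(ET.SubElement(urlset, "url"), "loc")
        loc.text = url
    return ET.tostring(urlset, encoding="unicode")
```

Calling it with a public menu PDF and a gated sample PDF, plus `["https://www.example.com/samples/"]` as the gated prefix, yields a sitemap that lists only the menu.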
However it happened, if you find that the search engines have indexed a PDF you don't want indexed, there is a way to get it removed. It won't happen instantly, but Google's Webmaster Central Blog offers this best practice in its FAQ:
Q: How can I prevent my PDF files from appearing in search results; or if they already do, how can I remove them?
A: The simplest way to prevent PDF documents from appearing in search results is to add an X-Robots-Tag: noindex in the HTTP header used to serve the file. If they’re already indexed, they’ll drop out over time if you use the X-Robot-Tag with the noindex directive. For faster removals, you can use the URL removal tool in Google Webmaster Tools.
It's worth noting that the post is over two years old, so some of the other information may be outdated, but X-Robots-Tag: noindex is still the best way to tell Google and the other search engines, "hey, stop showing this PDF to people!"
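How you add that header depends entirely on your web server, so treat the following as an illustration of the idea and not a production setup. It's a minimal sketch using only Python's standard library: a small static file server that attaches X-Robots-Tag: noindex to every response whose path ends in .pdf. The names `extra_headers`, `NoIndexPDFHandler`, and `make_server` are my own, and the port is an arbitrary choice.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import urlsplit


def extra_headers(request_path):
    """Return the extra response headers for a request path.

    PDFs get X-Robots-Tag: noindex; everything else gets nothing extra.
    """
    path = urlsplit(request_path).path  # ignore any query string
    if path.lower().endswith(".pdf"):
        return {"X-Robots-Tag": "noindex"}
    return {}


class NoIndexPDFHandler(SimpleHTTPRequestHandler):
    """Static file handler that marks PDF responses as noindex."""

    def end_headers(self):
        # Add our headers just before the header section is closed.
        for name, value in extra_headers(self.path).items():
            self.send_header(name, value)
        super().end_headers()


def make_server(directory, port=8000):
    """Build (but do not start) a local server for files in `directory`."""
    handler = partial(NoIndexPDFHandler, directory=directory)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

To try it locally, call `make_server("some-folder").serve_forever()` and request a PDF from that folder; the response headers should include X-Robots-Tag: noindex, while HTML pages are served without it. On a real site you would set the same header in your web server or CMS configuration instead.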
So there you have it - yes, PDFs are indexed by search engines and can rank well, provided you optimize the same way you do normal HTML documents - you just need an editor capable of editing the necessary fields. And should you find yourself in a situation where you want one of your PDFs removed from search results, there are ways to accomplish that as well.