{"id":847,"date":"2010-01-18T12:10:05","date_gmt":"2010-01-18T17:10:05","guid":{"rendered":"http:\/\/adam.rosi-kessel.org\/weblog\/?p=847"},"modified":"2010-01-18T12:10:05","modified_gmt":"2010-01-18T17:10:05","slug":"free-tip-how-to-resize-scanned-pdfs-with-ghostscript-for-adobe-acrobat-ocr","status":"publish","type":"post","link":"https:\/\/adam.rosi-kessel.org\/weblog\/2010\/01\/18\/free-tip-how-to-resize-scanned-pdfs-with-ghostscript-for-adobe-acrobat-ocr","title":{"rendered":"Free Tip: How to resize scanned PDFs with ghostscript for Adobe Acrobat OCR"},"content":{"rendered":"<p>I&#8217;m unaware of any free tool to perform OCR on a PDF and embed the resulting data in the PDF itself so it is text-searchable. If anyone knows of one, let me know. In the meantime, I use Acrobat Professional for this essential functionality.<\/p>\n<p>High resolution PDFs produced by my scanner (HP Officejet Pro L7700) usually give the following error when I try to perform Acrobat OCR:<\/p>\n<p><code>This page is larger than the maximum page size of 45 inches by 45 inches.<\/code><\/p>\n<p>Surprisingly, there doesn&#8217;t seem to be any way to resize the page size of a PDF within Acrobat. It&#8217;s possible to print to a new PDF of the correct size, but this operation cannot easily be batched. If I apply the &#8220;crop&#8221; tool to resize the page in Acrobat, I get this error:<\/p>\n<p><code>Page size may not be reduced.<\/code><\/p>\n<p>Many report these issues in Adobe&#8217;s forums. The most common responses suggest reconfiguring the scanner or buying a new one.<\/p>\n<p>I found nothing quick and easy after some googling for a simple ghostscript recipe to perform the batch pre-processing necessary to allow Acrobat to do the OCR. 
It&#8217;s not hard to do, just a bit of a trial-and-error pain to get the right switches.<\/p>\n<p>For posterity, then, here is a simple command line to make this happen (here under Windows, but easily adapted for any other platform). First, download the <a href=\"http:\/\/pages.cs.wisc.edu\/~ghost\/doc\/GPL\/index.htm\">latest ghostscript<\/a> for your platform (at this time, <a href=\"http:\/\/mirror.cs.wisc.edu\/pub\/mirrors\/ghost\/GPL\/gs864\/gs864w32.exe\">8.64 for Windows<\/a>). Then:<\/p>\n<p><code>gswin32c -dQUIET -dNOPAUSE -dBATCH -sPAPERSIZE=letter -sDEVICE=pdfwrite -sOutputFile=OUTPUT.pdf -dPDFFitPage INPUT.pdf<\/code><\/p>\n<p>And a simple inelegant script to batch process (again, under Windows\/cygwin, but easily adaptable). Feel free to make it more elegant:<\/p>\n<pre>#!\/bin\/bash\r\nfor x in \"$@\"\r\ndo\r\necho -n \"Processing $x ...\"\r\nif [ ! -e \"$x\" ]\r\nthen\r\necho \"File $x missing. Exiting.\"\r\nexit 1\r\nfi\r\nif [ -e gs_shrink_to_letter.pdf ]\r\nthen\r\necho Tempfile gs_shrink_to_letter.pdf exists. Exiting.\r\nexit 1\r\nfi\r\nif ( gswin32c -dQUIET -dNOPAUSE -dBATCH -sPAPERSIZE=letter -sDEVICE=pdfwrite -sOutputFile=gs_shrink_to_letter.pdf -dPDFFitPage \"$x\" )\r\nthen\r\necho Success.\r\nmv gs_shrink_to_letter.pdf \"$x\"\r\nelse\r\necho Error occurred, exiting.\r\n# $? would hold the status of echo here, so exit non-zero explicitly\r\nexit 1\r\nfi\r\ndone\r\n<\/pre>\n<p>&nbsp;<br \/>\nAfter converting your PDFs as above, you can then apply Acrobat batch OCR without a hitch.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;m unaware of any free tool to perform OCR on a PDF and embed the resulting data in the PDF itself so it is text-searchable. If anyone knows of one, let me know. In the meantime, I use Acrobat Professional for this essential functionality. 
High resolution PDFs produced by my scanner (HP Officejet Pro L7700) [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[17,15,26],"tags":[144,143,145,141,140,142],"_links":{"self":[{"href":"https:\/\/adam.rosi-kessel.org\/weblog\/wp-json\/wp\/v2\/posts\/847"}],"collection":[{"href":"https:\/\/adam.rosi-kessel.org\/weblog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/adam.rosi-kessel.org\/weblog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/adam.rosi-kessel.org\/weblog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/adam.rosi-kessel.org\/weblog\/wp-json\/wp\/v2\/comments?post=847"}],"version-history":[{"count":12,"href":"https:\/\/adam.rosi-kessel.org\/weblog\/wp-json\/wp\/v2\/posts\/847\/revisions"}],"predecessor-version":[{"id":859,"href":"https:\/\/adam.rosi-kessel.org\/weblog\/wp-json\/wp\/v2\/posts\/847\/revisions\/859"}],"wp:attachment":[{"href":"https:\/\/adam.rosi-kessel.org\/weblog\/wp-json\/wp\/v2\/media?parent=847"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/adam.rosi-kessel.org\/weblog\/wp-json\/wp\/v2\/categories?post=847"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/adam.rosi-kessel.org\/weblog\/wp-json\/wp\/v2\/tags?post=847"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}