How to extract text from a PDF

As far as I know today, the best tool for extracting text from PDFs is TET, the Text Extraction Toolkit. TET is part of the PDFlib product family. PDFlib.com is Thomas Merz's company. In case you don't recognize his name, Thomas Merz is the author of the "PostScript and PDF Bible".

TET's first incarnation is a library. That one can probably do everything Budda. Oh, and it can also extract images. It recombines images which are fragmented into pieces. The second incarnation is the TET plugin for Acrobat. And the third incarnation is the PDFlib TET iFilter. This is a standalone tool for user desktops. Both of these are free (as in beer) to use for private, non-commercial purposes.

And it's really powerful. Way better than Adobe's own text extraction. It extracted text for me where other tools, including Adobe's, spat out only garbage. I just tested the desktop standalone tool, and what they say on their webpage is true. It has a very good command line. The tool handled some of my problematic PDF test files to my full satisfaction. From now on, this will be my recommendation for any sophisticated and challenging PDF text extraction requirement.

TET is simply awesome. It detects tables. Inside tables, it identifies cells spanning multiple columns. It identifies table rows and the contents of each table cell separately. It deals very well with hyphenation: it removes hyphens and restores complete words. It supports non-ASCII languages, including CJK, Arabic and Hebrew. When encountering ligatures, it restores the original characters.
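
To make the hyphenation and ligature points more concrete, here is a minimal Python sketch of what such post-processing can look like on raw extracted text. This is not TET's implementation or API. The function names are hypothetical, and the rules are deliberately naive assumptions: a hyphen directly before a line break followed by a lowercase letter is treated as a soft hyphen, and ligatures are expanded with Unicode NFKC normalization.

```python
import re
import unicodedata

# A word character, a hyphen, a line break, then a lowercase continuation.
# Assumption: this pattern marks a word that was split for line wrapping.
_HYPHEN_BREAK = re.compile(r"(\w)-\s*\n\s*([a-z])")


def expand_ligatures(text: str) -> str:
    """Replace ligature code points (e.g. U+FB01) with their component letters."""
    # NFKC compatibility normalization decomposes presentation-form ligatures.
    # Note: it also folds other compatibility characters (superscripts,
    # full-width forms), which may or may not be wanted.
    return unicodedata.normalize("NFKC", text)


def dehyphenate(text: str) -> str:
    """Join words that were hyphenated across a line break."""
    return _HYPHEN_BREAK.sub(r"\1\2", text)


def clean_extracted_text(raw: str) -> str:
    """Hypothetical helper: expand ligatures, then undo line-break hyphenation."""
    return dehyphenate(expand_ligatures(raw))


if __name__ == "__main__":
    sample = "The \ufb01rst hyphen-\nation test"
    print(clean_extracted_text(sample))
```

A rule this simple will also glue together genuine compounds that happen to break at their hyphen (for example "well-" at the end of a line followed by "known"), which is one reason a dedicated tool with layout and dictionary knowledge does better than a regular expression.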