I know this question really belongs on the Adobe developer forum, but I will give it a try here:
Does anyone know a way to extract text from a specific area of a page (defined by coordinates, not keywords) of a PDF file?
I know there are lots of commercial DLLs available, but they cost about $4000 for a distribution license.
This is probably not helpful or what you're looking for, however...
You could have a look at Poppler, which is a Unix-based toolset. Its pdftotext executable lets you grab specific pages from within a PDF file, although, as far as I could tell, not text from specific coordinates.
I've managed to compile it under MS Interix (SFU) and on Cygwin more or less out of the box.
It works really quickly and on every PDF version I've used it on.
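For what it's worth, the pdftotext man page for current Poppler releases does list crop options (-x, -y, -W, -H) alongside the page range options (-f, -l), so it may be worth checking whether your build has them. Below is a rough Python sketch of how that could be wired up. The function names and the file name in the usage line are made up for illustration, and it assumes pdftotext is on your PATH. The crop values are in pixels at the given resolution, measured from the top-left corner of the page, so at the default 72 DPI they line up with PDF points.

```python
import subprocess


def build_pdftotext_cmd(pdf_path, page, x, y, width, height, resolution=72):
    """Build a pdftotext command line for one rectangular area of one page."""
    return [
        "pdftotext",
        "-f", str(page),        # first page to convert
        "-l", str(page),        # last page to convert (same page)
        "-r", str(resolution),  # resolution in DPI; crop values are pixels at this DPI
        "-x", str(x),           # x of the crop area's top-left corner
        "-y", str(y),           # y of the crop area's top-left corner
        "-W", str(width),       # crop area width
        "-H", str(height),      # crop area height
        pdf_path,
        "-",                    # write the extracted text to stdout
    ]


def extract_region(pdf_path, page, x, y, width, height, resolution=72):
    """Run pdftotext and return the text found inside the crop area."""
    cmd = build_pdftotext_cmd(pdf_path, page, x, y, width, height, resolution)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

You would call it like `extract_region("invoice.pdf", page=1, x=50, y=100, width=200, height=40)`. I have not tried this against every Poppler version, so treat it as a starting point rather than a finished solution.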
Hope this is some help.
Thank you. I've found something too, but I haven't tested it yet.
I decided to work with XPS rather than PDF, simply because the dev kits for XPS (virtual print drivers and so on) are cheaper.
Anyway, if anyone wants to try out the code below from this site, please post your experiences here. They also have a forum on that site, but I think it's only in German.
I'm posting it again here because, if this code really works, it shouldn't stay hidden somewhere on the web: