API for OCR and reading .pdf and .doc(x) files

Question
-
Hi Guys,
Happy new year to you all!
I am trying to write an app that will read the text in an image, a .pdf, or a .doc(x) file and extract some information from it. I want this app to be a Universal Windows app. Is there an API for performing these actions?
Thanks and God bless.
The best things in life are free, but the most valuable ones are costly...use opportunities well, for there are others like you who deserve them but don't have them...
- Moved by Rob Caplan [MSFT] (Microsoft employee, Moderator), Monday, January 5, 2015 6:24 PM
Monday, January 5, 2015 4:41 PM
Answers
-
For OCR, see the WindowsPreview.Media.Ocr namespace. I'm not aware of any API that will interpret PDF or doc files in a Windows Runtime app. On the Windows side, you could use the in-box PDF API (the Windows.Data.Pdf namespace) to render each page to a bitmap and then run OCR on it.
- Marked as answer by Matt Small (Microsoft employee, Moderator), Tuesday, January 6, 2015 3:00 PM
Monday, January 5, 2015 6:23 PM, Moderator
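To make the flow suggested in the answer concrete, here is a minimal sketch. It is written in Python rather than as Windows Runtime code, and it only shows the routing: images go straight to OCR, PDFs are rendered page by page and each page bitmap is OCR'd, and anything else (such as .doc or .docx) is rejected because no built-in reader was identified in this thread. The names `extract_text`, `load_image`, `render_pages`, and `recognize` are hypothetical stand-ins, not real APIs; in an actual app they would wrap the in-box PDF renderer and the OCR engine.

```python
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical stand-ins (assumptions, not real Windows APIs):
Bitmap = bytes
LoadImage = Callable[[Path], Bitmap]              # image file -> bitmap
RenderPages = Callable[[Path], Iterable[Bitmap]]  # PDF -> one bitmap per page
Recognize = Callable[[Bitmap], str]               # bitmap -> recognized text

# Assumed set of image types; adjust to whatever the OCR engine accepts.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp"}


def extract_text(
    path: Path,
    load_image: LoadImage,
    render_pages: RenderPages,
    recognize: Recognize,
) -> str:
    """Route a file to the matching extraction path based on its extension."""
    ext = path.suffix.lower()
    if ext in IMAGE_EXTENSIONS:
        # Images can be handed to the OCR engine directly.
        return recognize(load_image(path))
    if ext == ".pdf":
        # Render every page to a bitmap, OCR each one, join the page texts.
        return "\n".join(recognize(page) for page in render_pages(path))
    # .doc/.docx and anything else: no built-in reader, so fail loudly.
    raise ValueError(
        f"no built-in reader for {ext!r}; a third-party library would be needed"
    )
```

Passing the renderer and the OCR step in as callables keeps the routing logic testable on its own, and lets the platform-specific pieces be swapped in later without touching it.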
All replies
-
No built-in APIs, but I am sure there are third-party APIs for this.
Matt Small - Microsoft Escalation Engineer - Forum Moderator
If my reply answers your question, please mark this post as answered.
NOTE: If I ask for code, please provide something that I can drop directly into a project and run (including XAML), or an actual application project. I'm trying to help a lot of people, so I don't have time to figure out weird snippets with undefined objects and unknown namespaces.
Monday, January 5, 2015 6:02 PM, Moderator
-
Sorry about missing the OCR part.
Matt Small - Microsoft Escalation Engineer - Forum Moderator
Monday, January 5, 2015 6:58 PM, Moderator
-
Thanks. I will test the conversion of PDF to image and then use the OCR engine.
Tuesday, January 27, 2015 7:10 PM