Azure Search Index for PDFs without installable product

  • Question

  • Guys,

    Here are my needs for an Azure Search solution

    • Make PDFs searchable, via full text search.
    • Allow the text to be located within the PDF
    • Move the Azure VMs out of my solution and use only shared services (WebJobs, Azure Search, etc.) to save costs
    • Currently, my solution has a VM running SharePoint, which we are using strictly for its search capabilities.

    General approach

    • I'd like to write a WebJob to load my index.
    • I want to shut down my SharePoint server, and simply use Azure Search
    • I understand that search is not crawl-based and that I need to build the index myself.  I don't know how I can build that index without a 3rd party product such as Adobe iFilter, Foxit iFilter, or Apache Tika.  The first two require an installation, and Tika likely depends on a JVM installation.  My understanding is that WebJobs would not be a place where I could leverage these tools to load my index.  Thus, I'd need a VM.

    Any ideas?  The reason I'd like to move the client off the VMs to a fully shared-service model is to save them about $3000/month in subscription fees and help them get the most out of Azure.

    -Tom


    • Edited by Tom Cole Tuesday, November 4, 2014 2:22 PM
    Tuesday, November 4, 2014 2:21 PM

Answers

  • Hi Tom,

    I think your summary is correct.  One option I will throw out there (and I have no idea if these 3rd party products work well in this environment either) is to consider Azure Web Sites in conjunction with WebJobs or a Scheduler service.  If you only need to run this on a scheduled basis, perhaps you could add one of these 3rd party indexers to an Azure Web Site (maybe in the form of a Controller in an MVC application) which would be called from the Scheduler to execute the indexing.
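
    As a rough sketch of the index-loading step, independent of where the text extraction ends up running: the code below assumes the PDF text has already been extracted page by page (by Tika or an iFilter, not shown), and pushes one search document per page so that a hit can be traced back to its page in the PDF. The field names (id, docId, page, content), the service and index names, and the api-version value are all assumptions for illustration; they would have to match your own index schema and whatever API version your service supports.

```python
import json
import urllib.request

# Assumption: adjust to an api-version your Azure Search service supports.
API_VERSION = "2014-07-31-Preview"


def build_page_batch(doc_id, pages):
    """Build an upload batch with one search document per PDF page.

    `pages` is a list of already-extracted page texts (extraction not shown).
    Field names here are hypothetical and must exist in the index schema.
    """
    actions = []
    for page_number, text in enumerate(pages, start=1):
        actions.append({
            "@search.action": "upload",
            "id": f"{doc_id}-p{page_number}",  # key: letters, digits, dashes only
            "docId": doc_id,
            "page": page_number,
            "content": text,
        })
    return {"value": actions}


def build_index_request(service, index, api_key, batch):
    """Prepare (but do not send) the POST that loads the batch into the index."""
    url = (
        f"https://{service}.search.windows.net/indexes/{index}"
        f"/docs/index?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "api-key": api_key},
    )
```

    A WebJob or scheduled endpoint would then send the request with urllib.request.urlopen(request) using an admin key. Splitting by page is only one way to make text locatable; it trades a larger index for the ability to report a page number with each hit.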

    Please keep us up to date on your progress here as I know there are a number of people who have a similar interest.

    Liam


    Sr. Program Manager, SQL Azure Strategy

    Tuesday, November 4, 2014 5:20 PM