Implement a Python script to retrieve the original PDF structure


  • For my research I've written Python code that extracts information from the original PDF structure. I am looking for a way to run my script in Azure Data Lake Analytics, so I can get the information I need out of the PDF structure.

    My question is: how can I run my own Python code directly on the files in my Data Lake Store folder, instead of using an existing extractor?

    Is it possible to use my Python script as an extractor?

    I tried a few things, but all the examples I found seem to require passing in a DataFrame.

    Hope somebody can help me,

    Thanks in advance,


    Thursday, February 15, 2018 6:16 AM
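A minimal, hypothetical sketch of the kind of logic the question describes (not the poster's actual script): it scans the raw bytes of a PDF with regular expressions and reports the header version, the indirect objects, and the trailer's `/Root` reference. The function name `pdf_structure` is illustrative only. Writing the logic as a pure function of the file's bytes keeps it independent of whichever UDO model eventually hosts it.

```python
import re

# Hypothetical stand-in for a PDF-structure script: a naive regex scan of
# the raw bytes, NOT a full PDF parser.
HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b")
ROOT_RE = re.compile(rb"/Root\s+(\d+)\s+(\d+)\s+R")


def pdf_structure(data: bytes) -> dict:
    """Return basic structural facts found in raw PDF bytes."""
    header = HEADER_RE.match(data)  # the header must be at the very start
    objects = [(int(n), int(g)) for n, g in OBJ_RE.findall(data)]
    roots = ROOT_RE.findall(data)
    return {
        "version": header.group(1).decode("ascii") if header else None,
        # (object number, generation number) for each "n g obj" found
        "objects": objects,
        # With incremental updates the last trailer is the current one.
        "root": (int(roots[-1][0]), int(roots[-1][1])) if roots else None,
    }


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
        b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
        b"trailer\n<< /Size 3 /Root 1 0 R >>\n%%EOF\n"
    )
    print(pdf_structure(sample))
```

Objects stored inside compressed object streams will not be found by this scan; a real PDF library would be needed for that.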

All replies

  • Currently we only support reducer UDOs for Python. However, we are working on a new Python UDO model that will add the capability for users to author extractor, processor, reducer, and outputter user-defined operators in Python.
    Thursday, February 15, 2018 10:44 PM
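Under the current reducer-only model, a Python reducer script is, as far as I know from the U-SQL Python extension documentation, a module that defines `usqlml_main(df)`: it receives one pandas DataFrame per `REDUCE` group and returns a DataFrame. That convention is why the examples always pass DataFrames around. The sketch below assumes an earlier U-SQL step has already produced a rowset with hypothetical `path` and `content` columns; getting PDF content into such a rowset is not covered here, and `describe_pdf` is a placeholder for the real PDF logic.

```python
import pandas as pd


def describe_pdf(raw: str) -> str:
    # Hypothetical placeholder for real PDF-structure logic:
    # it only returns the header version, e.g. "1.4" for "%PDF-1.4".
    if not raw.startswith("%PDF-"):
        return ""
    return raw.split()[0][len("%PDF-"):]


def usqlml_main(df: pd.DataFrame) -> pd.DataFrame:
    # The Python reducer receives one DataFrame per REDUCE group and
    # returns a DataFrame; its columns are assumed to match the PRODUCE
    # clause. "path" and "content" are hypothetical input columns.
    return pd.DataFrame({
        "path": df["path"],
        "version": df["content"].apply(describe_pdf),
    })


if __name__ == "__main__":
    sample = pd.DataFrame({
        "path": ["/docs/a.pdf", "/docs/b.txt"],
        "content": ["%PDF-1.4\n1 0 obj", "not a pdf"],
    })
    print(usqlml_main(sample))
```

The function can be exercised locally with an ordinary DataFrame before wiring it into a `REDUCE ... USING new Extension.Python.Reducer(pyScript: ...)` statement.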