Opening a file from Data Lake Store in custom processors

    Question

  • Hi! Is it possible to open a small file from Data Lake Store in an IProcessor implementation?

    For example, in the constructor:

                using (StreamReader sr = new StreamReader("adl://datalakestore.azuredatalakestore.net/BinaryAll/file.txt"))
                {
                    // Read data from file
                }



    Friday, May 6, 2016 4:37 PM

Answers

  • Hello, you cannot open an ADL resource directly from your custom code, because data in Azure Data Lake Store is accessed through secure APIs. However, if you have a small file, as you mentioned, you can deploy it as a resource to your nodes and then use it in your custom code.

    Hope this helps. Please let me know if you have any questions.

    In U-SQL:

    DECLARE @Res_Lookup string = @"<your filename>";
    DEPLOY RESOURCE @Res_Lookup;

    // Pass it as a parameter to your UDO

    @result =
        PROCESS @data
        PRODUCE <your output columns>
        USING new MyProcessor(lookupdata: @Res_Lookup);

    In your UDO:

    var file_name = this.lookupdata;
    var file_mode = System.IO.FileMode.Open;
    var file_access = System.IO.FileAccess.Read;
    var file_share = System.IO.FileShare.Read | System.IO.FileShare.Delete;

    using (var sr = new System.IO.StreamReader(System.IO.File.Open(file_name, file_mode, file_access, file_share)))
    {
        // ...
    }



    Monday, May 9, 2016 5:04 AM

All replies

  • Hi! I have a problem with your solution. Should <your filename> be set to the path of the file in the Data Lake Store? When I run my code locally, everything works fine, but when I run it with a DLA account, the script throws the exception "The given path's format is not supported". Also, should the lookupdata variable be a string argument of the MyProcessor constructor?

    DLA script:
    DECLARE @Res_Lookup string = @"adl://mgrdatalakestore.azuredatalakestore.net/Adaptors/filename-PE.fa";
    DEPLOY RESOURCE @Res_Lookup;
    
    Local script:
    DECLARE @Res_Lookup string = @"H:\Adaptors\filename-PE.fa";
    DEPLOY RESOURCE @Res_Lookup;

    PROCESS statement:

    @SRR988073_result =
        PROCESS @SRR988073
        PRODUCE data
        USING new NGSQualityControl.Domain.Processors.FastqPairedEndTrimmerProcessor("COMMAND", false, false, resource:@Res_Lookup);

    Constructor of processor:

            public BaseFastqTrimmerProcessor(String command, Boolean isQualityAvgColumn, String resource, QualityEncodingType qualityType)
            {
                _isQualityAvgColumn = isQualityAvgColumn;
                _qualityType = qualityType;
                _resource = resource;
                _trimmersList = CreateTrimmers(command);
            }

    File reading:

            private List<FastaObject> ReadFile(String resource)
            {
                var file_mode = FileMode.Open;
                var file_name = resource;
                var file_access = FileAccess.Read;
                var file_share = FileShare.Read | FileShare.Delete;
    
                List<FastaObject> fastaAdaptors = new List<FastaObject>();
                FastaObject fastaObject = null;
    
                using (var sr = new System.IO.StreamReader(File.Open(file_name, file_mode, file_access, file_share)))
                {
                    //read file line by line
                }
    
                return fastaAdaptors;
            }

    EDIT:

    I found the cause of my problem. I changed the full path to a path relative to the store, and in the ReadFile function I changed the line:

    var file_name = resource;

    to:

    var file_name = Path.GetFileName(resource);

    Thank you for your help. That was what I needed.
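
    For anyone hitting the same exception, here is a minimal sketch of the fix, not the full processor. It assumes, as the behaviour above suggests, that DEPLOY RESOURCE places the file in the working directory of the job under its bare file name, so the UDO must open it by that name and not by the adl:// path. The class DeployedResource and the methods LocalName and OpenText are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;

public static class DeployedResource
{
    // Hypothetical helper: strips the store path and keeps only the file name,
    // which is the name the deployed copy is assumed to have locally.
    public static string LocalName(string resource)
    {
        return Path.GetFileName(resource);
    }

    // Opens the deployed copy for reading, with the same file mode, access
    // and share flags that were suggested in the answer.
    public static StreamReader OpenText(string resource)
    {
        var stream = File.Open(
            LocalName(resource),
            FileMode.Open,
            FileAccess.Read,
            FileShare.Read | FileShare.Delete);
        return new StreamReader(stream);
    }

    public static void Main()
    {
        string resource = "adl://mgrdatalakestore.azuredatalakestore.net/Adaptors/filename-PE.fa";
        Console.WriteLine(LocalName(resource)); // prints "filename-PE.fa"
    }
}
```

    With a helper like this, the same string can be passed to both DEPLOY RESOURCE and the processor, because the processor never tries to open the adl:// path itself.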



    Monday, May 9, 2016 11:35 AM
  • Glad to hear it worked and thanks for fixing the issue.
    Monday, May 9, 2016 3:26 PM