How to read plain text files in the R Script node in Azure Machine Learning?

  • Question

  • What if your source is not structured, such as plain text, or HTML pages with tables that need to be parsed?

    Azure ML's R Script node cannot connect to the Internet at all (in another thread, a moderator told me this is because R runs in a secure sandbox, since it's a multi-tenant architecture). So R cannot be used to fetch a file from the Internet.

    The Reader module in Azure ML only allows for structured files and data sources to be read. So that's out of the question as well.

    What other options exist (other than downloading the information manually, parsing it on a local computer, creating an Azure ML compatible "dataset" and using it)? I mean, if I have to do all this myself locally, I can't truly create a "cloud" solution. 

    Thursday, November 6, 2014 6:06 PM

All replies

  • Hi Suresh,

    Using the URL you provided in the other question, I ran a Reader module in Http mode, with the Format set to CSV.

    This produces the HTML of the page in question as a semi-structured Dataset.

    This Dataset can be directly piped to Execute R Script for processing.
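
For illustration, here is a minimal sketch of what that R processing could look like. It assumes the Reader's CSV parsing has split each line of the page across several columns and padded the rest with missing values; `rebuild_lines` is a hypothetical helper name, and rejoining cells with commas only approximates the original line (any quoting the CSV parser consumed is not restored). `maml.mapInputPort` and `maml.mapOutputPort` are the port functions available inside Execute R Script; outside Azure ML the sketch falls back to a small made-up data frame.

```r
# Rebuild one text line per row from a data frame of page fragments.
rebuild_lines <- function(df) {
  m <- as.matrix(df)
  lines <- vapply(seq_len(nrow(m)), function(i) {
    cells <- trimws(m[i, ])
    # Drop missing and empty cells left behind by the CSV parsing.
    cells <- cells[!is.na(cells) & nzchar(cells)]
    paste(cells, collapse = ",")
  }, character(1))
  lines[nzchar(lines)]  # discard rows that were entirely empty
}

if (exists("maml.mapInputPort")) {
  # Inside Execute R Script: take the Reader output from input port 1.
  dataset1 <- maml.mapInputPort(1)
} else {
  # Local stand-in (made-up data) so the sketch runs outside Azure ML.
  dataset1 <- data.frame(
    V1 = c("<html>", "<td>a", NA),
    V2 = c(NA, "b</td>", NA),
    stringsAsFactors = FALSE
  )
}

data.set <- data.frame(line = rebuild_lines(dataset1), stringsAsFactors = FALSE)

if (exists("maml.mapOutputPort")) {
  maml.mapOutputPort("data.set")  # hand the tabular result to the next module
}
```

From here, the `line` column can be parsed with regular expressions or any other text processing before the result is exported.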

    Let me know if you have any other questions!


    Thursday, November 6, 2014 7:19 PM
  • Hi,

    I am working on a similar case and followed your recommendation.

    The "Reader" was able to import a plain text file from a public Dropbox folder. I configured the output as CSV, and that is exactly what the "Reader" did: it converted the text file into a CSV file with column names and many rows of NaN. The rows that were not NaN contained words and values that were not related at all to the text file.

    So, it seems that this way to import a text file is not suitable.

    The alternative of uploading the text file does not work either, for the reason you mentioned: you cannot feed it to an R connection.



    Monday, December 8, 2014 9:52 AM
  • Hi Carlos!

    If all you need is to import a plain text file, please zip up the file and upload the zip file to Azure ML as a Dataset.

    You can then attach this Zip to the Execute R Script module where you can massage the data as needed before exporting a tabular structure.
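
A minimal sketch of that approach, assuming the zip is attached to the module's script bundle port and that its contents are unpacked into the `src` directory of the R working directory. The file name `notes.txt` and the helper `read_text_as_table` are hypothetical; when the file is not found (for example, when running locally), the sketch writes a two-line sample to a temporary file instead.

```r
# Read a plain text file and return one row per line.
read_text_as_table <- function(path) {
  lines <- readLines(path, warn = FALSE)
  data.frame(
    line_number = seq_along(lines),
    text = lines,
    stringsAsFactors = FALSE
  )
}

# Assumed location of the unzipped file; "notes.txt" is a placeholder name.
txt_path <- file.path("src", "notes.txt")

if (!file.exists(txt_path)) {
  # Local stand-in so the sketch runs outside Azure ML.
  txt_path <- tempfile(fileext = ".txt")
  writeLines(c("first line", "second line"), txt_path)
}

data.set <- read_text_as_table(txt_path)

if (exists("maml.mapOutputPort")) {
  maml.mapOutputPort("data.set")  # export the tabular structure
}
```

Keeping one row per line leaves the text untouched, so any further parsing can be done in the same script before the data frame is exported.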



    Monday, December 8, 2014 1:04 PM
  • Thanks!

    It worked perfectly!



    Monday, December 8, 2014 5:18 PM