locked
Date variable writing out as blank from execute r module in Azure Machine Learning RRS feed

  • Question

  • All of these dates that I’ve manipulated in Execute R module in Azure Machine Learning write out as blank in the output – that is, these date columns exist, but there is no value in those columns. 


    There are two variables I’m reading in in the data frame, and they have two different date formats. 

    usage$Date=c(‘8/6/2015’   ‘8/20/2015’  ‘7/9/2015’)

    usage$Date2=c(‘4/16/2015 0:00’,  ‘7/1/2015 0:00’, ‘7/1/2015 0:00’) 

    I manipulate them successfully in R studio via the code below.

    usage$Date<-as.character(usage$Date)

    usage$Date<-as.Date(as.POSIXct(strptime(usage$Date, "%m/%d/%Y"), "GMT"))

    usage$Date<-format(tusage$Date, "%m/%d/%Y")

    usage$Date<-as.Date(usage$Date "%m/%d/%Y")

     

    usage$Date2<- as.POSIXct(strptime(usage$Date2, "%m/%d/%Y"), "GMT")

    usage$Date2<- format(usage$Date2,"%m/%d/%Y")

    usage$Date2<-as.Date(usage$Date2, "%m/%d/%Y")

    After this – in R studio – they both end up in this format

    usage$Date= c("2015-09-17", "2015-09-03", "2015-09-03")

    usage$Date2 =c(‘"2015-04-16", "2015-04-16", "2015-07-01")

    However, when I run this code in Azure Machine Learning all of these dates that I’ve manipulated write out as blank in the data frame – that is, the variable exists, but there is no value in the usage$Date and usage$Date2 columns in the output data frame.  Also, any variables derived using these variables are also output as blanks.

    There is no error in Azure ML when it runs the experiment, it just writes out blanks for those variables.  Help! Has anyone run into a problem with dates in Azure Machine Learning?  If so, what is the work around?

    Thanks!


    Thursday, October 29, 2015 12:13 AM

Answers

  • Yay!  I was able to solve for this. 

    My problem was that the date/time variable was recognized within Azure ML in a different format than it appeared in the csv.  Although it looked like this in the csv

    usage$Date=c(‘8/6/2015’,   ‘8/20/2015’,  ‘7/9/2015’)

    it was actually recognized by AzureML in this format after being read in:

    usage$Date=c('2015-08-06T00:00:00', '2015-08-20T00:00:00', '2015-07-09T00:00:00')


    I took two troubleshooting steps -

    I connected a 'writer' module directly from the reader to see what the format was that was read in.

    Then I connected the 'convert to dataset' module between the reader and the execute r module, so that I could visualize the format of the data at that point.


    I changed the formatting script for 'usage$date' to the following, and the date information works beautifully, and writes out to the output. 


    usage$Date<-as.Date(as.POSIXct(usage$Date, format = "ymd", tz = "EST5EDT"))

    I was able to isolate the problem to the input format based on your repro AK.  Thank you!

    Thanks everyone!

    Patty


    Monday, November 9, 2015 11:32 PM

All replies

  • Hi Patty,

    What version of R and RStudio are you using. AzureML Currently supports version 3.2.

    Also print out your steps to see if your data is not being stripped off, or the conversion invalid with the print() statement. You can see your print statements in the output.log of Azure ML.


    Sr. Enterprise Architect | Trainer | Consultant | MCT | MCSD | MCPD | SharePoint TS | MS Virtual TS |Windows 8 App Store Developer | Linux Gentoo Geek | Raspberry Pi Owner | Micro .Net Developer | Kinect For Windows Device Developer |blog: http://dgoins.wordpress.com

    Thursday, October 29, 2015 6:38 AM
  • Hi Patty,

    Could you try this script within your 'Execute R Script' module?

    
    Date=c("8/6/2015", "8/20/2015", "7/9/2015")  
    Date<-as.character(Date) 
    Date<-as.Date(as.POSIXct(strptime(Date, "%m/%d/%Y"), "GMT")) 
    Date<-format(Date, "%m/%d/%Y") 
    Date<-as.Date(Date, "%m/%d/%Y") 
    data.set <- as.data.frame(Date) 
    # Select data.frame to be sent to the output Dataset port
    maml.mapOutputPort("data.set");

    It seems to work for me as shown below and gives the following output:

    Hope this helps.

    Regards,
    Jaya.

    Thursday, October 29, 2015 12:50 PM
  • I am running R version 3.2, the version that is supported.

    I inspected the log file in AML, and indeed, it needed the default time zone set.  (I'm presuming local R Studio pulls this from your local system, hence it doesn't get confused.

    The log file warnings specifically:

    [ModuleOutput] 1: In strptime(x, format, tz = tz) :

    [ModuleOutput] unable to identify current timezone 'C':

    [ModuleOutput] please set environment variable 'TZ' [ModuleOutput]

    [ModuleOutput] 2: In strptime(x, format, tz = tz) : unknown timezone 'localtime'

    I referred to an answer regarding setting default time zone for strptime here

    http://stackoverflow.com/questions/4047188/unknown-timezone-name-in-r-strptime-as-posixct

    I changed my code to exlicitly define the global environment time variable.

    Further, I defined the time zone explicit in each line related to reformatting date as follows:

     HOWEVER, the error persists -it still does not write these variables out, and still is unable to identify the current timezone. 

    Warning messages: [ModuleOutput] [ModuleOutput] 1: In strptime(x, format, tz = tz) : [ModuleOutput] [ModuleOutput] unable to identify current timezone 'C': [ModuleOutput] [ModuleOutput] please set environment variable 'TZ' [ModuleOutput] [ModuleOutput] 2: In strptime(x, format, tz = tz) : unknown timezone 'localtime' [ModuleOutput] [ModuleOutput] 3: package 'base64enc' was built under R version 3.2.2 [ModuleOutput] [ModuleOutput] 4: In strptime(x, format, tz = tz) : unknown timezone 'localtime' [ModuleOutput] [ModuleOutput] 5: In strptime(x, format, tz = "GMT") : unknown timezone 'localtime' [ModuleOutput]

    Sys.setenv(TZ='GMT')
    Sys.getenv("TZ")



    ####Data frame usage cleanup, format and labeling
    usage<-as.data.frame(usage)
    usage$Date1<-as.character(usage$Date1)
    usage$Date1<-as.POSIXct(usage$Date1, "%m/%d/%Y",tz="GMT")
    usage$Date1<-format(usage$Date1, "%m/%d/%Y")
    usage$Date1<-as.Date(usage$Date1, "%m/%d/%Y")
    usage<-as.data.frame(usage)

    usage$Date2<- as.POSIXct(usage$Date2, "%m/%d/%Y",tz="GMT")
    usage$Date2<- format(usage$Date2,"%m/%d/%Y")
    usage$Date2<-as.Date(usage$Date2, "%m/%d/%Y")
    usage<-as.data.frame(usage)

    Please advise!

    Thursday, October 29, 2015 4:45 PM
  • After reading two blog posts on this problem, I realize that AzureML does not support some date/time formats. 

    http://blogs.msdn.com/b/andreasderuiter/archive/2015/02/03/troubleshooting-error-1000-rpackage-library-exception-failed-to-convert-robject-to-dataset-when-running-r-scripts-in-azure-ml.aspx

    http://www.mikelanzetta.com/2015/01/data-cleaning-with-azureml-and-r-dates/

    So I tried to convert to POSIXct before sending it to the output stream, which I've done as follows:

    tenantusage$Date1 = as.POSIXct(tenantusage$Date1 , "%m/%d/%Y",tz = "EST5EDT");

    tenantusage$Date2 = as.POSIXct(tenantusage$Date2 , "%m/%d/%Y",tz = "EST5EDT");

    But encounter the same problem.  The information in these variables refuses to write out.

    Please advise!

    thanks

    Thursday, October 29, 2015 9:57 PM
  • Hey Patty!

    Sorry if this sounds like a dumb question but could we see your entire script please? Do you at one point have the maml.mapOutputPort("<your data.frame name>") call anywhere there to map an R df to the output port?

    Regards,

    AK

    Friday, October 30, 2015 4:02 PM
  • Thanks for the question - I send over my entire script here.I'm also listing below some examples of the data items in the Date formats in question.    This writes out blanks for 'FactDate' and 'Date2' and the derivative dates.

    For good measure, I'm listing below the class of all the variables once they are imported into the Execute R measure.  


    ###########

    # Map 1-based optional input ports to variables
    usage <- maml.mapInputPort(1) # class: data.frame

    ## locale-specific version of date()
    Sys.time<-format(Sys.time(), "%a %b %d %X %Y %Z, %C")
    ## time to sub-second accuracy (if supported by the OS)
    Sys.time<-format(Sys.time(), "%H:%M:%OS3")

    ##all environment setting commands I can find
    Sys.setenv(TZ='EST5EDT')
    Sys.setenv(tz='EST5EDT')
    Sys.setlocale("LC_TIME", "C")

    Sys.setlocale("LC_COLLATE", "C")
    Sys.setlocale("LC_TIME", "English")


    valid_column_names <- make.names(names=names(usage), unique=TRUE, allow_ = TRUE)
    names(usage) <- valid_column_names

    ##Install packages
    install.packages("src/RCurl_1.95-4.7.zip", lib= ".", repos = NULL, verbose = TRUE)
    install.packages("src/base64enc_0.1-3.zip", lib= ".", repos = NULL, verbose = TRUE)
    install.packages("src/dplyr_0.4.3.zip", lib= ".", repos = NULL, verbose = TRUE)
    library(base64enc, lib.loc=".", verbose=TRUE)
    library(RCurl)
    library(dplyr)

    ###create data set of the class of the variables
    dataclasses<-lapply(tenantusage,class)
    dataclasses<-as.data.frame(dataclasses)


    ####Data frame usage cleanup, format and labeling
    usage<-as.data.frame(usage)
    usage$FactDate<-as.character(usage$FactDate)
    usage<-usage[!is.na(usage$FactDate),]
    usage$FactDate<-as.POSIXct(usage$FactDate, "%m/%d/%Y",tz = "EST5EDT")
    usage$FactDate<-format(usage$FactDate, "%m/%d/%Y", tz = "EST5EDT")
    usage$FactDate<-as.Date(usage$FactDate, "%m/%d/%Y",tz = "EST5EDT")
    usage<-as.data.frame(usage)

    usage$Date2<-as.character(usage$Date2)
    usage<-usage[!is.na(usage$Date2),]
    usage$Date2 <- as.POSIXct(usage$Date2, "%m/%d/%Y",tz = "EST5EDT")
    usage$Date2 <- format(usage$Date2 ,"%m/%d/%Y",tz = "EST5EDT")
    usage$Date2<-as.Date(usage$Date2, "%m/%d/%Y",tz = "EST5EDT")
    usage<-as.data.frame(usage)



    ###Date Specific New Variables
    usage<-as.data.frame(usage)
    usage$maxdate<-as.Date(max(usage$FactDate),"%m/%d/%Y",tz= "EST5EDT")
    usage$maxless7days<- usage$maxdate - as.difftime(7, unit="days" )
    usage$maxless14days<- usage$maxdate - as.difftime(14, unit="days")
    usage$maxless30days<- usage$maxdate - as.difftime(30, unit="days" )
    usage$maxless60days<- usage$maxdate - as.difftime(60, unit="days" )
    usage$maxless90days<- usage$maxdate - as.difftime(90, unit="days" )
    usage<-as.data.frame(usage)



    # Select data.frame to be sent to the output Dataset port
    maml.mapOutputPort("usage");


    ########

    these are the data that is being imported:

    usage$FactDate=c(‘8/6/2015’   ‘8/20/2015’  ‘7/9/2015’)

    usage$Date2=c(‘4/16/2015 0:00’,  ‘7/1/2015 0:00’, ‘7/1/2015 0:00’)



    I've done a run where I output 'dataclasses' that shows that FactDate is recognized as POSIXct at the point where 'dataclasses' is defined.


    I found this link with some details on date/time in Azure ML, but nothing specific enough to help me solve this problem https://msdn.microsoft.com/en-us/library/azure/dn905952.aspx

    thanks in advance for your help!  I much appreciate it!








    • Edited by Patty Ryan Tuesday, November 3, 2015 2:21 AM
    Monday, November 2, 2015 6:24 PM
  • Hey Patty,

    Sorry for the inconvenience but could you please take a look at my attempted repro of your problem? I'm able to see content output from Execute R script but I may have misunderstood. Please let me know what I might have gotten wrong!

    http://gallery.azureml.net/Details/0fcbe15f37c84075b74b98223a2a913f

    Regards,

    AK

    Monday, November 9, 2015 5:04 PM
  • AK -

    thanks for the repro. I see that it works for you.

    In fact, I copied your repro script in its entirety and exactly, and run it against my data set, and my experiment still writes out blanks to date fields.

    So I believe then that the data input format must be different than what I have described to you, as this is the only remaining difference between your experiment and mine.  I'm looking at this a little more closely. I will provide more detail.


    Thanks,

    Patty


    • Edited by Patty Ryan Monday, November 9, 2015 7:58 PM
    Monday, November 9, 2015 7:53 PM
  • Hi

    I noticed that if in you CSV file ( entry dataset ) the date is not string ( no "")  then everything works fine.

    If its a string I spent 3 days trying to write it back to Date without success. I just changed the formating of my CSV file(no ""anymore for the date field) , and then everything worked fine

    Reg

    Monday, November 9, 2015 8:26 PM
  • regarding the data input format:

    using the writer module, I wrote out the data directly from the input module.

    This is the format of the date variable as it is when it is passed to the execute R module:

    2015-09-03T00:00:00

    I am going to truncate after the 10th character, after I convert to character and see if that helps.

    Monday, November 9, 2015 8:37 PM
  • Yay!  I was able to solve for this. 

    My problem was that the date/time variable was recognized within Azure ML in a different format than it appeared in the csv.  Although it looked like this in the csv

    usage$Date=c(‘8/6/2015’,   ‘8/20/2015’,  ‘7/9/2015’)

    it was actually recognized by AzureML in this format after being read in:

    usage$Date=c('2015-08-06T00:00:00', '2015-08-20T00:00:00', '2015-07-09T00:00:00')


    I took two troubleshooting steps -

    I connected a 'writer' module directly from the reader to see what the format was that was read in.

    Then I connected the 'convert to dataset' module between the reader and the execute r module, so that I could visualize the format of the data at that point.


    I changed the formatting script for 'usage$date' to the following, and the date information works beautifully, and writes out to the output. 


    usage$Date<-as.Date(as.POSIXct(usage$Date, format = "ymd", tz = "EST5EDT"))

    I was able to isolate the problem to the input format based on your repro AK.  Thank you!

    Thanks everyone!

    Patty


    Monday, November 9, 2015 11:32 PM