none
SQL Server R Services Tutorials - errors

    Question

  • Hello guys,

    I am super keep to start using R with SQL Server (so far worked with the latter, the former is not very new to me, but still need to learn how to connect the two). I started with the following MS tutorial: https://msdn.microsoft.com/en-us/library/mt612857.aspx

    Setup is rather simple: SQL Server 2016 + Visual Studio with R tool + Microsoft R stand alone on the same machine with full admin rights.

    When running a code from the samples I hit a few problem. Will be grateful for your help.

    1) There is no ggmaps package

    Lesson 2 is about data visualization. When I try to run the following:

    mapPlot <- function(inDataSource, googMap){
        library(ggmap)
        library(mapproj)
    
    # Open Source R functions require data to be brought back in memory into data frames. Use rxImport to bring in data. 
    # Remember: This whole function runs in the SQL Server Context.
        ds <- rxImport(inDataSource)
    
        p<-ggmap(googMap)+
        geom_point(aes(x = pickup_longitude, y =pickup_latitude ), 
                          data=ds, alpha =.5, color="darkred", size = 1.5)
    
        return(list(myplot=p))
    }
    
    library(ggmap)
    library(mapproj)
    # Get the map with Times Square, NY as the center. This is run on the R Client
    gc <- geocode("Times Square", source = "google")
    googMap <- get_googlemap(center = as.numeric(gc), zoom = 12, maptype = 'roadmap', color = 'color')
    # Run the points plotting on SQL server. Passing in the map data as arg to remotely executed function. 
    # The points are in the database and will be plotted on the map
    myplots <- rxExec(mapPlot, inDataSource, googMap, timesToRun = 1)
    plot(myplots[[1]][["myplot"]])

    The following line:

    myplots <- rxExec(mapPlot, inDataSource, googMap, timesToRun = 1)

    results in error, i.e. library(ggmap)': there is no package named 'ggmap'.

    Any ideas how to solve? Not critical as I'm not be using mapping at my normal projects.

    2) New environment

    In lesson three new custom function is being created:

    env <- new.env()
    env$ComputeDist <- function(pickup_long, pickup_lat, dropoff_long, dropoff_lat) {

    Why do I need first line that creates new environment?

    3) DataStep error

    This is the most serios one to me. rxDataStep is used for create new table on the server, so operation that you are normally using super often when preparing data for modelling:

    env <- new.env()
    
    env$ComputeDist <- function(pickup_long, pickup_lat, dropoff_long, dropoff_lat){
      R <- 6371/1.609344 #radius in mile
      delta_lat <- dropoff_lat - pickup_lat
      delta_long <- dropoff_long - pickup_long
      degrees_to_radians = pi/180.0
      a1 <- sin(delta_lat/2*degrees_to_radians)
      a2 <- as.numeric(a1)^2
      a3 <- cos(pickup_lat*degrees_to_radians)
      a4 <- cos(dropoff_lat*degrees_to_radians)
      a5 <- sin(delta_long/2*degrees_to_radians)
      a6 <- as.numeric(a5)^2
      a <- a2+a3*a4*a6
      c <- 2*atan2(sqrt(a),sqrt(1-a))
      d <- R*c
      return (d)
    }
    
    #Define the featureDataSource to be used to store the features, specify types of some variables as numeric
    featureDataSource = RxSqlServerData(table = "features", 
                                        colClasses = c(pickup_longitude = "numeric", pickup_latitude = "numeric", 
                                                   dropoff_longitude = "numeric", dropoff_latitude = "numeric",
                                                      passenger_count  = "numeric", trip_distance  = "numeric",
                                                       trip_time_in_secs  = "numeric", direct_distance  = "numeric"),
                                        connectionString = connStr)
    
    # Create feature (direct distance) by calling rxDataStep() function, which calls the env$ComputeDist function to process records
    # And output it along with other variables as features to the featureDataSource
    # This will be the feature set for training machine learning models
    start.time <- proc.time()
    rxDataStep(inData =   inDataSource, outFile = featureDataSource,  overwrite = TRUE, 
                               varsToKeep=c("tipped", "fare_amount", "passenger_count","trip_time_in_secs", 
                                            "trip_distance", "pickup_datetime", "dropoff_datetime", "pickup_longitude",
                                            "pickup_latitude","dropoff_longitude", "dropoff_latitude"),
                               transforms = list(direct_distance=ComputeDist(pickup_longitude, pickup_latitude, dropoff_longitude, 
                                                                  dropoff_latitude)),
                               transformEnvir = env, rowsPerRead=500, reportProgress = 3)
    used.time <- proc.time() - start.time
    print(paste("It takes CPU Time=", round(used.time[1]+used.time[2],2), 
                " seconds, Elapsed Time=", round(used.time[3],2), " seconds to generate features.", sep=""))

    And strange error for rxDataStep:

    ======  HIKARI_SURFACE  ( process  1 ) has started run at  2017-02-05 12:58:31.65  ====== 
     
    'varsToKeep' and 'varsToDrop' not supported for this data source type. 
     
    ERROR: The sample data set for the analysis has no variables. 
    Caught exception in file: CxAnalysis.cpp, line: 3756. ThreadID: 16036 Rethrowing. 
    Caught exception in file: CxAnalysis.cpp, line: 5249. ThreadID: 16036 Rethrowing. 
    Caught exception in file: CxAnalysis.cpp, line: 3756. ThreadID: 16036 Rethrowing. 
    Caught exception in file: CxAnalysis.cpp, line: 5249. ThreadID: 16036 Rethrowing. 
    B│╣d w poleceniu 'doTryCatch(return(expr), name, parentenv, handler)': 
      ERROR: The sample data set for the analysis has no variables. 
    Wywo│ania: source ... tryCatch -> tryCatchList -> tryCatchOne -> doTryCatch -> .Call 
    Wykonywanie wstrzymane 
    Error in rxCompleteClusterJob(hpcServerJob, consoleOutput, autoCleanup) : 
      No results available - final job state: failed

    varsToKeep and varsToDrop not supported? Why?

    Sample data has no variables? SQL connection string works fine (can read variables form SQL Server)?

    Any ideas?

    Thank you for your support.

    Sunday, February 05, 2017 12:00 PM