Errors when using Sqoop on HDInsight to import data from SQL Server to HDFS

    Question

  • While trying to use Sqoop to import data from SQL Server to HDFS, I get some errors.

    - I created my Sqoop command by referring to http://mssqldude.wordpress.com/2012/11/12/big-data-with-sql-server-part-2-sqoop/

    - I am using SQL Server authentication, and yes, the username and password are correct; I tested that :)

    Here's the command:

    c:\Hadoop\sqoop-1.4.2\bin>sqoop import --connect "jdbc:sqlserver://localhost;database=UniversityDB;username=sqoop;password=password" --table student --m 1

    Here is the message:

    13/01/04 15:32:07 INFO SqlServer.MSSQLServerManagerFactory: Using Microsoft's SQ
    L Server - Hadoop Connector
    13/01/04 15:32:07 INFO manager.SqlManager: Using default fetchSize of 1000
    13/01/04 15:32:07 INFO tool.CodeGenTool: Beginning code generation
    13/01/04 15:32:08 INFO manager.SqlManager: Executing SQL statement: SELECT TOP 1
     * FROM [student]
    13/01/04 15:32:09 INFO manager.SqlManager: Executing SQL statement: SELECT TOP 1
     * FROM [student]
    13/01/04 15:32:09 INFO orm.CompilationManager: HADOOP_HOME is c:\Hadoop\hadoop-1
    .1.0-SNAPSHOT
    Note: \tmp\sqoop-Paras\compile\3a98bb4b97772681ceaf5eb95ecbb095\student.java use
    s or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    13/01/04 15:32:13 ERROR orm.CompilationManager: Could not rename \tmp\sqoop-Para
    s\compile\3a98bb4b97772681ceaf5eb95ecbb095\student.java to c:\Hadoop\sqoop-1.4.2
    \bin\.\student.java
    org.apache.commons.io.FileExistsException: Destination 'c:\Hadoop\sqoop-1.4.2\bi
    n\.\student.java' already exists
            at org.apache.commons.io.FileUtils.moveFile(FileUtils.java:2378)
            at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.ja
    va:230)
            at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:83)
            at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:390)
            at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
            at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
            at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
            at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
            at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
            at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
            at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
    13/01/04 15:32:13 INFO orm.CompilationManager: Writing jar file: \tmp\sqoop-Para
    s\compile\3a98bb4b97772681ceaf5eb95ecbb095\student.jar
    13/01/04 15:32:13 INFO mapreduce.ImportJobBase: Beginning import of student
    13/01/04 15:32:14 INFO manager.SqlManager: Executing SQL statement: SELECT TOP 1
     * FROM [student]
    13/01/04 15:32:17 INFO ipc.Client: Retrying connect to server: localhost/127.0.0
    .1:50300. Already tried 0 time(s).
    13/01/04 15:32:18 ERROR security.UserGroupInformation: PriviledgedActionExceptio
    n as:Paras cause:java.io.IOException: Call to localhost/127.0.0.1:50300 failed o
    n local exception: java.io.IOException: An existing connection was forcibly clos
    ed by the remote host
    13/01/04 15:32:18 ERROR tool.ImportTool: Encountered IOException running import
    job: java.io.IOException: Call to localhost/127.0.0.1:50300 failed on local exce
    ption: java.io.IOException: An existing connection was forcibly closed by the re
    mote host
            at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)
            at org.apache.hadoop.ipc.Client.call(Client.java:1075)
            at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
            at org.apache.hadoop.mapred.$Proxy1.getProtocolVersion(Unknown Source)
            at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
            at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
            at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:480)

            at org.apache.hadoop.mapred.JobClient.init(JobClient.java:474)
            at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:457)
            at org.apache.hadoop.mapreduce.Job$1.run(Job.java:513)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:396)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
    tion.java:1135)
            at org.apache.hadoop.mapreduce.Job.connect(Job.java:511)
            at org.apache.hadoop.mapreduce.Job.submit(Job.java:499)
            at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
            at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:14
    1)
            at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java
    :201)
            at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:464)
            at com.microsoft.sqoop.SqlServer.MSSQLServerManager.importTable(MSSQLSer
    verManager.java:151)
            at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:403)
            at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
            at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
            at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
            at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
            at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
            at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
            at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
    Caused by: java.io.IOException: An existing connection was forcibly closed by th
    e remote host
            at sun.nio.ch.SocketDispatcher.read0(Native Method)
            at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:25)
            at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
            at sun.nio.ch.IOUtil.read(IOUtil.java:171)
            at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
            at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputS
    tream.java:55)
            at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.ja
    va:142)
            at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:1
    55)
            at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:1
    28)
            at java.io.FilterInputStream.read(FilterInputStream.java:116)
            at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.j
    ava:342)
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
            at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
            at java.io.DataInputStream.readInt(DataInputStream.java:370)
            at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:8
    04)
            at org.apache.hadoop.ipc.Client$Connection.run(Client.java:749)

    -----------------

    And for a command for a different table:

    13/01/04 15:43:25 INFO SqlServer.MSSQLServerManagerFactory: Using Microsoft's SQ
    L Server - Hadoop Connector
    13/01/04 15:43:25 INFO manager.SqlManager: Using default fetchSize of 1000
    13/01/04 15:43:25 INFO tool.CodeGenTool: Beginning code generation
    13/01/04 15:43:27 INFO manager.SqlManager: Executing SQL statement: SELECT TOP 1
     * FROM [club]
    13/01/04 15:43:28 INFO manager.SqlManager: Executing SQL statement: SELECT TOP 1
     * FROM [club]
    13/01/04 15:43:28 INFO orm.CompilationManager: HADOOP_HOME is c:\Hadoop\hadoop-1
    .1.0-SNAPSHOT
    Note: \tmp\sqoop-Paras\compile\a97191b5d55a03da31d9f514c9a664ab\club.java uses o
    r overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    13/01/04 15:43:31 INFO orm.CompilationManager: Writing jar file: \tmp\sqoop-Para
    s\compile\a97191b5d55a03da31d9f514c9a664ab\club.jar
    13/01/04 15:43:31 INFO mapreduce.ImportJobBase: Beginning import of club
    13/01/04 15:43:31 INFO manager.SqlManager: Executing SQL statement: SELECT TOP 1
     * FROM [club]
    13/01/04 15:43:34 INFO ipc.Client: Retrying connect to server: localhost/127.0.0
    .1:50300. Already tried 0 time(s).
    13/01/04 15:43:37 ERROR security.UserGroupInformation: PriviledgedActionExceptio
    n as:Paras cause:java.io.IOException: Call to localhost/127.0.0.1:50300 failed o
    n local exception: java.io.IOException: An existing connection was forcibly clos
    ed by the remote host
    13/01/04 15:43:37 ERROR tool.ImportTool: Encountered IOException running import
    job: java.io.IOException: Call to localhost/127.0.0.1:50300 failed on local exce
    ption: java.io.IOException: An existing connection was forcibly closed by the re
    mote host
            at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)
            at org.apache.hadoop.ipc.Client.call(Client.java:1075)
            at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
            at org.apache.hadoop.mapred.$Proxy1.getProtocolVersion(Unknown Source)
            at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
            at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
            at org.apache.hadoop.mapred.JobClient.createRPCProxy(JobClient.java:480)

            at org.apache.hadoop.mapred.JobClient.init(JobClient.java:474)
            at org.apache.hadoop.mapred.JobClient.<init>(JobClient.java:457)
            at org.apache.hadoop.mapreduce.Job$1.run(Job.java:513)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:396)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
    tion.java:1135)
            at org.apache.hadoop.mapreduce.Job.connect(Job.java:511)
            at org.apache.hadoop.mapreduce.Job.submit(Job.java:499)
            at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
            at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:14
    1)
            at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java
    :201)
            at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:464)
            at com.microsoft.sqoop.SqlServer.MSSQLServerManager.importTable(MSSQLSer
    verManager.java:151)
            at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:403)
            at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:476)
            at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
            at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
            at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
            at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
            at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
            at com.cloudera.sqoop.Sqoop.main(Sqoop.java:57)
    Caused by: java.io.IOException: An existing connection was forcibly closed by th
    e remote host
            at sun.nio.ch.SocketDispatcher.read0(Native Method)
            at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:25)
            at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
            at sun.nio.ch.IOUtil.read(IOUtil.java:171)
            at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
            at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputS
    tream.java:55)
            at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.ja
    va:142)
            at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:1
    55)
            at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:1
    28)
            at java.io.FilterInputStream.read(FilterInputStream.java:116)
            at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.j
    ava:342)
            at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
            at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
            at java.io.DataInputStream.readInt(DataInputStream.java:370)
            at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:8
    04)
            at org.apache.hadoop.ipc.Client$Connection.run(Client.java:749)

    What can I try to solve these errors?


    Paras Doshi (Blog: ParasDoshi.com | Twitter: @Paras_Doshi )

    Friday, January 4, 2013 9:47 PM
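    [Editor's note: the first ERROR in the log, the `FileExistsException` on `student.java`, appears to be independent of the connection failure. Sqoop's code generator moves the freshly generated class file into the current directory, and that move fails if a `student.java` from a previous run is still there (the import continues anyway). Deleting the stale file before re-running clears it; passing `--outdir` to point code generation elsewhere is another option. A minimal Python sketch of the cleanup, where `bin_dir` and the table name are just placeholders:]

    ```python
    import os

    def clear_stale_codegen(bin_dir, table):
        """Remove a leftover <table>.java from a previous Sqoop run, so the
        code generator can move the newly generated copy into place."""
        stale = os.path.join(bin_dir, table + ".java")
        if os.path.exists(stale):
            os.remove(stale)
            return True
        return False
    ```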

Answers

  • You need to enable the TCP/IP protocol for SQL Server (from SQL Server Configuration Manager).

    Restart the SQL Server service.

    Start the 'SQL Server Browser' service (change its start mode to 'Automatic' and restart the service).

    Please let me know if that solves your problem.

    Thanks, Aviad


    aviade

    Sunday, January 6, 2013 7:22 AM
    • Marked as answer by Paras Doshi Sunday, January 6, 2013 8:13 PM
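    [Editor's note: one way to verify steps like these is to confirm something is actually listening on the ports involved: SQL Server's default TCP port (1433 for a default instance; named instances get dynamic ports via SQL Browser) and, in the logs above, the JobTracker endpoint at localhost:50300. A small Python probe sketch; the hosts and ports shown are assumptions taken from this thread:]

    ```python
    import socket

    def tcp_port_open(host, port, timeout=3.0):
        """Return True if a TCP connection to host:port succeeds within the timeout."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Default SQL Server instance:
    #   tcp_port_open("localhost", 1433)
    # JobTracker endpoint from the stack traces above:
    #   tcp_port_open("localhost", 50300)
    ```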
  • Thanks, that helped. I started SQL Browser.

    Also, I had to debug my installation because, for some reason, the datanode was not running. I opened the HDInsight Dashboard and saw the message "waiting for the cluster to startup.."

    So after starting SQL Browser and debugging the installation, I was able to import data into HDFS from SQL Server!

    Thanks, appreciate your help! :)


    Paras Doshi (Blog: ParasDoshi.com | Twitter: @Paras_Doshi )

    • Marked as answer by Paras Doshi Sunday, January 6, 2013 8:16 PM
    Sunday, January 6, 2013 8:13 PM
