Can ADF use Databricks tables as a source?

  • This should be really simple, but I am having trouble finding the answer...

    I have some data in a Databricks table. This is a persisted table stored in the default Databricks File System. I want to read that table and transform the data into CSV, probably using the ADF Copy Data tool, although I am open to another ADF method. 

    I don't want ADF to call a Databricks notebook to do this. I don't want to write PySpark code. I want Data Factory to just read the table.

    Does Data Factory have this ability?

    Friday, January 10, 2020 10:14 PM

Answers

  • Here is the status for anyone else who bumps into this...

    - If the data is in the default Databricks File System (DBFS), then Data Factory cannot see it.

    - If the data is stored by Databricks in Azure Data Lake, then Data Factory can see it there, bypassing Databricks. 

    The second scenario is actually pretty common: you create an "external" table in Databricks, specifying a Data Lake folder as the storage location (see the sketch below).
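
    For anyone recreating that setup, here is a minimal sketch of what the external table definition might look like, run from a Databricks notebook. The database, table, column, container, and storage account names are placeholders, not details from this thread.

    ```python
    # Minimal sketch: run in a Databricks notebook (Python cell), where `spark`
    # is the session Databricks provides. Assumes the cluster already has access
    # to the ADLS Gen2 account. This creates an "external" Delta table whose
    # files live in a Data Lake folder instead of DBFS, so Data Factory can
    # read the files there directly. All names are hypothetical.
    spark.sql("CREATE DATABASE IF NOT EXISTS sales_db")
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_db.orders (
            order_id INT,
            order_date DATE,
            amount DOUBLE
        )
        USING DELTA
        LOCATION 'abfss://datalake@mystorageaccount.dfs.core.windows.net/tables/orders'
    """)
    ```

    Data Factory can then read the Parquet files under that folder with a Copy activity and write CSV, without calling a Databricks notebook.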

    We just successfully used Data Factory to transform a Databricks table (in Delta/Parquet/Snappy format) into CSV files.

    • Marked as answer by Chuck Connell Monday, January 13, 2020 9:04 PM
    Monday, January 13, 2020 9:04 PM
