query_custom or override the default query between Spark and CosmosDB

  • Question

  • Hello,

    I am querying Cosmos DB from Spark, and by overriding the default query with query_custom I expect to get only one record, because c.sectionA.id is unique. Executing the same query directly against Cosmos DB (in the browser) returns exactly one document.
    But the code below returns more than one, which means my query (query_custom) is never executed.

    How is it possible to control or override which query is executed? What is the default query? (my guess: select * from c)

    What I have found about query_custom:

    query_custom: Sets the CosmosDB query to override the default query when fetching data from CosmosDB. Note that when this is provided, it will be used in place of a query with pushed down predicates as well.

    val query = "SELECT * FROM c where c.sectionA.id ='123456789ABCDEF'"
    
    val readConfig = Config(Map("Endpoint" -> "https://abc-cosmos.documents.azure.com:443/",
                               "Masterkey" -> "123456789ABCDEF==",
                               "Database" -> "data",
                               "Collection" -> "collectionABC", 
         /*"query_custom" -> "SELECT * FROM c where c.sectionA.id ='123456789ABCDEF', */
                              "query_custom" -> query,  
                              "SamplingRatio" -> "1.0"))
    
    val coll = spark.sqlContext.read.cosmosDB(readConfig)
    coll.createOrReplaceTempView("tempView")
    
    val q1 = "select * from tempView"
    val df = spark.sql(q1)
    
    df.show()
    

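    A minimal sanity check, using the coll DataFrame defined above, is to count the rows coming back; with query_custom honored, the count should be 1 for this unique id:

    // Sketch: with query_custom applied, this should print 1
    println(coll.count())
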
    Thank you for your help!

    PS: I have posted this first to Stack Overflow...

    Friday, November 29, 2019 6:59 AM

All replies

  • Hi Dagriq,

    Can you reformat your code as follows, to rule out any issues with the multiple variables in the code example you posted:

    // Import Necessary Libraries
    import com.microsoft.azure.cosmosdb.spark.schema._
    import com.microsoft.azure.cosmosdb.spark._
    import com.microsoft.azure.cosmosdb.spark.config.Config
    
    // Read Configuration
    val readConfig = Config(Map(
      "Endpoint" -> "https://abc-cosmos.documents.azure.com:443/",
      "Masterkey" -> "123456789ABCDEF==",
      "Database" -> "data",
      "Collection" -> "collectionABC",
      "query_custom" -> "SELECT * FROM c where c.sectionA.id ='123456789ABCDEF'"
    ))
    
    // Connect via azure-cosmosdb-spark to create Spark DataFrame
    val query = spark.read.cosmosDB(readConfig)
    query.count()

    I don't think the Spark DataFrame is being created correctly in your case, so I wanted to simplify and go from there.

    More information can be found here: Batch reads from Cosmos DB.

    Updated note: Yes, the default query is select * from c
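
    If query_custom keeps being ignored, the description quoted in the question also mentions pushed-down predicates, so a workaround sketch (reusing the readConfig above without the query_custom key, and assuming sectionA surfaces as a struct column in the DataFrame; the Cosmos DB alias c is not part of the Spark schema) is to filter at the DataFrame level and let the connector push the predicate down:

    // Sketch: read with the default query (select * from c) and filter in Spark;
    // the connector can push this predicate down to Cosmos DB
    val coll = spark.read.cosmosDB(readConfig)
    val filtered = coll.filter("sectionA.id = '123456789ABCDEF'")
    filtered.show()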

    Thanks,

    Mike

    Linking Stack Overflow thread: how to use query_custom or override the default query
    Friday, November 29, 2019 7:02 PM
  • Please let me know if you are still experiencing issues.

    Regards,

    Mike

    Friday, December 6, 2019 3:45 AM
  •  Hello Mike,

    Yes, it is the same issue... the statement defined in the [query_custom] key does not get executed at all. Does it work for you?

    Thank you!

    Friday, December 6, 2019 4:58 AM
  • Hi Dagriq,

    Let's get this issue escalated to the Cosmos DB product group. Can you create an Azure Support Request? If you do not have an Azure Support Plan, I can have a one-time support request created so this specific issue is investigated. Can you please send me your Azure Subscription ID? I will then reply with next steps. With a Support Request created, I can escalate this to the product group to have it looked at. Please send the requested information to AzCommunity.

    Regards,

    Mike


    Monday, December 9, 2019 8:19 PM