Create CosmosDB graph in Python

  • Question

  • I am trying to import a local graph into the Azure Cosmos DB graph database in Python with the Gremlin API. I followed the instructions in create-graph-python in the Cosmos DB documentation.

    A couple of questions came up when trying to upload a large database (~15k vertices):

    • I can upload 15-20 nodes, but the upload stalls with more nodes (by stalling I mean the client no longer responds, and there are no errors either)
    • I tried adding a time.sleep(1) between two queries, but the issue from the first point remains
    • How can I check the backlog of Cosmos DB in the portal?
    • Why is there no error on the client side even if the query is not valid? This is frustrating during the debugging phase.
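
    For reference, the upload loop I am using looks roughly like this (names and the submit wrapper are illustrative, not my exact code):

```python
import time

def upload_vertices(submit, vertices, delay=1.0):
    """Submit one addV query per vertex, pausing between queries.

    `submit` is any callable that sends a Gremlin query string
    (e.g. a wrapper around gremlin_python's client.submitAsync).
    """
    sent = []
    for v in vertices:
        query = "g.addV('{label}').property('id', '{id}')".format(**v)
        submit(query)
        sent.append(query)
        time.sleep(delay)  # crude throttle between queries
    return sent

# Stand-in usage without a server: just print each query.
queries = upload_vertices(print, [{"label": "person", "id": "p1"}], delay=0)
```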

    Thank you for your help!


    Thursday, March 14, 2019 3:44 PM

All replies

  • Hi Shawn,

    When you are using the Azure portal, you are making REST calls to your database instance via the applicable driver, depending on the API service you are subscribed to. This interface has error handling built in. With your client, you may not have the necessary logic to capture specific failures. What is the specific error you are seeing?

    If you are seeing RU limit errors (429 responses), the best course of action is to raise the RU threshold for your database or container to allow the load to complete uninterrupted, followed by reducing the RU threshold limits to their initial value. 
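    Alternatively (or in addition), the client can retry when throttled. A minimal sketch, assuming the driver surfaces 429s as an exception; the exception class here is a stand-in for illustration, not the real driver type:

```python
import time

class ThrottledError(Exception):
    """Stand-in for the 429 "request rate too large" error a driver may raise."""
    status_code = 429

def submit_with_retry(submit, query, max_retries=5, base_delay=0.5):
    """Retry a query with exponential backoff while the server throttles (429)."""
    for attempt in range(max_retries):
        try:
            return submit(query)
        except ThrottledError:
            time.sleep(base_delay * (2 ** attempt))  # back off, then retry
    raise RuntimeError("query still throttled after %d retries" % max_retries)

# Usage with a stand-in submit that is throttled twice, then succeeds:
attempts = {"n": 0}
def flaky_submit(query):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ThrottledError()
    return "ok"

result = submit_with_retry(flaky_submit, "g.V().count()", base_delay=0)
```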

    Please let us know what the specific issue is.

    Thanks,

    Mike

    P.S. If this answered or resolved your issue, please mark as answered. If not, please provide additional details or detail the resolution steps you implemented and mark that as answered. It helps others who are experiencing the same issue. 

    Thursday, March 14, 2019 5:05 PM
  • Hi Mike,

    Thanks for the prompt response. The error I mentioned in my question was actually caused by wrong syntax in my query.

    I am not sure it is the RU limit, since even if I send only one request per second the issue is still there; the client in my Python code just stops responding without any error message.

    Do you have any suggestion on where I can look into the database backlog, so that I at least have a chance to find out what I have done wrong?

    Shawn




    Friday, March 15, 2019 3:35 PM
  • Hi Mike,

    I just tested the code sending only the query with an id and no properties, like this `g.addV('{label}').property('id', '{id}')`, and the data upload works fine. However, if I attach all of my properties (around 10-20), the client stops uploading after some time.
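    One thing worth checking with many real-world property values is quoting: a single unescaped quote in a value makes a string-formatted query invalid. A hypothetical builder that escapes values before chaining the .property() calls:

```python
def build_add_vertex(label, vertex_id, properties):
    """Build an addV query string, escaping quotes in values.

    An unescaped single quote in a property value is one common way a
    string-formatted Gremlin query becomes invalid.
    """
    def esc(value):
        return str(value).replace("\\", "\\\\").replace("'", "\\'")

    query = "g.addV('{0}').property('id', '{1}')".format(esc(label), esc(vertex_id))
    for key, value in properties.items():
        query += ".property('{0}', '{1}')".format(esc(key), esc(value))
    return query

q = build_add_vertex('person', 'p1', {'name': "O'Brien"})
```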

    Friday, March 15, 2019 4:19 PM
  • Thank you for the additional information. I am going to ask for a few additional items:

    • Which driver your client is using, and its version.
    • Client SDK language and version.
    • Current RU threshold property values, and whether these are applied to the database or the container.

    I will edit your initial comment from today (removing the Grammarly tags; I recommend disabling this plugin for MSDN), but I think I read that you are using a Python SDK for the client, which confirms that you are not taking advantage of the response headers indicating RU usage, exceeded thresholds, etc. Those are very helpful in this situation.

    Friday, March 15, 2019 5:20 PM
  • Okay, I see that you believe this is an issue with your query and need help debugging it. Can you provide the query that is failing? Also, in the Azure portal there is a set of pre-built views that lets you see activity. Can you look at these reports for any indication of an issue, such as at the point where the process hangs? What is the percentage of resource allocation, given the service tier the solution is deployed at? This is the easiest to look into, as there is no setup; just navigate to the Metrics blade in the portal.

    Additionally, here is a document that covers the process of diagnostic logging: Diagnostic logging in Azure Cosmos DB

    I hope this helps, Mike

    Friday, March 15, 2019 5:48 PM
  • Yes, you are right. I am using Python (3.6.5) with gremlin_python (3.2.7).

    The RU setting I used is 10,000, which is the maximum as far as I can see in the portal. I am not sure I understand the difference between database and container correctly; I changed the setting through the "Scale & Settings" button in the graph database I am working with.

    Friday, March 15, 2019 7:28 PM
  • I have a callback object returned by the gremlin_python API:

    `callback = cosmos.submitAsync(query)`, but I am not sure whether the response headers you mention are included inside it.
    Friday, March 15, 2019 7:49 PM
  • I looked into the metrics; the RU limit has not been hit at all, with a maximum of 1k requests/s against a threshold of 10k.

    Another observation I made is that I no longer think the problem is the query itself, since even if I skip the query that gets no response, none of the following queries get a response either. It looks like the client is blocked at that moment.
    • Edited by shawn_bmt Friday, March 15, 2019 8:02 PM
    Friday, March 15, 2019 7:58 PM
  • A couple of alternatives to upload data to Cosmos DB I have tried:

    1. Bulk insert, which it looks like does not support the Gremlin API yet (https://docs.microsoft.com/en-us/azure/cosmos-db/import-data). This tool, https://github.com/Azure-Samples/azure-cosmosdb-graph-bulkexecutor-dotnet-getting-started/, is for .NET, which I have not used before. Do you have any recommendation for how I can do bulk insert through Python? (It is frustrating to be without bulk insert when there is a large amount of data.)
    2. Upload through the UI: I tried different formats but none of them works. The best so far is the following:

        [
            {"type": "vertex", "label": "person", "name": "test"}
        ]

    which can create a node with label person but no properties (name is not included in the node). Could you point me to the place where I can get the right format?

    Thanks!

    Shawn Jin


    • Edited by shawn_bmt Friday, March 15, 2019 8:38 PM
    Friday, March 15, 2019 8:37 PM
  • Please take a look at this tutorial for additional Python examples using the Gremlin driver: Quickstart: Create a graph database in Azure Cosmos DB using Python and the Azure portal

    There is a connect.py file that you can source for additional examples.

    Here is a basic dataset as an example:

    _gremlin_insert_vertices = [
        "g.addV('person').property('id', 'thomas').property('firstName', 'Thomas').property('age', 44)",
        "g.addV('person').property('id', 'mary').property('firstName', 'Mary').property('lastName', 'Andersen').property('age', 39)",
        "g.addV('person').property('id', 'ben').property('firstName', 'Ben').property('lastName', 'Miller')",
        "g.addV('person').property('id', 'robin').property('firstName', 'Robin').property('lastName', 'Wakefield')"
    ]

    _gremlin_insert_edges = [
        "g.V('thomas').addE('knows').to(g.V('mary'))",
        "g.V('thomas').addE('knows').to(g.V('ben'))",
        "g.V('ben').addE('knows').to(g.V('robin'))"  
    ]

    The response headers I speak of are part of the .NET C# client libraries and are not completely accessible in Python.

    Common Azure Cosmos DB REST response headers 

    I hope this information is helpful.

    Regards,

    Mike

    • Proposed as answer by angoyal-msft Wednesday, March 20, 2019 4:10 AM
    Friday, March 15, 2019 10:16 PM
  • Hi,

    When performing a bulk load on a Cosmos graph, do consider the BulkExecutor library, along with the RU setting on the graph collection:

    https://docs.microsoft.com/en-us/azure/cosmos-db/bulk-executor-graph-dotnet  


    If this answers your question, please mark it accordingly. If this post is helpful, please vote it as helpful by clicking the upward arrow next to my reply.

    Monday, March 18, 2019 2:24 AM