ADLS Gen2: REST API - Flush ends in error

  • Question

  • I'm updating a file that already has 44 bytes of content.

    import requests

    headers0 = {
        'Authorization': "Bearer %s" %token1,
        'x-ms-version': '2018-06-17',
         'Content-Length': '4'
        
    }
    data0 = { "data"
    }

    params = (
        ('action', 'append'),
        ('position', '48'),
    )

    response = requests.patch('https://XXXXXXXX.dfs.core.windows.net/mydata/data/file1', headers=headers0, params=params, data=data0)

    Then I try to flush the data to the file:

    headers0 = {
        'Authorization': "Bearer %s" %token1,
        'x-ms-version': '2018-06-17',
        'content-length': '0'    
    }
    response01 = requests.patch('https://XXXXXXXX.dfs.core.windows.net/mydata/data/file1?action=flush&position=48', headers=headers0)
    print(response01.content)
    print (' ')
    print(response01.headers)

    b'{"error":{"code":"InvalidFlushPosition","message":"The uploaded data is not contiguous or the position query parameter value is not equal to the length of the file after appending the uploaded data.\\nRequestId:2ebfb7b2-f01f-00fa-046d-74ae3d000000\\nTime:2019-09-26T13:25:55.2219737Z"}}'

    {'Content-Length': '284', 'Content-Type': 'application/json;charset=utf-8', 'Server': 'Windows-Azure-HDFS/1.0 Microsoft-HTTPAPI/2.0', 'x-ms-error-code': 'InvalidFlushPosition', 'x-ms-request-id': '2ebfb7b2-f01f-00fa-046d-74ae3d000000', 'x-ms-version': '2018-06-17', 'Date': 'Thu, 26 Sep 2019 13:25:54 GMT'}

    Thursday, September 26, 2019 1:36 PM

All replies

  • Hello F.he and thank you for your inquiry.  I will begin repro shortly.
    Monday, September 30, 2019 9:15 PM
  • Let me share my findings.  My apologies if this is a bit pedantic, as I try to be precise.

    I start with a simple text file with a content length of 7. ("foo,foo")

    When I append at position 7, with a content length of 4 (",bar")
    And then I flush at position 7, with a content length of 0, the status code is 202 accepted.
    Then when I read the file, I see the content length is 7 ("foo,foo").
    This is because the position in the flush call is the total new length of the file.  The initial length and the flush position are the same, so nothing was added.

    When I append at position 7, with a content length of 4 (",bar")
    And then I flush at position 11, with a content length of 0, the status code is 202 accepted.
    Then when I read the file, I see the content length is 11 ("foo,foo,bar").
    This is because the position in the flush call is the total, 7+4=11.

    After the previous success, I try again.  The body is currently "foo,foo,bar".
    I append at position 11, with a content length of 4 (",baz")
    And then I flush at position 7, with a content length of 0.  The status code is 400, and I get your error message.
    This is because the new length I specified (flush position 7) is shorter than the initial length.

    The content is still "foo,foo,bar".
    I append at position 3, with a content length of 4 (",baz")
    And then I flush at position 11, with a content length of 0.  The status code is 200 OK.
    Then when I read the file I see content length of 11 ("foo,foo,bar").  Nothing changed.

    I try the last one again, but flush with position 15, and I get your error.

    Findings:
    On the append call, the position should be the current size of your file.  The content length should be the size of the body to add.
    On the flush call, the position should be the total new length (initial + body).
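
    The two findings above reduce to simple arithmetic.  Here is a minimal sketch of it; the function name is hypothetical and is not part of any Azure SDK, and it makes no network calls.  For the 44-byte file in the original question it gives an append position of 44 (not 48) and a flush position of 48.

```python
def append_and_flush_positions(current_length, body):
    """Return (append_position, content_length, flush_position) for one append + flush.

    Hypothetical helper: current_length is the file size in bytes before the
    append, body is the bytes payload to add.
    """
    if current_length < 0:
        raise ValueError("current_length must be non-negative")
    content_length = len(body)                          # byte count, since body is bytes
    append_position = current_length                    # append starts at the current end of file
    flush_position = current_length + content_length    # flush names the total new length
    return append_position, content_length, flush_position


print(append_and_flush_positions(7, b",bar"))    # (7, 4, 11)
print(append_and_flush_positions(44, b"data"))   # (44, 4, 48)
```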

    Monday, September 30, 2019 10:00 PM
  • Did you test it in python?

    I always get the same error:

    import requests

    headers0 = {
        'Authorization': "Bearer %s" %token1,
        'x-ms-version': '2018-06-17',
         'Content-Length': '7'
        
    }
    data0 = { "foo,foo"
    }

    params = (
        ('action', 'append'),
        ('position', '7'),
    )

    response01 = requests.patch('https://storengyacct1.dfs.core.windows.net/mydata/data/file2', headers=headers0, params=params, data=data0)
    print(response01.content)
    print (' ')
    print(response01.headers)

    headers0 = {
        'Authorization': "Bearer %s" %token1,
        'x-ms-version': '2018-06-17',
        'content-length': '0'    
    }
    response01 = requests.patch('https://storengyacct1.dfs.core.windows.net/mydata/data/file2?action=flush&position=7', headers=headers0)
    print(response01.content)
    print (' ')
    print(response01.headers)

    b'{"error":{"code":"InvalidFlushPosition","message":"The uploaded data is not contiguous or the position query parameter value is not equal to the length of the file after appending the uploaded data.\\nRequestId:ebf4d40e-d01f-00ed-291d-790736000000\\nTime:2019-10-02T12:34:59.0380616Z"}}'

    Wednesday, October 2, 2019 12:39 PM
  • I will test it using your code today/tomorrow.
    Wednesday, October 2, 2019 9:31 PM
  • Okay.  I made an empty file2, and then ran your last code post.  I did get your error.  I did have to make a couple changes:

    • changed x-ms-version to '2018-11-09'
    • needed to encode the data as bytes for the append call
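
    A small illustration of why the encoding change matters, using only the standard library.  In Python, braces around a single value (as in the posted data0) build a set, not a string, and the byte length of encoded text can differ from its character count.  The variable names here are only for illustration.

```python
text = "foo,foo"
as_set = {"foo,foo"}              # braces around one value build a set, not a string
as_bytes = text.encode("utf-8")   # bytes payload suitable for a request body

print(type(as_set).__name__, len(as_set))       # set 1
print(type(as_bytes).__name__, len(as_bytes))   # bytes 7

# For non-ASCII text the character count and byte count differ,
# so a length header should be based on the encoded bytes.
non_ascii = "caf\u00e9"
print(len(non_ascii), len(non_ascii.encode("utf-8")))   # 4 5
```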

    I'm going to keep working on this

    Thursday, October 3, 2019 7:16 PM
  • So, while working on this, I ended up throwing together a wrapper/client.  I do still get that 'not contiguous' error, and I'm trying to nail down exactly what the cause is.  I'd like to share what I have so far with you.

    import requests
    
    '''
    What am I: A crude prototype client for working with Azure Data Lake Gen2 Rest API using Python 3
    
    Legal stuff:  Use at your own risk.  No warranty provided.  While author is associated with Microsoft, this code is not an official product, and Microsoft takes no responsibility for any negative outcomes.
    
    License: Not yet decided.
    
    Written by: Martin Jaffer on 2019-10-03
    Some code to be attributed to MSDN user 'F.he'
    See: https://social.msdn.microsoft.com/Forums/en-US/db5cb32e-7f4b-4b0a-869e-64056498ce8d/adls-gen2-rest-api-flush-ends-in-error?forum=AzureDataLake
    
    Library requirements: pip install requests
    
    Usage:
    	Instantiate passing the bearer token  (token acquisition not included here).  Leave off the 'Bearer ' prefix.
    	After that  provide the details for constructing the URL like so:
    		gen2.make_path('MyStorageAccountName','MyContainerName','MyFolder/MyFile')
    	All subsequent actions (do_append, do_flush, do_read, do_touch) operate on that path.
    	Change path by calling make_path again.
    '''
    
    class gen2:
    	
    	def __init__(self, newtoken):
    		self.token = newtoken
    		print("Token set")
    		self.path = ''
    		print("Please make_path(acct,container,filepath)")
    	
    	def make_path(self,acct, container, filepath):
    		''' output should look like 'https://mystorage.dfs.core.windows.net/container/folder/file' '''
    		self.path = 'https://' + acct + '.dfs.core.windows.net/' + container + '/' + filepath
    		return self.path
    
    	def make_header(self,length):
    		return {
    			'Authorization': "Bearer %s" %self.token,
    			'x-ms-version': '2018-11-09',
    			'Content-Type': 'text/plain',
    			'Content-Length': str(length)
    		}
    
    	def make_params(self,act,length):
    		#return (
    		#	('action',act),
    		#	('position',str(length))
    		#)
    		return '?action='+act+'&position='+str(length)
    
    	def do_append(self,body='foo,foo', position=0):
    		#response = requests.patch(self.path, headers=self.make_header(len(body)), params=self.make_params('append',len(body)), data=body.encode())
    		response = requests.patch(self.path+self.make_params('append',len(body)), headers=self.make_header(len(body)), data=body.encode())
    		print(response.headers)
    		print()
    		print(response.content)
    		print('body length: ' + str(len(body)))
    		
    	def do_flush(self,position):
    		#response = requests.patch(self.path, headers=self.make_header(0), params=self.make_params( 'flush',position))
    		response = requests.patch(self.path+self.make_params('flush',position), headers=self.make_header(0))
    		print(response.headers)
    		print()
    		print(response.content)
    		
    	def do_read(self):
    		response = requests.get(self.path, headers=self.make_header(0))
    		print(response.headers)
    		print()
    		print(response.content)
    		print('body length: ' + str(len(response.content)))
    	
    	def do_touch(self, is_folder=False):
    		'''Aka create new file (without body).  use "is_folder=True" if you want to make folder instead'''
    		if is_folder:
    			response = requests.put(self.path + '?resource=directory', headers=self.make_header(0))
    		else:
    			response = requests.put(self.path + '?resource=file', headers=self.make_header(0))
    		print(response.headers)
    		print()
    		print(response.content)

    The odd thing is, I get that error when I add to an empty file.  Once the file has content in it, I don't get the error.  Another odd thing is, I can write the 'requests.patch' call by hand and it goes through.  Anyway, take a look and tell me what you think.
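
    One possible explanation, judging only from the class above: do_append passes len(body) as the position query parameter and never uses its position argument.  Against an empty file, a 7-byte body would then be appended at position 7 instead of 0.  Below is a sketch of building the URL with an explicit position; both helper names are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_patch_url(path, action, position):
    """Build '<path>?action=<action>&position=<position>' from an explicit position."""
    return path + "?" + urlencode({"action": action, "position": position})


def plan_append(path, body, position):
    """Return (url, content_length, data) for an append at the given file position."""
    data = body.encode("utf-8")     # send bytes, and measure the length in bytes
    return build_patch_url(path, "append", position), len(data), data


url, content_length, data = plan_append(
    "https://mystorage.dfs.core.windows.net/container/folder/file", "foo,foo", 0
)
print(url)              # ...folder/file?action=append&position=0
print(content_length)   # 7
```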

    Friday, October 4, 2019 1:08 AM
  • I took a closer look at the requests library, and I came across this warning:

    Warning

    It is strongly recommended that you open files in binary mode. This is because Requests may attempt to provide the Content-Length header for you, and if it does this value will be set to the number of bytes in the file. Errors may occur if you open the file in text mode.


    Friday, October 4, 2019 9:45 PM
  • I had success with:

    Starting from an empty file:
    h1 = {'Authorization': 'Bearer ...', 'x-ms-version': '2018-11-09', 'Content-Length': '0'}  # flush: no body
    
    h2 = {'Authorization': 'Bearer ...', 'x-ms-version': '2018-11-09', 'Content-Length': '2'}  # append: 2-byte body
    
    X = requests.patch(path, headers=h2, params={'action':'append','position':'0'}, data='ab')
    #<Response [202]>
    
    Y = requests.patch(path, headers=h1, params={'action':'flush','position':'2'})
    #<Response [200]>
    
    #contents: 'ab'
    
    X = requests.patch(path, headers=h2, params={'action':'append','position':'2'}, data='ab')
    #<Response [202]>
    
    Y = requests.patch(path, headers=h1, params={'action':'flush','position':'4'})
    #<Response [200]>
    
    #contents: 'abab'
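
    The same pattern extends to any number of chunks.  This is a rough sketch that only plans the calls and sends nothing; the function name and the tuple layout are assumptions made for illustration.

```python
def plan_chunked_upload(chunks, start_length=0):
    """Return a list of (action, position, content_length) steps.

    Hypothetical planner: each chunk is appended at the current end of the
    file, then flushed at the new total length, as in the calls above.
    """
    steps = []
    position = start_length
    for chunk in chunks:
        steps.append(("append", position, len(chunk)))   # append at current length
        position += len(chunk)
        steps.append(("flush", position, 0))             # flush at new total length
    return steps


for step in plan_chunked_upload([b"ab", b"ab"]):
    print(step)
# ('append', 0, 2)
# ('flush', 2, 0)
# ('append', 2, 2)
# ('flush', 4, 0)
```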

    Monday, October 7, 2019 9:19 PM