Binary containers in files generated by OneNote Online service

  • Question

  • Hello all!

    I need to extract all binary files from a file generated by OneNote Online service.

    I managed to figure out that containers have the following format:

    [0x05 0x0C] [16 bytes] [0x80] [24 bytes] [0x15] [0x10 or 0x12] [LoD + sizeof(LoD)] [length of data (let's call it LoD)] [data]

    I am stuck on the following:

    1. Decoding the 16 bytes after [0x05 0x0C] (I think it is a CRC, isn't it?)

    2. Identifying the rules for building the length fields. They are packed as a bitstream.

    Can I get some help with these questions?

    Thanks in advance!

    Wednesday, May 16, 2018 3:21 PM

All replies

  • Hi AlexeiSo,

    Thank you for your inquiry about Microsoft Office Specifications. We have created an incident to investigate this issue. One of the Open Specifications team members will contact you to assist further.

    Thanks


    Tarun Chopra | Escalation Engineer | Open Specifications Support Team

    Wednesday, May 16, 2018 8:14 PM
  • Hi AlexeiSo,

    I will assist you with this issue. Are you referring to file attachments in a .one file or just the file data (sections/pages)? We have the following documents which cover the .one file format: 

    [MS-ONE]

    [MS-ONESTORE]

    These describe the structures used in the file format. I should draw your attention to the fact that these documents cover the format used by the OneNote desktop application, not the online service. They do have some information about the format used when files are transferred over the network by SharePoint. I'm checking with our OneNote team to see if the format is different.

    In the meantime, please refer to these documents. Start with [MS-ONE] for the high-level view; it will refer to structures in [MS-ONESTORE]. [MS-ONESTORE] 2.1.2 Cyclic Redundancy Check (CRC) Algorithms describes the calculation used in these formats. The 0x0C and other values you are looking at are, I believe, the property IDs described in 2.1.1 Property Set.

    I hope this helps.

    Best regards,
    Tom Jebo
    Sr Escalation Engineer
    Microsoft Open Specifications


    Thursday, May 17, 2018 2:47 AM
    Moderator
  • >>I'm checking with our OneNote team to see if the format is different. 

    I've confirmed with our OneNote team and OneNote Online will use the same format that is described in [MS-ONE] and [MS-ONESTORE]. Please have a look at these references I shared earlier and let me know with which specific piece you need clarification.

    Thanks,

    Tom

    Thursday, May 17, 2018 4:02 PM
    Moderator
  • Thanks for your efforts, Tom.

    I will check it on my side and be back soon.

    Thursday, May 17, 2018 7:25 PM
  • Thanks for the reply.

    I have read both of these documents.

    The format of regular .one files is well documented, and we have no problems parsing them:
    RootFileNodeList -> FileDataStoreObjectReferenceFND -> FileNodeChunkReference -> FileDataStoreObject

    But the Microsoft OneNote Online service generates documents in a different way, and I couldn't find any docs describing this format.

    I ran an experiment: I opened .one files generated by OneNote Online in the OneNote desktop app and saved them as regular .one files. Then I compared the binary containers in files of both formats.

    The format of the binary containers isn't the same.

    As I mentioned above, the OneNote Online format has the following structure for binary containers:
    [GUID] [tricky header with encoded length for raw data] [raw data from file]

    In the files generated by the OneNote desktop app, the binary containers storing my files have a different format.
    For example, [raw data from file] is preceded by a run of zeroes, with no structure encoding the length of the data.

    Wednesday, June 13, 2018 1:09 PM
  • Hello AlexeiSo,

    I'm thinking that you might have missed section 2.7 "Transmission by Using the File Synchronization via SOAP Over HTTP Protocol", which details the format that is used by our online services when transferring OneNote files from SharePoint.

    Just to be sure, if you can send me one of the downloaded data files (not the file saved by OneNote desktop but the file as downloaded), I can also verify this. I want to make sure we're looking at the same data.

    Please send an email to dochelp at Microsoft.com and I will provide a secure share where you can upload your files for analysis.

    Please do not send any files through email.

    Best regards,
    Tom Jebo
    Sr Escalation Engineer
    Microsoft Open Specifications
    Wednesday, June 13, 2018 11:53 PM
    Moderator
  • Hi AlexeiSo, 

    Just checking if you are watching this thread. I also wanted to bring your attention to [MS-ONESTORE] 2.8 "Alternative Encoding Using the File Synchronization via SOAP Over HTTP Protocol" as well as 2.7. These may explain the GUIDs and "tricky" headers (these are stream object headers in [MS-FSSHTTPB] 2.2.1.5 "Stream Object Header") with encoded lengths (these are likely the compact encodings for lengths in [MS-FSSHTTPB] 2.2.1.1 "Compact Unsigned 64-bit Integer").

    Please check these and let me know if this is what you're looking for. The OneStore format as downloaded by OneNote Online from SharePoint or OneDrive will have a format that leverages [MS-FSSHTTPB]. If I've not completely understood your question or what you mean by "binary containers", then it would still be helpful if you could email dochelp and we can arrange for you to upload a sample file and point out the structures and specification sections that are in question for you.
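
    The compact length encoding referenced above can be sketched in Python as follows. This is a minimal sketch, assuming the marker layout of [MS-FSSHTTPB] 2.2.1.1 as I read it: 0x00 encodes zero, 0x80 introduces a plain 8-byte value, and otherwise the number of trailing zero bits in the first byte selects a width of 1 to 7 bytes. The function name is made up for this example.

```python
def decode_compact_uint64(buf, offset=0):
    """Return (value, bytes_consumed) for the compact integer at buf[offset].

    Assumed layout: all multi-byte forms are little-endian, and the low
    bits of the first byte act as a width marker that is shifted away.
    """
    first = buf[offset]
    if first == 0x00:
        return 0, 1
    if first == 0x80:
        # Marker byte followed by an ordinary little-endian 64-bit value.
        if len(buf) - offset < 9:
            raise ValueError("truncated 64-bit compact integer")
        return int.from_bytes(buf[offset + 1:offset + 9], "little"), 9
    # Count trailing zero bits (0..6) to find the width in bytes (1..7).
    width = 1
    while not first & 1:
        first >>= 1
        width += 1
    if len(buf) - offset < width:
        raise ValueError("truncated compact integer")
    raw = int.from_bytes(buf[offset:offset + width], "little")
    # Drop the marker bits; what remains is the value.
    return raw >> width, width


print(decode_compact_uint64(bytes([0x15])))        # one-byte form
print(decode_compact_uint64(bytes([0xF2, 0xFF])))  # two-byte form
```

    Under these assumptions the single byte 0x15 decodes to 10, and the byte pair f2 ff decodes to 16380.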

    Thanks,

    Tom

    Thursday, June 14, 2018 6:36 PM
    Moderator
  • Thanks, Tom.

    I have checked the document you mentioned. It gave me a lot of information but I still need a little bit of clarification.

    [MS-ONESTORE] states in 2.7.6 that FileDataStoreObject is represented by an Object Data BLOB Data Element, described in [MS-FSSHTTPB] section 2.2.1.12.8.

    This section describes this data element as follows:

    • Data Element Start (2 bytes)
    • Data Element Extended GUID (variable)
    • Serial Number (variable)
    • Data Element Type (variable)
    • Object Data BLOB (variable)
    • Data (variable): A byte stream that specifies the binary data opaque to this protocol.
    • Data Element End (1 byte)

    So having this description, I tried to apply it to my samples.

    Because the Data Element Extended GUID is of variable length, it is hard to identify the first two parts: Data Element Start and Data Element Extended GUID.

    A Serial Number is of variable length, but its length is either 1 byte for a zero value or 25 bytes for any other. In my samples, it is 25 bytes long. Data Element Type should always be the same (it turned out to be a single byte with value 0x15). Object Data BLOB and Data Element End are identified by their corresponding descriptions. But there is an issue with the Data part that I couldn't resolve.
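
    The two Serial Number sizes can be illustrated with a small Python sketch. The thread only confirms the sizes (1 or 25 bytes); the split of the 25-byte form into a 0x80 marker, a 16-byte GUID, and an 8-byte little-endian value is an assumption based on my reading of [MS-FSSHTTPB], and `parse_serial_number` is a hypothetical name.

```python
import uuid


def parse_serial_number(buf, offset=0):
    """Return (serial, bytes_consumed); serial is None for the 1-byte form.

    Assumed 25-byte layout: 0x80 marker, 16-byte GUID, 8-byte LE value.
    """
    kind = buf[offset]
    if kind == 0x00:
        return None, 1
    if kind != 0x80:
        raise ValueError(f"unexpected serial number marker 0x{kind:02x}")
    if len(buf) - offset < 25:
        raise ValueError("truncated serial number")
    guid = uuid.UUID(bytes_le=bytes(buf[offset + 1:offset + 17]))
    value = int.from_bytes(buf[offset + 17:offset + 25], "little")
    return (guid, value), 25


# The 25 bytes that start Example 1 in this post.
sample = bytes.fromhex(
    "80" "1a439c3ff478d41c135b4aab0bb3a1df" "0d00000000000000"
)
serial, consumed = parse_serial_number(sample)
print(serial, consumed)
```

    With this reading, the trailing eight bytes of the sample (0d followed by seven zero bytes) form the numeric part of the serial number.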

    Let's take a look at my samples.

    Binary values in the second column are written with the least significant bit leftmost.
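
    For anyone reproducing the dumps, this bit ordering is simply the usual binary form reversed. A minimal Python sketch (the helper name is made up):

```python
def bits_lsb_first(byte):
    """Render one byte as 8 binary digits, least significant bit first."""
    return format(byte, "08b")[::-1]


for value in (0x80, 0x1A, 0x15):
    print(f"'{value:02x}'     {bits_lsb_first(value)}")
```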

    Example 1. Attached binary file of 11 bytes length.

    '80'    00000001    Serial number, 25 bytes
    '1a'    01011000
    '43'    11000010
    '9c'    00111001
    '3f'    11111100
    'f4'    00101111
    '78'    00011110
    'd4'    00101011
    '1c'    00111000
    '13'    11001000
    '5b'    11011010
    '4a'    01010010
    'ab'    11010101
    '0b'    11010000
    'b3'    11001101
    'a1'    10000101
    'df'    11111011
    '0d'    10110000
    '00'    00000000
    '00'    00000000
    '00'    00000000
    '00'    00000000
    '00'    00000000
    '00'    00000000
    '00'    00000000
    '15'    10101000    Data Element Type, compact unsigned integer
    '10'    00001000    16-bit stream header
    '18'    00011000    00(A) . 0(B) . 010000(type) . 0011000(length, value of 12)
    '17'    11101000    Unknown byte: 1 . 1101000(length of my file, value of 11)
    … 11 bytes of my file …
    '0c'    00110000    Data Element End
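
    The bit split of the 16-bit stream header in Example 1 can be checked with a short Python sketch. It assumes the two bytes form a little-endian word laid out, from the low bits up, as 2 bits (A), 1 bit (B), 6 bits (type), and 7 bits (length), which is the split annotated above; reading A as a header-type field and B as a compound flag is my interpretation of [MS-FSSHTTPB] 2.2.1.5. All names are hypothetical.

```python
from typing import NamedTuple


class StreamObjectHeader16(NamedTuple):
    header_type: int  # field A, 2 bits
    compound: int     # field B, 1 bit
    object_type: int  # 6 bits
    length: int       # 7 bits


def parse_header16(buf, offset=0):
    """Split a 16-bit stream object header into its bit fields."""
    word = int.from_bytes(buf[offset:offset + 2], "little")
    return StreamObjectHeader16(
        header_type=word & 0x3,
        compound=(word >> 2) & 0x1,
        object_type=(word >> 3) & 0x3F,
        length=(word >> 9) & 0x7F,
    )


header = parse_header16(bytes([0x10, 0x18]))
print(header)
```

    For the bytes 10 18 this yields type 2 and length 12, that is, the 11 bytes of the file plus the one extra byte in front of them.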

    Example 2. Attached binary file of 126 bytes length.

    The serial number field is skipped.

    '15'    10101000    Data Element Type, compact unsigned integer
    '12'    01001000    32-bit stream header
    '00'    00000000    01(A) . 0(B) . 01000000000000(type) .
    'fe'    01111111    111111100000000 (length, value of 127)
    '00'    00000000
    'fd'    10111111    Unknown byte: 1 . 0111111 (length of my file, value of 126)
    … 126 bytes of my file …
    '0c'    00110000    Data Element End

    Example 3. Attached binary file of 16380 bytes length.

    '15'    10101000    Data Element Type, compact unsigned integer
    '12'    01001000    32-bit stream header
    '00'    00000000    01(A) . 0(B) . 01000000000000(type) .
    'fc'    00111111    011111111111110 (length, value of 16382)
    '7f'    11111110
    'f2'    01001111    Unknown 2 bytes:
    'ff'    11111111    01 . 00111111111111 (length of my file, value of 16380)
    … 16380 bytes of my file …
    '0c'    00110000    Data Element End
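
    The 32-bit stream headers in Examples 2 and 3 can be checked the same way. This Python sketch assumes a little-endian 32-bit word split, from the low bits up, into 2 bits (A), 1 bit (B), 14 bits (type), and 15 bits (length), matching the annotations above; the function name is hypothetical.

```python
def parse_header32(buf, offset=0):
    """Return (header_type, compound, object_type, length) for a 32-bit header."""
    word = int.from_bytes(buf[offset:offset + 4], "little")
    header_type = word & 0x3            # field A, 2 bits
    compound = (word >> 2) & 0x1        # field B, 1 bit
    object_type = (word >> 3) & 0x3FFF  # 14 bits
    length = (word >> 17) & 0x7FFF      # 15 bits
    return header_type, compound, object_type, length


example2 = parse_header32(bytes([0x12, 0x00, 0xFE, 0x00]))
example3 = parse_header32(bytes([0x12, 0x00, 0xFC, 0x7F]))
print(example2)
print(example3)
```

    Both headers carry type 2; the lengths come out as 127 and 16382, which again exceed the file sizes (126 and 16380) by exactly the size of the extra bytes in front of the file content.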

    Therefore, it seems that in all cases the content of my file is preceded by the length of the file, encoded as a Compact Unsigned 64-bit Integer. I couldn't match this behavior with the specification. Can you help me with it?

    Friday, June 15, 2018 1:14 PM
  • Hi AlexeiSo, 

    It would really help if we were looking at the same sample file. Can you please send an email to dochelp@microsoft.com so we can arrange to get the sample data from you?

    Tom

    Friday, June 15, 2018 3:35 PM
    Moderator
  • Yes, I have sent an email.

    Subject: "File sharing".

    In the body there is a link to this discussion.

    Friday, June 15, 2018 6:59 PM
  • Thanks, got the file and working on it. 

    Tom

    Monday, June 25, 2018 4:59 PM
    Moderator
  • After discussing in email, we determined that the field

    • Data (variable): A byte stream that specifies the binary data opaque to this protocol.

    in the element listing above is encoded as described in [MS-FSSHTTPB] 2.2.1.3 Binary Item.
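
    This matches the byte dumps earlier in the thread: if a Binary Item is a compact-integer length followed by the content (my understanding of [MS-FSSHTTPB] 2.2.1.3), then the stream object length equals the file size plus the size of that length prefix. A minimal Python sketch under that assumption, with hypothetical function names and the compact encoding rules as I read them from 2.2.1.1:

```python
def encode_compact_uint64(value):
    """Encode value as a compact unsigned integer (assumed marker layout)."""
    if value == 0:
        return b"\x00"
    for width in range(1, 8):
        # A width-byte form carries 7 * width value bits above the marker.
        if value < 1 << (7 * width):
            marker = 1 << (width - 1)
            return ((value << width) | marker).to_bytes(width, "little")
    # Larger values: 0x80 marker followed by a plain 8-byte integer.
    return b"\x80" + value.to_bytes(8, "little")


def build_binary_item(content):
    """Length prefix followed by the raw content bytes."""
    return encode_compact_uint64(len(content)) + content


for size in (11, 126, 16380):
    item = build_binary_item(b"x" * size)
    prefix = item[:len(item) - size]
    print(size, prefix.hex(), len(item))
```

    For the three sample sizes this reproduces the prefixes 17, fd, and f2 ff and the total lengths 12, 127, and 16382 seen in the stream object headers.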

    We will review the document to determine if this can be clarified to make the connection between the two more obvious.

    Tom

    Friday, July 6, 2018 12:32 AM
    Moderator
  • Ok. Got it.

    Thank you, Tom!

    Thursday, July 12, 2018 9:52 PM