MS RPC compression
Hi there,
The project I am working on involves analyzing MS Exchange traffic. Right now I am working on payload decompression.
The MS document, [MS-OXCRPC].pdf, describes a decompression algorithm, but some parts of it are unclear. I'd like to know what I should do if the length of a match is greater than its offset. Say the length is 5 and the offset is 2.
I hope someone from Microsoft can clarify this for me.
Thanks,
Igor
All replies
This means that there is a problem with the compressed (LZ77) payload, so the server will return the error code ecRpcFormat (0x000004B6).
- Proposed as answer by Tom Devey - MSFT, Moderator, Wednesday, 25 June 2008, 20:55
- Thanks, Tom, for such a quick reply!
From what I have seen, a match length greater than the offset is quite common. It is hard to believe that so many packets were faulty.
Classic LZ77 describes this particular case, and I implemented it. My algorithm now works correctly when the length is greater than the offset AND there is NO additional metadata byte. I am curious whether this is just a coincidence.
I handle the extra length byte this way. If the last 3 bits of a metadata word are all set, I read an extra byte. Its high-order nibble is added to the length. The low-order nibble is remembered for future use. The next time the last 3 bits of a metadata word are all 1s, I do not read an extra byte but use the stored low-order nibble to calculate the real length. The third time the last 3 bits are all set, I read a new extra byte, use its high-order nibble, store the low-order one, and so on. If the additive nibble, high-order or low-order, is all 1s, I read another extra byte, whose value is also added.
There may be 2 additional bytes if I previously got 111 + 1111 + 11111111, but this has never happened to me.
I hope my understanding of this algorithm is correct. Do you see what I am missing or getting wrong?
Sorry for the wordiness, but it is really frustrating not being able to get it right.
Igor
Igor,
Let me dig into this a bit more and look at the actual code to make sure that I'm not missing something.
Thanks!
- Thank you Tom!
Looking forward to hearing from you!
Igor
- Tom,
We finally figured out the problem.
The [MS-OXCRPC] document says that if the match length is greater than 9, we have to read an additional byte and use its high-order nibble as an additive length. The low-order nibble should be kept and used the next time the match length is greater than 9. As it turned out, we should do quite the opposite: use the LOW-order nibble the first time and the HIGH-order nibble the next time.
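For readers following along, here is a minimal Python sketch of the nibble sharing as corrected above: the low-order nibble is used on the first long match, and the saved high-order nibble on the next one. The names (`LengthDecoder`, `match_length`) are made up for illustration, the layout is taken only from the descriptions in this thread and not checked against the specification, and the longer encoding that follows an extra byte of 0xFF is deliberately left out.

```python
class LengthDecoder:
    """Hypothetical helper: tracks the extra-length byte shared by two long matches."""

    def __init__(self, data: bytes, pos: int = 0):
        self.data = data
        self.pos = pos             # index of the next unread byte (assumed shared with the caller)
        self.pending_high = None   # high-order nibble saved for the next long match

    def _next_nibble(self) -> int:
        if self.pending_high is None:
            byte = self.data[self.pos]
            self.pos += 1
            self.pending_high = byte >> 4   # keep the HIGH nibble for later
            return byte & 0x0F              # use the LOW nibble first
        nibble = self.pending_high
        self.pending_high = None
        return nibble

    def match_length(self, length_bits: int) -> int:
        """length_bits is the 3-bit length field taken from a metadata word."""
        length = length_bits
        if length_bits == 0b111:            # all ones: more length follows
            nibble = self._next_nibble()
            length += nibble
            if nibble == 0b1111:            # all ones again: one more additive byte
                extra = self.data[self.pos]
                self.pos += 1
                if extra == 0xFF:
                    # The "2 additional bytes" case mentioned earlier is not covered.
                    raise NotImplementedError("longer length encoding not sketched")
                length += extra
        return length + 3                   # minimum match length is 3
```

With a single extra byte 0x21, two consecutive long matches would decode as 7 + 1 + 3 and then 7 + 2 + 3, and only one byte is consumed from the stream.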
Igor
- Proposed as answer by Tom Devey - MSFT, Moderator, Wednesday, 25 June 2008, 20:56
Whew. I was not able to repro your scenario and was about to ask for a sample trace for me to debug.
Thanks for the update.
- Tom,
I'm not sure if you filed a bug for this but it is still not corrected in the 1.00 documentation.
Philippe
- Hi Philippe.
I don't know whether or not Tom filed a bug, but I'll follow up with him.
Thanks,
Robert.
Philippe,
To bring closure to your feedback: I filed a Technical Document Issue (TDI) some time ago. The document will be updated to change the wording "high-order nibbles" to "low-order nibbles" in paragraph 3.1.5.2.2.4. The table following that paragraph will also be updated to reflect the change.
The fix will be in a future release of the documentation.
Thank you for providing this great feedback and please don't hesitate to provide additional feedback.
Developer Consultant
- Proposed as answer by Tom Devey - MSFT, Moderator, Thursday, 2 October 2008, 21:07
- Edited by Tom Devey - MSFT, Moderator, Thursday, 2 October 2008, 21:32 (Grammar)
- Hi Igor,
I am having exactly the same issues trying to decode some MS Exchange traffic. Can you tell me what you mean when you say:
"The classic LZ77 describes this particular case and I implemented it. So now my algorithm works OK if the length is bigger than the offset AND there is NO additional metadata byte. I am curious if this is just a coincidence."
In particular, what does the LZ77 algorithm say if the Length is greater than the Offset?
In my particular case, if I assume that all the lengths are correct I can calculate the size of the decompressed data. This agrees with the actual size specified in the RPC_HEADER_EXT structure. So, I'm thinking that the lengths are OK but perhaps it's the offsets that are incorrect.
Since the calculated length is adjusted by adding 3 (because the minimum match is 3 bytes), it doesn't seem to make sense to allow offsets of -1 or -2, which is what I sometimes see. So I'm (desperately) hoping that the calculated offset should be adjusted accordingly.
Can anyone confirm this?
Thanks,
Andy.
- Hi Andy,
I will research this and get back to you on your question.
Developer Consultant
- Hi Andy,
Just spotted your question.
Here's what "The classic LZ77" says:
"Many documents which talk about LZ77 algorithms describe a length-distance pair as a command to "copy" data from the sliding window: "Go back distance characters in the buffer and copy length characters, starting from that point." While those used to imperative programming may find this model intuitive, it may also make it hard to understand a feature of LZ77 encoding: namely, that it is not only acceptable but frequently useful to have a length-distance pair where the length actually exceeds the distance. As a copy command, this is puzzling: "Go back one character in the buffer and copy seven characters, starting from that point." How can seven characters be copied from the buffer when only one of the specified characters is actually in the buffer? Looking at a length-distance pair as a statement of identity, however, clarifies the confusion: each of the next seven characters is identical to the character that comes one before it. This means that each character can be determined by looking back in the buffer – even if the character looked back to was not in the buffer when the decoding of the current pair began. Since by definition a pair like this will be repeating a sequence of distance characters multiple times, it means that LZ77 incorporates a flexible and easy form of run-length encoding."
Igor
- Proposed as answer by Chris Mullaney, MSFT, Owner, Friday, 7 August 2009, 18:25
- Marked as answer by Chris Mullaney, MSFT, Owner, Thursday, 13 August 2009, 22:20
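The quoted LZ77 passage can be illustrated with a short Python sketch of a match copy that works when the length exceeds the distance. The function name `copy_match` and its parameters are assumptions made for this example; it shows the general LZ77 idea only and does not parse the MS-OXCRPC stream format.

```python
def copy_match(out: bytearray, distance: int, length: int) -> None:
    """Append `length` bytes, each equal to the byte `distance` positions before it.

    Copying one byte at a time lets the source range overlap the bytes being
    written, so a length greater than the distance simply repeats the pattern.
    """
    if distance <= 0 or distance > len(out):
        raise ValueError("distance points outside the data decoded so far")
    start = len(out) - distance
    for i in range(length):
        # out grows on every iteration, so out[start + i] may be a byte that
        # was appended earlier in this same loop.
        out.append(out[start + i])


# The case from the original question: length 5, offset 2.
buf = bytearray(b"ab")
copy_match(buf, distance=2, length=5)
print(buf)  # bytearray(b'abababa')
```

A bulk slice copy of the existing buffer would not work here, because some of the source bytes do not exist until the copy has started; the byte-by-byte loop is what makes the overlapping case behave like run-length encoding.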

