Saving structures to disk, reading them back via serialization

  • Question

  • I need to store a few hundred structures efficiently and update them dynamically. Each structure must be able to hold members (substructures), each represented by a pair (x,y) where x and y are integers. A structure might also carry signal amplitude information V for each pair (x,y). I expect every structure to occupy the same number of bytes, without variation. At this point I have only a rather vague idea of how this could be implemented. Expecting help :-)

    Thank you, - MyCatAlex





    • Edited by MyCatAlex Saturday, May 23, 2020 4:42 PM
    Friday, May 22, 2020 5:35 PM

Answers

  • What does "read back via serilization (sic)" mean?  Same question if you meant serialization.

    If there are only a few hundred, efficiency should not be your primary concern.

    What does it mean for a substructure to be "represented by a pair of integers"?

    If you want each structure to occupy the same number of bytes, does that mean each contains the same number of substructures (even if some are unused)?

    A caution: writing structures to disk in binary may cause them to be unusable if you later change compiler versions or options.

    • Marked as answer by MyCatAlex Friday, May 22, 2020 8:00 PM
    Friday, May 22, 2020 6:25 PM
  • So you have an array of 864 primary structures.  Each contains a few control items (such as max number of substructures and active number of substructures) and an array of 88 substructures.  Each substructure contains a pair of coordinates and a signal amplitude.

    Just to make the arithmetic easy, assume 100 substructures of 50 bytes each.  That is a total of 5000 bytes of substructures for each primary structure.  Add another 100 bytes for other structure data and each primary structure is 5100 bytes.  A thousand primary structures would occupy 5.1 MB.  They all should easily fit in memory on a computer running Windows 10.  You can process them at will without any need for additional disk access once the data has been loaded.
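
    The estimate above uses round numbers. A minimal sketch with the counts mentioned in this thread (864 primaries, 88 substructures each) could look like the following. The struct names, field names, and field types are assumptions made for illustration only; the sizes in the comments assume a 4-byte float, which holds on common platforms.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout; the names and types are assumptions, not from the thread.
struct Sub {
    std::int32_t x;          // pixel column
    std::int32_t y;          // pixel row
    float        amplitude;  // signal amplitude V at (x, y)
};

constexpr std::size_t kMaxSubs   = 88;   // substructures per primary
constexpr std::size_t kPrimaries = 864;  // number of primary structures

struct Primary {
    std::int32_t maxSubs;         // control item: capacity of subs[]
    std::int32_t activeSubs;      // control item: slots currently in use
    Sub          subs[kMaxSubs];  // fixed-size array, so every Primary has the same size
};

// Total memory needed to hold all primaries in one contiguous array.
constexpr std::size_t totalBytes() {
    return kPrimaries * sizeof(Primary);
}
```

    With these assumed types each Primary is about 1 KB and the whole set is under 1 MB, so keeping everything in memory looks even easier than the round-number estimate suggests.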

    My caution did not mention Windows 10 at all.  But your assumption of stability is unfounded.  It is changing constantly and not every change is an improvement or even transparent.  The real issue is the compiler and its options.

    The simplest safe way to store the data on disk is as a series of text values.  For example:
        line 1 would contain the control data for primary structure [0].
        line 2 would contain the data for substructure [0] of this primary.
        lines 3 through n would contain the data for the remaining substructures.
        line n+1 would contain the control data for primary structure [1].
        and the pattern repeats.

    Since you appear to be dealing exclusively with numeric data, values in each line can be separated by space characters.  If you have text data also, then a tab character might be a better choice.

    Are you planning to do the work in C or C++?

    • Marked as answer by MyCatAlex Saturday, May 23, 2020 12:46 AM
    Friday, May 22, 2020 10:36 PM
  • I am facing a task of efficiently storing dynamically a few hundred structures.

    When serializing structures be alert to the effect of packing.

    The size of a struct may be greater than the sum of the size of its 
    parts (members).

    Tighter packing may reduce memory and disk space, but may cause a
    performance loss (execution speed).

    Packing specification must be identical in each program that accesses the
    structs.
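
    To make the effect of packing concrete, here is a small sketch that compares the same two members with default packing and with 1-byte packing. It assumes a compiler that supports #pragma pack (MSVC, GCC, and Clang do); the struct names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Default packing: the compiler may insert padding after `tag`
// so that `value` starts on a suitably aligned address.
struct DefaultLayout {
    char         tag;
    std::int32_t value;
};

#pragma pack(push, 1)  // save the current packing, then pack to 1-byte boundaries
struct PackedLayout {
    char         tag;
    std::int32_t value;  // immediately follows `tag`, with no padding
};
#pragma pack(pop)      // restore the saved packing

// Sum of the sizes of the members, ignoring any padding.
constexpr std::size_t kSumOfMembers = sizeof(char) + sizeof(std::int32_t);

// The packed struct is exactly the sum of its parts...
static_assert(sizeof(PackedLayout) == kSumOfMembers, "packed struct has no padding");
// ...while the default layout may be larger.
static_assert(sizeof(DefaultLayout) >= kSumOfMembers, "default layout may add padding");
```

    If one program writes PackedLayout to disk and another reads the bytes back as DefaultLayout, the members will not line up, which is why the packing must match everywhere.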

    C/C++:

    pack pragma
    https://docs.microsoft.com/en-us/cpp/preprocessor/pack?view=vs-2019

    "Specifies the packing alignment for structure, union, and class members."

    "If you change the alignment of a structure, it may not use as much 
    space in memory, but you may see a decrease in performance or even get 
    a hardware-generated exception for unaligned access."

    Structure Member Alignment, Padding and Data Packing
    https://www.geeksforgeeks.org/structure-member-alignment-padding-and-data-packing/

    Structure alignment in Visual C++
    https://stackoverflow.com/questions/10257995/structure-alignment-in-visual-c

    Force C++ structure to pack tightly
    https://stackoverflow.com/questions/21092415/force-c-structure-to-pack-tightly

    Also see:

    align (C++)
    https://docs.microsoft.com/en-us/cpp/cpp/align-cpp?view=vs-2019

    In C# you have:

    StructLayoutAttribute.Pack Field
    https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.structlayoutattribute.pack?view=netcore-3.1

    "Controls the alignment of data fields of a class or structure in memory."

    - Wayne

    • Marked as answer by MyCatAlex Saturday, May 23, 2020 12:47 PM
    Saturday, May 23, 2020 1:04 AM
  • Thank you Wayne. It is very helpful. However I have a concern with push and pop. Is it the same as locating a structure/variable with pointer arithmetic? If it is the same in terms of time and performance, I have no concern.

    Thank you, - MyCatAlex

    How do push and pop relate to saving structures to disk and reading them back?  These are operations performed on certain containers (such as queue and list) while they are in memory.
    • Marked as answer by MyCatAlex Saturday, May 23, 2020 3:58 PM
    Saturday, May 23, 2020 3:21 PM

  •  I have a concern with push and pop. Is it the same as locating a structure/variable with pointer arithmetic? If it is the same in terms of time and performance, I have no concern.

    I'm not quite sure I follow your concern. Note that #pragma pack (with
    or without push/pop) is a directive that is processed at build time,
    not at run time. It does not add any push or pop operations to your
    program; it only tells the compiler how to lay out the structs that
    follow. By using push/pop you are telling the compiler to save the
    current packing setting (push), change it to the specified packing
    value for what follows, and then revert to the prior packing setting
    (pop).

    All of this occurs at build time.
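
    A small sketch of what push and pop do to the packing setting, assuming a compiler that supports #pragma pack (MSVC, GCC, Clang). The struct names are hypothetical. The checks are static_assert, so they are evaluated while compiling, and nothing here runs in the finished program.

```cpp
#include <cstdint>

#pragma pack(push, 2)   // save the default setting, switch to 2-byte packing
struct TwoByteBefore {
    char         tag;
    std::int32_t value;  // placed on a 2-byte boundary
};

#pragma pack(push, 1)   // save the 2-byte setting, switch to 1-byte packing
struct OneByte {
    char         tag;
    std::int32_t value;  // placed right after `tag`, no padding
};
#pragma pack(pop)       // revert to the saved 2-byte setting

struct TwoByteAfter {
    char         tag;
    std::int32_t value;
};
#pragma pack(pop)       // revert to the default setting

// Checked by the compiler at build time:
static_assert(sizeof(OneByte) == 5, "1-byte packing leaves no padding");
static_assert(sizeof(TwoByteBefore) == sizeof(TwoByteAfter),
              "pop restored the packing that was in effect before the inner push");
```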

    Performance degradation could occur if you specify one byte packing - so
    that there is no padding - as the default packing may be optimally chosen
    to allow the compiler to use alignment that favours certain hardware
    instructions.

    - Wayne

    • Marked as answer by MyCatAlex Saturday, May 23, 2020 3:29 PM
    Saturday, May 23, 2020 3:27 PM
  • How do push and pop relate to saving structures to disk and reading them back?  These are operations performed on certain containers (such as queue and list) while they are in memory.

    Well, I will probably understand it better after the implementation, but pop and push in the context of queues, stacks and other containers are very familiar to me from years of writing C# applications. Let's say I have a stack with 100 objects, perhaps structures, and I need to reach a structure in the middle. I have to call pop() many times to get to it, temporarily saving the other structures that have come out before it.

    Although it is unclear from my OP, I will in fact be facing a very dynamic situation where I will need to touch many previously stored structures, make minor but essential changes in their substructures, and also read some information from them. I would prefer to read the stored structures by adding an integer to a base pointer, instead of using containers like queue or stack. Am I wrong?

    Thank you, - MyCatAlex


    Well, then a stack (queue, deque) isn't a good choice of container.  What about std::map?
    • Marked as answer by MyCatAlex Saturday, May 23, 2020 4:58 PM
    Saturday, May 23, 2020 4:08 PM

All replies

  • What does "read back via serilization (sic)" mean?  Same question if you meant serialization.

    If there are only a few hundred, efficiency should not be your primary concern.

    What does it mean for a substructure to be "represented by a pair of integers"?

    If you want each structure to occupy the same number of bytes, does that mean each contains the same number of substructures (even if some are unused)?

    A caution: writing structures to disk in binary may cause them to be unusable if you later change compiler versions or options.

    Yes, I meant serialization. Efficiency is already my primary concern: some operations take close to 45 seconds, and that is just the writing. Yes, there will be a fixed number of substructures, and some of them at the end of the count may be left unused. There should be 88 or so substructures per structure, each containing the (x,y) coordinates of a pixel and the amplitude of the signal at that pixel, and 864 structures. It is a dynamic situation: signal amplitude values will be constantly changing, while coordinates will remain stationary. Your caution is very helpful. I was not aware of it, but Windows 10 reportedly will never be changed. Where should I start?

    Thank you, MyCatAlex


    • Edited by MyCatAlex Friday, May 22, 2020 8:07 PM
    Friday, May 22, 2020 7:59 PM
  • It is late now, but tomorrow morning I will try to give you a forward-looking plan of what this project should do. There is much more to it, but I have to explain. I will do it for you, and it will also be useful for me as additional clarification.

    Thanks, - MyCatAlex

    Saturday, May 23, 2020 12:49 AM
  • pack pragma
    https://docs.microsoft.com/en-us/cpp/preprocessor/pack?view=vs-2019

    - Wayne

    Thank you Wayne. It is very helpful. However I have a concern with push and pop. Is it the same as locating a structure/variable with pointer arithmetic? If it is the same in terms of time and performance, I have no concern.

    Thank you, - MyCatAlex

    Saturday, May 23, 2020 1:02 PM
  • How do push and pop relate to saving structures to disk and reading them back?  These are operations performed on certain containers (such as queue and list) while they are in memory.

    In my case the memory might be an external SSD. Even if the structures are in RAM, it will take some time to pop() a distant saved element, save all the previous ones temporarily, and push them back after the sought-for element has been changed.

    - MyCatAAlex

    Saturday, May 23, 2020 4:02 PM
  • Well, then a stack (queue, deque) isn't a good choice of container.  What about std::map?
    That's what I've been thinking about. I know you are an expert on this. I've already begun to look for serialization support. I need this header:

    boost/archive/text_oarchive.hpp

    but could not find it on my system. Too many questions must be answered along the way. Anyway, I need your guidance.

    Thank you, - MyCatAlex


    Saturday, May 23, 2020 4:49 PM
  • I just checked std::map. It is a dictionary and a tree at the same time. It will be indispensable at the next stage, for storing and searching invariants, but at this stage I really need a system of base pointers and the ability to reach every member of the stored list with pointer arithmetic.
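
    A minimal sketch of that kind of access, assuming the structures live in one contiguous std::vector. The struct layout and the helper names locate and setAmplitude are hypothetical. Reaching any element is a constant-time address computation, with no popping or pushing of other elements.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types; names and fields are assumptions for illustration.
struct Sub {
    int   x;
    int   y;
    float amplitude;
};

struct Primary {
    int activeSubs;  // control item: slots currently in use
    Sub subs[88];    // fixed-size array of substructures
};

// Returns a pointer to substructure j of primary i using pointer arithmetic
// on the base pointer. The caller must keep i and j within bounds.
Sub* locate(Primary* base, std::size_t i, std::size_t j) {
    return (base + i)->subs + j;
}

// Updates one amplitude in place without touching any other element.
void setAmplitude(std::vector<Primary>& all, std::size_t i, std::size_t j, float v) {
    locate(all.data(), i, j)->amplitude = v;
}
```

    Plain indexing, all[i].subs[j], computes the same address and is usually easier to read.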

    Thank you, - MyCatAlex

    Saturday, May 23, 2020 4:58 PM
  • Also, I don't think I need serialization at all. I want to store structures without serialization, because I don't see any advantage to it. Besides, there is very little in the Microsoft documentation about serialization in C/C++. All I've seen is for C#.
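
    For reference, storing fixed-size structures as raw bytes could be sketched as below. This is only an illustration under stated assumptions: the struct layout and the function names saveAll, loadOne, and storeOne are hypothetical. The caution from earlier in the thread applies in full: the file is only readable by a program built with the same struct layout (compiler, options, and packing).

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical layout; names and types are assumptions for illustration.
struct Sub {
    std::int32_t x;
    std::int32_t y;
    float        amplitude;
};

struct Primary {
    std::int32_t activeSubs;
    Sub          subs[88];
};

// Raw byte copies are only valid for trivially copyable types.
static_assert(std::is_trivially_copyable<Primary>::value,
              "Primary must be trivially copyable to be written as raw bytes");

// Writes all records back to back; record i starts at byte i * sizeof(Primary).
bool saveAll(const std::string& path, const std::vector<Primary>& all) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(reinterpret_cast<const char*>(all.data()),
              static_cast<std::streamsize>(all.size() * sizeof(Primary)));
    out.flush();
    return static_cast<bool>(out);
}

// Reads record i by seeking directly to its offset.
bool loadOne(const std::string& path, std::size_t i, Primary& p) {
    std::ifstream in(path, std::ios::binary);
    in.seekg(static_cast<std::streamoff>(i * sizeof(Primary)));
    in.read(reinterpret_cast<char*>(&p), sizeof(Primary));
    return static_cast<bool>(in);
}

// Overwrites record i in place, leaving the other records untouched.
bool storeOne(const std::string& path, std::size_t i, const Primary& p) {
    std::fstream io(path, std::ios::binary | std::ios::in | std::ios::out);
    io.seekp(static_cast<std::streamoff>(i * sizeof(Primary)));
    io.write(reinterpret_cast<const char*>(&p), sizeof(Primary));
    io.flush();
    return static_cast<bool>(io);
}
```

    Because every record has the same size, the file offset of record i is simply i * sizeof(Primary), which is the on-disk counterpart of adding an integer to a base pointer.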

    Thanks, - MyCatAlex

    Saturday, May 23, 2020 5:44 PM