Bitmap-like in-memory data: support querying, performance issue

  • Question

  • I have a project about pallet and package placement,

    practically dealing with 2-dimensional data, almost like bitmap data.

    It has logic that needs to query which area of the pallet/surface can fit what size of packaging.

    I have the general code worked out, but the performance is very slow,

    as at a certain stage I'm querying the surface bit by bit (like pixel by pixel).

    When working on a pallet with dimensions 1100 x 1100 = 1,210,000 (over a million in-memory records),

    the performance is very slow.

    Below are my data structures and a much simplified view of the crunching logic:

        public class Position  {
            public int Y { get; set; }
            public int X { get; set; }
        }

        class DataCruncher  {
          List<Position> Dots { get; set; }

          void Crunch(...) {
            for(...) {
              Dots.Add(new Position { X = x, Y = y }); // adding records one by one for a certain range/area
            }
            Dots.Where(...); // LINQ
            ...
          }
        }
    

    Does anyone have a good alternative approach for processing about 1 million in-memory records?

    I'm not using actual bitmap data because I need querying logic over the data that a bitmap doesn't support, such as checking whether an area is used/dotted.

    Saturday, April 7, 2018 8:54 AM

Answers

  • I think that 2D arrays will give better performance. For example, you can use the BitArray class:

       int width = 1100;
       int height = 1100;
       BitArray a = new BitArray( width * height );

    It uses one bit for each location.

    In order to set some dot:

       int x = 300;
       int y = 200;
       a[x + y * width] = true;

    In order to check a location:

       bool is_used = a[x + y * width];

     

    If you prefer your approach, then extend the class:

    public class Position

    {

           public int X { get; set; }

           public int Y { get; set; }

     

           public override int GetHashCode()

           {

                 return unchecked(X ^ Y);

           }

    }

     

    or use the existing Point structure.

    Then use HashSet<Position> instead of list. Checking with Contains function should work faster.
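    To cover the "is this area used/dotted?" query from the question, a rectangle test can be layered on top of the BitArray. A minimal sketch; the Surface class and the IsAreaFree/MarkArea names are illustrative, not from this thread:

    ```csharp
    using System.Collections;

    public class Surface
    {
        private readonly BitArray bits;
        private readonly int width;

        public Surface(int width, int height)
        {
            this.width = width;
            bits = new BitArray(width * height); // all bits start false = free
        }

        // True if no cell inside the w x h rectangle at (x, y) is marked used.
        public bool IsAreaFree(int x, int y, int w, int h)
        {
            for (int row = y; row < y + h; row++)
                for (int col = x; col < x + w; col++)
                    if (bits[col + row * width])
                        return false;
            return true;
        }

        // Mark every cell of the rectangle as used (a placed package).
        public void MarkArea(int x, int y, int w, int h)
        {
            for (int row = y; row < y + h; row++)
                for (int col = x; col < x + w; col++)
                    bits[col + row * width] = true;
        }
    }
    ```

    Call IsAreaFree before MarkArea when placing a package; the whole 1100 x 1100 surface costs only about 150 KB of memory this way.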



    • Edited by Viorel_MVP Saturday, April 7, 2018 3:39 PM
    • Marked as answer by Kelmen Thursday, April 12, 2018 11:59 AM
    Saturday, April 7, 2018 3:37 PM
  • I have a project about pallet and package placement,

    [...]

    I have the general code worked out, but the performance is very slow,

    as at a certain stage I'm querying the surface bit by bit (like pixel by pixel)


    Hi,

    it seems that you want to search a set of all combinations of data for a "best" combination of data.

    Doing this with a List is possible, but searching lists - especially ones with lots of entries - is slow. As Viorel says, a bit array as a basic representation of locations can speed things up a lot.

    Often, such searches for the best (cost-minimizing) result among many can be done with ordered/sorted data structures like a SortedList in .NET.

    ###########################################

    <basic ideas>

    Some basic concepts can be seen from the following (basic optimization)

    Maybe look at the basic idea of the "Fast Marching Method" and related shortest-path searches such as Dijkstra's algorithm https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm

    (used by maze solvers, Livewire/"magnetic lasso", and active contours/level sets, for instance).

    </basic ideas>

    ###########################################

    <packing> maybe more related to your special optimization problem

    Packing problems are somewhat similar, but often use other data representations and different search strategies:

    https://en.wikipedia.org/wiki/Packing_problems

    and the most common:

    https://en.wikipedia.org/wiki/Knapsack_problem

    </packing>

    ###########################################

    Regards,

      Thorsten


    • Edited by Thorsten Gudera Saturday, April 7, 2018 11:22 PM
    • Marked as answer by Kelmen Thursday, April 12, 2018 11:59 AM
    Saturday, April 7, 2018 11:03 PM

All replies

  • Hello,

     The file format for bitmaps has many different headers. For example, black-and-white
    bitmap files do not contain a palette; the actual binary data state, 1 or 0,
    means 1 = white and 0 = black. Greyscale bitmaps use an indexed palette, where the
    bitmap data is an index reference into the color data. Custom or optimized palettes
    can encode the color data differently: 16bpp means 16 bits per pixel of color, which
    may or may not use 1 bit for alpha but does use 5 bits each for the red, green, and
    blue channels.

     Your analogy between your project and bitmap files seems odd... I'm not sure what
    your project does, what this "1 million" data is, or its format. If you could provide
    more details, sample code, and data, then perhaps we can offer better suggestions.

     Thanks :)


    Saturday, April 7, 2018 9:20 AM
  • My project doesn't deal with bitmaps;

    I just mentioned them because the data involved is somewhat similar.

    The project is about: given packages (rectangles) of different sizes, how can I put them all into N fixed-size big containers (pallets)?


    • Edited by Kelmen Saturday, April 7, 2018 12:05 PM
    Saturday, April 7, 2018 12:04 PM
  • I have attempted to convert my custom position (X, Y) class to use the built-in Point structure.

    It took some work, as classes and structures behave slightly differently when the instance is changed,

    but performance-wise I don't see any improvement.

    **

    Changing to use an array is going to be a major change, especially to the logic.

    My existing design approach uses a 2D-like data structure to represent the surface; each record represents one dot/place that is not used/placed.

    So I can use LINQ queries to identify used or available dots/places for placement; the code is easy to write and readable.

    Once a (rectangular) area is used, its records are simply removed (Remove), and filtering in (Where) the area by X and Y scope is easy.

    Changing to an array, the logic would require an additional step to check whether each dot is used or not, while the current approach only has to find unused dots (existing records).

    Will update if there is any useful progress.

    Thanks for the feedback.
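    The remove-an-area step described above also maps naturally onto a HashSet<Point>: each removal is an O(1) hash delete instead of an O(n) list scan. A sketch under the assumption that the set holds only the still-free dots (the OccupyArea name is illustrative; Point and Rectangle are the System.Drawing types):

    ```csharp
    using System.Collections.Generic;
    using System.Drawing;

    public static class SurfaceOps
    {
        // Remove every dot covered by a placed rectangle, leaving only free dots in the set.
        public static void OccupyArea(HashSet<Point> freeDots, Rectangle area)
        {
            for (int y = area.Top; y < area.Bottom; y++)
                for (int x = area.Left; x < area.Right; x++)
                    freeDots.Remove(new Point(x, y)); // O(1) per dot, vs O(n) for List.Remove
        }
    }
    ```

    Afterwards, "is this spot free?" is just freeDots.Contains(p), and "is this whole rectangle free?" is a loop of Contains calls over the rectangle's cells.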

    Tuesday, April 10, 2018 11:13 AM
  • Kelmen,

    We see only part of your code, but maybe you have arrays in it.

    Arrays have a fixed length, which means that every change of an array's size (as opposed to a change of an element or of the reference) results in first creating a new array and then releasing the old one.


    Success
    Cor

    Tuesday, April 10, 2018 11:30 AM
  • I didn't involve any arrays; my data structures are all classes (or a struct, just Point, for testing).

    I'm now looking into using the graphical Rectangle as an alternative data structure; search results hint at the Intersect and IsVisible methods of Region, which could be useful to me, though I believe these are not WPF types.

    I will seek more guidance in the WPF forum (as the app is WPF).

    Tuesday, April 10, 2018 11:36 AM
  • Did a quick code/design review and a few simple tests.

    I scoped the performance issue down to checking pixel by pixel for an available area to fit a parcel.

    The performance hit is due to the logic, not so much the data structure (the current code is already using Point (replacing my custom coordinate class) and HashSet<> (replacing List<Point>)).

    I have to work out logic to skip checking some pixels/small areas by making some assumptions.

    ***

    I have 2 stages (with different logic) for checking the surface to fit parcels.

    The 1st stage is simple and straightforward, as it starts with all pixels available to fit parcels, so I don't need to check each pixel one by one.

    Once I put in a parcel, the placement gives me 2 hint pixels (the next pixels to work on).

    As I work top-to-bottom, left-to-right, each placed parcel hints at its upper-right and bottom-left (available) pixels for the logic to use for the next placement.

    But at the 2nd stage, for a complex design and requirements, the surface no longer comes as a clean area, hence I have to check pixel by pixel.

    My initial process logically breaks the big pallet/container into 4 sections; each section is a clean surface, and the mentioned logic determines parcel placement in it, with each section at a different orientation (0°, 90°, 180°, 270°).

    The 2nd stage then combines these 4 sections back into a single surface; the middle section will usually still have space to fit parcels, and now the logic needs to check pixel by pixel over the combined sections.

    This part is what I need to tackle.

    As per the topic, the performance issue is not due to the data structure but to the logic, so I'm closing this and marking the helpful feedback.
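    One way to cut down that pixel-by-pixel pass over the combined middle section is to skip ahead: when a candidate rectangle fails, scan it right-to-left, find the rightmost blocked column, and resume just past it, since every intermediate start position would still cover that column. A rough sketch of the idea over a bool[,] occupancy grid (this is a generic technique, not the code from the thread):

    ```csharp
    using System;

    public static class SpotFinder
    {
        // Find the first top-left position where a w x h parcel fits,
        // jumping past blocked columns instead of advancing x by 1.
        public static (int x, int y)? FindSpot(bool[,] used, int w, int h)
        {
            int width = used.GetLength(0), height = used.GetLength(1);
            for (int y = 0; y + h <= height; y++)
            {
                for (int x = 0; x + w <= width; )
                {
                    int blockedAt = -1;
                    for (int dx = w - 1; dx >= 0 && blockedAt < 0; dx--)  // right-to-left
                        for (int dy = 0; dy < h; dy++)
                            if (used[x + dx, y + dy]) { blockedAt = x + dx; break; }

                    if (blockedAt < 0)
                        return (x, y);     // whole candidate rectangle is free
                    x = blockedAt + 1;     // any start in [x, blockedAt] would hit it too
                }
            }
            return null; // no free spot on this surface
        }
    }
    ```

    The jump is safe because a blocked cell at column `blockedAt` lies inside every candidate whose left edge is at or before it, so none of the skipped positions could have succeeded.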



    • Edited by Kelmen Thursday, April 12, 2018 12:05 PM
    Thursday, April 12, 2018 11:59 AM
  • Just to update a finding regarding performance.

    While debugging the code that uses HashSet<Point>, I changed the HashSet to a List,

    and the performance was terrible.

    In a function using Contains(), which I intended to check:

    List<Point> data

    if (!data.Contains(z))...

    when working on 300k records in a List<Point>, it is so slow that the IDE assumes the process is deadlocked and auto-breaks.

    When I revert back to HashSet<Point>,

    that function takes less than 1 second (about 200-500 milliseconds).

    The performance difference is huge;

    it is as if HashSet were a List<Point> with an index on it.

    ***

    That function does more than just call Contains(), so I can't tell precisely which statements cause the performance hit.

    It also creates the Point records dynamically.
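    The gap reported above is expected from the containers' lookup complexity: List<T>.Contains scans every element (O(n)), while HashSet<T>.Contains hashes straight to a bucket (O(1) on average), so at 300k records a factor in the thousands is plausible. A small sketch that makes the difference visible (the counts are arbitrary; absolute timings vary by machine, so no expected numbers are shown):

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Drawing;

    public static class ContainsComparison
    {
        public static void Main()
        {
            const int n = 300_000;
            var list = new List<Point>(n);
            var set = new HashSet<Point>();
            for (int i = 0; i < n; i++)
            {
                var p = new Point(i % 1100, i / 1100); // unique grid coordinates
                list.Add(p);
                set.Add(p);
            }

            var probe = new Point(-1, -1); // absent: worst case, the list scans all n items

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 1_000; i++) list.Contains(probe);
            Console.WriteLine($"List<Point>.Contains:    {sw.ElapsedMilliseconds} ms");

            sw.Restart();
            for (int i = 0; i < 1_000; i++) set.Contains(probe);
            Console.WriteLine($"HashSet<Point>.Contains: {sw.ElapsedMilliseconds} ms");
        }
    }
    ```

    Both containers answer membership questions identically; only the cost per call differs.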


    • Edited by Kelmen Wednesday, April 18, 2018 9:36 AM
    Wednesday, April 18, 2018 9:27 AM