Decrease in speed when using 2 threads for drawing a bitmap with lockbits

  • Question

  • Out of curiosity, I wanted to see if I could increase the performance of some GDI+ code I had just written.

    The code takes one of my images, creates a new bitmap from it, and then adds a very basic reflection. I am wondering why it runs much slower when I use 2 threads (one to copy the image and the other to draw the reflection).

     

        public unsafe Bitmap AddReflection(Bitmap pBitmapIn, double Percentage, byte StartOpacity, byte EndOpacity)
        {
          int w = pBitmapIn.Width, h = pBitmapIn.Height;
          int e = (int)(h * Percentage);
          int Space = 4;
          Bitmap pBitmapOut = new Bitmap(w, h + e + Space, System.Drawing.Imaging.PixelFormat.Format32bppArgb);
    
          System.Drawing.Imaging.BitmapData data1 = pBitmapOut.LockBits(new Rectangle(0, 0, w, h), System.Drawing.Imaging.ImageLockMode.ReadWrite, pBitmapOut.PixelFormat);
          System.Drawing.Imaging.BitmapData data2 = pBitmapIn.LockBits(new Rectangle(0, 0, w, h), System.Drawing.Imaging.ImageLockMode.ReadWrite, pBitmapIn.PixelFormat);
    
          // Below is faster than using 2 threads
          //_DrawImage(data1, data2);
          //_DrawReflection(data1, data2, e, Space, 255, 0);
    
          ThreadStart n1 = delegate { _DrawImage(data1, data2); };
          Thread t1 = new Thread(n1);
          t1.Start();
    
          ThreadStart n2 = delegate { _DrawReflection(data1, data2, e, Space, 255, 0); };
          Thread t2 = new Thread(n2);
          t2.Start();
    
          t1.Join();
          t2.Join();
    
          // Unlock each bitmap with its own BitmapData: data1 came from pBitmapOut, data2 from pBitmapIn.
          pBitmapOut.UnlockBits(data1);
          pBitmapIn.UnlockBits(data2);
          return pBitmapOut;
        }
    
        public unsafe void _DrawImage(System.Drawing.Imaging.BitmapData data1, System.Drawing.Imaging.BitmapData data2)
        {
          byte* p = (byte*)data1.Scan0;
          byte* q = (byte*)data2.Scan0;
          int bpp = 4;
          int w = data1.Width, h = data1.Height;
          int nOffset1 = data1.Stride - (w * bpp);
          int nOffset2 = data2.Stride - (w * (bpp - 1));
    
          for (int i = 0; i < h; i++)
          {
            for (int j = 0; j < w; j++)
            {
              p[3] = (byte)255;
              for (int k = 2; k >= 0; k--) p[k] = q[k];
              p += bpp; q += bpp - 1;
            }
            p += nOffset1; q += nOffset2;
          }
        }
    
        public unsafe void _DrawReflection(System.Drawing.Imaging.BitmapData data1, System.Drawing.Imaging.BitmapData data2, int ReflectionHeight, int Space, byte StartOpacity, byte EndOpacity)
        {
          byte* p = (byte*)data1.Scan0;
          byte* q = (byte*)data2.Scan0;
    
          int bpp = 4;
          int w = data1.Width, h = data1.Height;
          int nOffset1 = data1.Stride - (w * bpp);
          int nOffset2 = data2.Stride - (w * (bpp - 1));
    
          p += (h + ReflectionHeight + Space - 1) * ((w * bpp) + nOffset1);
          q += (h - ReflectionHeight - 1) * ((w * (bpp - 1)) + nOffset2);
    
          byte diffOpacity = (byte)(StartOpacity - EndOpacity);
          for (float i = 0; i < ReflectionHeight; ++i)
          {
            byte t = (byte)(EndOpacity + ((i / ReflectionHeight) * diffOpacity));
            for (int j = 0; j < w; j++)
            {
              p[3] = t;
              for (int k = 2; k >= 0; k--) p[k] = q[k];
              p += bpp; q += bpp - 1;
            }
            p -= 2 * ((w * bpp) + nOffset1); q += nOffset2;
          }
        }

    I would appreciate any help anyone can give

     

    Monday, May 10, 2010 9:35 PM

Answers

  • I am not sure why this is occurring. However, there is a bit of unnecessary overhead, since you have a dual-core processor but are allocating 2 extra threads to do the work. That gives you 3 threads in total (the main application thread, t1 and t2), not including the garbage collection thread, etc. Thus, you have the following costs:
    1) Creation of 2 threads, which includes stack allocation and the thread data structures.
    2) Now this I am not sure about, but does the main thread, which calls t1.Join and t2.Join, take a time slice while it is waiting for the threads to finish? Executing 3 threads on 2 processors will cost you a small amount of extra time due to context switches.

    A few experiments:
    1) What would the performance be if you tried this (i.e. creating only 1 thread and letting the main thread do the rest of the work)?

          ThreadStart n1 = delegate { _DrawImage(data1, data2); };
          Thread t1 = new Thread(n1);
          t1.Start();
         
          _DrawReflection(data1, data2, e, Space, 255, 0);
          t1.Join();

    2) What do the numbers look like as the size of the bitmap increases?

    3) What if you used the threadpool? 


    Anyway, not sure if this helps at all but thought I would jot down a few thoughts.
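
    For experiment 3, here is a minimal sketch of what a thread pool version might look like. `PoolSketch` and `RunBoth` are made-up names for illustration, the two actions stand in for the `_DrawImage` and `_DrawReflection` calls, and it assumes .NET 4 for `CountdownEvent`.

```csharp
using System;
using System.Threading;

public static class PoolSketch
{
    // Hypothetical helper: runs both actions on thread pool threads
    // and blocks the caller until both have finished.
    public static void RunBoth(Action first, Action second)
    {
        using (var done = new CountdownEvent(2))
        {
            // Signal in a finally block so the waiter is released even if an action throws.
            ThreadPool.QueueUserWorkItem(_ => { try { first(); } finally { done.Signal(); } });
            ThreadPool.QueueUserWorkItem(_ => { try { second(); } finally { done.Signal(); } });
            done.Wait();
        }
    }

    public static void Main()
    {
        int a = 0, b = 0;
        // Placeholder work; in the question these would be the two drawing calls.
        RunBoth(() => a = 1, () => b = 2);
        Console.WriteLine("both work items finished: a=" + a + ", b=" + b);
    }
}
```

    In AddReflection this would be called roughly as RunBoth(() => _DrawImage(data1, data2), () => _DrawReflection(data1, data2, e, Space, 255, 0)). Pool threads are reused, so the per-call thread creation cost should go away, but whether that changes the numbers would need measuring.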

    • Marked as answer by SamAgain Tuesday, May 18, 2010 6:53 AM
    Thursday, May 13, 2010 5:11 AM
  • You never know what will happen with threading and graphics.  I compared CopyFromScreen for 1 Bitmap (1920 X 1080) and 8 Bitmaps (960 X 270).  Here's the code and some results:

    using System;
    using System.Collections.Generic;
    using System.ComponentModel;
    using System.Drawing;
    using System.Windows.Forms;
    using System.Diagnostics;

    namespace WindowsFormsApplication4
    {
      public partial class Form1 : Form
      {
        public Form1()
        {
          InitializeComponent();
        }

        List<bgw> bgws = new List<bgw>();
        Stopwatch sw = new Stopwatch();
        int wkrsReturned;

        private void button1_Click(object sender, EventArgs e)
        {
          int w = Screen.PrimaryScreen.Bounds.Width;
          int h = Screen.PrimaryScreen.Bounds.Height;
          sw.Reset(); sw.Start();
          Bitmap bmp = new Bitmap(w, h);
          Graphics g = Graphics.FromImage(bmp);
          g.CopyFromScreen(0, 0, 0, 0, new Size(w, h));
          bmp.Dispose();
          sw.Stop();
          Console.WriteLine("1 Bitmap:  " + sw.ElapsedMilliseconds);
          sw.Reset(); sw.Start();
          wkrsReturned = 0;
          bgws.Clear();
          for (int y = 0; y < 2; y++)
          {
            for (int x = 0; x < 4; x++)
            {
              bgw wkr = new bgw();
              wkr.RunWorkerCompleted += wkr_Complete;
              bgws.Add(wkr);
              wkr.RunWorkerAsync(new Rectangle(x*w/4, y*h/4, w/4, h/4));
            }
          }
        }

        void wkr_Complete(object sender, RunWorkerCompletedEventArgs e)
        {
          int i = bgws.IndexOf((bgw)sender);
          ((Bitmap)e.Result).Dispose();
          Console.Write(i + " ");
          if (++wkrsReturned == bgws.Count)
          {
            sw.Stop();
            Console.WriteLine(" 8 Bitmaps:  " + sw.ElapsedMilliseconds);
          }
        }
      }

      class bgw : BackgroundWorker
      {
        public bgw()
        {
          this.DoWork += this.bgw_DoWork;
        }

        void bgw_DoWork(object sender, DoWorkEventArgs e)
        {
          Rectangle r = (Rectangle)e.Argument;
          Bitmap bmp = new Bitmap(r.Width,r.Height);
          Graphics g = Graphics.FromImage(bmp);
          g.CopyFromScreen(r.Left, r.Top, 0, 0, r.Size);
          e.Result = bmp;
        }
      }
    }

    1 Bitmap:  14
    0 1 2 5 7 6 4 3  8 Bitmaps:  40
    1 Bitmap:  17
    6 1 0 5 7 3 2 4  8 Bitmaps:  32
    1 Bitmap:  17
    2 1 4 0 3 7 5 6  8 Bitmaps:  31
    1 Bitmap:  17
    0 4 2 1 7 5 6 3  8 Bitmaps:  5
    1 Bitmap:  20
    0 1 4 5 2 7 6 3  8 Bitmaps:  5
    1 Bitmap:  18
    6 1 2 0 3 4 5 7  8 Bitmaps:  5

    The drop after three runs is consistent.  ???
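
    One way to check whether the first few runs are warm-up effects is to time each run separately and discard some initial ones. Below is a minimal sketch of that idea; `TimingSketch` and `Measure` are hypothetical names, and the workload in `Main` is only a placeholder. JIT compilation and thread pool ramp-up are possible causes of slow early runs, but that is a guess, not something the numbers above prove.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: runs 'work' untimed 'warmup' times, then times
    // 'runs' further executions and returns the elapsed milliseconds of each.
    public static long[] Measure(Action work, int warmup, int runs)
    {
        for (int i = 0; i < warmup; i++) work();

        long[] ms = new long[runs];
        Stopwatch sw = new Stopwatch();
        for (int i = 0; i < runs; i++)
        {
            sw.Reset(); sw.Start();
            work();
            sw.Stop();
            ms[i] = sw.ElapsedMilliseconds;
        }
        return ms;
    }

    public static void Main()
    {
        // Placeholder workload; substitute the code being measured.
        Action work = () =>
        {
            long sum = 0;
            for (int i = 0; i < 10000000; i++) sum += i;
            if (sum < 0) Console.WriteLine(sum); // keeps the loop from being discarded
        };

        foreach (long t in Measure(work, 3, 5))
            Console.WriteLine("run: " + t + " ms");
    }
}
```

    If the timed runs are stable once the warm-up runs are excluded, the early numbers are probably start-up cost rather than steady-state behaviour.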

    • Marked as answer by SamAgain Tuesday, May 18, 2010 6:53 AM
    Monday, May 17, 2010 8:08 PM

All replies

  • You are using a 2-processor machine, right? Just covering the basics... :-)

    Many things could be happening. An "evil" one is "false sharing" (see Joe Duffy's blog entry for an explanation and some great examples). It could be a problem for your app, since both threads are busy with the same output bitmap and might be working on nearby memory locations.
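
    To make the false sharing idea concrete, here is a rough sketch (not taken from Joe Duffy's post). Two threads each increment their own element of a shared array, first with the elements adjacent and then 128 bytes apart. `FalseSharingSketch` and `Run` are made-up names, 64-byte cache lines are an assumption, and the JIT may optimise the loops in ways that hide the effect, so treat the timings as indicative only.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class FalseSharingSketch
{
    // Two threads each increment their own slot of one shared array.
    // 'stride' is the distance between the two slots, measured in longs.
    public static long[] Run(int stride, int iterations)
    {
        long[] slots = new long[stride + 1];
        Thread a = new Thread(() => { for (int i = 0; i < iterations; i++) slots[0]++; });
        Thread b = new Thread(() => { for (int i = 0; i < iterations; i++) slots[stride]++; });
        a.Start(); b.Start();
        a.Join(); b.Join();
        return new[] { slots[0], slots[stride] };
    }

    public static void Main()
    {
        // stride 1: slots are 8 bytes apart and likely share a cache line.
        // stride 16: slots are 128 bytes apart (assumes 64-byte cache lines).
        foreach (int stride in new[] { 1, 16 })
        {
            Stopwatch sw = Stopwatch.StartNew();
            Run(stride, 100000000);
            sw.Stop();
            Console.WriteLine("stride " + stride + ": " + sw.ElapsedMilliseconds + " ms");
        }
    }
}
```

    Neither thread ever touches the other's slot, so any difference between the two timings comes from the memory layout rather than from the logic.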

    Cristian.

     

    Monday, May 10, 2010 11:13 PM
  • Hi

    Thanks for the info. It was an interesting point to consider. The gist of the above code is that one thread draws the entire image onto the new bitmap, and the other thread copies the bottom of the image as the reflection at the bottom of the new bitmap. So, thinking about false sharing, I can imagine that they could be slowing each other down, as both threads are trying to read the bottom of the bitmap at the same time. As a test, I made the 1st thread copy only the top of the bitmap (down to the point where the reflection starts copying from), so that the two threads would never be reading the same bytes in memory. It still takes ~2500ms for 100 loops, compared with ~700ms without threading.

    I am on a Core 2 Duo E6550 @ 2.33GHz with 2GB RAM, although I would still expect a slight performance increase even on a single core.

    I wonder how I could tweak the code to use multiple threads.
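
    One possible direction, sketched under assumptions: split the copy by rows so that each worker writes a disjoint band of the output. The sketch below uses plain byte arrays in place of the locked bitmap memory so that it runs on its own; `RowBandSketch` and `CopyRows` are hypothetical names, and `Parallel.For` requires .NET 4. It copies a 24bpp source into a 32bpp destination with alpha 255, like `_DrawImage` above.

```csharp
using System;
using System.Threading.Tasks;

public static class RowBandSketch
{
    // Copies a 24bpp buffer into a 32bpp buffer, setting alpha to 255.
    // Each loop body handles one row, so no two workers write the same row.
    public static void CopyRows(byte[] src, int srcStride, byte[] dst, int dstStride, int width, int height)
    {
        Parallel.For(0, height, y =>
        {
            int s = y * srcStride; // start of source row y
            int d = y * dstStride; // start of destination row y
            for (int x = 0; x < width; x++)
            {
                dst[d] = src[s];
                dst[d + 1] = src[s + 1];
                dst[d + 2] = src[s + 2];
                dst[d + 3] = 255;
                s += 3; d += 4;
            }
        });
    }

    public static void Main()
    {
        // Placeholder image: 640 x 480, source rows padded to a multiple of 4 bytes.
        int width = 640, height = 480;
        int srcStride = (width * 3 + 3) / 4 * 4;
        int dstStride = width * 4;
        byte[] src = new byte[srcStride * height];
        byte[] dst = new byte[dstStride * height];
        CopyRows(src, srcStride, dst, dstStride, width, height);
        Console.WriteLine("copied " + height + " rows");
    }
}
```

    Whether this beats the single-threaded loop would need measuring, since a plain copy like this may be limited by memory bandwidth rather than by CPU time.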

     

    Tuesday, May 11, 2010 10:58 AM
  • Thanks for the ideas, Joe. I tested your idea of using only one extra thread and, interestingly, that's pretty much the same speed as having no extra threads. I'll test with different sized images. I think it would be nice to get GDI+ to work with multiple threads, although ideally I guess I'd try to figure out how to use Direct2D with C#.
    Monday, May 17, 2010 5:12 PM