DirectX performance through batching

    Question

  • I'm beginning a 3D DirectX game. It will have many 3D models on screen at once. My problem is that I'm already seeing big frame rate drops on very small maps with fewer than 1000 polys. I think I need to batch or instance these meshes, and I'm curious as to what the difference is and what I need to do. Mostly, I'm looking for advice.

    [url=http://catalog.create.msdn.com/en-US/GameDetails.aspx?catalogEntryId=5cbacba6-05c2-4c33-9005-6cc80c8d5753&type=1]Bible Trivia Avatar Edition[/url], currently in review.

    Tuesday, November 26, 2013 12:41 AM

Answers

  • Search the web for Direct3D Instancing or Direct3D11 Instancing. The idea has been around since Direct3D 9.

    The main idea conceptually is this.  Where you instinctively would write this:

    for each object
       set texture object.texture
       draw object

    To minimize texture state changes, instead you group by texture and do this:

    for each texture
       set texture
       for each object where object.texture == texture
          draw object

    Except that this alone isn't enough to get the real gains, because you are still issuing one draw call per object, and it's the per-call driver overhead that gets expensive.
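
    As a rough illustration of the grouped loop above, here is a minimal C++ sketch that reorders objects by texture and counts how many texture binds a draw loop would issue. The `Object` struct, the integer texture ids, and both function names are hypothetical stand-ins, not part of any Direct3D API; a real renderer would bind a texture and issue a draw where the comments indicate.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scene object: only the texture matters for this sketch.
struct Object {
    int textureId;  // stand-in for a real texture handle (assumed >= 0)
    int meshId;     // stand-in for whatever identifies the geometry
};

// Counts the "set texture" calls a draw loop would issue if it only
// rebinds when the texture differs from the one currently bound.
std::size_t countTextureBinds(const std::vector<Object>& objects) {
    std::size_t binds = 0;
    int bound = -1;  // sentinel: nothing bound yet
    for (const Object& o : objects) {
        assert(o.textureId >= 0);
        if (o.textureId != bound) {
            bound = o.textureId;  // a real renderer would bind the texture here
            ++binds;
        }
        // a real renderer would issue the draw call for `o` here
    }
    return binds;
}

// Reorders objects so that those sharing a texture are adjacent.
// stable_sort keeps the original relative order inside each group.
void groupByTexture(std::vector<Object>& objects) {
    std::stable_sort(objects.begin(), objects.end(),
                     [](const Object& a, const Object& b) {
                         return a.textureId < b.textureId;
                     });
}
```

    With five objects that alternate between two textures, the unsorted list needs five binds and the grouped list needs two. The number of draw calls is still five in both cases.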

    To get real savings you want to then take this one step further and share vertex data for all your objects.  If the objects are subtly different (like just different transformation matrices or positions) then use an intelligent vertex layout and make use of instancing.  Parameterize your vertex data based on something small that can be different per-instance (like just the matrix for the object) and let your vertex shader do the work.

    With instancing, you can draw multiple objects with one driver call instead of one object at a time. But since the instanced draw call wants a contiguous range of per-instance parameters in the vertex buffer that holds the instance data, you have to do a little work to plan out those contiguous ranges: essentially, "group" or "sort" your objects so that they can be drawn together. Each group of objects that share a texture then corresponds to a range of entries in your instance buffer. You set the texture, then draw all the instances that use it by providing the right offset into your "grouped" instance data buffer.

    Now your pseudocode looks subtly different like this:

    for each texture
       set texture
       draw instanced [the subset of objects having object.texture == texture]
          

    That leaves:

    1. One set texture call per texture (batching) and,
    2. One draw call per texture to draw all objects (with instancing) that share the same texture.
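
    The planning step behind that loop (sorting objects and working out a contiguous instance range per texture) might look like the following C++ sketch. `InstanceData`, `Object`, `Batch`, and `buildBatches` are hypothetical names invented for illustration; only the general shape is taken from the description above. In Direct3D 11, each resulting `Batch` would map to one `ID3D11DeviceContext::DrawIndexedInstanced` call, with `instanceCount` and `startInstance` passed as its `InstanceCount` and `StartInstanceLocation` arguments.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-instance payload: the small piece of data that differs
// between objects (here a 4x4 world matrix stored as 16 floats).
struct InstanceData {
    float world[16];
};

// Hypothetical scene object sharing one mesh with all other objects.
struct Object {
    int textureId;
    InstanceData instance;
};

// One batch = one texture bind plus one instanced draw.
struct Batch {
    int textureId;
    std::size_t startInstance;  // offset into the packed instance array
    std::size_t instanceCount;  // number of consecutive instances to draw
};

// Sorts a copy of the objects by texture, packs their per-instance data
// contiguously into `packed`, and returns one Batch per distinct texture.
std::vector<Batch> buildBatches(std::vector<Object> objects,
                                std::vector<InstanceData>& packed) {
    std::stable_sort(objects.begin(), objects.end(),
                     [](const Object& a, const Object& b) {
                         return a.textureId < b.textureId;
                     });
    packed.clear();
    packed.reserve(objects.size());
    std::vector<Batch> batches;
    for (const Object& o : objects) {
        // Start a new batch whenever the texture changes.
        if (batches.empty() || batches.back().textureId != o.textureId) {
            batches.push_back({o.textureId, packed.size(), 0});
        }
        packed.push_back(o.instance);
        ++batches.back().instanceCount;
    }
    assert(packed.size() == objects.size());
    return batches;
}
```

    The contents of `packed` would be uploaded to the instance buffer once per frame (or whenever the set of objects changes), after which the render loop only walks the short list of batches.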

    NOTE: This just explains the technique. In practice we have Texture Arrays, so a batch can span multiple textures if you do more intelligent setup. But it's still generally good to minimize changes such as switching shaders and vertex buffers by employing the same "group similar API calls together" strategy. Any such grouping where you take advantage of shared state is referred to as a batch.



    • Edited by Wyck Wednesday, November 27, 2013 2:22 PM
    • Marked as answer by Soft Sell Studios Friday, November 29, 2013 1:49 AM
    Wednesday, November 27, 2013 2:20 PM

All replies

  • The idea with batching is to avoid state changes.

    For example, with Z-Buffering enabled, you can often reorder the polygons to draw in any order, which means that you have the flexibility to reorder things so that all the primitives that require a certain texture are drawn together, and this would allow you to reduce the number of times you have to change the texture.

    Depending on what the state changes are (textures, shaders, or whatever) you can group primitives in different ways.

    The reason that batching is a technique and not a hard and fast rule is that often, with Z-buffering, it is more efficient to draw front to back, so that you maximize the number of Z-fails you get, thus reducing time spent in shaders on pixels that would otherwise just be painted over by a closer triangle. But if the order is not relevant, then it's always good to reduce state changes by grouping things that share state and avoiding setting the same state twice.
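
    To make the trade-off between the two orderings concrete, here is a small C++ sketch of a sort that can favour either shared state or front-to-back depth. The `Drawable` struct, the `SortMode` enum, and `sortDrawables` are hypothetical names for illustration, and the tie-breaking rules are an assumption about one reasonable compromise rather than anything the technique requires.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical draw item: a texture id plus its distance from the camera.
struct Drawable {
    int textureId;
    float viewDepth;  // smaller means closer to the camera
};

enum class SortMode {
    StateFirst,  // group by texture, front to back inside each group
    DepthFirst   // strictly front to back, texture only breaks ties
};

void sortDrawables(std::vector<Drawable>& items, SortMode mode) {
    for (const Drawable& d : items) {
        assert(d.viewDepth == d.viewDepth);  // a NaN depth would break the ordering
    }
    std::sort(items.begin(), items.end(),
              [mode](const Drawable& a, const Drawable& b) {
                  if (mode == SortMode::StateFirst) {
                      if (a.textureId != b.textureId) return a.textureId < b.textureId;
                      return a.viewDepth < b.viewDepth;
                  }
                  if (a.viewDepth != b.viewDepth) return a.viewDepth < b.viewDepth;
                  return a.textureId < b.textureId;
              });
}
```

    Which mode wins depends on the scene: `StateFirst` minimizes texture changes, while `DepthFirst` maximizes early Z rejection at the cost of more state changes.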

    Instancing is a good strategy in general as it reduces much of the setup time, makes good reuse of resources (lots of cache hits) and takes advantage of a feature that has a lot of hardware support behind it.

    Always check to see if you're getting a win by measuring performance, and avoid time spent and complexity introduced by premature optimization.
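
    One simple way to measure, sketched in C++ under the assumption that the work to time can be wrapped in a callable: run it many times and average the wall-clock cost. `averageMilliseconds` is a hypothetical helper, not part of Direct3D. Because the GPU runs asynchronously, timing the CPU side like this mostly captures draw-call submission cost, which is the overhead batching and instancing aim to reduce.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Runs `work` `iterations` times and returns the mean wall-clock time
// per run in milliseconds.
double averageMilliseconds(const std::function<void()>& work, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}
```

    Comparing the figure for the naive loop against the batched or instanced one on the same scene shows whether the extra complexity is paying for itself.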
    • Edited by Wyck Tuesday, November 26, 2013 1:50 PM One tidbit of final advice.
    Tuesday, November 26, 2013 1:48 PM
  • Do you know of any resources, books, tutorials, etc. on how to implement this?

    Tuesday, November 26, 2013 7:34 PM
  • Okay, thank you.

    Friday, November 29, 2013 1:49 AM