
Improving your application

There are a lot of tricks for improving the rendering performance of applications that handle large amounts of data, but the essence of them all is easy to understand: the fewer resources (geometries, display lists, texture objects, and so on) that are allocated, the faster and smoother the application runs.

You might benefit from the previous article on Implementing Multithreaded Operations and Rendering in OpenSceneGraph.

There are many ways to find the bottleneck of an inefficient application. For example, you can replace certain objects with simple boxes, or replace the textures in your application with 1×1 images, to see whether performance increases thanks to the reduced number of geometries and texture objects. The statistics handler (osgViewer::StatsHandler, toggled with the S key in osgviewer) can also provide helpful information.
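
In your own viewer application the statistics handler has to be installed explicitly before the S key will do anything; a minimal sketch, assuming an ordinary osgViewer::Viewer set up elsewhere in your program, looks like this:

#include <osgViewer/Viewer>
#include <osgViewer/ViewerEventHandlers>

osgViewer::Viewer viewer;
// Attach the on-screen statistics; press the S key at run time to cycle
// through the frame rate and traversal timing pages.
viewer.addEventHandler( new osgViewer::StatsHandler );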

To keep scene resources to a minimum, we can refer to the following table and try to optimize our applications if they are not running well:

Problem: Too many geometries
Influence: Low frame rate and huge resource cost
Possible solutions:

- Use LOD and culling techniques to reduce the number of vertices in the drawables.
- Use primitive sets and the index mechanism rather than duplicating vertices.
- Merge geometries into one where possible, because each geometry object allocates its own display list, and too many display lists occupy too much video memory.
- Share geometries, vertices, and nodes as often as possible.

Problem: Too many dynamic objects (configured with the setDataVariance() method)
Influence: Low frame rate, because the DRAW phase must wait until all dynamic objects finish updating
Possible solutions:

- Don't use the DYNAMIC flag on nodes and drawables that do not need to be modified on the fly (a short sketch follows this table).
- Don't set the root node to be dynamic unless you are sure that you require this, because data variance can be inherited in the scene graph.

Problem: Too many texture objects
Influence: Low frame rate and huge resource cost
Possible solutions:

- Share rendering states and textures as much as you can. Lower the texture resolution and compress textures using the DXTC format if possible.
- Use osg::TextureRectangle to handle non-power-of-two sized textures, and osg::Texture2D for regular 2D textures.
- Use LOD to simplify and manage nodes with large textures.

Problem: The scene graph structure is "loose", that is, nodes are not grouped together effectively
Influence: Very high cull and draw time, and many redundant state changes
Possible solutions:

- Avoid scene graphs in which every parent node has only one child, so that there are as many group nodes as leaf nodes, and even as many drawables as leaf nodes; such a structure ruins performance.
- Rethink your scene graph and group nodes with similar features and behaviors more effectively.

Problem: Loading and unloading resources too frequently
Influence: Progressively lower running speed and wasteful memory fragmentation
Possible solutions:

- Use a buffer pool to allocate and release resources. OSG already does this for textures and buffer objects by default.
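
As a short sketch of the data variance rule, the following hypothetical helper marks only the drawable that is really rewritten every frame as DYNAMIC and keeps everything else STATIC, so the draw traversal does not have to wait for objects that never change:

#include <osg/Geometry>
#include <osg/Object>

// Hypothetical helper: 'animatedQuad' is modified by an update callback every
// frame, while 'staticQuad' never changes after creation.
void configureDataVariance( osg::Geometry* animatedQuad, osg::Geometry* staticQuad )
{
    animatedQuad->setDataVariance( osg::Object::DYNAMIC );
    staticQuad->setDataVariance( osg::Object::STATIC );
}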

An additional helper is the osgUtil::Optimizer class. This can traverse the scene graph before starting the simulation loop and do different kinds of optimizations in order to improve efficiency, including removing redundant nodes, sharing duplicated states, checking and merging geometries, optimizing texture settings, and so on. You may start the optimizing operation with the following code segment:

osgUtil::Optimizer optimizer;
optimizer.optimize( node );

Some parts of the optimizer are optional. You can see the header file include/osgUtil/Optimizer for details.
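
For example, if you only want a subset of the optimizations, you can pass a combination of the option flags declared in that header; the pair chosen below is only an illustration:

#include <osgUtil/Optimizer>

osgUtil::Optimizer optimizer;
// Only share duplicated state sets and merge compatible geometries here;
// the other optimizations listed in include/osgUtil/Optimizer are skipped.
optimizer.optimize( node,
    osgUtil::Optimizer::SHARE_DUPLICATE_STATE |
    osgUtil::Optimizer::MERGE_GEOMETRY );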

Time for action – sharing textures with a customized callback

We would like to demonstrate the importance of scene optimization with an extreme situation in which massive numbers of textures are allocated without sharing identical ones. Our basic solution is to collect and reuse loaded images in a file-reading callback, and then share all textures that use the same image object and have the same parameters. The idea of sharing textures can be used to construct massive scene graphs, such as digital cities; otherwise, the video card memory will soon be exhausted, causing the whole application to slow down and crash.

  1. Include the necessary headers:

    #include <osg/Texture2D>
    #include <osg/Geometry>
    #include <osg/Geode>
    #include <osg/Group>
    #include <osgDB/ReadFile>
    #include <osgViewer/Viewer>
    #include <cstdlib>  // rand() and RAND_MAX, used by the RAND macro below

    
    
  2. The function for quickly producing massive data can be used once more in this example. This time we will apply a texture attribute to each quad. That means we are going to have a huge number of geometries and the same number of texture objects, which will be a heavy burden if the scene is to be rendered smoothly:

    #define RAND(min, max) \
        ((min) + (float)rand()/(RAND_MAX+1.0f) * ((max)-(min)))

    osg::Geode* createMassiveQuads( unsigned int number,
                                    const std::string& imageFile )
    {
        osg::ref_ptr<osg::Geode> geode = new osg::Geode;
        for ( unsigned int i=0; i<number; ++i )
        {
            osg::Vec3 randomCenter;
            randomCenter.x() = RAND(-100.0f, 100.0f);
            randomCenter.y() = RAND(1.0f, 100.0f);
            randomCenter.z() = RAND(-100.0f, 100.0f);

            osg::ref_ptr<osg::Drawable> quad =
                osg::createTexturedQuadGeometry(
                    randomCenter,
                    osg::Vec3(1.0f, 0.0f, 0.0f),
                    osg::Vec3(0.0f, 0.0f, 1.0f) );

            // Deliberately create a brand-new texture object for every quad;
            // this is exactly the redundancy we are going to remove later.
            osg::ref_ptr<osg::Texture2D> texture = new osg::Texture2D;
            texture->setImage( osgDB::readImageFile(imageFile) );
            quad->getOrCreateStateSet()->setTextureAttributeAndModes(
                0, texture.get() );

            geode->addDrawable( quad.get() );
        }
        return geode.release();
    }

    
    
  3. The createMassiveQuads() function is, of course, awkward and inefficient here. However, it demonstrates a common situation: if an application needs to load image files and create texture objects on the fly, it is worthwhile to check whether an image has already been loaded and then share the corresponding textures automatically. Memory usage will be reduced noticeably if plenty of textures are reusable. To achieve this, we should first record all loaded image filenames, and map them to the corresponding osg::Image objects.
  4. Whenever a new readImageFile() request arrives, the osgDB::Registry instance will try to use a preset osgDB::ReadFileCallback to perform the actual loading work. If the callback doesn't exist, it will call readImageImplementation() to choose an appropriate plug-in that will load the image and return the resulting object. Therefore, we can take over the image-reading process by inheriting from the osgDB::ReadFileCallback class and comparing filenames to reuse existing image objects, with the customized getImageByName() function:

    class ReadAndShareImageCallback : public osgDB::ReadFileCallback
    {
    public:
        virtual osgDB::ReaderWriter::ReadResult readImage(
            const std::string& filename, const osgDB::Options* options );

    protected:
        osg::Image* getImageByName( const std::string& filename )
        {
            ImageMap::iterator itr = _imageMap.find(filename);
            if ( itr!=_imageMap.end() ) return itr->second.get();
            return NULL;
        }

        typedef std::map<std::string, osg::ref_ptr<osg::Image> > ImageMap;
        ImageMap _imageMap;
    };

    
    
  5. The readImage() method should be overridden to replace the current reading implementation. It will return the previously-imported instance if the filename matches an element in the _imageMap, and will add any newly-loaded image object and its name to _imageMap, in order to ensure that the same file won’t be imported again:

    osgDB::ReaderWriter::ReadResult ReadAndShareImageCallback::readImage(
        const std::string& filename, const osgDB::Options* options )
    {
        osg::Image* image = getImageByName( filename );
        if ( !image )
        {
            osgDB::ReaderWriter::ReadResult rr;
            rr = osgDB::Registry::instance()->readImageImplementation(
                filename, options );
            if ( rr.success() ) _imageMap[filename] = rr.getImage();
            return rr;
        }
        return image;
    }

    
    
  6. Now we get into the main entry. The file-reading callback is set by the setReadFileCallback() method of the osgDB::Registry class, which is designed as a singleton. Meanwhile, we have to enable another important run-time optimizer, named osgDB::SharedStateManager, that can be defined by setSharedStateManager() or getOrCreateSharedStateManager(). The latter will assign a default instance to the registry:

    osgDB::Registry::instance()->setReadFileCallback(
        new ReadAndShareImageCallback );
    osgDB::Registry::instance()->getOrCreateSharedStateManager();

    
    
  7. Create the massive scene graph. It consists of two groups of quads, each of which uses a single image file to texture its quad geometries. In total, 1,000 quads will be created, along with 1,000 newly-allocated textures. Clearly there are far too many redundant texture objects in this case, because they are generated from only two image files:

    osg::ref_ptr<osg::Group> root = new osg::Group;
    root->addChild( createMassiveQuads(500, "Images/lz.rgb") );
    root->addChild( createMassiveQuads(500, "Images/osg64.png") );

    
    
  8. The osgDB::SharedStateManager is used for maximizing the reuse of textures and state sets. It is actually a node visitor, traversing all child nodes’ state sets and comparing them when the share() method is invoked. State sets and textures with the same attributes and data will be combined into one:

    osgDB::SharedStateManager* ssm =
        osgDB::Registry::instance()->getSharedStateManager();
    if ( ssm ) ssm->share( root.get() );

    
    
  9. Finalize the viewer:

    osgViewer::Viewer viewer;
    viewer.setSceneData( root.get() );
    return viewer.run();

    
    
  10. Now the application starts with a large number of textured quads. With the ReadAndShareImageCallback sharing image objects, and the osgDB::SharedStateManager sharing textures, the rendering process can work without a hitch. Try commenting out the setReadFileCallback() and getOrCreateSharedStateManager() lines, restart the application, and see what happens. The Windows Task Manager is helpful for comparing the amount of memory used in each case:

    (Screenshot: output of the OpenSceneGraph example application)

What just happened?

You may be curious about the implementation of osgDB::SharedStateManager. It collects the rendering states and textures that first appear in the scene graph, and then replaces duplicated states of successive nodes with the recorded ones. It compares two states' member attributes to decide whether the new state should be recorded (because it is not the same as any recorded one) or replaced (because it duplicates a previous one).

For texture objects, the osgDB::SharedStateManager will determine if they are exactly the same by checking the data() pointer of the osg::Image object, rather than by comparing every pixel of the image. Thus, the customized ReadAndShareImageCallback class is used here to share image objects with the same filename first, and the osgDB::SharedStateManager shares textures with the same image object and other attributes.

The osgDB::DatabasePager also makes use of osgDB::SharedStateManager to share states of external scene graphs when dynamically loading and unloading paged nodes. This is done automatically if getOrCreateSharedStateManager() is executed.

Have a go hero – sharing public models

Can we also share models with the same name in an application? The answer is absolutely yes. The osgDB::ReadFileCallback could be used again by overriding the virtual method readNode(). Other preparations include a member std::map for recording filename and node pointer pairs, and a user-defined getNodeByName() method as we have just done in the last example.
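
A minimal sketch of that idea, under the assumption that we name the new class ReadAndShareNodeCallback (mirroring the image callback of this example), might look like this:

#include <osg/Node>
#include <osgDB/ReadFile>
#include <osgDB/Registry>
#include <map>
#include <string>

class ReadAndShareNodeCallback : public osgDB::ReadFileCallback
{
public:
    virtual osgDB::ReaderWriter::ReadResult readNode(
        const std::string& filename, const osgDB::Options* options )
    {
        // Reuse the node if the same file has been loaded before
        osg::Node* node = getNodeByName( filename );
        if ( node ) return node;

        // Otherwise fall back to the normal plug-in mechanism and record the result
        osgDB::ReaderWriter::ReadResult rr =
            osgDB::Registry::instance()->readNodeImplementation( filename, options );
        if ( rr.success() ) _nodeMap[filename] = rr.getNode();
        return rr;
    }

protected:
    osg::Node* getNodeByName( const std::string& filename )
    {
        NodeMap::iterator itr = _nodeMap.find( filename );
        if ( itr!=_nodeMap.end() ) return itr->second.get();
        return NULL;
    }

    typedef std::map<std::string, osg::ref_ptr<osg::Node> > NodeMap;
    NodeMap _nodeMap;
};

Note that the registry keeps only one read-file callback, so in a real application the readImage() and readNode() overrides would usually live in the same callback class, installed once with setReadFileCallback().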

Paging huge scene data

Are you still struggling with the optimization of huge scene data? Don't focus only on the rendering API itself; there is no "super" rendering engine in the world that can handle unlimited datasets. Consider using the scene paging mechanism instead, which can load and unload objects according to the current viewport and frustum. It is also important to design a better structure for indexing regions of spatial data, such as the quad-tree, octree, R-tree, or binary space partitioning (BSP) tree.
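
In OpenSceneGraph the paging mechanism is built around osg::PagedLOD and the osgDB::DatabasePager mentioned earlier. The following sketch (the helper createCoarseTile() and the tile file name are only assumptions) keeps a coarse version of a terrain tile in memory and lets the pager stream in a detailed version when the viewer comes close:

#include <osg/PagedLOD>
#include <cfloat>

osg::ref_ptr<osg::PagedLOD> pagedNode = new osg::PagedLOD;
// Child 0: a coarse, always-resident representation, shown at far distances.
// createCoarseTile() is a hypothetical helper returning an osg::Node*.
pagedNode->addChild( createCoarseTile(), 200.0f, FLT_MAX );
// Child 1: loaded from disk by the database pager only when the viewer is
// within 200 units, and expired again when it moves away.
pagedNode->setFileName( 1, "tile_detail.osg" );  // assumed file name
pagedNode->setRange( 1, 0.0f, 200.0f );
// A bounding sphere should be given explicitly, because the detailed child
// may not be in memory when culling starts.
pagedNode->setCenter( osg::Vec3(0.0f, 0.0f, 0.0f) );
pagedNode->setRadius( 100.0f );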

Making use of the quad-tree

A classic quad-tree structure decomposes the whole 2D region into four square children (we call them cells here), and recursively subdivides each cell into four regions, until a cell reaches its target capacity and stops splitting (a so-called leaf). Each cell in the tree either has exactly four children, or has no children. It is mostly useful for representing terrains or scenes on 2D planes.

The quad-tree structure is useful for view-frustum culling terrain data. Because the terrain is divided into small pieces, we can easily render the pieces that fall inside the frustum and discard those that are invisible. This allows large numbers of terrain chunks to be unloaded from memory at a time, and loaded back when necessary, which is the basic principle of dynamic data paging. The process can be progressive: when the terrain model is far enough from the viewer, we may handle only its root and first levels; but as it draws near, we can traverse down to the corresponding levels of the quad-tree, and cull and unload as many cells as possible to keep the scene load balanced.
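
To make the subdivision idea concrete, here is a small, OSG-independent sketch of such a cell type; the stopping criterion (a minimum cell size rather than an object count) is only an illustrative assumption:

#include <memory>

// A minimal, illustrative quad-tree cell: each internal cell covers a square
// 2D region and owns exactly four children, one per quadrant; leaves own none.
struct QuadTreeCell
{
    float minX, minY, maxX, maxY;
    std::unique_ptr<QuadTreeCell> children[4];

    QuadTreeCell( float x0, float y0, float x1, float y1 )
        : minX(x0), minY(y0), maxX(x1), maxY(y1) {}

    bool isLeaf() const { return !children[0]; }

    // Recursively split the region until a cell edge reaches the target size.
    // Real implementations usually stop when a cell holds few enough objects
    // (its target capacity) instead of using a fixed size.
    void subdivide( float targetSize )
    {
        if ( maxX - minX <= targetSize ) return;  // small enough: keep as a leaf

        float midX = 0.5f * (minX + maxX);
        float midY = 0.5f * (minY + maxY);
        children[0].reset( new QuadTreeCell(minX, minY, midX, midY) );  // lower-left
        children[1].reset( new QuadTreeCell(midX, minY, maxX, midY) );  // lower-right
        children[2].reset( new QuadTreeCell(minX, midY, midX, maxY) );  // upper-left
        children[3].reset( new QuadTreeCell(midX, midY, maxX, maxY) );  // upper-right
        for ( int i=0; i<4; ++i )
            children[i]->subdivide( targetSize );
    }
};

In a terrain pager, each leaf cell of such a tree typically maps to one tile of geometry, for example an osg::PagedLOD node as sketched above.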
