Friends Can See Your Private Parts

An interesting programming issue cropped up this week in Gamebryo. The following code had some thread-safety issues.  This code loads some pixel data from a NIF file. Note that I’ve removed some error handling code for brevity hence the totally unsafe cast at the end. However, the main issue is intact.

NiPixelData* NiNIFImageReader::ReadFile(efd::File& kIst, NiPixelData*)
{
    m_kReadCriticalSection.Lock();

    m_kStream.RemoveAllObjects();
    m_kStream.Load(&kIst);

    NiObject* pkPixelData = m_kStream.GetObjectAt(0);

    m_kReadCriticalSection.Unlock();
    return (NiPixelData*)pkPixelData;
}

It’s possible for another thread to enter ReadFile and remove the reference to pkPixelData before the function returns the pointer to some object in the calling context that will add a reference. Furthermore, NiImageReader subclasses, of which NiNIFImageReader is one, are expected to return an NiPixelData object with a reference count of 0. In this case, however, the reference count is 1. m_kStream holds an internal reference that is cleared by calling RemoveAllObjects. Since the image reader is static, few if any people have noticed this anomaly.

On top of this, things are complicated by this being very old, entrenched code. There are a number of other NiImageReader subclasses both internal to the engine and developed by our customers. Changing the assumptions or the interface is expensive if not out of the question. Barg!

The answer is that we need to remove the reference to pkPixelData from m_kStream without deleting the object. However, the internals of the reference counting base class are private (for good reason) meaning that we have to break encapsulation some how to make this happen. If we try to manually increment and decrement the reference count, then we’ll end up deleting the object. For example:

pkPixelData->IncRefCount(); // It's 2 now.
m_kStream.RemoveAllObjects(); // Back to 1.
pkPixelData->DecRefCount(); // Drops to 0 and calls dtor

What we really need is a method that would zero out the reference count without deleting the object. However, most of the team involved in the discussion really felt that would expose too much of the internal implementation in a public method that would be misused at a later date when ZeroRefCount showed up in autocomplete. After some discussion, I proposed that we simply declare NiNIFImageReader as a friend of the reference counting class and move on. It turns the fix into 4 lines of code, 1 to declare the friend and 3 to manage the count.

In one class:

friend class NiNIFImageReader;

In the offending function:

pkPixelData->IncRefCount();
m_kStream.RemoveAllObjects();
pkPixelData->m_uiRefCount = 0;

The interesting part of this proposal was the vehemence of the reaction that it got. You’d have thought that I was proposing wholesale slaughter of puppies. It was asserted that the fix was “the dirtiest code ever written” and that one should never use the friend keyword. Really? Never? Dirtiest? Perhaps the alternative of refactoring all the image readers is cleaner and preserves encapsulation. However, it would take hours if not days not to mention the support load when we break customer code in the next release. Courtesy of our friend the friend keyword, this issue is fixed in 1 hour or less. I won’t lie and claim that it’s good design, but as a professional programmer with deadlines you have to consider the cost sometimes.

The whole incident also got me to thinking about some of the assumptions that programmers make. If you do some reading, there’s some great nuances to the use of friend. The C++ FAQ Lite has some interesting points, but frankly just search for “C++ friend keyword” to get plenty of folks to opine on whether or not it’s evil. I’m of the opinion that 90% of the time there’s a better design decision. However, there’s always a time when it makes sense. Unfortunately, programmer education simply beat the good use of friend out of folks. It’s similar to many programmers believing that you should always have a virtual destructor whenever a class has a virtual method. You shouldn’t necessarily. Proof left to the reader.

In any case, the whole incident has been valuable. If nothing else, it prompted me to go read some interesting things about C++. Plus, I got to use the witty title of this post.

Aside

I mentioned this post and its title to another programmer at a gathering the other night. He chuckled and noted that we, as a group, have a habit of creating weird metaphors for programming concepts. His examples all involved terminating child and zombie processes. It made me think of Tron with zombies. 🙂

A foolish consistency is the hobgoblin of little minds. dba

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: