Wednesday, October 8, 2008

Globals in libs: If and how

As a follow-up to this post, I want to concentrate on the cases where global variables are tolerable, and on how they should be implemented.

Tolerable as global variables are data whose initialization must be done at runtime and takes a significant amount of time. One example is the libquicktime codec registry. Its creation involves scanning the plugin directory, comparing the contents with a registry, and loading all modules (with a time-consuming dlopen()) whose registry entries are missing or outdated. This is certainly not something that should be done per instance (i.e. for each opened file). Other libraries have similar cases.

The next question is how they can be implemented. A simple goal is that the library must be linkable into a plugin (i.e. a dynamically loaded module), not only into an executable. This means that repeated loading and unloading (from different threads) must work without any problems. A well-designed plugin architecture knows as little as possible about its plugins, so keeping global reference counters for each library a plugin might link in is not possible.

Global initialization and cleanup functions


Many libraries have functions like libfoo_init() and libfoo_cleanup(), which must be called before the first and after the last use of any other function from libfoo, respectively. This causes problems for a plugin, which has no idea whether the library has already been loaded/initialized by another plugin (or by another instance of itself). Also, before a plugin is unloaded, there is no way to find out whether libfoo_cleanup() can safely be called or whether this will crash another plugin. Omitting the libfoo_cleanup() call causes a memory leak if libfoo_init() allocated memory. From this we find that global housekeeping functions are OK if either:

  • Initialization doesn't allocate any resources (i.e. the cleanup function is either a no-op or missing) and
  • Initialization is (thread safely) protected against multiple calls

or:

  • Initialization and cleanup functions maintain an internal (thread safe) reference counter, so that only the first init and last cleanup call will actually do something
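The reference-counted variant can be sketched roughly as follows. This is a minimal sketch under assumptions: libfoo, libfoo_init(), libfoo_cleanup(), libfoo_is_initialized() and the calloc'ed table are hypothetical names standing in for a real library's global state, and one pthread mutex provides the thread safety.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical library "libfoo": all names here are illustrative only. */
static pthread_mutex_t foo_mutex = PTHREAD_MUTEX_INITIALIZER;
static int foo_refcount = 0;          /* outstanding libfoo_init() calls */
static int *foo_global_table = NULL;  /* stands in for expensive global data */

/* Only the first successful call allocates the global resources. */
int libfoo_init(void)
{
    int ret = 0;

    pthread_mutex_lock(&foo_mutex);
    if (foo_refcount == 0) {
        foo_global_table = calloc(256, sizeof *foo_global_table);
        if (foo_global_table == NULL)
            ret = -1;                 /* leave the refcount at 0 on failure */
    }
    if (ret == 0)
        foo_refcount++;
    pthread_mutex_unlock(&foo_mutex);
    return ret;
}

/* Only the call that drops the refcount to zero frees them. */
void libfoo_cleanup(void)
{
    pthread_mutex_lock(&foo_mutex);
    assert(foo_refcount > 0);         /* catches unbalanced cleanup in debug builds */
    if (foo_refcount > 0 && --foo_refcount == 0) {
        free(foo_global_table);
        foo_global_table = NULL;
    }
    pthread_mutex_unlock(&foo_mutex);
}

/* Inspection helper: nonzero while the global data exists. */
int libfoo_is_initialized(void)
{
    int r;

    pthread_mutex_lock(&foo_mutex);
    r = (foo_global_table != NULL);
    pthread_mutex_unlock(&foo_mutex);
    return r;
}
```

Each plugin would call libfoo_init() when it is loaded and libfoo_cleanup() before it is unloaded; only the outermost pair actually allocates and frees. For the first variant (initialization without resources), POSIX pthread_once() already provides the "protected against multiple calls" guarantee.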


Initialization on demand, cleanup automatically


This is how the libquicktime codec registry is handled. It meets the above goals but doesn't need any global init/cleanup functions. Initialization on demand means that the codec registry is initialized right before it is accessed for the first time. Each function that accesses the registry starts with a call to lqt_registry_init(). The subsequent registry access is enclosed by lqt_registry_lock() and lqt_registry_unlock(). These three functions do all the magic, and they look like this:


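#include <pthread.h>

/* Set once the registry has been built; only accessed with codecs_mutex held */
static int registry_init_done = 0;

/* Statically initialized, so it is usable before main() runs */
pthread_mutex_t codecs_mutex = PTHREAD_MUTEX_INITIALIZER;

void lqt_registry_lock(void)
{
    pthread_mutex_lock(&codecs_mutex);
}

void lqt_registry_unlock(void)
{
    pthread_mutex_unlock(&codecs_mutex);
}

void lqt_registry_init(void)
{
    /* Variable declarations omitted */
    /* ... */

    lqt_registry_lock();

    /* An earlier call (possibly from another thread) already built the registry */
    if(registry_init_done)
    {
        lqt_registry_unlock();
        return;
    }

    registry_init_done = 1;

    /* Lots of stuff */
    /* ... */

    lqt_registry_unlock();
}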


We see that protection against multiple calls is guaranteed. The protecting mutex itself is statically initialized (PTHREAD_MUTEX_INITIALIZER), so it is valid from the very beginning, before the main function is called.
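To see the on-demand pattern in isolation, here is a self-contained sketch that mimics the three functions above. All demo_* names are hypothetical; the "registry" is just an integer set to a made-up value, and a build counter lets us check that the expensive part ran exactly once even when several threads hit the accessor at the same time.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the codec registry; names are illustrative. */
static pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;
static int demo_init_done = 0;
static int demo_build_count = 0;   /* how often the expensive build ran */
static int demo_num_codecs = 0;    /* the "registry" contents */

static void demo_registry_lock(void)   { pthread_mutex_lock(&demo_mutex); }
static void demo_registry_unlock(void) { pthread_mutex_unlock(&demo_mutex); }

/* Builds the registry on the first call; later calls return immediately. */
static void demo_registry_init(void)
{
    demo_registry_lock();
    if (!demo_init_done) {
        demo_init_done = 1;
        demo_build_count++;
        demo_num_codecs = 42;      /* pretend we scanned a plugin directory */
    }
    demo_registry_unlock();
}

/* Every public accessor: init on demand, then access under the lock. */
int demo_get_num_codecs(void)
{
    int n;

    demo_registry_init();
    demo_registry_lock();
    n = demo_num_codecs;
    demo_registry_unlock();
    return n;
}

int demo_get_build_count(void)
{
    int n;

    demo_registry_lock();
    n = demo_build_count;
    demo_registry_unlock();
    return n;
}

/* Thread body that hammers the accessor concurrently. */
static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        int n = demo_get_num_codecs();
        assert(n == 42);
        (void)n;                   /* silence unused warning under NDEBUG */
    }
    return NULL;
}

/* Starts up to 16 threads at once; returns how many were started. */
int demo_run_threads(int nthreads)
{
    pthread_t tids[16];
    int started = 0;

    if (nthreads > 16)
        nthreads = 16;
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, demo_worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started;
}
```

No caller ever has to invoke an explicit init function: whichever thread touches the registry first pays for building it, and the mutex makes the others wait until it is complete.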

While this initialization should work on all POSIX systems, automatic freeing is trickier and only possible with gcc (I don't know whether other compilers have similar features). The best time for freeing global resources is right before the library is unloaded. Most binary formats let you mark functions that should be called before the library is unmapped (ELF files have dedicated finalization sections such as .fini for this purpose). In the source code, this looks like:

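#if defined(__GNUC__)

/* Mark the function as a destructor: it runs when the library is
   unloaded (or at program exit if the library was linked directly) */
static void __lqt_cleanup_codecinfo(void) __attribute__ ((destructor));

static void __lqt_cleanup_codecinfo(void)
{
    /* Free the global codec registry */
    lqt_registry_destroy();
}

#endif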

Fortunately, dlopen() and dlclose() maintain a reference count for each module, so the cleanup function is guaranteed to be called by the dlclose() call that unloads the last instance of the last plugin linked against libquicktime.
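Putting both halves together gives a lazily built global that is released by a destructor. This is a sketch with a hypothetical cache_get() and a made-up cached string standing in for real expensive data; it relies on the GCC destructor attribute (clang accepts the same attribute and also defines __GNUC__).

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical library-global cache; names are illustrative only. */
static pthread_mutex_t cache_mutex = PTHREAD_MUTEX_INITIALIZER;
static char *cache_data = NULL;   /* allocated lazily, freed at unload */

/* Returns the cached string, building it on first use (NULL if malloc fails). */
const char *cache_get(void)
{
    const char *p;

    pthread_mutex_lock(&cache_mutex);
    if (cache_data == NULL) {
        cache_data = malloc(64);
        if (cache_data != NULL)
            strcpy(cache_data, "expensive result");
    }
    p = cache_data;
    pthread_mutex_unlock(&cache_mutex);
    return p;
}

#if defined(__GNUC__)
/* Runs when the last dlclose() unloads this shared object,
   or at normal process exit if the library is linked directly. */
static void cache_cleanup(void) __attribute__((destructor));

static void cache_cleanup(void)
{
    /* No other thread should be using the library at this point. */
    free(cache_data);
    cache_data = NULL;
}
#endif
```

A program that calls cache_get() and is run under valgrind should then see the 64-byte block freed by cache_cleanup() instead of reported as still reachable.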

I regularly check my programs for memory leaks with valgrind. Usually (i.e. after I have fixed my own code), all remaining leaks come from libraries that miss some of the goals described above.
