How to Write NetWare Loadable Modules (NLMs) as Dynamic Libraries

Russell Bateman
Senior Software Engineer
Server Library Development
Novell, Inc.
rbateman@novell.com

01 May 2003

This AppNote discusses how to write NetWare Loadable Modules (NLMs) that serve as dynamic libraries.

Introduction
History of NLM Library Writing
How Libraries Are Consumed
The Elements of a Well-Written NLM Library
Tricks in Writing Dual-Mode NLMs
Versioning-A Solution
Walk-Through of an NLM Library
Conclusion

Topics	library development, NLM development, kernel and low-level code, DLLs
Products	NetWare 5.x and 6.x
Audience	developers
Level	advanced
Prerequisite Skills	familiarity with NLM programming
Operating System	NetWare 5 and above
Tools	NDK-NLM and NetWare Libraries for C
Sample Code	yes

Introduction

There are three traditional veins of NetWare Loadable Module (NLM) writing: (1) low-level kernel extensions such as drivers, protocol stacks, namespaces, and so on; (2) applications; and (3) libraries. Of the three, library writing has been the vaguest and most contradictory topic of discussion over the 14-year history of NLM writing. It is the topic addressed by this AppNote.

What differentiates a library from another sort of code on NetWare? First and foremost, a library is any NLM that exports entry points for use by others. Second, a library is an NLM that has no active threads. Many NLM libraries on NetWare are actually a hybrid of an application or service and a library, because they have active threads in addition to exporting entry points. This "unfortunate" circumstance can arise due to necessity, bad architecture, or naivety on the part of the programmer. I can do nothing about the first reason, but I hope to help you do something about the second and overcome the third.

History of NLM Library Writing

NLMs have been around for about 14 years, long enough to have established a solid base of traditional practices regarding how to write them. Any discussion of NLM technology should be prefaced with a nod to history and tradition, which provides a baseline against which any deviation from existing practice can be communicated clearly.

How Libraries Were Written

In the early days of writing application NLMs, there were two approaches that were loosely determined by whether or not one wrote to CLib (and linked CLib's prelude object). Applications that wrote directly to the NetWare operating system could not practically have a main and therefore had to handle all their own start-up, shut-down, and other problems solved by linking CLib's prelude object.

Writing to CLib was easier and more closely resembled the coding process on other platforms, so it quickly became the officially approved way of writing an application NLM. However, because developers and support personnel alike better understood writing to CLib, it also became the way to write NLM libraries, which, in retrospect, may not have been the best thing. As a result, today there are multitudes of CLib-based libraries that are obsolete in the modern world of multithreading, multiprocessing, and LibC.

Writing a library outside of the CLib world was a fairly simple task; after all, CLib itself was not a CLib NLM. One might initially reject this as an absurd statement, but note that by contrast, LibC is a LibC NLM, for a number of reasons. Since all a basic library has to do is start itself up and perform some initialization, resorting to CLib for help in doing that was a little like cracking a walnut with a sledgehammer when a simple nutcracker would have sufficed. True, if the NLM's command line was to be complicated, CLib simplified it into argc and argv. However, initialization of resources (including allocation of memory) could be easily done without CLib's help, for calling AllocSleepOK is scarcely more difficult than calling malloc.

It is not my intent here to criticize the historical use of CLib for writing libraries; rather, these remarks must lead to something useful. So, let's see what a non-CLib library would look like.

A Simple Non-CLib Library. The sample library NLM code shown below, which makes use of memset, assumes that CLib is loaded. I could have chosen to use some of the functions available to driver writers (for drivers that loaded without ensuring that CLib was loaded) such as CMovW and CSetB, but I'm choosing to use standard functions so I don't have to explain the others. (It was a mistake not to support string.h interfaces at the lowest levels of the operating system in the first place.) I'm also choosing to use modern LibC headers, like netware.h, though equivalents existed or could have been easily manufactured back in those days.

#include <string.h>
#include <netware.h>

struct
{
   int      this;
   long      that;
   void      *theotherthing;
} gLibGlobals;

void         *gNLMHandle;               // module handle
rtag_t         gAllocRTag;               // for calling AllocSleepOK later on

int _StartProcedure                        // called by NetWare Loader
(
   void            *handle,
   const char            *commandline,
   ...
)
{
   gNLMHandle = handle;
   gAllocRTag = AllocateResourceTag(handle, "Foolib's own memory",
                                                AllocSignature);

   if (!gAllocRTag)      // (very unlikely error)
      return -1;
   memset(&gLibGlobals, 0, sizeof(gLibGlobals));   // memset needs the struct's address
   return 0;
}

By returning 0, this code tells the NetWare Loader that everything was successful and that the NLM should remain loaded. Since no new thread is started, everything else in the NLM, whether it calls into CLib or just into the OS, is "dead" code.

And this is precisely my point. A library NLM is primarily "dead" code offered to other NLMs for their use, just as if that code were their own. Of course, this example is an extremely primitive library, but it helps illustrate the core of what a library is. It will also serve to demonstrate my next point about the excess initialization and code baggage that is introduced when one tries to write a CLib-based library.

A CLib Library. When using CLib to write a library, you end up creating a great deal of superfluous initializations and resource baggage incidental to linking CLib's prelude object just to set up the call to main. Most of what is done (creating screens, setting up standard consoles such as stdin, stdout, and stderr, and much else) is entirely or mostly wasted effort in the case of a library. In addition, main runs on a new thread that will bring the library down as soon as it runs off main's right brace. Thus attention must be paid to keeping the library NLM up and running.

Here is the previous simple NLM code in CLib terms:

#include <string.h>
#include <nwadv.h>
#include <nwthread.h>

struct
{
   int      this;
   long      that;
   void      *theotherthing;
} gLibGlobals;

rtag_t         gAllocRTag;               // for calling AllocSleepOK later on

int main
(
   int      argc,
   char      *argv[]
)
{
   memset(&gLibGlobals, 0, sizeof(gLibGlobals));   // memset needs the struct's address

   gAllocRTag = AllocateResourceTag(GetNLMHandle(),
                              "foolib's own memory", AllocSignature);
   if (!gAllocRTag)
      return -1;

   ExitThread(TSR_THREAD, 0);
   return 0;
}

While this code accomplishes the same thing as the previous example, there are problems with it. First, as already discussed, there is a lot of useless baggage set up for the library that it will never use and that may get it (or the developer) confused and/or into trouble. Second, the reason for using CLib in the first place was to gain access to "CLib-isms" that would help me write using standard functions in place of NetWare-isms. But in the end, I still have to use AllocSleepOK instead of malloc if I want the library to own that memory. So linking with CLib's prelude object doesn't really get me very far.

Even though this example is somewhat simplistic, it is nevertheless an accurate microcosm of the world of problems facing library NLMs which will be discussed at length in this AppNote.

The Problem of Resource Ownership. One of the most challenging phenomena in library writing is sorting out the problem of resource ownership. What a library developer must keep foremost in mind is that when the library references, for example, malloc in its code, it isn't the library calling it, but the library's client. The memory resource coming back is debited against the client and doesn't belong to the library itself. To make its own memory allocation (except from main when it is starting up), the library has no other recourse than to call AllocSleepOK (exactly what my non-CLib example has to do).

Probably the single most frequent mistake among library developers is failing to understand this distinction. When I discuss a problem with a library developer, I make a personal point of not saying that the library calls CLib for this or that, but insist on saying that the "calling thread" makes a call to this or that interface in CLib or LibC. That way, the developer gets a clearer idea of what is going on and often immediately understands the problem that led to the consultation in the first place. So, let's make this a sort of maxim:

It is not the library calling the interfaces referenced in its code, but the library's client. And it is that client, not the library itself, that owns any resources allocated as a result of calling into the library.

Modern Library Writing

Modern NLM library writing divides into three broad classes, including the first two already discussed:

System library NLMs that don't link CLib's prelude
Library NLMs that do link CLib's prelude
A new type of library NLM that links LibC's prelude

As discussed in a previous AppNote (see "How to Write Start-up Code for NLMs" at http://support.novell.com/techcenter/articles/dnd20020806.html), LibC's prelude isn't as primitive as CLib's. It supports a broad spectrum of NLM writing including all possible library types. And because LibC's prelude can load earlier even than system volume mount, it works well for almost all NLMs.

Linking LibC's prelude results in support for a new classification of three different approaches to library writing:

Coding a main. LibC can discover at run-time whether an NLM has a main function and, if so, creates a new thread to run it. Like the CLib-based method already described, it is possible to perform initialization from main and then park the main thread on a semaphore so that it doesn't run off of main and drop the NLM. (LibC doesn't have an equivalent to ExitThread with its TSR_THREAD flag.) While this still isn't the right way to write a library NLM, it could be a backward-compatibility solution for a quick port to LibC of an existing CLib-based library.
Coding a _NonAppStart. This is the preferable method (unless DllMain is coded). For primitive (simple) libraries, this is the easiest, most versatile, and most NetWare-like method and is thus perfect for auto-loaded libraries. However, it suffers from an inability to manage the phenomenon of thread attach and detach, and it falls down in the area of process detach. (These problems are discussed in more detail later.)
Coding a DllMain. This most modern method is possible starting with the installation of or upgrade to NetWare 5 Service Pack 6, NetWare 6 Service Pack 3, or NetWare 6.5 (code-named Nakoma). The jury is still out on how well this method works, because few have used it beyond Novell's internal testing. It was created to help ports from Windows and to solve certain problems that other platforms address through a more rigid library architecture: the problems of process and thread attach/detach that _NonAppStart cannot solve.

How Libraries Are Consumed

Traditionally, auto-loading and linked-symbol dependence form the method by which application NLMs consume entry points (and, on rare occasion, data) from a library. An application NLM is linked with an auto-load dependency against a library, foolib.nlm, thus:

MODULE foolib.nlm # module to load to resolve symbols
IMPORT @foolib.imp # list of imported interfaces

This assumes the standard practice of a library developer publishing an import file containing a list of symbols exported by the library for use in other NLMs. Typically, the name is identical to the library's filename, but with the extension IMP for "import file" in place of NLM.

Upon loading the application, the NetWare Loader "auto-loads" the library NLM because of the MODULE directive to the linker. This means that, before the application's start-up code is executed (which may depend on symbols supplied by the library), the library NLM is loaded. If the library NLM isn't there or it fails to initialize successfully and cannot stay up, the Loader drops the application NLM with an appropriate error message on the system console (usually it says something about missing symbols).

An auto-load library can also be linked with the auto-unload flag such that, as soon as both the auto-loading application and any subsequent consumer NLMs are gone, it unloads quietly to save precious system resources. This flag is linked into a library NLM thus:

FLAG_ON 0x00000040 # unload me if I'm not needed

(For Watcom's wlink.exe, this directive is NLMFlags.)

This is also important for unloading protected address spaces with libraries in them: they won't come all the way down unless the libraries and other NLMs are gone. It is the NetWare Loader that senses this optimization and enacts it as a policy, without conscious effort on the part of the library other than the link flag.

Lately, LibC has offered new interfaces to aid in the porting of applications and libraries from other platforms. These are summed up by the header dlfcn.h and, more recently, windows.h. The functionality is simple:

Call dlopen with the path to the desired library.
Call dlsym for each symbol to be consumed.
Call dlclose once finished.
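The three steps above can be sketched as follows. This is a minimal illustration using the standard dlfcn.h calls; the helper name call_int_function, its signature, and the choice of RTLD_LAZY are assumptions made for this sketch, and error handling is reduced to a return code.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: open the library at 'path', look up 'symbol',
 * call it as int f(int), and close the library again.
 * Returns 0 on success and stores the result in *result; returns -1
 * if the library or the symbol cannot be found. */
int call_int_function(const char *path, const char *symbol, int arg, int *result)
{
   void  *handle;
   int   (*fn)(int);

   handle = dlopen(path, RTLD_LAZY);      /* step 1: open the library */
   if (!handle)
      return -1;

   /* step 2: find the symbol; ISO C defines no cast from void * to a
      function pointer, so store the address through a void ** instead */
   *(void **) &fn = dlsym(handle, symbol);
   if (!fn)
   {
      dlclose(handle);
      return -1;
   }

   *result = fn(arg);                     /* call through the pointer */

   dlclose(handle);                       /* step 3: finished with it */
   return 0;
}
```

A client would pass the path of the library it wants (foolib.nlm, in the running example) and the name of one of its exported functions. Note that every call made this way goes through a function pointer.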

This is not optimal functionality, and the reason has to do with Pentium processor optimizations: calling functions through pointers can have a devastating effect on the branch prediction table, causing a loss of at least 40 clocks in some cases. (For more details, see "Scheme for Optimizing Calls to Functions through Pointers on Pentium Processors" at http://support.novell.com/techcenter/articles/dnd19980804.html.) Nevertheless, Open Source porting has imposed the existence of such interfaces on Novell, which ever seeks to make NetWare a better target for applications, and offering this mechanism greatly facilitates porting.

The windows.h solution is similar, using LoadLibrary and FreeLibrary. In fact, these two functions are little more than calls to dlopen and dlclose in LibC. There is no equivalent to dlsym.

Note: Although it has little to do with library writing, it is useful to keep in mind that most interfaces offered by LibC discussed here (such as dlopen) are perfectly consumable by CLib applications.

Using the Watcom Compiler to Build a Library

Here's a quick section on using the Watcom 386 C/C++ 11.0x environment for building a library. There are still a few programmers inside Novell who use this environment. While it is impossible to write C++ code to LibC in this environment, it is possible to write C code as long as the statically linked library (libc3s.lib) isn't included. Note that these remarks apply only to Watcom 11. OpenWatcom does not suffer from these problems.

If you use the older Watcom, prefixing becomes a concern. LibC has solved the backward-compatibility problem by providing libc.wmp and libc.ali, which get around wlink.exe's inability to handle prefixing. wlink.exe's ability to alias symbols makes this possible. If you want to support Watcom-built clients (as LibC does), you must take care of this problem for your library by providing similar files. With the existence of OpenWatcom, there is scarcely any need to do so.

If you are building a library yourself using old Watcom instead of CodeWarrior, you will not be able to prefix your symbols. Yet Novell recommends that everyone prefix their symbols, which leads to a problem of orthogonality. However, if your library is proprietary (in the sense that your symbols are consumed internally to your applications and not generally available to third parties), and you auto-load and consume these symbols via the linker rather than the dlfcn.h interfaces, then there is no problem here.

You should avoid exporting symbols from your library that clash with industry-standard names like strcpy, getopt, mmap, iconv_open, and so on, without prefixing them. (LibC reserves the right to export these symbols either naked or prefixed with LIBC@.) If you want to find out more about producing compatible .WMP and .ALI files for use in place of the normal import file by your Watcom down-streamers, contact Novell Developer Support by way of the LibC newsgroup.

The Elements of a Well-Written NLM Library

Outside its actual utility, the elements of a well-written NLM library are:

Initialization
Interface signatures and prefixing
Insulation against cross-contamination of client instance data
Data instancing
Attach and detach mechanisms and safeguards
Kernel and protected address space support

The third, fourth, and fifth of these are actually the same thing: ways of guarding data integrity in the rough-and-tumble world of multi-client and multithread support. The best safeguard against the challenge of data instancing and problems of cross-contamination is to code a library that makes no use of per-process or per-thread data, such as a library full of interfaces like strcpy. However, the vast majority of NLM libraries must rely on per-process (application) data, and many also need to distinguish between calls from different threads of the same client.

Initialization

Initialization is simple enough as long as required data and other resources are not extensive. In _NonAppStart or DllMain, calls to zero out data, read in tables, and allocate and initialize blocks of memory can be done in an ordered fashion. Any necessary reversal of this process is done in _NonAppStop or DllMain. When such data is complicated, very extensive, or telescopic in that it isn't allocated, set up, or initialized except as needed, that presents a problem. Unless the library has its own threads (which it normally should avoid), you must take special care to ensure library ownership of the resource thus allocated, set up, or initialized.

A library cannot allow itself to be called by an application, realize that a table is needed that hasn't been allocated and initialized, allocate it, and then set it up if that table isn't to belong to the calling application, unless all resource allocation entailed by the operation is done using functions to which the library can pass its own resource tag. Since malloc doesn't have a resource tag argument, a call to it will result in the memory allocated belonging to the caller instead of the library. Remember, the client application is the caller, not the library itself.

There are no easy solutions to this problem. For memory allocations, you can use library_malloc instead or a call directly to AllocSleepOK. But for synchronization primitives, there are no calls that will ensure that a mutex, for instance, gets attributed to the library instead of the NLM of the calling thread.

The best solution can be found in library design. Make certain you know of all such resources, and allocate them up front (in _NonAppStart or DllMain) while you have the context to do so. The term "context" refers to the fact that when a module calls into CLib or LibC, these runtime libraries know just who that module is. At start-up your library runs on its own context, as it does in _NonAppStart, so anything it allocates there will belong to your library. However, if the code later makes calls (to malloc or to NXMutexAlloc, for example), then any resources consumed will belong to the NLM that owns the thread making the call, which is not your library.

Another way around this problem, when design cannot solve it, is to create wrappered call-backs for use by the library itself. Wrappering a call-back using NX_WRAP_INTERFACE as part of the start-up sequence, for later calling in the library itself (but executing on a client's thread), will ensure proper resource attribution.

An issue that goes hand-in-hand with initialization is clean-up. As all NetWare developers know, NetWare complains bitterly about resources, especially allocated memory, that an NLM has not released by the time it unloads.

Interface Signatures and Prefixing

When designing your library's exported consumables, whether functions or data, choose appropriate names that help make what your library offers an intact set of interfaces. Often, the names are imposed on you, such as the myriad standard calls in LibC. Whatever you choose to name them, ensure that you prefix them with the name of your library in uppercase. This is the only way to ensure against symbol collision in the flat NetWare symbol namespace and ensure that they can be found by consuming NLM applications calling dlsym.

The export list your library is linked with should look like this (incidentally, so should the import file you distribute to others who consume your symbols):

EXPORT (FOOLIB) foo, bar, foobar, morefoo

This results in symbols FOOLIB@foo, FOOLIB@bar, and so on. When other NLMs call foo (in their code, for example), of course they don't add the prefix to their C code. Linking with your import file will cause the linker to generate a call to FOOLIB@foo and so on, instead of merely foo. This means two libraries could export the same symbol without colliding in the flat NetWare symbol namespace.

If you dynamically export symbols at run-time, you should also prefix them as follows:

ExportPublicObject(gNLMHandle, "FOOLIB@foo");

The reason to do all of this exactly as described here-which may be different from how you have done it in the past-is that dlsym cannot magically find the string symbols it is looking for (that is, devoid of their prefix) unless it can calculate the prefix. This calculation is done by dlopen for symbols got on the handle it passes back.
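The naming rule can be captured in a small helper that builds the prefixed string for a run-time export. This is only a sketch: the function name make_prefixed_symbol and its buffer-based interface are invented for illustration; only the LIBNAME@symbol convention comes from the discussion above.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: build "<LIBNAME>@<symbol>" in 'out', with the
 * library name forced to uppercase. Returns 0 on success, or -1 if the
 * result (including the terminating null) does not fit in 'outsize'. */
int make_prefixed_symbol(const char *libname, const char *symbol,
                         char *out, size_t outsize)
{
   size_t  liblen = strlen(libname);
   size_t  symlen = strlen(symbol);
   size_t  i;

   if (liblen + 1 + symlen + 1 > outsize)    /* name + '@' + symbol + '\0' */
      return -1;

   for (i = 0; i < liblen; i++)
      out[i] = (char) toupper((unsigned char) libname[i]);

   out[liblen] = '@';
   memcpy(out + liblen + 1, symbol, symlen + 1);   /* copies the '\0' too */
   return 0;
}
```

Called with "foolib" and "foo", the helper produces the string FOOLIB@foo, which is the form a library would hand to ExportPublicObject.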

Insulation Against Data Contamination

In your design, take care that resources obtained for one thread are not consumed by or made available to any other thread, or at least to any other thread that does not belong to the application owning the thread on which they were set up. Stack-sharing would be the most egregious example of this catastrophic practice. You should already understand why this is bad, even if you are new to NLM library writing.

Data Instancing

Data instancing is probably the most difficult aspect in NLM programming. It is in writing NLM libraries that all data instancing problems are encountered in their worst forms.

Application Data. LibC exports a number of interfaces specially set up to help you handle problems of data instancing at the NLM or process level. The most important of these interfaces are get_app_data and set_app_data. As will be shown in my examples, these functions permit you to do the following:

Determine, in the context of any calling thread, whether that thread's application has ever called your module before (get_app_data)
Get the block of allocated data for it if it has already been set up (also get_app_data)
Set up a block of data, initialize it and establish it (set_app_data) for the application

Once established, the data will be there the next time any thread from that NLM calls your library. The destructor for this data is established when you call register_library, or register_destructor from DllMain.

If what I've described were really that simple, the application level would be simply and effectively handled. The problem that arises on NetWare is that, by reason of registered call-backs, a calling thread doesn't always have enough contextual identification to ensure that get_app_data can accurately identify the caller. It may decide that the caller has never called before. The solution to this is for your clients to wrapper their call-backs, as discussed in NKS. This is a problem not only for third-party libraries, but for LibC as well. Correctly wrappering all call-backs using the interfaces in nks/netware.h solves the remaining problem of application data instancing.

While this is not something library code need concern itself with, it is useful to describe call-back wrappering because it occurs in application code. Call-back wrappering is accomplished using macros in nks/netware.h. Here is an example:

#include <nks/netware.h>

void  *gCBFRef = (void *) NULL;

static int  CallBackFunc( int arg1, double arg2 );

static int CallBackFunc( int arg1, double arg2 )
{
   // do stuff here after getting called back by whatever service
   // you've registered with including a library
   // ...
   return 0;
}

void InitializeFunction( void )
{
   int   err;

   // wrap the call-back...
   err = NX_WRAP_INTERFACE(CallBackFunc, 2, &gCBFRef);   // 2 = CallBackFunc's parameter count

   // ...
}

void CleanUpFunction( void )
{
   // free up the call-back wrapper
   if (gCBFRef)
      NX_UNWRAP_INTERFACE(gCBFRef);
}

CallBackFunc is made static here to emphasize the fact that it isn't its address, but the wrapper's, that will be communicated to the service, library, etc. for the purpose of calling. Do not be confused by this example and discussion: your library's client does not need to do this in order to call your interfaces. It is only when the address of a function inside the application is going to be communicated to, say, RegisterForEventNotification or other services that this needs to be done. It must be done because the thread that actually executes the call-back will not be one that belongs to the client application, yet in the course of the call-back the executing thread makes use of LibC interfaces that need context, such as open, which uses the application's open file descriptor table.

There is no way to find the application data without having put a context wrapper around the call-back. The wrapper contains code to save the application identity, a pointer to the NKS virtual machine (VM) structure, and to set it as the thread's owner for the duration of call-back execution. After the call-back is through, the original VM identity is restored by the wrapper. (This is a different and slightly less intrusive way of doing what was done by CLib applications using SetThreadGroupID.)
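The save, set, and restore sequence performed by the wrapper can be modeled in plain C. This is not the code behind NX_WRAP_INTERFACE: the names current_owner, wrapped_callback_t, invoke_wrapped, and sample_callback are hypothetical stand-ins, and a simple global takes the place of the per-thread VM pointer that the real mechanism manipulates.

```c
#include <stddef.h>

/* Stand-in for "which application owns the running thread". */
static const void  *current_owner = NULL;

/* Hypothetical wrapper record: the real call-back plus the identity
 * of the application that registered it. */
typedef struct
{
   int         (*func)(int arg);
   const void  *owner;
} wrapped_callback_t;

/* Run the call-back as if the calling thread belonged to the
 * registering application, then put the previous identity back. */
int invoke_wrapped(const wrapped_callback_t *wrapper, int arg)
{
   const void  *saved = current_owner;    /* save the thread's identity */
   int         result;

   current_owner = wrapper->owner;        /* adopt the application's */
   result = wrapper->func(arg);
   current_owner = saved;                 /* restore on the way out */

   return result;
}

/* Example identity and call-back: the call-back returns its argument
 * only if it finds itself running under the expected owner, else -1. */
static const char  app_identity[] = "client application";

int sample_callback(int arg)
{
   return (current_owner == app_identity) ? arg : -1;
}
```

Called directly from a foreign thread, sample_callback cannot find its context and fails; called through invoke_wrapped, it sees the application's identity, and the foreign thread's own identity is back in place afterward.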

Thread Data. Thread data takes a bit more work to handle because it cascades from application data. It is not possible to identify the thread-specific data without accurately knowing the application data. In combination with get_app_data, which gets the instance data for the application, your library can also discover whether per-thread data has been allocated, allocate it if it has not, or get a pointer to it if it has. This is done by creating a key with NXKeyCreate at the time that the application instance data is allocated and set (using set_app_data) for the calling application.

Thereafter, upon getting application data, you can use the stored-away key index to call NXKeyGetValue. At the time that the key is originally created, a destructor for the data associated with the key is registered if necessary. In my library sample code, I illustrate this point trivially by inventing a sort of errno in a structure that must be kept per-thread. Of course, what is kept per-thread by a library could be very much more complex, but in this example, only free is necessary as a destructor. There could also be no suballocation of per-thread data at all, only a scalar value kept for error reporting, but that would have been too simple for an adequate example.
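The get-or-allocate pattern for per-thread data can be sketched with POSIX thread-specific keys, which follow the same model described here: a key, a per-thread value, and a destructor. The sketch deliberately uses pthread_key_create, pthread_getspecific, and pthread_setspecific rather than NXKeyCreate and NXKeyGetValue so that it stays self-contained; the structure and function names are invented for illustration, and the key is kept in a global only for brevity, whereas a library would store it with the application instance data as described above.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-thread block: a private errno-like value. */
typedef struct
{
   int   lib_errno;
} thread_data_t;

static pthread_key_t   gThreadKey;
static pthread_once_t  gKeyOnce = PTHREAD_ONCE_INIT;
static int             gKeyStatus = -1;

static void create_key(void)
{
   /* free() is registered as the destructor: it runs at thread exit
      for any thread that stored a non-null value under the key */
   gKeyStatus = pthread_key_create(&gThreadKey, free);
}

/* Return the calling thread's block, allocating it (zero-filled) on
 * first use. Returns NULL if the key or the memory cannot be had. */
thread_data_t *get_thread_data(void)
{
   thread_data_t  *data;

   if (pthread_once(&gKeyOnce, create_key) != 0 || gKeyStatus != 0)
      return NULL;

   data = pthread_getspecific(gThreadKey);
   if (!data)
   {
      data = calloc(1, sizeof(*data));
      if (data && pthread_setspecific(gThreadKey, data) != 0)
      {
         free(data);
         data = NULL;
      }
   }

   return data;
}
```

Every call made on the same thread gets the same block back, so a value such as lib_errno set by one library interface is still there when the next interface is called on that thread.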

Attach and Detach Mechanisms and Safeguards

In Windows, which has the most formal specification of dynamically loaded libraries (or DLLs), DllMain is a function coded by the library developer to handle several messages, including:

DLL_PROCESS_ATTACH
DLL_PROCESS_DETACH
DLL_THREAD_ATTACH
DLL_THREAD_DETACH

DllMain is called on behalf of client applications at opportune times and asked to perform these essential tasks. (A client application is one that calls LoadLibrary to connect to the dynamically linked library whose pathname is specified in the argument.)

In process attach, the library has the opportunity to allocate a block of memory associated with the calling application. (I've already discussed doing this under "Application Data" above.) On Windows, this association is handled transparently because Windows libraries are not written in the kernel, but in a protected address space (also referred to as user address space). All memory allocated belongs transparently to the calling application because the Windows operating system instances the DLL for each of its clients. NetWare could do this too, if it were possible to require that developers run only in the protected address space, with only one NLM application in that address space. Unfortunately, this is not possible for many reasons. Consequently, writing libraries for use in a protected address space does not make the problem of managing data instancing disappear.

So, DllMain is called with DLL_PROCESS_ATTACH when LibC detects that a client application is attempting to connect with the library. LibC is able to do this if:

The consumed library has registered itself as a library. This is done automatically for every library that codes a DllMain.
The client explicitly requests the attachment, in the Windows way, by calling LoadLibrary, or by calling dlopen, which is equivalent.

Calling DllMain with DLL_PROCESS_DETACH is what happens when the consuming application calls FreeLibrary, dlclose, or unloads. It is an opportunity for the library to deallocate any process or NLM instance data created for its client. This may include memory, synchronization objects like mutexes, and per-thread keys.

DllMain is called with DLL_THREAD_ATTACH and DLL_THREAD_DETACH only under specific circumstances, and these are easily bypassed. These messages are delivered if and only if the client application creates new threads and terminates them normally through either NKS mechanisms such as NXThreadCreate or the industry-standard thread packages built upon them, such as pthreads (pthread_create) or UNIX International (thr_create).
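A skeleton of the message handling described in this section might look like the following. It is a sketch only: the four DLL_* values are defined locally (with the values used by the Windows SDK) so that the fragment is self-contained, the function is named handle_library_message rather than DllMain because the DllMain prototype in LibC's windows.h is not reproduced here, and the counters stand in for the real allocation and release of per-process and per-thread instance data.

```c
/* Defined locally to keep the sketch self-contained; a real library
 * would take these from windows.h. */
#ifndef DLL_PROCESS_ATTACH
#define DLL_PROCESS_DETACH   0
#define DLL_PROCESS_ATTACH   1
#define DLL_THREAD_ATTACH    2
#define DLL_THREAD_DETACH    3
#endif

/* Stand-ins for per-client and per-thread instance data. */
static int  gAttachedClients = 0;
static int  gAttachedThreads = 0;

/* Hypothetical handler: returns 1 if the message was handled, 0 if it
 * was unrecognized or arrived without a matching attach. */
int handle_library_message(unsigned long reason)
{
   switch (reason)
   {
      case DLL_PROCESS_ATTACH:      /* a client is connecting */
         gAttachedClients++;        /* set up its instance data here */
         return 1;

      case DLL_PROCESS_DETACH:      /* FreeLibrary, dlclose, or unload */
         if (gAttachedClients == 0)
            return 0;
         gAttachedClients--;        /* release its instance data here */
         return 1;

      case DLL_THREAD_ATTACH:       /* client created a new thread */
         gAttachedThreads++;
         return 1;

      case DLL_THREAD_DETACH:       /* client thread ended normally */
         if (gAttachedThreads == 0)
            return 0;
         gAttachedThreads--;
         return 1;

      default:
         return 0;
   }
}
```

The guards on the detach cases reflect the point made above: a library cannot assume that every detach was preceded by an attach message it actually received.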

The ability to handle the thread and process messages in DllMain is a delicate one only recently made available by the creation of the DllMain mechanism. Libraries built around _NonAppStart, and consuming applications that auto-load libraries in the traditional way, cannot make use of these mechanisms. This is still the majority of NLMs; indeed, at the time this AppNote was written, there were no shipping libraries based on DllMain. Therefore, in addition to coding the DllMain messages, it is best to handle attach and detach in the old way rather than to rely purely on the Windows way. (This will be demonstrated later in this AppNote.) The reason is that you likely cannot force other NLMs to consume your entry points via dlopen (LoadLibrary), but if the library isn't going to be consumed by any applications outside your own, you could dispense with this complexity.

Kernel and Protected Address Space Support

If your library consumes kernel-only services, you cannot load it in a protected address space where it will not have access to those services. If the library is loaded in the kernel, then consuming applications cannot themselves be loaded into a protected address space without special modification of your library.

LibC overcomes this problem by loading both in the kernel and in a protected address space. In cases where the service being interfaced by LibC is a kernel-only one (such as managing System V semaphores and other IPC mechanisms like FIFOs that must be shared across all address spaces), it is the instance of LibC loaded in the kernel that handles all of these issues, either directly with the consuming applications, or indirectly by fronting interfaces to the instances of itself that are loaded in a protected address space.

The coding to do this is rather tricky and involved. To attempt to cover it here would greatly lengthen and confuse the subject at hand. A future AppNote will cover the issues of basic marshalling and illustrate how to write interfaces in kernel code that can be consumed from an application in the protected address space.

Tricks in Writing Dual-Mode NLMs

Dual-mode NLMs are those written to LibC that also support CLib-based clients thinking they are calling into a CLib-based library NLM. Depending on what services an NLM library wants to consume in its pursuit of purveying more sophisticated services to its down-streamers, it may be necessary to check to see what sort of NLM is calling using the get_app_type function. This function returns a bit field that contains information about what sort of NLM owns the calling thread.

Today, the possible types are:

LIBRARY_CLIB - the calling thread was created by a CLib application via BeginThread or BeginThreadGroup
LIBRARY_JAVA - the calling thread is obviously one created by the Java Virtual Machine
LIBRARY_UNKNOWN - no obvious context can be identified

Using this information, it is possible to make calls into CLib instead of LibC. It is hoped that this undesirable phenomenon will not occur very often, especially since a CLib application can also consume interfaces from LibC.

As an illustration, imagine that due to inescapably poor library and client design (perhaps because a library is being ported from CLib dependence to LibC, but the application already running on it cannot be ported yet), a library must allocate a CLib semaphore and pass it back for consumption in its caller. The following code illustrates the decision to do this.

#include <library.h>

extern LONG OpenLocalSemaphore( LONG initialValue );

int dosomething( LONG *sema )
{
   if (get_app_type() & LIBRARY_CLIB)
      *sema = OpenLocalSemaphore(1);

   return 0;
}

Writing a dual-mode library can be tricky when it comes to header file inclusion. It is best to use the LibC headers and, as I have done here for OpenLocalSemaphore, not include the CLib ones (in this case, nwsemaph.h). This is because many LibC and CLib headers have the same names but their types and data structures aren't compatible and, in the case of stdarg.h, they are totally incompatible. Functions such as vprintf in CLib cannot be consumed when LibC's stdarg.h is used to compile the call, and vice versa. Fortunately, the number of such functions is very small and the need to use them is smaller still.

The function foo in the sample code demonstrates coding a dual-mode situation and uses printf (and not vprintf!) from both libraries as its illustration.

Versioning-A Solution

It is conceivable that you could use prefixing to create a sort of interface "versioning." Versioning interfaces is desirable when their parameter lists, structure sizes, and so on change, creating a situation in which the version being used cannot be divined from the call, or in which an inevitable page fault would result no matter how carefully the investigation is done.

One method is to use a different or numbered prefix for the new function. Existing callers are already using the basic prefix in their NLMs. If they recompile and relink, they would get new sizes, or at least compilation errors, forcing them to move up to the new version. Relinking using your import file gets them linked to the new version.

Assume a function, foobar, is getting a facelift for whatever reason in your library foolib. In your library code, when you create a new version of a function, use the old name, but export it as FOOLIB2@foobar.

Next, rename the old function in the code to foobar_old or whatever suits your fancy. When your library loads, you will use ExportPublicObject to export it dynamically under the old name, which remains FOOLIB@foobar.

ExportPublicObject(NLMHandle, "FOOLIB@foobar", foobar_old);

This is because (except for using aliasing in the Watcom linker) there is no way to export a symbol of one name (foobar_old, for example) under a different name, prefixed or not.

When your NLM loads, it will statically export the new function and, quickly upon start-up, begin exporting the old function, though renamed, under its expected name.

Your export/import file will appear thus:

EXPORT (FOOLIB) foo, bar, morefoo, (FOOLIB2) foobar

Note that this versioning solution removes the possibility of discovery via dlsym, even if the rest of your library follows the guidelines in this AppNote for conformance to dlfcn.h use.
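The effect of exporting the same function name under two prefixes can be modeled in portable C. The following is only a sketch: export_symbol and import_symbol are hypothetical stand-ins for the loader's public symbol table (the real dynamic call is ExportPublicObject, as shown above), and the return values 1 and 2 merely distinguish the old behavior from the new.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the loader's public symbol table. */
typedef int (*entry_t)( void );

#define MAX_EXPORTS 8

static struct { const char *name; entry_t fn; } gExports[MAX_EXPORTS];
static size_t gExportCount = 0;

/* Models publishing an entry point under an arbitrary prefixed name. */
int export_symbol( const char *name, entry_t fn )
{
   if (gExportCount == MAX_EXPORTS)
      return -1;

   gExports[gExportCount].name = name;
   gExports[gExportCount].fn   = fn;
   gExportCount++;
   return 0;
}

/* Models import resolution: look an entry point up by its prefixed name. */
entry_t import_symbol( const char *name )
{
   size_t i;

   for (i = 0; i < gExportCount; i++)
      if (strcmp(gExports[i].name, name) == 0)
         return gExports[i].fn;

   return NULL;
}

int foobar( void )     { return 2; }   /* new version keeps the old C name */
int foobar_old( void ) { return 1; }   /* original version, renamed */

/* What the library arranges by the time it has finished loading. */
void library_startup( void )
{
   export_symbol("FOOLIB2@foobar", foobar);      /* static export (EXPORT line) */
   export_symbol("FOOLIB@foobar",  foobar_old);  /* dynamic export, old name */
}
```

In this model, a client linked against FOOLIB@foobar keeps reaching foobar_old, while a client relinked with the new import file resolves FOOLIB2@foobar and reaches the new code.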

Walk-Through of an NLM Library

This section will step you through the elements of a well-written NLM library. Then it will delve into the particular aspects of process and thread attach/detach, where the methods deviate from each other. I'll point out how a DllMain-based library can be coded in such a way as to prefer that method of consumption, but still be consumable by other, more traditional methods.

Startup and Initialization

To link my library, I need the following statements in a makefile.

START _LibCPrelude
EXIT _LibCPostlude

You see, creating a library isn't different with respect to start and exit functions from an application. It is the existence of a DllMain that makes the difference. Even if I use the _NonAppStart method of writing a library, I still use the same link statements to get LibC's help. Allowing LibC to start me up gives me full LibC context for the purposes of initialization. I can call printf or any other reasonable function from my DllMain code if I do this.

I'll show both types of start-up code to demonstrate how this is done.

_NonAppStart. The code for the _NonAppStart version of my library is listed after the DllMain sample code for purposes of comparison.

DllMain. The DllMain method is almost identical to how it would be done on Windows, except that there are three additional messages to handle beyond the usual ones:

DLL_ACTUAL_DLLMAIN. You must return TRUE for this message. This is called by LibC when it starts up your library (only if you link libcpre.o, of course) and is how LibC detects that the DllMain present is a real one and not the stub that applications or other NLMs that don't code a DllMain get from LibC itself. In other words, this is a crutch I've used to get around certain problems in the NetWare Loader. If you do not do this, your DllMain will never get called. You could use this as a trick to embed a dormant DllMain inside your library that is only used when you load it with a special switch which you would get using getcmdline, but that's left to your imagination.
DLL_NLM_STARTUP. This is the rough equivalent to _NonAppStart. Anything you would do there, you can do here instead of coding that function separately, though it doesn't matter if you have both DllMain and _NonAppStart.
DLL_NLM_SHUTDOWN. Similarly, this is the same thing as _NonAppStop.
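The stub-detection handshake described in the first item can be modeled in a few lines of portable C. This is a sketch of the idea only, not LibC's actual code: the MODEL_ message numbers are placeholders (the real values come from LibC's windows.h), and the assumption that the default stub answers FALSE to every message is inferred from the description above.

```c
#include <stddef.h>

/* Placeholder message numbers for this sketch only. */
enum { MODEL_PROBE = 100, MODEL_STARTUP = 101, MODEL_SHUTDOWN = 102 };

typedef int (*dllmain_t)( void *handle, unsigned long reason, void *reserved );

/* What a module that codes no DllMain is assumed to get: FALSE for everything. */
int stub_dllmain( void *handle, unsigned long reason, void *reserved )
{
   (void) handle; (void) reason; (void) reserved;
   return 0;
}

/* A library's own DllMain: answers TRUE to the probe and to start-up/shut-down. */
int real_dllmain( void *handle, unsigned long reason, void *reserved )
{
   (void) handle; (void) reserved;

   switch (reason)
   {
      case MODEL_PROBE :
      case MODEL_STARTUP :
      case MODEL_SHUTDOWN :
         return 1;
   }

   return 0;
}

/*
** Models library start-up: probe first, and deliver further messages only
** to a DllMain that identified itself as real. Returns 0 if the entry was
** never called again, 1 if start-up was delivered and accepted, and -1 if
** start-up was delivered but refused.
*/
int start_library( dllmain_t entry )
{
   if (!entry(NULL, MODEL_PROBE, NULL))
      return 0;

   return entry(NULL, MODEL_STARTUP, NULL) ? 1 : -1;
}
```

The same model shows why a DllMain that returns FALSE for the probe stays dormant: nothing after the probe is ever delivered to it.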

DllMain Sample Code and Commentary

In this section I make specific comments on code segments and partial or incomplete code segments from the DllMain sample code. But for space reasons, I won't reproduce all of the code here.

Let's start by examining some definitions. First, here's the header that my library, foolib, will distribute to its down-stream clients.

foolib.h:

#ifndef __foolib_h__
#define __foolib_h__

#ifdef __cplusplus
extern "C" {
#endif

int   foo    ( const char * );
int   foobar ( void );
int   foobad ( void );
int   morefoo( void );

int   *__foo_errno( void );

#ifdef __cplusplus
}
#endif

#define foo_errno   (*__foo_errno())

#endif

All the interfaces surfaced by foolib are prototyped here, plus a prototype for its errno. Reporting errors in this way is an outdated method, but here it serves to demonstrate how a library can support per-thread data.
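The errno-style idiom, in which a macro expands to a dereferenced function call so that the name can be both read and assigned, can be tried in isolation. The following is a minimal sketch under stated assumptions: the mylib_ names are hypothetical, the error value 22 is arbitrary, and C11's _Thread_local merely stands in for whatever per-thread mechanism a real NLM library would use.

```c
/* Per-thread slot; _Thread_local is only a portable stand-in here. */
static _Thread_local int mylib_errno_value = 0;

/* Returns the address of the calling thread's error variable. */
int *mylib_errno_location( void )
{
   return &mylib_errno_value;
}

/* The parentheses keep the macro usable in any expression, e.g. mylib_errno++. */
#define mylib_errno   (*mylib_errno_location())

/* Hypothetical library function: halves even numbers, fails on odd ones. */
int mylib_halve( int n )
{
   if (n % 2)
   {
      mylib_errno = 22;          /* arbitrary error code for this sketch */
      return -1;
   }

   return n / 2;
}
```

Because the macro yields an lvalue, callers treat mylib_errno exactly like an ordinary variable, while each thread actually sees its own copy.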

Here are the private, internal definitions to remember.

private.h:

#ifndef __private_h__
#define __private_h__

typedef struct                   // instance data for clients down-stream
{
   NXKey_t     appThrKey;
   char        appOtherdata[1];
} appdata_t;

typedef struct                   // thread-specific data managed for clients
{
   int         thrX;             // (thrX: just so errno isn't the only...
   int         thrErrno;         // ...per-thread data in this example)
} thrdata_t;

// library-private data...
extern int         gLibId;
extern void        *gModuleHandle;
extern NXMutex_t   *gLibLock;

// internal library function prototypes...
int   GetOrSetInstanceData( int id, appdata_t **data, thrdata_t **thrdata );
int   DisposeAppData      ( appdata_t *data );
int   DisposeThrData      ( thrdata_t *data );

#endif

Using these, I'll be able to manage instance data no matter whether it is consumed in modern fashion, via LoadLibrary and DllMain, or in the traditional NetWare way.

Now, here is the DllMain. Ignoring the complexity of handling instance data, it looks like this.

dllmain.c:

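#include <windows.h>
#include <library.h>      // register_destructor()
#include <nks/synch.h>    // NXMutex_t; included before private.h, as in nonapp.c
#include "private.h"

int DllMain
(
   void           *hinstDLL,
   unsigned long  fdwReason,
   void           *lvpReserved
)
{

   switch (fdwReason)
   {
      // attach and detach need no work until instance data is handled
      case DLL_PROCESS_ATTACH :
      case DLL_PROCESS_DETACH :
      case DLL_THREAD_ATTACH :
      case DLL_THREAD_DETACH :
      // tells LibC that this is a real DllMain and not its stub
      case DLL_ACTUAL_DLLMAIN :
         return TRUE;

      // rough equivalent of _NonAppStart
      case DLL_NLM_STARTUP :
         gModuleHandle = lvpReserved;
         gLibId        = (int) hinstDLL;
         register_destructor((int) hinstDLL, (int (*)(void *))
            DisposeAppData);
         return TRUE;

      // equivalent of _NonAppStop
      case DLL_NLM_SHUTDOWN :
         return TRUE;
   }

   return FALSE;
}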

But this code has to do a lot more, including process and thread attach and detach; hence the code as it appears in the _NonAppStart example presented later. Consumers of foolib that load/use it via LoadLibrary will get their thread attach and detach using DllMain. Those that consume it in the more traditional NetWare way will not. Support for the latter is provided by GetOrSetInstanceData. This merits some comment, mostly to point out that this function is to be called from each exported function that needs to handle instance data. Otherwise, a thread calling in will fail to do what it is supposed to do.

The remaining detail to discuss was just alluded to. I'm not proposing to create a useful library here, but if I have a function that's going to consume instance data, I must, as I have already said, set up pointers to the data appropriate to the calling thread (which also implies an NLM or application). In this case, I'm going to use an errno-type, per-thread variable. Even though hardly anyone creates interfaces in this way anymore, it's a good illustration of per-thread data use. Be sure to read the additional comments in the source code on how to protect yourself in case of failure to get instance data.

int foobar( void )
{
   int         err;
   appdata_t   *data;

   // find (or create) the instance data belonging to the calling application
   if ((err = GetOrSetInstanceData(gLibId, &data, (thrdata_t **) NULL)) != 0)
   {
      foo_errno = err;
      return -1;
   }

   // do stuff that foobar must do as a library function...
   return 0;
}

int *__foo_errno( void )
{
   thrdata_t   *data;
   static int  MINUS_ONE = (-1);

/*
** This library errno is implemented here only to show how a library might
** make use of one key in the calling application to use to store any thread-
** specific data that might be needed, like an errno.
*/
   return (GetOrSetInstanceData(gLibId, (appdata_t **) NULL, &data))
            ? &MINUS_ONE
            : &data->thrErrno;
}

_NonAppStart Sample Code

Here is the _NonAppStart version of the above code. Note the comments about not managing instance data. If you want to have a check-unload procedure (one that is called by the NetWare Loader before your NLM is to be brought down-something you can prevent using this procedure), be sure to add the following line to your linker directives:

CHECK _LibCCheckUnload

Your check-unload function is then named _NonAppCheckUnload.

For more information about check-unload procedures, refer to "How to Write Start-up Code for NLMs" in the August 2002 issue of AppNotes at http://support.novell.com/techcenter/articles/dnd20020806.html. The relevant section starts on page 89 in the hard copy.

nonapp.c:

/*
** This is only an example of how to do library start-up and shut-down
** without DllMain().
*/
#include <screen.h>
#include <stdlib.h>
#include <string.h>
#include <library.h>
#include <netware.h>
#include <nks/synch.h>

#include "private.h"

/*
** This lock protects me from attempting to allocate application data on
** more than one thread at once for the same NLM. I don't need it to govern
** per-thread allocations since they are only done for the calling thread
** by the calling thread. This lock will tend to convoy application instance 
** data creation, so if performance takes a hit because of lots of client 
** applications hitting it, I'll need to think up a better solution.
*/
NXMutex_t   *gLibLock = (NXMutex_t *) NULL;

int         gLibId = -1;
void        *gModuleHandle = (void *) NULL;

NX_LOCK_INFO_ALLOC(gLibLockInfo, "Per-application Data Lock", 0);

int  _NonAppStart
(
   void        *NLMHandle,
   void        *errorScreen,
   const char  *commandLine,
   const char  *loadDirPath,
   size_t      uninitializedDataLength,
   void        *NLMFileHandle,
   int         (*readRoutineP)( int conn, void *fileHandle,
                     size_t offset, size_t nbytes,
                     size_t *bytesRead, void *buffer ),
   size_t      customDataOffset,
   size_t      customDataSize,
   int         messageCount,
   const char  **messages
)
{
#pragma unused(commandLine,loadDirPath,uninitializedDataLength)
#pragma unused(NLMFileHandle,readRoutineP)
#pragma unused(customDataOffset,customDataSize)
#pragma unused(messageCount,messages)

   SetAutoUnloadFlag(NLMHandle);

/* F R O M   N L M   S T A R T - U P . . .
** This is not the right environment to allocate application-specific instance
** data because the module is not being called by either the auto-loading
** consumer or by the one calling LoadLibrary() or dlopen(). Even if one of those
** two is the reason it's loading, this particular call is from the NetWare
** Loader putting it in memory and allowing it a chance to initialize.
*/
   gLibLock = NXMutexAlloc(0, 0, &gLibLockInfo);

   if (!gLibLock)   // this will rightly keep it from loading!
   {
      OutputToScreen(errorScreen, "Unable to allocate library data lock.\n");
      return -1;
   }

   gLibId = register_library((int (*)(void *)) DisposeAppData);

   gModuleHandle = NLMHandle;

   return 0;
}

void _NonAppStop( void )
{
/* F R O M   N L M   S H U T - D O W N . . .
** This case occurs at extreme shut-down (under the control of the NetWare
** Loader) because the module is autounloading, is no longer needed, and has its
** autounload flag set either with the linker or by calling SetAutoUnloadFlag(), 
** or being hard-unloaded.
**
** If this is a case of being unloaded after traditional NLM use, this module
** isn't able to clean up its consumer's instance data now because it doesn't
** have any relationship to it. That's not a problem: the consumer cleaned it up
** by reason of the module having registered its destructor in the first place.
*/
   NXMutexFree(gLibLock);
}

int _NonAppCheckUnload( void )
{
   // it's okay with being unloaded...
   return 0;
}
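The _NonAppCheckUnload shown above always consents to being unloaded. A library that wants to refuse while clients still depend on it might keep a count, as in the following portable sketch. The model_ names are hypothetical, and the convention that a nonzero return refuses the unload is an assumption here (the sample only shows that 0 consents); real code would also need to update the count atomically or under a lock.

```c
/* Number of clients currently attached to the library (model only). */
static int gAttachedClients = 0;

/* Called when a client starts using the library. */
void model_client_attach( void )
{
   gAttachedClients++;
}

/* Called when a client is finished with the library. */
void model_client_detach( void )
{
   if (gAttachedClients > 0)
      gAttachedClients--;
}

/*
** Same shape as _NonAppCheckUnload: 0 means it's okay to unload; nonzero
** (assumed) asks the loader not to unload while clients remain attached.
*/
int model_check_unload( void )
{
   return (gAttachedClients > 0) ? 1 : 0;
}
```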

Conclusion

This AppNote has provided some discussion and recommendations for writing NLMs as dynamic libraries. You are encouraged to download the source code and study it in its entirety. It is available to download from http://developer.novell.com/research/appnotes/download.htm. Feel free to consume this code in any way you see fit for writing your own NLM library.

The following links are to newsgroups where you can get suggestions and help for problems you encounter:

LibC: news://developer-forums.novell.com/novell.devsup.libc
CLib: news://developer-forums.novell.com/novell.devsup.clib.clib

* Originally published in Novell AppNotes
