Guidelines for Enabling Software for Japan

01 Sep 1995

Unlike the U.S., Japan does not have a single, dominant PC architecture. Instead, some hardware manufacturers use their own proprietary hardware, bus interface, video, etc. In addition, not only are OEM versions of MS-DOS different from computer to computer, but the BIOS interface for each machine may also be different. Companies must therefore either build software that can run on Japanese hardware unaltered or create software that can be easily modified through hardware isolation. This DevNote offers guidelines for designing or retrofitting software to run on Japanese machines. The best way to enable software is to do it while writing original code. This DevNote was prepared by Novell's International Software Engineering group.

Introduction
Hardware Compatibility Problems
Japanese Language Programming Issues
Conclusion

Introduction

Technical problems for enabling and localizing software for Japan occur in two areas:

Hardware compatibility problems
Specific Japanese language programming issues

This document describes special rules for writing software to run on Japanese PC hardware. Specific Japanese language programming issues are identified. General language enabling issues are addressed in documents prepared by Novell, IBM, Microsoft, and others. This document has been previously distributed. This version is Rev. 2.2.G (92.11.20).

Hardware Compatibility Problems

The Problem. Unlike the U.S., Japan does not have a single, dominant PC architecture. Instead, some hardware manufacturers use their own proprietary hardware, bus interface, video adapter, interrupt usage, memory map, etc.

Software programs that make hardware assumptions may cause problems. In Japan, not only are OEM versions of MS-DOS different from computer to computer, but the BIOS interface for each machine may also be different (different interrupt numbers, different registers used to pass parameters, etc.). In some Japanese computers, timers, addresses used for hardware interrupt controllers, interrupt levels, etc., may be at entirely different locations - or completely nonexistent. Even simple assumptions, such as video memory being laid out as "character, attribute, character, attribute," prove to be invalid in some cases.

The Solution. All of the problems described can be overcome by using generic DOS function calls, by using enabled Novell standard libraries, and by using some "tricks" described herein.

The following are specific programming rules which should be used as guidelines in designing or retrofitting software to run on Japanese machines.

Rule 1: DO NOT make assumptions about hardware platforms.

While Japanese PCs use Intel (or compatible) processors, there is no reason to assume they all use Intel chipsets (8259 interrupt controller, 8237 DMA controller, etc.). Even if Intel chips are used, there is no guarantee that they use the same I/O addresses as industry standard PCs. Some manufacturers in Japan use their own proprietary chipsets or different I/O addresses. Some NEC PC9800 series and some Fujitsu FMR computers have no DOS LPTx devices. PRN may be the only printer device in the DOS device list. These machines only support one printer (with no expandability options, unlike IBM PC/AT compatibles).

Rule 2: DO NOT use BIOS calls. Use generic DOS equivalents. In addition, do not make assumptions about memory usage, flags, etc.

The interrupt vector assignments on some Japanese machines are DIFFERENT. Specifically, vectors greater than 4h and less than 20h, and vectors greater than 30h, are machine dependent. The DOS vector implementation (using vectors 20h through 2Fh) seems to be mostly consistent among the Japanese platforms.

The BIOS GetDateAndTime call (int 1Ah), for example, sometimes hangs NEC and Fujitsu machines. This shows up in the C programming language as the dosGetTime, dosSetTime, dosGetDate, and dosSetDate family API calls.

Also, the BIOS data area on some machines is at a different address and has a different layout, so things like equipment flags at 40:?? cannot be found. Likewise, the function getEquipFlag() does an illegal int 11h call on some machines, causing programs to fail.

Rule 3: Use well-behaved development tools. Companies need to standardize on a compiler, which helps simplify library maintenance.

Client Software Development

Traditionally, Microsoft C compiles down to the DOS int 21h level and does not use BIOS calls in the startup code. These Microsoft-compiled programs, in general, do not have hardware incompatibility problems on the Japanese machines. Any files with Japanese message strings need to be compiled using an AX version of Microsoft C v5.1 with the /J command line switch. A better practice, however, is to isolate messages from the source.

Borland C++ 2.0 (and Turbo C) compile some runtime library functions to the BIOS (or direct hardware access level), rather than to the DOS function call level. In addition, the Borland startup code (prior to version 4.0) contains BIOS calls that are IBM PC/AT specific--causing programs running on some Japanese machines to fail. Here are some useful guidelines to get Borland-compiled programs to work on the Japanese machines.

Note: Information contained in the following two sections has relevance only when using Borland C compilers prior to version 4.0.

Link with Modified Borland Startup Code. In the startup source code (usually distributed with the compiler), make two changes to allow proper execution on Japanese machines. The files affected are C0.ASM (for DOS EXEs), C0W.ASM (for MS-Windows EXEs), and C0D.ASM (for MS-Windows DLLs). Other Borland runtime functions, such as the timer classes, use INT 1Ah.

Startup code change 1: Comment out the INT 1Ah BIOS call. This call is used to set up a variable to contain the BIOS tick value at the start of program execution. The variable is used in the clock( ) runtime library function (which should not be used). Also, for Borland C++ v3.0 startup code, comment out the setting of the BIOS midnight flag at 40:70 (BIOS data area).

Startup code change 2: Comment out the saving and restoring of interrupt vectors 5 and 6. Only INT 0 through INT 4 are the same for Japanese (and industry standard PC) machines. According to comments in the startup code, vectors 0, 4, 5, and 6 are saved in case they get used in the runtime library signal( )/raise( ) function support. The necessity of this second change may be argued. There is really no harm in saving and restoring all four of these vectors.

This change assumes no use of signal( )/raise( ).

Do Not '#include <conio.h>' or Use Any Function Prototyped in It. Even if the runtime library function you want to use (getch( ) or putch( ), for example) does not itself compile down to a BIOS call or direct hardware access, the fact that it is prototyped in CONIO.H means that an initialization routine (which sets up for the runtime function calls) is called during startup, and that routine makes a series of IBM PC/AT compatible BIOS calls. Even if you do not '#include <conio.h>' and just call the runtime library function, this video BIOS initialization routine is still called by the startup code. If you need any of the simple screen or keyboard functions (getch( ), putch( ), etc.), write your own equivalent functions based on the DOS int 21h interface. At this level, all of the Japanese machines are compatible.

Use the Japanese NEC Version of Borland C++ v2.0 with the /J option to Compile Modules Containing Double-Byte Shift JIS Japanese Strings. As noted earlier, however, it is better to isolate strings from source files. The NetWare NLM development kit provides message tools which enable software programs (both EXEs and NLMs) to reference strings contained in separate message files.

Japanese Industrial Standards (JIS) defines a double-byte encoding scheme for Japanese characters. Double-byte support for DOS uses Shift-JIS, where JIS characters are shifted up to valid ranges to avoid conflicts. In Shift-JIS, the backslash character (5Ch) is in the valid second byte range of double-byte characters. A 5Ch in a string or character constant denotes an escape sequence in a C source code file. Thus, double-byte Japanese strings get compiled incorrectly when valid second byte 5Ch characters get interpreted as escape sequences. The /J command line compiler option validates single-byte backslash characters, thus solving this problem.
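Where no /J-style option is available, one possible workaround for the occasional embedded string is to spell out the bytes with hex escapes so that the compiler never sees a raw 5Ch byte. The following is only a sketch: the names sjis_hyou and count_raw_5c are hypothetical, and the example character is the Kanji whose Shift-JIS code is 95h 5Ch.

```c
#include <assert.h>
#include <stddef.h>

/* Shift-JIS bytes 95h 5Ch (one Kanji character), written as hex escapes so
 * that no raw backslash byte appears inside the string literal. */
static const char sjis_hyou[] = "\x95\x5c";

/* Count raw 5Ch bytes the way a tool that is not double-byte enabled would
 * see them. A nonzero result marks a string that needs special handling. */
static size_t count_raw_5c(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; ++s) {
        if ((unsigned char)*s == 0x5C)
            ++n;
    }
    return n;
}
```

A build step could run a check like count_raw_5c( ) over message text to flag strings that must not pass through a compiler lacking double-byte support. Isolating messages from the source remains the better practice.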

Unfortunately, the /J option is only available in the NEC version of Borland C++ v2.0, which, of course, only runs on NEC Japanese machines. So all of the message files need to be compiled separately on the NEC machine and then linked in on an industry standard PC compatible, which makes for a difficult make process. A better approach would be to avoid compiling messages.

Server Software Development

Watcom C and MetaWare High C compilers work fine for NetWare NLMs. Use the /zk0 (Japanese) command line switch on the Watcom compiler when compiling C files with double-byte Japanese strings. With the MetaWare compiler, use the -kanji switch when compiling double-byte Japanese message files.

Rule 4: DO NOT use OS/2 family API calls for DOS applications. Instead, use generic DOS versions. Use conditional compilation for OS/2 versions where family API calls are needed.

Many, but not all, of the family API calls compile directly to the BIOS or hardware level, instead of to DOS int 21h functions. The resulting executable code contains many of the anomalies mentioned above. Particularly notorious examples are kbd... calls, vio... calls, and dos... time and date calls.

Rule 5: Use generic API interfaces (MS-Windows, Win32, OS/2 Workplace shell, Novell version of C-Worthy, DOS int 21h, etc.) for screen and keyboard I/O.

Some programs determine the location of video memory and write directly to it. This doesn't work in Japan because the location of video memory is often different. The manner in which data is stored in video memory is also different. For example, NEC PC9800 series computers convert text to a completely different character set and then set the high-order bit (15) for the left half of a character. NEC memory is laid out in two separate blocks--one for characters, and one for attributes. By contrast, industry standard PCs use "character, attribute, character, attribute."

There are several ways to display messages that have become de facto standards within Novell (i.e., MS-Windows, PM, C-Worthy). Programmers should avoid using other special, proprietary user interfaces.

Rule 6: DO NOT assume that logical drive letters correspond directly to certain types of physical drives. Instead, use other generic approaches to obtain needed files.

Some computers, the NEC PC9800 series for example, assign logical drive letters differently than the assumed "A:" for first floppy drive, "B:" for second floppy drive, "C:" for first hard drive partition, etc. In some cases, the boot drive is always assigned to "A:", regardless of the type of drive. This causes problems for installation routines that ask users to insert diskettes into "Drive A:". Other operating systems such as Unix and Macintosh don't use drive letters at all.

Alternatives to hard-coding drive letters include the following:

Use the DOS int 21h IOCTL call (4408h) to scan all drives in order to determine which are fixed and which are removable.
For MS-Windows applications, use GetDriveType().
Scan all known drives for desired files.
Ask users what drive to load files from.
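The "scan all known drives" alternative might be sketched as follows. This is an illustration under stated assumptions, not a finished installer routine: the function names are hypothetical, the probe is passed in as a callback so that it can be replaced, and fake_exists_on_d( ) is only a stand-in used to exercise the loop. On a real DOS machine, probing an empty removable drive can raise a critical error prompt, so production code would also need to deal with that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Probe callback: returns nonzero if the file at `path` exists. */
typedef int (*exists_fn)(const char *path);

/* Default probe: try to open the file for reading. */
static int file_exists(const char *path)
{
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return 0;
    fclose(fp);
    return 1;
}

/* Stand-in probe for demonstration: pretends the file is on drive D: only. */
static int fake_exists_on_d(const char *path)
{
    return path[0] == 'D' && path[1] == ':';
}

/* Try the root of drives A: through Z: and return the first drive letter
 * where `name` is found, or 0 if it is not found on any drive. */
static char find_drive_with_file(const char *name, exists_fn exists)
{
    char path[64];
    char drive;

    assert(name != NULL && exists != NULL);
    for (drive = 'A'; drive <= 'Z'; ++drive) {
        int n = snprintf(path, sizeof path, "%c:\\%s", drive, name);

        if (n < 0 || (size_t)n >= sizeof path)
            return 0;               /* name does not fit in the buffer */
        if (exists(path))
            return drive;
    }
    return 0;
}
```

If the scan returns 0, the program can fall back to the last alternative in the list and ask the user which drive to load files from.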

Rule 7: DO NOT make assumptions about hard disk or floppy disk formats and sizes.

Some Japanese hardware manufacturers have their own proprietary disk format for both hard drives and floppy drives. This includes the location and format of partition tables and the floppy disk data area. Developers of installation programs should not assume that certain sectors hold specific data, that disk partition tables have specific formats, etc. In addition, companies packaging software products onto distribution diskettes should ensure that all files fit within approximately 1.1 MB of disk space since some computers only support 1.2 MB diskettes.

Rule 8: If information normally provided by COUNTRY.SYS is needed, provide alternative methods for obtaining it.

Some Japanese computer manufacturers have their own proprietary OEM versions of DOS, which generally look the same "from the top down." There are a few differences, however. As mentioned in Rule 1, some NEC and Fujitsu versions of DOS include only the PRN device (no LPTx devices) in the DOS device list. Several Japanese DOS versions do not implement all int 21h DOS function calls.

In particular, functions 38h and 65h (country dependent COUNTRY.SYS information) have not been supported consistently across all of the various Japanese ports of DOS. Some Japanese manufacturers apparently find it unnecessary to use COUNTRY.SYS, because they assume that users and/or their software already know they are using proprietary Japanese hardware. A workaround is to provide an alternative means of getting necessary information, such as using environment variables or using properly enabled Novell library calls.
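The environment variable workaround could look something like the sketch below. The variable name APP_DATE_SEP and the "/" fallback are purely hypothetical choices made for illustration; a real product would define and document its own names and defaults.

```c
#include <assert.h>
#include <stdlib.h>

/* Return the value of environment variable `name`, or `fallback` when the
 * variable is not set or is empty. */
static const char *setting_or_default(const char *name, const char *fallback)
{
    const char *value;

    assert(name != NULL && fallback != NULL);
    value = getenv(name);
    return (value != NULL && value[0] != '\0') ? value : fallback;
}

/* Hypothetical example: a date separator that would otherwise come from
 * int 21h function 38h or 65h. */
static const char *date_separator(void)
{
    return setting_or_default("APP_DATE_SEP", "/");
}
```

Because the lookup never touches functions 38h or 65h, it behaves the same on DOS ports that do not implement them.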

Rule 9: DO NOT use the 25th line of the video display.

Japanese vendors use a language conversion front-end processor (FEP) TSR which allows users to enter Japanese characters using alphanumeric keyboards. Many FEPs utilize the 25th line of the display (the bottommost line in 80x25 text mode). Unpredictable side effects, including hanging the machine, can happen when text is displayed on the 25th line of the screen after the FEP is invoked.

An alternative to this rule would be for software to sense how many screen lines are available on a particular machine and then act accordingly. C-Worthy has a ScreenSize( ) function call which is implemented in the machine dependent $RUN.OVL screen and keyboard driver file. ScreenSize( ) will return 24 lines when running on Japanese machines; 25 lines on other machines.

Developers should be aware that Japanese language FEPs may require over 100 KB of memory. Therefore, applications and utilities need to be written as small as possible to avoid loading problems and possible memory allocation errors. For example, developers may wish to test whether their products run in 400 KB of conventional memory or less.

Rule 10: Hardware-isolate ("driverize") software code in places where direct control over hardware must take place.

In the few cases where direct control of hardware is required (network clients, the server OS, print servers, etc.), programmers should hardware-isolate ("driverize") the code so that hardware-dependent pieces of the code can be given to outside developers to adapt to their specific hardware platforms. By doing this, programmers do not need to give away the entire source of a top-secret piece of code just to get it to run on some peculiar piece of hardware.

The hardware-isolated source code should be broken out into separate source code modules/files. These "driverized" modules can then be handed off to trusted third parties who can modify them to work on particular hardware platforms. These modules can then be linked with the code.
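One common way to draw such a boundary in C is a table of function pointers. The sketch below is an assumption about how a "driverized" module might be shaped, not a description of any Novell interface: every name in it is hypothetical, and demo_driver is a stand-in that only records what it was asked to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hardware-dependent operations, isolated behind one table of function
 * pointers. Only this table has to be re-implemented for each platform. */
struct screen_driver {
    int  (*rows)(void);                          /* usable text rows  */
    void (*put_char)(int row, int col, int ch);  /* write one cell    */
};

static const struct screen_driver *active_driver = NULL;

static void install_screen_driver(const struct screen_driver *drv)
{
    assert(drv != NULL && drv->rows != NULL && drv->put_char != NULL);
    active_driver = drv;
}

/* Hardware-independent code: writes to the last usable row, whatever the
 * installed driver reports. Returns the row that was used. */
static int draw_status_char(int ch)
{
    int last_row;

    assert(active_driver != NULL);
    last_row = active_driver->rows() - 1;
    active_driver->put_char(last_row, 0, ch);
    return last_row;
}

/* Stand-in driver for demonstration: reports 24 rows and records the call. */
static int demo_last_row = -1;
static int demo_last_ch = -1;

static int demo_rows(void) { return 24; }

static void demo_put_char(int row, int col, int ch)
{
    (void)col;
    demo_last_row = row;
    demo_last_ch = ch;
}

static const struct screen_driver demo_driver = { demo_rows, demo_put_char };
```

With this shape, the module that defines the driver table is the only source file a third party needs in order to port the program. The caller of draw_status_char( ) never learns where video memory is or how many rows the machine reserves.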

Testing Recommendation

Development groups should have access to at least one Japanese Fujitsu FMR (non-DOS/V) computer to use for testing code. Not only are the FMR architecture and hardware very different from industry standard PCs, but the machine also shows appropriate error messages on the display when particular hardware coding rules are broken. When invalid interrupts are executed on FMR computers, interrupt numbers and CPU register contents are displayed (when using DOS 3.3 or older, and not using MS-Windows).

Japanese Language Programming Issues

The Japanese written language consists of three different character types: phonetic Kana characters (comprising Hiragana and Katakana), Roman characters (Romaji), and several thousand Chinese (Kanji) characters. The different character types may be mixed and matched within sentences, or even within words. Kana and Romaji are either single or double-byte characters. Single-byte characters require only one screen column to be displayed, whereas double-byte characters require two screen columns.

In Shift JIS (which corresponds to single-byte code page 897 or 932), single-byte Katakana characters are mapped within the range A1-DF hex. Double-byte Kana, Romaji, and Kanji characters have a first byte within the ranges 81-9F or E0-FC hex. The second byte can be any value except 7F. Current implementations of second byte character tables are between 40 and FC. DO NOT hardcode these values into products. Instead, use NetWare or OS provided API interfaces to handle text.
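One way to keep these values out of the program logic is to drive the lead-byte test from a table supplied at run time. The sketch below assumes the table arrives as (low, high) byte pairs terminated by a 0,0 pair. That layout and the function name are assumptions made for illustration, and the demo table simply copies the ranges quoted above so that the example can be exercised. A real product would fill the table from an OS or NetWare call.

```c
#include <assert.h>
#include <stddef.h>

/* Demo data only: the Shift-JIS first-byte ranges quoted in the text, as
 * (low, high) pairs terminated by 0,0. Real code would obtain the table
 * from the operating system or a library call at run time. */
static const unsigned char demo_sjis_lead_ranges[] = {
    0x81, 0x9F, 0xE0, 0xFC, 0x00, 0x00
};

/* Return nonzero if `c` falls inside any range in the table, meaning that
 * it is the first byte of a double-byte character. */
static int is_lead_byte(const unsigned char *ranges, unsigned char c)
{
    assert(ranges != NULL);
    for (; ranges[0] != 0 || ranges[1] != 0; ranges += 2) {
        if (c >= ranges[0] && c <= ranges[1])
            return 1;
    }
    return 0;
}
```

Single-byte Katakana (A1-DF hex) and ASCII fall outside both ranges, so they are reported as single-byte characters.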

Novell Enabling Guidelines II (Novell, Inc., 1991) has some good programming guidelines for handling multi-byte strings (section 2.3.4 Double-byte considerations). Here are some additional issues that should be emphasized.

Backslash Searches in Multi-Byte Strings

The 40-FC second byte range includes the backslash character "\" (0x5C) which is used as a currency symbol as well as a path delimiter in most Asian countries. Routines which blindly search for 5Ch as a path delimiter will not work correctly for Asia. The following is the proper approach for doing path delimiter searches.

Scan the path from the beginning.
Test the character to determine its type (single- or double-byte).
If the first byte of a double-byte character is found then skip the next byte, regardless of what it is, before continuing the search for path delimiters.
If the character is single-byte, do the path delimiter (0x5C) check.

This method of scanning strings is called double-byte enabled parsing. It is very useful in solving other double-byte language programming problems as well.
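The four steps above can be sketched in C as follows. This is a minimal illustration, not Novell library code: the function names are hypothetical, and the lead-byte table is demo data using the Shift-JIS ranges given earlier (a product should obtain the ranges from the OS or NetWare rather than hardcode them).

```c
#include <assert.h>
#include <stddef.h>

/* Demo data only: Shift-JIS first-byte ranges as (low, high) pairs ending
 * with 0,0. Real code should get these from the OS or a library call. */
static const unsigned char demo_sjis_lead_ranges[] = {
    0x81, 0x9F, 0xE0, 0xFC, 0x00, 0x00
};

static int is_lead_byte(const unsigned char *ranges, unsigned char c)
{
    for (; ranges[0] != 0 || ranges[1] != 0; ranges += 2) {
        if (c >= ranges[0] && c <= ranges[1])
            return 1;
    }
    return 0;
}

/* Return a pointer to the last single-byte backslash in `path`, or NULL if
 * there is none. A 5Ch that is the second byte of a double-byte character
 * is never treated as a path delimiter. */
static const char *find_last_path_delim(const char *path,
                                        const unsigned char *ranges)
{
    const char *last = NULL;
    const char *p;

    assert(path != NULL && ranges != NULL);
    for (p = path; *p != '\0'; ++p) {           /* scan from the beginning */
        if (is_lead_byte(ranges, (unsigned char)*p)) {
            if (p[1] == '\0')
                break;                          /* truncated character     */
            ++p;                                /* skip the second byte    */
        } else if (*p == '\\') {
            last = p;                           /* single-byte delimiter   */
        }
    }
    return last;
}
```

For a path whose final component is the double-byte character 95h 5Ch, a blind search for 5Ch would stop on the second byte of that character, whereas this routine reports the backslash that precedes it.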

Uppercasing Second Bytes of Double-Byte Characters

One of the most common coding errors occurs when programs blindly convert characters from lower to uppercase. Lowercase ASCII characters are in the valid second byte range of double-byte Japanese characters. The whole double-byte character changes if the second-byte gets uppercased.

The solution is to double-byte parse the string to be uppercased, only uppercasing single-byte lowercase ASCII characters.
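A double-byte enabled uppercase routine might look like the sketch below. The names are hypothetical, only the ASCII range a-z is converted, and the lead-byte table is demo data based on the Shift-JIS ranges given earlier. Treat it as an illustration of the parsing pattern rather than a replacement for an enabled library call.

```c
#include <assert.h>
#include <stddef.h>

/* Demo data only: Shift-JIS first-byte ranges as (low, high) pairs ending
 * with 0,0. Real code should get these from the OS or a library call. */
static const unsigned char demo_sjis_lead_ranges[] = {
    0x81, 0x9F, 0xE0, 0xFC, 0x00, 0x00
};

static int is_lead_byte(const unsigned char *ranges, unsigned char c)
{
    for (; ranges[0] != 0 || ranges[1] != 0; ranges += 2) {
        if (c >= ranges[0] && c <= ranges[1])
            return 1;
    }
    return 0;
}

/* Uppercase single-byte ASCII letters in place. Both bytes of every
 * double-byte character are left exactly as they were. */
static void dbcs_upper(char *s, const unsigned char *ranges)
{
    assert(s != NULL && ranges != NULL);
    for (; *s != '\0'; ++s) {
        unsigned char c = (unsigned char)*s;

        if (is_lead_byte(ranges, c)) {
            if (s[1] == '\0')
                break;              /* truncated character: stop here   */
            ++s;                    /* skip the second byte untouched   */
        } else if (c >= 'a' && c <= 'z') {
            *s = (char)(c - 'a' + 'A');
        }
    }
}
```

In a string containing the double-byte character 82h 61h, the second byte has the same value as a lowercase ASCII "a". The routine leaves it alone while still uppercasing the single-byte letters on either side.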

Restricting Valid String Characters

Another common error is restricting acceptable characters (for file names, user names, etc.) to the lower ASCII range (7 bit). Programs that check for control characters or that assume only lower ASCII characters to be valid in filenames also cause problems. All of the single-byte Katakana and the first bytes of valid double-byte characters fall in the extended ASCII range. Programs that accept only lower ASCII characters prevent the use of the double-byte kanji characters.

Note: For NetWare v3.x and v4.x, server names and volume names must be lower ASCII. No double-byte characters are allowed for server and volume names, since double-byte characters may cause problems with internetwork routers and may not display correctly on other "English" servers.

The double-byte space character is another problem. File names should NOT include double-byte space characters. (A double-byte underscore character may be used in its place.)

Truncating Strings

Truncating strings is tricky business in the Japanese world of programming, due to the possibility of splitting double-byte characters. Right or left scrolling (skewing) involves truncation on both ends of a sliding window view of a string. Be sure to truncate at character boundaries. Sometimes double-byte character splitting is unavoidable, as in a constant width skewing situation. In this case, blank out the half of the character that would be otherwise visible with a single-byte space character (20h).
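Truncating at a character boundary can be reduced to one question: how many bytes fit without splitting a character? The sketch below answers it under the same assumptions as before: the names are hypothetical, and the lead-byte table is demo data using the Shift-JIS ranges given earlier.

```c
#include <assert.h>
#include <stddef.h>

/* Demo data only: Shift-JIS first-byte ranges as (low, high) pairs ending
 * with 0,0. Real code should get these from the OS or a library call. */
static const unsigned char demo_sjis_lead_ranges[] = {
    0x81, 0x9F, 0xE0, 0xFC, 0x00, 0x00
};

static int is_lead_byte(const unsigned char *ranges, unsigned char c)
{
    for (; ranges[0] != 0 || ranges[1] != 0; ranges += 2) {
        if (c >= ranges[0] && c <= ranges[1])
            return 1;
    }
    return 0;
}

/* Return the largest byte count not exceeding `max_bytes` that ends on a
 * character boundary of `s`. */
static size_t dbcs_safe_length(const char *s, size_t max_bytes,
                               const unsigned char *ranges)
{
    size_t i = 0;

    assert(s != NULL && ranges != NULL);
    while (s[i] != '\0') {
        size_t width = 1;

        if (is_lead_byte(ranges, (unsigned char)s[i])) {
            if (s[i + 1] == '\0')
                break;              /* dangling first byte: drop it     */
            width = 2;
        }
        if (i + width > max_bytes)
            break;                  /* next character would not fit     */
        i += width;
    }
    return i;
}
```

In a constant-width window, a caller can compare the returned length with the window width and pad the difference with single-byte spaces (20h), which matches the blanking approach described above.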

Use of Text Mode Box Characters

The single-byte line draw border characters on the Japanese machines may not be the same code points as in extended ASCII (U.S. code page 437 or multilingual code page 850). In fact in Japan, the "line draw" characters may be different from computer type to computer type.

For C-Worthy applications, the $RUN.OVL screen and keyboard driver displays line draw characters correctly. If specific characters are desired, use the standard offsets (defined in nwuform.h or nutform.h) into the $RUN.OVL _formChars table to get the appropriate values for the particular machine you are running on. The C-Worthy background character and the arrow characters are also available.

Be aware that single-line box characters and double-line box characters are not all supported on some Japanese PC platforms. If you specify the single-line characters in the _formChars table, you may get double-line characters and vice versa.

Conclusion

Companies must either build software that can run on Japanese computer hardware unaltered or create software that can be easily modified for Japanese hardware through hardware-isolation ("driverization").

The best way to enable software is to do it while writing the code the first time. In cases where companies are building on existing software, the next best alternative is to enable the code while making changes for the next version. In either case, the time to enable software is now.

* Originally published in Novell AppNotes