How to Use the ODBC Driver with NDS, Part 3: Understanding XML

Joshua Parkin
Senior Programmer
NamSys, Inc.
jparkin@home.com

01 Apr 2001

This AppNote, the third in a series, explains what XML is and introduces the basics of its syntax and naming conventions. It also describes how to build a well-formed XML document and display XML files. It concludes with a look at converting data stores to XML.

Introduction
What Is XML?
XML as a Whole
Naming Conventions for XML Tags
Building an XML Document
Displaying XML Files
Converting Data Stores to XML
Conclusion

Introduction

As discussed in Part 2 of this series (see "How to Use the ODBC Driver with NDS, Part 2" in the December 2000 issue of Novell AppNotes), the ODBC driver for NDS does not support ADO (Microsoft's Active Data Objects programming interface for data access). This is a dilemma when you want to populate your Web pages with NDS data without using Novell Script for NetWare or an ActiveX control. However, you can use another method: you could populate your Web page by creating an XML (eXtensible Markup Language) file to store the information you want from the NDS tree. This method is plausible because the tree data doesn't really change that often. Unless you are building statistics on volume usage, print queue monitoring, or accessed files, you can use this method.

This AppNote and the next one will cover how to use the ODBC driver with Visual Basic to create an XML file. Part 3 gives you a brief comparison of markup languages and XML. It also discusses important factors like namespaces and the necessity for well-formed documents. Part 4 will help you build a well-formed XML file using SQL queries passed through the ODBC driver. This XML program will be configurable, depending on your query and your tree layout, allowing you to build XML files with little to no hard-code changes.

The ODBC Driver for NDS is available as a free download from http://developer.novell.com. Membership in Novell's DeveloperNet program is free (at the electronic level), and I recommend that you sign up to take advantage of the many resources available to Novell's DeveloperNet members. The NDS ODBC driver runs on any Win32-based computer; it does not run on the NetWare server itself.

What Is XML?

Undoubtedly, you have heard about XML by now. Books cover it extensively; Web sites cover the developments 24 hours a day; newsgroups and chat lines offer numerous discussions and plenty of help. Novell is releasing software that allows you to work with XML. For example, Novell's DirXML allows you to manipulate tree information using XML. But what is XML?

To answer, I will first tell you what it isn't. XML, by itself, is not a way to present data visually. There are many ways to display the data, including Microsoft's DOM (Document Object Model), XSL (eXtensible Stylesheet Language), or one of the many Java components that come with IBM's WebSphere. Nor does XML process the data by itself. Currently, you cannot write a program entirely out of XML. You must use the various X-languages like XSL, XLink, Schemas, or DTDs to work with the XML file. Are you still with me?

So what can you do with XML? You can use XML to store data in a file that can be interchanged among various operating systems and computers. XML has been adopted with open arms by everyone and their cousin's boyfriend. This is the first time that something like this has happened since the world-wide adoption of TCP/IP for networks. It's pretty exciting when you consider we are at the ground floor of this change. XML 1.0 became a W3C Recommendation just a short time ago (in February 1998), and since then its popularity has skyrocketed. You will see more and more content on the Internet that is XML-based, whether you see the XML document behind it or not.

As of the time this article was written, Microsoft's Internet Explorer supports XML and XSL, and Netscape is working on a browser to support this format. Since XML is a relatively new format, legacy browsers such as IE 4.0 and Netscape's 4.x browsers do not support style sheets. To get around this, you can do server-side scripting to handle the processing of XML files and display the data with another markup language like HTML (HyperText Markup Language).

What Is a Markup Language?

You can use markup languages to mark up data so that an application that reads it knows what to do with it. Take a look at how some RTF (Rich Text Format) markup code is displayed in Figure 1.

Figure 1: An example of RTF markup code.

The RTF markup code that produced this display is:

{\rtf1\ansi\ansicpg1252\deff0\deflang4105{\fonttbl {\f0\fnil\fcharset0 Times New Roman;}} \uc1\pard\ulnone\f0\fs20 Here is an example of RTF mark up. This is \b bold\b0 , \i Italic\i0 , \ul Underline\ulnone .\par \par }

At first glance this code might not make much sense. You can probably understand parts of it: \b and \b0 turn bold on and off, \i and \i0 turn italics on and off, and so on. But don't worry too much about the syntax here; it's just to give you an example of a markup language. The main thing to notice is that the code itself does nothing but tell the interpreting program how to display the text. There is absolutely no logic or processing code, just simple markup.

Another good example of a markup language is HTML. With HTML you might have a file that looks like this:

<HTML>
<body>
<h1>This is a sample file being marked up</h1>
<p>This is just to show you some text that is being marked up by HTML</p>
</body>
</HTML>

In HTML, everything enclosed in the brackets (< >) is called a tag. With HTML, the tags are used to give the browser commands on how to display the text between the tags. Tags are opened with a <tag name> code and closed with a </tag name> code. The tags in this example tell the Web browser how to display the text. Not all tags follow this command format. For example, the <hr> command tells the browser to draw a horizontal line across the Web page.

Browsers can react to HTML code differently. For example, if you forget to close a table with the </table> command, Internet Explorer will still show the table. But Netscape does not; you get a blank page. For this reason, it is considered bad form not to close the tags as required. This comes into play even more so with XML; that is why you hear references to "well-formed" XML, which we will cover a bit later.

HTML has set markup tags, also known as commands. Don't confuse these with programming commands. Programming involves logic processing and variable manipulation. Markup languages aren't designed for this. With HTML, the markup tags tell the browser how to display the data. Other than that, they don't do much, nor do they say much about what kind of text is carried within the tags.

For example, an address in HTML code would look like this:

<h1>Joshua Parkin</h1>
<p>211 Somewhere Lane</p>
<p>Somewhere Land</p>
<p>(321) 987-1234</p>

Now, what if I wanted to use a query to find out the name in this HTML file? How would I do that? There is no definition of where my name is in the HTML file. Therefore, using this as a data store would be rather useless without extensive scripting. Consequently, if I wanted to transfer this data to another machine and process it by another method, this form obviously wouldn't be too appropriate. Someone would have to enter the data manually on the other machine.

For example, if I used HTML code to build a personal profile of myself that included my name, address, telephone number, and so on, and tried to use it with all of the forms I sometimes have to fill out on the Internet, the profile would be useless. The receiving server wouldn't know what to do with the profile. That leads us to XML.

XML as a Whole

With XML you have tags, much like in HTML. These tags look the same, except they have one very important feature: they don't represent display markup. You could have an XML tag named 'p' if you wanted to. But when viewed as an XML file, it would not provide the same formatting feature as the HTML 'p' tag.

You can create rather large XML files that act similarly to an NDS tree. Large XML files have roots and nodes, much like the NDS tree. First, let's look at the layout and naming conventions for XML files.

Layout of an XML File

XML must obey certain layout laws to produce a well-formed document. XML parsers and the DOM require documents to be well-formed in order to understand them. In a well-formed document, you basically have tags and the content those tags enclose; a pair of tags together with its content is called an element.

As an example of this, look again at our sample HTML code:

<h1>Joshua Parkin</h1>
<p>211 Somewhere Lane</p>
<p>Somewhere Land</p>
<p>(321) 987-1234</p>

In this code, <h1> and </h1> are tags. The text between these tags is the content of the element. Therefore, if we want to make this a more meaningful-looking document, we must change the tags to something like this:

<Name>Joshua Parkin</Name>
<Address1>211 Somewhere Lane</Address1>
<Address2>Somewhere Land</Address2>
<Phone>(321) 987-1234</Phone>

But there is more. We have created some nodes, but what about a root? In XML there must be a single root tag, within which all other tags and their content are stored. In this case, let's call the root tag 'Personal'. The file would then look like this:

<Personal>
  <Name>Joshua Parkin</Name>
  <Address1>211 Somewhere Lane</Address1>
  <Address2>Somewhere Land</Address2>
  <Phone>(321) 987-1234</Phone>
</Personal>
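To see why descriptive tags matter, here is a minimal sketch in Python. Python and its standard xml.etree.ElementTree module are my choice for illustration only; they are not part of this series, which uses Visual Basic. The sketch parses the Personal document above and looks up the name directly by its tag, which is exactly the query that was impractical with the HTML version.

```python
import xml.etree.ElementTree as ET

# The sample document from the text, held in a string for illustration.
PERSONAL_XML = """<Personal>
  <Name>Joshua Parkin</Name>
  <Address1>211 Somewhere Lane</Address1>
  <Address2>Somewhere Land</Address2>
  <Phone>(321) 987-1234</Phone>
</Personal>"""

root = ET.fromstring(PERSONAL_XML)

# Because each tag says what its data is, a lookup by tag name is enough.
name = root.findtext("Name")
phone = root.findtext("Phone")

print(root.tag)
print(name)
print(phone)
```

Any XML parser on any platform could answer the same question, which is the point of using XML as an interchange format.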

Remember, you can name these tags almost anything you like, as long as each name starts with a letter or an underscore and contains no spaces. However, it is recommended that you make the tag names somewhat representative of their contents. That way you can read them at a later time and understand them. Also, when passing the file on, the next user will be able to understand the data inside the tags.

The <?xml?> Tag

At the very top of the file is a tag that tells you what version of XML it is. This tag looks like this:

<?xml version="1.0" ?>

Note that "xml" and "version" are in lowercase. This is important: XML is case-sensitive, and parsers such as the one in Microsoft IE 5.0 expect the tag to be in lowercase. As of now, there is only XML version 1.0. However, many revisions are currently being submitted to the body that governs XML (the World Wide Web Consortium, or W3C), so other versions will probably be released as more things are added.

The <?xml?> tag can also contain other information. For example, you can define the character encoding that is being used in your file. The format for this command is:

<?xml version="1.0" encoding="UTF-8" ?>

This command says that you are using UTF-8, the 8-bit encoding of Unicode. (For more information on international language sets and the current revisions in process, go to http://www.w3.org/International/.)

Values stored in the <?xml?> tag itself, such as the version and the encoding, must be enclosed within quotation marks in order for the tag to be well-formed. In HTML, it is suggested that you put attribute values inside quotation marks, but for the most part it is not that important. In XML, however, it is crucial to have a well-formed document.
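As a rough illustration of the encoding declaration, the following Python sketch writes a small document with the <?xml?> line included. It assumes Python 3.8 or later (for the xml_declaration argument of ElementTree's tostring function); the name containing a non-ASCII letter is hypothetical and only there to show the encoding at work.

```python
import xml.etree.ElementTree as ET

root = ET.Element("Personal")
# Hypothetical name with a non-ASCII letter, to exercise the encoding.
ET.SubElement(root, "Name").text = "Zo\u00eb Parkin"

# xml_declaration=True asks the serializer to emit the <?xml ...?> line.
data = ET.tostring(root, encoding="UTF-8", xml_declaration=True)

# The result is bytes; the non-ASCII letter occupies two bytes in UTF-8.
print(data.decode("utf-8"))

# A parser reads the declared encoding and recovers the original text.
round_trip = ET.fromstring(data)
print(round_trip.findtext("Name"))
```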

Well-Formed?

In an XML file, being well-formed helps the parsers process the document appropriately. To give you a good example of being well-formed, let's look again at HTML. In HTML, you can have the following statement:

<p>This text is <b>bold, <I></b>italic</I> and <u>underlined.</p></u>

This statement will work without a problem in most browsers. But from an XML standpoint, it is not well-formed. In XML, a tag that is opened inside another tag must also be closed inside that tag. So, in order to be well-formed, the above example would have to be changed to:

<p>This text is <b>Bold,</b><I>italic</I> and <u>underlined</u>.</p>

In this example, the <p> is the root tag; <b>, <I>, and <u> are all nodes inside that root. The nodes and the text inside the nodes are the elements. Each node (or tag) is closed before the next one appears. All are happy.

Another well-formed example would look like this:

<p>This text is <b><I><u>Bold, italicized, and underlined</u></I></b></p>

Notice that the most recently opened tag is always closed first, before any tag that was opened ahead of it. This is the basis of being well-formed.

One more point: in XML, much as in HTML, you must replace certain special characters with entity references to use them in an element's text. These characters are listed in the following table.
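One simple way to test whether a document is well-formed is to hand it to a parser and see whether it complains. The sketch below does this in Python with the standard xml.etree.ElementTree module (my choice for illustration, not the tool used in this series); the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The overlapping example from the text: <I> opens inside <b> but closes outside it.
overlapping = "<p>This text is <b>bold, <I></b>italic</I> and <u>underlined.</p></u>"

# The corrected example: every tag closes inside the tag that contains it.
nested = "<p>This text is <b>Bold,</b><I>italic</I> and <u>underlined</u>.</p>"

print(is_well_formed(overlapping))
print(is_well_formed(nested))
```

A browser may forgive the first string when it is served as HTML, but an XML parser rejects it outright.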

Character	XML
&	&amp;
<	&lt;
>	&gt;
"	&quot;
'	&apos;
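If you generate XML from a program, you will normally let a library do these substitutions for you. As a hedged example, Python's standard xml.sax.saxutils.escape function replaces &, < and > on its own, and accepts a dictionary for any extra characters; the wrapper name escape_text below is hypothetical.

```python
from xml.sax.saxutils import escape

# escape() always handles &, < and >; the two quote characters are extras.
QUOTES = {'"': "&quot;", "'": "&apos;"}

def escape_text(value):
    """Replace all five special characters with their XML entity references."""
    return escape(value, QUOTES)

sample = "Tom & Jerry's \"<best>\" show"
escaped = escape_text(sample)
print(escaped)
```

The ampersand is replaced first, so the ampersands introduced by the other entity references are not escaped a second time.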

Naming Conventions for XML Tags

XML tags can be named almost anything you like. There are few real laws regarding this. The only thing I would recommend is that you follow some sort of understandable layout. But again, that is entirely up to you. Of course, this flexibility with tag names leads to other problems. What if you want to merge your XML file with other XML files that contain the same tag names? That is where namespaces come into play.

Namespaces

Namespaces are described in the root tag. The namespace can be anything you like. Most people use Web addresses or URLs. For example, look again at our HTML example for my address:

<h1>Joshua Parkin</h1>
<p>211 Somewhere Lane</p>
<p>Somewhere Land</p>
<p>(321) 987-1234</p>

We have three <p> tags here. We can use namespaces to separate them, but first we need a root tag. So let's insert a <Personal> root tag and then build the same example with namespaces. In each declaration, the name after "xmlns:" is the prefix you put in front of a tag name, and the quoted value is the namespace that the prefix stands for:

<Personal xmlns:address1="bobo" xmlns:address2="xena">
  <h1>Joshua Parkin</h1>
  <address1:p>211 Somewhere Lane</address1:p>
  <address2:p>Somewhere Land</address2:p>
  <Phone>(321) 987-1234</Phone>
</Personal>

In this example, I changed the third <p> tag to <Phone> because I thought it looked better. Alternatively, I could write this example so that it's more understandable by doing the following (note that each tag name is prefixed with the name declared after "xmlns:"):

<Personal xmlns:address1="bobo" xmlns:address2="xena">
  <name>Joshua Parkin</name>
  <address1:address>211 Somewhere Lane</address1:address>
  <address2:address>Somewhere Land</address2:address>
  <Phone>(321) 987-1234</Phone>
</Personal>

Note that I still use the namespaces, because I am using two address tags.
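To show how a parser tells two same-named tags apart, here is a small Python sketch using the standard xml.etree.ElementTree module (an illustration of my own, not the approach used in this series). It assumes the document declares the prefixes address1 and address2 for the namespaces "bobo" and "xena" and uses those prefixes on the two address tags.

```python
import xml.etree.ElementTree as ET

DOC = """<Personal xmlns:address1="bobo" xmlns:address2="xena">
  <name>Joshua Parkin</name>
  <address1:address>211 Somewhere Lane</address1:address>
  <address2:address>Somewhere Land</address2:address>
  <Phone>(321) 987-1234</Phone>
</Personal>"""

root = ET.fromstring(DOC)

# ElementTree expands each prefix into its namespace, written {namespace}tag.
tags = [child.tag for child in root]
print(tags)

# The prefixes used in a query are local shorthand; only the namespace matters.
ns = {"a1": "bobo", "a2": "xena"}
street = root.findtext("a1:address", namespaces=ns)
town = root.findtext("a2:address", namespaces=ns)
print(street)
print(town)
```

Both tags are called "address", yet the parser keeps them apart because they belong to different namespaces.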

Empty Tags

When a tag in XML is empty, you do not have to write a separate opening and closing tag with nothing between them. You can just do the following:

<TagName/>

This tells the parser that the tag is empty. An empty tag can still carry values inside the tag itself, as attributes, instead of as text between an opening and a closing tag. For example:

<TagName Siam="Thailand"/>

Here TagName has an attribute named "Siam" whose value is "Thailand", but the tag is still empty. Therefore, there is no extra text that is added onto this. If you save an XML file using ADO, you will see that this is how your file will be laid out; it is attribute-based, not element-based. That is what the application we will work on in Part 4 will do.
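A short Python sketch (using the standard xml.etree.ElementTree module, chosen by me for illustration) shows how a parser reports such a tag: the value arrives as a name and value pair on the tag, and there is no text content at all.

```python
import xml.etree.ElementTree as ET

# The empty tag from the text, carrying its value inside the tag itself.
empty = ET.fromstring('<TagName Siam="Thailand"/>')

print(empty.attrib)       # the name/value pairs stored on the tag
print(empty.get("Siam"))  # look up one value by name
print(empty.text)         # None: nothing sits between an open and close tag

# The long form with separate open and close tags parses the same way.
long_form = ET.fromstring('<TagName Siam="Thailand"></TagName>')
```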

Building an XML Document

Let's assume we want to be able to add information about more people to this XML document. To do this, we would have to create another root tag: we'll call it 'People'. We will then have to move the namespace declarations so that they become part of the new root tag:

<People xmlns:address1="bobo" xmlns:address2="xena">

Now we can begin to add child or leaf nodes to this parent or root node. With the first person in place, we could then add information for another person, like this:

<People xmlns:address1="bobo" xmlns:address2="xena">
  <Personal>
    <name>Joshua Parkin</name>
    <address1:address>211 Somewhere Lane</address1:address>
    <address2:address>Somewhere Land</address2:address>
    <Phone>(321) 987-1234</Phone>
  </Personal>
  <Personal>
    <name>Greg Pearson</name>
    <address1:address>321 Elsewhere Lane</address1:address>
    <address2:address>Elsewhere Land</address2:address>
    <Phone>(555) 976-3279</Phone>
  </Personal>
</People>

Notice that the layout and format are the same for each entry. Because the next node is a completely separate entry, we need to make sure all the previously started nodes (except for the root node) are closed. Then we just keep adding records this way. I have yet to find any limit to the size of the files or the number of entries. The only consideration I suggest you remember is that larger files take longer to parse, so size does matter for efficiency reasons.
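Adding records by hand gets tedious, so documents like this are usually generated. The following is a minimal Python sketch with the standard xml.etree.ElementTree module, not the Visual Basic program that Part 4 builds; the function name build_people and the list of tuples are hypothetical, and the namespaces "bobo" and "xena" are assumed to use the prefixes address1 and address2.

```python
import xml.etree.ElementTree as ET

# Ask the serializer to use these prefixes for the two namespaces.
ET.register_namespace("address1", "bobo")
ET.register_namespace("address2", "xena")

# Hypothetical in-memory records: name, address line 1, address line 2, phone.
PEOPLE = [
    ("Joshua Parkin", "211 Somewhere Lane", "Somewhere Land", "(321) 987-1234"),
    ("Greg Pearson", "321 Elsewhere Lane", "Elsewhere Land", "(555) 976-3279"),
]

def build_people(records):
    """Build a People root with one Personal node per record."""
    root = ET.Element("People")
    for name, line1, line2, phone in records:
        person = ET.SubElement(root, "Personal")
        ET.SubElement(person, "name").text = name
        ET.SubElement(person, "{bobo}address").text = line1
        ET.SubElement(person, "{xena}address").text = line2
        ET.SubElement(person, "Phone").text = phone
    return root

people = build_people(PEOPLE)
xml_text = ET.tostring(people, encoding="unicode")
print(xml_text)
```

Because the library opens and closes every node itself, the output is well-formed no matter how many records are added.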

Displaying XML Files

Now that we have an XML document, we really need to be able to display it some way. There is an X-language designed for this purpose called XSL, or eXtensible Stylesheet Language. This language is different from XML in that it is designed to let you display the XML data in the same way, no matter on which system it operates. It also allows you to change something in the style sheet and have that change carry through to every page that uses the style sheet. As with data store-enabled Web pages, you generally have numerous pages accessing the same data in the store. Style sheets give XML programmers a way to display the data the same way throughout the application. Therefore, when a change (such as font size) is made to a style sheet, that change is automatically apparent in all the pages viewing the XML data.

Unfortunately, the only browser that currently has a built-in parser (at the time of this writing) is Internet Explorer 5.0, which handles XSL. Netscape is currently working on it, and their release should be available in the coming months.

An XSL file is a text file that contains information such as display information (HTML tags), sorting information, and a myriad of other commands. Using an XSL file, you can grab selected objects from the XML file, or you can read and display the whole thing. XSL can be quite a discussion topic, and I will not get into it here. I just wanted you to know it exists and to watch for the future developments of this language.

A good reference for learning how to do server-side scripting in order to display XML data is Professional Active Server Pages 3.0 by Wrox Publishing, ISBN 1-861002-61-0. This book includes a great case study on building an XML-based newspaper. This example uses Microsoft's DOM to traverse the XML file and grab the appropriate information.

Converting Data Stores to XML

One of the easiest ways to convert data stores to XML is to use ADO. To use ADO, you simply open the record set and use the following command:

Rs.save "C:\somewhere\data.xml", adPersistXML

This will save the data in XML format, as shown below:

<z:row fname="Joshua" lname="Parkin" address1="123 Somewhere Lane" address2="Somewhere Land" phone="(321)987-4321" />
<z:row fname="Greg" lname="Pearson" address1="321 Elsewhere Lane" address2="Elsewhere Land" phone="(555)976-3279" />

This layout is slightly different from what we are used to. ADO stores every field as an attribute inside the tag, and each record is a separate tag. Is this wrong? No, but it's something you need to be aware of.
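If you receive data in this row-per-tag layout and would rather have one tag per field, the conversion is mechanical. The Python sketch below (standard xml.etree.ElementTree module, my own illustration) makes two assumptions that are not shown in the fragment above: that the z prefix is bound to the namespace "#RowsetSchema", and that the rows sit inside a wrapper tag, here given the hypothetical name data. The function name rows_to_elements is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed wrapper and namespace binding; only the z:row lines come from the text.
ADO_XML = """<data xmlns:z="#RowsetSchema">
  <z:row fname="Joshua" lname="Parkin" address1="123 Somewhere Lane"
         address2="Somewhere Land" phone="(321)987-4321" />
  <z:row fname="Greg" lname="Pearson" address1="321 Elsewhere Lane"
         address2="Elsewhere Land" phone="(555)976-3279" />
</data>"""

def rows_to_elements(ado_text):
    """Turn each z:row into a Personal node with one child tag per field."""
    source = ET.fromstring(ado_text)
    people = ET.Element("People")
    for row in source.iter("{#RowsetSchema}row"):
        person = ET.SubElement(people, "Personal")
        for field, value in row.attrib.items():
            ET.SubElement(person, field).text = value
    return people

converted = rows_to_elements(ADO_XML)
print(ET.tostring(converted, encoding="unicode"))
```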

Since the Novell ODBC driver for NDS does not support ADO, we cannot use the ODBC driver directly in Web pages, and we cannot use the above method of data conversion either. We need another way to convert the data we want into an XML file, and Part 4 is going to show you how to do exactly that. It's going to go into the realm of moving data through ODBC and DAO into XML (prepare to enter acronym heaven).

Conclusion

There you have it! You should now have a fairly good understanding of what XML is and how to build well-formed XML documents. Always remember that you must have a well-formed XML document, or else parsing errors will abound.

Remember, too, that namespaces help ensure that if someone else parses your XML file into theirs, you will not have to worry about potential conflicts with your tag names. Namespaces can be anything at all, from URLs to the names of favorite foods. This should give you plenty of options.

In Part 4 of this series, we will explain how to use the ODBC driver to retrieve data from the tree and store it in a well-formed XML file. This data can then be passed to various parsers for report building, conversion, or whatever you want.

* Originally published in Novell AppNotes

Disclaimer

The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.