An Object-Oriented Approach to Automated Information Mark-Up

AL YOUNG
Senior Research Engineer
Developer Information

JAY M. JOHNSON
Software Engineer
Electronic Support

01 Dec 1996


Addresses information mark-up for simultaneous distribution in multiple formats. Third article in a series focusing on object-oriented design and development.

Introduction

Editor's note: This article is the third in a series focusing on object-oriented design and development. DeveloperNet members faced with publishing information simultaneously in different formats (e.g., distribution on the Web, CD-ROM, hard copy, and other media) may find this article particularly useful. Code samples are provided in Java and Smalltalk; however, the discussion applies to any OO implementation.

Information mark-up for simultaneous distribution in multiple formats is a problem over which information managers either go gray or get scalped. The leviathan of marking up information in multiple formats appears in numerous forms. This article uses the list of constraints in Figure 1 to exemplify both the problem and a possible solution.

Figure 1: Constraints governing information mark-up by a hypothetical information management system.


  1. Multiple formats
     a. Various mark-up languages
     b. Various elements
        1. Rearrangement
        2. Inclusion/Omission
        3. Content presentation (including hyperlinks)

  2. Mark-up must not be embedded in stored data

The first constraint (multiple formats) means that a single data set must be marked up in more than one mark-up language. Furthermore, regardless of the mark-up language used to publish a document, documents may vary in terms of the order in which document contents are presented, the inclusion or omission of content elements from one format to another, and the manner in which each element's content is presented.

The second constraint (mark-up must not be embedded in stored data) represents an attempt to insulate data sets from technological upheaval. Mergers can cause worlds to collide such that enormous data sets must be ported all over the planet. The mandate of a newly appointed executive can bring gigabytes to the brink of a cauldron from which harried practitioners are supposed to distill meaningful information that answers customer questions before they arise. And then there's the problem of change for the sake of change.

Each of these circumstances is a very good reason for keeping data as far as possible from proprietary formats and from dependence upon particular tools.

Figure 2 presents a hypothetical context in which the implementation of an information management system must comply with the foregoing list of constraints.

Figure 2: Hypothetical publishing environment.

Several assumptions and constraints delimit the discussion presented in this article:

  • Data integrity is assumed, based on functionality provided by, and the use made of, data management application(s) represented in Figure 2. For example, the relational schema in which data is stored in the database may require a relationship between Information and Information_Distribution_Classification (see Young 1). If such a relationship does not exist for a particular information item, a query of the database may fail to return the particular item. These kinds of data-integrity issues are resolved by data management applications.

  • Actual distribution of marked-up information is outside the scope of this article (i.e., transmission of information from point A to point B).

  • The hypothetical information system delegates to a memory-resident instance of Distribution the responsibility for querying the database for information qualifying for distribution to a particular destination. Each distribution object holds onto several formats in which qualified information can be marked up. This article does not describe how a distribution object queries the database, nor does it discuss how a distribution object selects the format(s) used to mark up qualified information.

  • Logging and reporting of distribution activities are outside the scope of this article.

In Figure 2, data from the relational database must be available for output to or interaction with seven applications or destinations. For example, between the web browser and the database appearing in Figure 2, a format might be specified in HTML. Between the database and the hard-copy journal another format might be specified in SGML, or some other mark-up language.

The format between the database and the CD-ROM might require that data be marked up in a language suited to the search engine distributed with the CD-ROM. No format might be necessary between the database and the diagnostic application, or between the database and the data-management and publication application(s), because these applications could interact directly with the database, perhaps via SQL.

To illustrate the compliance of the proposed design with the constraints listed in Figure 1, let's examine in some detail the format for Web distribution as opposed to the format for information published in a monthly hard-copy journal. Figure 3 presents part of the format in which information is supposed to appear in a Web browser.

Figure 3: Format for HTML output.

Problem number: 12345

Title:      Network connection drops during attempt to print

Author:     Salminen

Author email:   salminen@oy.com

Updated:    December 17, 1996

Classification: Public

Products:   Pretty Good Application 1.0

        Really Good Application 2.2

...

Figure 4 presents the format of the same information as it must appear in a monthly journal.

Figure 4: Format for SGML output.

Title:  Network connection drops during attempt to print

Document:  12345

Author:  Kalevi Salminen

Updated:  17Dec96

Abstract:  Describes solution for dropped network connection during attempt to print from Pretty Good Application 1.0.

Products:  Pretty Good Application 1.0, Really Good Application 2.2

...

The following differences between the formats in Figures 3 and 4 must be accommodated. Each difference is labeled with the design constraint (appearing in Figure 1) to which it pertains.

  1. (Rearrangement) The sequence of headings differs from one format to the other.

  2. (Inclusion/Omission) Author e-mail and classification headings appear in the format in Figure 3, but not in the format in Figure 4.

  3. (Content presentation) The formats vary in that the products listed in the Web browser format are hot links to a products browser. In the monthly journal, the list of products is of course just text.

  4. (Content presentation) In one format, the text following each heading is tab-justified. In the other format, the text is separated from its heading by only two spaces.

  5. (Content presentation) The author's name in one format includes only the last name. The other format includes both first and last names.

  6. (Content presentation) The date format differs from one output to the other.
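To make these differences concrete, here is a minimal sketch. It assumes the stored item holds plain, unmarked values (per the second constraint) and that each format is nothing more than an ordered list of element definitions. The Item record, the element lists, and the render helper are hypothetical; only the sample values come from Figures 3 and 4. Hyperlinks and the classification heading are left out for brevity.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.stream.Collectors;

class FormatDifferences {
    // Hypothetical stored item: plain values with no mark-up embedded (constraint 2)
    record Item(int number, String title, String firstName, String lastName,
                String email, LocalDate updated) {}

    // A format element pairs a heading with a rule for presenting its content
    record Element(String heading, Function<Item, String> content) {}

    static final DateTimeFormatter WEB_DATE =
        DateTimeFormatter.ofPattern("MMMM d, yyyy", Locale.US);
    static final DateTimeFormatter JOURNAL_DATE =
        DateTimeFormatter.ofPattern("dMMMyy", Locale.US);

    // Web format: problem number first, last name only, e-mail included
    static final List<Element> WEB = List.of(
        new Element("Problem number", i -> String.valueOf(i.number())),
        new Element("Title", Item::title),
        new Element("Author", Item::lastName),
        new Element("Author email", Item::email),
        new Element("Updated", i -> WEB_DATE.format(i.updated())));

    // Journal format: title first, full name, no e-mail, compact date
    static final List<Element> JOURNAL = List.of(
        new Element("Title", Item::title),
        new Element("Document", i -> String.valueOf(i.number())),
        new Element("Author", i -> i.firstName() + " " + i.lastName()),
        new Element("Updated", i -> JOURNAL_DATE.format(i.updated())));

    // Emit each element as "Heading:<separator>content", one per line
    static String render(List<Element> format, Item item, String separator) {
        return format.stream()
            .map(e -> e.heading() + ":" + separator + e.content().apply(item))
            .collect(Collectors.joining("\n"));
    }
}
```

Calling render(WEB, item, "\t") and render(JOURNAL, item, "  ") on the same Item produces the reordered, pruned, and differently presented outputs. The stored item itself never changes.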

To accommodate the kinds of format differences listed, we begin by proposing the object structure presented in Figure 5. This figure illustrates a relationship among three objects: Distribution, Format, and Element. The relationship illustrated in Figure 5 is one of containment, not inheritance [See Booch1, p. 102].

Figure 5: Containment relationship among instances of Distribution, Format, and Element.

The purpose of Distribution is to represent, or model, the destination to which information is distributed. For example, the Web browser (appearing in Figure 2) is represented in memory by an instance of Distribution. Similarly, the monthly journal (also represented in Figure 2) is represented in memory as yet another instance of Distribution.

Instances of Distribution can be used to model any characteristics of a destination pertinent to the objectives of our hypothetical information management system. Each instance of Distribution could, for example, query the database for information to be distributed to a particular destination. A Distribution object querying the database for information to be published on the Web might accept only information items whose classification was PUBLIC. Information classified as PROPRIETARY, PARTNERSHIP, etc., might be disqualified. Such functionality is not described in this article; however, the need for such destination-specific behavior is a major reason for proposing that the information system contain a memory-resident object representing each destination.

We seek a design that insulates information and the tools and activities of our information management system from changes in any particular distribution. We want to be able to add, modify, or remove a distribution quickly, and with minimal impact on the rest of the system. For example, if a Distribution goes away, we want its removal from the system to constitute the only thing we need to do to update the system. (If a person chops down a tree in a forest, he probably wants his effort confined exclusively to the tree to be removed. He does not want to have to somehow update all the other trees in order to reconstitute the forest, nor does he want to have to update all the other objects in the world that interact with the forest. Similarly, it should not be necessary to re-initialize the world every time a tree sprouts from the forest floor.)

By providing a Distribution object, we put destination-specific knowledge and behavior inside instances of Distribution. As long as we ensure that every other object in the system associates with instances of Distribution according to a common interface, we can add and delete instances without changing the rest of the system. A truly resilient design focuses as much attention on object interaction as it does on the objects themselves [SHAW, pp. 160-165].

In terms of the focus of this article, an instance of Distribution holds on to a collection of format objects. Each instance of Format prescribes the manner in which information items must be marked up. Any Distribution can have one or more formats (e.g., bulletin, white paper, release notice, etc.) in which information is available. The proposed design implies that once an instance of Distribution queries the database for relevant information, the Distribution instance iterates across its collection of formats, selects the format(s) appropriate for the information, asks each appropriate format to return a string version of the information, then distributes the formatted string. This article focuses only on a format object's preparation of the string version of its associated information.

Each instance of Format holds on to a collection of element objects. Each instance of Element knows how to extract data from an information object (or how to derive an element's content) and mark it up. Some content, such as an information item's creation date, may be an attribute value belonging to the information item. The content of other elements, however, may be calculated based on characteristics of an information item.
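The containment just described can be sketched in Java: a Distribution holds Formats, a Format holds Elements, and an Element knows how to obtain its content. This is an illustration under stated assumptions. Information is reduced to a hypothetical record, and the "Product count" element is invented only to show content that is derived rather than read directly from an attribute.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class Containment {
    // Hypothetical, minimal stand-in for the Information class
    record Information(String title, LocalDate creationDate, List<String> products) {}

    // An Element knows how to extract (or derive) one piece of content
    record Element(String label, Function<Information, String> content) {
        String markUp(Information info) {
            return label + ": " + content.apply(info);
        }
    }

    // A Format is an ordered collection of Elements (containment, not inheritance)
    record Format(String name, List<Element> elements) {
        String markUp(Information info) {
            StringBuilder out = new StringBuilder();
            for (Element e : elements) {
                out.append(e.markUp(info)).append('\n');
            }
            return out.toString();
        }
    }

    // A Distribution models one destination and holds the Formats it uses
    static final class Distribution {
        private final String destination;
        private final List<Format> formats = new ArrayList<>();

        Distribution(String destination) { this.destination = destination; }

        String destination() { return destination; }

        void addFormat(Format f) { formats.add(f); }

        // Ask every format for its string version of the item
        List<String> markUpAll(Information info) {
            List<String> results = new ArrayList<>();
            for (Format f : formats) {
                results.add(f.markUp(info));
            }
            return results;
        }
    }
}
```

Adding, changing, or removing a Format touches only the Distribution that holds it. The Information record is unaffected, which is the insulation the next paragraph argues for.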

Element and Format classes exist primarily to insulate Information from knowledge and behavior specific to each instance of Distribution. This insulation builds into the information management system a considerable degree of flexibility in dealing with various output formats. For example, because the structure, content, and mark-up of a document are modeled by Format and Element instead of by Information and associated classes, changes in format, as well as the addition and deletion of formats, may have little or no effect on the system's storage of information. This increases system resilience over the long term and allows change to be assimilated more rapidly throughout the life of the system.

This aspect of system design (i.e., sufficient modularity to insulate subsystems from what should be localized changes) is an important consideration. It deserves more of a discussion than this article allows; nevertheless, Figure 6 illustrates alternatives for dealing with the problem.

Factors affecting use of the storage alternatives represented in Figure 6 are legion. Optimal use of these alternatives is usually a function of numerous considerations. For example, storing values in a database may be ideal; however, if the climate in which the development effort exists requires more bureaucratic maneuvering to modify the schema than the development schedule allows, the optimal solution may be recourse to environment files. Security might impose constraints that make application code or off-line storage of certain items optimal. In any case, the goal is to insulate each of the items represented in Figure 6, and the objects into which each of those items can be decomposed, from the effects of changes in unrelated aspects of system use.

Figure 6: Storing application-related intelligence.

We now move from the containment and functional views of Distribution, Format, and Element, to relational tables and object hierarchy. Figure 7 presents a relational schema containing the tables and columns needed to support the functionality described in this article. Other columns would of course be needed to support additional functionality.

Figure 7: Relational schema for mark-up subsystem.

In Figure 8, a class diagram presents the inheritance of instance variables for the object hierarchy associated with the relational tables in Figure 7.

Figure 8: Class diagram for memory-resident, mark-up subsystem.

The creator, creationDate, memo, description, and name attributes have the same purpose relative to a format element as these attributes have relative to other objects having similarly named attributes. For example, description can be used to hold on to a brief statement of the purpose of a particular element. A memo can contain operational or procedural commentary of interest to information managers. An element's creator is the user who enters the element into the system, not the user who uses an element in constituting a format, or document. An element's description, if any is needed, is an explanation of the purpose of the format element generally (e.g., the nature or reason for the existence of the element containing a document's title, or the element containing a document's creation date, etc.).

Figure 9 shows the conceptual relationship between a document (represented in the object model by Format) and the sections in a document (represented within the object model by Element).

Figure 9: Conceptual relationship between document and Format, and between a document section and Element.

Figure 9 is intended to suggest that each section in a document is represented by an instance of Element, whose attributes provide the format and content for a section. The label attribute, for example, contains the text of a section title. The anteLabelMarkup attribute contains tags appearing before the text of the label in the mark-up of a section, and postLabelMarkup contains tags appearing immediately after the label text. The anteContentMarkup and postContentMarkup variables contain tags appearing before and after text belonging to a section.
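The order in which an element emits its label, its content, and the four mark-up attributes can be sketched as follows. The record is a hypothetical stand-in for a row of the Element table, and the HTML tags in the usage note are illustrative choices. They are not taken from Figure 10.

```java
class ElementMarkup {
    // Mark-up attributes of one format element (hypothetical stand-in)
    record Element(String anteLabelMarkup, String label, String postLabelMarkup,
                   String anteContentMarkup, String postContentMarkup) {

        // Emit: ante-label tags, label text, post-label tags,
        // then ante-content tags, the content itself, post-content tags
        String markUp(String content) {
            return anteLabelMarkup + label + postLabelMarkup
                 + anteContentMarkup + content + postContentMarkup;
        }
    }
}
```

For example, new Element("<B>", "Title:", "</B>", " <I>", "</I><BR>") wraps a title in bold and italic HTML tags. A journal format could reuse the same label with SGML tags instead, without any change to the stored title.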

Figure 10 provides a detailed example of HTML mark-up for a section providing the title for a document.

Figure 10: Detailed example of an element.

Three attributes of an element object provide values needed to process an element. Other such attributes may be necessary, depending upon policies and procedures of the organization implementing this kind of functionality. For the purpose of this discussion, however, these three attributes suffice:

  • lineWidth

  • excessTreatment

  • ifEmpty

The lineWidth attribute can be used to reflect the margins within which an element is presented. This hypothetical implementation assumes that a character count is contained in an element's lineWidth attribute (e.g., 60, for a 60-character line).

The excessTreatment attribute tells the mark-up application (i.e., the application that actually formats an element) what to do if an element's content exceeds the limit specified by lineWidth. For example, some formats might want an element's content wrapped if the character count of the content exceeds the lineWidth value. Other formats might want content truncated. By storing the value WRAP or the value TRUNCATE in the excessTreatment attribute, a format can deal appropriately with each element's content, even if that treatment varies from one element to another within the same format.

Finally, the ifEmpty attribute holds on to a value that a mark-up application can use to determine what to do if an element's content is empty. For example, one format may require that each of its documents have content in a "Solutions" section. Another format may not be as stringent, or may omit the field altogether.

Issues of data integrity can be separate from constraints imposed by a document format. The relational storage of the data must be as loosely coupled to format constraints as possible. Thus, the ifEmpty attribute provides information needed by the mark-up application because a particular format may have requirements that should not be reflected in the relational schema. For example, one format may depend upon an information item's abstract for key word searching and, therefore, must have text in each item's abstract. Another format, however, may omit abstract altogether. For a format that depends upon the content of abstract, the ifEmpty attribute of the format element for abstract might be assigned the value REJECT. This value would indicate to the mark-up application that formatting cannot be completed.

Mark-up Functionality

Having described the relational schema and memory-resident object hierarchies needed to support the kind of information mark-up consistent with the constraints governing our hypothetical information-management system, we now move on to a description of the behavior provided by memory-resident objects. Figure 11 is a Sequence Diagram of the kind described in the Unified Modeling Language (UML) version 0.9 [Booch2, pp. 13-14].

Figure 11: Sequence Diagram of information mark-up -- with only a summary view of asStreamOn() behavior of anElement.

In Figure 11, aDistribution is an instance of Distribution. For each destination in Figure 2 (e.g., Web browser, hard-copy journal, etc.), an instance of Distribution is created to provide functionality specific to that destination. The behavior described in Figure 11 is initiated by sending distribute(anInfo) to an instance of Distribution. The long vertical rectangle beneath aDistribution is a graphic representation of the distribute() method.

Within distribute(), the first thing that happens is a request for Diagnostic to return an instance of itself, based on anInfo. Diagnostic is the class of that name mentioned in the first article in this series. Then, qualifyFormatsOn(aDiagnostic) is sent to aDistribution. This method iterates across the collection of formats belonging to aDistribution, and selects the format(s) appropriate for anInfo. This selection process is outside the scope of this article, which focuses on the mark-up of information selected for distribution. It is, nevertheless, worth noting that any selection criteria determining whether an information item is suitable for distribution to a particular destination may be handled in the behavior provided by qualifyFormatsOn().

As a result of qualifyFormatsOn(), the format(s) to be used to mark up an information item are assigned to the qualifiedFormats attribute of aDistribution.
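The article leaves the selection criteria open. As a purely hypothetical illustration, the sketch below gives each format a predicate, and qualifyFormatsOn keeps the formats whose predicate accepts the item, echoing the earlier example in which only PUBLIC items qualify. The Classification enum, the accepts predicate, and the format names are assumptions. For brevity, qualifyFormatsOn here takes the item directly rather than a Diagnostic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class Qualification {
    enum Classification { PUBLIC, PROPRIETARY, PARTNERSHIP }

    record Information(String title, Classification classification) {}

    // Hypothetical: each format states which items it is willing to mark up
    record Format(String name, Predicate<Information> accepts) {}

    static final class Distribution {
        private final List<Format> formats;
        private final List<Format> qualifiedFormats = new ArrayList<>();

        Distribution(List<Format> formats) { this.formats = List.copyOf(formats); }

        // Fill qualifiedFormats with the formats that accept this item
        void qualifyFormatsOn(Information info) {
            qualifiedFormats.clear();
            for (Format f : formats) {
                if (f.accepts().test(info)) {
                    qualifiedFormats.add(f);
                }
            }
        }

        List<Format> qualifiedFormats() { return List.copyOf(qualifiedFormats); }

        // Counterpart of resetQualifiedFormats in Code Listing 1
        void resetQualifiedFormats() { qualifiedFormats.clear(); }
    }
}
```

Keeping the criteria inside the Distribution's formats means another destination can apply entirely different rules without any change to Information.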

Code Listing 1 presents Java as well as Smalltalk implementations of the distribute() functionality associated with a Distribution.

Note: Strings, streams, and collections are handled similarly in Java and Smalltalk, but because the Smalltalk class hierarchy is more mature, many Smalltalk components operate at a higher level than is currently available in the standard Java class hierarchy. The Java listings in this article assume that methods and classes equivalent to those used in Smalltalk have been defined in appropriate Java packages. Such methods and classes will be italicized and underlined in the code.

Code Listing 1: The distribute() behavior of aDistribution.

Sample A: Java implementation



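   public Diagnostic distribute(Information anInformationObject)
   {
      Diagnostic aDiagnostic = Diagnostic.newOn(anInformationObject);
      qualifyFormatsOn(aDiagnostic);
      // Equivalent of the Smalltalk do: block: each qualified format marks up the item
      for (Format aFormat : qualifiedFormats)
      {
         aFormat.distributeOn(aDiagnostic);
      }
      resetQualifiedFormats();
      return aDiagnostic;
   }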



Sample B: Smalltalk implementation



   distribute: anInformationObject

      | aDiagnostic |
      aDiagnostic := Diagnostic newOn: anInformationObject.
      self qualifyFormatsOn: aDiagnostic.
      self qualifiedFormats do: [ :aFormat |
         aFormat distributeOn: aDiagnostic].
      self resetQualifiedFormats.
      ^aDiagnostic

The do-block in distribute() that iterates across the collection of qualified formats sends distributeOn: aDiagnostic to each qualified format. As indicated in Figure 11, aFormat first asks aDiagnostic to reset its write streams. One reason for passing an instance of Diagnostic back and forth throughout the mark-up behavior is to use the diagnostic object's markupStream to build the marked-up document. (Using WriteStream instead of String, at least in a Smalltalk implementation, improves performance.) Each diagnostic object has two streams: markupStream, to which each element object adds its part of the marked-up document, and temporaryStream, used throughout these methods to prepare material to be added to markupStream. Sending resetStreams to aDiagnostic resets the index into both streams to the beginning of the stream. This ensures that nothing written by one format appears in the result of a subsequent format's use of the write streams.
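A Java sketch of the Diagnostic state described in this paragraph may help readers of the Java samples. It assumes StringBuilder as the counterpart of a Smalltalk WriteStream, so setLength(0) plays the role of resetting a stream. The markupStream, temporaryStream, status, managedObject, resetStreams, and addDistributedFormat names follow the article. The fail and explanation members are hypothetical, and recording format names instead of format objects is a simplification.

```java
import java.util.ArrayList;
import java.util.List;

class DiagnosticSketch {
    static final class Diagnostic {
        private final Object managedObject;          // the information item being marked up
        private final StringBuilder markupStream = new StringBuilder();
        private final StringBuilder temporaryStream = new StringBuilder();
        private final List<String> distributedFormats = new ArrayList<>();
        private boolean status = true;
        private String explanation = "";

        Diagnostic(Object managedObject) { this.managedObject = managedObject; }

        Object managedObject() { return managedObject; }
        StringBuilder markupStream() { return markupStream; }
        StringBuilder temporaryStream() { return temporaryStream; }
        boolean status() { return status; }
        void status(boolean ok) { status = ok; }

        // Hypothetical: record why formatting stopped and flag the failure
        void fail(String why) {
            status = false;
            explanation = why;
        }

        String explanation() { return explanation; }

        // Empty both buffers so one format's output cannot leak into the next
        Diagnostic resetStreams() {
            markupStream.setLength(0);
            temporaryStream.setLength(0);
            return this;
        }

        void addDistributedFormat(String formatName) { distributedFormats.add(formatName); }

        List<String> distributedFormats() { return List.copyOf(distributedFormats); }
    }
}
```

resetStreams returns the receiver, mirroring how the Smalltalk expression aDiagnostic resetStreams can be passed straight to asStreamOn:.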

Once the write streams belonging to aDiagnostic have been reset, aFormat sends asStreamOn(aDiagnostic) to itself. The asStreamOn() method iterates across the format's collection of elements and sends asStreamOn() to each instance of Element known to aFormat. Code Listing 2 presents implementations of each format object's distributeOn() behavior.

Code Listing 2: The distributeOn() behavior of aFormat

Sample A: Java implementation



   public Diagnostic distributeOn(Diagnostic aDiagnostic)
   {
      // Start from empty streams, then let each element of this format write its part
      aDiagnostic.resetStreams();
      asStreamOn(aDiagnostic);
      if (aDiagnostic.status())
      {
         aDiagnostic.addDistributedFormat(this);
         markedUpString(aDiagnostic.markupStream().contents());
      }
      return aDiagnostic;
   }



Sample B: Smalltalk implementation



   distributeOn: aDiagnostic

      self asStreamOn: aDiagnostic resetStreams.
      aDiagnostic status
         ifTrue: [
            aDiagnostic addDistributedFormat: self.
            self markedUpString: aDiagnostic markupStream contents].
      ^aDiagnostic

The asStreamOn() method belonging to each instance of Format iterates across the format object's collection of elements. Code Listing 3 presents implementations of this behavior.

Code Listing 3: The asStreamOn() behavior of aFormat.

Sample A: Java implementation



   public Diagnostic asStreamOn(Diagnostic aDiagnostic)
   {
      for (Element anElement : elements)
      {
         // Stop as soon as an earlier element has left the diagnostic in a failed state
         if (!aDiagnostic.status())
         {
            return aDiagnostic;
         }
         anElement.asStreamOn(aDiagnostic);
      }
      return aDiagnostic;
   }



Sample B: Smalltalk implementation



   asStreamOn: aDiagnostic

      self elements do: [ :anElement |
         aDiagnostic status
            ifTrue: [anElement asStreamOn: aDiagnostic]
            ifFalse: [^aDiagnostic]].
      ^aDiagnostic

Within the block that iterates across the element collection, the status of aDiagnostic is checked before each element is processed so that if the status is false, processing can halt. Note that inasmuch as these methods all check the status of aDiagnostic, formatting is stopped as soon as a problem is encountered and the diagnostic (containing an explanation of the problem) is passed back along the sequence so that the problem can be reported.
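The early-exit behavior can be demonstrated with a small self-contained sketch that mirrors Code Listing 3. It reduces the diagnostic to a hypothetical Diag class holding only a status flag and an output buffer, and represents each element as a Consumer that writes into it. These names are illustrative, not the article's API.

```java
import java.util.List;
import java.util.function.Consumer;

class EarlyStop {
    // Minimal stand-in for Diagnostic: a status flag and an output buffer
    static final class Diag {
        boolean status = true;
        final StringBuilder markup = new StringBuilder();
    }

    // Check status before each element; once an element fails, skip the rest
    static Diag asStreamOn(List<Consumer<Diag>> elements, Diag d) {
        for (Consumer<Diag> element : elements) {
            if (!d.status) {
                return d;
            }
            element.accept(d);
        }
        return d;
    }
}
```

If the second of three elements sets status to false, the third never runs. The caller receives the diagnostic with whatever mark-up had been written before the failure.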

At the end of the sequence, illustrated in Figure 11, each instance of Element invokes its asStreamOn() behavior. Code Listing 4 presents sample implementations of that behavior.

Code Listing 4: The asStreamOn() behavior of anElement.

Sample A: Java implementation



   public Diagnostic asStreamOn(Diagnostic aDiagnostic)
   {
      String aContent;
      // Element-specific retrieval (Code Listing 8) writes into temporaryStream
      contentOn(aDiagnostic);
      if (aDiagnostic.temporaryStream().contents().isEmpty())
      {
         if ("reject".equals(ifEmpty))
         {
            // Empty content is not acceptable: return without emitting mark-up
            return aDiagnostic;
         }
         // Empty content is acceptable: continue with an empty string
         aContent = "";
         aDiagnostic.status(true);
      }
      else
      {
         aContent = aDiagnostic.temporaryStream().contents();
      }
      aDiagnostic.markupStream().nextPutAll(anteLabelMarkup);
      aDiagnostic.markupStream().nextPutAll(label);
      aDiagnostic.markupStream().nextPutAll(postLabelMarkup);
      aDiagnostic.markupStream().nextPutAll(anteContentMarkup);
      aDiagnostic.markupStream().nextPutAll(copyWithTreatments(aContent));
      aDiagnostic.markupStream().nextPutAll(postContentMarkup);
      return aDiagnostic;
   }



Sample B: Smalltalk implementation



   asStreamOn: aDiagnostic

      | aContent |
      self contentOn: aDiagnostic.
      aDiagnostic temporaryStream contents isEmpty
         ifTrue: [
            self ifEmpty = #reject
               ifTrue: [^aDiagnostic]
               ifFalse: [
                  aContent := String new.
                  aDiagnostic status: true]]
         ifFalse: [aContent := aDiagnostic temporaryStream contents].
      aDiagnostic markupStream
         nextPutAll: self anteLabelMarkup;
         nextPutAll: self label;
         nextPutAll: self postLabelMarkup;
         nextPutAll: self anteContentMarkup;
         nextPutAll: (self copyWithTreatments: aContent);
         nextPutAll: self postContentMarkup.
      ^aDiagnostic

The behavior presented in Code Listing 4 is generic for all instances of Element. Of course, a subclass of Element could be implemented to override this and other characteristics of an element object; however, the degree to which a single Element class applies to all format elements is a primary factor in determining how easily the proposed implementation can be maintained. The generic behavior presented in Code Listing 4:

  1. Retrieves the element's content from the information object attached to aDiagnostic. This occurs in the contentOn() behavior, described later.

  2. Determines whether the temporaryStream of aDiagnostic is empty. If temporaryStream is empty, the method determines what to do based on the element's ifEmpty attribute. Note that when temporaryStream is empty and ifEmpty is ACCEPT, the status of aDiagnostic is set to TRUE; otherwise, formatting ceases the next time the status of aDiagnostic is checked.

  3. Updates the markupStream of aDiagnostic with mark-up and information content.

  4. Returns aDiagnostic.

Behavior in an element object's asStreamOn() method is the same for every element. The behavior in an element object's contentOn() method, by contrast, must vary from one element to another. For example, getting the date from an information object requires the #creationDate selector, whereas getting an author's name (and formatting it appropriately) requires accessing data that is part of a user object associated with the information item.

If the behavior for retrieving information content is embedded in the image (in the case of Smalltalk, for example) not only must the image be updated every time a format changes, but Element must have a subclass for each variation in content-retrieval behavior. To obviate this code-maintenance headache, we propose that a content-retrieval block (to use Smalltalk terminology) be implemented to provide element-specific behavior without having to proliferate subclasses of Element.

The Smalltalk implementation in Code Listing 4 also requires a minor modification to the base functionality provided in the nextPutAll: method belonging to Stream. That modification appears in Code Listing 5.

Code Listing 5: Modification of nextPutAll: method on instance-side of Stream (in Smalltalk implementation of proposed functionality).

   nextPutAll: aCollection

      aCollection isSequenceable
         ifTrue: [
            aCollection isEmpty
               ifFalse: [self next: aCollection size putAll: aCollection startingAt: 1]]
         ifFalse: [
            aCollection do: [ :v | self nextPut: v]].
      ^aCollection

In Code Listing 4, the method copyWithTreatments: is used to format aContent before aContent is added to the markupStream of aDiagnostic. Code Listing 6 presents the behavior provided by the copyWithTreatments: method.

Code Listing 6: The copyWithTreatments: behavior of anElement.

Sample A: Java implementation



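   public String copyWithTreatments(String aContentString)
   {
      if (aContentString.length() <= lineWidth)
      {
         // Content fits; Java strings are immutable, so no defensive copy is needed
         return aContentString;
      }
      if ("truncate".equals(excessTreatment))
      {
         // Keep the first lineWidth characters (Smalltalk: copyFrom: 1 to: lineWidth)
         return aContentString.substring(0, lineWidth);
      }
      // Otherwise wrap; assumes a Java equivalent of Code Listing 7 has been defined
      return wrapWithinLineWidthOf(aContentString, lineWidth);
   }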



Sample B: Smalltalk implementation



   copyWithTreatments: aContentString

      | aContent |
      aContentString size > self lineWidth
         ifTrue: [
            self excessTreatment = #truncate
               ifTrue: [aContent := aContentString copyFrom: 1 to: self lineWidth]
               ifFalse: [aContent := aContentString copy wrapWithinLineWidthOf: self lineWidth]]
         ifFalse: [aContent := aContentString copy].
      ^aContent

Depending on whether an element object's content is supposed to be truncated or wrapped when the content text exceeds the element's lineWidth value, copyWithTreatments: either truncates or wraps the content text. The copyFrom:to: method (in a Smalltalk implementation) is part of the basic functionality of an image; however, the wrapWithinLineWidthOf: method must be added to an image. Code Listing 7 presents that added functionality, which is not explained in this article because it is outside the scope of this discussion.

Code Listing 7: The wrapWithinLineWidthOf: method that must be added to the instance side of String.

wrapWithinLineWidthOf: aNumber
   "Answer a copy of the receiver in which lines longer than aNumber characters are broken
   at the last separator within the width, or at aNumber characters if a line has none."
   | aWS aRS |
   aNumber < 1
   ifTrue: [
      ^self
   ].
   aWS := (self copyEmpty: self size) writeStream.
   aRS := self readStream.
   [aRS atEnd]
   whileFalse: [ | position1 aLineLength position2 |
      position1 := aRS position.
      aRS skipThrough: Character cr.
      (aLineLength := aRS position - position1) <= aNumber
      ifTrue: [
         aRS skip: aLineLength negated.
         aWS nextPutAll: (aRS through: Character cr)
      ]
      ifFalse: [
         aRS skip: aLineLength negated.
         aRS skip: (aNumber min: (aRS size - aRS position)).
         [aRS position > position1 and: [aRS peek isSeparator not]]
         whileTrue: [
            aRS skip: -1
         ].
         position2 := aRS position.
         aRS skip: (position2 - position1) negated.
         position2 <= position1
         ifTrue: [
            position2 := position1 + aNumber
         ].
         aWS
            nextPutAll: (aRS nextAvailable: position2 - position1);
            cr.
         [aRS position < self size and: [aRS peek isSeparator]]
         whileTrue: [
            aRS skip: 1
         ].
      ].
   ].
   ^aWS contents
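Although Code Listing 7 is not explained here, a rough Java translation may help readers who do not read Smalltalk. It is a sketch, not the authors' code: LineWrapper is a hypothetical name, '\n' is assumed to stand in for Character cr, and Character.isWhitespace is used as an approximation of isSeparator. Like the original, it counts the line break as part of a line's length, breaks an over-long line at the last separator within the width, and falls back to a hard break when no separator is found.

```java
final class LineWrapper {
    private LineWrapper() {}

    // Rough translation of wrapWithinLineWidthOf:; '\n' stands in for Character cr.
    static String wrapWithinLineWidthOf(String s, int width) {
        if (width < 1) {
            return s;
        }
        StringBuilder out = new StringBuilder(s.length());
        int pos = 0;
        while (pos < s.length()) {
            int start = pos;
            // skipThrough: Character cr -- find the position just past the next line break (or the end).
            int nl = s.indexOf('\n', start);
            int lineEnd = (nl < 0) ? s.length() : nl + 1;
            int lineLength = lineEnd - start;          // includes the line break, as in the original
            if (lineLength <= width) {
                out.append(s, start, lineEnd);          // the line fits: copy it through the break
                pos = lineEnd;
            } else {
                // Step back from start + width to the last separator on this line.
                int breakAt = start + Math.min(width, s.length() - start);
                while (breakAt > start && !Character.isWhitespace(s.charAt(breakAt))) {
                    breakAt--;
                }
                if (breakAt <= start) {
                    breakAt = start + width;           // no separator found: hard break at the width
                }
                out.append(s, start, breakAt).append('\n');
                pos = breakAt;
                // Drop separators so the next output line does not start with them.
                while (pos < s.length() && Character.isWhitespace(s.charAt(pos))) {
                    pos++;
                }
            }
        }
        return out.toString();
    }
}
```

For example, wrapWithinLineWidthOf("the quick brown fox", 10) returns "the quick\nbrown fox".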

To explain the proposed implementation, in which each element's content-retrieval behavior is stored in the database as the text of a block, we begin by examining a version of the behavior provided in each element's contentOn() method (see Code Listing 8).

Code Listing 8: A version of the contentOn() behavior of anElement.

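Sample A: Java implementation

public void contentOn(Diagnostic aDiagnostic) {
   // Delegate content retrieval to the block built from the element's contentBehavior text.
   contentBlock.value(aDiagnostic);
}

Sample B: Smalltalk implementation

contentOn: aDiagnostic
   "Delegate content retrieval to the element's contentBlock."
   self contentBlock value: aDiagnostic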

In contentOn(), the value: message is sent to the element's contentBlock. The contentBlock is assigned to each element when the element is instantiated; the text of the block comes from the contentBehavior attribute of the Element table, from which the memory-resident instance of Element is built. Code Listing 9 shows the behavior provided in the contentBlock: method common to every instance of Element.

Code Listing 9: Converting a string to a block, in the contentBlock: method.

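Sample A: Java implementation

   void contentBlock(String aString) {
      // Compiler.evaluate is not part of the standard Java library; it is assumed here
      // to compile the stored block text, as Smalltalk's Compiler evaluate: does in Sample B.
      contentBlock = Compiler.evaluate(aString);
   }

Sample B: Smalltalk implementation

   contentBlock: aString
      "Compile the block text taken from the contentBehavior attribute."
      contentBlock := Compiler evaluate: aString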
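Standard Java offers no direct equivalent of compiling stored block text. One common substitute, sketched here under stated assumptions, is a registry that maps the text stored in the contentBehavior attribute to behavior compiled into the application. Diagnostic, Element, the Map-based managedObject, and the "id" and "name" keys are all hypothetical. Note the trade-off: unlike the approach described in this article, the behavior's source no longer lives in the database, so adding a behavior means redeploying code; evaluating the stored text through a javax.script engine, where one is installed, would be a closer match.

```java
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for the article's diagnostic object.
class Diagnostic {
    final StringBuilder temporaryStream = new StringBuilder();
    final Map<String, Object> managedObject;   // the information item, as attribute -> value
    String failure;                             // null until fail(...) is called

    Diagnostic(Map<String, Object> managedObject) {
        this.managedObject = managedObject;
    }

    void fail(String message) {
        failure = message;
    }
}

class Element {
    // Behaviors compiled into the application, keyed by the text stored in contentBehavior.
    private static final Map<String, Consumer<Diagnostic>> BEHAVIORS = Map.of(
        "id", d -> {
            Object id = d.managedObject.get("id");
            if (id == null) {
                d.fail("Information object has inappropriate id, or is missing id");
            } else {
                d.temporaryStream.append(id);
            }
        },
        "name", d -> {
            Object name = d.managedObject.get("name");
            if (name == null || name.toString().isEmpty()) {
                d.fail("Information object has no name");
            } else {
                d.temporaryStream.append(name);
            }
        });

    private final Consumer<Diagnostic> contentBlock;

    // Plays the role of contentBlock: (Listing 9): resolves the stored text at instantiation.
    Element(String contentBehavior) {
        Consumer<Diagnostic> block = BEHAVIORS.get(contentBehavior);
        if (block == null) {
            throw new IllegalArgumentException("Unknown content behavior: " + contentBehavior);
        }
        this.contentBlock = block;
    }

    // Plays the role of contentOn: (Listing 8).
    void contentOn(Diagnostic aDiagnostic) {
        aDiagnostic.temporaryStream.setLength(0);   // reset once here instead of inside each behavior
        contentBlock.accept(aDiagnostic);
    }
}
```

An unknown contentBehavior value fails when the element is built rather than when a document is published, which surfaces bad rows in the Element table early.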

In the proposed implementation, each block is written on the assumption that aDiagnostic will be passed to it (via the value: message, in a Smalltalk implementation) and that aDiagnostic will hold a copy of the information item from which the element's content must be derived. That information object is held in the managedObject attribute of aDiagnostic. With this in mind, Code Listing 10 presents a version of the text of two blocks: one retrieves an information item's identification number, and the other retrieves the item's name. They appear in the same code listing to illustrate the variety of content-retrieval behavior the proposed implementation supports.

Code Listing 10: Content-retrieval blocks for id and name.

Sample A: Java implementation

Text of contentBlock for id retrieval:

aDiagnostic -> {
   aDiagnostic.temporaryStream.reset();
   Object aContent = aDiagnostic.managedObject.id;
   // Every Java object can render itself with toString(), so the Smalltalk
   // respondsTo: #printString test reduces to a null check.
   if (aContent != null) {
      aDiagnostic.temporaryStream.nextPutAll(aContent.toString());
   } else {
      aDiagnostic.fail("Information object has inappropriate id, or is missing id");
   }
}

Text of contentBlock for information object's name retrieval:

aDiagnostic -> {
   aDiagnostic.temporaryStream.reset();
   String aContent = aDiagnostic.managedObject.name;
   if (aContent == null) {
      aDiagnostic.fail("Information object has no name");
   } else if (aContent.isEmpty()) {
      aDiagnostic.fail("Information object has name equivalent to an empty string");
   } else {
      // Java cannot add methods to String, so the extension described in [Young1]
      // is assumed here to be available as a static helper.
      aContent = dropLeadingAndTrailingCharacters(aContent, leadingAndTrailingCharacterArray);
      if (aContent.isEmpty()) {
         aDiagnostic.fail("Information object has name consisting exclusively of unwanted characters");
      } else {
         aDiagnostic.temporaryStream.nextPutAll(aContent);
      }
   }
}

Sample B: Smalltalk implementation

Text of contentBlock for id retrieval:

[ :aDiagnostic | | aContent |
   aDiagnostic temporaryStream reset.
   aContent := aDiagnostic managedObject id.
   (aContent respondsTo: #printString)
   ifTrue: [
      aDiagnostic temporaryStream nextPutAll: aContent printString.
   ]
   ifFalse: [
      aDiagnostic fail: 'Information object has inappropriate id, or is missing id'.
   ].
]

Text of contentBlock for information object's name retrieval:

[ :aDiagnostic | | aContent |
   aDiagnostic temporaryStream reset.
   aContent := aDiagnostic managedObject name.
   aContent isNil
   ifTrue: [
      aDiagnostic fail: 'Information object has no name'.
   ]
   ifFalse: [
      aContent isEmpty
      ifTrue: [
         aDiagnostic fail: 'Information object has name equivalent to an empty string'.
      ]
      ifFalse: [
         aContent := aContent dropLeadingAndTrailingCharacters:
            Element leadingAndTrailingCharacterArray.
         aContent isEmpty
         ifTrue: [
            aDiagnostic fail: 'Information object has name consisting exclusively of unwanted characters'.
         ]
         ifFalse: [
            aDiagnostic temporaryStream nextPutAll: aContent.
         ].
      ].
   ].
]

In the first Smalltalk content-retrieval block in Code Listing 10, the id message is sent to an information object, and the result is assigned to the temporary variable aContent. The block then tests whether aContent responds to printString before writing it to the temporaryStream of aDiagnostic, so that an id that cannot print itself is reported through fail: rather than raising an error that interrupts the application's processing. Note, however, that in Smalltalk nil inherits printString from Object (answering 'nil'), so this test alone does not catch a missing id; an explicit isNil test, like the one in the second block, is needed for that.

In the second Smalltalk content-retrieval block in Code Listing 10, three tests are performed on aContent: whether it is nil, whether it is empty, and whether it is still non-empty once unwanted leading and trailing characters are dropped. The third test relies on functionality that must be added to a Smalltalk image: the dropLeadingAndTrailingCharacters: method must be added to the instance side of String, and the class side of Element must respond to the message leadingAndTrailingCharacterArray. This added functionality is described in [Young1].
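The three tests, together with one plausible reading of dropLeadingAndTrailingCharacters:, can be sketched in Java. The exact behavior of the String extension is given in [Young1]; here it is only assumed to strip any of the listed characters from both ends of a string. NameChecks and appendCleanName are hypothetical names, and the method returns the failure message (or null on success) instead of sending fail: to a diagnostic.

```java
// Hypothetical helper; the real String extension is described in [Young1].
final class NameChecks {
    private NameChecks() {}

    // Assumed semantics: remove any of the unwanted characters from both ends of s.
    static String dropLeadingAndTrailingCharacters(String s, char[] unwanted) {
        int start = 0;
        int end = s.length();
        while (start < end && contains(unwanted, s.charAt(start))) {
            start++;
        }
        while (end > start && contains(unwanted, s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    private static boolean contains(char[] chars, char c) {
        for (char x : chars) {
            if (x == c) {
                return true;
            }
        }
        return false;
    }

    // Applies the three tests in order; on success appends the cleaned name and returns null.
    static String appendCleanName(String name, char[] unwanted, StringBuilder out) {
        if (name == null) {
            return "Information object has no name";
        }
        if (name.isEmpty()) {
            return "Information object has name equivalent to an empty string";
        }
        String cleaned = dropLeadingAndTrailingCharacters(name, unwanted);
        if (cleaned.isEmpty()) {
            return "Information object has name consisting exclusively of unwanted characters";
        }
        out.append(cleaned);
        return null;
    }
}
```

With unwanted characters {' ', '*', '-'}, the name "  *NetWare*  " passes all three tests and "NetWare" is appended, while "*-*" fails the third test.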

Hyper Links and Embedded Mark-up

Fundamentally, our approach to hyper link mark-up assumes that such mark-up belongs more to the distribution of information than to its warehousing. As stated at the outset of this article, mark-up should not be interwoven with stored data. The proposed implementation rests on the premise that hyper links, like any other mark-up that would otherwise be embedded as tags in the text fields of a database table, leave data (and those who manage it) vulnerable to the storms of change in today's enterprise.

For this reason, we use database schema and the associated object hierarchies to model information consumption generically, as opposed to modeling the tools used in that consumption. The second article in this series [Young2] presented a generic model of diagnosis, which applies to any tool set used in the diagnostic process. The information needed at various junctures in that process can be supplied on paper, through online help, or through some other technology. Database schema and object hierarchies, while they model information consumption, should model it only generically; they are then insulated, at least to some degree, from changes in the tool sets involved in particular diagnostic processes.

Just as abstract superclasses (in object-oriented design and development) can insulate code from details that concrete subclasses must handle, information modeling should address an abstract model of consumption, and schema and hierarchies should be implemented according to that model, so that the implementation is insulated from changing technologies.
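The analogy can be sketched in Java: stored product names stay free of mark-up, and an abstract class leaves it to each concrete format to decide how, or whether, a name becomes a hyper link when a document is published. Format, HtmlFormat, and PlainTextFormat are hypothetical classes used only to illustrate the abstract-superclass idea; in the proposed implementation this variation lives in contentBlock text stored in the Element table, not in a class hierarchy.

```java
import java.util.List;

// Hypothetical classes: the data passed in is plain text; mark-up is added only when publishing.
abstract class Format {
    // Renders one product entry; each concrete format decides how (or whether) to link it.
    abstract String productEntry(String productName, String href);

    // Shared behavior: one entry per line, independent of the mark-up language.
    String productList(List<String> productNames, String href) {
        StringBuilder out = new StringBuilder();
        for (String name : productNames) {
            out.append(productEntry(name, href)).append('\n');
        }
        return out.toString();
    }
}

final class HtmlFormat extends Format {
    @Override
    String productEntry(String productName, String href) {
        return "\t<A HREF=\"" + href + "\">" + productName + "</A>";
    }
}

final class PlainTextFormat extends Format {
    @Override
    String productEntry(String productName, String href) {
        return "\t" + productName;   // hard copy has no hyper links, so the href is ignored
    }
}
```

Adding a new distribution format then means adding a subclass, while the stored product names are untouched.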

In the proposed implementation, we use an element's contentBlock to handle hyper links as well as mark-up that is embedded within the text of a document element. This means that application maintenance focuses primarily on the addition and deletion of formats and elements in the database, and the creation and modification of contentBlock code stored in the Element table. In Figure 3, each product listed serves as a hot link to a product browser. Code Listing 11 presents a contentBlock that marks up each product in the list as such a link.

Code Listing 11: Content-retrieval blocks that mark up each product name as a hot link.

Sample A: Java implementation

Text of contentBlock for product name retrieval and mark-up:

aDiagnostic -> {
   aDiagnostic.temporaryStream.reset();
   for (Product aProduct : aDiagnostic.managedObject.products) {
      String aContent = aProduct.name;
      if (aContent == null) {
         aDiagnostic.fail("Product object has nil name");
      } else if (aContent.isEmpty()) {
         aDiagnostic.fail("A product object has name equivalent to empty string");
      } else {
         // Assumed static helper standing in for the String extension described in [Young1].
         aContent = dropLeadingAndTrailingCharacters(aContent, leadingAndTrailingCharacterArray);
         if (aContent.isEmpty()) {
            aDiagnostic.fail("A product object has name consisting exclusively of unwanted characters");
         } else {
            aDiagnostic.temporaryStream.nextPut('\t');
            aDiagnostic.temporaryStream.nextPutAll("<A HREF=\"/novell/mmedia/prods.htm\">");
            aDiagnostic.temporaryStream.nextPutAll(aProduct.name);
            aDiagnostic.temporaryStream.nextPutAll("</A>");
            aDiagnostic.temporaryStream.nextPut('\n');
         }
      }
   }
}

Sample B: Smalltalk implementation

Text of contentBlock for product name retrieval and mark-up:

[ :aDiagnostic |
   aDiagnostic temporaryStream reset.
   aDiagnostic managedObject products do: [ :aProduct | | aContent |
      aContent := aProduct name.
      aContent isNil
      ifTrue: [
         aDiagnostic fail: 'Product object has nil name'.
      ]
      ifFalse: [
         aContent isEmpty
         ifTrue: [
            aDiagnostic fail: 'A product object has name equivalent to empty string'.
         ]
         ifFalse: [
            aContent := aContent dropLeadingAndTrailingCharacters:
               Element leadingAndTrailingCharacterArray.
            aContent isEmpty
            ifTrue: [
               aDiagnostic fail: 'A product object has name consisting exclusively of unwanted characters'.
            ]
            ifFalse: [
               aDiagnostic temporaryStream
                  tab;
                  nextPutAll: ''<A HREF="/novell/mmedia/prods.htm">'';
                  nextPutAll: aProduct name;
                  nextPutAll: ''</A>'';
                  cr.
            ].
         ].
      ].
   ].
]
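One detail Code Listing 11 leaves open is escaping: a product name containing & or < would produce malformed HTML if written into the stream unchanged. Because, in this design, mark-up is added only at distribution time, the code that adds the mark-up is also a natural place to escape the content. The following Java sketch assumes the product names have already passed the validity tests shown in the listing; ProductLinks, escapeHtml, and markUp are hypothetical names.

```java
import java.util.List;

final class ProductLinks {
    private ProductLinks() {}

    // Minimal escaping for text placed in HTML content or a double-quoted attribute.
    static String escapeHtml(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Marks up each product name as a tab-indented link, one per line, as in Listing 11.
    static String markUp(List<String> productNames, String href) {
        StringBuilder out = new StringBuilder();
        for (String name : productNames) {
            out.append('\t')
               .append("<A HREF=\"").append(escapeHtml(href)).append("\">")
               .append(escapeHtml(name))
               .append("</A>")
               .append('\n');
        }
        return out.toString();
    }
}
```

For example, a product named "Tools & Utilities" is written as Tools &amp; Utilities inside the link.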

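Note that in the Smalltalk implementation in Code Listing 11, each string of HTML mark-up is delimited by two single quotes (e.g., ''</A>''). The text of the block in Code Listing 11 is stored in the database as a string. Doubled quotes are required wherever that text is itself written as a Smalltalk string literal (for example, by code that loads the Element table), because inside a literal each pair of single quotes stands for one single quote. When the stored text is later compiled, each nextPutAll: message is therefore followed by a string wrapped in single quotes (e.g., nextPutAll: '</A>'). The same doubling applies to every other string literal in such a block, including the fail: messages.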
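The doubling rule (each single quote inside a Smalltalk string literal is written twice) is easy to apply mechanically when block text is written to, or read back from, the database. A minimal Java sketch with hypothetical names, assuming the block text is handled as an ordinary Java String:

```java
final class SmalltalkLiteral {
    private SmalltalkLiteral() {}

    // Wraps text in single quotes, doubling each embedded single quote.
    static String quote(String text) {
        return "'" + text.replace("'", "''") + "'";
    }

    // Reverses quote(): strips the delimiters and collapses each doubled quote to one.
    static String unquote(String literal) {
        if (literal.length() < 2 || literal.charAt(0) != '\''
                || literal.charAt(literal.length() - 1) != '\'') {
            throw new IllegalArgumentException("Not a quoted Smalltalk literal: " + literal);
        }
        return literal.substring(1, literal.length() - 1).replace("''", "'");
    }
}
```

For example, quote("nextPutAll: '</A>'") yields 'nextPutAll: ''</A>''', which matches the doubled form shown in the listing.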

Bibliography


Booch1

Booch, Grady. Object-Oriented Analysis and Design with Applications. The Benjamin/Cummings Publishing Company, Inc., 1994. Prentice-Hall, Inc., 1996.

Booch2

Booch, Grady, et al. The Unified Modeling Language for Object-Oriented Development: Version 0.9 Addendum. Rational Software Corporation, 1996.

Shaw

Shaw, Mary and Garlan, David. Software Architecture: Perspectives on an Emerging Discipline.

Young1

Young, Al, et al. Adding Functionality to Strings. Object Watch Web Site (in press).

Young2

Young, Al and Johnson, Jay M. An Object-Oriented Approach to Modeling Information Content. Novell Developer Notes, October 1996.

* Originally published in Novell AppNotes


Disclaimer

The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.

© Copyright Micro Focus or one of its affiliates