The application of an XML Parser

XML is becoming increasingly prevalent. Like it or not, if you are a software engineer, sooner or later you will have to program with XML. If you program in VB6, you can use the Document Object Model (DOM) or the Simple API for XML (SAX) from Microsoft to help with parsing (analyzing and laying out) XML files.

DOM reads a raw XML file and parses it into a Tree of objects in memory: a parent Document node has child nodes that represent the comments, tags, processing instructions and text (known as XML entities).

SAX, while reading and parsing an XML file, raises Events that report each XML entity it encounters. SAX does not create a Tree, so the application depends on how we handle the SAX Events. Of course, SAX is much smaller and simpler than DOM.
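The difference between the two styles can be sketched with the Python standard library (Python is used here only for illustration; it is not the Microsoft DOM/SAX API that VB6 uses, and the sample XML string is made up):

```python
import xml.dom.minidom
import xml.sax

SAMPLE = '<library><book series="Professional C++">Intro</book></library>'

# DOM style: the whole document becomes a tree that we can walk afterwards.
dom = xml.dom.minidom.parseString(SAMPLE)
root = dom.documentElement
dom_names = [root.tagName] + [
    child.tagName
    for child in root.childNodes
    if child.nodeType == child.ELEMENT_NODE
]


# SAX style: no tree is built; we only receive events while parsing.
class EventLogger(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


logger = EventLogger()
xml.sax.parseString(SAMPLE.encode("utf-8"), logger)

print(dom_names)
for event in logger.events:
    print(event)
```

With DOM the application asks questions of a finished tree; with SAX it has to remember whatever it needs as the events go by.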

So that you do not depend entirely on other people's XML parsers, and to give you practical ideas on how to write one, in this article we implement a simple XML parser (Simple XML Parser, or SXMLParser) entirely in VB6 and apply it in a practical way. SXMLParser is small, but it has characteristics similar to SAX, and of course you are free to modify it and add custom features.

The immediate application is to pretty-print XML code and to add color to the XML content when it is displayed in a WebBrowser, as in the image below:

and to create a TreeView that represents the DOM:

Having the source code of your own XML parser gives you an advantage over others when designing or deploying network programs.
Before discussing the program, let us review the basic rules of well-formed XML.

Well-Formed XML

Although you can define as many tags as you like, every XML page must follow some rules to be considered well-formed (properly structured from head to tail).
If an XML page is not well-formed, it is considered unusable, and no program will process the data inside it. An XML page therefore needs to follow these rules:
  1. An XML page must start with an XML declaration. We can ignore this point here.
  2. Each component, called an "element", must sit between a Tag Pair.
  3. If a Tag contains nothing in between, it must be terminated by "/>", e.g. <br/> or <HR/>.
  4. An XML page must have a single element that contains all other elements. That element is the root of the tree that represents the XML page.
  5. Tag Pairs must not be interleaved (i.e. <name>John Stanmore <address>25 King Street</name></address> is invalid because the <address> Tag Pair opens inside <name> but closes outside it).
and a few more rules about how to use special characters. In addition, the two Tags of a Tag Pair must be spelled exactly the same, including uppercase and lowercase (e.g. <STUDENT> and </Student> is illegal), and all attribute values must be enclosed in quotes (e.g. standalone=yes is illegal; it must be standalone="yes").
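As a quick illustration of these rules, the sketch below feeds a few small samples to Python's standard xml.etree.ElementTree parser, which rejects anything that is not well-formed. The helper name is_well_formed and the sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, "") if text parses, otherwise (False, reason)."""
    try:
        ET.fromstring(text)
        return True, ""
    except ET.ParseError as err:
        return False, str(err)


samples = {
    "good": '<library><book hardback="yes"/></library>',
    "interleaved": "<name>John<address>25 King Street</name></address>",
    "case mismatch": "<STUDENT>Ann</Student>",
    "unquoted value": "<book hardback=yes/>",
    "two roots": "<a/><b/>",
}

# Map each label to True/False; only the first sample follows all rules.
results = {label: is_well_formed(text)[0] for label, text in samples.items()}

for label, text in samples.items():
    ok, reason = is_well_formed(text)
    print(label, "->", "well-formed" if ok else "rejected: " + reason)
```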

Design SXMLParser

A single VB6 class takes care of almost all of the parsing of an XML file. Once we have instantiated an Object of Class clsXMLParser, we simply give it the name of the XML file and it starts parsing right away.

As the colored XML listing above shows, the main part of the XML starts on the fourth line with the Open Tag <library>. Every Open Tag has a corresponding Close Tag, e.g. </library>. Each Tag Pair may contain other (child) Tag Pairs.

An Open Tag can contain multiple Attributes, each a pair of the form Name="Value". Note that the Value must be between two quotes.

SXMLParser goes through the XML file one character at a time. When it reads an Open Tag, for example:

<book hardback="yes" series="Professional C++">

SXMLParser raises a StartElement Event, which is handled in the Form by Sub XMLParser_StartElement. This Event gives the Form the Tag's name and a collection of the Name="Value" attribute pairs; for the book Tag above, the collection contains hardback="yes" and series="Professional C++".
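How an Open Tag might be broken into a name and a list of attributes can be sketched as follows. This is a simplified Python illustration, not the code of clsXMLParser; the function name parse_open_tag and its error messages are assumptions chosen to echo the kinds of errors a parser would report.

```python
def parse_open_tag(tag):
    """Split '<name a="1" b="2">' into (name, [(attr, value), ...])."""
    body = tag.strip()
    if not (body.startswith("<") and body.endswith(">")):
        raise ValueError("bad open tag")
    # Drop the angle brackets and the "/" of an empty tag such as <br/>.
    body = body[1:-1].rstrip("/").strip()
    pos, n = 0, len(body)

    # The tag name runs up to the first whitespace character.
    while pos < n and not body[pos].isspace():
        pos += 1
    name = body[:pos]
    attrs = []

    while True:
        # Skip whitespace before the next attribute.
        while pos < n and body[pos].isspace():
            pos += 1
        if pos >= n:
            break
        eq = body.find("=", pos)
        if eq < 0:
            raise ValueError("no equal sign")
        attr_name = body[pos:eq].strip()
        if not attr_name:
            raise ValueError("no attribute name")
        pos = eq + 1
        while pos < n and body[pos].isspace():
            pos += 1
        if pos >= n or body[pos] != '"':
            raise ValueError("no open quote")
        close = body.find('"', pos + 1)
        if close < 0:
            raise ValueError("no value close quote")
        attrs.append((attr_name, body[pos + 1:close]))
        pos = close + 1
    return name, attrs


print(parse_open_tag('<book hardback="yes" series="Professional C++">'))
```

The returned pair plays the role of the two arguments passed to the StartElement handler: the Tag name and the attribute collection.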

Sub XMLParser_StartElement does three things at once:
  1. Pretty-prints the XML code, i.e. indents each line according to its nesting level for easy reading.
  2. Adds color to the HTML that displays the XML code in the WebBrowser.
  3. Creates Nodes in the TreeView.

Private Sub XMLParser_StartElement(ByVal Name As String, ByVal tagAttributes As clsAttributes)
   ' A complete Start Element has been processed
   Dim TStr
   ' Build a string of the Attributes' Name="Value" pairs
   TStr = BuildAttributeString(tagAttributes)
   ' Display the Tag Name in the pretty XML Listbox
   lstXML.AddItem Space(XMLParser.NestedLevel * TabWidth) & "<" & Name & TStr & ">"
   ' Add blue color to the equal sign
   TStr = Replace(TStr, "=", "<font color=blue>=</font>")
   ' Prepare the HTML-colored Tag Name
   lstHTML.AddItem Space(XMLParser.NestedLevel * TabWidth) & "<font color=red>&lt;</font>" & _
      "<font color=blue>" & Name & "</font>" & "<font color=green>" & TStr & "</font>" & _
      "<font color=red>&gt;</font>"
   ' Add a node to the TreeView and save its index in the stack of nested nodes
   If XMLParser.NestedLevel = 0 Then
      ' Create the root node
      With XMLTree.Nodes.Add(, , , Name)
         nodeStack(0) = .Index   ' Save the node index in the stack
         .Expanded = True   ' Expand the node
      End With
   Else
      ' Create a child node of the node one nesting level higher
      With XMLTree.Nodes.Add(nodeStack(XMLParser.NestedLevel - 1), tvwChild, , Name)
         nodeStack(XMLParser.NestedLevel) = .Index   ' Save the node index in the stack
         .Expanded = True   ' Expand the node
      End With
   End If
End Sub
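
The first two jobs of the handler, indenting and coloring, can be sketched in Python as two small functions. The names pretty_open_tag, html_open_tag and TAB_WIDTH are assumptions made for this illustration, as is the use of &nbsp; for the HTML indentation.

```python
from html import escape

TAB_WIDTH = 3  # assumed number of spaces per nesting level


def pretty_open_tag(name, attrs, level):
    """Return the plain-text line for an open tag, indented by nesting level."""
    attr_text = "".join(f' {key}="{value}"' for key, value in attrs)
    return " " * (level * TAB_WIDTH) + f"<{name}{attr_text}>"


def html_open_tag(name, attrs, level):
    """Return the same line as HTML: red brackets, blue name, green attributes."""
    indent = "&nbsp;" * (level * TAB_WIDTH)
    attr_html = "".join(
        " " + escape(key) + '<font color="blue">=</font>' + escape('"' + value + '"')
        for key, value in attrs
    )
    return (
        indent
        + '<font color="red">&lt;</font>'
        + '<font color="blue">' + escape(name) + "</font>"
        + '<font color="green">' + attr_html + "</font>"
        + '<font color="red">&gt;</font>'
    )


attrs = [("hardback", "yes"), ("series", "Professional C++")]
print(pretty_open_tag("book", attrs, 1))
print(html_open_tag("book", attrs, 1))
```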

To rebuild the Name="Value" pairs from the collection of a Tag's attributes, the Function BuildAttributeString is used as follows:

Function BuildAttributeString(ByVal tagAttributes As clsAttributes) As String
   ' Build a string of Attributes' Name="Value" pairs for an Element or Instruction
   Dim i, TStr
   Dim attr As clsAttributeItem
   ' Iterate through each Attribute in the collection
   For i = 1 To tagAttributes.Count
      ' Refer to the i-th attribute
      Set attr = tagAttributes.Item(i)
      ' Start with a space, create string Name="Value"
      TStr = TStr & " " & attr.Name & "=""" & attr.Value & """"
   Next
   BuildAttributeString = TStr  ' Return the resultant string
End Function

Below is the list of Events that SXMLParser raises for the Form to handle:

' Start of parsing
Event StartDocument() 
' End of parsing
Event EndDocument() 
' An XML Instruction has been parsed
Event ProcessingInstruction(ByVal Name As String, ByVal tagAttributes As clsAttributes) 
' An XML comment has been parsed
Event Comment(ByVal Text As String) 
' An open tag has been parsed
Event StartElement(ByVal Name As String, ByVal tagAttributes As clsAttributes) 
' A close tag has been parsed
Event EndElement(ByVal Name As String) 
' A block of text has been parsed
Event Characters(ByVal Text As String) 
' Error encountered while parsing
Event ParseError(ByVal ErrorNo As Integer, ByVal Description As String) 

While parsing, SXMLParser changes its State or Mode depending on what it is looking for, such as the characters < and >, an Attribute Name, an Attribute Value, a Close Tag, etc. If it discovers that the XML is not well-formed, it raises a ParseError Event with the reason and the details, so that the Form can show the error and help the user find where in the XML file to make the correction.
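
The idea of a scanner that changes Mode as it reads can be sketched in Python with just two modes, TEXT and TAG. This is a deliberately reduced illustration, not SXMLParser itself: the function name scan, the event tuples and the error wording are all assumptions, and attributes, comments and CDATA are ignored.

```python
def scan(text):
    """Return a list of ("start" | "end" | "text" | "error", value) tuples."""
    events = []
    mode = "TEXT"  # current mode of the scanner
    buffer = ""    # characters collected since the last mode change
    for pos, ch in enumerate(text):
        if mode == "TEXT":
            if ch == "<":
                if buffer.strip():
                    events.append(("text", buffer.strip()))
                buffer = ""
                mode = "TAG"
            elif ch == ">":
                events.append(("error", f"unexpected '>' at position {pos}"))
                return events
            else:
                buffer += ch
        else:  # mode == "TAG"
            if ch == ">":
                if buffer.startswith("/"):
                    events.append(("end", buffer[1:].strip()))
                else:
                    events.append(("start", buffer.strip()))
                buffer = ""
                mode = "TEXT"
            elif ch == "<":
                events.append(("error", f"unexpected '<' at position {pos}"))
                return events
            else:
                buffer += ch
    if mode == "TAG":
        events.append(("error", "tag not closed at end of input"))
    elif buffer.strip():
        events.append(("text", buffer.strip()))
    return events


print(scan("<a>hi</a>"))
print(scan("<a<b>"))
```

Reporting the position along with the reason is what lets the Form point the user at the offending place in the file.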

The Errors that SXMLParser supports are listed below. Please note that the error messages may not be as clear as you would expect, because the parser is not as smart as we are.

Const cParseEmptyXML = 1 
Const cParseNoCommentCloseTag = 2 
Const cParseNoValueCloseQuote = 3 
Const cParseNoAttributeName = 4 
Const cParseNoEqualSign = 5 
Const cParseNoAttributeValue = 6 
Const cParseNoCDataCloseTag = 7 
Const cParseUnknownSymbols = 8 
Const cParseNoOpenQuote = 9 
Const cParseBadOpenTag = 10 
Const cParseBadCloseTag = 11 
Const cParseMismatchTagName = 12 
Const cParseNoInstructionCloseTag = 13 

Because the two Tag Names of a Tag Pair must be identical (including uppercase and lowercase) and Tag Pairs must not be interleaved, we need a stack to hold the Tag Names according to their nesting level. A Stack is a list in Last-In, First-Out order, meaning the newest item is the first one to come out.

This is done by the Class clsStack. clsStack keeps its Items in a single String, in which the Items are separated by vbNullChar (the character with ASCII value 0). The latest Item (Last-In) is at the left end of the String.

The Class clsStack has three main members: Push (to insert a tag name), Pop (to remove and return the latest tag name) and LastIn (to view the latest tag name without removing it).

Public Sub Push(InItem As String)
   ' Push an item onto the Stack.
   ' Remove any vbNullChars in the item
   ' Use vbNullChar as the delimiter
   ' Prefix the item string to the Stack
   mStacks = Replace(InItem, vbNullChar, "") & vbNullChar & mStacks
   mCount = mCount + 1  ' Increment the item count
End Sub

Public Function Pop() As String
   ' Remove and return the LastIn item in the Stack.
   Dim Pos
   If mCount > 0 Then
      Pos = InStr(mStacks, vbNullChar)  ' Locate vbNullChar
      If Pos > 0 Then
         Pop = Left(mStacks, Pos - 1)  ' Extract the LastIn Item
         mStacks = Mid(mStacks, Pos + 1)  ' Keep the remaining string of the Stack
         mCount = mCount - 1  ' Decrement the Item Count
      End If
   End If
End Function

Public Function LastIn() As String
   ' View the LastIn Item in the Stack. Leave the Stack unchanged
   Dim Pos
   If mCount > 0 Then
      Pos = InStr(mStacks, vbNullChar)  ' Locate vbNullChar
      If Pos > 0 Then
         LastIn = Left(mStacks, Pos - 1)  ' Extract the LastIn Item
      End If
   End If
End Function
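
How such a stack catches interleaved or misspelled Tag Pairs can be sketched in Python. The class StringStack mirrors the string-with-delimiter idea of clsStack, while check_nesting and its messages are hypothetical names invented for this illustration.

```python
NUL = "\0"  # same role as vbNullChar: the delimiter between items


class StringStack:
    """A stack kept in one string; the newest item is at the left end."""

    def __init__(self):
        self._items = ""
        self._count = 0

    def push(self, item):
        self._items = item.replace(NUL, "") + NUL + self._items
        self._count += 1

    def pop(self):
        if self._count == 0:
            return ""
        head, _, rest = self._items.partition(NUL)
        self._items = rest
        self._count -= 1
        return head

    def last_in(self):
        if self._count == 0:
            return ""
        return self._items.partition(NUL)[0]


def check_nesting(events):
    """events: list of ("start" | "end", tag_name). Return first error or ""."""
    stack = StringStack()
    for kind, name in events:
        if kind == "start":
            stack.push(name)
        elif kind == "end":
            expected = stack.last_in()
            if expected != name:  # the comparison is case-sensitive
                return f"mismatched tag name: expected </{expected}>, got </{name}>"
            stack.pop()
    if stack.last_in():
        return f"unclosed tag <{stack.last_in()}>"
    return ""


interleaved = [("start", "name"), ("start", "address"),
               ("end", "name"), ("end", "address")]
print(check_nesting(interleaved))
```

Every Close Tag must match the Tag Name on top of the stack, which is exactly the rule that Tag Pairs may not be interleaved.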

A special feature of this program: with a single click on the Paste button, the XML text in the Clipboard is converted into HTML code that displays the XML pretty-printed and in color in a WebBrowser. Right after that you can paste the Clipboard content into a Web page.
Below is the listing of Sub CmdPaste_Click:

Private Sub CmdPaste_Click()
   ' Parse the Clipboard content and copy the resultant colour
   ' HTML back to clipboard
   Dim i, TStr
   ' Fetch content of clipboard
   TStr = Clipboard.GetText(vbCFText)
   ' Write it temporarily to "Temp.XML" file in the same folder
   ' where this program resides
   WriteTextFile GetLocalDirectory & "Temp.XML", TStr
   ' Place the XML filename into TextBox txtFilename
   txtFilename.Text = GetLocalDirectory & "Temp.XML"
   ' Emulating User's action of clicking the commandbutton Parse
   ' If there is something as a result, copy everything from the Listbox lstHTML
   ' except for the first and last line, which contain HTML header/footer.
   ' Select the required lines from the Listbox
   If lstHTML.ListCount > 2 Then
      For i = 1 To lstHTML.ListCount - 2
         lstHTML.Selected(i) = True
      Next
      ' Emulating User's action of clicking the commandbutton Copy
   End If
End Sub

