One of the first things you have to do when dealing with XML programmatically is take an XML document and parse it. This post shows the basics of XML parsing using the simplest methods. We will look closely at two simple examples of how to use the XML parser described in the previous post to parse an XML file.

To process the XML, we create an XMLReader that reads the content of the XML file into a temporary buffer. The read() function of XMLReader returns an XMLDocument structure, a pointer to the top of the parsed XML content. If an error occurs during parsing (for example, if the XML is not well-formed), read() returns a NULL pointer.

The XMLDocument is a high-level container for the XML data and holds a list of XMLElements. Each XMLDocument begins with an identification element, "XML DOCUMENT v1.0". After the identification string, the document contains the XML declaration, which indicates the version of XML used in the document.

To get the top element of the XML document, we use the document_get_element() function. We then use the element_get_element() function of the XMLElement structure to walk through the document and retrieve the data we need. Each XML container structure (XMLDocument, XMLElement and XMLAttribute) provides a similar API for working with its contents: on the first call, with a NULL argument for the requested element, the function returns the first element in the container. Each subsequent call takes the previously returned value as its argument and returns the next element. The end of the container is signaled by a NULL pointer.
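
The calling convention described above can be sketched in isolation. The following is a minimal illustration only: Node, node_get_child() and node_count_children() are hypothetical names invented for this sketch, and the real XMLElement layout and installer_xml_* functions may work differently inside.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for XMLElement: not the library's real layout. */
typedef struct Node {
    const char  *name;
    struct Node *first_child; /* head of the child list, or NULL */
    struct Node *next;        /* next sibling, or NULL at the end */
} Node;

/*
 * Same calling convention as described in the text:
 *   previous == NULL -> return the first child
 *   previous != NULL -> return the child that follows it
 * A NULL result marks the end of the container.
 */
Node *node_get_child(const Node *parent, const Node *previous)
{
    assert(parent != NULL);
    return (previous == NULL) ? parent->first_child : previous->next;
}

/* Walk all children with the iterator idiom and count them. */
size_t node_count_children(const Node *parent)
{
    size_t count = 0;
    Node  *child = NULL; /* must start as NULL to begin at the first child */

    while ((child = node_get_child(parent, child)) != NULL) {
        count++;
    }
    return count;
}
```

The important detail is that the iterator variable has to be NULL before the loop starts; reusing a pointer left over from an earlier walk would resume in the middle of some other list.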

Example 1. Parsing sequential XML document

The first example simply displays the XML content stored in the file /tmp/installer-xml_reader_example1.

Input:

<?xml version="1.0" encoding="UTF-8"?>
<config>
    <entry key="Key1" Value="Value1">Entry1</entry>
    <entry key="Key2" Value="Value2">Entry2</entry>
    <entry key="Key3" Value="Value3">Entry3</entry>
</config>

Code:

    FILE         *pointer   = NULL; /* input file handle used below */
    XMLReader    *reader    = NULL;
    XMLDocument  *document  = NULL;
    XMLElement   *element   = NULL;
    XMLElement   *element_t = NULL;
    XMLAttribute *attribute = NULL;
    
    wchar_t *element_name     = NULL;
    wchar_t *element_value    = NULL;
    wchar_t *attribute_name   = NULL;
    wchar_t *attribute_value  = NULL;

    char file_name[] = "/tmp/installer-xml_reader_example1";

    setlocale(LC_ALL, "en_US.UTF-8");

    pointer = fopen(file_name, "r");
    if (pointer == NULL) {
        fprintf(stderr, "[ERROR|%s] %s - %s\n",
                "u_installer_xml_example_example1", file_name, strerror(errno));
        return false;
    }
    
    /* read the file and verify the structure */
    reader   = installer_xml_reader_new(pointer);
    if (reader == NULL) {
        u_installer_main_print("xml_example", "example1(1)", false);
        return false;
    }
    
    document = installer_xml_reader_read(reader);
    if ((document == NULL) || (document->elements == NULL)) {
        u_installer_main_print("xml_example", "example1(2)", false);
        return false;
    }
    
    /* 
     * main example code 
     */

    /* XML DOCUMENT v1.0 */
    element = installer_xml_document_get_element(document, NULL);
    element_name  = installer_xml_element_get_name(element);
    element_value = installer_xml_element_get_value(element);
    printf("%ls %ls\n", element_name, element_value);
    
    /* <?xml version="1.0" encoding="UTF-8"?> */
    element_t = installer_xml_element_get_element(element, NULL);
    attribute = installer_xml_element_get_attribute(element_t, NULL);
    element_name    = installer_xml_element_get_name(element_t);
    attribute_name  = installer_xml_attribute_get_name(attribute);
    attribute_value = installer_xml_attribute_get_value(attribute);
    printf("<?%ls %ls=\"%ls\" ", 
                    element_name, attribute_name, attribute_value);
    attribute = installer_xml_element_get_attribute(element_t, attribute);
    attribute_name  = installer_xml_attribute_get_name(attribute);
    attribute_value = installer_xml_attribute_get_value(attribute);
    printf("%ls=\"%ls\"?>\n", 
                    attribute_name, attribute_value);
    
    /* <config> */
    element = installer_xml_element_get_element(element, element_t);
    printf("<%ls>\n", installer_xml_element_get_name(element));
    
    /* nullify temporary pointer to use it as iterator */
    element_t = NULL;
    while ((element_t = installer_xml_element_get_element(element, element_t)) != NULL) {
        attribute = installer_xml_element_get_attribute(element_t, NULL);
        element_name    = installer_xml_element_get_name(element_t);
        element_value   = installer_xml_element_get_value(element_t);
        attribute_name  = installer_xml_attribute_get_name(attribute);
        attribute_value = installer_xml_attribute_get_value(attribute);
        printf("\t<%ls %ls=\"%ls\" ", 
                    element_name, attribute_name, attribute_value);
        attribute = installer_xml_element_get_attribute(element_t, attribute);
        attribute_name  = installer_xml_attribute_get_name(attribute);
        attribute_value = installer_xml_attribute_get_value(attribute);
        printf("%ls=\"%ls\">%ls</%ls>\n", 
                    attribute_name, attribute_value, element_value, element_name);
    }
    
    /* free memory and close the input file */
    installer_xml_reader_delete(reader);
    fclose(pointer);

Output:

XML DOCUMENT v1.0
<?xml version="1.0" encoding="UTF-8"?>
<config>
    <entry key="Key1" Value="Value1">Entry1</entry>
    <entry key="Key2" Value="Value2">Entry2</entry>
    <entry key="Key3" Value="Value3">Entry3</entry>
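
Example 1 fetches exactly two attributes per element by calling the getter twice. If an element may carry any number of attributes, the same NULL-terminated iteration can be put in a loop. The sketch below shows the idea with hypothetical Attr and Elem types and plain char strings; the real library uses XMLAttribute, XMLElement and wchar_t, so treat every name here as an assumption made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; the library's XMLAttribute/XMLElement may differ. */
typedef struct Attr {
    const char  *name;
    const char  *value;
    struct Attr *next;
} Attr;

typedef struct Elem {
    const char *name;
    Attr       *first_attr;
} Elem;

/* NULL in -> first attribute; attribute in -> next one; NULL out -> end. */
Attr *elem_get_attr(const Elem *elem, const Attr *previous)
{
    assert(elem != NULL);
    return (previous == NULL) ? elem->first_attr : previous->next;
}

/*
 * Format an opening tag such as <name a="1" b="2"> into out, however many
 * attributes the element has. Returns the number of characters written,
 * or -1 if the buffer is too small.
 */
int elem_format_open_tag(const Elem *elem, char *out, size_t size)
{
    Attr  *attr = NULL;
    size_t used = 0;
    int    n;

    n = snprintf(out, size, "<%s", elem->name);
    if (n < 0 || (size_t)n >= size) return -1;
    used = (size_t)n;

    while ((attr = elem_get_attr(elem, attr)) != NULL) {
        n = snprintf(out + used, size - used, " %s=\"%s\"",
                     attr->name, attr->value);
        if (n < 0 || (size_t)n >= size - used) return -1;
        used += (size_t)n;
    }

    n = snprintf(out + used, size - used, ">");
    if (n < 0 || (size_t)n >= size - used) return -1;
    used += (size_t)n;

    return (int)used;
}
```

Written this way, an element with no attributes or with five attributes goes through the same code path, and no getter is ever called on a NULL attribute.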

Example 2. Parsing nested XML document

The second example is more complicated, because now we have nested nodes. It is a real-world style example: book data of the kind stored on the Amazon website. The example shows how to extract nested data using while loops. You can replace the printf() calls with any other handler in your application to process the XML data in your own way.

Input:

<?xml version="1.0" encoding="UTF-8"?>
<books>
    <category name="Art & Photography">
        <book author="Alex Johnson" published="01 May 2012">Bookshelf</book>
        <book author="Andrew Blauner" published="24 Apr 2012">Central Park</book>
    </category>
    <category name="Romance">
        <book author="Gena Showalter" published="05 Jul 2012">Wicked Nights</book>
        <book author="Elle Kennedy" published="01 May 2012">Midnight Rescue</book>
    </category>
</books>

Code:

    FILE         *pointer   = NULL; /* input file handle used below */
    XMLReader    *reader    = NULL;
    XMLDocument  *document  = NULL;
    XMLElement   *element   = NULL;
    XMLElement   *element_t = NULL;
    XMLElement   *element_c = NULL; /* the pointer to book's category */
    XMLElement   *element_b = NULL; /* the pointer to book entry */
    XMLAttribute *attribute = NULL;
    
    wchar_t *element_value    = NULL;
    wchar_t *attribute_value  = NULL;

    char file_name[] = "/tmp/installer-xml_reader_example2";

    setlocale(LC_ALL, "en_US.UTF-8");

    pointer = fopen(file_name, "r");
    if (pointer == NULL) {
        fprintf(stderr, "[ERROR|%s] %s - %s\n",
                "u_installer_xml_example_example2", file_name, strerror(errno));
        return false;
    }
    
    /* read the file and verify the structure */
    reader   = installer_xml_reader_new(pointer);
    if (reader == NULL) {
        u_installer_main_print("xml_example", "example2(1)", false);
        return false;
    }
    
    document = installer_xml_reader_read(reader);
    if ((document == NULL) || (document->elements == NULL)) {
        u_installer_main_print("xml_example", "example2(2)", false);
        return false;
    }
    
    /* 
     * main example code 
     */
     
    /* XML DOCUMENT v1.0 */
    element = installer_xml_document_get_element(document, NULL);
    
    /* <?xml version="1.0" encoding="UTF-8"?> */
    element_t = installer_xml_element_get_element(element, NULL);
    
    /* <books> */
    element = installer_xml_element_get_element(element, element_t);
    
    /* element_c and element_b start as NULL, so each loop begins at the first child;
     * the inner loop leaves element_b NULL again when it finishes */
    while ((element_c = installer_xml_element_get_element(element, element_c)) != NULL) {
        attribute       = installer_xml_element_get_attribute(element_c, NULL);
        attribute_value = installer_xml_attribute_get_value(attribute);
        printf("%ls\n", attribute_value);
        
        while ((element_b = installer_xml_element_get_element(element_c, element_b)) != NULL) {
            element_value   = installer_xml_element_get_value(element_b);
            attribute       = installer_xml_element_get_attribute(element_b, NULL);
            attribute_value = installer_xml_attribute_get_value(attribute);
            printf("\t%ls by %ls ", element_value, attribute_value);
            
            attribute       = installer_xml_element_get_attribute(element_b, attribute);
            attribute_value = installer_xml_attribute_get_value(attribute);
            printf("(%ls)\n", attribute_value);
        }
    }
    
    /* free memory and close the input file */
    installer_xml_reader_delete(reader);
    fclose(pointer);

Output:

Art & Photography
    Bookshelf by Alex Johnson (01 May 2012)
    Central Park by Andrew Blauner (24 Apr 2012)
Romance
    Wicked Nights by Gena Showalter (05 Jul 2012)
    Midnight Rescue by Elle Kennedy (01 May 2012)
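
The two nested while loops in Example 2 handle exactly two levels. For documents of arbitrary depth, and to plug in a handler other than printf(), one option is a recursive walk that takes a callback. This is only a sketch under assumptions: TreeNode, tree_get_child(), tree_walk() and the WalkStats handler are hypothetical names, not part of the installer-core API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for XMLElement; the real structure may differ. */
typedef struct TreeNode {
    const char      *name;
    struct TreeNode *first_child;
    struct TreeNode *next;
} TreeNode;

/* Handler invoked once per element; depth 0 is the starting element. */
typedef void (*TreeHandler)(const TreeNode *node, int depth, void *user_data);

/* Iterator with the NULL-first, NULL-last convention used in the examples. */
TreeNode *tree_get_child(const TreeNode *parent, const TreeNode *previous)
{
    assert(parent != NULL);
    return (previous == NULL) ? parent->first_child : previous->next;
}

/* Depth-first walk: call the handler for the node, then recurse into children. */
void tree_walk(const TreeNode *node, int depth,
               TreeHandler handler, void *user_data)
{
    TreeNode *child = NULL;

    assert(node != NULL && handler != NULL);
    handler(node, depth, user_data);
    while ((child = tree_get_child(node, child)) != NULL) {
        tree_walk(child, depth + 1, handler, user_data);
    }
}

/* Example handler state: counts visited nodes and tracks the deepest level. */
typedef struct WalkStats {
    size_t visited;
    int    max_depth;
} WalkStats;

/* Example handler: gathers statistics instead of printing. */
void collect_stats(const TreeNode *node, int depth, void *user_data)
{
    WalkStats *stats = user_data;

    (void)node; /* this handler only needs the depth */
    stats->visited++;
    if (depth > stats->max_depth) {
        stats->max_depth = depth;
    }
}
```

A printing handler would have the same signature and could use the depth argument to decide how many tabs to emit, which is what the hand-written loops in Example 2 do implicitly.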

The corresponding code is available in the project source: http://code.google.com/p/installer-core/source/browse/test/util/xml/u_xml_example.h