Chapter 3: PHP and the Document Object Model (DOM)

"A fool sees not the same tree that a wise man sees."
~William Blake

If you've been paying attention, you now know the basics of parsing XML with PHP. As Chapter 2, "PHP and the Simple API for XML (SAX)," demonstrated, it's pretty simple: whip up some XML data and mix in a few callback functions. It's a simple yet effective recipe, and one that can be used to great effect for the rapid development of XML-based applications.

That said, although the event-driven approach to XML parsing is certainly popular, it's not the only option available. PHP also allows you to parse XML using the Document Object Model (DOM), an alternative technique that allows developers to create and manipulate a hierarchical tree representation of XML data for greater flexibility and ease of use.

In this chapter, this tree-based approach is explored in greater detail. First it is put under the microscope to see exactly how it works, and then PHP's implementation of the DOM is introduced. The various methods exposed by PHP to simplify interaction with the DOM are also examined, together with examples and code listings that demonstrate its capabilities.

Both tree- and event-based approaches have significant advantages and disadvantages, and these can impact your choice of technique when implementing specific projects. To that end, this chapter also includes a brief discussion of the pros and cons of each approach in the hope that it will assist you in making the right choice for a particular project.

Let's get started!

Document Object Model (DOM)

The Document Object Model (DOM) is a standard interface to access and manipulate structured data. As the name suggests, it does this by modeling, or representing, a document as a hierarchical tree of objects.
The W3C's DOM specification defines a number of different objects to represent the different structures that appear within an XML document. For example, elements are represented by an Element object, whereas attributes are represented by Attr objects.

Each of these different object types exposes specific methods and properties. Element objects expose a tagName property containing the element name and getAttribute() and setAttribute() methods for attribute manipulation, whereas Attr objects expose a value property containing the value of the particular attribute. These methods and properties can be used by the application layer to navigate and process the DOM tree, exploit the relationships between the different branches of the tree, and extract information from it.

The very first specification of the DOM (DOM Level 1) appeared on the W3C's web site in October 1998, and simply specified the "core" features of the DOM: the basic objects and the interfaces to them. The next major upgrade, DOM Level 2, appeared in November 2000; it examined the DOM from the perspective of core functions, event handling, and document traversal. DOM Level 3, which is currently under development, builds on past work, and incorporates additions and changes from other related technologies (XPath, abstract schemas, and so on).

As a standard interface to structured data, the DOM was designed from the get-go to be platform- and language-independent. It can be (and is) used to represent structured HTML and XML data, with DOM (or DOM-based) implementations currently available for Java, JavaScript, Python, C/C++, Visual Basic, Delphi, Perl, SMIL, SVG, and PHP.
(The PHP implementation is discussed in detail in the next section.)

In order to better understand how the DOM works, consider Listing 3.1.

Listing 3.1 A Simple XML Document
<?xml version="1.0"?>
<sentence>What a wonderful profusion of colors and smells in the market

Once a DOM parser has chewed on this document, it spits out the tree structure shown in Figure 3.1.
Figure 3.1 A DOM tree.

As you can see, the parser returns a tree containing multiple nodes linked to each other by parent-child relationships. Developers can then write code to move around the tree, access node properties, and manipulate node content.

This approach is in stark contrast to the event-driven approach you studied in Chapter 2, "PHP and the Simple API for XML (SAX)." A SAX parser progresses sequentially through a document, firing events based on the tags it encounters and leaving it to the application layer to decide how to process each event. A DOM parser, on the other hand, reads the entire document into memory and builds a tree representation of its structure; the application layer can then use standard DOM interfaces to find and manipulate individual nodes on this tree, in a non-sequential manner.

PHP and the DOM

PHP 4.0 comes with a primitive, though effective, implementation of the DOM, based on the libxml library. Created by Daniel Veillard, libxml (http://www.xmlsoft.org/) is a modular, standards-compliant C library that provides XML parsing capabilities to the GNOME project (http://www.gnome.org/).

If you're using a stock PHP binary, it's quite likely that you'll need to recompile PHP to add support for this library to your PHP build. (Detailed instructions for accomplishing this are available in Appendix A, "Recompiling PHP to Add XML Support.")

Under Construction

If you're planning on using PHP's DOM extension in your development activities, be warned that this extension is still under development and is, therefore, subject to change without notice. Consequently, DOM code that works with one version of PHP may need to be rewritten or retested with subsequent versions.

Note also that the examples in this chapter have been tested with the DOM extension that ships with PHP 4.1.1, and are not likely to work with earlier versions because PHP's DOM implementation underwent some fairly radical changes between the release of PHP 4.0.6 and PHP 4.1.1.
If you're using an earlier PHP build, you might want to upgrade to PHP 4.1.1 in order to try out the examples in this chapter.

A Simple Example

When PHP parses an XML document, it creates a hierarchical tree structure (mirroring the structure of the document) that is composed of objects. Each of these objects has standard properties and methods, and you can use these properties and methods to traverse the object tree and access specific elements, attributes, or character data.

The best way to understand how this works is with a simple example. Take a look at Listing 3.2, which demonstrates the basic concepts of this technique by traversing a DOM tree to locate a particular type of element and print its value.

Listing 3.2 Traversing a DOM Tree
<?php
// XML data
$xml_string = "<?xml version='1.0'?>
<sentence>What a wonderful profusion of colors and smells in the market -
<vegetable color='green'>cabbages</vegetable>,
<vegetable color='red'>tomatoes</vegetable>,
<fruit color='green'>apples</fruit>,
<vegetable color='purple'>aubergines</vegetable>,
<fruit color='yellow'>bananas</fruit>
</sentence>";

// create a DOM object from the XML data
if(!$doc = xmldoc($xml_string))
{
    die("Error parsing XML");
}

// start at the root
$root = $doc->root();

// move down one level to the root's children
$children = $root->children();

// iterate through the list of children
foreach ($children as $child)
{
    // if <vegetable> element
    if ($child->tagname == "vegetable")
    {
        // go down one more level
        // get the text node
        $text = $child->children();
        // print the content of the text node
        echo "Found: " . $text[0]->content . "<br>";
    }
}
?>
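Let's go through Listing 3.2 step-by-step:

The first step is to create a DOM object from the XML data with the xmldoc() function; if the data cannot be parsed, the script stops with an error message:

if(!$doc = xmldoc($xml_string))
{
    die("Error parsing XML");
}

The following line of code returns an object representing the document element, and assigns it to the PHP variable $root:

$root = $doc->root();

The children() method then moves down one level and returns the root's children:

$children = $root->children();

Finally, the script iterates through this list of children. For every <vegetable> element it finds, it goes down one more level to get the text node, and prints the content of that node:

foreach ($children as $child)
{
    // if <vegetable> element
    if ($child->tagname == "vegetable")
    {
        // go down one more level
        // get the text node
        $text = $child->children();
        // print the content of the text node
        echo "Found: " . $text[0]->content . "<br>";
    }
}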
When this script runs, it produces the following output:

Found: cabbages
Found: tomatoes
Found: aubergines

As Listing 3.2 demonstrates, DOM tree traversal takes place primarily by exploiting the parent-child relationships that exist between the nodes of the tree. After traversal to a particular depth has been accomplished, node properties can be used to extract all required information from the tree.

Traversing the DOM with PHP's DOM Classes

Because PHP's DOM parser works by creating standard objects to represent XML structures, an understanding of these objects and their capabilities is essential to using this technique effectively. This section examines the classes that form the blueprint for these objects in greater detail.

DomDocument Class

A DomDocument object is typically the first object created by the DOM parser when it completes parsing an XML document. It may be created by a call to xmldoc():

$doc = xmldoc("<?xml version='1.0'?><element>potassium</element>");
Or, if your XML data is in a file (rather than a string), you can use the xmldocfile() function to create a DomDocument object:

$doc = xmldocfile("element.xml");
Treading the Right Path

If you're using Windows, you'll need to give xmldocfile() the full path to the XML file. Don't forget to include the drive letter!

When you examine the structure of the DomDocument object with print_r(), you can see that it contains basic information about the XML document, including the XML version, the encoding and character set, and the URL of the document:

DomDocument Object
(
    [name] =>
    [url] =>
    [version] => 1.0
    [standalone] => -1
    [type] => 9
    [compression] => -1
    [charset] => 1
)

Peekaboo!

You'll notice that many examples in this book (particularly in this chapter) use the print_r() function to display the structure of a particular PHP variable. In case you're not familiar with this function, you should know that it provides an easy way to investigate the innards of a particular variable, array, or object. Use it whenever you need to look inside an object to see what makes it tick; and, if you're feeling really adventurous, you might also want to take a look at the var_dump() and var_export() functions, which provide similar functionality.

Each of these properties provides information on some aspect of the XML document:
The application can use this information to make decisions about how to process the XML data. For example, as Listing 3.3 demonstrates, it may reject documents based on the version of XML being used.

Listing 3.3 Using DomDocument Properties to Verify XML Version Information
<?php
// XML data
$xml_string = "<?xml version='1.0'?><element>potassium</element>";

// create a DOM object
if (!$doc = xmldoc($xml_string))
{
    die("Error in XML");
}
// version check
else if ($doc->version > 1.0)
{
    die("Unsupported XML version");
}
else
{
    // XML processing code here
}
?>
In addition to the properties described previously, the DomDocument object also comes with the following methods:
While parsing XML data, you'll find that the root() method is the one you use most often, whereas the add_root() and dumpmem() methods come in handy when you're creating or modifying an XML document tree in memory (discussed in detail in the "Manipulating DOM Trees" section).

X Marks the Spot

In case you're wondering, XPath, or the XML Path Language, provides an easy way to address specific parts of an XML document. The language uses directional axes, coupled with conditional tests, to create node collections matching a specific criterion, and also provides standard constructs to manipulate these collections. PHP's XPath implementation is discussed in detail in the upcoming section titled "Traversing the DOM with PHP's XPath Classes."

In Listing 3.4, the variable $fruit contains the root node (the element named fruit).

Listing 3.4 Accessing the Document Element via the DOM
<?php
// create a DomDocument object
$doc = xmldoc("<?xml version='1.0' encoding='UTF-8'
To DTD or Not to DTD

The dtd() method of the DomDocument object creates a DTD object, which contains basic information about the document's Document Type Definition. Here's what it looks like:

Dtd Object
(
    [systemId] => weather.dtd
    [name] => weather
)

This DTD object exposes two properties: the systemId property reveals the filename of the DTD document, whereas the name property contains the name of the document element.

DomElement Class

The PHP parser represents every element within the XML document as an instance of the DomElement class, which makes it one of the most important in this lineup. When you view the structure of a DomElement object, you see that it has two distinct properties that represent the element type and name, respectively. You'll remember from Listing 3.2 that these properties can be used to identify individual elements and extract their values. Here is an example:

DomElement Object
(
    [type] => 1
    [tagname] => vegetable
)

A special note should be made here of the type property, which indicates the type of node under discussion. This type property contains an integer value mapping to one of the parser's predefined node types. Table 3.1 lists the important types.

Table 3.1 DOM Node Types
If you plan to use the type property within a script to identify node types (as I will be doing shortly in Listing 3.5), you should note that it is considered preferable to use the named constants rather than their corresponding integer values, both for readability and to ensure stability across API changes.

The DomElement object also exposes a number of useful object methods:
Again, the two most commonly used ones are the children() and attributes() methods, which return an array of DomElement and DomAttribute objects, respectively. The get_attribute() method can be used to return the value of a specific attribute of an element (refer to Listing 3.8 for an example), whereas the new_child(), set_attribute(), and set_content() methods are used when creating or modifying XML trees in memory, and are discussed in detail in the section entitled "Manipulating DOM Trees."

Note that PHP's DOM implementation does not currently offer any way of removing an attribute previously set with the set_attribute() method.

Choices

Most of the object methods discussed in this chapter can also be invoked as functions by prefixing the method name with domxml_ and passing a reference to the object as the first function argument. The following snippets demonstrate this:

<?php
// these two are equivalent
$root1 = $doc->root();
$root2 = domxml_root($doc);

// these two are equivalent
$children1 = $root1->children();
$children2 = domxml_children($root2);
?>

Listing 3.5 demonstrates one of these methods in action by combining the children() method of a DomElement object with a recursive function and HTML's unordered lists to create a hierarchical tree mirroring the document structure (similar in concept, though not in approach, to Listing 2.5). At the end of the process, a count of the total number of elements encountered is displayed.

Listing 3.5 Representing an XML Document as a Hierarchical List
<?php
// XML file
$xml_file = "letter.xml";

// parse it
if (!$doc = xmldocfile($xml_file))
{
    die("Error in XML document");
}

// get the root node
$root = $doc->root();

// get its children
$children = get_children($root);

// element counter
// start with 1 so as to include document element
$elementCount = 1;

// start printing
print_tree($children);

// this recursive function accepts an array of nodes as argument,
// iterates through it and prints a list for each element found
function print_tree($nodeCollection)
{
    global $elementCount;

    // iterate through array
    echo "<ul>";
    for ($x=0; $x<sizeof($nodeCollection); $x++)
    {
        // add to element count
        $elementCount++;
        // print element as list item
        echo "<li>" . $nodeCollection[$x]->tagname;
        // go to the next level of the tree
        $nextCollection = get_children($nodeCollection[$x]);
        // recurse!
        print_tree($nextCollection);
    }
    echo "</ul>";
}

// function to return an array of children, given a parent node
function get_children($node)
{
    $temp = $node->children();
    $collection = array();

    // iterate through children array
    for ($x=0; $x<sizeof($temp); $x++)
    {
        // filter out all nodes except elements
        // and create a new array
        if ($temp[$x]->type == XML_ELEMENT_NODE)
        {
            $collection[] = $temp[$x];
        }
    }

    // return array containing child nodes
    return $collection;
}

echo "Total number of elements in document: $elementCount";
?>
Listing 3.5 is fairly easy to understand. The first step is to obtain a reference to the root of the document tree via the root() method; this reference serves as the starting point for the recursive print_tree() function. This function obtains a reference to the children of the root node, processes them, and then calls itself again to process the next level of nodes in the tree. The process continues until all the nodes in the tree have been exhausted. An element counter is used to track the number of elements found, and to display a total count of all the elements in the document.

DomText Class

Character data within an XML document is represented by the DomText class. Here's what it looks like:

DomText Object
(
    [type] => 3
    [content] => cabbages
)

The type property represents the node type (XML_TEXT_NODE in this case, as can be seen from Table 3.1), whereas the content property holds the character data itself.

In order to illustrate this, consider Listing 3.6, which takes an XML-encoded list of country names, parses it, and puts that list into a PHP array.

Listing 3.6 Using DomText Object Properties to Retrieve Character Data from an XML Document
<?php
// XML data
$xml_string = "<?xml version='1.0'?>
<earth>
<country>Albania</country>
<country>Argentina</country>
<!-- and so on -->
<country>Zimbabwe</country>
</earth>";

// create array to hold country names
$countries = array();

// create a DOM object from the XML data
if(!$doc = xmldoc($xml_string))
{
    die("Error parsing XML");
}

// start at the root
$root = $doc->root();

// move down one level to the root's children
$nodes = $root->children();

// iterate through the list of children
foreach ($nodes as $n)
{
    // for each <country> element
    // get the text node under it
    // and add it to the $countries[] array
    $text = $n->children();
    if ($text[0]->content != "")
    {
        $countries[] = $text[0]->content;
    }
}

// uncomment this line to see the contents of the array
// print_r($countries);
?>
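The root's children in Listing 3.6 include more than just <country> elements: the line breaks and the comment between them are nodes too, and the listing appears to rely on its empty-string test to discard them. As a rough sketch of a more explicit alternative, written against the same PHP 4.1.1 DOM functions used in this chapter, the loop can check each node's type and tagname before descending. The XML_ELEMENT_NODE constant is the one used in Listing 3.5; the sample data is unchanged.

```php
<?php
// XML data (same as Listing 3.6)
$xml_string = "<?xml version='1.0'?>
<earth>
<country>Albania</country>
<country>Argentina</country>
<!-- and so on -->
<country>Zimbabwe</country>
</earth>";

// create array to hold country names
$countries = array();

// create a DOM object from the XML data
if (!$doc = xmldoc($xml_string))
{
    die("Error parsing XML");
}

// start at the root and get its children
$root = $doc->root();
$nodes = $root->children();

foreach ($nodes as $n)
{
    // skip anything that is not a <country> element
    // (whitespace text nodes, comments, and so on)
    if ($n->type != XML_ELEMENT_NODE || $n->tagname != "country")
    {
        continue;
    }

    // the first child of a <country> element is its text node
    $text = $n->children();
    if (is_array($text) && sizeof($text) > 0)
    {
        $countries[] = $text[0]->content;
    }
}
?>
```

Filtering on the node type first means that children() is only ever called on element nodes, which keeps the loop's intent visible at a glance.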
Listing 3.6 is fairly simple: a loop is used to iterate through all the <country> elements, adding the character data found within each to the global $countries array.

Taking up Space

It's important to remember that XML, unlike HTML, does not ignore whitespace, but treats it as literal character data. Consequently, if your XML document includes whitespace or line breaks, PHP's DOM parser identifies them as text nodes, and creates DomText objects to represent them. This is a common cause of confusion for DOM newbies, who are often stumped by the "extra" nodes that appear in their DOM tree.

DomAttribute Class

A call to the attributes() method of the DomElement object generates an array of DomAttribute objects, each of which looks like this:

DomAttribute Object
(
    [name] => color
    [value] => green
)

The attribute name can be accessed via the name property, and the corresponding attribute value can be accessed via the value property. Listing 3.7 demonstrates how this works by using the value of the color attribute to highlight each vegetable or fruit name in the corresponding color.

Listing 3.7 Accessing Attribute Values with the DomAttribute Object
<?php
// XML data
$xml_string = "<?xml version='1.0'?>
<sentence>
What a wonderful profusion of colors and smells in the market -
<vegetable

There is, of course, a simpler way to do this: just use the DomElement object's get_attribute() method. Listing 3.8, which generates equivalent output to Listing 3.7, demonstrates this alternative (and much shorter) approach.

Listing 3.8 Accessing Attribute Values (a Simpler Approach)
<?php
// XML data
$xml_string = "<?xml version='1.0'?>
<sentence>
What a wonderful profusion of colors and smells in the market -
<vegetable

A Composite Example

Now that you know how it works, how about seeing how it plays out in real life? This example takes everything you have learned thus far, and uses that knowledge to construct an HTML file from an XML document.
I'll be using a variant of the XML invoice (Listing 2.21) from Chapter 2, adapting the SAX-based approach demonstrated there to the new DOM paradigm. As you'll see, although the two techniques are fundamentally different, they can nonetheless achieve a similar effect. Listing 3.9 is the marked-up invoice.

Listing 3.9 An XML Invoice (invoice.xml)
<?xml version="1.0"?>
<invoice>
    <customer>
        <name>Joe Wannabe</name>
        <address>
            <line>23, Great Bridge Road</line>
            <line>Bombay, MH</line>
            <line>India</line>
        </address>
    </customer>
    <date>2001-09-15</date>
    <reference>75-848478-98</reference>
    <items>
        <item cid="AS633225">
            <desc>Oversize tennis racquet</desc>
            <price>235.00</price>
            <quantity>1</quantity>
            <subtotal>235.00</subtotal>
        </item>
        <item cid="GT645">
            <desc>Championship tennis balls (can)</desc>
            <price>9.99</price>
            <quantity>4</quantity>
            <subtotal>39.96</subtotal>
        </item>
        <item cid="U73472">
            <desc>Designer gym bag</desc>
            <price>139.99</price>
            <quantity>1</quantity>
            <subtotal>139.99</subtotal>
        </item>
        <item cid="AD848383">
            <desc>Custom-fitted sneakers</desc>
            <price>349.99</price>
            <quantity>1</quantity>
            <subtotal>349.99</subtotal>
        </item>
    </items>
    <delivery>Next-day air</delivery>
</invoice>
Listing 3.10 parses the previous XML data to create an HTML page, suitable for printing or viewing in a browser.

Listing 3.10 Formatting an XML Document with the DOM
<html>
<head>
<basefont face="Arial">
</head>
<body bgcolor="white">
<font size="+3">Sammy's Sports Store</font>
<br>
<font size="-2">14, Ocean View, CA 12345, USA

Figure 3.2 shows what the output looks like.
Figure 3.2 Sammy's Sports Store invoice.

As with the SAX example (refer to Listing 2.23), the first thing to do is define arrays to hold the HTML markup for specific tags; in Listing 3.10, this markup is stored in the $startTagsArray and $endTagsArray variables.

Next, the XML document is read by the parser, and an appropriate DOM tree is generated in memory. An array of objects representing the first level of the tree (the children of the root node) is obtained, and the function print_tree() is called. This print_tree() function is a recursive function, and it forms the core of the script.

The print_tree() function accepts a node list as argument, and iterates through this list, examining each node and processing it appropriately. As you can see, the function is set up to perform specific tasks, depending on the type of node:
Additionally, if the node is an element, the print_tree() function obtains a list of the element's children (if any exist) and proceeds to call itself with that node list as argument. And so the process repeats itself until the entire tree has been parsed.

As Listing 3.10 demonstrates, this technique provides a handy way to recursively scan through a DOM tree and perform different actions based on the type of node encountered. You can use this technique to count, classify, and process the different types of elements encountered (Listing 3.5 demonstrated a primitive element counter), or even construct a new tree from the existing one.

Process Failure

If you've been paying attention, you will have noticed that the XML invoice in Listing 3.9 is not exactly the same as the one shown in Listing 2.21. Listing 2.21 included an additional processing instruction (PI), a call to the PHP function displayTotal(), which is missing in Listing 3.9. Why? Because the DOM extension that ships with PHP 4.1.1 has trouble with processing instructions, and tends to barf all over the screen when it encounters one. Later (beta) versions of the extension do, however, include a fix for the problem.

Traversing the DOM with PHP's XPath Classes

The DOM classes discussed in the previous section are more than adequate for most common tasks, but getting used to them can take a while. Additionally, for long and complex documents containing a large number of elements and/or levels, progressing from node to node in an iterative manner can often be tedious. You can use a recursive function to simplify the process, but you'll still have to write a fair amount of code to create and manipulate node collections at different levels of the tree.

It's precisely to simplify this process that PHP also comes with a couple of XPath classes.
XPath, as you may already know, is an addressing mechanism for XML documents, designed to allow XML document authors to quickly access node collections on the basis of both location and condition. A discussion of XPath is beyond the scope of this book, so I'll assume that you already know the basics of axes, predicates, and node tests. In case you don't, you might want to brush up on the basics before proceeding with this section. Go to the companion web site for this book (http://www.xmlphp.com/ or http://www.newriders.com) to find a list of reference material to get you started.

XPathContext and XPathObject Classes

PHP's XPath classes add flexibility to the DOM parser by freeing developers from the standard parent-child paradigm when constructing node collections. The XPath classes allow developers to quickly build node collections matching specific criteria (for example, every third element or every element containing the attribute shape=square) with scant regard for their position in the hierarchical document tree.

The XPathContext class is used to set up a context for all XPath evaluations, and is created by a call to the xpath_new_context() function. This function must be passed a reference to a DomDocument object. For example:

<?php
// create a DomDocument object
$doc = xmldoc($xml_string);

// create an XPath context
$xpath = $doc->xpath_new_context();
?>

If you think this doesn't look very interesting, you're right: it's not. The XPathContext object merely sets up a context for all future XPath evaluations. These XPath evaluations usually result in instances of the XPathObject class, which are far more interesting.

An instance of the XPathObject class may be created with a call to the xpath_eval() method of the XPathContext object, which requires an XPath address for evaluation. If the XPath evaluates successfully, xpath_eval() returns an instance of the XPathObject class containing a collection of nodes matching the specified XPath expression.
Take a look at Listing 3.11, which uses an XPath address to isolate all the vegetable elements in the document:

Listing 3.11 Creating Node Collections with XPath
<?php
// XML data
$xml_string = "<?xml version='1.0'?>
<sentence>What a wonderful profusion of colors and smells in the market -
<vegetable color='green'>cabbages</vegetable>,
<vegetable color='red'>tomatoes</vegetable>,
<fruit color='green'>apples</fruit>,
<vegetable color='purple'>aubergines</vegetable>,
<fruit color='yellow'>bananas</fruit>
</sentence>";
$doc = xmldoc($xml_string);
// create an XPath context
$xpath = $doc->xpath_new_context();
// get all the "vegetable" elements
$vegetables = $xpath->xpath_eval("//vegetable");
// uncomment the next line to see the node collection
// print_r($vegetables);
?>
When you examine the structure of the XPathObject object instance with print_r(), here's what you'll see:

XPathObject Object
(
    [type] => 1
    [nodeset] => Array
        (
            [0] => DomElement Object
                (
                    [type] => 1
                    [tagname] => vegetable
                )
            [1] => DomElement Object
                (
                    [type] => 1
                    [tagname] => vegetable
                )
            [2] => DomElement Object
                (
                    [type] => 1
                    [tagname] => vegetable
                )
        )
)
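To show what can be done with a node collection like this one, here is a minimal sketch that walks the nodeset array and prints each vegetable's color and name. It assumes the same PHP 4.1.1 DOM extension and sample data as Listing 3.11, and uses only the nodeset property, the children() and get_attribute() methods, and the content property as they are described in this chapter.

```php
<?php
// XML data (same as Listing 3.11)
$xml_string = "<?xml version='1.0'?>
<sentence>What a wonderful profusion of colors and smells in the market -
<vegetable color='green'>cabbages</vegetable>,
<vegetable color='red'>tomatoes</vegetable>,
<fruit color='green'>apples</fruit>,
<vegetable color='purple'>aubergines</vegetable>,
<fruit color='yellow'>bananas</fruit>
</sentence>";

$doc = xmldoc($xml_string);

// create an XPath context and collect the "vegetable" elements
$xpath = $doc->xpath_new_context();
$vegetables = $xpath->xpath_eval("//vegetable");

// the matching DomElement objects live in the nodeset array
$nodeset = $vegetables->nodeset;
$found = array();

for ($x=0; $x<sizeof($nodeset); $x++)
{
    // attribute value, via the DomElement object
    $color = $nodeset[$x]->get_attribute("color");

    // character data, via the text node below the element
    $children = $nodeset[$x]->children();
    $name = $children[0]->content;

    $found[] = "$color $name";
    echo "Found: $color $name<br>";
}
?>
```

Note that no parent-child traversal from the root is needed here; the XPath expression does the work of locating the elements, and the familiar DomElement methods take over from there.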
As the print_r() output shows, the XPathObject contains an array of DomElement objects, representing the element nodes matching the XPath expression. These DomElement objects can now be accessed and manipulated using standard class methods and properties.

A Composite Example

In order to demonstrate just how powerful XPath can be, consider the following situation. In a research project conducted to study the effect of temperature on bacterial culture growth, researchers publish their findings as XML data. Listing 3.12 contains a sample of this data.

Listing 3.12 A Compilation of Experiment Readings (data.xml)
<?xml version="1.0"?>
<project id="49">
<!-- data for 3 cultures: Alpha, Beta and Gamma, tested at temperatures

It now becomes necessary to compile this raw data into an easily understandable table so that the results can be analyzed. Ideally, what is required is a two-dimensional table displaying the temperature scale on the Y-axis and the culture type on the X-axis. The intersection of the two axes should be an average of all readings made for that culture at that temperature.

With XPath, this is a snap to accomplish. Listing 3.13 demonstrates the script.

Listing 3.13 Creating Node Collections with XPath
<html>
<head>
<basefont face="Arial">
</head>
<body bgcolor="white">
<?php
// XML file
$xml_file = "data.xml";

// parse document
$doc = xmldocfile($xml_file) or die("Could not read file!");

// create arrays to hold culture/temperature list
$cultures = array();
$temperatures = array();

// create XPath context
$xpath = $doc->xpath_new_context();

// get a list of "culture" nodes
$obj = $xpath->xpath_eval("//culture");
$nodeset = $obj->nodeset;

// ...and create an array containing
// the names of all available cultures
for ($x=0; $x<sizeof($nodeset); $x++)
{
    $children = $nodeset[$x]->children();
    $cultures[] = $children[0]->content;
}

// strip out duplicates
$cultures = array_unique($cultures);

// do the same thing for temperature points
$obj = $xpath->xpath_eval("//temperature");
$nodeset = $obj->nodeset;
for ($x=0; $x<sizeof($nodeset); $x++)
{
    $children = $nodeset[$x]->children();
    $temperatures[] = $children[0]->content;
}
$temperatures = array_unique($temperatures);

// sort both arrays
natsort($temperatures);
natsort($cultures);
?>
<table border="1" cellspacing="5" cellpadding="5">
<tr>
<td> </td>
<?php
// first row of table, print culture names
foreach($cultures as $c)
{
    echo "<td>$c</td>";
}
?>
</tr>
<?php
foreach($temperatures as $t)
{
    // create as many rows as there are temperature points
    echo "<tr>";
    echo "<td>$t</td>";
    // for each intersection (culture, temperature)
    // print average of available readings
    foreach($cultures as $c)
    {
        echo "<td>" . intersection($t, $c) . "</td>";
    }
    echo "</tr>";
}
?>
</table>
<?php
// this function collects all readings for
// a particular culture/temperature
// totals them and averages them
function intersection($temperature, $culture)
{
    // get a reference to the XPath context
    global $xpath;

    // set up variables to hold total and frequency
    $total = 0;
    $count = 0;

    // get a list of "reading" nodes
    // for records with culture c and temperature t
    $obj = $xpath->xpath_eval("//record[culture='" . $culture . "' and
I've used three different XPath expressions here. The first two are used to create a list of available cultures and temperature points; these are required for the row and column headings of the table. The third XPath returns a list of nodes matching a specific culture and temperature. Now, all I need to do is add the readings associated with each of these nodes to reach a total number, and divide that total number by the number of nodes (readings) to obtain an average cell count.
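The averaging step described above can be sketched in isolation. The helper below, average_readings(), is a hypothetical name introduced here for illustration and is not part of Listing 3.13; it assumes the PHP 4.1.1 DOM extension used throughout this chapter, and the tiny XML sample is invented purely to give the function something to work on.

```php
<?php
// hypothetical helper: given an array of element nodes,
// total the character data under each and return the average
function average_readings($nodeset)
{
    $total = 0;
    $count = 0;

    for ($x=0; $x<sizeof($nodeset); $x++)
    {
        // the reading itself is the text node below the element
        $children = $nodeset[$x]->children();
        $total += $children[0]->content;
        $count++;
    }

    // no matching nodes: avoid dividing by zero
    if ($count == 0)
    {
        return 0;
    }

    return $total / $count;
}

// invented sample data, for illustration only
$xml_string = "<?xml version='1.0'?>
<readings>
<reading>10</reading>
<reading>20</reading>
<reading>30</reading>
</readings>";

$doc = xmldoc($xml_string);
$xpath = $doc->xpath_new_context();

// collect the "reading" nodes and average them
$obj = $xpath->xpath_eval("//reading");
$average = average_readings($obj->nodeset);

echo "Average reading: $average";
?>
```

Keeping the arithmetic in its own function means the same code can serve every cell of the table; only the XPath expression that selects the nodes changes from cell to cell.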
Figure 3.3 shows what the output looks like.

Figure 3.3 Statistical analysis with XPath.

This kind of thing comes in particularly handy when you need to perform statistical analysis of sampling data; it provides a simple and easy way to bring together different elements of each sample, perform calculations on these elements, and relate them to each other in two or three dimensions. XPath's conditional expressions are a boon here. Although you can certainly do the same thing without XPath (and I encourage you to try, just so you have a better understanding of the difference), the process would be far more tedious.

Manipulating DOM Trees

The preceding section discussed traversing an already-extant DOM tree, demonstrating how the nodes of the tree can be processed in a recursive manner. That's not all you can do with the DOM, though; it's also possible to programmatically construct DOM trees from scratch, or modify existing tree structures, and save the result as one or more XML documents. This section discusses the details.

Creating New DOM Trees

If you go back to the section dealing with PHP's DOM classes, you'll see that both the DomDocument and DomElement objects include functions to create new documents, nodes, and attributes. The first of these is the new_xmldoc() function, which constructs and returns a new instance of the DomDocument object.

After a DomDocument instance is available, it becomes possible to add new element and text nodes with the add_root() and new_child() methods. And why stop at elements? The set_attribute() method allows you to define and add attributes to specific elements as well.

The following code snippet (see Listing 3.14) demonstrates this by creating a complete XML document tree on the fly with the add_root() and new_child() methods:

Listing 3.14 Creating an XML Document Tree
<?php
// create DomDocument object
$doc = new_xmldoc("1.0");
// add root node
$root = $doc->add_root("article");
// set attribute for root node
$root->set_attribute("id", "567");
// add children to the root
$title = $root->new_child("title", "Goat milk for dinner can cause insomnia");
$author = $root->new_child("author", "K. Kittle");
// note how I can programmatically generate node values!
$date = $root->new_child("date", date("d-M-Y"));
// dump the tree as a string
echo $doc->dumpmem();
?>
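For comparison, here is a minimal sketch of the same article tree as Listing 3.14, built with the DOMDocument class from PHP's current DOM extension. It assumes PHP 5 or later; createElement(), createTextNode(), setAttribute(), appendChild(), saveXML(), and the formatOutput property belong to that newer API, not to the one this chapter documents.

```php
<?php
// Assumption: PHP 5 or later with the bundled DOM extension.
$doc = new DOMDocument("1.0");
$doc->formatOutput = true; // indent the serialized output

// add root node and set its attribute
$root = $doc->createElement("article");
$root->setAttribute("id", "567");
$doc->appendChild($root);

// add children to the root; createTextNode() escapes special characters
$fields = [
    "title"  => "Goat milk for dinner can cause insomnia",
    "author" => "K. Kittle",
    "date"   => date("d-M-Y"),
];
foreach ($fields as $name => $text) {
    $child = $doc->createElement($name);
    $child->appendChild($doc->createTextNode($text));
    $root->appendChild($child);
}

// serialize the tree to a string and print it
$xml = $doc->saveXML();
echo $xml;
```

In this API the document object acts as a factory: nodes are created from it and then attached explicitly with appendChild(), whereas new_child() in Listing 3.14 creates and attaches in a single call.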
After the tree is constructed to your satisfaction, you need to output it, either for display or storage. The DomDocument object's dumpmem() method returns a representation of the current tree as a string. You can then format it for printing, save it to a file, or transmit it to another agent.

Here, Pretty!

Note that if you intend to print the dynamically generated DOM tree, it might be a good idea to run your own formatting functions on it first to pretty it up a little. This is because dumpmem() outputs the document tree as a single string, without formatting or indenting it; in the case of long and/or complex XML documents, it can be fairly difficult to read.

The ability to construct new DOM trees on the fly comes in particularly handy if you need to build a customized DOM tree from another data source. This data source may be a text file that needs to be parsed, a database that needs to be queried, or even another XML document tree that needs to be pruned or combined with other data. Consider Listing 3.15, which uses MySQL database records to construct an XML book catalog and display it to the user.

Listing 3.15 Constructing a DOM Tree from a MySQL Resultset

<?php
// create DomDocument object
$doc = new_xmldoc("1.0");
// add root node
$root = $doc->add_root("collection");
// query database for records
$connection = mysql_connect("localhost", "us8749", "secret") or die ("Unable to
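The row-to-node mapping that a script like Listing 3.15 performs can be sketched without a database. This is a minimal sketch, assuming the DOMDocument class from PHP's current DOM extension; the $rows array is a hypothetical stand-in for the MySQL result set, the build_collection() helper is invented for illustration, and the book element name is an assumption.

```php
<?php
// Hypothetical rows standing in for a database result set.
$rows = [
    ["title" => "Dracula", "author" => "Bram Stoker"],
    ["title" => "Pride & Prejudice", "author" => "Jane Austen"],
];

// Build a <collection> document with one child element per row.
function build_collection(array $rows): DOMDocument
{
    $doc = new DOMDocument("1.0");
    $root = $doc->appendChild($doc->createElement("collection"));
    foreach ($rows as $row) {
        // one <book> element per record (element name is an assumption)
        $book = $root->appendChild($doc->createElement("book"));
        foreach ($row as $field => $value) {
            $node = $book->appendChild($doc->createElement($field));
            // text nodes escape characters such as & automatically
            $node->appendChild($doc->createTextNode($value));
        }
    }
    return $doc;
}

$doc = build_collection($rows);
echo $doc->saveXML();
```

Keeping the tree construction in a function that accepts plain arrays separates it from the data access code, so the same function could be fed rows from any source.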
Nothing too complicated in Listing 3.15: I'm connecting to the database, extracting a list of titles and authors, and creating an XML document from the result. After the document tree has been created in memory, I can either display it (which is what I've done) or save it to a file (demonstrated in Listing 3.17).

Manipulating Existing DOM Trees

It's also possible to use the functions described previously to modify an existing DOM tree. Consider the XML document in Listing 3.16, which contains the outline for a book chapter.

Listing 3.16 A Book Chapter Marked up in XML (ch9.xml)

<?xml version="1.0"?>
<chapter id="9">
<!-- chapter 9 of a really bad pulp fiction novel -->
<title>Luke Gets Angry</title>
<para>As the black-suited warriors swarmed off the HUMVEE, Luke turned to Jo

Now, let's suppose the author decides that "Luke" is actually a pretty wimpy name for the lead character. Instead, he decides to go with "Crazy Dan," which has a much more macho ring to it. Because he's already nine chapters into the book, he needs to change every previous occurrence of "Luke" to "Crazy Dan."

All he needs to do is write a PHP program to construct a DOM tree from the XML file, scan through it for every occurrence of "Luke," alter it to "Crazy Dan," and save the result to a new file (see Listing 3.17).

Search and Destroy

I know, I know, he could use any text editor's search-and-replace function. But this chapter's about the DOM, smart guy.

Listing 3.17 Performing a Search-and-Replace Operation on a DOM Tree

<?php
// XML file
$xml_file = "/tmp/ch9.xml";
// parse document
if(!$doc = xmldocfile($xml_file))
{
die("Error in XML document");
}
// get the root
$root = $doc->root();
// children of the root
$children = $root->children();
// start traversing the tree
search_and_replace($children, "Luke", "Crazy Dan");
// all done, save the new tree to a file
// or display it if file write not possible
if (is_writable(dirname($xml_file)))
{
$filename = dirname($xml_file) . "/_new_" . basename($xml_file);
$fp = fopen($filename,"w+");
fputs($fp,$doc->dumpmem());
fclose($fp);
}
else
{
echo $doc->dumpmem();
}
// this is a recursive function to traverse the DOM tree
// when it finds a text node, it will look for the search string and replace with
// the replacement string
function search_and_replace($nodeCollection, $search, $replace)
{
for ($x=0; $x<sizeof($nodeCollection); $x++)
{
if ($nodeCollection[$x]->type == XML_ELEMENT_NODE)
{
// if element, it may contain child text nodes
// go one level deeper
$nextCollection = $nodeCollection[$x]->children();
search_and_replace($nextCollection, $search, $replace);
}
else if ($nodeCollection[$x]->type == XML_TEXT_NODE)
{
// if text node, perform replacement
$str = str_replace($search, $replace, $nodeCollection[$x]-
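A comparable recursive search-and-replace can be sketched with PHP's current DOM extension. This is a minimal sketch under that assumption, not a completion of Listing 3.17; the sample markup is hypothetical, and the nodeType, childNodes, and nodeValue members belong to the newer API rather than the one this chapter documents.

```php
<?php
// Recursively walk a node's children; replace text inside text nodes only.
function search_and_replace(DOMNode $node, string $search, string $replace): void
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            // elements may contain text nodes, so go one level deeper
            search_and_replace($child, $search, $replace);
        } elseif ($child->nodeType === XML_TEXT_NODE) {
            // text node: write the substituted string back to the tree
            $child->nodeValue = str_replace($search, $replace, $child->nodeValue);
        }
    }
}

// Hypothetical sample markup, not the chapter's ch9.xml.
$doc = new DOMDocument();
$doc->loadXML('<chapter id="9"><title>Luke Gets Angry</title>'
    . '<para>Luke said <q>run</q> and Luke ran.</para></chapter>');

search_and_replace($doc->documentElement, "Luke", "Crazy Dan");
echo $doc->saveXML();
```

Because only text nodes are rewritten, element names, attributes, and comments that happen to contain the search string are left alone, which a plain string replacement on the serialized document would not guarantee.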
Listing 3.17 is similar to Listing 3.10, in that it too uses a recursive function to process the DOM tree. In this case, though, the recursive function limits its activities to two types of nodes: element nodes and text nodes. If the node is an element node, I leave the element itself untouched and call the recursive function again on its children to move one level deeper into the tree; if it's a text node, I scan it for a match to the search string, substitute the replacement text, and write the new string back to the tree.

After the process has concluded, the new DOM tree is written to a file (or, in the event that the directory is not writable, displayed to the user).

If you examine the resulting output, you'll notice one interesting thing about the set_content() method: it automatically replaces special characters (such as the double quotation marks in Listing 3.16) with the corresponding XML entities (in this case, &quot;).

Going Native

You may sometimes come across situations that require you to convert raw XML markup into native data structures such as variables, arrays, or custom objects. For these situations, PHP's DOM parser includes a very specialized little function named xmltree().

The xmltree() function parses an XML string, and constructs a hierarchical tree of PHP objects representing the structured markup. This tree includes many of the same objects you've become familiar with: instances of the DomDocument, DomElement, DomText, and DomAttribute objects.

xmltree() provides an easy way to quickly see the structure of a complete XML document. For the moment, though, that's all it's useful for; it's not possible to write the tree back to a file, or to memory, after manipulating it.

Note also that, as of this writing, xmltree() only accepts an XML string. You cannot pass it a file name or file reference.

DOM or SAX?

Now that you've seen (and hopefully understood) the two most common approaches to parsing XML with PHP, you're probably wondering: Which one do I use?
It's a good question, and one that doesn't have a one-size-fits-all answer. Both DOM and SAX approaches have advantages and disadvantages, and your choice of technique must depend on the type of data being parsed, the requirements of your application, and the constraints under which you are operating.

The SAX approach is linear: It processes XML structures as it finds them, generating events and leaving the event handlers to decide what to do with each structure. The advantage of this approach is that it is resource-friendly; because SAX does not build a tree representation of the document in memory, it can parse XML data in chunks, processing large amounts of data with very little impact on memory. This also translates into better performance; if your document structure is simple, and the event handlers don't have anything too complicated to do, SAX-based applications will generally offer a speed advantage over DOM-based ones.

The downside, though, is an inability to move around the document in a non-linear manner. SAX does not maintain any internal record of the relationships between the different nodes of an XML document (as the DOM does), making it difficult to create customized node collections or to traverse the document in a non-sequential manner. The only way around this with SAX is to create your own custom object model, and map the document elements into your own custom structures, a process that adds to complexity and can possibly degrade performance.

Where SAX flounders, though, the DOM shines. The DOM creates a tree representation of the document in memory, making it possible to easily travel from one node to another, or even access the same node repeatedly (again, not something you can do easily in SAX). This tree representation is based on a standard, easy-to-understand model, making it easier to write code to interact with it.

This flexibility does, however, come with an important caveat.
Because the DOM builds a tree in memory, DOM processing cannot begin until the document has been fully parsed (SAX, on the other hand, can begin parsing a document even if it's not all available immediately). This reduces a developer's ability to "manage" the parsing process by feeding data to the parser in chunks, and also implies greater memory consumption and consequent performance degradation.

Consequently, the choice of technique depends significantly on the type of constraints the application will be performing under, and the type of processing it will be expected to carry out. For systems with limited memory, SAX is a far more efficient approach. On the other hand, complex data-processing requirements can benefit from the standard object model and API of the DOM.

Summary

This chapter demonstrated an alternative approach to the event-based method of parsing XML data. The Document Object Model (DOM) builds a representation of the document structure in memory, and provides the application layer with a standard API to traverse and manipulate this tree.

PHP implements the DOM via a number of standard classes; this chapter examined these classes in detail, together with examples of how they can be used to format XML data, construct XML documents on the fly, and create customized representations of XML data. Finally, the concluding section of this chapter examined the pros and cons of the SAX and DOM approaches, comparing them with each other in an attempt to help developers choose the correct approach for their specific requirements.

In the next chapter, I will be discussing Extensible Stylesheet Language Transformations (XSLT), which provides developers with yet another option when it comes to formatting and processing XML data.