Zend_Dom_Query
Zend_Dom_Query provides mechanisms for querying
XML and (X)HTML documents using either XPath or
CSS selectors. It was developed to aid functional testing of
MVC applications, but it can also be used to build screen
scrapers quickly.
CSS selector notation is provided as a simpler, more familiar
notation for web developers to use when querying documents with XML
structures. The notation should be familiar to anybody who has written
Cascading Style Sheets or who uses JavaScript toolkits that
select nodes using CSS selectors
(Prototype's $$() and Dojo's dojo.query were both inspirations
for the component).
Theory of Operation
To use Zend_Dom_Query, you instantiate a
Zend_Dom_Query object, optionally passing a document to
query (a string). Once you have a document, you can use either the
query() or queryXpath() methods; each
method will return a Zend_Dom_Query_Result object with
any matching nodes.
The primary difference between Zend_Dom_Query and using
DOMDocument + DOMXPath is the ability to select nodes using CSS
selectors. You can use any of the following, in any combination:
-
element types: provide an element type to
match: 'div', 'a', 'span', 'h2', etc.
-
class attributes: CSS class names
to match: '.error', 'div.error',
'label.required', etc. If an
element declares more than one class, this will match as long as
the named class is present anywhere in the class attribute.
-
id attributes: element ID attributes to
match: '#content', 'div#nav', etc.
-
arbitrary attributes: arbitrary element
attributes to match. Three different types of matching are
provided:
-
exact match: the attribute exactly
matches the string: 'div[bar="baz"]' would match a div
element with a "bar" attribute that exactly matches the
value "baz".
-
word match: the attribute contains
a word matching the string: 'div[bar~="baz"]' would match a div
element with a "bar" attribute that contains the
word "baz". '<div bar="foo baz">' would match, but '<div
bar="foo bazbat">' would not.
-
substring match: the attribute contains
the string: 'div[bar*="baz"]' would match a div
element with a "bar" attribute that contains the
string "baz" anywhere within it.
-
direct descendants: use '>' between
selectors to denote direct descendants. 'div > span' would
select only 'span' elements that are direct descendants of a
'div'. It can also be combined with any of the selectors above.
-
descendants: string together
multiple selectors to indicate a hierarchy along which
to search. 'div .foo span #one' would select an element
with an id of 'one' that is a descendant of arbitrary depth
beneath a 'span' element, which is in turn a descendant
of arbitrary depth beneath an element with a class of
'foo', which is a descendant of arbitrary depth beneath
a 'div' element. For example, it would match the link
around the word 'One' in the listing below:
<div>
    <table>
        <tr>
            <td class="foo">
                <div>
                    Lorem ipsum <span class="bar">
                        <a href="/foo/bar" id="one">One</a>
                        <a href="/foo/baz" id="two">Two</a>
                        <a href="/foo/bat" id="three">Three</a>
                        <a href="/foo/bla" id="four">Four</a>
                    </span>
                </div>
            </td>
        </tr>
    </table>
</div>
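To make the selector types above more concrete, the sketch below runs hand-written XPath counterparts of a few CSS selectors against the sample markup, using only PHP's built-in DOMDocument and DOMXPath. The XPath strings and the hasClass() helper are illustrative assumptions; they are not necessarily the exact expressions Zend_Dom_Query generates.

```php
<?php
// Sample markup from the listing above.
$html = <<<'HTML'
<div>
    <table>
        <tr>
            <td class="foo">
                <div>
                    Lorem ipsum <span class="bar">
                        <a href="/foo/bar" id="one">One</a>
                        <a href="/foo/baz" id="two">Two</a>
                        <a href="/foo/bat" id="three">Three</a>
                        <a href="/foo/bla" id="four">Four</a>
                    </span>
                </div>
            </td>
        </tr>
    </table>
</div>
HTML;

// Hypothetical helper: XPath predicate that tests for one class name
// inside a space-separated class attribute.
function hasClass($name)
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

libxml_use_internal_errors(true); // collect parser notices instead of emitting warnings
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// CSS selector => hand-written XPath counterpart (an approximation).
$selectors = array(
    'div .foo span #one' => "//div//*[" . hasClass('foo') . "]//span//*[@id='one']",
    '.foo .bar a'        => "//*[" . hasClass('foo') . "]//*[" . hasClass('bar') . "]//a",
    'span > a'           => "//span/a",
    'a[href*="ba"]'      => "//a[contains(@href, 'ba')]",
);

$counts = array();
foreach ($selectors as $css => $expression) {
    $counts[$css] = $xpath->query($expression)->length;
    printf("%-20s matches %d node(s)\n", $css, $counts[$css]);
}
```

Against this markup the four selectors match 1, 4, 4, and 3 nodes respectively; the last one skips '/foo/bla' because its href does not contain the substring 'ba'.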
Once you've performed your query, you can then work with the result
object to determine information about the nodes, as well as to pull
them and/or their content directly for examination and manipulation.
Zend_Dom_Query_Result implements Countable
and Iterator, and stores the results internally as
DOMNodes and DOMElements. As an example, consider the following call,
which selects against the HTML above:
$dom = new Zend_Dom_Query($html);
$results = $dom->query('.foo .bar a');

$count = count($results); // get number of matches: 4
foreach ($results as $result) {
    // $result is a DOMElement
}
Zend_Dom_Query also allows straight XPath queries
using the queryXpath() method; you can pass any
valid XPath query to this method, and it will return a
Zend_Dom_Query_Result object.
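A minimal sketch of a direct XPath query follows. It assumes Zend Framework 1 is installed and on the include path; the markup and the XPath expression are made up for illustration.

```php
<?php
// Assumption: Zend Framework 1 is installed and on the include path.
require_once 'Zend/Dom/Query.php';

// Illustrative markup, not taken from the manual.
$html = '<ul>'
      . '<li><a href="/a" id="one">One</a></li>'
      . '<li><a href="/b" id="two">Two</a></li>'
      . '</ul>';

$dom     = new Zend_Dom_Query($html);
$results = $dom->queryXpath('//a[@id="one"]');

// Collect the href of every matching element.
$hrefs = array();
foreach ($results as $node) {
    $hrefs[] = $node->getAttribute('href');
}

echo count($results), " match(es): ", implode(', ', $hrefs), "\n";
```

Because the result object is the same type that query() returns, code that consumes it does not need to know whether the match came from a CSS selector or from XPath.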
Methods Available
The Zend_Dom_Query family of classes has the following
methods available.
Zend_Dom_Query
The following methods are available to
Zend_Dom_Query:
-
setDocumentXml($document): specify an
XML string to query against.
-
setDocumentXhtml($document): specify an
XHTML string to query against.
-
setDocumentHtml($document): specify an
HTML string to query against.
-
setDocument($document): specify a
string to query against; Zend_Dom_Query will
then attempt to autodetect the document type.
-
getDocument(): retrieve the original document
string provided to the object.
-
getDocumentType(): retrieve the document
type of the document provided to the object; will be one of
the DOC_XML, DOC_XHTML, or
DOC_HTML class constants.
-
query($query): query the document using
CSS selector notation.
-
queryXpath($xPathQuery): query the document
using XPath notation.
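The following sketch exercises several of these methods together. It assumes Zend Framework 1 is on the include path; the XML and HTML strings are invented for the example.

```php
<?php
// Assumption: Zend Framework 1 is installed and on the include path.
require_once 'Zend/Dom/Query.php';

// Illustrative XML document.
$xml = '<?xml version="1.0"?><items><item id="a"/><item id="b"/></items>';

// Start without a document, then register one explicitly as XML.
$dom = new Zend_Dom_Query();
$dom->setDocumentXml($xml);

$isXml   = ($dom->getDocumentType() === Zend_Dom_Query::DOC_XML);
$sameDoc = ($dom->getDocument() === $xml); // original string is kept as-is
$items   = $dom->query('item');            // CSS selector notation

echo 'Registered as XML: ', var_export($isXml, true), "\n";
echo 'Items found: ', count($items), "\n";

// Replace the document and let the component detect its type.
$dom->setDocument('<p class="note">Hello</p>');
echo 'Detected type: ', $dom->getDocumentType(), "\n";
```

Calling one of the explicit setters is the safer choice when you already know the document type, since autodetection has to guess from the string's contents.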
Zend_Dom_Query_Result
As mentioned previously, Zend_Dom_Query_Result
implements both Iterator and
Countable, and as such can be used in a
foreach() loop as well as with the
count() function. Additionally, it exposes the
following methods:
-
getCssQuery(): return the CSS
selector query used to produce the result (if any).
-
getXpathQuery(): return the XPath query
used to produce the result. Internally,
Zend_Dom_Query converts CSS
selector queries to XPath, so this value will always be populated.
-
getDocument(): retrieve the DOMDocument the
selection was made against.
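As a brief sketch of the result object's accessors, again assuming Zend Framework 1 is on the include path and using made-up markup:

```php
<?php
// Assumption: Zend Framework 1 is installed and on the include path.
require_once 'Zend/Dom/Query.php';

// Illustrative markup.
$html = '<div class="foo"><span class="bar"><a href="/x">X</a></span></div>';

$dom     = new Zend_Dom_Query($html);
$results = $dom->query('.foo .bar a');

$css      = $results->getCssQuery();   // the selector that was passed in
$xpath    = $results->getXpathQuery(); // XPath the selector was converted to
$document = $results->getDocument();   // DOMDocument the query ran against

echo "CSS:   ", $css, "\n";
echo "XPath: ", $xpath, "\n";
echo "Found: ", count($results), " node(s)\n";

// Iteration yields the matched nodes themselves.
foreach ($results as $node) {
    echo $node->nodeName, ' -> ', $node->getAttribute('href'), "\n";
}
```

Printing getXpathQuery() can be a handy debugging aid when a CSS selector does not match what you expect.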