
Formation of XML

If you are totally new to XML, let us first look at the structure of a common XML document. If you are already familiar with XML, which we strongly recommend for this article, you can skip this section.

Let’s look at the following example, which represents a set of emails:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<emails>
<email>
<from>[email protected]</from>
<to>[email protected]</to>
<subject>there is no subject</subject>
<body>is it a body? oh ya</body>
</email>
</emails>

As you can see, an XML document starts with a short declaration that states the XML version and the character encoding of the document. The encoding matters when you store text outside plain ASCII, such as Unicode text. In XML, every tag you open must also be closed. (XML is stricter than HTML; you must follow its conventions.)

Let’s look at another example where there are some special symbols in the data:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<emails>
<email>
<from>[email protected]</from>
<to>[email protected]</to>
<subject>there is no subject</subject>
<body><![CDATA[is it a body? oh ya, with some texts
& symbols]]></body>
</email>
</emails>

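This means that text containing special characters such as & or < cannot appear as-is inside a node. One way to handle it is to enclose the text in a CDATA section, as in the example above; the other is to escape each special character as an entity (for example, &amp; for &).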
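As a small illustration of the escaping route, here is a minimal sketch that uses PHP's built-in htmlspecialchars() with the ENT_XML1 flag. The sample text and the variable names are made up for this example.

```php
<?php
// Hypothetical body text containing characters that are special in XML.
$raw = 'is it a body? oh ya, with some texts & <symbols>';

// ENT_XML1 selects XML 1.0 escaping rules; ENT_QUOTES escapes both quote styles.
$escaped = htmlspecialchars($raw, ENT_XML1 | ENT_QUOTES, 'UTF-8');

// The escaped text can now sit inside a node without a CDATA section.
$xml = "<email><body>{$escaped}</body></email>";
echo $xml, "\n";
```

CDATA is usually handier for long blocks of text you do not want to touch, while escaping works well for short values built in code.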

Again, each element may have some attributes. For example, consider the following XML, where we describe the properties of a student:

<student age="17" class="11" title="Mr.">Ozniak</student>

In the above example, there are three attributes to this student tag—age, class, and title. Using PHP we can easily manipulate them too. In the coming sections we will learn how to parse XML documents, or how to create XML documents on the fly.

Introduction to SimpleXML

In PHP4 there were two ways to parse XML documents, and both are still available in PHP5. One is parsing documents via SAX (which is a standard) and the other is DOM. But SAX is event-driven, so even a simple document needs quite a lot of handler code, and it takes quite a long time to write.

In PHP5 a new API was introduced to parse XML documents easily: the SimpleXML API. Using SimpleXML you can turn an XML document into a tree of objects. Each node is converted to an accessible form for easy parsing.

Parsing Documents

In this section we will learn how to parse basic XML documents using SimpleXML. Let’s take a breath and start.

<?php
$str = <<< END
<emails>
<email>
<from>[email protected]</from>
<to>[email protected]</to>
<subject>there is no subject</subject>
<body><![CDATA[is it a body? oh ya, with some texts &
symbols]]></body>
</email>
</emails>
END;
$sxml = simplexml_load_string($str);
print_r($sxml);
?>

The output is like this:

SimpleXMLElement Object
(
[email] => SimpleXMLElement Object
(
[from] => [email protected]
[to] => [email protected]
[subject] => there is no subject
[body] => SimpleXMLElement Object
(
)

)

)

Now you may ask how to access each of these properties individually. You can access each of them like a property of an object. For example, $sxml->email[0] returns the first email object. To access the from element under this email, you can use the following code:

echo $sxml->email[0]->from;

So, an element that occurs only once can be accessed just by its name. If it occurs more than once, you have to access it like a collection. For example, if you have multiple email elements, you can access each of them using a foreach loop:

foreach ($sxml->email as $email)
echo $email->from;
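To see both access styles side by side, here is a minimal sketch with two email elements. The addresses and subjects are placeholders invented for this example. Note the (string) cast: each node is a SimpleXMLElement object, so cast it when you need a plain string.

```php
<?php
// Hypothetical sample document with two <email> elements.
$str = <<<END
<emails>
<email>
<from>alice@example.com</from>
<subject>first</subject>
</email>
<email>
<from>bob@example.com</from>
<subject>second</subject>
</email>
</emails>
END;

$sxml = simplexml_load_string($str);

// Collect every sender as a plain string.
$senders = [];
foreach ($sxml->email as $email) {
    $senders[] = (string) $email->from;
}

echo count($sxml->email), " emails\n";    // number of <email> children
echo implode(', ', $senders), "\n";       // all senders
echo $sxml->email[1]->subject, "\n";      // subject of the second email
```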

Accessing Attributes

As we saw in the previous example, XML nodes may have attributes. Remember the example document with class, age, and title? Now you can easily access these attributes using SimpleXML API. Let’s see the following example:

<?php
$str = <<< END
<emails>
<email type="mime">
<from>[email protected]</from>
<to>[email protected]</to>
<subject>there is no subject</subject>
<body><![CDATA[is it a body? oh ya, with some texts &
symbols]]></body>
</email>

</emails>
END;
$sxml = simplexml_load_string($str);

foreach ($sxml->email as $email)
echo $email['type'];

?>

This will display the text mime in the output window. So if you look carefully, you will understand that each node is accessible like properties of an object, and all attributes are accessed like keys of an array. SimpleXML makes XML parsing really fun.
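If you do not know the attribute names in advance, the attributes() method lets you loop over all of them. The following sketch reuses the student element from earlier; the array and variable names are chosen only for this example.

```php
<?php
$str = '<student age="17" class="11" title="Mr.">Ozniak</student>';
$student = simplexml_load_string($str);

// Walk over every attribute as name => value.
$attrs = [];
foreach ($student->attributes() as $name => $value) {
    $attrs[$name] = (string) $value;   // cast the attribute object to a string
}
print_r($attrs);

// Attribute values are objects too, so cast them to the type you need.
$age = (int) $student['age'];

// isset() tells you whether an attribute exists; "grade" is not in the sample.
$hasGrade = isset($student['grade']);
```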

Parsing Flickr Feeds using SimpleXML

How about adding some milk and sugar to your coffee? So far we have learned what SimpleXML API is and how to make use of it. It would be much better if we could see a practical example. In this example we will parse the Flickr feeds and display the pictures. Sounds cool? Let’s do it.

If you are interested in what the Flickr public photo feed looks like, here is the content. The feed data is collected from http://www.flickr.com/services/feeds/photos_public.gne:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<feed
>

<title>Everyone's photos</title>
<link rel="self"
href="http://www.flickr.com/services/feeds/photos_public.gne" />
<link rel="alternate" type="text/html"
href="http://www.flickr.com/photos/"/>
<id>tag:flickr.com,2005:/photos/public</id>
<icon>http://www.flickr.com/images/buddyicon.jpg</icon>
<subtitle></subtitle>
<updated>2007-07-18T12:44:52Z</updated>
<generator uri="http://www.flickr.com/">Flickr</generator>

<entry>
<title>A-lounge 9.07_6</title>
<link rel="alternate" type="text/html"
href="http://www.flickr.com/photos/dimitranova/845455130/"/>
<id>tag:flickr.com,2005:/photo/845455130</id>
<published>2007-07-18T12:44:52Z</published>
<updated>2007-07-18T12:44:52Z</updated>
<dc:date.Taken>2007-07-09T14:22:55-08:00</dc:date.Taken>
<content type="html">&lt;p&gt;&lt;a
href=&quot;http://www.flickr.com/people/dimitranova/&quot;
&gt;Dimitranova&lt;/a&gt; posted a photo:&lt;/p&gt;

&lt;p&gt;&lt;a
href=&quot;http://www.flickr.com/photos/dimitranova/845455130/
&quot; title=&quot;A-lounge 9.07_6&quot;&gt;&lt;img src=&quot;
http://farm2.static.flickr.com/1285/845455130_dce61d101f_m.jpg
&quot; width=&quot;180&quot; height=&quot;240&quot; alt=&quot;
A-lounge 9.07_6&quot; /&gt;&lt;/a&gt;&lt;/p&gt;

</content>
<author>
<name>Dimitranova</name>
<uri>http://www.flickr.com/people/dimitranova/</uri>
</author>
<link rel="license" type="text/html" href="deed.en-us" />
<link rel="enclosure" type="image/jpeg"
href="http://farm2.static.flickr.com/1285/
845455130_7ef3a3415d_o.jpg" />

</entry>
<entry>
<title>DSC00375</title>
<link rel="alternate" type="text/html"
href="http://www.flickr.com/photos/53395103@N00/845454986/"/>
<id>tag:flickr.com,2005:/photo/845454986</id>
<published>2007-07-18T12:44:50Z</published>
...
</entry>
</feed>

Now we will extract the description from each entry and display it. Let’s have some fun:

<?php
$content =
file_get_contents(
"http://www.flickr.com/services/feeds/photos_public.gne");

$sx = simplexml_load_string($content);
foreach ($sx->entry as $entry)
{
echo "<a href='{$entry->link['href']}'>".$entry->title."</a><br/>";
echo $entry->content."<br/>";
}
?>

The script prints the title of each entry as a link, followed by the HTML content of that entry, which contains the photo itself. See how easy SimpleXML is?
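Each entry in the feed carries several link elements and a prefixed dc:date.Taken element, so a slightly more careful reader may help. The following is only a sketch: it works on a trimmed-down copy of the feed instead of the live URL, the two xmlns values are assumptions about how the real feed declares its namespaces, and summarizeEntries() is a helper name made up for this example.

```php
<?php
// Trimmed-down stand-in for the live feed; the xmlns values are assumptions.
$content = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <title>Everyone's photos</title>
  <entry>
    <title>A-lounge 9.07_6</title>
    <link rel="alternate" type="text/html"
          href="http://www.flickr.com/photos/dimitranova/845455130/"/>
    <dc:date.Taken>2007-07-09T14:22:55-08:00</dc:date.Taken>
    <link rel="license" type="text/html" href="deed.en-us"/>
  </entry>
</feed>
XML;

// Hypothetical helper: returns title, photo page link, and taken date per entry.
function summarizeEntries(string $xml): array
{
    $feed = simplexml_load_string($xml);
    if ($feed === false) {
        return [];                      // the document could not be parsed
    }

    $rows = [];
    foreach ($feed->entry as $entry) {
        // Pick the link whose rel attribute is "alternate" (the photo page).
        $href = '';
        foreach ($entry->link as $link) {
            if ((string) $link['rel'] === 'alternate') {
                $href = (string) $link['href'];
                break;
            }
        }

        // Prefixed elements are reached through children(); true = "dc" is a prefix.
        $dc = $entry->children('dc', true);

        $rows[] = [
            'title' => (string) $entry->title,
            'href'  => $href,
            'taken' => (string) $dc->{'date.Taken'},
        ];
    }
    return $rows;
}

$rows = summarizeEntries($content);
foreach ($rows as $row) {
    // Escape values before placing them into HTML.
    echo '<a href="' . htmlspecialchars($row['href']) . '">'
        . htmlspecialchars($row['title']) . '</a> taken on '
        . htmlspecialchars($row['taken']) . "<br/>\n";
}
```

Checking the rel attribute avoids depending on the order of the link elements, and the curly-brace syntax is needed because date.Taken contains a dot.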

Managing CDATA Sections using SimpleXML

As we said before, some symbols can't appear directly in the value of a node unless you escape them or enclose them in a CDATA section. For example, take a look at the following example:

<?php
$str = <<<EOT
<data>
<content>text & images </content>

</data>

EOT;
$s = simplexml_load_string($str);
?>

This will generate the following error:

Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 2: parser error : xmlParseEntityRef: no name in C:\OOP with PHP5\Codes\ch8\cdata.php on line 10

Warning: simplexml_load_string() [function.simplexml-load-string]: <content>text & images </content> in C:\OOP with PHP5\Codes\ch8\cdata.php on line 10

Warning: simplexml_load_string() [function.simplexml-load-string]: ^ in C:\OOP with PHP5\Codes\ch8\cdata.php on line 10
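If you would rather inspect such parse errors in code than have warnings printed to the page, libxml can collect them for you. Here is a minimal sketch using libxml_use_internal_errors() and libxml_get_errors(); the $messages array and the message format are choices made only for this example.

```php
<?php
$str = <<<EOT
<data>
<content>text & images </content>
</data>
EOT;

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$s = simplexml_load_string($str);

$messages = [];
if ($s === false) {
    // Each error object exposes the line number and the parser message.
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
}

// Restore the previous error-handling mode.
libxml_use_internal_errors($previous);

foreach ($messages as $message) {
    echo $message, "\n";
}
```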

To avoid this problem we have to enclose the text in a CDATA section. Let’s rewrite it like this:

<data>
<content><![CDATA[text & images ]]></content>
</data>

Now it will work perfectly. And you don’t have to do any extra work for managing this CDATA section.

<?php
$str = <<<EOT
<data>
<content><![CDATA[text & images ]]></content>

</data>

EOT;
$s = simplexml_load_string($str);
echo $s->content;//print "text & images"
?>

However, prior to PHP5.1, you had to load this section as shown below:

$s = simplexml_load_string($str,null,LIBXML_NOCDATA);
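LIBXML_NOCDATA asks libxml to merge CDATA sections into ordinary text nodes. The following sketch loads the same document both ways; the variable names are made up for this example. Casting the node to a string gives the same text in both cases, so the flag mostly matters when you dump the whole tree, for example with print_r(), where a CDATA node may show up as an empty object as in the earlier output.

```php
<?php
$str = <<<EOT
<data>
<content><![CDATA[text & images ]]></content>
</data>
EOT;

// Default loading: the CDATA section is kept as a CDATA node.
$plain = simplexml_load_string($str);

// With LIBXML_NOCDATA the CDATA section is merged into a normal text node.
$merged = simplexml_load_string($str, 'SimpleXMLElement', LIBXML_NOCDATA);

// A string cast returns the text either way.
echo (string) $plain->content, "\n";
echo (string) $merged->content, "\n";
```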
