
Project Overview

What

Mashup the web APIs from Last.fm and YouTube to create a video jukebox of songs

Protocols Used

REST (XML-RPC available)

Data Formats

XML, XSPF, RSS

Tools Featured

PEAR

APIs Used

Last.fm and YouTube

 

Now that we’ve had some experience using web services, it’s time to fine-tune their use. XML-RPC, REST, and SOAP will be frequent companions when you use web services and create mashups. You will encounter many different data formats, and interesting ways in which the PHP community has dealt with them. This is especially true because REST has become so popular. REST has no formalized response format, so you will encounter return formats that vary from plain text to ad-hoc XML to XML-based standards.

The rest of our projects will focus on exposing us to some new formats, and we will look at how to handle them through PHP. We will begin with a project to create our own personalized video jukebox. This mashup will pull music list feeds from the social music site, Last.fm. We will parse out artist names and song titles from these feeds and use that information to search for videos on YouTube, a user-contributed video site, using the YouTube web service. Because the song selections are based on ever-changing feeds, our jukebox selection will not be static, and will change as our music taste evolves. As YouTube is a user-contributed site, we will see many interesting interpretations of our music, too. This jukebox will be personalized, dynamic, and quite interesting.

Both the Last.fm and YouTube APIs offer their web services through REST, and YouTube additionally offers an XML-RPC interface. As with previous APIs, XML is returned with each service call. Last.fm returns either plain text, an XML playlist format called XSPF (XML Shareable Playlist Format), or RSS (Really Simple Syndication). In the case of YouTube, the service returns a proprietary format. Previously, we wrote our own SAX-based XML parser to extract XML data. In this article, we will take a look at how PEAR, the PHP Extension and Application Repository, can do the XSPF parsing work for us on this project and might help in other projects.

Let’s take a look at the various data formats we will be using, and then the web services themselves.

XSPF

One of XML’s original goals was to allow industries to create their own markup languages to exchange data. Because anyone can create their own elements and schemas, XML can be used as the universal data transmission language for an industry as long as its members agree on a format. One of the earliest XML-based languages was ChemXML, a language used to transmit data within the chemical industry. Since then, many others have popped up.

XSPF was a completely grassroots project to create an open, non-proprietary music playlist format based on XML. Historically, playlists for software media players and music devices were designed to be used only on one machine or device, and the schemas were designed by the vendors themselves. XSPF’s goal was to create a format that could be used in software, on devices, and across networks.

XSPF is a very simple format, and is easy to understand. The project home page is at http://www.xspf.org. There, you will find a quick start guide which outlines a simple playlist as well as the official specifications at http://www.xspf.org/specs. Basically, a typical playlist has the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<playlist version="1" xmlns="http://xspf.org/ns/0/">
<title>Shu Chow's Playlist</title>
<date>2006-11-24T12:01:21Z</date>
<trackList>
<track>
<title>Pure</title>
<creator>Lightning Seeds</creator>
<location>
file:///Users/schow/Music/Pure.mp3
</location>
</track>
<track>
<title>Roadrunner</title>
<creator>The Modern Lovers</creator>
<location>
file:///Users/schow/Music/Roadrunner.mp3
</location>
</track>
<track>
<title>The Bells</title>
<creator>April Smith</creator>
<location>
file:///Users/schow/Music/The_Bells.mp3
</location>
</track>
</trackList>
</playlist>

playlist is the parent element for the whole document. It requires one child element, trackList, but there can be several child elements that are the metadata for the playlist itself. In this example, the playlist has a title specified in the title element, and the creation date is specified in the date element. Underneath trackList are the individual tracks that make up the playlist. Each track is encapsulated by the track element. Information about the track, including the location of its file, is encapsulated in elements underneath track. In our example, each track has a title, an artist name, and a local file location. The official specifications allow for more track information elements such as track length and album information.
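The structure described above can be walked with any namespace-aware XML parser. The following is a minimal Python sketch, not the PEAR-based approach this project uses. It assumes the playlist declares the standard XSPF namespace (http://xspf.org/ns/0/); the function name read_tracks and the shortened sample playlist are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: the playlist declares the XSPF namespace on its root element.
XSPF_NS = {"x": "http://xspf.org/ns/0/"}

# A shortened version of the example playlist; the second track has no location.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<playlist version="1" xmlns="http://xspf.org/ns/0/">
<title>Shu Chow's Playlist</title>
<trackList>
<track>
<title>Pure</title>
<creator>Lightning Seeds</creator>
<location>
file:///Users/schow/Music/Pure.mp3
</location>
</track>
<track>
<title>Roadrunner</title>
<creator>The Modern Lovers</creator>
</track>
</trackList>
</playlist>"""


def read_tracks(xspf_text):
    """Return a list of (creator, title, location) tuples from an XSPF string."""
    # Encode to bytes so the parser honors the encoding in the XML declaration.
    root = ET.fromstring(xspf_text.encode("utf-8"))
    tracks = []
    for track in root.findall("x:trackList/x:track", XSPF_NS):
        # Missing children come back as empty strings.
        creator = track.findtext("x:creator", default="", namespaces=XSPF_NS).strip()
        title = track.findtext("x:title", default="", namespaces=XSPF_NS).strip()
        location = track.findtext("x:location", default="", namespaces=XSPF_NS).strip()
        tracks.append((creator, title, location))
    return tracks


if __name__ == "__main__":
    for creator, title, location in read_tracks(SAMPLE):
        print(creator, "-", title, location)
```

The creator and title pairs are the pieces a jukebox would later hand to a video search.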

Here are the playlist child elements summarized:

| Playlist Child Element | Required? | Description |
| --- | --- | --- |
| trackList | Yes | The parent of individual track elements. This is the only required child element of a playlist. Can be empty if the playlist has no songs. |
| title | No | A human-readable title of the XSPF playlist. |
| creator | No | The name of the playlist creator. |
| annotation | No | Comments on the playlist. |
| info | No | A URL to a page containing more information about the playlist. |
| location | No | The URL to the playlist itself. |
| identifier | No | The unique ID for the playlist. Must be a legal Uniform Resource Identifier (URI). |
| image | No | A URL to an image representing the playlist. |
| date | No | The creation (not the last modified!) date of the playlist. Must be in XML schema dateTime format. For example, “2004-02-27T03:30:00”. |
| license | No | If the playlist is under a license, the license is specified with this element. |
| attribution | No | If the playlist is modified from another source, the attribution element gives credit back to the original source, if necessary. |
| link | No | Allows non-XSPF resources to be included in the playlist. |
| meta | No | Allows non-XSPF metadata to be included in the playlist. |
| extension | No | Allows non-XSPF XML extensions to be included in the playlist. |
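Since trackList is the only required child, the smallest possible playlist is a playlist element wrapping an empty trackList. The following Python sketch builds such a document and optionally adds a title and some tracks. The helper name build_playlist and its parameters are hypothetical, and the sketch assumes the standard XSPF namespace (http://xspf.org/ns/0/).

```python
import xml.etree.ElementTree as ET

XSPF_URI = "http://xspf.org/ns/0/"

# Serialize XSPF elements with a default namespace instead of a generated prefix.
ET.register_namespace("", XSPF_URI)


def _tag(name):
    """Return the namespace-qualified tag name for an XSPF element."""
    return "{%s}%s" % (XSPF_URI, name)


def build_playlist(title=None, tracks=()):
    """Build an XSPF document as a string.

    tracks is an iterable of (creator, title) pairs. With no arguments the
    result contains only the required, empty trackList element.
    """
    playlist = ET.Element(_tag("playlist"), {"version": "1"})
    if title is not None:
        ET.SubElement(playlist, _tag("title")).text = title
    track_list = ET.SubElement(playlist, _tag("trackList"))
    for creator, song_title in tracks:
        track = ET.SubElement(track_list, _tag("track"))
        ET.SubElement(track, _tag("title")).text = song_title
        ET.SubElement(track, _tag("creator")).text = creator
    return ET.tostring(playlist, encoding="unicode")


if __name__ == "__main__":
    print(build_playlist())
    print(build_playlist("Shu Chow's Playlist", [("April Smith", "The Bells")]))
```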

A trackList element can contain any number of track elements, one for each track. track is the only allowed child of trackList. The child elements of track give us information about each track. The following table summarizes the children of track:

| Track Child Element | Required? | Description |
| --- | --- | --- |
| location | No | The URL to the audio file of the track. |
| identifier | No | The canonical ID for the track. Must be a legal URI. |
| title | No | A human-readable title of the track. Usually, the song’s name. |
| creator | No | The name of the track creator. Usually, the song’s artist. |
| annotation | No | Comments on the track. |
| info | No | A URL to a page containing more information about the track. |
| image | No | A URL to an image representing the track. |
| album | No | The name of the album that the track belongs to. |
| trackNum | No | The ordinal number position of the track in the album. |
| duration | No | The time to play the track in milliseconds. |
| link | No | Allows non-XSPF resources to be included in the track. |
| meta | No | Allows non-XSPF metadata to be included in the track. |
| extension | No | Allows non-XSPF XML extensions to be included in the track. |
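Because every child of track is optional, code that reads tracks has to tolerate missing elements. The following Python sketch shows one way to do that, and converts the millisecond duration into a minutes:seconds string. The Track class, the helper names, and the sample values (including the duration) are hypothetical; the XSPF namespace is assumed to be declared.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

XSPF_NS = {"x": "http://xspf.org/ns/0/"}


@dataclass
class Track:
    title: Optional[str]
    creator: Optional[str]
    album: Optional[str]
    duration_ms: Optional[int]


def _text(track_elem, name):
    """Return the stripped text of a child element, or None if absent or empty."""
    value = track_elem.findtext("x:" + name, namespaces=XSPF_NS)
    if value is None:
        return None
    return value.strip() or None


def parse_track(track_elem):
    """Convert a track element into a Track, leaving missing fields as None."""
    duration = _text(track_elem, "duration")
    return Track(
        title=_text(track_elem, "title"),
        creator=_text(track_elem, "creator"),
        album=_text(track_elem, "album"),
        duration_ms=int(duration) if duration is not None else None,
    )


def format_duration(duration_ms):
    """Format a duration in milliseconds as minutes:seconds."""
    minutes, seconds = divmod(duration_ms // 1000, 60)
    return "%d:%02d" % (minutes, seconds)


# Illustrative track; the duration value is made up for the example.
SAMPLE_TRACK = """<track xmlns="http://xspf.org/ns/0/">
<title>Roadrunner</title>
<creator>The Modern Lovers</creator>
<duration>271066</duration>
</track>"""

if __name__ == "__main__":
    track = parse_track(ET.fromstring(SAMPLE_TRACK))
    print(track.creator, "-", track.title, format_duration(track.duration_ms))
```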

Note that XSPF is very simple and track oriented. It was not designed to be a repository or database for songs. There are not a lot of options to manipulate the list. XSPF is merely a shareable playlist format, and nothing more.

RSS

The simplest answer to “What is RSS?” is that it’s an XML file used to publish frequently updated information, like news items, blog entries, or links to podcast episodes. News sites like Slashdot.org and the New York Times provide their news items in RSS format. As new news items are published, they are added to the RSS feed. Because the feed is XML-based, third-party aggregator software can read it easily. With one piece of software, I can grab feeds from various sources and read the news items in one location. Web applications can also read and parse RSS files. By offering an RSS feed for my blog, I let another site grab the feed and keep track of my daily life. This is one way by which a small site can provide rudimentary web services with minimal investment.

The more honest answer is that it is a group of XML standards (used to publish frequently updated information like news items or blogs) that may have little compatibility with each other. Each version release also has a tale of conflict and strife behind it. We won’t dwell on the politicking of RSS. We’ll just look at the outcomes. The RSS world now has two main flavors:

  • The RSS 1.0 branch includes versions 0.90, 1.0, and 1.1. Its goal is to be extensible and flexible. The downside is that it is a complex standard.
  • The RSS 2.0 branch includes versions 0.91, 0.92, and 2.0.x. Its goal is to be simple and easy to use. The drawback to this branch is that it may not be powerful enough for complex sites and feeds.

There are some basic skeletal similarities between the two formats. After the XML root element, metadata about the feed itself is provided in a top section. After the metadata, one or more items follow. These items can be news stories, blog entries, or podcast episodes. These items are the meat of an RSS feed.
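One practical consequence of having two branches is that a consumer often has to work out which one it has been handed before parsing further. The root element is a reasonable first clue: feeds in the 2.0 branch use an rss root element with no namespace, RSS 0.90 and 1.0 use rdf:RDF, and RSS 1.1 uses Channel in its own namespace. The Python sketch below applies that rule; the function name detect_rss_branch and its return strings are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS11_NS = "http://purl.org/net/rss1.1#"


def detect_rss_branch(feed_text):
    """Guess the RSS branch of a feed from its root element alone."""
    root = ET.fromstring(feed_text)
    if root.tag == "rss":
        # RSS 0.91, 0.92, and 2.0.x: <rss version="..."> with no namespace.
        return "2.0 branch"
    if root.tag == "{%s}RDF" % RDF_NS:
        # RSS 0.90 and 1.0: the root is rdf:RDF.
        return "1.0 branch"
    if root.tag == "{%s}Channel" % RSS11_NS:
        # RSS 1.1: the root is Channel in the RSS 1.1 namespace.
        return "1.0 branch"
    return "unknown"


if __name__ == "__main__":
    print(detect_rss_branch('<rss version="2.0"><channel/></rss>'))
```

A real aggregator would look at more than the root element, but this check is enough to choose which parsing routine to call.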

The following is an example RSS 1.1 file from XML.com:

<Channel xmlns="http://purl.org/net/rss1.1#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
rdf:about="http://www.xml.com/xml/news.rss">
<title>XML.com</title>
<link>http://xml.com/pub</link>
<description>
XML.com features a rich mix of information and services for the
XML community.
</description>
<image rdf:parseType="Resource">
<title>XML.com</title>
<url>http://xml.com/universal/images/xml_tiny.gif</url>
</image>
<items rdf:parseType="Collection">
<item rdf:about=
"http://www.xml.com/pub/a/2005/01/05/restful.html">
<title>
The Restful Web: Amazon's Simple Queue Service
</title>
<link>
http://www.xml.com/pub/a/2005/01/05/restful.html
</link>
<description>
In Joe Gregorio's latest Restful Web column, he explains
that Amazon's Simple Queue Service, a web service offering a
queue for reliable storage of transient
messages, isn't as RESTful as it claims.
</description>
</item>
<item rdf:about=
"http://www.xml.com/pub/a/2005/01/05/tr-xml.html">
<title>
Transforming XML: Extending XSLT with EXSLT
</title>
<link>
http://www.xml.com/pub/a/2005/01/05/tr-xml.html
</link>
<description>
In this month's Transforming XML column, Bob DuCharme
reports happily that the promise of XSLT extensibility via
EXSLT has become a reality.
</description>
</item>
</items>
</Channel>

The root element of an RSS 1.1 file is an element named Channel. Immediately after the opening tag of the root element are elements that describe the publisher and the feed. The title, link, description, and image elements give us more information about the feed.

The actual content is nested in the items element. The items element is required even if there are no items in the feed, in which case it is simply empty. Usage of these elements can be summarized as follows:

| Channel Child Element | Required? | Description |
| --- | --- | --- |
| title | Yes | A human-readable title of the channel. |
| link | Yes | A URL for the channel, typically the home or news page of the publishing site. |
| description | Yes | A human-readable description of the feed. |
| items | Yes | A parent element to wrap around item elements. |
| image | No | A section to house information about an official image for the feed. |
| others | No | Any other elements not in the RSS namespace can be optionally included here. The namespace must have been declared earlier, and the child elements must be prefixed. |

If used, the image element needs its own child elements to hold information about the feed image. A title element is required and, while optional, a link element would be extremely useful. In the example above, the url element holds the actual URL of the image.

Each news item, blog entry, or podcast episode is represented by an item element. In this RSS file, each item has a title, a link, and a description, each represented by its respective element. This file has two items in it before the items and Channel elements are closed off.
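Pulling the items out of such a feed follows directly from this structure: find the items element under Channel, then read the title, link, and description of each item. The Python sketch below does this for a shortened copy of the example feed. It assumes the feed declares the RSS 1.1 and RDF namespaces; the function name read_items and the trimmed sample text are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "rss": "http://purl.org/net/rss1.1#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

# A shortened copy of the example feed with a single item.
SAMPLE = """<Channel xmlns="http://purl.org/net/rss1.1#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
rdf:about="http://www.xml.com/xml/news.rss">
<title>XML.com</title>
<link>http://xml.com/pub</link>
<description>XML.com features a rich mix of information.</description>
<items rdf:parseType="Collection">
<item rdf:about="http://www.xml.com/pub/a/2005/01/05/restful.html">
<title>
The Restful Web: Amazon's Simple Queue Service
</title>
<link>
http://www.xml.com/pub/a/2005/01/05/restful.html
</link>
<description>
In Joe Gregorio's latest Restful Web column.
</description>
</item>
</items>
</Channel>"""


def _clean(text):
    """Collapse line breaks and runs of whitespace into single spaces."""
    return " ".join(text.split())


def read_items(feed_text):
    """Return a list of dicts with the title, link, and description of each item."""
    channel = ET.fromstring(feed_text)
    items = []
    for item in channel.findall("rss:items/rss:item", NS):
        items.append({
            "title": _clean(item.findtext("rss:title", default="", namespaces=NS)),
            "link": _clean(item.findtext("rss:link", default="", namespaces=NS)),
            "description": _clean(item.findtext("rss:description", default="", namespaces=NS)),
        })
    return items


if __name__ == "__main__":
    for entry in read_items(SAMPLE):
        print(entry["title"], "->", entry["link"])
```

The whitespace cleanup matters here because the sample feed places element text on its own line, which leaves leading and trailing line breaks in the raw values.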
