There are some common features in providing website content, but also many differences. Applications easily become complex as they tackle real world problems, and there has been much real innovation in web systems. So the areas to look at in this article are:
- Major areas for content development
- A review of minor yet important areas
- How a simple text manager is built
- An outline of a complex content delivery extension
Discussion and considerations
Now, we will work through the major areas of website content, devoting a section to each one. A round up of some less important aspects of content completes the discussion, leaving us ready to move on to details of implementation.
Articles, blogs, magazines, and FAQ
The most basic requirement is for text and pictures, and the simplest scheme needs little more than the standard database and a WYSIWYG editor. An extension that works at this level is illustrated later in the article. It is pretty much essential to have an ability to create items of this kind in an unpublished state so that they can be revised until ready for use. The state is then changed to published. Almost immediately, a further requirement arises to specify a range of publication dates, so that material aimed at a specific event can be automatically published at the appropriate time. Likewise, it is desirable to have an automatic mechanism for removing information that is no longer current, for example because it refers to a coming event in terms that will be irrelevant once the event has passed. A website that carries plainly obsolete articles is unlikely to be popular!
There are many ways to organize textual material. One is to place it into some kind of tree structure, rather akin to the classification schemes used in libraries. Ideally, such a scheme has no particular constraints on the depth of the tree structure. A concern with this approach is that it can quickly lead to a conflict between two alternative uses—classification according to subject and classification according to reader permissions. An option that can be used in conjunction with a tree structure is to use some form of tagging. This introduces much greater flexibility in some respects, as it is easy to apply multiple tags to a single item of content, which can therefore be classified in a wide variety of ways, and can appear under multiple headings.
A blog is an example of a system that might work best with a combination of a classification tree and a tagging scheme. Where there are several people creating blogs, the different authors fit well with a tree structure, since there is no question of an item belonging to more than one author. On the other hand, items are often tagged according to their subject matter, and several tags may be applicable to an individual article. If authors create more than one blog and there are questions about which visitors are able to see which blog, then careful thought needs to be given as to whether the split of blogs is best handled by the classification tree or by tagging. Using a tree achieves rigid separation, and is easily amenable to imposing access controls. But if the same item appears in more than one blog, then tagging works better as the item is ideally stored only once but has multiple tags. Blogs also frequently provide for comments, discussed in the next section.
A magazine is typically a collection of articles. For a simple case, it might be adequate for the articles of the magazine to be equated to website pages, but a more sophisticated magazine would want to avoid restrictions of that kind. The basic unit of content would still need to be an individual article, but website pages then require some kind of template to build a page from multiple items.
One popular application for quite simple content is the compilation of frequently asked questions (FAQ’s). Advanced implementations might be described more grandly as knowledge bases. Again, both a classification tree and tagging can be relevant, but a useful FAQ (and especially one that wants to be a knowledge base) also needs effective search facilities so that information can be easily found.
In all of these cases, added complexity arises if facilities like versioning are needed. Another similar issue is the need for workflow and differing roles, such as authors and editors. Mention of roles suggests a RBAC mechanism. It seems unlikely that one single model will ever meet every requirement in areas such as versioning and workflow. Version control can become extremely complex, and usually requires the allocation of roles that involve access rights and functional capabilities. Workflow is much the same. In both cases, though, simple and rigid schemes are liable to create problems. For example, the same person is quite likely to be an author in some situations, and an editor or publisher in others. A flexible and an efficient RBAC system is a pre-requisite for handling these problems, but as discussed earlier, the technical provision of RBAC is only a start. Applying it to particular systems and creating an appropriate user interface is a considerable challenge.
Comments and reviews
One of the successful innovations brought about by widespread use of the Web has been feedback through comments and reviews. Amazon is only one of many sites that now include reviews by customers of the products on sale. It could be said that this is a form of social networking, as the more sophisticated sites maintain profiles of reviewers and encourage them to achieve their own identity. Regular readers in particular areas of interest can get to know reviewers and form an opinion on the reliability of their views.
There are two main problems with implementing comments and reviews. One is the question of how to generalize the facility, so as to avoid implementing it repeatedly in different applications. The other is how to deal with the ever present threat of spam.
From the point of view of a developer, handling comments raises much the same issues regardless of what may be the subject of the comments. So blogs, selections of products, image galleries, and so on are all capable of having comments added to their items using similar mechanisms. This suggests a structure something like the scheme where the coarse grained structure is the component, but its display is achieved through the use of a template and a number of modules. Comments can thus be generated by a module that knows relatively little about the application, only enough to keep its comments separate from those for other applications and to relate a set of comments to a particular item, whether it is a blog item, product, gallery image, or whatever. That deals with the display of existing comments, which still leaves a requirement for a general interface that allows new comments to be added. The comment facility can easily enough handle the acceptance of a new comment, although it may need help if the page that accepts comments is to also show the object to which the comment applies. The comment facility also needs to know where to hand control once a new comment has been completed. Some moderately tricky detailed design is involved in providing an implementation of the full scheme.
The other big problem with any facility that permits visitors to a site to enter information for display is that it attracts spammers. Usually, they arrive not in person but in the form of automated bots that can become very sophisticated. There are bots that know how to obtain an account, and log in to a range of systems. There are even bots that can handle CAPTCHAs (those messed up images out of which you are supposed to decipher letters or numbers). Some of the bots can handle CAPTCHAs better than some humans, which makes for accessibility problems. Fortunately, much link spamming is for the purpose of promoting websites, and so the spammer has to give away some information in the form of the link to the site being promoted. A reasonably effective defense against this kind of spamming is a collaborative scheme for blacklisting sites. Even that is not totally effective, as spammers find ways to create new sites quickly and cheaply, so that the threat is constantly changing. As with most forms of attack, there is unlikely to be any conclusion to this battle.
Forums are a very popular Web feature, providing a structured means for public or private discussion. Developing a forum is a major undertaking, and most people will prefer to choose from existing software products. Forum software usually provides for visitors to contribute messages, either starting a new topic or replying to an existing one. There is often a hierarchical structure to the messages so that a number of different areas of interest can be covered in a convenient way. Advanced systems include sophisticated user management, including support for a variety of different groups, which provides a means to decide who has access to which topics. Unwanted messages are a constant threat, and most active forums need moderators to weed them out.
Development of a new forum will clearly need a number of the framework features discussed earlier. Robust user control is essential, and if different users are granted different access rights, a good system of RBAC is a requirement. A forum is highly amenable to the use of cache, since pages are likely to be constructed out of a number of database records, but the records are updated relatively infrequently. To be responsive, the cache needs to have a degree of intelligence so that pages with new contributions are refreshed quickly. Mail services are likely to be employed so that subscribers can receive notification of new contributions to topics in which they have registered an interest.
Another approach is to seek a degree of integration between off the shelf forum software and the CMS. The most popular area for integration is user login. Obviously it is necessary to obtain some information about the way in which the forum software is implemented. Provided that can be found, then it is a relatively simple matter to integrate with a CMS that has been built with plentiful plug in triggers around the area of user authentication. From the point of view of visual integration, the amount of screen space needed by a forum is such that it is often difficult to build it within the framework of a typical CMS. Often a better approach is to build a custom theme for the forum that includes links back to the main site, so as to avoid completely losing continuity of navigation.
Galleries, repositories, and streaming
Although they have come from different requirements, galleries, and file repositories have a lot in common. Both start out simple and rapidly become complex. The general idea of a gallery is to build a collection of images, typically organized into categories and accessible via small versions of the images (thumbnails). File repositories have long been popular since the days of bulletin boards, where collections of files (often programs) were made available for download. Ideally the organization into categories (or folders or containers) is flexible with no particular limit on the depth to which subcategories can go.
Some basic requirements relate to security. It is obviously essential to avoid hosting files that could contain malicious PHP code. This includes avoiding uploads of image files that contain PHP code embedded within actual image data. Simple checks can be fooled by this technique, but a block on the .php extension prevents the code being interpreted. Another potentially major security issue is bandwidth theft. If files or images are too easily accessed, then other sites may choose to use them without acknowledgment, transferring the bandwidth costs to the site hosting the material.
As applications broaden, access control becomes an issue. Files are to be made available only to a restricted group, and uploads may be restricted more tightly again. There may be administrator oversight, with uploads needing approval. Once again, we are seeing a demand for an effective access control system, preferably role-based. In fact demands on systems of this kind can easily become very sophisticated, such as allowing users to have personal upload areas over which they have complete control to determine who is able to gain access. An RBAC system that is technically capable of handling this can be built relatively easily, although creating a good user interface is a challenge.
Whether the system is a gallery or file repository, the use of thumbnail images is increasingly prevalent. File uploads may, therefore, be accompanied by one or more image files that are used to enhance the display of the files available.
Information about the system is likely to be needed, such as which are the most recent additions to the collection, which items are most popular, who has accessed what, and who has uploaded what. Information of this kind can also contribute to security by providing an audit trail of what has been happening to the system.
Streaming of files is a demand now often placed on a file repository, as the files can be audio or video files made available for immediate access. Streaming is simply a mode of file processing whereby the information is delivered to the user at a speed adequate for consumption in real time. Clearly video tends to place greater demands on the system than audio. The problems are both hardware and software related, although with steadily improving technology it is increasingly feasible to overcome both.
E-commerce and payments
Everyone is aware of the huge growth of commercial transactions on the Web. The kind of transaction involved can vary widely across simple fixed price retail sales, auctions of various kinds, and reverse auctions for procurement. For retail transactions immediate settlement is usually required, whereas larger scale business to business transactions are usually handled through relatively traditional invoicing methods. Even those are tending to be altered towards paperless billing and payment schemes that cut transaction costs to a minimum.
Systems for e-commerce vary enormously in their sophistication from simple requests for payment using a PayPal button to highly sophisticated Web operations such as Amazon and eBay. Open source PHP software exists to cover a significant part of this spectrum, some of it in the form of extensions to CMS frameworks.
PayPal has achieved a very high profile, especially with smaller operators, by offering easy access for merchants combined with technology that is relatively simple to implement. This includes the ability to complete a transaction with online confirmation in a way that is suitable for the sale of electronically deliverable goods such as software.
Clearly, robust authentication of users is essential for e-commerce. For all but the simplest transactions, some kind of shopping cart is highly desirable. These requirements imply a need for good session handling, preferably taking effect as soon as a visitor arrives at a site. Nearly every shopping site will allow a visitor to accumulate items in a shopping cart prior to any kind of login.
There is a plethora of payment systems, some of them suitable mainly for large volume uses, but others that can be applied on a small scale. A particular CMS framework might adopt some standard payment mechanisms that are then integral to the CMS and can be used whenever needed. Security is obviously paramount, as loss of data is both financially damaging and extremely bad for the site’s reputation.
E-commerce sites also often use a number of the features described in other sections here. A popular addition is the ability for customers to review the items they have purchased. This kind of facility may lead to further requirements to distinguish categories of users so as to give incentives to people who regularly write reviews.