(For more resources on PHP, see here.)
Building methods that:
- Handle common patterns of data manipulation securely and efficiently
- Help ease the database changes needed as data requirements evolve
- Provide powerful data objects at low cost
Discussion and considerations
Relational databases provide an effective and readily available means to store data. Once established, they normally behave consistently and reliably, making them easier to use than file systems. And clearly a database can do much more than a simple file system!
Efficiency can quickly become an issue, both in relation to how often requests are made to a database, and how long queries take. One way to offset the cost of database queries is to use a cache at some stage in the processing.
Whatever the framework does, a major factor will always be the care developers of extensions take over the design of table structures and software; the construction of SQL can also make a big difference. Examples included here have been assiduously optimized so far as the author is capable, although suggestions for further improvement are always welcome!
Web applications are typically much less mature than more traditional data processing systems. This stems from factors such as speed of development and deployment. Also, techniques that are effective for programs that run for a relatively long time do not make sense for the brief processing that is applied to a single website request. For example, although PHP allows persistent database connections, thereby reducing the cost of making a fresh connection for each request, it is generally considered unwise to use this option because it is liable to create large numbers of dormant processes, and slow down database operations excessively.
Likewise, prepared statements have advantages for performance and possibly security but are more laborious to implement. So, the advantages are diluted in a situation where a statement cannot be used more than once. Perhaps, even more than performance, security is an issue for web development, and there are well known routes for attacking databases. They need to be carefully blocked.
The primary goal of a framework is to make further development easy. Writing web software frequently involves the same patterns of database access, and a framework can help a lot by implementing methods at a higher level than the basic PHP database access functions.
In an ideal world, an object-oriented system is developed entirely on the basis of OO principles. But if no attention is paid to how the objects will be stored, problems arise. An object database has obvious appeal, but for a variety of reasons, such databases are not widely used. Web applications have to be pragmatic, and so the aim pursued here is the creation of database designs that occasionally ignore strict relational principles, and objects that are sometimes simpler than idealized designs might suggest. The benefit of making these compromises is that it becomes practical to achieve a useful correspondence between database rows and PHP objects.
It is possible that PHP Data Objects (PDO) will become very important in this area, but it is a relatively new development. Use of PDO is likely to pick up gradually as it becomes more commonly found in typical web hosting, and as developers get to understand what it can offer. For the time being, the safest approach seems to be for the framework to provide classes on which effective data objects can be built. A great deal can be achieved using this technique.
Lest this section create too much disappointment, let me say at the outset that this article does not provide any help with achieving database independence. The best that can be done here is to explain why not, and what can be done to limit dependency.
Nowadays, the most popular kind of database employs the relational model. All relational database systems implement the same theoretical principles, and even use more or less the same structured query language. People use products from different vendors for an immense variety of reasons, some better than others. For web development, MySQL is very widely available, although PostgreSQL is another highly regarded database system that is available without cost. There are a number of well-known proprietary systems, and existing databases often contain valuable information, which motivates attempts to link them into CMS implementations.
In this situation, there are frequent requests for web software to become database independent. There are, sadly, practical obstacles towards achieving this.
It is conceptually simple to provide the mechanics of access to a variety of different database systems, although the work involved is laborious. The result can be cumbersome, too. But the biggest problem is that SQL statements are inclined to vary across different systems.
It is easy in theory to assert that only the common core of SQL that works on all database systems should be used. The serious obstacle here is that very few developers are knowledgeable about what comprises the common core. ANSI SQL might be thought to provide a system neutral language, but then not all of ANSI SQL is implemented by every system. So, the fact is that developers become expert in one particular database system, or at best a handful.
Skilled developers are conscious of the standardization issue, and where there is a choice, they will prefer to write according to standards. For example, it is better to write:
SELECT username, userid, count(userid) AS number FROM aliro_session AS s
INNER JOIN aliro_session_data AS d ON s.session_id = d.session_id WHERE
isadmin = 0 GROUP BY userid
SELECT username, userid, count(userid) AS number FROM aliro_session AS s,
aliro_session_data AS d WHERE s.session_id = d.session_id AND isadmin = 0
GROUP BY userid
This is because it makes the nature of the query clearer, and also because it is less vulnerable to detailed syntax variations across database systems.
Use of extensions that are only available in some database systems is a major problem for query standardization. Again, it is easy while theorizing to deplore the use of non-standard extensions. In practice, some of them are so tempting that few developers resist them.
An older MySQL extension was the REPLACE command, which would either insert or update data depending on whether a matching key was already present in the database. This is now discouraged on the grounds that it achieved its result by deleting any matching data before doing an insertion. This can have adverse effects on linked foreign keys but the newer option of the INSERT … ON DUPLICATE KEY construction provides a very neat, efficient way to handle the case where data needs to go into the database allowing for what is already there. It is more efficient in every way than trying to read before choosing between INSERT and UPDATE, and also avoids the issue of needing a transaction.
Similarly, there is no standard way to obtain a slice of a result set, for example starting with the eleventh item, and comprising the next ten items. Yet this is exactly the operation that is needed to efficiently populate the second page of a list of items, ten per page. The MySQL extension that offers LIMIT and LIMITSTART is ideal for this purpose.
Because of these practical issues, independence of database systems remains a desirable goal that is rarely fully achieved. The most practical policy seems to avoid dependencies where this is possible at reasonable cost.
The role of the database
We already noted that a database can be thought of as uncontrolled global data, assuming the database connection is generally available. So there should be policies on database access to prevent this becoming a liability.
One policy adopted by Aliro is to use two distinct databases. The “core” database is reserved for tables that are needed by the basic framework of the CMS. Other tables, including those created by extensions to the CMS framework, use the “general” database.
Although it is difficult to enforce restrictions, one policy that is immediately attractive is that the core database should never be accessed by extensions. How data is stored is an implementation matter for the various classes that make up the framework, and a selection of public methods should make up the public interface. Confining access to those public methods that constitute the API for the framework leaves open the possibility of development of the internal mechanisms with little or no change to the API. If the framework does not provide the information needed by extensions, then its API needs further development. The solution should not be direct access to the core database.
Much the same applies to the general database, except that it may contain tables that are intended to be part of an API. By and large, extensions should restrict their database operations to their own tables, and provide object methods to implement interfaces across extensions. This is especially so for write operations, but should usually apply to all database operations.
Level of database abstraction
There have been some clues earlier in this article, but it is worth squarely addressing the question of how far the CMS database classes should go in insulating other classes from the database.
All of the discussions here are based on the idea that currently the best available style of development is object oriented. But we have already decided that using a true object database is not usually a practical option for web development. The next option to consider is building a layer to provide an object-relational transformation, so that outside of the database classes, nobody needs to deal with purely relational concepts or with SQL. An example of a framework that does this is Propel, which can be found at http://propel.phpdb.org/trac/.
While developments of this kind are interesting and attractive in principle, I am not convinced that they provide an acceptable level of performance and flexibility for current CMS developments. There can be severe overheads on object-relational operations and manual intervention is likely to be necessary if high performance is a goal. For that reason, it seems that for some while yet, CMS developments will be based on more direct use of a relational database.
Another complicating factor is the limitations of PHP in respect of static methods, which are obliged to operate within the environment of the class in which they are declared, irrespective of the class that was invoked in the call. This constraint is lifted in PHP 5.3 but at the time of writing, reliance on PHP 5.3 would be premature, software that has not yet found its way into most stable software distributions. With more flexibility in the use of static methods and properties, it would be possible to create a better framework of database-related properties.
Given what is currently practical, and given experience of what is actually useful in the development of applications to run within a CMS framework, the realistic goals are as follows:
- To create a database object that connects, possibly through a choice of different connectors, to a particular database and provides the ability to run SQL queries
- To enable the creation of objects that correspond to database rows and have the ability to load themselves with data or to store themselves in the database
Some operations, such as the update of a single row, are best achieved through the use of a database row object. Others, such as deletion, are often applied to a number of rows, chosen from a list by the user, and are best effected through a SQL query.
You can obtain powerful code for achieving the automatic creation of HTML by downloading the full Aliro project. Unfortunately, experience in use has been disappointing. Often, so much customization of the automated code is required that the gains are nullified, and the automation becomes just an overhead. This topic is therefore given little emphasis.