Dealing with sessions can be confusing, and is also a source of security loopholes. So we want our CMS framework to provide basic mechanisms that are robust. We want them to be easy to use by more application-oriented software. To achieve these aims, we need to consider:
- The need for sessions and their working
- The pitfalls that can introduce vulnerabilities
- Efficiency and scalability considerations
Discussion and considerations
To see what is required for our session handling, we shall first review the need for them and consider how they work in a PHP environment. Then the vulnerabilities that can arise through session handling will be considered. Web crawlers for search engines and more nefarious activities can place a heavy and unnecessary load on session handling, so we shall look at ways to avoid this load. Finally, the question of how best to store session data is studied.
The need for continuity was mentioned when we first discussed users. But it is worth reviewing the requirement in a little more detail.
If Tim Berners-Lee and his colleagues had known all the developments that would eventually occur in the internet world, maybe the Web would have been designed differently. In particular, the basic web transport protocol HTTP might not have treated each request in isolation. But that is hindsight, and the Web was originally designed to present information in a computer-independent way. Simple password schemes were sufficient to control access to specific pages.
Nowadays, we need to cater for complex user management, or to handle things like shopping carts, and for these we need continuity. Many people have recognized this, and introduced the idea of sessions. The basic idea is that a session is a series of requests from an individual website visitor, and the session provides access to enduring information that is available throughout the session. The shopping cart is an obvious example of information being retained across the requests that make up a session. PHP has its own implementation of sessions, and there is no point reinventing the wheel, so PHP sessions are the obvious tool for us to use to provide continuity.
How sessions work
There are three main choices which have been available for handling continuity:
- Adding extra information to the URI
- Using cookies
- Using hidden fields in the form sent to the browser
All of them can be used at times. Which of them is most suitable for handling sessions?
PHP uses either of the first two alternatives. Web software often makes use of hidden variables, but they do not offer a neat way to provide an unobtrusive general mechanism for maintaining continuity. In fact, whenever hidden variables are used, it is worth considering whether session data would be a better alternative.
The PHP cookie-based session mechanism can seem obscure, so it is worth explaining how it works. First we need to review the working of cookies. A cookie is simply a named piece of data, usually limited to around 4,000 bytes, which is stored by the browser in order to help the web server to retain information about a user. More strictly, the connection is with the browser, not the user. Any cookie is tied to a specific website, and optionally to a particular part of the website, indicated by a path. It also has a life time that can be specified explicitly as a duration; a zero duration means that the cookie will be kept for as long as the browser is kept open, and then discarded.
The browser does nothing with cookies, except to save and then return them to the server along with requests. Every cookie that relates to the particular website will be sent if either the cookie is for the site as a whole, or the optional path matches the path to which the request is being sent. So cookies are entirely the responsibility of the server, but the browser helps by storing and returning them. Note that, since the cookies are only ever sent back to the site that originated them, there are constraints on access to information about other sites that were visited using the same browser.
In a PHP program, cookies can be written by calling the set_cookie function, or implicitly through session handling. The name of the cookie is a string, and the value to be stored is also a string, although the serialize function can be used to make more structured data into a string for storage as a cookie. Take care to keep the cookies within the size limit. PHP makes available the cookies that have been sent back by the browser in the $_COOKIES super-global, keyed by their names.
Apart from any cookies explicitly written by code, PHP may also write a session cookie. It will do so either as a result of calls to session handling functions, or because the system has been configured to automatically start or resume a session for each request. By default, session cookies do not use the option of setting an expiry time, but can be deleted when the browser is closed down. Commonly, browsers keep this type of cookie in memory so that they are automatically lost on shutdown.
Before looking at what PHP is doing with the session cookie, let’s note that there is an important general consideration for writing cookies. In the construction of messages between the server and the browser, cookies are part of the header. That means rules about headers must be obeyed. Headers must be sent before anything else, and once anything else has been sent, it is not permitted to send more headers. So, in the case of server to browser communication, the moment any part of the XHTML has been written by the PHP program, it is too late to send a header, and therefore too late to write a cookie.
For this reason, a PHP session is best started early in the processing. The only purpose PHP has in writing a session cookie is to allocate a unique key to the session, and retrieve it again on the next request. So the session cookie is given an identifying name, and its value is the session’s unique key. The session key is usually called the session ID, and is used by PHP to pick out the correct set of persistent values that belong to the session. By default, the session name is PHPSESSID but it can, in most circumstances, be changed by calling the PHP function session_name prior to starting the session. Starting, or more often restarting, a session is done by calling session_start, which returns the session ID. In a simple situation, you do not need the session ID, as PHP places any existing session data in another superglobal, $_SESSION. In fact, we will have a use for the session ID as you will soon see.
The $_SESSION super-global is available once session_start has been called, and the PHP program can store whatever data it chooses in it. It is an array, initially empty, and naturally the subscripts need to be chosen carefully in a complex system to avoid any clashes. The neat part of the PHP session is that provided it is restarted each time with session_start, the $_SESSION superglobal will retain any values assigned during the handling of previous requests. The data is thus preserved until the program decides to remove it. The only exception to this would be if the session expired, but in a default configuration, sessions do not expire automatically. Later in this article, we will look at ways to deliberately kill sessions after a determinate period of inactivity.
As it is only the session ID that is stored in the cookie, rules about the timing of output do not apply to $_SESSION, which can be read or written at any time after session_start has been called. PHP stores the contents of $_SESSION at the end of processing or on request using the PHP function session_write_close. By default, PHP puts the data in a temporary file whose name includes the session ID. Whenever the session data is stored, PHP retrieves it again at the next session_start.
Session data does not have to be stored in temporary files, and PHP permits the program to provide its own handling routines. We will look at a scheme for storing the session data in a database later in the article.
Avoiding session vulnerabilities
So far, the option to pass the session ID as part of the URI instead of as a cookie has not been considered. Looking at security will show why. The main security issue with sessions is that a cracker may find out the session ID for a user, and then hijack that user’s session. Session handling should do its best to guard against that happening. PHP can pass the session ID as part of the URI. This makes it especially vulnerable to disclosure, since URIs can be stored in all kinds of places that may not be as inaccessible as we would like. As a result, secure systems avoid the URI option.
It is also undesirable to find links appearing in search engines that include a session ID as part of the URI. These two points are enough to rule out the URI option for passing session ID. It can be prevented by the following PHP calls:
It is best to avoid the default name of PHPSESSID for the session cookie, since that is something that a cracker could look for in the network traffic. One step that can be taken is to create a session name that is the MD5 hash of various items of internal information. This makes it harder but not impossible to sniff messages to find out a session ID, since it is no longer obvious what to seek—the well known name of PHPSESSID is not used.
It is important for the session ID to be unpredictable, but we rely on PHP to achieve that. It is also desirable that the ID be long, since otherwise it might be possible for an attacker to try out all possible values within the life of a session. PHP uses 32 hexadecimal digits, which is a reasonable defense for most purposes.
The other main vulnerability apart from session hijacking is called session fixation. This is typically implemented by a cracker setting up a link that takes the user to your site with a session already established, and known to the cracker.
An important security step that is employed by robust systems is to change the session ID at significant points. So, although a session may be created as soon as a visitor arrives at the site, the session ID is changed at login. This technique is used by Amazon among others so that people can browse for items and build up a shopping cart, but on purchase a fresh login is required. Doing this reduces the available window for a cracker to obtain, and use, the session ID. It also blocks session fixation, since the original session is abandoned at critical points. It is also advisable to change the ID on logout, so although the session is continued, its data is lost and the ID is not the same.
It is highly desirable to provide logout as an option, but this needs to be supplemented by time limits on inactive sessions. A significant part of session handling is devoted to keeping enough information to be able to expire sessions that have not been used for some time. It also makes sense to revoke a session that seems to have been used for any suspicious activity.
Ideally, the session ID is never transmitted unencrypted, but achieving this requires the use of SSL, and is not always practical. It should certainly be considered for high security applications.
Search engine bots
One aspect of website building is, perhaps unexpectedly, the importance of handling the bots that crawl the web. They are often gathering data for search engines, although some have more dubious goals, such as trawling for e-mail addresses to add to spam lists. The load they place on a site can be substantial. Sometimes, search engines account for half or more of the bandwidth being used by a site, which certainly seems excessive.
If no action is taken, these bots can consume significant resources, often for very little advantage to the site owner. They can also distort information about the site, such as when the number of current visitors is displayed but includes bots in the counts.
Matters are made worse by the fact that bots will normally fail to handle cookies. After all, they are not browsers and have no need to implement support for cookies. This means that every request by a bot is separate from every other, as our standard mechanism for linking requests together will not work. If the system starts a new session, it will have to do this for every new request from a bot. There will never be a logout from the bot to terminate the session, so each bot-related session will last for the time set for automatic expiry.
Clearly it is inadvisable to bar bots, since most sites are anxious to gain search engine exposure. But it is possible to build session handling so as to limit the workload created by visitors who do not permit cookies, which will mostly be bots. When we move into implementation techniques, the mechanisms will be demonstrated.
Session data and scalability
We could simply let PHP take care of session data. It does that by writing a serialized version of any data placed into $_SESSION into a file in a temporary directory. Each session has its own file.
But PHP also allows us to implement our own session data handling mechanism. There are a couple of good reasons for using that facility, and storing the information in the database. One is that we can analyze and manage the data better, and especially limit the overhead of dealing with search engine bots. The other is that by storing session data in the database, we make it feasible for the site to be run across multiple servers. There may well be other issues before that can be achieved, but providing session continuity is an essential requirement if load sharing is to be fully effective. Storing session data in a database is a reliable solution to this issue.
Arguments against storing session data in a database include questions about the overhead involved, constraints on database performance, or the possibility of a single point of failure. While these are real issues, they can certainly be mitigated. Most database engines, including MySQL, have many options for building scalable and robust systems. If necessary, the database can be spread across multiple computers linked by a high speed network, although this should never be done unless it is really needed. Design of such a system is outside the scope of this article, but the key point is that the arguments against storing session data in a database are not particularly strong.