A proxy server is a computer system sitting between the client requesting a web document and the target server (another computer system) serving the document. In its simplest form, a proxy server facilitates communication between client and target server without modifying requests or replies. When we initiate a request for a resource from the target server, the proxy server intercepts our connection and presents itself as a client to the target server, requesting the resource on our behalf. If a reply is received, the proxy server returns it to us, giving the impression that we have communicated with the target server directly.
In advanced forms, a proxy server can filter requests based on various rules and may allow communication only when requests can be validated against the available rules. The rules are generally based on the IP address of a client or target server, the protocol, the content type of web documents, and so on.
As seen in the preceding image, clients can’t make direct requests to the web servers. To facilitate communication between clients and web servers, we have connected them using a proxy server which is acting as a medium of communication for clients and web servers.
Sometimes, a proxy server can modify requests or replies, or can even store the replies from the target server locally for fulfilling the same request from the same or other clients at a later stage. Storing the replies locally for use at a later time is known as caching. Caching is a popular technique used by proxy servers to save bandwidth, take load off web servers, and improve the end user’s browsing experience.
Proxy servers are mostly deployed to perform the following:
- Reduce bandwidth usage
- Enhance the user’s browsing experience by reducing page load time which, in turn, is achieved by caching web documents
- Enforce network access policies
- Monitor user traffic or report Internet usage for individual users or groups
- Enhance user privacy by not exposing a user’s machine directly to the Internet
- Distribute load among different web servers to reduce load on a single server
- Empower a poorly performing web server
- Filter requests or replies using an integrated virus/malware detection system
- Load balance network traffic across multiple Internet connections
- Relay traffic within a local area network
In simple terms, a proxy server is an agent between a client and target server that has a list of rules against which it validates every request or reply, and then allows or denies access accordingly.
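The idea of validating a request against an ordered list of rules can be sketched in a few lines of shell. This is only an illustration of first-match-wins rule evaluation: the addresses and the check_client function are hypothetical, and a real proxy such as Squid expresses its rules in a configuration file rather than in a script.

```shell
# Hypothetical rules: allow the local network, except for one blocked host.
# The first pattern that matches the client address decides the outcome.
check_client() {
    case $1 in
        192.168.1.13)  echo deny ;;    # a single blocked client
        192.168.1.*)   echo allow ;;   # everyone else on the local network
        *)             echo deny ;;    # default: deny anything unmatched
    esac
}

check_client 192.168.1.20
check_client 10.0.0.5
```

The order of the rules matters: if the broader 192.168.1.* pattern came first, the blocked host would be allowed before its own rule was ever reached.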
Web caching is a technique of storing the replies or resources from a web server locally so that subsequent requests for the same resource can be satisfied from the local copy on the proxy server, sometimes without contacting the web server at all. A proxy server deployed in front of a web server for this purpose is known as a reverse proxy. The proxy server or web cache checks whether the locally stored copy of the web document is still valid before serving it.
The life of the locally stored web document is calculated from the additional HTTP headers received from the web server. Using HTTP headers, web servers can control whether a given document/response should be cached by a proxy server or not.
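To make the role of these headers concrete, here is a minimal sketch that reads the max-age directive of a Cache-Control header from a sample response. The header values are made up for illustration and max_age is a hypothetical helper; a real cache also weighs other headers and directives (Expires, no-store, private, and so on) before deciding.

```shell
# Sample response headers (made up for illustration, not captured from a real server).
headers='HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: public, max-age=3600
Expires: Thu, 01 Dec 2011 16:00:00 GMT'

# Print the max-age value (in seconds) from the Cache-Control header, if present.
max_age() {
    printf '%s\n' "$1" |
        grep -i '^Cache-Control:' |
        sed -n 's/.*max-age=\([0-9][0-9]*\).*/\1/p'
}

lifetime=$(max_age "$headers")
echo "cacheable for ${lifetime:-unknown} seconds"
```

To look at the headers a real server sends, we can run something like curl -sI http://www.example.com/ and read the Cache-Control and Expires lines in its output.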
Web caching is mostly used:
- By ISPs to reduce average page load time and so improve the browsing experience for their customers on dial-up or broadband connections.
- To take a load off a very busy web server by serving static pages/documents from a proxy server’s cache.
Squid is available in several forms (compressed source archives, source code from a version control system, binary packages such as RPM, DEB, and so on) from Squid’s official website, various Squid mirrors worldwide, and software repositories of almost all the popular operating systems. Squid is also shipped with many Linux/Unix distributions.
There are various versions and releases of Squid available for download from Squid’s official website. To get the most out of a Squid installation, it is best to check out the latest source code from a Version Control System (VCS) so that we get the latest features and fixes. But be warned, the latest source code from a VCS is generally bleeding edge and may not be stable or may not even work properly. Though code from a VCS is good for learning or testing Squid’s new features, we strongly advise against using it for production deployments.
If we want to play it safe, we should download the latest stable version, or a stable version from an older release. Stable versions are generally tested before they are released and are supposed to work out of the box, so they can be used directly in production deployments.
Time for action – identifying the right version
A list of available versions of Squid is maintained at http://www.squid-cache.org/Versions/. For production environments, we should use versions listed under the Stable Versions section only. If we want to test new Squid features in our environment or if we intend to provide feedback to the Squid community about the new version, then we should be using one of the Beta Versions.
As we can see in the preceding screenshot, the website contains the First Production Release Date and Latest Release Date for the stable versions. If we click on any of the versions, we are directed to a page containing a list of all the releases in that particular version. Let’s have a look at the page for version 3.1:
For every release, along with a release date, there are links for downloading compressed source archives.
Different versions of Squid may have different features. For example, all the features available in Squid version 2.7 may or may not be available in newer versions such as Squid 3.x. Some features may have been deprecated or have become redundant over time and they are generally removed. On the other hand, Squid 3.x may have several new features or existing features in an improved and revised manner.
Therefore, we should generally aim for the latest version and, depending on the environment, choose either a stable or a beta version. Also, if we need specific features that are not available in the latest version, we may choose from the available releases in a different branch.
What just happened?
We had a brief look at the pages on Squid’s official website that list the different versions and releases of Squid. We also learned which versions and releases we should download for different types of usage.
Methods of obtaining Squid
After identifying the version of Squid that we should be using for compiling and installation, let’s have a look at the ways in which we can obtain Squid release 3.1.10.
Using source archives
Compressed source archives are the most popular way of getting Squid. To download the source archive, please visit the Squid download page at http://www.squid-cache.org/Download/. This web page has links for downloading the different versions and releases of Squid, either from the official website or from mirrors worldwide. We can use either HTTP or FTP to get the Squid source archive.
Time for action – downloading Squid
Now we are going to download Squid 3.1.10 from Squid’s official website:
- Let’s go to the web page http://www.squid-cache.org/Versions/.
- Now we need to click on the link to Version 3.1, as shown in the following screenshot:
- We’ll be taken to a page displaying the various releases in version 3.1. The link with the display text tar.gz in the Download column is a link to the compressed source archive for Squid release 3.1.10, as shown in the following screenshot:
- To download Squid 3.1.10 using the web browser, just click on the link.
- Alternatively, we can use wget to download the source archive from the command line as follows:
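The exact command depends on the release we picked. As a minimal sketch, assuming the archive for release 3.1.10 sits under Versions/v3/3.1/ on the official website, the download could be scripted as shown here. The URL layout and the fetch_squid helper are assumptions made for this example; the tar.gz link on the release page is the authoritative location.

```shell
# Assumed values: adjust them to the release we actually want.
SQUID_VERSION="3.1.10"
SQUID_SERIES="${SQUID_VERSION%.*}"            # "3.1", derived from the release number
SQUID_ARCHIVE="squid-${SQUID_VERSION}.tar.gz"
# Assumed URL layout; confirm it against the tar.gz link on the Versions page.
SQUID_URL="http://www.squid-cache.org/Versions/v3/${SQUID_SERIES}/${SQUID_ARCHIVE}"

# Download the archive into the current directory unless it is already there.
fetch_squid() {
    [ -f "$SQUID_ARCHIVE" ] || wget "$SQUID_URL"
}

echo "$SQUID_URL"
```

Running fetch_squid then performs the actual download with wget and skips it when the archive is already present.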
What just happened?
We successfully retrieved Squid version 3.1.10 from Squid’s official website. The process of retrieving other stable or beta versions is very similar.
Obtaining the latest source code from Bazaar VCS
Advanced users may be interested in getting the very latest source code from the Squid code repository, using Bazaar. We can safely skip this section if we are not familiar with VCS in general. Bazaar is a popular version control system used to track project history and facilitate collaboration. From version 3.x onwards, Squid source code has been migrated to Bazaar. Therefore, we should ensure that we have Bazaar installed on our system in order to check out the source code from the repository. To find out more about Bazaar or for Bazaar installation and configuration manuals, please visit Bazaar’s official website at http://bazaar.canonical.com/.
Once we have set up Bazaar, we should head to the Squid code repository mirrored on Launchpad at https://code.launchpad.net/squid/. From here we can browse all the versions and branches of Squid. Let’s get ourselves familiar with the page layout:
In the previous screenshot, Series: trunk represents the development branch, which contains code that is still in development and is not ready for production use. The branches with the status Mature are stable and can be used right away in production environments.
Time for action – using Bazaar to obtain source code
Now that we are familiar with the various branches, versions, and releases, let’s proceed to checking out the source code with Bazaar. To download code from any branch, the syntax for the command is as follows:
bzr branch lp:squid[/branch[/version]]
branch and version are optional parameters in the previous code. So, if we want to get branch 3.1, then the command will be as follows:
bzr branch lp:squid/3.1
The previous command will fetch source code from Launchpad and may take a considerable amount of time, depending on the Internet connection. If we want to download the source code for Squid version 3.1.10 specifically, then the command will be as follows:
bzr branch lp:squid/3.1/3.1.10
In the previous code, 3.1 is the branch name and 3.1.10 is the specific version of Squid that we want to check out.
What just happened?
We learned to fetch the source code for any Squid branch or release using Bazaar from Squid’s source code hosted on Launchpad.
Have a go hero – fetching the source code
Using the command syntax that we learned in the previous section, fetch the source code for Squid version 3.0.stable25 from Launchpad.
bzr branch lp:squid/3.0/3.0.stable25
- Explanation: If we browse to the particular version on Launchpad, the version number used in the command becomes obvious.
Using binary packages
Squid binary packages are pre-compiled, ready-to-install software bundles. Binary packages are available in the software repositories of almost all Linux/Unix-based operating systems. Depending on the operating system, only stable and sometimes well-tested beta versions make it to the software repositories, so they are ready for production use.
Squid can be installed either from the source code we obtained in the previous section, or with a package manager which, in turn, uses the binary package available for our operating system. Let’s have a detailed look at the ways in which we can install Squid.
Installing Squid from source code
Installing Squid from source code is a three-step process:
- Select the features and operating system-specific settings.
- Compile the source code to generate the executables.
- Place the generated executables and other required files in their designated locations for Squid to function properly.
We can perform some of the above steps using automated tools that make the compilation and installation process relatively easy.
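For source packages built with a configure script, such as Squid, these three steps conventionally map to configure, make, and make install. The following is a minimal sketch under that assumption; build_squid is a hypothetical wrapper written for this example and is not part of Squid.

```shell
# Run the three steps inside an unpacked Squid source tree.
# Arguments: the source directory, followed by any options to pass to configure.
build_squid() {
    src_dir=$1
    shift
    (
        cd "$src_dir" || exit 1
        ./configure "$@" &&   # step 1: select features and OS-specific settings
        make &&               # step 2: compile the source code into executables
        make install          # step 3: copy executables and other files into place
    )
}
```

For example, build_squid squid-3.1.10 runs all three steps with the default settings. Each step runs only if the previous one succeeded, and the install step may need root privileges when the target location is a system directory.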
Compiling Squid means compiling several files containing C/C++ source code to generate executables. The process takes only a few steps. To compile Squid, we need an ANSI C/C++ compliant compiler. If we already have the GNU C and C++ compilers (gcc and g++ from the GNU Compiler Collection (GCC), which are available on almost every Linux/Unix-based operating system by default), we are ready to begin the actual compilation.
Compiling Squid takes more time and effort than installing it from a binary package. However, we recommend compiling Squid from source instead of using pre-compiled binaries. Let’s walk through a few advantages of compiling Squid from source:
- While compiling we can enable extra features, which may not be enabled in the pre-compiled binary package.
- When compiling, we can also disable extra features that are not needed for a particular environment. For example, we may not need Authentication helpers or ICMP support.
- configure probes the system for several features and enables or disables them accordingly, while pre-compiled binary packages will have the features detected for the system the source was compiled on.
- Using configure, we can specify an alternate location for installing Squid. We can even install Squid without root or superuser privileges, which may not be possible with a pre-compiled binary package.
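As a rough sketch of the last two points, the helpers below list the feature switches a configure script offers and run configure with an install location that does not need root access. The directory under the home directory and both function names are assumptions made for this example; --help and --prefix are standard options of configure scripts, while the available feature switches differ between Squid releases.

```shell
# Hypothetical install location under the home directory, so no root access is needed.
SQUID_PREFIX="${HOME}/squid"

# List the feature switches offered by the configure script in the current directory.
list_feature_switches() {
    ./configure --help | grep -E -e '--(enable|disable|with|without)-'
}

# Run configure with the unprivileged prefix, plus any extra switches we pass in.
configure_for_user_install() {
    ./configure --prefix="$SQUID_PREFIX" "$@"
}
```

Both functions are meant to be run from inside the unpacked source directory. Checking the output of list_feature_switches first avoids guessing at option names for a particular release.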
Though compiling Squid from source has a lot of advantages over installing from the binary package, the binary package has its own advantages. For example, when we are in damage control mode or a crisis situation and we need to get the proxy server up and running really quickly, using a binary package for installation will provide a quicker installation.
Uncompressing the source archive
If we obtained Squid as a compressed archive, we must extract it before we can proceed any further. If we obtained Squid from Launchpad using Bazaar, we don’t need to perform this step.
tar -xvzf squid-3.1.10.tar.gz
tar is a popular command used to extract archives of various types, including compressed ones. It can also be used to pack many files into a single archive. The preceding command will extract the archive to a directory named squid-3.1.10.
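To see both directions of tar at work without touching the Squid archive, we can try it on a small stand-in tree. This is a sketch in a scratch directory; the demo-1.0 name and its contents are made up for illustration.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small stand-in tree (hypothetical contents, not the real Squid sources).
mkdir demo-1.0
echo "hello" > demo-1.0/README

# c = create, z = compress with gzip, f = name of the archive file
tar -czf demo-1.0.tar.gz demo-1.0

# t = list the contents of the archive without extracting anything
tar -tzf demo-1.0.tar.gz

# x = extract; the original is removed first to show that the archive restores it
rm -r demo-1.0
tar -xzf demo-1.0.tar.gz
cat demo-1.0/README
```

Listing an archive with -t before extracting it is a cheap way to confirm the name of the directory it will create. The v flag used in the earlier command only adds verbose output and can be combined with any of these modes.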