Getting ready

In this article we will use cURL to request and download a web page from a server.

How to do it…

  1. Enter the following code into a new PHP project:

    <?php

    // Function to make a GET request using cURL
    function curlGet($url) {

        $ch = curl_init(); // Initialising cURL session

        // Setting cURL options
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt($ch, CURLOPT_URL, $url);

        $results = curl_exec($ch); // Executing cURL session

        curl_close($ch); // Closing cURL session

        return $results; // Return the results
    }

    $packtPage = curlGet('http://www.packtpub.com/oop-php-5/book');

    echo $packtPage;
    ?>

    
    
  2. Save the project as 2-curl-request.php (ensure you use the .php extension!).
  3. Execute the script.
  4. Once our script has completed, we will see the source code of http://www.packtpub.com/oop-php-5/book displayed on the screen.

How it works…

Let’s look at how we performed the previously defined steps:

  1. The first line, <?php, and the last line, ?>, indicate where our PHP code block begins and ends. All the PHP code should appear between these two tags.
  2. Next, we create a function called curlGet(), which accepts a single parameter $url, the URL of the resource to be requested.
  3. Running through the code inside the curlGet() function, we start off by initializing a new cURL session as follows:

    $ch = curl_init();

    
    
  4. We then set our options for cURL as follows:

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    // Tells cURL to return the results of the request (the source
    // code of the target page) as a string rather than printing it directly.

    curl_setopt($ch, CURLOPT_URL, $url);
    // Here we tell cURL the URL we wish to request. Notice that it is
    // the $url variable that we passed into the function as a parameter.

    
    
  5. We execute our cURL request, storing the returned string in the $results variable as follows:

    $results = curl_exec($ch);

    
    
  6. Now that the cURL request has been made and we have the results, we close the cURL session by using the following code:

    curl_close($ch);

    
    
  7. At the end of the function, we return the $results variable, which contains our requested page, for use in the rest of our script as follows:

    return $results;

  8. Once the function has been defined, we are able to use it throughout the rest of our script.
  9. Having decided on the URL we wish to request, http://www.packtpub.com/oop-php-5/book, we call the function, passing the URL as a parameter and storing the data it returns in the $packtPage variable as follows:

    $packtPage = curlGet('http://www.packtpub.com/oop-php-5/book');

    
    
  10. Finally, we echo the contents of the $packtPage variable (the page we requested) to the screen by using the following code:

    echo $packtPage;
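
One detail the walkthrough above does not cover is failure: with CURLOPT_RETURNTRANSFER set, curl_exec() returns FALSE when the request cannot be completed, so echoing the result would print nothing. The following is a minimal sketch of a variant that surfaces the error message instead. The name curlGetChecked() and the choice to throw an exception are assumptions made for illustration and are not part of the original recipe.

```php
<?php

// Hypothetical variant of curlGet() that throws when the request fails
function curlGetChecked(string $url): string {

    $ch = curl_init(); // Initialising cURL session

    // Same two options as in the recipe
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_URL, $url);

    $results = curl_exec($ch); // String on success, FALSE on failure

    if ($results === false) {
        // Read the error message before closing the session
        $message = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $message);
    }

    curl_close($ch); // Closing cURL session

    return $results;
}
```

A caller could wrap curlGetChecked('http://www.packtpub.com/oop-php-5/book') in a try/catch block and log the exception message rather than echoing an empty page.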

    
    

There’s more…

There are a number of different HTTP request methods, which indicate to the server the desired response or the action to be performed. The request method used in this article is cURL's default, GET, which tells the server that we would like to retrieve a resource.

Depending on the resource we are requesting, a number of parameters may be passed in the URL. For example, when we perform a search on the Packt Publishing website for a query, say, php, we notice that the URL is http://www.packtpub.com/books?keys=php. This is requesting the resource books (the page that displays search results) and passing a value of php to the keys parameter, indicating that the dynamically generated page should show results for the search query php.
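As a rough illustration of the search URL described above, the query string can be assembled with PHP's built-in http_build_query() function rather than by hand, which also takes care of URL-encoding the values. The helper name buildUrl() is hypothetical and used here only for the sketch.

```php
<?php

// Hypothetical helper: appends an array of parameters to a base URL
function buildUrl(string $baseUrl, array $params): string {
    // http_build_query() URL-encodes each key and value and joins them with &
    return $baseUrl . '?' . http_build_query($params);
}

// The search example from the text: resource "books", parameter keys=php
$searchUrl = buildUrl('http://www.packtpub.com/books', array('keys' => 'php'));

echo $searchUrl; // http://www.packtpub.com/books?keys=php
```

The resulting string can then be passed to curlGet() in the same way as the URL in the recipe.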

More cURL Options

Of the many cURL options available, only two have been used in our preceding code. They are CURLOPT_RETURNTRANSFER and CURLOPT_URL. Though we will cover many more throughout the course of this article, some other options to be aware of, that you may wish to try out, are listed in the following table:

| Option name | Value | Purpose |
| --- | --- | --- |
| CURLOPT_FAILONERROR | TRUE or FALSE | If a response code of 400 or greater is returned, cURL treats the request as failed rather than returning the page. |
| CURLOPT_FOLLOWLOCATION | TRUE or FALSE | If Location: headers are sent by the server, follow the location. |
| CURLOPT_USERAGENT | A user agent string, for example: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:15.0) Gecko/20100101 Firefox/15.0.1' | Sending the user agent string in your request informs the target server which client is requesting the resource. Since many servers will only respond to ‘legitimate’ requests, it is advisable to include one. |
| CURLOPT_HTTPHEADER | An array containing header information, for example: array('Cache-Control: max-age=0', 'Connection: keep-alive', 'Keep-Alive: 300', 'Accept-Language: en-us,en;q=0.5') | This option is used to send header information with the request. We will come across use cases for this in later recipes. |

A full listing of cURL options can be found on the PHP website at http://php.net/manual/en/function.curl-setopt.php.
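To try several of these options together, they can be set in one call with curl_setopt_array(). The sketch below is one possible arrangement and not the recipe's own code: the function names buildCurlOptions() and curlGetWithOptions() are hypothetical, and the option values are examples only.

```php
<?php

// Hypothetical helper: collects the options for a request in one array
function buildCurlOptions(string $url, string $userAgent): array {
    return array(
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,  // Return the page as a string
        CURLOPT_FAILONERROR    => true,  // Fail on response codes of 400 or greater
        CURLOPT_FOLLOWLOCATION => true,  // Follow Location: headers
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_HTTPHEADER     => array('Accept-Language: en-us,en;q=0.5'),
    );
}

// Hypothetical variant of curlGet() that applies all options at once
function curlGetWithOptions(string $url, string $userAgent) {
    $ch = curl_init();
    curl_setopt_array($ch, buildCurlOptions($url, $userAgent));

    $results = curl_exec($ch); // String on success, FALSE on failure

    curl_close($ch);

    return $results;
}
```

Keeping the options in a separate function makes it easy to inspect or adjust them before a request is made.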

The HTTP response code

An HTTP response code is the three-digit number returned by the server to indicate the result of an HTTP request. Some common response code values are as follows:

  • 200: OK
  • 301: Moved Permanently
  • 400: Bad Request
  • 401: Unauthorized
  • 403: Forbidden
  • 404: Not Found
  • 500: Internal Server Error
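
As a sketch of how a script might read and act on these codes, the example below fetches a page, reads the code from the cURL session with curl_getinfo() before the session is closed, and maps it to the descriptions listed above. The function names curlGetWithCode() and describeResponseCode() and the shape of the returned array are assumptions made for illustration.

```php
<?php

// Hypothetical variant of curlGet() that also returns the response code
function curlGetWithCode(string $url): array {
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_URL, $url);

    $results = curl_exec($ch);

    // Read the response code before closing the session
    $httpResponse = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    curl_close($ch);

    return array('body' => $results, 'code' => $httpResponse);
}

// Hypothetical helper: maps the codes listed above to their descriptions
function describeResponseCode(int $code): string {
    switch ($code) {
        case 200: return 'OK';
        case 301: return 'Moved Permanently';
        case 400: return 'Bad Request';
        case 401: return 'Unauthorized';
        case 403: return 'Forbidden';
        case 404: return 'Not Found';
        case 500: return 'Internal Server Error';
        default:  return 'Unhandled response code ' . $code;
    }
}
```

A scraper could call curlGetWithCode() and only process the 'body' entry when the 'code' entry is 200, logging describeResponseCode() for anything else.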

It is often useful to have our scrapers respond to different response code values in different ways, for example, letting us know if a web page has moved, is no longer accessible, or requires authorization that we do not have.

We can access the response code of a request by adding the following line to our function, after the call to curl_exec() and before curl_close(), which will store the response code in the $httpResponse variable:

    $httpResponse = curl_getinfo($ch, CURLINFO_HTTP_CODE);

Summary

This article covers techniques on making a simple cURL request.

