Parse a Google Page

Documentation for parsing a Google page.



Parsing a page means extracting its parts in a structured way. The result lets you tell these parts apart and gather details about each of them, so that you can present them in another form.

Important notice about Google updates

The following examples can change at any time.

As soon as Google changes its page structure, you may need to update the library. You can watch the repository on GitHub to be notified of new releases as they come.

Classical Results

A Google SERP can contain different types of results. They are first divided into three distinct regions: natural (organic), paid (AdWords) and graph results, and each region has its own result types. Graph results are not supported by the library.

Results are very diverse, and the library gives you an API to work with them. This page documents how to use it.

Natural Results

Natural results (aka organic results) are the main results of the page.

Each natural result has a position and some available data. You can access them the following way (see the foreach loop):

    use Serps\SearchEngine\Google\GoogleClient;
    use Serps\SearchEngine\Google\GoogleUrl;

    $googleClient = new GoogleClient($httpClient);

    $googleUrl = new GoogleUrl();
    $googleUrl->setSearchTerm('simpsons');

    $response = $googleClient->query($googleUrl);

    $results = $response->getNaturalResults();

    foreach($results as $result){
        // Here we iterate over the result list
        // Each result will have different data based on its type
    }

Every result from the loop exposes the same set of methods.

What differs between result types is the list of data available through getDataValue($type) and getData(). See below for the data available for each result type.

Natural Result Types

Result types can be accessed through the class NaturalResultType:

    use Serps\SearchEngine\Google\NaturalResultType;

    if($result->is(NaturalResultType::CLASSICAL)){
        // Do stuff
    }

    // You can also check many types at once
    // Here we check if the result is classical or image group

    if($result->is(NaturalResultType::CLASSICAL, NaturalResultType::IMAGE_GROUP)){
        // Do stuff
    }

From the result set you can also access all the results matching one of the given types:

    // Get all the results that are either classical or image_group
    $results = $results->getResultsByType(NaturalResultType::CLASSICAL, NaturalResultType::IMAGE_GROUP);

Classical

These results are the common natural results that have always existed in Google.

Classical Results

Available with

Data

Example

    use Serps\SearchEngine\Google\NaturalResultType;

    $results = $response->getNaturalResults();

    foreach($results as $result){
        if($result->is(NaturalResultType::CLASSICAL)){
            $title = $result->title;
            $url   = $result->url;
        }
    }

Classical Large

This type is an extension of the classical result, with sitelinks in addition.

Large Classical Results

Mobile version (since version 0.2):

Large Classical Results on mobile

Available with

Data

Example

    use Serps\SearchEngine\Google\NaturalResultType;

    $results = $response->getNaturalResults();

    foreach($results as $result){
        if($result->is(NaturalResultType::CLASSICAL_LARGE)){
            $title = $result->title;
            $url   = $result->url;
            $sitelinks = $result->sitelinks;
            foreach ($sitelinks as $sitelink) {
                $sitelinkTitle = $sitelink->title;
            }
        }
    }

Classical Video

This type is an extension of the classical result, but it refers to a video result.

The video result can be illustrated with either a thumbnail or a large image.

Video Results

Available with

Data

Example

    use Serps\SearchEngine\Google\NaturalResultType;


    $results = $response->getNaturalResults();

    foreach($results as $result){
        if($result->is(NaturalResultType::CLASSICAL_VIDEO)){
            $title = $result->title;
            if($result->videoLarge){
                // ...
            }
        }
    }

Classical Illustrated

Classical results might have an additional CLASSICAL_ILLUSTRATED type when the result is illustrated with a thumbnail. Non-large video results have this type as well.

Available with

Data

Image Group

Images that appear as a group of results.

Image Group Result

Mobile version (since version 0.2):

Image Group Result

Available with

Data

Example

    use Serps\SearchEngine\Google\NaturalResultType;


    $results = $response->getNaturalResults();

    foreach($results as $result){
        if($result->is(NaturalResultType::IMAGE_GROUP)){
            foreach($result->images as $image){
                $sourceUrl = $image->sourceUrl;
            }
        }
    }

Video Group

This type is present on mobile results and was added with version 0.2.

It shows some videos (usually 10) arranged in a carousel.

Video Results

Available with

Data

Example

    use Serps\SearchEngine\Google\NaturalResultType;


    $results = $response->getNaturalResults();

    foreach($results as $result){
        if($result->is(NaturalResultType::VIDEO_GROUP)){
            foreach($result->videos as $video){
                $url = $video->url;
            }
        }
    }

Map

A result illustrated by a map and that contains sub-results.

Map Result

Available with

Data

Example

    use Serps\SearchEngine\Google\NaturalResultType;


    $results = $response->getNaturalResults();

    foreach($results as $result){
        if($result->is(NaturalResultType::MAP)){
            foreach($result->localPack as $place){
                $website = $place->website;
            }
        }
    }

Answer Box

Block that answers a question asked by the keywords.

Answer Box Results

Available with

Data

Example

    use Serps\SearchEngine\Google\NaturalResultType;

    $results = $response->getNaturalResults();

    foreach($results as $result){
        if($result->is(NaturalResultType::ANSWER_BOX)){
            $title = $result->title;
            $url   = $result->url;
        }
    }

Knowledge

Since version 0.2.1

Knowledge boxes that appear among mobile results.

Be aware that knowledge results are only included if they are present among the result list. That means that on non-mobile results knowledge results are not available because they are placed on the right of natural results.

Knowledge Results

Available with

Data

Example

    use Serps\SearchEngine\Google\NaturalResultType;

    $results = $response->getNaturalResults();

    foreach($results as $result){
        if($result->is(NaturalResultType::KNOWLEDGE)){
            $title = $result->title;
            $description   = $result->shortDescription;
        }
    }

Tweets Carousel

Recent list of tweets from a user matching the search keywords.

Tweets Carousel

Available with

Data

Example

    use Serps\SearchEngine\Google\NaturalResultType;


    $results = $response->getNaturalResults();

    foreach($results as $result){
        if($result->is(NaturalResultType::TWEETS_CAROUSEL)){
            $user = $result->user;
        }
    }

In the News

Recent news results.

These results no longer exist

In early 2017 Google removed "in the news" results and replaced them with "top stories" results. The "in the news" results are now deprecated and might be removed from serps in future releases.

In the News

Available with

Data

Example

    use Serps\SearchEngine\Google\NaturalResultType;


    $results = $response->getNaturalResults();

    foreach($results as $result){
        if($result->is(NaturalResultType::IN_THE_NEWS)){
            foreach($result->news as $news){
                $newsTitle = $news->title;
                $newsUrl = $news->url;
            }
        }
    }

Top Stories

Carousel of recent popular news.

Implemented in version 0.1.4 as a successor for "in the news"

Top stories can appear in two distinct formats: carousel or vertical. Note that the vertical format is very rare.

Carousel

Topstories carousel

Vertical

Topstories vertical

Available with

Data

Example

    use Serps\SearchEngine\Google\NaturalResultType;


    $results = $response->getNaturalResults();

    foreach($results as $result){
        if($result->is(NaturalResultType::TOP_STORIES)){
            foreach($result->news as $news){
                $newsTitle = $news->title;
                $newsUrl = $news->url;
            }
        }
    }

Flights

Sample of flights from Google Flights.

Flights

Available with

Data

No data is parsed from flight results. There is no plan to implement it because it's complex and not very useful.

Example

    use Serps\SearchEngine\Google\NaturalResultType;

    $results = $response->getNaturalResults();

    foreach($results as $result){
        if($result->is(NaturalResultType::FLIGHTS)){
            // Got a flight result
        }
    }

Adwords Results

The Google client offers an AdWords parser.

Warning

Adwords parsing is still experimental!

    $adwordsResults = $response->getAdwordsResults();

    foreach($adwordsResults as $result){
        // do stuff
    }

Adwords sections

AdWords results are split into three distinct sections. These sections can be at the top, at the right or at the bottom of the natural results. See the schema:

Adwords positions

By default all results are available in the result set. If you need the results from one section only, you can use the section as a type filter:

    use Serps\SearchEngine\Google\AdwordsResultType;

    $adwordsResults = $response->getAdwordsResults();

    $topResults = $adwordsResults->getResultsByType(AdwordsResultType::SECTION_TOP);
    $rightResults = $adwordsResults->getResultsByType(AdwordsResultType::SECTION_RIGHT);
    $bottomResults = $adwordsResults->getResultsByType(AdwordsResultType::SECTION_BOTTOM);

    foreach($topResults as $result){
        // Do stuff...
    }

Adwords Types

Ad results are the basic AdWords results.

Adwords ads

Available with

Data

Example

    use Serps\SearchEngine\Google\AdwordsResultType;


    $results = $response->getAdwordsResults();

    foreach($results as $result){
        if($result->is(AdwordsResultType::AD)){
            $url = $result->url;
        }
    }

Shopping

These are the results from Google Shopping/Merchant.

Google shopping

Available with

Data

Example

    use Serps\SearchEngine\Google\AdwordsResultType;

    $results = $response->getAdwordsResults();

    foreach($results as $result){
        if($result->is(AdwordsResultType::SHOPPING_GROUP)){
            foreach($result->products as $item){
                $title = $item->title;
            }
        }
    }

Additional info

A Google SERP contains more information than the result list alone. Sometimes this information is very helpful to get the most out of the SERP.

Here is the list of additional information currently supported by the parser.

Number of results

number of results

Represents the total number of results returned by the current search. The format of this number changes from country to country (61,000,000, 61 000 000, 6,10,00,000, etc.). The library returns it as an integer regardless of the original format.

In some cases this number is not available (for instance with the mobile layout).

    $numberOfResults = $response->getNumberOfResults();

    if(null === $numberOfResults){
        // D'oh!
    } elseif($numberOfResults < 2000) {
        // ...
    } else {
        // ...
    }
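The normalization described above can be sketched in plain PHP. The helper name normalizeResultCount is hypothetical and not part of the library, and stripping every non-digit character is only an assumption about how such a number could be normalized, not the library's actual implementation.

```php
<?php

/**
 * Hypothetical helper (not part of the library): turns a localized
 * result count into an integer, or null when it contains no digit.
 */
function normalizeResultCount(string $raw): ?int
{
    // Drop every character that is not an ASCII digit: commas,
    // spaces, non-breaking spaces, dots, and so on.
    $digits = preg_replace('/[^0-9]+/', '', $raw);

    return ($digits === null || $digits === '') ? null : (int) $digits;
}

// The three formats quoted above all yield the same integer.
var_dump(normalizeResultCount('61,000,000') === normalizeResultCount('61 000 000'));
var_dump(normalizeResultCount('6,10,00,000'));
```

The helper expects the number alone. Passing a longer sentence that contains other digits (a duration, for instance) would concatenate them into the result.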

Related searches

Google usually shows a list of related searches at the bottom of the page. The method getRelatedSearches returns a list of these items.

    $relatedSearches = $response->getRelatedSearches();
    foreach ($relatedSearches as $relatedSearch) {
        $url = $relatedSearch->url;
        $title = $relatedSearch->title;
    }
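If you need the keywords behind a related search rather than its link, they can usually be read from the URL. The sketch below is plain PHP and makes two assumptions: the helper extractSearchTerm is hypothetical (not part of the library), and the URL carries the keywords in a q query parameter, as in the made-up sample URL.

```php
<?php

/**
 * Hypothetical helper (not part of the library): reads the 'q'
 * parameter of a search URL, or returns null when it is absent.
 */
function extractSearchTerm(string $url): ?string
{
    // Isolate the query string; parse_url() gives null (or false on
    // a malformed URL) when there is none.
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }

    // Decode the query string into an associative array.
    parse_str($query, $params);

    return (isset($params['q']) && is_string($params['q'])) ? $params['q'] : null;
}

// Made-up URL, shaped like a search link.
echo extractSearchTerm('https://www.google.com/search?q=simpsons+characters&sa=X'), PHP_EOL;
```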

Custom parsing

Sometimes you need information that is not available in our parser.

First of all, check whether someone has already asked for this feature on the issue tracker.

If you cannot find any trace of it but still consider the feature important, open an issue and let's discuss it. This matters because a feature implemented in the library is kept up to date when Google changes its pages, and you won't have to maintain it yourself.


Back from the issue tracker, no one mentioned it and you still want to parse the information by yourself. Alright, here are the tools you need.

Query with css

The easiest way to do it for a web developer: with css.

    $response = $googleClient->query($googleUrl);

    // Returns \DOMNodeList
    $queryResult = $response->cssQuery('#someId');

    if ($queryResult->length == 1) {
        // You can query again to find items in the previous context.

        // Gets all items with the class 'someClass' within the element with the id 'someId'
        $queryResult = $response->cssQuery('.someClass', $queryResult->item(0));
    } else {
        // some errors...
    }

It works exactly as DOMXPath::query does: the CSS selector is translated to XPath, and DOMXPath::query is called on the DOM element.
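Because the lookup ends up in DOMXPath::query, the scoped query can be reproduced with PHP's DOM extension alone. The sketch below uses a made-up HTML fragment and hand-written XPath equivalents of '#someId' and '.someClass'. These expressions are assumptions for illustration, and the ones the library actually generates may differ.

```php
<?php

// Keep libxml parsing warnings out of the output.
libxml_use_internal_errors(true);

// Made-up markup standing in for a downloaded page.
$html = '<html><body>'
    . '<div id="someId"><p class="someClass">in</p><p class="other someClass">in too</p></div>'
    . '<p class="someClass">out</p>'
    . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hand-written equivalent of the CSS selector '#someId'.
$container = $xpath->query('descendant-or-self::*[@id="someId"]');

// Hand-written equivalent of '.someClass', evaluated relative to the
// context node given as second argument.
$classExpr = 'descendant-or-self::*[contains(concat(" ", normalize-space(@class), " "), " someClass ")]';
$matches = $xpath->query($classExpr, $container->item(0));

// Only the two paragraphs inside #someId are listed.
foreach ($matches as $node) {
    echo $node->textContent, PHP_EOL;
}
```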

Query with xpath

That's very similar to the css way, except that you will use xpath.

    $response = $googleClient->query($googleUrl);

    $queryResult = $response->xpathQuery('descendant::div[@id="someId"]');

    if ($queryResult->length == 1) {
        // Gets all 'a' tags inside the element with the id 'someId'.
        $queryResult = $response->xpathQuery('a', $queryResult->item(0));
    } else {
        // some errors...
    }
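When a context node is passed, the form of the expression decides how far the search goes. The sketch below shows this with plain DOMXPath and a made-up fragment. It assumes xpathQuery forwards the expression and the context node unchanged, which the library does not guarantee here.

```php
<?php

// Keep libxml parsing warnings out of the output.
libxml_use_internal_errors(true);

// Made-up markup: one direct link, one nested link, one outside link.
$html = '<html><body>'
    . '<div id="someId"><a href="#1">direct</a><p><a href="#2">nested</a></p></div>'
    . '<a href="#3">outside</a>'
    . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$container = $xpath->query('descendant::div[@id="someId"]')->item(0);

// 'a' is relative to the context node and selects its direct children.
$children = $xpath->query('a', $container);

// 'descendant::a' selects links at any depth below the context node.
$descendants = $xpath->query('descendant::a', $container);

// '//a' is absolute: it starts from the document root and ignores
// the context node.
$absolute = $xpath->query('//a', $container);

printf("%d %d %d\n", $children->length, $descendants->length, $absolute->length);
```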

There is also a shortcut to the xpath object.

    $response = $googleClient->query($googleUrl);

    $xpath = $response->getXpath();
    $xpath->query('someXpath');

Manipulate the DOM object

You can get the DOM object to manipulate it, or to save it in a file.

    $response = $googleClient->query($googleUrl);

    $dom = $response->getDom();

    // Writes the dom content in the file 'file.html'
    $dom->save('file.html');
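PHP's DOMDocument offers other ways to serialize a document besides save(), which writes XML. The sketch below uses a standalone DOMDocument with made-up markup and shows saveHTML() and saveHTMLFile(). It assumes the object returned by getDom() behaves like a regular DOMDocument, which is not confirmed here.

```php
<?php

// Keep libxml parsing warnings out of the output.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML('<html><body><p id="someId">Hello</p></body></html>');

// Serialize to an HTML string instead of a file.
$markup = $dom->saveHTML();

// Write HTML to a temporary file; saveHTMLFile() returns the number
// of bytes written, or false on failure.
$path = tempnam(sys_get_temp_dir(), 'serp');
$bytes = $dom->saveHTMLFile($path);

// Reload the saved copy to check that the content survived.
$copy = new DOMDocument();
$copy->loadHTMLFile($path);
echo $copy->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;

// Clean up the temporary file.
unlink($path);
```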
