Parse a Google Page

Parsing a page means extracting its parts in a structured way. The result lets you distinguish between these parts and gather details about each of them, so that they can be presented in another form.
Important notice about Google updates
The following examples can change at any time.
As soon as Google changes its page structure, you may need to update the library. You can watch the repository on GitHub to be notified of new releases as they come.

A Google SERP can contain different types of results. They are first divided into three distinct regions: natural (organic), paid (Adwords) and graph results, and each region has its own result types. Graph results are not supported by the library.
There is a great diversity of results and the library gives you an API to work with them. This page documents how to use it.
Natural Results
Natural results (aka organic results) are the main results of the page.
Each natural result has a position and some available data. You can access them in the following way (see the foreach loop):
use Serps\SearchEngine\Google\GoogleClient;
use Serps\SearchEngine\Google\GoogleUrl;
$googleClient = new GoogleClient($httpClient);
$googleUrl = new GoogleUrl();
$googleUrl->setSearchTerm('simpsons');
$response = $googleClient->query($googleUrl);
$results = $response->getNaturalResults();
foreach($results as $result){
// Here we iterate over the result list
// Each result will have different data based on its type
}
Each result from the loop has the following methods available:
getTypes(): the types of the result
is($type): check if the result is of the given type
getDataValue($type): get the given data from the result. Everything accessible with getDataValue is also accessible as a property, e.g. these two expressions do the same thing: $result->getDataValue('url') and $result->url
getData(): get all the data of the result
getOnPagePosition(): get the position of the result on the page (starting at 1)
getRealPosition(): get the global position of the result (starting at 1), which means it takes pagination into account. Be aware that in some circumstances this number can be wrong because Google might show more results on the previous pages.
The difference between result types is the list of data available with getDataValue($type) and getData().
See below for all available data per result type.
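The relationship between the two position methods can be illustrated with a little arithmetic. The sketch below is not the library's implementation: estimateRealPosition and its $resultsPerPage parameter are hypothetical names. It assumes every previous page held exactly the same number of results, which is precisely the assumption that breaks when Google shows more results on an earlier page.

```php
<?php

/**
 * Hypothetical helper (not part of the library): estimates the global
 * position of a result from its position on the current page.
 *
 * Assumes every previous page contained exactly $resultsPerPage results.
 */
function estimateRealPosition(int $pageNumber, int $onPagePosition, int $resultsPerPage = 10): int
{
    if ($pageNumber < 1 || $onPagePosition < 1 || $resultsPerPage < 1) {
        throw new InvalidArgumentException('All arguments must be positive integers.');
    }

    // Results on the previous pages come first, then the ones on this page.
    return ($pageNumber - 1) * $resultsPerPage + $onPagePosition;
}

// Third result of the second page, with 10 results per page.
echo estimateRealPosition(2, 3), PHP_EOL; // prints 13
```

If an earlier page actually displayed 11 results, the true position would be 14, which is why the real position should be treated as an estimate.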
Natural Result Types
Result types can be accessed through the class NaturalResultType:
use Serps\SearchEngine\Google\NaturalResultType;
if($result->is(NaturalResultType::CLASSICAL)){
// Do stuff
}
// You can also check many types at once
// Here we check if the result is classical or image group
if($result->is(NaturalResultType::CLASSICAL, NaturalResultType::IMAGE_GROUP)){
// Do stuff
}
From the result set you can also access all the results matching at least one of the given types:
// Get all the results that are either classical or image_group
$results = $results->getResultsByType(NaturalResultType::CLASSICAL, NaturalResultType::IMAGE_GROUP);
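The matching rule described above (a result is kept as soon as it has at least one of the requested types, and a result may carry several types) can be modelled in plain PHP. This is only an illustrative sketch: filterByType, the array-based results and the lowercase type strings are all hypothetical stand-ins, not the library's classes or constant values.

```php
<?php

/**
 * Hypothetical stand-in for a result set: each result is an array with a
 * 'types' list. Keeps the results that have at least one requested type.
 */
function filterByType(array $results, string ...$types): array
{
    return array_values(array_filter(
        $results,
        function (array $result) use ($types): bool {
            // array_intersect is non-empty when at least one type matches.
            return count(array_intersect($result['types'], $types)) > 0;
        }
    ));
}

// Placeholder data: the type strings are made up for the example.
$results = [
    ['title' => 'A', 'types' => ['classical']],
    ['title' => 'B', 'types' => ['classical_large', 'classical']],
    ['title' => 'C', 'types' => ['image_group']],
    ['title' => 'D', 'types' => ['map']],
];

$filtered = filterByType($results, 'classical', 'image_group');
echo implode(', ', array_column($filtered, 'title')), PHP_EOL; // prints A, B, C
```

Result B is kept because one of its two types matches, which mirrors how an extended result remains reachable through its base type.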
Classical
These are the common natural results that have always existed on Google.

Available with
NaturalResultType::CLASSICAL
Data
title string [A]
url string: the URL targeted on clicking the title
destination string [B]: either a URL or a breadcrumb-like destination
description string [C]
isAmp boolean: true if the result is an AMP result
Example
use Serps\SearchEngine\Google\NaturalResultType;
$results = $response->getNaturalResults();
foreach($results as $result){
if($result->is(NaturalResultType::CLASSICAL)){
$title = $result->title;
$url = $result->url;
}
}
Classical Large
This type is an extension of the classical result, with sitelinks in addition.

Mobile version (since version 0.2):

Available with
NaturalResultType::CLASSICAL_LARGE
NaturalResultType::CLASSICAL
Data
title string [A]
url string: the URL targeted on clicking the title
destination string [B]: either a URL or a breadcrumb-like destination
description string [C]
sitelinks array:
    title string [D]
    url string: the URL targeted on clicking the sitelink title
    description string [E]
Example
use Serps\SearchEngine\Google\NaturalResultType;
$results = $response->getNaturalResults();
foreach($results as $result){
if($result->is(NaturalResultType::CLASSICAL_LARGE)){
$title = $result->title;
$url = $result->url;
$sitelinks = $result->sitelinks;
foreach ($sitelinks as $sitelink) {
$sitelinkTitle = $sitelink->title;
}
}
}
Classical Video
This type is an extension of the classical result, but it refers to a video result.
The video result can be illustrated with either a thumbnail or a large image.

Available with
NaturalResultType::CLASSICAL_VIDEO
NaturalResultType::CLASSICAL
Data
title string [A]
url string: the URL targeted on clicking the title
destination string [B]: either a URL or a breadcrumb-like destination
description string [C]
videoLarge bool: true if the video image is large (usually the first result)
videoCover Media object: the video cover. Only available if videoLarge is true
Example
use Serps\SearchEngine\Google\NaturalResultType;
$results = $response->getNaturalResults();
foreach($results as $result){
if($result->is(NaturalResultType::CLASSICAL_VIDEO)){
$title = $result->title;
if($result->videoLarge){
// ...
}
}
}
Classical Illustrated
Classical results might have an additional CLASSICAL_ILLUSTRATED type when the result is
illustrated with a thumbnail. Non-large video results have this type as well.
Available with
NaturalResultType::CLASSICAL_ILLUSTRATED
Data
thumb Media object: the thumbnail
Image Group
Images that appear as a group of results.

Mobile version (since version 0.2):

Available with
NaturalResultType::IMAGE_GROUP
Data
images array: the list of images that compose the image group. Each image contains:
    sourceUrl string: the URL where the image was found
    targetUrl string: the URL reached on clicking the image
    image string: the image data as specified by Google (either an image URL or a base64 encoded image)
moreUrl string: the URL corresponding to the Google image search
isCarousel boolean: true if the images have the form of a carousel (since version 0.2)
Example
use Serps\SearchEngine\Google\NaturalResultType;
$results = $response->getNaturalResults();
foreach($results as $result){
if($result->is(NaturalResultType::IMAGE_GROUP)){
foreach($result->images as $image){
$sourceUrl = $image->sourceUrl;
}
}
}
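Because the image field can hold either a URL or base64 data, code consuming it usually needs to tell the two apart. The helper below is a hedged sketch, not library code: describeImage is a hypothetical name, and it assumes that inline images arrive in the usual "data:<mime>;base64,<payload>" URI form, which the text above does not state explicitly.

```php
<?php

/**
 * Hypothetical helper: tells whether an image string is inline data or a URL.
 * Assumes inline images use the "data:<mime>;base64,<payload>" URI form.
 */
function describeImage(string $image): array
{
    if (preg_match('#^data:([\w/+.-]+);base64,(.+)$#s', $image, $matches) === 1) {
        $binary = base64_decode($matches[2], true);

        return [
            'kind' => 'inline',
            'mime' => $matches[1],
            // Size in bytes of the decoded image, or null if the payload is invalid.
            'size' => $binary === false ? null : strlen($binary),
        ];
    }

    // Anything else is treated as a plain URL.
    return ['kind' => 'url', 'mime' => null, 'size' => null];
}

$inline = describeImage('data:image/gif;base64,' . base64_encode('GIF89a'));
echo $inline['kind'], ' ', $inline['mime'], ' ', $inline['size'], PHP_EOL; // prints inline image/gif 6

$remote = describeImage('https://example.com/picture.jpg');
echo $remote['kind'], PHP_EOL; // prints url
```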
Video Group
This type is present on mobile results and was added in version 0.2.
It shows some videos (usually 10) arranged in a carousel.

Available with
NaturalResultType::VIDEO_GROUP
Data
videos array: the list of videos that compose the video group. Each video contains:
    title string: title of the video
    url string: the URL reached on clicking the item
    image Media object: image of the video
Example
use Serps\SearchEngine\Google\NaturalResultType;
$results = $response->getNaturalResults();
foreach($results as $result){
if($result->is(NaturalResultType::VIDEO_GROUP)){
foreach($result->videos as $video){
$url = $video->url;
}
}
}
Map
A result illustrated by a map and that contains sub-results.

Available with
NaturalResultType::MAP
Data
localPack array: the sub-results for the map:
    title string [A]: name of the place
    url string [B]: website of the sub-result
    street string [C]: the address of the sub-result
    stars string [D]: the rating of the result as a number
    review string [E]: the review string as specified by Google (e.g. '1 review')
    phone string [G]: the phone number
mapUrl string [F]: the URL to access the map search
Example
use Serps\SearchEngine\Google\NaturalResultType;
$results = $response->getNaturalResults();
foreach($results as $result){
if($result->is(NaturalResultType::MAP)){
foreach($result->localPack as $place){
$website = $place->url;
}
}
}
Answer Box
Block that answers a question asked by the keywords.

Available with
NaturalResultType::ANSWER_BOX
Data
title string [A]
url string: the URL targeted on clicking the title
destination string [B]: either a URL or a breadcrumb-like destination
description string [C]
Example
use Serps\SearchEngine\Google\NaturalResultType;
$results = $response->getNaturalResults();
foreach($results as $result){
if($result->is(NaturalResultType::ANSWER_BOX)){
$title = $result->title;
$url = $result->url;
}
}
Knowledge
Since version 0.2.1
Knowledge boxes that appear among mobile results.
Be aware that knowledge results are only included if they are present among the result list. That means that on non-mobile results knowledge results are not available because they are placed on the right of natural results.

Available with
NaturalResultType::KNOWLEDGE
Data
title string [A]
shortDescription string [B]: nature of the element (character,
Example
use Serps\SearchEngine\Google\NaturalResultType;
$results = $response->getNaturalResults();
foreach($results as $result){
if($result->is(NaturalResultType::KNOWLEDGE)){
$title = $result->title;
$description = $result->shortDescription;
}
}
People Also Ask
Since version 0.2.3
List of questions that people also ask

Available with
NaturalResultType::PEOPLE_ALSO_ASK
Data
questions array
    question string
Example
use Serps\SearchEngine\Google\NaturalResultType;
$results = $response->getNaturalResults();
foreach($results as $result){
if($result->is(NaturalResultType::PEOPLE_ALSO_ASK)){
$questions = $result->questions;
foreach ($questions as $question) {
$questionText = $question->question;
}
}
}
Tweet Carousel
List of recent tweets from a user matching the search keywords.

Available with
NaturalResultType::TWEETS_CAROUSEL
Data
title string [A]
url string: the URL reached when clicking the title
user string: the author of the tweets
Example
use Serps\SearchEngine\Google\NaturalResultType;
$results = $response->getNaturalResults();
foreach($results as $result){
if($result->is(NaturalResultType::TWEETS_CAROUSEL)){
$user = $result->user;
}
}
In the News
Recent news results.
These results do not exist anymore
In early 2017 Google removed the "in the news" results and replaced them with "top stories" results. This result type is now deprecated and might be deleted from serps in future releases.

Available with
NaturalResultType::IN_THE_NEWS
Data
news array
    title string [A]
    description string [B]
    url string: the URL reached when clicking the title
Example
use Serps\SearchEngine\Google\NaturalResultType;
$results = $response->getNaturalResults();
foreach($results as $result){
if($result->is(NaturalResultType::IN_THE_NEWS)){
foreach($result->news as $news){
$newsTitle = $news->title;
$newsUrl = $news->url;
}
}
}
Top Stories
List of recent popular news.
Implemented in version 0.1.4 as a successor for "in the news".
Composed top stories for mobile were implemented with version 0.2.2.
Top stories may appear in three distinct formats: carousel, vertical and composed.
Carousel

Vertical

Composed (mobile)

Available with
NaturalResultType::TOP_STORIES
NaturalResultType::COMPOSED_TOP_STORIES
Note: all top stories have the type NaturalResultType::TOP_STORIES. In addition, composed top stories also have the type NaturalResultType::COMPOSED_TOP_STORIES.
Data
isCarousel boolean: true when it has a carousel
isVertical boolean: true when it has vertical items
news array
    title string [A]
    url string: the URL reached when clicking the title
Example
use Serps\SearchEngine\Google\NaturalResultType;
$results = $response->getNaturalResults();
foreach($results as $result){
if($result->is(NaturalResultType::TOP_STORIES)){
foreach($result->news as $news){
$newsTitle = $news->title;
$newsUrl = $news->url;
}
}
}
About composed top stories
Composed top stories mix vertical and carousel news and appear on mobile results.
Composed results have both isCarousel and isVertical set to true.
When iterating over the news you can check whether each item is a carousel or a vertical result:
use Serps\SearchEngine\Google\NaturalResultType;
foreach($result->news as $news){
if ($news->is(NaturalResultType::TOP_STORIES_NEWS_CAROUSEL)) {
// carousel
} elseif ($news->is(NaturalResultType::TOP_STORIES_NEWS_VERTICAL)) {
// vertical
}
}
You can also get the results of only one of these types:
use Serps\SearchEngine\Google\NaturalResultType;
$carouselResults = $result->news->getResultsByType(NaturalResultType::TOP_STORIES_NEWS_CAROUSEL);
Flights
Sample of flights from Google Flights.

Available with
NaturalResultType::FLIGHTS
Data
No data is parsed from flight results. There is no plan to implement it because it's complex and not very useful.
Example
use Serps\SearchEngine\Google\NaturalResultType;
$results = $response->getNaturalResults();
foreach($results as $result){
if($result->is(NaturalResultType::FLIGHTS)){
// Got a flight result
}
}
Adwords Results
The google client offers an Adwords parser.
Warning
Adwords parsing is still experimental!
$adwordsResults = $response->getAdwordsResults();
foreach($adwordsResults as $result){
// do stuff
}
Adwords sections
Adwords results are composed of three distinct sections. These sections can be at the top, at the right or at the bottom of the natural results. See the schema:

By default all results are available in the result set. If you need to get the results from one section, you can use the section as a type filter:
use Serps\SearchEngine\Google\AdwordsResultType;
$adwordsResults = $response->getAdwordsResults();
$topResults = $adwordsResults->getResultsByType(AdwordsResultType::SECTION_TOP);
$rightResults = $adwordsResults->getResultsByType(AdwordsResultType::SECTION_RIGHT);
$bottomResults = $adwordsResults->getResultsByType(AdwordsResultType::SECTION_BOTTOM);
foreach($topResults as $result){
// Do stuff...
}
Adwords Types
Ad
Ad results are the basic results from Adwords.

Available with
AdwordsResultType::AD
Data
title string [A]
url string: the URL reached when clicking the title
visurl string [B]: the visual URL
description string [C]
Example
use Serps\SearchEngine\Google\AdwordsResultType;
$results = $response->getAdwordsResults();
foreach($results as $result){
if($result->is(AdwordsResultType::AD)){
$url = $result->url;
}
}
Shopping
These are the results from google shopping/merchant.

Available with
AdwordsResultType::SHOPPING_GROUP
Data
products array: the product list. Each product contains the following items:
    title string [A]
    image string [B]: the image as specified by Google (either an image URL or a base64 encoded image)
    url string: the URL reached when clicking the title
    target string [C]: the target website as shown by Google
    price string [D]: the price as shown by Google
Example
use Serps\SearchEngine\Google\AdwordsResultType;
$results = $response->getAdwordsResults();
foreach($results as $result){
if($result->is(AdwordsResultType::SHOPPING_GROUP)){
foreach($result->products as $item){
$title = $item->title;
}
}
}
Additional info
A Google SERP contains more information than the result list alone. Sometimes this information is very helpful to get the most from the SERP.
Here is the list of the additional information currently supported by the parser.
Number of results

Represents the total number of results returned by the current search. The format of this number can change from country to country (61,000,000 or 61 000 000 or 6,10,00,000, etc.). The library takes care of returning this number as an integer, whatever the initial format.
In some cases this number is not available (for instance with the mobile layout).
$numberOfResults = $response->getNumberOfResults();
if(null === $numberOfResults){
// D'oh!
} elseif($numberOfResults < 2000) {
// ...
} else {
// ...
}
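To make the format handling concrete, here is a minimal sketch of how such a localized count could be reduced to an integer. It is an assumption about one possible approach, not the library's actual parser, and normalizeResultCount is a hypothetical name. It expects a string that holds only the formatted number; surrounding text containing other digits would corrupt the result.

```php
<?php

/**
 * Hypothetical helper (the library's own parser may differ): turns a
 * localized result count into an integer by keeping only its digits.
 * Returns null when the text contains no digit at all.
 */
function normalizeResultCount(string $text): ?int
{
    // Drop thousands separators of any kind (commas, dots, spaces, ...).
    $digits = preg_replace('/[^0-9]+/', '', $text);

    return ($digits === null || $digits === '') ? null : (int) $digits;
}

foreach (['61,000,000', '61 000 000', '6,10,00,000'] as $formatted) {
    echo normalizeResultCount($formatted), PHP_EOL; // prints 61000000 each time
}
```

Returning null rather than 0 for an unreadable value keeps the "not available" case distinguishable from a search with no results.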
Related searches

Google usually gives a list of related searches at the bottom of the page. The method getRelatedSearches returns a list of these items.
$relatedSearches = $response->getRelatedSearches();
foreach ($relatedSearches as $relatedSearch) {
$url = $relatedSearch->url;
$title = $relatedSearch->title;
}
Custom parsing
Sometimes you need information that is not available in our parser.
First of all, check whether someone has already asked for this feature on the issue tracker.
If you find no trace of it but still consider the feature important, open an issue and let's discuss it. This matters because a feature implemented in the library is kept up to date when Google changes, and you won't have to maintain it yourself.
If you are back from the issue tracker, no one has mentioned it and you still want to parse the information yourself, here are the tools you need.
Query with css
The easiest way to do it for a web developer: with css.
$response = $googleClient->query($googleUrl);
// Returns \DOMNodeList
$queryResult = $response->cssQuery('#someId');
if ($queryResult->length == 1) {
// You can query again to find items in the previous context.
// Gets all items with the class 'someClass' within the element with the id 'someId'
$queryResult = $response->cssQuery('.someClass', $queryResult->item(0));
} else {
// some errors...
}
It works exactly as DOMXPath::query does: the CSS selector is translated to XPath and DOMXPath::query is called on the DOM element.
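The idea of running a CSS selector as an XPath query can be reproduced with PHP's standard DOM extension alone. In the sketch below the XPath expressions are written by hand as plausible equivalents of '#someId' and '.someClass'; the expressions the library actually generates may differ, and the HTML snippet is made up for the example.

```php
<?php

$html = '<html><body>'
    . '<div id="someId"><p class="someClass other">first</p><p>second</p></div>'
    . '<p class="someClass">outside</p>'
    . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hand-written XPath equivalent of the CSS selector '#someId'.
$container = $xpath->query("descendant-or-self::*[@id='someId']");

// Hand-written XPath equivalent of '.someClass', evaluated inside the container.
$classExpression = "descendant::*[contains(concat(' ', normalize-space(@class), ' '), ' someClass ')]";
$items = $xpath->query($classExpression, $container->item(0));

echo $container->length, PHP_EOL;           // prints 1
echo $items->length, PHP_EOL;               // prints 1
echo $items->item(0)->textContent, PHP_EOL; // prints first
```

The paragraph with the same class outside the container is not returned, because the second query is evaluated relative to the context node passed as second argument.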
Query with xpath
That's very similar to the css way, except that you will use xpath.
$response = $googleClient->query($googleUrl);
$queryResult = $response->xpathQuery('descendant::div[@id="someId"]');
if ($queryResult->length == 1) {
// Gets all 'a' tags inside the element with the id 'someId'.
$queryResult = $response->xpathQuery('a', $queryResult->item(0));
} else {
// some errors...
}
There is also a shortcut to the xpath object.
$response = $googleClient->query($googleUrl);
$xpath = $response->getXpath();
$xpath->query('someXpath');
Manipulate the DOM object
You can get the DOM object to manipulate it, or to save it in a file.
$response = $googleClient->query($googleUrl);
$dom = $response->getDom();
// Writes the dom content in the file 'file.html'
$dom->save('file.html');
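Assuming the object returned by getDom() is a standard \DOMDocument (an assumption, as this page does not state its class), it is worth knowing that DOMDocument::save() serializes the tree as XML, while DOMDocument::saveHTMLFile() writes it using HTML rules. The sketch below uses only PHP's DOM extension on a made-up document; the file name is hypothetical.

```php
<?php

$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>Hello</p></body></html>');

// Hypothetical target path inside the system temporary directory.
$path = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'serp-snapshot.html';

// saveHTMLFile() serializes with HTML rules and returns the number of
// bytes written, or false on failure.
$bytes = $dom->saveHTMLFile($path);

if ($bytes === false) {
    throw new RuntimeException("Could not write $path");
}

$saved = file_get_contents($path);
echo strpos($saved, '<p>Hello</p>') !== false ? 'saved' : 'missing', PHP_EOL; // prints saved
```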