
With so many users on mobile devices, having a mobile-friendly web is important to us all. The Mobile-Friendly Test is a great way to check individual pages manually. We're happy to announce that this test is now available via API as well.

The Mobile-Friendly Test API lets you test URLs using automated tools. For example, you could use it to monitor important pages on your website in order to prevent accidental regressions in the templates that you use. The API method runs the same tests and returns the same information as the manual test, including a list of the blocked URLs. The documentation includes simple samples to help you get started quickly.
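As a rough illustration, a monitoring script can call the API's run method over HTTPS. The sketch below assumes the REST endpoint, request body, and response fields described in the API documentation, a placeholder API key, and the third-party requests library; treat it as a starting point and check the documentation for the exact details.

```python
# Minimal sketch: run the Mobile-Friendly Test for one URL via the API.
# The endpoint and field names follow the public documentation; verify them
# (and your quota) against the current docs before relying on this.
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; create a key in the Google API Console
ENDPOINT = (
    "https://searchconsole.googleapis.com/v1/"
    "urlTestingTools/mobileFriendlyTest:run"
)

def check_mobile_friendly(url):
    """Run the mobile-friendly test for a single URL and return the JSON result."""
    response = requests.post(
        ENDPOINT,
        params={"key": API_KEY},
        json={"url": url},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    result = check_mobile_friendly("https://example.com/")
    print(result.get("mobileFriendliness"))        # e.g. "MOBILE_FRIENDLY"
    for issue in result.get("mobileFriendlyIssues", []):
        print("Issue:", issue.get("rule"))
```

A script like this could run on a schedule against your most important templates and alert you when a page stops reporting as mobile-friendly.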

We hope this API makes it easier to check your pages for mobile-friendliness and to get any such issues resolved faster. We'd love to hear how you use the API -- leave us a comment here, and feel free to link to any code or implementation that you've set up! As always, if you have any questions, feel free to drop by our webmaster help forum.


As a website owner, you might have come across auto-generated content in comments sections or forum threads. When such content is created on your pages, not only does it disrupt visitors to your site, it also shows Google and other search engines content that you may not want associated with your site.


In this blog post, we will give you tips to help you deal with this type of spam on your site and in your forums.


Some spammers abuse sites owned by others by posting deceptive content and links in an attempt to drive more traffic to their own sites.


Comments and forum threads can be a really good source of information and an efficient way of engaging a site's users in discussions. This valuable content should not be buried by auto-generated keywords and links placed there by spammers.


There are many ways of securing your site’s forums and comment threads and making them unattractive to spammers:


  • Keep your forum software updated and patched. Take the time to keep your software up-to-date and pay special attention to important security updates. Spammers take advantage of security issues in older versions of blogs, bulletin boards, and other content management systems.

  • Add a CAPTCHA. CAPTCHAs require users to confirm that they are human beings and not automated scripts. One way to do this is to use a service such as reCAPTCHA, Securimage, or Jcaptcha (a minimal server-side verification sketch follows this list).
  • Block suspicious behavior. Many forums allow you to set time limits between posts, and you can often find plugins that look for excessive traffic from individual IP addresses or proxies, and for other activity more common to bots than to human beings. For example, phpBB, Simple Machines, myBB, and many other forum platforms support these kinds of configurations.

  • Check your forum’s top posters on a daily basis. If a user joined recently and has an excessive number of posts, review their profile and make sure that their posts and threads are not spammy.

  • Consider disabling some types of comments. For example, it’s good practice to close very old forum threads that are unlikely to get legitimate replies.
    If you don't plan to monitor your forum going forward and users are no longer interacting with it, turning off posting completely can prevent spammers from abusing it.

  • Make good use of moderation capabilities. Consider enabling moderation features that require users to have a certain reputation before they can post links, or that hold comments containing links for moderation.
    If possible, change your settings to disallow anonymous posting and to require approval of posts from new users before they become publicly visible.
    Moderators, together with your friends, colleagues, and other trusted users, can help you review and approve posts while spreading the workload. Keep an eye on your forum's new users by looking at their posts and activity.

  • Consider blacklisting obviously spammy terms. Block obviously inappropriate comments with a blacklist of spammy terms (e.g. illegal streaming or pharma-related terms). Add inappropriate and off-topic terms that are only used by spammers, learning from the spam posts that you often see on your forum or on other forums. Built-in features or plugins can delete or mark comments as spam for you (a simple term-filter sketch follows this list).

  • Use the "nofollow" attribute for links in the comment field. This will deter spammers from targeting your site. By default, many blogging sites (such as Blogger) automatically add this attribute to any posted comments.

  • Use automated systems to defend your site. Comprehensive systems like Akismet, which has plugins for many blog and forum platforms, are easy to install and do most of the work for you.
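To make the CAPTCHA tip above concrete, here is a minimal sketch of server-side verification with reCAPTCHA. It assumes reCAPTCHA v2's siteverify endpoint, a placeholder secret key, and the third-party requests library; other CAPTCHA services work similarly but have their own APIs.

```python
# Minimal sketch: verify a reCAPTCHA token on the server before accepting a post.
# Assumes reCAPTCHA v2's "siteverify" endpoint and a placeholder secret key.
import requests

RECAPTCHA_SECRET = "YOUR_SECRET_KEY"  # placeholder
VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"

def captcha_passed(captcha_token, remote_ip=None):
    """Return True if the token submitted with the comment form verifies."""
    payload = {"secret": RECAPTCHA_SECRET, "response": captcha_token}
    if remote_ip:
        payload["remoteip"] = remote_ip
    result = requests.post(VERIFY_URL, data=payload, timeout=10).json()
    return result.get("success", False)
```

A comment or forum post would then only be saved when captcha_passed() returns True.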
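Similarly, the term-blacklist tip can be sketched as a simple filter that holds matching posts for moderation. The term list and the moderation decision are illustrative placeholders; most forum platforms offer equivalent built-in features or plugins.

```python
# Toy sketch: flag posts containing blacklisted terms so they can be held
# for moderation. The terms below are placeholders; build your own list from
# the spam you actually see on your forum.
import re

SPAMMY_TERMS = ["free streaming", "cheap pharma", "casino bonus"]  # examples only
SPAM_PATTERN = re.compile(
    "|".join(re.escape(term) for term in SPAMMY_TERMS), re.IGNORECASE
)

def needs_moderation(post_text):
    """Return True if the post contains any blacklisted term."""
    return SPAM_PATTERN.search(post_text) is not None
```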




For detailed information about these topics, check out our Help Center document on User Generated Spam and comment spam. You can also visit our Webmaster Central Help Forum if you need any help.



Recently, we've heard a number of definitions for "crawl budget"; however, we don't have a single term that describes everything "crawl budget" stands for externally. With this post we'll clarify what we actually have and what it means for Googlebot.

First, we'd like to emphasize that crawl budget, as described below, is not something most publishers have to worry about. If new pages tend to be crawled the same day they're published, crawl budget is not something webmasters need to focus on. Likewise, if a site has fewer than a few thousand URLs, most of the time it will be crawled efficiently.

Prioritizing what to crawl, when to crawl it, and how much of the hosting server's resources can be allocated to crawling is more important for bigger sites, or for sites that auto-generate pages based on URL parameters, for example.

Crawl rate limit
Googlebot is designed to be a good citizen of the web. Crawling is its main priority, while making sure it doesn't degrade the experience of users visiting the site. We call this the "crawl rate limit," which limits the maximum fetching rate for a given site.


Simply put, this represents the number of simultaneous parallel connections Googlebot may use to crawl the site, as well as the time it waits between fetches (a toy sketch of this kind of adaptive limit follows the list below). The crawl rate can go up and down based on a couple of factors:
  • Crawl health: if the site responds really quickly for a while, the limit goes up, meaning more connections can be used to crawl. If the site slows down or responds with server errors, the limit goes down and Googlebot crawls less.
  • Limit set in Search Console: website owners can reduce Googlebot's crawling of their site. Note that setting higher limits doesn't automatically increase crawling.
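For readers who run their own crawlers, the adaptive behavior described above can be illustrated with a toy sketch. This is not Googlebot's implementation; it simply mirrors the idea that a polite crawler speeds up while responses stay fast and healthy and backs off when the server slows down or returns errors.

```python
# Toy illustration of an adaptive crawl-rate limit for a polite crawler.
# NOT Googlebot's actual algorithm; it only mirrors the idea described above.
import time

class CrawlRateLimiter:
    def __init__(self, min_delay=0.5, max_delay=30.0):
        self.delay = 5.0            # seconds to wait between fetches
        self.min_delay = min_delay
        self.max_delay = max_delay

    def record_response(self, status_code, response_seconds):
        """Adjust the delay based on how the last fetch went."""
        if status_code >= 500 or response_seconds > 5.0:
            self.delay = min(self.delay * 2, self.max_delay)    # back off
        else:
            self.delay = max(self.delay * 0.9, self.min_delay)  # speed up slowly

    def wait(self):
        time.sleep(self.delay)
```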

Crawl demand
Even if the crawl rate limit isn't reached, if there's no demand from indexing, there will be low activity from Googlebot. The two factors that play a significant role in determining crawl demand are:
  • Popularity: URLs that are more popular on the Internet tend to be crawled more often to keep them fresher in our index.
  • Staleness: our systems attempt to prevent URLs from becoming stale in the index.
Additionally, site-wide events like site moves may trigger an increase in crawl demand in order to reindex the content under the new URLs.

Taking crawl rate and crawl demand together, we define crawl budget as the number of URLs Googlebot can and wants to crawl.

Factors affecting crawl budget
According to our analysis, having many low-value-add URLs can negatively affect a site's crawling and indexing. We found that the low-value-add URLs fall into these categories, in order of significance:
  • Faceted navigation and session identifiers
  • On-site duplicate content
  • Soft error pages
  • Hacked pages
  • Infinite spaces and proxies
  • Low quality and spam content
Wasting server resources on pages like these will drain crawl activity from pages that do actually have value, which may cause a significant delay in discovering great content on a site.

Top questions
Crawling is the entry point for sites into Google's search results. Efficient crawling of a website helps with its indexing in Google Search.

Q: Does site speed affect my crawl budget? How about errors?
A: Making a site faster improves the users' experience while also increasing crawl rate. For Googlebot a speedy site is a sign of healthy servers, so it can get more content over the same number of connections. On the flip side, a significant number of 5xx errors or connection timeouts signal the opposite, and crawling slows down.
We recommend paying attention to the Crawl Errors report in Search Console and keeping the number of server errors low.

Q: Is crawling a ranking factor?
A: An increased crawl rate will not necessarily lead to better positions in Search results. Google uses hundreds of signals to rank the results, and while crawling is necessary for being in the results, it's not a ranking signal.

Q: Do alternate URLs and embedded content count in the crawl budget?
A: Generally, any URL that Googlebot crawls will count towards a site's crawl budget. Alternate URLs, like AMP or hreflang, as well as embedded content, such as CSS and JavaScript, including AJAX (i.e. XHR) calls, may have to be crawled and will consume a site's crawl budget. Similarly, long redirect chains may have a negative effect on crawling.

Q: Can I control Googlebot with the "crawl-delay" directive?
A: The non-standard "crawl-delay" robots.txt directive is not processed by Googlebot.

Q: Does the nofollow directive affect crawl budget?
A: It depends. Any URL that is crawled affects crawl budget, so even if your page marks a URL as nofollow, it can still be crawled if another page on your site, or any page on the web, doesn't label the link as nofollow.

Q: Do URLs I disallowed through robots.txt affect my crawl budget in any way?
A: No, disallowed URLs do not affect the crawl budget.

For information on how to optimize crawling of your site, take a look at our 2009 blog post on optimizing crawling, which is still applicable. If you have questions, ask in the forums!

Posted by Gary, Crawling and Indexing teams