Understand form indexation – When can Google follow your forms?

by Peter Young on January 29, 2009

Over the last couple of months there have been a number of improvements at Google, including longer search snippets in the SERPs, the growth and evolution of blended search and continuing developments in Flash indexation. However, I would suggest that the one with the most considerable implications is the development of form-based indexation.

If some recent discussions are anything to go by, there is still considerable uncertainty about how, and whether, forms can be indexed. One thing should be made clear before we start: Google is becoming incredibly sophisticated, but its indexation of forms is not clear-cut, and a number of factors have to be considered. These are twofold:

  • Indexation considerations – Where Google can follow and index content behind forms, you will need to consider whether you actually want that content indexed and, if not, what remediation is required.
  • Privacy considerations – One of the best lines to come out of SES London 2008 was that SEOs are now responsible for protecting sensitive data and privacy. Because form indexation opens up new routes into content, you have to consider any sensitive data that may previously have been inaccessible purely because of indexation limitations.
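For the remediation and privacy points above, the most familiar tool is robots.txt: if the URL a GET form submits to is disallowed, a compliant crawler should not fetch the result pages the form would generate. The Python sketch below checks a hypothetical set of rules with the standard library's `urllib.robotparser`. The `/search` and `/account/` paths and the `is_crawlable` helper are assumptions for illustration, and Python's parser does simple prefix matching, so it only approximates how Googlebot reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site's search form submits via GET to /search,
# and /account/ holds pages that should never be crawled.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /account/
"""


def is_crawlable(url, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if the robots.txt rules allow `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text instead of fetching them
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/search?q=blue+widgets",  # a URL the form could generate
        "https://www.example.com/account/orders",
        "https://www.example.com/products/blue-widgets",
    ):
        print(url, "->", "allowed" if is_crawlable(url) else "blocked")
```

Bear in mind that robots.txt only asks well-behaved crawlers to stay away; it is not access control, so genuinely sensitive data should sit behind authentication rather than rely on a Disallow line.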

I mentioned earlier that Google cannot follow all forms, so at this point I will take some time to explain the criteria that allow Google to parse a form.

  • It should be noted that Google can only retrieve GET forms (not POST forms), and it avoids forms that require any kind of user information. This is likely to reduce the number of forms that can be retrieved significantly, as many forms submit via POST.
  • Fetches to sites are still limited, and Google omits any form that requires a password submission.
  • Googlebot always adheres to robots.txt, nofollow and noindex directives, so if a form is blocked in robots.txt, none of the URLs it would generate will be crawled.
  • When Google encounters a <FORM> element on a high-quality site, it may choose to run a small number of queries using the form. For text boxes, the crawler automatically chooses words from the site that has the form; for select menus, check boxes and radio buttons, it chooses from among the values in the HTML. Having chosen the values for each input, Google generates and then tries to crawl URLs that correspond to a possible query a user may have made.
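To make the criteria above more concrete, here is a rough Python sketch of what a crawler applying them might do with a single page. It is an illustration under stated assumptions, not Google's actual algorithm: the `FormParser` class, the `candidate_urls` function, the sample forms and the `site_words` list are all hypothetical, and HTML forms are heavily simplified (hidden fields and `<option>` text are ignored, and checkboxes are treated like radio buttons). It skips POST forms and any form with a password field, fills text boxes with words assumed to come from the site, takes the other values from the HTML, and caps the number of GET URLs it generates.

```python
from html.parser import HTMLParser
from itertools import islice, product
from urllib.parse import urlencode, urljoin


class FormParser(HTMLParser):
    """Collects the method, action and inputs of the first <form> on a page."""

    def __init__(self):
        super().__init__()
        self.method = None
        self.action = ""
        self.text_fields = []    # names of free-text inputs
        self.choices = {}        # name -> values offered by select/radio/checkbox inputs
        self.has_password = False
        self._in_form = False
        self._done = False
        self._select = None      # name of the <select> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.method = (attrs.get("method") or "get").lower()  # HTML default is GET
            self.action = attrs.get("action") or ""
        elif not self._in_form:
            return
        elif tag == "input":
            kind = (attrs.get("type") or "text").lower()
            name = attrs.get("name")
            if kind == "password":
                self.has_password = True
            elif name and kind in ("text", "search"):
                self.text_fields.append(name)
            elif name and kind in ("radio", "checkbox"):
                self.choices.setdefault(name, []).append(attrs.get("value") or "on")
        elif tag == "select":
            self._select = attrs.get("name")
        elif tag == "option" and self._select and attrs.get("value"):
            # Simplification: options without a value attribute are skipped
            self.choices.setdefault(self._select, []).append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None
        elif tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def candidate_urls(page_url, html, site_words, limit=10):
    """Turn a GET form into at most `limit` query URLs; POST or password forms yield []."""
    form = FormParser()
    form.feed(html)
    form.close()
    if form.method != "get" or form.has_password:
        return []
    # Text boxes get words assumed to come from the site; other inputs use their HTML values
    fields = [(name, site_words) for name in form.text_fields]
    fields += list(form.choices.items())
    if not fields:
        return []
    names = [name for name, _ in fields]
    base = urljoin(page_url, form.action)
    combos = islice(product(*(values for _, values in fields)), limit)
    return [base + "?" + urlencode(list(zip(names, combo))) for combo in combos]


# Hypothetical sample pages
PAGE = """
<form action="/search" method="get">
  <input type="text" name="q">
  <select name="colour">
    <option value="red">Red</option>
    <option value="blue">Blue</option>
  </select>
</form>
"""
LOGIN_PAGE = '<form action="/login" method="post"><input type="password" name="pw"></form>'

if __name__ == "__main__":
    for url in candidate_urls("https://www.example.com/products", PAGE, ["widgets", "gadgets"], limit=3):
        print(url)
    print(candidate_urls("https://www.example.com/", LOGIN_PAGE, ["widgets"]))
```

With the sample page, the first URL generated is `https://www.example.com/search?q=widgets&colour=red`, while the POST login form produces no URLs at all, mirroring the GET-only and no-password rules.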

So why would Google want to be able to follow forms? Well, according to Google themselves:

This experiment is part of Google’s broader effort to increase its coverage of the web. In fact, HTML forms have long been thought to be the gateway to large volumes of data beyond the normal scope of search engines. The terms Deep Web, Hidden Web, or Invisible Web have been used collectively to refer to such content that has so far been invisible to search engine users. By crawling using HTML forms (and abiding by robots.txt), we are able to lead search engine users to documents that would otherwise not be easily found in search engines, and provide webmasters and users alike with a better and more comprehensive search experience.

Search engine optimisation professionals therefore have to be more vigilant to ensure appropriate indexation. I would suggest that the ability to do something (like follow forms) should still not come at the expense of SEO best practice. The old ‘just because you can do it, doesn’t mean you should’ adage rings true here, and as such the integration of any ‘SEO friendly form’ functionality should not come to the detriment of internal linking.

Conversely, if you do have content behind a form that you want indexed, complement it with your traditional SEO friendly on-page tactics. Internal linking has its part to play, and relevant anchor text will continue to play its part in the SEO process for the foreseeable future.
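As a sketch of the internal linking point, suppose (hypothetically) a directory site has a GET form with a `region` drop-down that submits to `/listings`. Rather than relying on a crawler to submit the form, the result pages can be linked directly with descriptive anchor text. The `internal_links` helper, the path and the region values below are illustrative assumptions, not part of any particular site or tool.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical: a directory form that submits via GET to /listings with a "region" select.
FORM_ACTION = "/listings"


def internal_links(options):
    """Return plain <a href> links to each GET result page with descriptive anchor text.

    `options` maps each form value to the anchor text to use for it.
    """
    links = []
    for value, anchor_text in options.items():
        href = FORM_ACTION + "?" + urlencode({"region": value})
        # Escape both the URL and the text so they are safe inside HTML
        links.append('<a href="{}">{}</a>'.format(escape(href), escape(anchor_text)))
    return "\n".join(links)


if __name__ == "__main__":
    print(internal_links({
        "north-west": "Listings in the North West",
        "south-east": "Listings in the South East",
    }))
```

For the first region this produces `<a href="/listings?region=north-west">Listings in the North West</a>`, a link any crawler can follow whether or not it ever submits the form.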

Further reading

Official Google Webmaster Blog

Matt Cutts – Solved: Another common website problem
