28 October 2008

Working to Make RIAs Visible to Search Engines

Getting noticed by search engines is a problem common to all rich Internet applications (RIAs).

Search engines were not built with RIAs in mind, and they are not able to index RIA content. This is not a big problem for internal applications, but if you want Web-surfing prospects to find your RIA-based Web site, you have to make it visible to searches.

Search engines work by dispatching a robot (a "bot") to visit a Web site's home page, which it discovers through a link on someone else's Web page. The bot proceeds to "walk" the pages of the site, harvesting text from each page and indexing that text to the page's URL. It recursively follows any links on each page until it has searched the whole site.
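To make the walk concrete, here is a minimal sketch of it in Python. It is a toy and not how any real search engine is implemented: the "site" is a hypothetical in-memory dictionary of URL to HTML, and the names PageParser, crawl, SITE, and RIA_SITE are invented for illustration.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects the text and the link targets found on one HTML page."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Remember every href on an anchor tag so the bot can follow it later.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Harvest any non-blank text between tags.
        if data.strip():
            self.text.append(data.strip())


def crawl(site, start):
    """Walk every page reachable from `start`; return {url: harvested text}."""
    index = {}
    pending = [start]
    while pending:
        url = pending.pop()
        if url in index or url not in site:
            continue  # already visited, or a link that leads off the site
        parser = PageParser()
        parser.feed(site[url])
        parser.close()
        index[url] = " ".join(parser.text)
        pending.extend(parser.links)
    return index


# Hypothetical three-page HTML site held in memory.
SITE = {
    "/": '<h1>Home</h1><a href="/products">Products</a>',
    "/products": '<p>Widgets</p><a href="/">Home</a><a href="/contact">Contact</a>',
    "/contact": "<p>Call us</p>",
}

# Hypothetical RIA: one bootstrap page holding an embedded object and nothing else.
RIA_SITE = {
    "/": '<object data="app.swf" type="application/x-shockwave-flash"></object>',
}
```

Calling crawl(SITE, "/") indexes all three pages, while crawl(RIA_SITE, "/") comes back with a single entry whose text is empty, which is the situation the rest of this Advisor is about.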

But as I've said many times, a RIA doesn't have pages. It is a single, very smart "page" that morphs to display ever-changing content. The search bot looking at this single page sees only some form of binary or programmatic content that doesn't give up its textual content easily. The bot sees neither text to harvest nor links to which it can walk. It can learn nothing about the content of the RIA's various views.

Obviously, RIA platform vendors are aware of this problem. They have been working with search engine companies to create revised file formats for their RIA contents that search engines can see into. Adobe has one for Flex and Flash, but there is no evidence yet that search engine companies have implemented it. Furthermore, this approach is probably limited to rather simplistic indexing because it sees only keywords for the RIA root site itself and not for its many "pages."

Let me suggest a guerrilla approach that takes advantage of the deep-linking property that some RIA development platforms support. The idea is to generate a parallel HTML version of the RIA's pseudopages, with each HTML page using a deep link to point into the corresponding view of the RIA. This is only practical for small sites or for sites that dynamically render structured data, because in these cases the HTML can be programmatically generated.
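As a rough sketch of that generation step, assume the RIA's views are backed by structured records and that the platform accepts deep links of the form "#/view/<id>". Both are assumptions made for illustration: the record fields, the example.com address, and the function names below are hypothetical, and the real deep-link format depends on the RIA platform in use.

```python
from html import escape
from urllib.parse import quote

RIA_URL = "https://example.com/app/"  # hypothetical address of the RIA


def deep_link(view_id):
    """Build a deep link into one RIA view (the '#/view/...' format is assumed)."""
    return f"{RIA_URL}#/view/{quote(view_id, safe='')}"


def render_page(record):
    """Render one record as a plain HTML page that a search bot can read."""
    title = escape(record["title"])
    body = escape(record["body"])
    link = escape(deep_link(record["id"]), quote=True)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{title}</title></head>\n"
        f"<body><h1>{title}</h1>\n"
        f"<p>{body}</p>\n"
        f'<p><a href="{link}">Open this view in the application</a></p>\n'
        "</body></html>\n"
    )


def render_site(records):
    """Return {filename: html} with one generated page per record."""
    return {f"{quote(r['id'], safe='')}.html": render_page(r) for r in records}
```

Each generated page carries the text that the bot can harvest, plus a link that sends a human visitor straight into the matching view of the RIA. Writing the returned dictionary out to files is left to whatever build step the site already has.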

How does a search engine find these generated HTML pages? The answer is that a RIA always has a single, very small HTML home page that bootstraps it. Furthermore, HTML has a simple facility that lets browsers that cannot display graphics (the RIA looks like graphics to the browser) render an optional text version of the page. It turns out that search bots always follow the path toward the text version, so they will take a right turn at this spot and dig into the textual content, which now points to our generated HTML pages.
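One way to picture such a bootstrap page is sketched below. The facility is not named above; this sketch assumes it is the fallback content that HTML allows inside an object element, which a browser renders only when it cannot display the object itself. The function name, the file names, and the page title are hypothetical, and whether a given search bot actually follows the fallback links is something to verify rather than take from this example.

```python
from html import escape


def bootstrap_page(app_url, pages):
    """Build the RIA home page; `pages` is a list of (title, href) pairs."""
    items = "\n".join(
        f'      <li><a href="{escape(href, quote=True)}">{escape(title)}</a></li>'
        for title, href in pages
    )
    return (
        "<!DOCTYPE html>\n"
        "<html><head><title>Example RIA</title></head>\n"
        "<body>\n"
        f'  <object data="{escape(app_url, quote=True)}" '
        'type="application/x-shockwave-flash" width="100%" height="100%">\n'
        "    <!-- Fallback content: rendered only if the object cannot be shown -->\n"
        "    <ul>\n"
        f"{items}\n"
        "    </ul>\n"
        "  </object>\n"
        "</body></html>\n"
    )
```

A call such as bootstrap_page("app.swf", [("Widgets", "widgets.html")]) yields a home page whose only plain text is a list of links to the generated HTML pages, nested inside the element that loads the RIA.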

For more information about these emerging strategies, do a Web search for search engine optimization (SEO) strategies and RIAs. SEO rules will also guide you through the textual idioms that search engines avoid.

I welcome your comments on this Advisor and encourage you to send your insights to jtibbetts@cutter.com.

-- John Tibbetts, Senior Consultant, Cutter Consortium
