The Safety of Internet Search Engines

May 12, 2006

Abstract

We compare the safety of leading search engines, using SiteAdvisor's automated Web site ratings. We find most leading search engines similar in the safety of the sites they link to, though MSN is the safest and Ask lags noticeably behind. Across search engines, we find sponsored results significantly less safe than search engines' organic results. We find heightened risks for certain keywords, including those frequently searched by kids and novice users. We began this project in January 2006, and our analysis uses search engine results and SiteAdvisor safety data from April 2006.

% of Red (1) and Yellow (2) Sites...

By Search Engine

By Type of Result

Key Findings
  • All the major search engines returned risky sites in their search results for popular keywords.
  • Overall, MSN search results had the lowest percentage (3.9%) of dangerous sites while Ask search results had the highest percentage (6.1%). Google was in between (5.3%).
  • Sponsored results contained two to four times as many dangerous sites as organic results.
  • There was little correlation between search result placement and safety. Page 1 results were only slightly safer than results on pages 2-5.
  • Dangerous sites soared to as much as 72% of results for certain risky keywords. Particularly dangerous keywords include "free screensavers", "bearshare", "kazaa", "download music", and "free games."
  • We estimate that US consumers make 285 million clicks to hostile sites every month as a result of search engine results.

Our core advice: It's a jungle out there. Users should be careful where they go and what they do when choosing sites based on search engine results. Despite search engines' efforts, we see too many sites trying to deceive unsuspecting users. These tricky sites span a range of content areas, keywords, and business models – so there is no simple advice as to how to stay safe. Users can't count on search engines to protect them; to the contrary, we find that search result rankings often do not reflect site safety. Users are at especially high risk when visiting search engine advertisers – even though search engines are well equipped to impose strict guidelines on sites buying prominent placement.

 

1 - "Red" rated sites failed SiteAdvisor's safety tests. Examples are sites that distribute adware, send a high volume of spam, or make unauthorized changes to a user's computer.

2 - "Yellow" rated sites engage in practices that warrant important advisory information based on SiteAdvisor's safety tests. Examples are sites which send a high volume of "non-spammy" email, display many popup ads, or prompt a user to change browser settings.

Overview

Where Internet users go, attackers follow. Users embrace e-mail; then spammers fill their inboxes with junk mail. Users take up online commerce; then phishers trick them into giving up their passwords. Users find handy downloadable applications; then adware vendors bundle them with pop-up-spewing add-ons.

The rise of Internet search brings a new type of risk. Hostile Web sites seek to harm users or take advantage of them – whether through spyware, spam, scams, or other bad practices – and because search engines often do not filter these sites from their results, a search can lead users straight to them. Consider this scenario:

Suzy wants to perform Beyonce's "Crazy in Love" for her school talent show. To make sure she dresses the part, she performs a Google search for "celebrity photos." When she clicks the first search result, celebritypictures.duble.com, she is quickly prompted to install an adware-bundled ActiveX control in order to browse the site's contents. Eager to view photos of her celebrity role model, she accepts the installation of a new browser toolbar and a pop-up-serving adware program.

In principle, search engines' listing rules, ranking rules, and advertising policies might shield users from some bad practices, and users' good judgment could protect them from others. But as we look at the Web, we see many instances when search engines lead users to dangerous content. Our analysis of search engine safety finds bad practices among 5% of search results for popular keywords, or roughly one site per page of search results.

The rise of paid search results brings additional complications: Profit motivations have shifted search engines' ranking methodologies. Prominent results often reflect solely a site's willingness to pay rather than its quality, relevance, or safety.

Methodology

To compare the safety of search engines' listings, we compiled 1394 popular keywords using lists of common searches from Google Zeitgeist, Yahoo!, AOL, Lycos, Wordtracker, and other industry sources. Some lists included adult search terms, which we excluded to maintain consistent keyword content. We considered the first five pages of results for each keyword from each of the five biggest search engines: Google, Yahoo!, AOL, MSN, and Ask.

We analyze search engine results, noting which sites are listed where (by search engine, keyword, page, position) and how they are labeled (organic versus sponsored). We assess the safety of listed sites by consulting SiteAdvisor's Web safety database (3). SiteAdvisor safety ratings are based on automated tests that analyze Web sites for exploits; downloads containing spyware, adware, or other unwanted programs; pop-ups; links to dangerous sites; and e-mail submission forms. SiteAdvisor's automated tests are supplemented by feedback from volunteer reviewers, comments from Web site owners, and input from SiteAdvisor analysts. Using SiteAdvisor's safety ratings, we assess search engines' results along a number of axes.

Diagram of methodology
We took popular keyword lists, ran them through five popular search engines, noted where the results occurred on the page, and then cross-referenced those results with SiteAdvisor safety ratings.

If SiteAdvisor rates a site as "yellow" or "red," there's a good chance that typical users will be concerned about the safety of the rated site. A red rating warns users that a site poses a security threat, including the misuse of e-mail addresses, scams, exploits, and downloads containing spyware, adware, or other unwanted programs. A yellow rating is given to sites that pass most of SiteAdvisor's safety tests but still employ practices that warrant caution. SiteAdvisor's FAQ has details on SiteAdvisor's methods – including more information on the specific problems SiteAdvisor detects, and more on how SiteAdvisor's robots work.

We weight all links equally – reflecting that users tend to treat sponsored and organic links identically.
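The cross-referencing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SiteAdvisor's actual tooling: the `Result` fields, the `risky_share` helper, the sample sites, and the "green" label for sites that pass are all hypothetical, and sites missing from the ratings table are simply counted as not risky.

```python
from collections import defaultdict
from typing import NamedTuple


class Result(NamedTuple):
    """One link seen on a results page (field names are hypothetical)."""
    engine: str
    keyword: str
    page: int
    position: int
    kind: str  # "organic" or "sponsored"
    site: str


# Made-up sample data for illustration only; these are not the study's data.
RESULTS = [
    Result("Google", "example keyword", 1, 1, "organic", "a.example"),
    Result("Google", "example keyword", 1, 2, "organic", "b.example"),
    Result("Google", "example keyword", 1, 1, "sponsored", "c.example"),
    Result("Google", "example keyword", 1, 2, "sponsored", "d.example"),
    Result("MSN", "example keyword", 1, 1, "organic", "a.example"),
    Result("MSN", "example keyword", 1, 2, "organic", "e.example"),
    Result("MSN", "example keyword", 1, 1, "sponsored", "c.example"),
    Result("MSN", "example keyword", 1, 2, "sponsored", "a.example"),
]

# Hypothetical ratings table; "green" is an assumed label for sites that pass.
RATINGS = {
    "a.example": "green",
    "b.example": "red",
    "c.example": "yellow",
    "d.example": "red",
    "e.example": "green",
}


def risky_share(results, ratings, key):
    """Return {group: percent of links rated red or yellow}.

    Every link counts once, so sponsored and organic links are weighted
    equally. Unrated sites are treated as not risky (an assumption).
    """
    totals = defaultdict(int)
    risky = defaultdict(int)
    for r in results:
        group = key(r)
        totals[group] += 1
        if ratings.get(r.site) in ("red", "yellow"):
            risky[group] += 1
    return {g: 100.0 * risky[g] / totals[g] for g in totals}


by_engine = risky_share(RESULTS, RATINGS, key=lambda r: r.engine)
by_kind = risky_share(RESULTS, RATINGS, key=lambda r: r.kind)
print(by_engine)
print(by_kind)
```

Passing a different `key` function (for example `lambda r: r.page` or `lambda r: r.keyword`) gives the same percentage along another axis, which mirrors how the findings here are broken out by search engine, result type, keyword, and page.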

The remainder of this article presents our most notable findings.

3 - SiteAdvisor Inc. was founded in April 2005. The company was acquired by McAfee, Inc. in April 2006.

Comparing Search Engines

Though many users rely on the top few search engines, there are many other search engines in common use. Existing research tends to focus on users choosing a search engine to obtain the most relevant or useful results. Relevance is a natural way to choose a search engine, but users might also consider choosing a search engine based on safety. After all, even the most relevant results may not be desirable if they bring substantial risks of harm. We therefore begin our analysis by comparing the safety of the leading search engines.

Our analysis reveals some significant differences among the major search engines. Overall, our tests show MSN's results to be the safest of the tested search engines. This may reflect, at least in part, an explicit, publicly documented MSN effort to remove unsafe sites. Least safe are results at Ask – where unsafe sites are about 56% more frequent than at MSN (6.1% versus 3.9%).
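The comparison between Ask and MSN is a relative difference between two rates. As a quick check of that arithmetic, here is a minimal sketch using only the two percentages quoted above; the helper name `relative_increase` is ours.

```python
def relative_increase(rate, baseline):
    """Percent by which `rate` exceeds `baseline`."""
    return 100.0 * (rate - baseline) / baseline


ask_rate = 6.1  # % red/yellow results at Ask, from the text
msn_rate = 3.9  # % red/yellow results at MSN, from the text

print(f"{relative_increase(ask_rate, msn_rate):.1f}% more frequent")
```

The absolute gap is 2.2 percentage points; the relative figure is larger because it is measured against MSN's lower baseline.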

Percentage of red and yellow results by search engine

Search engine safety performance varies across certain subsets of our keyword list. For example, Yahoo! returns a lower percentage of dangerous results when searching for words in the Yahoo! 2005 Top Searches list than when searching for words in Google's Zeitgeist listings. In contrast, Google, AOL, and Ask perform better when searching for Google Zeitgeist keywords as opposed to those in the Yahoo! 2005 Top Searches list. MSN performed consistently for both of these keyword lists. The Yahoo! 2005 Top Searches list contains a higher percentage of celebrity and entertainment terms than the Google Zeitgeist list, implying that Yahoo! is a safer choice for these categories.

Percentage of red and yellow results in Google Zeitgeist searches vs. Yahoo! 2005 Top Searches

More specific keyword subsets reveal greater variance in safety performance. We use Google Zeitgeist to group keywords into categories – lists of five to ten keywords in a variety of categories. MSN ranks safest for 23 out of 63 keyword categories (including "tabloid fodder" and "video games"), while Ask ranks safest for only 5 categories (including "popular sports" and "hot cars"). Yahoo! proves the safest for "games" keywords (such as "Halo 2" and "RuneScape"), while AOL ranks safest for "digital music" keywords (such as "bittorrent" and "iTunes"). Google returns the safest results for "look it up" keywords (such as "lyrics" and "weather"), but returns the most dangerous results for "tech toys" keywords (such as "iPod nano" and "Nintendo Revolution"). See Risky Keywords and Categories (below).

On the whole, we see little basis to conclude that any search engine is much safer than any other; safety rankings vary too much from search to search. But, overall, MSN outperforms the others. We recommend extra caution when searching at Ask.

Results in Perspective

At first glance, a 4%-6% incidence of red and yellow sites in search results may not appear alarming. But even a single visit to a dangerous site can have serious and lasting implications for the average Internet user:

  • Sites using browser exploits can insert unwanted code on a user's PC, which may cause serious security breaches and render a user's PC essentially inoperable. For example, we found the exploit site celebritypro(dot)com when searching for "Halle Berry" at Google. This site uses security exploits to install software onto a user's PC without consent.
  • Sites which include downloads with adware or spyware can clutter a user's PC with unwanted programs that serve intrusive advertising pop-ups, track users' browsing habits, and cause operating difficulties. A single download at ratloaf.com (found in top search results for "screensavers" at Yahoo!) can come bundled with three different adware/spyware programs.
  • Sites which misuse personal information can cause endless spam and threaten the safety of financial and other personal information. A single sign-up at rewardsgateway.com (found in search results for "iPods" at Google) can lead to 303 e-mails per week.

It is estimated that US Internet users conduct 5.7 billion searches per month. Suppose each search yields one and only one click to one of the sites listed in the results. Then even a 5% incidence of red/yellow sites would mean 285 million clicks to these sites every month from search engines.
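The estimate above is a simple product, and it is sensitive to the assumed incidence rate. The sketch below reproduces the 285 million figure from the numbers in the text and, purely as an illustration, also evaluates the 4% and 6% ends of the range mentioned earlier; the function and constant names are ours, and the one-click-per-search assumption is the same simplification the text makes.

```python
SEARCHES_PER_MONTH = 5_700_000_000  # US searches per month, from the text
CLICKS_PER_SEARCH = 1               # simplifying assumption stated in the text


def risky_clicks(searches, clicks_per_search, risky_rate):
    """Estimated monthly clicks that land on red or yellow sites."""
    return searches * clicks_per_search * risky_rate


estimate = risky_clicks(SEARCHES_PER_MONTH, CLICKS_PER_SEARCH, 0.05)
print(f"5% incidence: {estimate / 1e6:.0f} million clicks per month")

# Illustrative sensitivity check over the 4%-6% range.
for rate in (0.04, 0.06):
    clicks = risky_clicks(SEARCHES_PER_MONTH, CLICKS_PER_SEARCH, rate)
    print(f"{rate:.0%} incidence: {clicks / 1e6:.0f} million clicks per month")
```

Because the model is linear, each percentage point of incidence corresponds to 57 million clicks per month under these assumptions.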

With spam, adware, and spyware costing consumers and corporations increasing amounts of time and money, we believe that the incidence of red and yellow sites in search engine results is extremely significant and is a contributing factor to the problems of spam, adware, spyware, and other online threats.

Organic versus Paid Results
Google organic results (left) and sponsored listings (top, right) for the keyword phrase "free iPods." Sponsored results have a higher percentage of dangerous sites than organic results.

Today's search engines combine two dramatically different kinds of results. Search engines' "main" results are organic listings – search engines' best assessment of what Web pages are most relevant to users' search requests. But search engines also show sponsored listings, where inclusion reflects a site's willingness to pay to be listed.

These different kinds of listings yield different risks to users. Organic listings are generally added, selected, and ranked without substantial human involvement; search engines' automated systems pick and present sites. Without any human evaluating site safety, users might reasonably worry that organic results could take them to unsafe sites.

In contrast, search engines' sponsored links seem to offer an aura of safety: Search engines post detailed editorial policies as to who may advertise and how. (See Google's Editorial Guidelines and Yahoo!'s Sponsored Search Listing Guidelines.)

Despite these special rules for search engine advertising, our testing indicates that organic listings are, overall, substantially safer than sponsored listings. Take the example of "free iPods," where first page results yield many more red sites in sponsored results compared to organic results.

Across all search terms we analyze, a Google ad is on average more than twice as likely to take a user to an unsafe site as is a Google organic link. At Ask, the difference is especially pronounced: Their sponsored results are almost four times as risky as their organic listings.

Red and yellow sites appear in sponsored results at two to four times the rate of organic results.

We are troubled by the untrustworthiness of search engines' ads. At first glance, search engines' voluminous rules would seem a virtual guarantee of good outcomes. In some areas, search engines seem to have made strong headway – such as for online pharmacies, where a SquareTrade evaluation process assures that only legitimate companies can buy ads. But search engines' policies don't really speak to the problems at hand. For example, search engines sell ads to sites that send users literally hundreds of e-mails per week. (Included in our search results are consumerincentivezone.com, freegiftworld.com, and lookdog.com.) Search engines also sell ads to sites that infect users' computers with adware programs that open numerous annoying pop-up ads. (Included in our search results are scenicreflections.com, screenscenes.com, and totallyfunfreegames.com.) Search engines' editorial rules largely ignore these practices, and even where they do discuss these issues, enforcement seems to be lax.

In contrast, search engines' organic listings reflect the Web's assessment of the quality and usefulness of a site, as measured by who links to whom. Spammers, spyware-pushers, and other pariahs may be able to buy search engine ads, but they tend to fare worse in organic listings.

We're not the first to note untrustworthy ads. For example, noted security expert Richard Smith has complained about this problem, after a bogus weather program infected his wife's computer via a misleading Google ad.

Why don't search engines get tough on untrustworthy ads? One explanation is that it's a difficult task: Search engines lack automated link-based analysis of advertisers' trustworthiness – the only thing keeping organic results (relatively) safe. If search engines won't or can't use link analysis to vet their advertisers, search engines might have to invest staff time in manually determining advertisers' reputations, and search engines may hesitate to incur the associated costs. Separately, some analysis indicates that search engines make big money selling ads to untrustworthy sites – many millions of dollars each year.

Risky Keywords and Categories

When searching the Web, users face risks that vary dramatically according to what categories they search for. A large proportion of malicious sites are concentrated in certain high risk categories; searching within these danger zones exposes users to a high probability of ending up in the dark alleys of the Web.

The technology trade press confirms our sense that certain parts of the Web tend to be unsafe. For example, a recent TechTarget article encourages users to avoid spyware by "Stay[ing] away from any questionable sites, including pornography, gambling, hacking or other off-beat sites." Similarly, Security Pipeline tells users to "stay on your guard" when visiting potentially unsafe domains, such as song-lyric, game, and hobby sites.

The incidence of red and yellow sites, within Google results, for five top dangerous Google Zeitgeist categories.

Our analysis confirms the basic advice of TechTarget and Security Pipeline. For example, users searching for digital music at Google face 75 times as many risky sites as users searching for news. (We reach this conclusion by comparing the frequency of unsafe sites within "news outlet" searches [as reported by Google Zeitgeist data] with the frequency of unsafe sites within Zeitgeist's "digital music" keyword list.)

Results within categories also differ noticeably between search engines, and some search engines are noticeably safer than others for specific categories. For example, only 0.2% of Yahoo! results for Google Zeitgeist "games" keywords are rated red or yellow, compared with 8.9% of AOL results. Unsafe search results for "movie-related" keywords range from 2.5% for MSN to 8.6% for Ask.

Overall, the most dangerous search term is "free screensavers," which returns results that are 57% red or yellow on average. Search engines differ greatly in their results for this keyword: Yahoo! returns 72.2% red or yellow results, compared with 37.7% at AOL. So users are almost twice as likely to stumble onto a risky screensaver site using Yahoo! versus AOL.

Percentage of red and yellow results for "free screensavers," the most dangerous search phrase tested.

In general, many of the riskier keywords tend to be associated with downloads and file sharing. Google's top five riskiest keyword searches are "free screensavers" (64.0%), "Bearshare" (57.0%), "screensavers" (54.6%), "Winmx" (50.5%), and "limewire" (46.4%).

Even keywords that are generally not regarded as risky yield relatively high rates of red and yellow sites. A Google search for "care bears" leads to 14.6% red or yellow sites. "Birthday cards" leads to 15.6%, "south beach diet" leads to 14.8%, and "weather" leads to 14.0%.

Our testing confirms the core facts behind standard advice to avoid "risky" categories. But we still think that advice is far from practical. (See our earlier Not So Practical Web Safety Advice.) Asking users to give up broad swaths of the Web imposes great limitations and substantial responsibility, while offering little insight as to how to stay safe while nonetheless enjoying the Web. In addition, danger lurks beyond the generally accepted "risky" categories, so users can never really let down their guard.

Analysis by Result Page

In an attempt to stick with safe search engine results, some users limit themselves to top results. (See e.g. an iProspect study finding that 62% of searchers click a result within the first page of listings.) Lower-ranked sites might not be as good as top sites, users seem to think, so visiting only top sites perhaps offers an appearance of safety. Unfortunately, our analysis indicates that this strategy is largely unjustified.

In our testing, we find little safety benefit from sticking to top search results. The first organically ranked results are marginally safer than the tenth results, and page 1 results are slightly safer than pages 2 through 5. But the benefit is slight: Google's page 1 results are just 0.05% safer than pages 2-5.

Page 1 results versus results on pages 2 through 5. There was little difference in site safety by page.

Parting Thoughts

We're alarmed by the scope of these problems – by the many ways search engines lead users to sites that turn out to be untrustworthy or worse. But we look at the online pharmacy example with substantial optimism: There, search engines saw a problem, designed a solution, and implemented it in a way that offers users real protection. Could similar solutions emerge to protect users from spyware, spam, and other Internet plights? We're cautiously optimistic.

We see plenty of sites we think search engines could properly kick out of their listings and out of their ad networks. Our Utopian Internet is spyware-free and spam-protected, and it has no place for sites that try to charge users for software that's actually free. Exploit sites are even more noxious – so we'd like to see search engines redouble their efforts to keep exploits out of their results. Perhaps this is an overly ambitious fantasy, but we see room for substantial improvement.

Meanwhile, there's a real problem out there – tens of thousands of sites that, in SiteAdvisor's analysis, pose serious risks of harming users. Navigating the Web via a search engine won't prevent users from stumbling onto one of these sites, and search results provide users with little indication of site safety. Users can exert some control by choosing one search engine over another or by choosing organic results instead of sponsored results, but users still need more information. Otherwise, it's only a matter of time before users end up on dangerous sites, where just one bad click can produce harmful consequences.

We don't control what sites do, and we don't control what search engines do. But we do have a way to help: SiteAdvisor arms users with knowledge about sites' practices, so users know where they're heading before it's too late.

 

Overall Analysis by Search Engine

Analysis by Keyword Type (Zeitgeist versus Yahoo! Top Searches, etc.)

Analysis by Result Type (sponsored versus organic)

Analysis by Zeitgeist Keyword Group (safest and riskiest keyword categories)

Analysis by Result Page (page 1 versus others)

Analysis by Individual Keyword:

For each search engine (Google, Yahoo!, MSN, AOL, and Ask), three views are available, each covering all results, organic results, and sponsored results:
  • Tables of site safety by keyword, with site details
  • Tables of site safety by keyword, without site details
  • Charts of site safety by keyword

Resources

Search Engine Statistics

Search Engine Ratings and Reviews - SearchEngineWatch - indexes analyses of search engine market shares

Analysis of search engine query volume - Nielsen/NetRatings

Search Tools - Trends & Statistics - ClickZ

Search Engine Mega-List

Keynote analysis of search engine experience

Search Engine User Behavior

A Matter of Trust: What Users Want from Web Sites - Consumer Reports

Different Engines, Different Results: Web Searches Not Always Finding What They're Looking for Online - Dogpile

Search Engine User Behavior Study - iProspect

How Are We Searching the World Wide Web? A Comparison of Nine Search Engine Transaction Logs - Jansen and Spink

Accurately Interpreting Clickthrough Data as Implicit Feedback - Joachims, Granka, Pan, Hembrooke, and Gay

Spyware: The Threat of Unwanted Software Programs is Changing the Way People Use the Internet - Pew Internet & American Life Project

Internet Security Resources

State of the Net - Consumer Reports - analysis of the incidence and costs of security threats

Paid Search Results Often Not Worth the Click - IDG News

Sunbelt Blog

Spyware Warrior

Search Engine Advertisement Guidelines

Google - AdWords Editorial Guidelines

Yahoo! - Sponsored Search Listing Guidelines

MSN - AdCenter Search Ads Content Guidelines

AOL - Ad Specs - Policies and Guidelines

Ask - Sponsored Listings - Guidelines

The SiteAdvisor Web Safety Tool

siteadvisor.com