The Ghosts of Downloads Past
Posted by Hannah Rosenbaum on March 10, 2006 12:15 PM
The safety of Web sites can be fluid. Let's say we find a safety issue and rate a site red. Then the site cleans up its act. We test again, find everything is OK, and rate the site green. These instances couldn’t please us more -- we encourage Web site owners to review our site analyses and to take action to make their sites safer. We've blogged before about how the Web site owner of lionking.org improved his site’s e-mail practices and caused his site’s rating to go from red to yellow. (As soon as our re-testing is complete, we presume the site will earn a green rating.) We’re eager to see other sites use our safety ratings to identify problems and improve their sites.
But it can take time for sites to really come clean, so ratings rarely change overnight. Sometimes safety improvements take time to be effective (as in the case of lionking.org), and sometimes surface improvements don’t fix underlying problems. For example, eliminating a bad download requires more than simply removing a site’s link to the download. The download may no longer be directly accessible by clicking around the site, but that doesn't mean that it can't be accessed by average Web surfers.
As an example, let's look at the Web site of Katholieke Universiteit Leuven, a university in Belgium (SiteAdvisor Analysis: kuleuven.ac.be). The site originally earned a red rating from SiteAdvisor due to a download that bundled an invasive browser plug-in. The Web site owner and loyal students immediately expressed their concern about our site warning. The Web site owner quickly informed us that, as a result of our analysis, he had removed links to the questionable download from the site. Very encouraging.
Now You See it, Now You Don't. And Now You Do Again.
At first glance, the problem download did indeed appear to be removed from Katholieke Universiteit Leuven's site. Manually clicking around the site, we couldn’t find it. But our bots did. Lurking beneath the surface, the download was still alive and well. So, our rating for K.U. Leuven remained red.
Now what’s really the harm of a hidden download? If the site doesn’t provide an obvious link to it, does it really matter if it is still hosted somewhere by that site?
Yes.
You may not be able to find the elusive download through a link on the Web site, but there are other paths that could still make the download very much accessible:
1) The download has a unique address, so anyone who may know (or who bookmarked) the address can find it by navigating directly to it from their browser.
2) A search engine crawler may have found it and therefore it could still show up in search results.
3) Any number of other Web sites could be referencing and linking to the download even if the host site isn’t.
So, as long as the site is still hosting the download, the site remains accountable for the download’s safety.
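For site owners who want to hunt for such leftovers themselves, the idea above can be sketched in a few lines of Python: compare what the server actually hosts with what the site's own pages link to. This is only an illustrative sketch, not a description of how our bots work. The function names, the example.edu host, and the file paths are all hypothetical, and the sketch assumes you already have a list of hosted paths (say, from a directory listing) and the HTML of your pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href target of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def linked_paths(pages, base_url):
    """Return the set of same-host URL paths linked from the given pages.

    `pages` maps a page path (e.g. "/index.html") to its HTML source.
    """
    host = urlparse(base_url).netloc
    found = set()
    for page_path, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        page_url = urljoin(base_url, page_path)
        for href in collector.hrefs:
            parsed = urlparse(urljoin(page_url, href))
            if parsed.netloc == host:  # ignore links to other sites
                found.add(parsed.path)
    return found


def orphaned_files(hosted_paths, pages, base_url):
    """Hosted paths that no page links to (the pages themselves excluded)."""
    return sorted(set(hosted_paths) - linked_paths(pages, base_url) - set(pages))


# Hypothetical site: one download is linked, one is a forgotten leftover.
pages = {
    "/index.html": '<a href="downloads/new.exe">Get it</a>',
    "/about.html": '<a href="http://other.example/x.exe">Elsewhere</a>',
}
hosted = ["/index.html", "/about.html",
          "/downloads/new.exe", "/downloads/old-plugin.exe"]

print(orphaned_files(hosted, pages, "http://example.edu/"))
```

Anything this reports is still reachable by direct address, search results, or links from other sites, so it deserves a second look before you call the site clean.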
After we clarified our methodology with K.U. Leuven’s Web site owner, he promptly removed every trace of the download in question. Our subsequent analysis confirmed the download’s disappearance and determined the site to be safe. It’s now rated green.
Ghost in the Machine
Like K.U. Leuven’s Web site owner, many site administrators may not realize that their sites are still haunted by the ghosts of downloads past. Jay, a concerned member of the support team at WeatherBug (SiteAdvisor Analysis: weatherbug.com), alerted us that our download analysis of WeatherBug’s site included WeatherBug version 5.02, a version that he said was two years out of date and no longer offered. This older version of WeatherBug’s software caused the site to receive a red SiteAdvisor rating. The newer version, which includes the MyWebSearch toolbar but is otherwise free from safety issues, would have caused the site to receive a yellow rating.
Apparently, while not explicitly offered on WeatherBug’s Web site, the older version of the download still existed somewhere on its domain. Once again, even though there were no direct links to the download from WeatherBug's site, the red culprit could still be accessed by others. After realizing this, WeatherBug removed the offending download, and the site’s rating is now yellow.
Time for Some Spring Cleaning?
The Web changes quickly, but it can also create nearly indelible footprints. Once you put something out there, it can be found and referenced by any number of other sites. Cleaning up your site requires more than simply sweeping links under the rug. So, roll up your sleeves and give your site a good scrub of any download ghosts, particularly those that may have been of somewhat ill repute. And if our safety ratings can help you identify where in the attic to look for those ghosts, that's fine with us. We’re suckers for happy endings.

Comments
Few things...
1. Does the siteadvisor crawler respect the robots.txt file? If so can't a webmaster "fake" a better rating by just adding an entry to the file to prevent the siteadvisor crawler from looking for the download(s)? Also, I can't find anything about the siteadvisor crawler details (at least it's not obvious where the information can be found...). How would I go about preventing the crawler from examining my site? Would that incur a "yellow"/"red" rating?
2. Why isn't blog.siteadvisor.com listed on siteadvisor.com? It's not enough that just siteadvisor.com is listed, lumping "sub-domains" with the main site can be mis-leading.
3. Who watches the watchmen?
Posted by: Mememe | March 11, 2006 08:29 AM
Great reading, keep up the great posts.
Peace, JiggaDigga
Posted by: JiggaDigga | April 7, 2006 12:47 AM
How often is a site checked?
I wonder: What if a site turns "red" a few days after being declared "green"? Will it then take days, weeks, months to correct the fake-colour..?
In any case, SiteAdvisor is a great idea!
Posted by: Mads Dam | April 12, 2006 04:14 PM