
December 22, 2005

What's the Buzz? Tell me what's a-happening.

Posted by Shane Keats at 01:24 PM

It’s tough getting much work done this week as I and everyone else here in Boston scramble to finish last-minute holiday shopping and head off to spend time with our loved ones. Security companies, however, must always be vigilant. Today, I’m starting what I hope will become a SiteAdvisor tradition: a glimpse, each holiday, at the Web's holiday land mines.

For this mini-study, I spent some time on Yahoo Buzz, a terrific snapshot of the minds and surfing habits of our fellow man. What follows is a little exposé on some of the Grinches Who Are Trying to Steal Christmas.

Traffic and Weather on the 1’s

If you want to figure out what clothes to pack for Christmas week in Chicago, here are some links you should consider avoiding when you search for ‘weather conditions,’ a top holiday search term according to Yahoo.

Starware.com advertises heavily for 'weather conditions' in order to get users to download its toolbar. That software gets a yellow flag from SiteAdvisor not for bundling adware, but for making more than 60 changes to a hard drive and swapping default search pages. Likewise, WeatherStudio.com and WeatherBug.com offer weather toolbars that take extensive liberties with users’ registries and hard drives. Just give us the weather, OK?

All that Glitters Is Not Gold

Jewelry is a top keyword this time of year, as you might imagine. Pity a legit retailer like Zales. In the brave new world of AdWords, Googling ‘Zales’ yields zales.onlinerewardscenter.com as a top sponsored link. They (and their advertisers) send 26 emails per week to folks who sign up for their “services.”

zales_safegoogle.gif

Another retailer, Dhanish.com, offers jewelry and more. In fact, it offers its own toolbar. Now, there are a lot of sketchy e-com sites out there, and this hardly qualifies as the worst. But why a jewelry site needs to offer a toolbar is beyond me. Especially one that makes nearly 200 changes to my registry. A gift that keeps on giving perhaps?

Two Sketchy Downloads, And a Partridge In a Pear Tree

Here’s a terrific demonstration of how our link analysis can be helpful to the general Web user. Search for ‘Christmas lyrics’ and one of the hits you’re likely to get is metrolyrics.com. They don’t do anything bad in and of themselves. No spam. No spyware. But take a look at their linking patterns: red all over.

metrolyrics_linking.gif

Browse Metro for any length of time and without even knowing it, you could end up on any of these red sites. Unwittingly wander onto mp3lyrics.org and instead of learning what the 11th day of Christmas will bring, you could end up with an inbox like this:

mp3lyricsorg_inbox.gif

(By the way, that's how these publishing affiliates often work -- by directing their own users (aka traffic) to other sites that will pay them for the people sent their way. Often, obscuring the real names of the routes (the URLs) the user takes means he or she will make download or sign-up decisions without complete knowledge. SiteAdvisor's link analysis sheds light where the bad guys don't want it shed.)


Rated VX

Looking for a review or theater times for the 'Chronicles of Narnia'? Best Offers Networks, one of the top sponsored links for that search, suggests you can get a free download of the film from their site. Aside from some piracy issues, the only thing a BestOffers download will get you is VX2, which, according to Pest Patrol, “monitors web pages requested and data entered into forms, sends this information to its home server, and opens pop-up advertisement windows. It also has the capability to update itself and install other software.” (VX2, by the way, is brought to you by Direct Revenue. Is this their way of offering the web a Merry Christmas?)

Something tells me that C.S. Lewis didn’t mean the Narnia books to be an allegory for how broken our online advertising system has become.


Have Yourself a SiteAdvisor Xmas

The Web offers enormous value and opportunity. It makes shopping easier. It makes sharing information easier. It makes connecting with friends and family easier. It also makes the bad guys’ jobs easier.

xmas_safegoogle.gif

One of the Yahoo Buzz top search terms this week is Xmas. As you can see, Xmas is hit or miss, safety wise. We here at SiteAdvisor wish you a merry and safe Christmas.

December 30, 2005 Update

Post-Christmas update. I checked back in with Yahoo Buzz to see the climbers and decliners. iPods must have been a hot item this year. Search term "Free itunes" was up 457% as of 12/27. Do me a favor though, and stay away from the top red link below. It redirects you to everyfreegift.com. Get your free iTunes from Apple, not here. Happy New Year.

Yahoo_FreeIpod_XMas.gif

December 20, 2005

What We've Learned So Far

Posted by Kelly Ford at 05:35 PM

You know when you talk to a friend and they sometimes sort of just keep nodding and get that glazed look in their eyes? You can tell that nothing you’re saying is really sinking in. (Or does that only happen to us when we start yammering on about the latest adware distribution site we discovered during last night’s automated crawl?)

While our obsession with Web safety may sometimes induce Glazed Eye Syndrome at dinner parties, we want to assure you that when you have something to say about our site, our software, our data, or our approach to online safety, you have our full attention.

Since we launched our Preview Version, we’ve received lots of valuable feedback from our testers. Some feedback has been submitted through the various feedback areas of our site. Other feedback was directly e-mailed to us from friends. Some came in forum discussions about our product. We pay close attention to all of it.

Today we wanted to share some of your early feedback, clarify a few points of confusion, and tell you about some product improvements we’re already working on as a direct result of your suggestions:

* One forum poster noted that we missed a site’s IE exploit attempt, and was absolutely right. Our bad for not making it clear that we are still testing browser exploit detection in our labs. None of the data on our site includes exploit detection yet. We hope to include exploit data in our January release.

* A question came up about how we differentiate between primary domains (e.g., www.site.com) and subdomains (e.g., www.shop.site.com), which are sometimes different enough that they could be considered completely separate sites. We realize that a home page doesn't completely encapsulate a domain. We're currently building a list of sites for which that is the case. You can also submit a subdomain for us to prioritize for testing.

* One user wrote in to say that a member of his family who is ‘red/green’ color-blind (the most common form of color blindness) had trouble distinguishing our red and green safety ratings. As a result of that feedback we’ll be adding an option in a subsequent release to give users a choice of an alternate color or pattern for our safety ratings.

* The question came up about how we would incorporate user ratings, and whether users might be able to “game” our test results by claiming bad sites were good or vice versa. We’re working on an approach to this issue and will probably introduce a rating system so that users who have provided helpful, verified feedback in the past will have their comments count more heavily than other users. We’re also going to be introducing a ‘SiteAdvisor expert’ (working name) user classification to recognize users who show particular interest and expertise in helping us improve our site ratings.

* We have developed an API for developers to access our data, but have not yet released it.

* Some forum posters noted that we had a site report for 127.0.0.1. As much as we like recursion, we agree that this isn't too helpful. Someone must have submitted this to our crawling queue, and our obedient crawler promptly crawled itself. We've added a check for non-external IPs and URLs to prevent this in the future.

* And finally, we know that there are many sites and downloads which our crawlers haven't yet tested. We just scaled up from roughly 10 to 80 servers (to the delight of our Dell rep) which should substantially increase our site coverage very soon. In the meantime, you can also submit a Web site or submit a download to us for prioritized testing.
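
For the curious, the 127.0.0.1 fix mentioned above could look something like the following. This is a minimal sketch, not our production crawler code: the function name is_external_target is made up for illustration, and a real check would presumably also resolve hostnames and inspect the addresses they point to.

```python
import ipaddress
from urllib.parse import urlsplit


def is_external_target(url: str) -> bool:
    """Return True only if the URL's host looks publicly reachable.

    Hypothetical helper for illustration; hostnames are not resolved here.
    """
    # Accept bare hosts such as "127.0.0.1" by adding a scheme before parsing.
    host = urlsplit(url if "://" in url else "http://" + url).hostname
    if not host:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so treat it as an ordinary hostname.
        return True
    # is_global is False for loopback, private, link-local and similar ranges.
    return address.is_global


if __name__ == "__main__":
    for candidate in ["http://127.0.0.1/", "192.168.1.10", "http://www.site.com/"]:
        print(candidate, is_external_target(candidate))
```

Anything that fails a check like this would simply never enter the crawling queue.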

Thanks for all your feedback to date... and please keep it coming!

December 16, 2005

Red By Association

Posted by Shane Keats at 06:37 PM

When we started SiteAdvisor last spring, we thought that our job would be relatively straightforward: sign up for stuff and download stuff and tell you the results. We knew it would be hard to do in practice, but at least it was a relatively predictable problem to tackle. Along the way towards implementation, we realized that the nature of the Web’s sketchy and suspicious practices was more complex, less transparent and more dangerous than any of us first thought.

Michael Kearns is a computer science professor at the University of Pennsylvania. Before he joined UPenn, he spent a decade doing artificial intelligence and machine learning research at AT&T Labs and Bell Labs. He’s one of a handful of true pioneers in these fields.

Now, one of the byproducts of the millions of tests our Web bots conduct is an enormous data set we’ve built, not just of adware bundles or spam factories, but of relationships between Web sites. Michael and his grad student Jenn Wortman helped us approach this data in a novel way. Take a look at Screensaver.com for a second.

screensaver_home_small.gif

We initially rated screensaver.com ‘Green’ – safe to use for browsing, signing-up and downloading. Yet after downloading screen savers from here, our PC started popping up contextual ads.

Here’s what’s really happening:

screensaver dot com freeze highlight_small.jpg


From my user perspective, I’m on a site called screensaver.com, downloading a piece of software from them. From a technical perspective, however, my PC is actually calling a host computer run by freeze.com. Not only don’t I notice this, but even if I do, it won’t help. As an average user, I don’t know anything about freeze.com.

But our database does. What Michael and Jenn helped us realize is that we could use the data from our Web crawl to help users understand where they really are on the Web. This guidance will in turn help users make better, more informed decisions about whom and what they can trust online.

Defining Links
Enter Matt Gattis, a young developer who joined us from MIT. “What defines a ‘bad’ link?" Matt asked. He developed an algorithm for measuring the degree of association between two sites by looking at their linking relationships. And because machines running Matt’s code can’t be fooled by link obfuscation and other social engineering tricks, SiteAdvisor is able to see patterns and relationships that were effectively invisible to the human eye. What we’ve done with link analysis is make the Web more transparent. In fact, we think we’ve created something kind of cool.

The Weakest Links
Here’s how SiteAdvisor’s link analysis works in practice. Take a look at our link diagram for Screensaver.com:

ScreenSaver.com-links.gif

Among many other things, our link analysis shows some basic relationships between sites. For example, the short arrow to freeze.com documents that the biggest ‘target’ for screensaver’s out-bound links is freeze. (In fact, freeze bought screensaver.com in 2003 from risoftsystems, another red flagged friend. According to a freeze.com press release the sale included a five year “sponsorship contract highlighting RISS products.”)
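
To make the idea of a biggest out-bound ‘target’ concrete, here is a rough sketch of the simplest version of that calculation. It is an illustration only and not Matt's actual algorithm: the function names, the sample links, and the "share of links pointing at red sites" measure are all assumptions made for the example.

```python
from collections import Counter
from urllib.parse import urlsplit


def outbound_targets(source_domain: str, links: list[str]) -> Counter:
    """Count how often each external domain is linked to from source_domain."""
    counts = Counter()
    for url in links:
        host = (urlsplit(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        # Skip unparseable links and links back to the site itself.
        if host and host != source_domain:
            counts[host] += 1
    return counts


def red_share(counts: Counter, red_domains: set[str]) -> float:
    """Fraction of out-bound links that point at domains already rated red."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for domain, n in counts.items() if domain in red_domains) / total


if __name__ == "__main__":
    # Made-up links, for illustration only.
    links = [
        "http://www.freeze.com/a",
        "http://freeze.com/b",
        "http://example.org/",
        "http://www.screensaver.com/about",
    ]
    counts = outbound_targets("screensaver.com", links)
    print(counts.most_common(1))
    print(red_share(counts, {"freeze.com"}))
```

A site that looks clean on its own can still end up with a high red share, which is the "red by association" idea in a nutshell.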

Improving the Odds
In an ideal world, users get full disclosure. Web sites not only tell the user what’s being installed, they disclose where the install is coming from in a way that’s meaningful to the non-technical user. I for one am not holding my breath. As a practical matter, without our link data, users are effectively browsing while blind. Clicking through to an unknown site is like betting it all on black. Heaven forbid if the marble lands on red. I’m not here to argue against aimless browsing; I love the serendipitous Web discovery. The problem with surfing blindly is that within three or four clicks, you can find yourself in places where all safety bets are off.

With SiteAdvisor, I know if the site I’m on engages in link practices that can land me in hot water. Browsing with our link analysis data is like going to a party where the only person you know happens to be the most social person in the room. He can tell you who’s friends with whom, who’s hooking up and who has trouble holding their liquor. Good person to know.

December 09, 2005

The Down Low on Nasty Downloads

Posted by Kelly Ford at 02:15 PM

It is the software with a million names: Spyware. Adware. Contextual advertising software. Behavioral targeting code. The ungainly but lawyerly Potentially Unwanted Program. Malware.

I’m not raising the nomenclature issue to be flip. Being labeled “spyware" can mean millions of dollars in lost revenue for a program’s publisher. Labeling something “spyware" can mean millions of dollars in legal fees for the one doing the labeling. The money issue alone makes these important debates to have, no doubt.

But for the average Web consumer, all this name calling is supremely unhelpful. When a user is facing a download decision, he just wants to know whether it’s going to muck up his machine. This spring, SiteAdvisor set out to develop a way to alleviate the mystery (and the misery) that goes along with these decisions. 100,000+ tested downloads later, we think we’ve got something that will really help the average Web user. In fact, when it comes to popular downloads, we believe we’ve got the only truly objective, comprehensive dataset on what they do to users’ computers.

kazaa.jpg

Testing, Testing, One, Two, Three
Before I can tell you how we test downloads, I need to tell you what downloads we test. For SiteAdvisor purposes, a download is a program which can make your computer do something significant. In geekspeak, we look for executables like exe’s, scr’s and msi’s. Compressed files are also extracted and scanned for executables.
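
As a rough illustration of that selection step, here is a small sketch that flags the three file types named above and looks inside ZIP archives for them. It is a simplification under stated assumptions: the function name find_executables is hypothetical, only ZIP is handled among compressed formats, and files are judged by extension alone.

```python
import io
import zipfile
from pathlib import PurePosixPath

# The three extensions named above; a real list would likely be longer.
EXECUTABLE_SUFFIXES = {".exe", ".scr", ".msi"}


def find_executables(filename: str, data: bytes) -> list[str]:
    """Return the testable executables found in one downloaded file."""
    if PurePosixPath(filename).suffix.lower() in EXECUTABLE_SUFFIXES:
        return [filename]
    found = []
    buffer = io.BytesIO(data)
    # Only ZIP archives are opened in this sketch.
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            for member in archive.namelist():
                if PurePosixPath(member).suffix.lower() in EXECUTABLE_SUFFIXES:
                    found.append(f"{filename}!{member}")
    return found
```

A Word document or a photo passes through this with an empty result, which matches the "we don't test Jane's resume" rule below.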

Now there are lots of files that can be downloaded that we don’t test for. At least not yet. For example, we don’t analyze audio or video files or Microsoft Word documents or graphic formats. So we’re not testing Jane’s resume or John’s photos from his trip to the Grand Canyon. If you think there’s a file format we should be testing, let us know. And if there’s a specific download you’d like us to test, if you’re curious about an untested download from MyFavoriteGames.com, for example, submit the link by going to their SiteAdvisor summary page.

Mount Up, Troops
So, on to the tests themselves. Once again, our ‘bots take center stage. Every day, thousands of times a day, our brave digital warriors power up their PCs and go forth to expose themselves to the best and worst the Web has to offer.

Once we find a program to download, we install it onto a “clean" PC. What’s a clean PC? SiteAdvisor designed a system using "virtual machines" that allows us, in effect, to use a "new" computer once and only once to test one and only one download. This way, we are absolutely certain that whatever happens to that machine can only be the result of that one software installation.

Adware Inc.jpg

How bad is it, Doc?
After we find and install the program, we run the computer through a series of tests, measuring and documenting our findings at each step of the way. Essentially, we’re taking the computer’s temperature. Is it sick? If so, how badly?

With the program running, we put the PC through its browsing paces, visiting a series of Web sites selected because they’re popular and because they’re the kind of sites (i.e. travel, financial, gaming) that commonly trigger advertising. We also look for and document whether our browser settings have changed. For example, have our home page or search engine defaults been reset? Our goal is to show you how your browsing experience will be affected if you install the software in question.

nuisance_meter_old.gif

We also summarize the download’s overall impact on a computer by displaying its 1-to-10 Nuisance Score. The one above is for an Aaliyah screensaver we downloaded from EntertainmentWallpaper.com. The Nuisance Score is SiteAdvisor’s proprietary synthesis of all the data we’ve collected on a download. It’s an at-a-glance guide to help you decide whether to download a program. Low scores result from minor nuisances like changed home pages. Higher scores result from bundled things like adware or viruses. Bundling more than one low-score nuisance can push a rating into the red zone as well.

You talking to me?
Often, malicious or annoying software can be identified by its digital "signature," the unique changes it makes to a computer's operating system. Since we use new computers for each download, our system registry always starts clean. If we detect any changes made there or to our system files, we show you every addition, deletion and modification. ScenicReflections offers a "Soothing Sunsets" screensaver, for example, that may look quiet on your monitor, but behind the scenes, it's anything but.

registry-changes-old.gif
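
Conceptually, the report above is a before-and-after comparison. The sketch below shows that comparison on two plain dictionaries standing in for registry snapshots. The function name and the sample keys are invented for illustration, and capturing a real snapshot from a virtual machine is outside the scope of this example.

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare two {key: value} snapshots taken before and after an install."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    deleted = {k: before[k] for k in before.keys() - after.keys()}
    modified = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "deleted": deleted, "modified": modified}


if __name__ == "__main__":
    # Hypothetical registry keys, for illustration only.
    clean = {
        r"HKCU\Software\Example\HomePage": "about:blank",
        r"HKLM\Software\Example\Old": "1",
    }
    installed = {
        r"HKCU\Software\Example\HomePage": "http://ads.example/",
        r"HKLM\Software\Example\Toolbar": "installed",
    }
    for kind, changes in diff_snapshots(clean, installed).items():
        print(kind, changes)
```

Because every test starts from the same clean snapshot, anything that shows up in the diff can be attributed to the one program that was installed.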

Likewise, SiteAdvisor watches and documents which network servers are contacted by the downloaded program. The presence of network traffic alone does not signal badness. It’s which servers are being called and how many of them are associated with malware. Again, the goal of this data is to give you a common sense check against software that takes "liberties" with your Internet connection. For example, we downloaded one program that contacted more than 50 servers.

network_activity_old.gif

Best Face Forward?
Like my email blog earlier in the week, this is another long piece of writing, but I had a lot of ground to cover. I hope it gives you a good sense of how we arrive at our test results for program downloads. One question that I get a lot is whether our ratings ever change. Some people point to the noise being made these days by contextual advertising companies who claim they’re cleaning up their acts. One of the great things about working here is that we can put those claims to the test. But that’s for the future.

--Shane Keats

December 06, 2005

Tracking Spam Back to its Roots, SiteAdvisor Style

Posted by Kelly Ford at 06:00 PM

One of the things we spend a lot of time thinking about is how to help you better understand the practical implications of your everyday online activities, so you can make more informed decisions about where you search, browse, and share your personal information.

We emphasize practical, because there are many cases when a Web site or e-mailer is complying with the letter of the law even as they install intrusive adware on your system or send you a barrage of annoying advertisements by e-mail. The law may not be on your side in these cases, but SiteAdvisor is.

Today we’ll explain how we track and display the e-mail practices of Web sites in order to help you avoid what most people would call “spam" (even if the law doesn’t always strictly agree with that definition).

Bot # 213, grab another new e-mail address off the shelf
In our last entry we introduced you to our automated Web bots who spend all day, every day, crawling the Web looking for things to test. When our crawlers encounter a Web site with a page asking for an e-mail address, they’re only too happy to comply. They choose a brand-spankin’ new e-mail address, specially constructed so that the odds of accidentally sending to that address are extremely low. After entering the new e-mail on the page, they take note to never use that e-mail address again. Then they continue filling in other information if requested, and click 'submit.' (Sidenote: We don’t submit fake information on these forms; we actually use real data.) At that point, we know that any e-mail we receive at the e-mail address our bots chose is a direct result of our signing up on that site.
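
The one-address-per-sign-up idea can be sketched in a few lines. This is an illustration under assumptions rather than our bots' real code: the class name SignupRegistry and the domain bots.example are hypothetical, and the random address format is just one way to make accidental mail unlikely.

```python
import secrets


class SignupRegistry:
    """Hands out a never-reused e-mail address for each site sign-up."""

    def __init__(self, mail_domain: str):
        self.mail_domain = mail_domain.lower()
        self._site_by_address = {}

    def new_address(self, site: str) -> str:
        """Create a fresh random address and remember which site received it."""
        while True:
            # 16 random hex characters make an accidental match very unlikely.
            address = f"{secrets.token_hex(8)}@{self.mail_domain}"
            if address not in self._site_by_address:
                break
        self._site_by_address[address] = site
        return address

    def site_for(self, address: str):
        """Map an incoming message's recipient back to the sign-up site."""
        return self._site_by_address.get(address.lower())
```

Any message that later arrives at one of these addresses can be traced to exactly one sign-up, which is what makes the volume and spamminess numbers below attributable.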

Real people don’t read every bit of tricky fine print, so our bots don’t either
One thing we don’t do when we provide our e-mail address on Web sites is try to uncheck every pre-selected box which may be asking if we want to receive ‘partner offers’ or other such spam-bombs. Our philosophy, consistent with what responsible e-mail marketers consider to be a best practice, is that consumers should be asked to choose what they want ( “opt-in") rather than be forced to say what they don’t want ( “opt-out.")

Plenty of sites may claim that it was your fault you forgot to uncheck all those ‘send me this’ and ‘send me that’ boxes when you originally signed up. But let’s face it, many times those opt-out boxes are easy to overlook. (One of my personal pet peeves: I’m completing a registration form and remember to opt-out of ‘partner communications’, but receive an error after clicking because I forgot my home phone or my credit card’s expiration date or whatever… and when the site bumps me back to correct the problem, it has also rechecked all those marketing boxes. Grrrrrrr).

Let the barrage begin
Having provided our unique e-mail address to a Web site, we can now track the volume and spamminess of future e-mails that the site sends us, and report that information back to you.

email thermometer.jpg

E-mail volume is pretty straightforward. Our computers log every e-mail we receive and map it back to the original place it came from. We then compute the resulting rate (e-mails per week or per month) that you can expect to receive if you signed up at the same site. We even show you a sample in-box to document our tests and help you better understand what your own in-box is likely to look like after you sign up at a particular site.

Sample-Inbox-Old.gif
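
The weekly rate itself is simple arithmetic: e-mails received, divided by the time since sign-up, scaled to seven days. Here is a minimal sketch of that calculation. The function name and the sample dates are made up for the example, and our real reporting may window or round the numbers differently.

```python
from datetime import datetime, timedelta


def emails_per_week(signup_time: datetime, received_times: list, now: datetime) -> float:
    """Average weekly e-mail volume since signing up at a site."""
    elapsed_days = (now - signup_time).total_seconds() / 86400
    if elapsed_days <= 0:
        return 0.0
    # Only count messages that arrived inside the observation window.
    count = sum(1 for t in received_times if signup_time <= t <= now)
    return count * 7 / elapsed_days


if __name__ == "__main__":
    # Made-up example: one e-mail every six hours for 13 days after sign-up.
    signup = datetime(2005, 12, 1)
    received = [signup + timedelta(hours=6 * i) for i in range(1, 53)]
    print(emails_per_week(signup, received, now=datetime(2005, 12, 15)))
```

In this invented example, 52 e-mails over 14 days works out to 26 per week.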

Spamminess is a bit trickier. To determine an e-mail's spamminess score, we start our analysis with a software program called SpamAssassin. SpamAssassin rates individual e-mails on a variety of criteria and assigns scores based on whether the e-mail is more or less likely to be what most people would consider spam. For example, SpamAssassin evaluates an e-mail's commercial content and whether the e-mail employs tricks known to be used by spammers attempting to get through spam filters. Specific spammy actions trigger higher scores.

Somewhat confusingly, SpamAssassin scores range all over the place. I’ve seen scores of -4 (for a non-spam e-mail that also comes from a bonded sender) to +40 (for a spammy e-mail that fails all sorts of spamminess tests). Any individual e-mail that scores +5 or more is likely to be considered spam by most people.

SiteAdvisor shows you the average SpamAssassin ratings for all the e-mails we receive from a particular sender. This gives you an overall sense of how spammy your inbox is likely to become after submitting your e-mail address to a particular e-mail sender. We report this exact average on our detailed site analysis pages but also simplify it visually on our ‘Low to High’ spamminess scale, as shown above. (So you know, the vast majority of our scores fall between 0 and +20.)

Out with the Inbox
One way to explain all this is to look at a couple of individual e-mails.

We received one e-mail headlined: “Gas Bills Paid For One Year - Valued at $1800". Inside, we found the following blinking “Click Here Now!" graphic:

free gas spam_60percent.JPG

SpamAssassin gave the e-mail a score of +20.5 because it triggered so many of their spam sensors. For example, unbeknownst to the human reader, the e-mail is riddled with invisible text. Embedded in the e-mail are lots of links, often a sign that it’s full of commercial pitches. The image dominates the “square footage" of the e-mail, again signaling strong commercial content. And the sender is part of a black list of known spammers compiled by anti-spam activists.

Each of these clues, along with several others, bumps up the e-mail’s score well past the +5 it needs to earn to be considered spammy.

By contrast, here’s part of an e-mail we received after signing up with a home builder:

Thank you for contacting Dominion Homes. This is a confirmation email to let you know that your inquiry has been received and a Dominion Homes representative will follow up with you shortly. We look forward to learning more about how Dominion Homes can help you in your new home search.

First off, the e-mail didn’t trigger any spam sensors. More than that, SpamAssassin’s statistical analysis of the actual words used reduced its overall score to -2.6.

Let’s say, for the sake of this blog entry, that both e-mails came from the same sender. We would average the scores together to +8.95. Given that score, we would consider this provider to have questionable e-mail practices.
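
Here is that arithmetic as a short sketch. The per-e-mail scores are the two from this entry. Applying the +5 per-e-mail cutoff to a sender's average is an assumption made for the example, and the function name is hypothetical.

```python
from statistics import mean

# The +5 cutoff is the per-e-mail threshold mentioned above; using it on a
# sender's average is an assumption for this sketch.
SPAM_THRESHOLD = 5.0


def sender_spamminess(scores: list) -> tuple:
    """Return (average SpamAssassin score, whether it crosses the cutoff)."""
    average = mean(scores)
    return average, average >= SPAM_THRESHOLD


if __name__ == "__main__":
    # The free-gas e-mail (+20.5) and the home builder confirmation (-2.6).
    average, questionable = sender_spamminess([20.5, -2.6])
    print(f"{average:+.2f}", questionable)
```

One very spammy message is enough to drag an otherwise clean sender's average well past the cutoff.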

Hold your breath…
Where all this really gets interesting is analyzing the various players involved in each piece of e-mail which reaches us, particularly e-mail with a high spamminess rating. By comparing the sender of each e-mail with the place where we originally signed up, we can determine which companies are directly or indirectly doing business with one another to sell, rent, or give away e-mail addresses. We are practically bursting to share some of the fascinating things we’ve already discovered on this topic, but sorry- you’ll just have to wait. This blog entry is long enough already.

December 02, 2005

A Little Help from Our Friends

Posted by Kelly Ford at 09:00 AM

Yesterday we somewhat quietly launched a password-protected ‘Preview Version’ of our Web site and software, and we’ve invited friends and colleagues to try it prior to our public launch in early 2006. (Want to join the fun and volunteer to be a Preview Version tester?)

Notice we didn’t call our Preview Version ‘beta’, even though our previous development version was called ‘alpha.’ Beta just feels a little dated these days. Remember Betamax? Plus, having lapped the Roman alphabet, all the hurricanes this year are now giving Greek letters a bad name.

Anyway, the point of the Preview Version is to hear from you, our pioneering early users, about what you like, what you don’t, and what we can do better. If it’s a cool feature and you love it, let us know. If the whole thing just seems confusing and makes your head hurt, we’ll probably blame it mostly on you, but really, we can take it.

You’d be amazed how much changed even in the few days prior to yesterday’s launch, often as a result of helpful feedback from friends who aren’t thinking about this stuff 24/7 like we have been. Perspective is great.

As an example: One group of folks we’d like to say a special ‘thanks but so-long' to are the bots we featured on our homepage during our alpha version. The idea was to give some front page credit to our tireless automated Web crawlers who have already tested a million plus Web sites, downloads, and e-mail registration forms. But some workers, particularly those which run on alternating current, are perhaps best left out of the limelight.

bots.jpg
So long, troopers. You’re axed from the homepage. Still counting on you for a few hundred million more tests next year though, so back to work.

The end of a brief moment of glory for our happy bots came recently when a friend, quite rightly, noted, “Guys, come on -- regular people don’t associate robots with Web safety. Just illustrate on the homepage how your product works."

Great ideas are so obvious in retrospect.

So, we want to thank you in advance for helping us during this Preview Version period to surface all those great improvement ideas which will be just SO obvious looking back a few months from now.

Already have some ideas or feedback? Please send it to us. We’re listening.

December 01, 2005

Welcome to the Site Advisor blog!

Posted by Kelly Ford at 08:00 AM

Who the heck are we?

Well, first, we are a group of people who wanted to create a company to help fight the bad guys who are making it harder to enjoy and trust the Web.

Second, we’re mostly people with technical backgrounds who wanted to take on this very formidable challenge.

Third, we are the kind of people who love data. We like knowing that registering with Site X will result in getting 310 e-mails per week, or that downloading Toolbar Y will also install adware on your machine which will give you pop-ups out the wazoo.

Don’t get us wrong here -- our goal is not to stay in the technical weeds and be known for spewing out a bunch of obscure data about the safety of Web sites. Our objective is to make user-friendly safety recommendations that are crystal clear but which are also backed up by lots of cool testing techniques and water-tight analysis.

Oh, and we like conversation too, peppered with a healthy bit of controversy. We encourage both Web site users and Web site owners to comment on, add to, or challenge our ratings. Other than filtering for a few four letter words, we’ll post comments, unedited, to create an atmosphere of open discussion and continual improvement.

A natural question to ask is: don’t we already have enough tools for online safety and security? Certainly there are a lot of tools out there. Anti-virus, anti-spam, spyware removers, personal firewalls. But these tools almost all deal with problems after they start occurring. You sign up for a new newsletter and then receive endless spam. Or you get infected with spyware and your machine slows to a crawl. SiteAdvisor is different. It helps you make better decisions before the bad stuff happens.

Here’s an analogy that might help. When you move into a new city, you change the locks on your doors (e.g. you install a firewall and anti-virus software). But when you walk around at night, you use your “street smarts” to avoid dark alleys and questionable neighborhoods. Over time, you talk to old timers and neighbors and learn what streets to avoid. On the Web, it’s different. For one thing, the Web is simply too big. No matter how long you surf, there will always be neighborhoods new to you. For another, it’s too easy for the bad guys to try to scam you with fake sites which look almost identical to the real thing. Most people’s “street smart” sensors simply aren’t sensitive enough to pick up the clues.

SiteAdvisor is your street smarts helper. We’re trying to help both novice and advanced Internet users make better decisions on the Web -- where to safely browse, what to safely download, where it is safe to give out personal information or shop.

We have lots to tell you about exactly how we're going about the challenge of helping you stay safe online, and we have even more to tell you about the good, the bad, and the ugly that we've already found out there. But all in good time.

Before we sign off: We just launched a Preview Version of our Web site and our software today. (If you'd like to request a password for it, you can do so here.) We’ll be using your feedback during our Preview Version test to refine and improve our Web site and our software prior to our launch in early 2006. More on how we could use your help as Preview Version testers in our next entry.