ADOTAS – Considered a major privacy violation since a University of California, San Diego study publicized the practice last year, history sniffing (or "history stealing") refers to code embedded in tracking beacons that checks how the links on a page are displayed in order to glean clues about a user's browsing history. You know the drill: purple means you've visited before, blue means you haven't.
I think sniffing is a better term because stealing connotes actually hijacking a user’s history folder. But like most things in online behavioral advertising, it’s painted several thick shades of gray — and anti-tracking advocates are likely to use the more damning term to plead their case.
You can imagine that Epic Marketplace was nonplussed when Jonathan Mayer of the Stanford Security Lab (SSL) within the Stanford Law School Center for Internet and Society claimed he caught Epic history stealing (his phrase) on websites Flixster and Charter.net.
The SSL was testing the JavaScript instrumentation in its new web measurement platform when it found the violations. The crew reverse-engineered the history-sniffing script and reported the following features:
- The script is fast. Thousands of links are tested per second.
- Links are added in an invisible iframe; there is no apparent effect on the page layout.
- The script dynamically loads lists of URLs and associated interest segments using JSONP.
- Progress is stored in a cookie so the script can resume where it left off.
- The script sets a cookie indicating when it was last run; it will not history steal more than once every twenty-four hours.
- If history stealing is still in progress when the window is closed (e.g. the user navigates to another page) the script sends its findings before ending execution.
- The script slows down if a URL list takes over two seconds to process.
- To prevent multiple history stealing attempts in parallel, the script uses a mutex cookie.
- The script does not directly report the URLs that it detects the user has visited; it sends a deduplicated list of the interest segments associated with the visited URLs.
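The last few bullets describe bookkeeping rather than the sniffing itself, so here is a minimal Python sketch of two of those pieces. It is not Epic's actual code. It assumes the script has already produced a list of visited URLs; the detection step is deliberately left out. The URL-to-segment table, the example domains and the function names are all hypothetical. The sketch shows the once-per-24-hours "last run" check and the reduction of detected URLs to a deduplicated list of interest segments, which is what the SSL says the script sends back.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional

# Hypothetical URL -> interest-segment table. The reported script loaded
# lists like this dynamically via JSONP; these entries are made up.
SEGMENTS_BY_URL = {
    "https://example-bank.test/": ["finance"],
    "https://example-loans.test/": ["finance", "debt"],
    "https://example-pharmacy.test/": ["health"],
}

# The reported script would not run more than once every twenty-four hours.
RUN_INTERVAL = timedelta(hours=24)


def should_run(last_run: Optional[datetime], now: datetime) -> bool:
    """Stand-in for the 'last run' cookie check (hypothetical helper).

    Returns True if there is no record of a previous run, or if at least
    24 hours have passed since it.
    """
    return last_run is None or now - last_run >= RUN_INTERVAL


def segments_to_report(visited_urls: Iterable[str]) -> List[str]:
    """Reduce detected URLs to a sorted, deduplicated list of segments.

    Only segment names are returned; the URLs themselves are not.
    URLs missing from the table contribute nothing.
    """
    segments = set()
    for url in visited_urls:
        segments.update(SEGMENTS_BY_URL.get(url, []))
    return sorted(segments)
```

Two visited finance sites both map to "finance", but that segment appears only once in the report. The payload carries labels rather than addresses, which is the basis of Epic's claim that no URLs are collected, even though the labels are still derived from the user's history.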
Epic has fired back:
The practice described in the blog, better labeled as “segment verification” (indeed, as admitted in the blog, no URLs or URL history is actually collected) provides companies with a way to measure the accuracy of the data that a company purchases from data vendors without compromising consumer privacy. NO data obtained from segment verification is personally identifiable information (PII), nor is that data ever merged with other data points that are, or may be, personally identifiable.
Mayer and his Stanford Security Lab team are building a platform for measuring dynamic web content. It may also serve as an automated enforcement tool for Do Not Track, a universal web-tracking opt-out app, by detecting third-party tracking through methods such as cookies and fingerprinting.
Last week the team caused a stir in the behavioral targeting world. While testing this platform, the Stanford researchers identified beacons for 64 of the 75 companies signed up for the Network Advertising Initiative's (NAI) self-regulatory online behavioral advertising (OBA) program. After loading content featuring each beacon, the researchers opted out of OBA tracking via the NAI website and reloaded the content. Then they enabled the Do Not Track app and reloaded the content again. The team discovered that half of the NAI companies left their tracking cookies in place, while 10 companies deleted their cookies.
But the industry responded that Mayer and his crew were seriously missing the nuance of the situation: not all tracking cookies are for behavioral targeting. Networks like Vibrant Media and Undertone argued that their practices were in line with NAI guidelines, which allow cookies to remain for collection of non-OBA data used in frequency capping and other practices unrelated to behavioral targeting.
NAI Executive Director Chuck Curran echoed these claims, arguing that the Stanford study unfairly blurred the line between industry self-regulatory behavioral targeting initiatives and across-the-board tracking roadblocks:
[It’s] important to draw a fair distinction between existing industry self regulatory commitments to limit ad targeting based on user interests, and the views of proponents of newly emerging “Do Not Track” technologies who argue that advertising companies should be stopped from collecting any data. We’ve been following with interest the recent introduction by browser manufacturers of “Do Not Track” features that promise in various degrees to limit browser data collection. A robust debate is going on about how these browser features might be integrated with the existing industry self-regulatory programs, but for now there is no universally agreed upon definition of what “Do Not Track” means.
These two incidents certainly suggest that Do Not Track initiatives and industry self-regulatory efforts are at loggerheads, and an already muddy issue is getting messier. Consider, in addition, that people have recently noticed that websites are under no obligation to respect Firefox's Do Not Track feature, since honoring it isn't really in publishers' interest when it comes to their own data collection.
Is a compromise between DNT and the industry opt-out realistic? Or should we start taking bets on which side will win (i.e., which side can better sweet-talk government regulators and legislators)?