Anonymity. Browser fingerprinting. How to track users on the Web [PART 1]

Serafim · Feb 28, 2021

I've always been bothered by the way Google AdSense persistently served me contextual ads based on my old search queries. Quite a lot of time had passed since those searches, and I had cleared my cookies and browser cache more than once, yet the ads kept coming. How did they keep tracking me? It turns out there are plenty of ways to do it.

Short introduction

User identification, tracking, or simply web tracking, means computing and assigning a unique identifier to each browser that visits a particular site. It was not originally conceived as some universal evil; like most things, it has another side, because it was designed to be useful. For example, it lets site owners distinguish regular users from bots, or store user preferences and apply them during later sessions. At the same time, the advertising industry took a great liking to this capability. As you well know, cookies are one of the most popular ways to identify users, and the advertising industry has been using them actively since the mid-nineties.

A lot has changed since then, technology has moved far ahead, and today user tracking is no longer limited to cookies. In fact, there are many ways to identify users. The most obvious one is to set some kind of identifier, such as a cookie. The next option is to use data about the user's machine that can be obtained from the requests it sends, including their HTTP headers: the IP address, the operating system, the time, and so on. Finally, a user can be recognized by their behavior and habits (cursor movements, favorite sections of the site, and so on).
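To make the second option concrete, here is a minimal Python sketch of a passive identifier computed on the server from data that arrives with every request. The function name passive_fingerprint and the particular headers in FINGERPRINT_HEADERS are assumptions chosen for illustration; the User-Agent header is where the operating system typically shows up.

```python
import hashlib
from typing import Mapping

# Assumed selection of request headers; real trackers use many more signals.
FINGERPRINT_HEADERS = ("User-Agent", "Accept-Language", "Accept-Encoding")

def passive_fingerprint(ip: str, headers: Mapping[str, str]) -> str:
    """Hash request data that the browser sends anyway into a short pseudo-ID."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}
    parts = [ip] + [normalized.get(h.lower(), "") for h in FINGERPRINT_HEADERS]
    # Newlines cannot occur inside valid header values, so "\n" is a safe separator.
    digest = hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]  # 64 bits is plenty for a sketch
```

Two requests from the same IP address with the same User-Agent and language settings map to the same value, but a change in any input (a new IP address, a browser update) yields a completely different one, so on its own such an identifier is fragile.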

Explicit identifiers

This approach is quite obvious: all that is required is to store some long-lived identifier on the user's side and request it during later visits to the resource. Modern browsers offer plenty of ways to do this without the user noticing. First of all, there are the good old cookies. Then there are plugin features similar to cookies in functionality, for example, Local Shared Objects in Flash or Isolated Storage in Silverlight. HTML5 also includes several client-side storage mechanisms, including localStorage, the File API, and IndexedDB. Beyond these, unique tokens can also be stored in cached resources on the local machine or in cache metadata (Last-Modified, ETag). A user can also be identified by fingerprints obtained from the Origin Bound certificates the browser generates for SSL connections, from data contained in SDCH dictionaries, and from the metadata of those dictionaries. In a word, there are plenty of opportunities.
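As an illustration of the cache-metadata idea, here is a minimal Python sketch, assuming a hypothetical tracking resource whose ETag carries a per-visitor identifier instead of a version tag. serve_tracking_resource and _parse_etag are hypothetical helpers that return the status, headers, and recovered identifier; wiring them into a real web server is left out.

```python
import secrets
from typing import Dict, Optional, Tuple

def _parse_etag(if_none_match: str) -> str:
    """Extract the first entity tag from an If-None-Match header value."""
    tag = if_none_match.split(",")[0].strip()
    if tag.startswith("W/"):  # weak validator prefix
        tag = tag[2:]
    return tag.strip('"')

def serve_tracking_resource(
    if_none_match: Optional[str],
) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, visitor_id) for a hypothetical tracking resource."""
    if if_none_match:
        # Revalidation: the browser echoes back the ETag it stored earlier.
        visitor_id = _parse_etag(if_none_match)
        # 304 tells the browser to keep its cached copy, and with it the identifier.
        return 304, {"ETag": f'"{visitor_id}"'}, visitor_id
    # First visit (or empty cache): mint a fresh identifier and put it in the ETag.
    visitor_id = secrets.token_hex(8)
    headers = {
        "ETag": f'"{visitor_id}"',
        # no-cache forces revalidation on every use, so the ETag comes back each time;
        # private keeps shared proxies from reusing one visitor's tag for others.
        "Cache-Control": "private, no-cache",
    }
    return 200, headers, visitor_id
```

The identifier lives only in the browser cache, so clearing cookies does not touch it; clearing the cache does.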

Cookies

When it comes to storing a small amount of data on the client side, cookies are usually the first thing that comes to mind. The web server assigns a new user a unique identifier and stores it in a cookie, and the client sends it back to the server with every subsequent request. Although all popular browsers have long had a user-friendly interface for managing cookies, and the Web is full of third-party utilities for managing and blocking them, cookies are still actively used to track users. The fact is that very few people ever review and clean them (try to remember the last time you did). Perhaps the main reason is that everyone is afraid of accidentally deleting a needed cookie, for example, one used for authorization. Some browsers let you restrict third-party cookies, but the problem persists, because browsers often treat cookies received via HTTP redirects or by other means during page loading as first-party. Unlike most of the mechanisms discussed below, cookies are transparent to the end user, that is, easy to inspect. Still, to "mark" a user, it is not even necessary to keep a unique identifier in a dedicated cookie: it can be assembled from the values of several cookies or stored in metadata such as the expiration time. So at this level it is quite difficult to figure out whether a particular cookie is used for tracking or not.
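The point about assembling an identifier from several cookies can be sketched in Python with the standard http.cookies module. This is a minimal sketch under stated assumptions: the cookie names in PART_NAMES and the ten-year lifetime are made up for illustration, and issue_id_cookies is a hypothetical server-side helper.

```python
import secrets
from http.cookies import SimpleCookie
from typing import List, Optional, Tuple

# Hypothetical, innocuous-looking names; each cookie holds one fragment of the ID.
PART_NAMES = ("lang", "theme", "sess")
TEN_YEARS = 10 * 365 * 24 * 3600  # Max-Age in seconds (assumed lifetime)

def issue_id_cookies(cookie_header: Optional[str]) -> Tuple[str, List[str]]:
    """Return (visitor_id, Set-Cookie header values to send, possibly empty)."""
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)  # parse the request's Cookie header
    parts = [jar[name].value for name in PART_NAMES if name in jar]
    if len(parts) == len(PART_NAMES):
        # Returning visitor: glue the fragments back together in a fixed order.
        return "".join(parts), []
    # New visitor (or a fragment was deleted): mint a fresh 24-hex-char ID.
    visitor_id = secrets.token_hex(12)
    size = len(visitor_id) // len(PART_NAMES)
    out = SimpleCookie()
    for i, name in enumerate(PART_NAMES):
        out[name] = visitor_id[i * size:(i + 1) * size]
        out[name]["max-age"] = TEN_YEARS
        out[name]["path"] = "/"
    return visitor_id, [morsel.OutputString() for morsel in out.values()]
```

None of the three cookies looks like an identifier on its own, so inspecting them one at a time reveals little, which is exactly the difficulty described above.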

Local Shared Objects

Adobe Flash uses the LSO mechanism to store data on the client side. It is the analog of HTTP cookies, but unlike them, LSOs can store more than short fragments of text, which in turn complicates analyzing and verifying such objects. Before version 10.3, Flash cookie behavior was configured separately from the browser settings: you had to visit the Flash Settings Manager on the macromedia.com site (by the way, it is still available). Today, this can be done directly from the Control Panel. In addition, most modern browsers integrate fairly tightly with Flash Player: for example, when you delete cookies and other site data, LSOs are deleted too. On the other hand, this integration is still not complete, so the browser's third-party cookie policy will not always affect Flash cookies (the Adobe website explains how to disable them manually).
Deleting data from localStorage in Firefox.

Isolated Silverlight storage

The Silverlight platform has a lot in common with Adobe Flash. Its analog of Flash's Local Shared Objects is a mechanism called Isolated Storage. However, unlike in Flash, the privacy settings here are not tied to the browser in any way, so even after the cookies and browser cache are completely cleared, the data stored in Isolated Storage remains. Even more interesting, the storage is shared by all browser windows (except those opened in Incognito mode) and by all profiles on the same machine. As with LSOs, there are no technical barriers to storing session IDs in it. However, given that this mechanism cannot yet be reached through the browser settings, it has not become as widely used as a store for unique identifiers.

Where to look for isolated Silverlight storage

HTML5 and data storage on the client

HTML5 provides a set of mechanisms for storing structured data on the client: localStorage, the File API, and IndexedDB. Despite their differences, they are all designed to provide persistent storage of arbitrary data tied to a specific origin. Plus, unlike HTTP and Flash cookies, their size limits are far more generous. In modern browsers, HTML5 storage is kept together with other site data. However, it is far from obvious how to manage it through the browser settings. For example, to delete localStorage data in Firefox, the user has to select "Offline Website Data" or "Site Preferences" and set the time range to "Everything". Another unusual feature, unique to IE, is that the data lives only as long as the tabs that were open when it was saved. Moreover, these mechanisms make little effort to follow the restrictions that apply to HTTP cookies. For example, you can write to and read from localStorage via cross-domain frames even when third-party cookies are disabled.
Configuring local storage for Flash Player.

Cached objects

Everyone wants their browser to be fast and free of lag. That is why it stores the resources of visited sites in a local cache, so it does not have to request them again during later sessions. Although this mechanism was clearly not meant to be used as general-purpose storage, it can be turned into one. For example, the server can return a JavaScript file with a unique identifier inside its body and set its Expires or Cache-Control: max-age header far into the future. The script, along with its unique identifier, will then be stored in the browser cache. After that, any page on the Web can access it simply by loading the script from its known URL. Of course, the browser will periodically send an If-Modified-Since header to ask whether a new version of the script is available. But if the server responds with 304 (Not Modified), the cached copy will keep being used indefinitely. What else is interesting about the cache? Unlike HTTP cookies, it has no concept of "third-party" objects. At the same time, disabling caching can seriously hurt performance. And automatically detecting tricky resources that store identifiers or tags is difficult because of the sheer volume and complexity of the JavaScript found on the Web. Of course, every browser lets the user clear the cache manually. But as practice shows (my own case included), this is done rarely, if at all.
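Here is a minimal Python sketch of the server side of this trick, assuming a hypothetical /id.js endpoint. serve_id_script, the one-year max-age, and the fixed Last-Modified date are illustrative assumptions, not a known tracker's implementation.

```python
import secrets
from typing import Dict, Optional, Tuple

ONE_YEAR = 365 * 24 * 3600  # assumed "distant future" lifetime, in seconds

def serve_id_script(if_modified_since: Optional[str]) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for a hypothetical /id.js tracking script."""
    if if_modified_since is not None:
        # Revalidation request: answer 304 so the cached copy (with its ID) stays.
        return 304, {"Cache-Control": f"max-age={ONE_YEAR}"}, ""
    # Cache miss: embed a fresh identifier directly in the script body.
    visitor_id = secrets.token_hex(8)
    body = f'window.visitorId = "{visitor_id}";\n'
    headers = {
        "Content-Type": "application/javascript",
        "Cache-Control": f"max-age={ONE_YEAR}",
        # Fixed, arbitrary date; the browser echoes it back as If-Modified-Since.
        "Last-Modified": "Mon, 01 Jan 2018 00:00:00 GMT",
    }
    return 200, headers, body
```

A page that includes the script (for example, from a hypothetical URL such as https://tracker.example/id.js) can then read window.visitorId. The server never needs to remember the identifier: as long as it keeps answering 304, the browser keeps serving its own cached copy.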
