Usage data collected by content providers for the usage reports delivered to customers should meet one basic requirement: only intended usage is recorded, and any requests not intended by the user are removed.
Because the way usage records are generated can differ across platforms, it is impractical to describe all the possible filters and techniques used to clean up the data. This Code of Practice therefore specifies only the requirements that must be met by the data used to build the usage reports.
Only successful and valid requests MUST be counted. For web server log files, successful requests are those with specific HTTP status codes (200 and 304). The standards for these return codes are defined and maintained by the W3C (http://www.w3.org/Protocols/HTTP/HTRESP.html). If key events are used, their definitions MUST match the W3C standards. (For more information see The Friendly Guide to Release 5: Technical Notes for Content Providers.)
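As a minimal illustration (not part of the Code of Practice), the Python sketch below keeps only log records with countable status codes; the record structure and the "status" field name are assumptions:

```python
# Illustrative sketch only: the record layout and field name are assumed.
COUNTABLE_STATUS_CODES = {200, 304}  # successful (200) and not modified (304)

def countable_requests(log_records):
    """Yield only the log records that qualify for counting."""
    for record in log_records:
        if record["status"] in COUNTABLE_STATUS_CODES:
            yield record
```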
The intent of double-click filtering is to remove the potential for over-counting, which could occur when a user clicks the same link multiple times, typically because of a slow internet connection. Double-click filtering applies to Total_Item_Investigations, Total_Item_Requests, No_License, and Limit_Exceeded. See Sections 7.3 and 7.4 below for information about counting unique items and titles. The double-click filtering rule is as follows:
Double-clicks, i.e. two clicks in succession, on a link by the same user within a 30-second period MUST be counted as one action. For the purposes of COUNTER, the time window for a double-click on any page is set at a maximum of 30 seconds between the first and second mouse clicks. For example, a click at 10:01:00 and a second click at 10:01:29 would be considered a double-click (one action); a click at 10:01:00 and a second click at 10:01:35 would count as two separate single clicks (two actions).
A double-click may be triggered by a mouse click or by pressing a refresh or back button. When two actions occur for the same URL within 30 seconds, the first request MUST be removed and the second retained. Any additional requests for the same URL within 30 seconds of the previous click MUST be treated identically: always remove the earlier request and retain the later one.
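A minimal sketch of this filtering rule in Python, assuming each request carries user, URL, and timestamp fields (the field names are illustrative):

```python
from collections import defaultdict
from datetime import timedelta

DOUBLE_CLICK_WINDOW = timedelta(seconds=30)

def filter_double_clicks(requests):
    """Collapse double-clicks: whenever two requests by the same user for
    the same URL are no more than 30 seconds apart, drop the earlier one
    and keep the later one. Keys "user", "url", and "time" (a datetime)
    are assumed field names for illustration."""
    by_key = defaultdict(list)
    for req in requests:
        by_key[(req["user"], req["url"])].append(req)

    retained = []
    for reqs in by_key.values():
        reqs.sort(key=lambda r: r["time"])
        for i, current in enumerate(reqs):
            superseded = (
                i + 1 < len(reqs)
                and reqs[i + 1]["time"] - current["time"] <= DOUBLE_CLICK_WINDOW
            )
            if not superseded:  # keep only clicks with no follow-up in window
                retained.append(current)
    return retained
```

Applied to the example above, clicks at 10:01:00 and 10:01:29 collapse to one retained action, while clicks at 10:01:00 and 10:01:35 are both retained.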
There are different ways to track whether two requests for the same URL are from the same user and session. These options are listed in order of increasing reliability, with Option 4 being the most reliable.
Some COUNTER Metric_Types count the number of unique items that had a certain activity, such as Unique_Item_Requests or Unique_Item_Investigations.
For the purpose of COUNTER metrics, an item is the typical unit of content being accessed by users, such as an article, book chapter, book section, whole book (if delivered as a single file), or multimedia content. The item MUST be identified using a unique ID that identifies the work (e.g. chapter or article) regardless of format (e.g. PDF, HTML, or EPUB). If no item-level identifier is available, then use the item name in combination with the identifier of the parent item (i.e. the article title + ISSN of the journal, or the chapter name + ISBN of the book).
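A sketch of this fallback rule, with hypothetical field names (the "|" separator is an arbitrary choice for illustration):

```python
def item_identifier(item):
    """Return a format-independent identifier for a content item.

    Prefer the work-level ID (e.g. a DOI); otherwise fall back to the
    item name combined with the parent item's identifier. Field names
    here are assumptions, not part of the Code of Practice."""
    if item.get("work_id"):
        return item["work_id"]
    # e.g. article title + ISSN of the journal, or chapter name + ISBN
    return f"{item['name']}|{item['parent_id']}"
```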
The rules for calculating the unique item counts are as follows:
If multiple transactions qualifying for the Metric_Type in question represent the same item and occur in the same user session, only one unique activity MUST be counted for that item.
A user session is defined in any of the following ways:
- a logged session ID + transaction date;
- a logged user ID (if users log in with personal accounts) + transaction date + hour of day (the day is divided into 24 one-hour slices);
- a logged user cookie + transaction date + hour of day; or
- a combination of IP address + user agent + transaction date + hour of day.
To allow for simplicity in calculating session IDs, when a session ID is not explicitly tracked, the day will be divided into 24 one-hour slices and a surrogate session ID will be generated by combining the transaction date + hour time slice + one of the following: user ID, cookie ID, or IP address + user agent. For example, consider the transaction in the sketch below.
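The sketch below shows one way the surrogate session ID and the resulting unique count might be computed; the field names, the "|" separator, and the example values are illustrative assumptions:

```python
from datetime import datetime

def surrogate_session_id(user_key, timestamp):
    """Combine a user-level key (user ID, cookie ID, or IP address +
    user agent) with the transaction date and one-hour time slice.
    The "|" separator is an arbitrary choice for this sketch."""
    return f"{user_key}|{timestamp:%Y-%m-%d}|{timestamp:%H}"

# A transaction by cookie "abc123" at 13:35:12 falls into the 13:00-13:59
# slice, so it gets the surrogate session ID "abc123|2017-06-15|13", as does
# every other request with that cookie in the same hour.
session = surrogate_session_id("abc123", datetime(2017, 6, 15, 13, 35, 12))

def unique_item_requests(transactions):
    """Count one activity per distinct (item, session) pair, per the rule
    above; "item_id", "user_key", and "time" are assumed field names."""
    return len({
        (t["item_id"], surrogate_session_id(t["user_key"], t["time"]))
        for t in transactions
    })
```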
The above replacement for a session ID is not an exact analogue of a session. However, statistical studies show that using such a surrogate yields unique counts within 1–2% of the unique counts generated with actual sessions.
Some COUNTER Metric_Types count the number of unique titles that had a certain activity, such as Unique_Title_Requests or Unique_Title_Investigations.
For the purpose of COUNTER metrics, a title represents the parent work that the item is part of. When the item is a chapter or section, the title is the book. The title MUST be identified using a unique identifier (e.g. an ISBN for a book) regardless of format (e.g. PDF or HTML).
The rules for calculating the unique title counts are as follows:
If multiple transactions qualifying for the Metric_Type in question represent the same title and occur in the same user session, only one unique activity MUST be counted for that title.
A user session is identified in the same way as for unique items; see Section 7.3 above for the definition of a user session and for the surrogate session ID calculation, which applies here unchanged.
Content providers that offer databases in which a given content item (e.g. an article) is included in multiple databases MUST attribute the Investigations and Requests metrics to just one database. The following recommendations may be helpful when ambiguity arises:
Search activity generated by federated search engines MUST be categorized separately from searches conducted by users on the host platform.
Any searches generated from a federated search system MUST be included in the separate Searches_Federated counts within Database Reports and MUST NOT be included in the Searches_Regular or Searches_Automated counts.
The most common ways to recognize federated search activity are as follows:
COUNTER provides lists of user agents that represent the most common federated search tools. See Appendix G.
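For illustration only, a search could be attributed to the federated bucket by matching its user agent against the COUNTER-published list; the entry shown below is a placeholder, not a real tool:

```python
# Placeholder entry: the authoritative user-agent lists are those
# published by COUNTER (see Appendix G).
FEDERATED_SEARCH_AGENTS = {"ExampleFederatedTool/1.0"}

def classify_search(user_agent):
    """Attribute a search to the federated or regular bucket by user agent."""
    if user_agent in FEDERATED_SEARCH_AGENTS:
        return "Searches_Federated"
    return "Searches_Regular"
```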
Search activity generated by discovery services and other systems where multiple databases not explicitly selected by the end user are searched simultaneously MUST be counted as Searches_Automated on Database Reports. Such searches MUST be included on the Platform Reports as Searches_Platform, but only as a single search regardless of the number of databases searched.
Example: A user searches a content site where the librarian has pre-selected 20 databases for business and economics searches. For each search conducted by the user, Searches_Automated is incremented by 1 for each of the 20 databases in the Database Report, while Searches_Platform in the Platform Report is incremented by just 1, as sketched below.
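A sketch of this counting logic (the data structures and database names are illustrative):

```python
from collections import Counter, defaultdict

database_counts = defaultdict(Counter)  # per-database metrics
platform_counts = Counter()             # platform-wide metrics

def record_discovery_search(databases):
    """Record one user search over databases the user did not explicitly
    select: every database searched gains one Searches_Automated, while
    Searches_Platform is incremented only once."""
    for db in databases:
        database_counts[db]["Searches_Automated"] += 1
    platform_counts["Searches_Platform"] += 1

# The example above: one search across 20 pre-selected databases.
record_discovery_search([f"db{i}" for i in range(1, 21)])
assert platform_counts["Searches_Platform"] == 1
```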
Activity generated by internet robots and crawlers MUST be excluded from all COUNTER usage reports. COUNTER provides a list of user agent values that represent the crawlers and robots that MUST be excluded. Any transaction with a user agent matching one on the list MUST NOT be included in COUNTER reports.
COUNTER maintains the current list of internet robots and crawlers at https://github.com/atmire/COUNTER-Robots.
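A sketch of the exclusion step, assuming the robot list has been loaded as a set of matching patterns (the two patterns shown are placeholders, not the real list):

```python
import re

# Placeholder patterns: the authoritative list is maintained by COUNTER
# at https://github.com/atmire/COUNTER-Robots and would be loaded from there.
ROBOT_PATTERNS = [re.compile(p, re.IGNORECASE) for p in ("bot", "crawler")]

def is_robot(user_agent):
    """True if the user agent matches any robot/crawler pattern."""
    return any(p.search(user_agent) for p in ROBOT_PATTERNS)

def reportable_requests(log_records):
    """Exclude every transaction whose user agent matches the robot list."""
    return [r for r in log_records if not is_robot(r["user_agent"])]
```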
Only genuine, user-driven usage MUST be reported. COUNTER reports MUST NOT include usage that represents requests for full-text content initiated by automatic or semi-automatic bulk-download tools, where the downloads occur without direct user action.
Text and data mining (TDM) is a computational process whereby text or datasets are crawled by software that recognizes entities, relationships, and actions. (STM Statement on Text and Data Mining)
TDM does NOT include straightforward information retrieval, straightforward information extraction, abstracting and summarising activity, automated translation, or summarising query-response systems.
A key feature of TDM is the discovery of unknown associations based on categories that will be revealed as a result of computational and linguistic analytical tools.
Principles for reporting usage:
Detecting activity related to TDM:
TDM activity typically requires a prior agreement between the content provider and the individual or organization downloading the content for the purpose of text mining. The content provider can isolate TDM-related traffic using techniques like:
Harvesting of content for TDM without permission or without using the endpoint or protocol supplied by the content provider MUST be treated as robot or crawler traffic and MUST be excluded from all COUNTER reports.