
The COUNTER Code of Practice for Research Data

The Code of Practice for Research Data Usage Metrics standardizes the generation and distribution of usage metrics for research data, enabling for the first time the consistent and credible reporting of research data usage.

COUNTER welcomes input and feedback from the community on this first iteration, so that it can be further developed and refined.

 

 
Code of Practice
Glossary

Appendix A: Glossary of Terms for CoPRD

Aligned as much as possible with the COUNTER Code of Practice Release 5 glossary.

Abstract See Description.
Access_Method A COUNTER attribute indicating whether the usage related to investigations and requests was generated by a human user browsing and searching a website (Regular) or by a computer (Machine).
Author(s) See Creator(s).
Collection A curated collection of metadata about content items.
Component A uniquely identifiable constituent part of a content item composed of more than one file (digital object).
Content item A generic term describing a unit of content accessed by a user of a content host. Typical content items include articles, books, chapters, datasets, multimedia, etc.
Content provider An organization whose function is to commission, create, collect, validate, host, distribute, and trade information in electronic form.
Creator(s) The person/people who wrote/created the datasets whose usage is being reported.
Data repository A content provider that provides access to research data.
Data type The field identifying type of content. The Code of Practice for Research Data Usage Metrics only recognizes the Data type Dataset.
Dataset An aggregation of data, published or curated by a single agent, and available for access or download in one or more formats, with accompanying metadata. Other term: data package.
Description A short description of a dataset. Accessing the description falls into the usage category of Investigations.
DOI (digital object identifier) The digital object identifier is a means of identifying a piece of intellectual property (a creation) on a digital network, irrespective of its current location (IDF).
Double-click A repeated click or repeated access to the same resource by the same user within a period of 30 seconds. COUNTER requires that double-clicks must be counted as a single click.
Host types A categorization of Content Providers used by COUNTER. The Code of Practice for Research Data Usage Metrics uses the following host types:

●      Repository

●      Data Repository

Internet robot, crawler, spider An identifiable, automated program or script that visits websites and systematically retrieves information from them, often to provide indexes for search engines rather than for research. Not all programs or scripts are classified as robots.
Investigation A category of COUNTER metric types that represent a user accessing information related to a dataset (i.e. a description or detailed descriptive metadata) or the content of the dataset itself.
Log file analysis A method of collecting usage data in which the web server records all of its transactions.
Machine A category of COUNTER Metric Types that represents a machine accessing content, e.g. a script written by a researcher. This does not include robots, crawlers and spiders.
Master reports Reports that contain additional filters and breakdowns beyond those included in the standard COUNTER reports.
Metadata A series of textual elements that describes a content item but does not include the item itself. For example, metadata for a dataset would typically include publisher, a list of names and affiliations of the creators, the title and description, and keywords or other subject classifications.
Metric types, Metric_Type An attribute of COUNTER usage that identifies the nature of the usage activity.
ORCID (Open Researcher and Contributor ID) An international standard identifier for individuals (i.e. authors) to use with their name as they engage in research, scholarship, and innovation activities.
Persistent Identifier (PID) Globally unique identifier and associated metadata for research data, or other entities (articles, researchers, scholarly institutions) relevant in scholarly communication.
Platform An interface from an aggregator, publisher, or other online service that delivers the content to the user and that counts and provides the COUNTER usage reports.
Provider ID A unique identifier for a Content Provider and used by discovery services and other content sites to track usage for content items provided by that provider.
Publication date, Publication_Date An optional field in COUNTER item reports and Provider Discovery Reports. The date of release by the publisher to customers of a content item.
Publisher An organization whose function is to commission, create, collect, validate, host, distribute and trade information online and/or in printed form.
Regular A COUNTER Access_Method. Indicates that usage was generated by a human user browsing/searching a website, rather than by a computer.
Reporting period, Reporting_Period The total time period covered in a usage report.
Request A category of COUNTER Metric Types that represents a user accessing the dataset content.
Session A successful request of an online service. A single user connects to the service or database and ends by terminating activity that is either explicit (by leaving the service through exit or logout) or implicit (timeout due to user inactivity). (NISO).
SUSHI An international standard (Z39.93) that describes a method for automating the harvesting of reports. The Research Data SUSHI API Specification is an implementation of this standard for harvesting Code of Practice for Research Data Usage Metrics reports.
Total_Dataset_Investigations A COUNTER Metric_Type that represents the number of times users accessed the content of a dataset, or information describing that dataset (i.e. metadata).
Total_Dataset_Requests A COUNTER Metric_Type that represents the number of times users requested the content of a dataset. Requests may take the form of viewing, downloading, or emailing the dataset provided such actions can be tracked by the content provider’s server.
Transactions A usage event.
Unique_Dataset_Investigations A COUNTER Metric_Type that represents the number of unique datasets investigated in a user-session.
Unique_Dataset_Requests A COUNTER Metric_Type that represents the number of unique datasets requested in a user-session.
User A person who accesses the online resource.
User agent An identifier that is part of the HTTP/S protocol that identifies the software (i.e. browser) being used to access the site. May be used by robots to identify themselves.
Version Multiple versions of a dataset are defined by significant changes to the content and/or metadata, associated with changes in one or more components.
Year of publication Calendar year in which a dataset is published.

 

 

7.0 Processing Rules for Underlying Reporting Data

Usage data collected for usage report generation should ensure that only intended usage is recorded and that all requests not intended by the user are excluded.

Because the way usage records are generated can differ across platforms, it is impractical to describe all the possible filters and techniques used to clean up the data. This Code of Practice therefore specifies only the requirements to be met by data used for building usage reports.

7.1 Return codes

Return codes in this Code of Practice for Research Data Usage Metrics are not different from the specifications in the COUNTER Code of Practice Release 5. Successful and valid requests MUST be counted. Successful requests are those with specific HTTP status codes indicating successful retrieval of the content (200 and 304). HTTP status codes are defined and maintained by IETF (Fielding & Reschke, 2014).
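The return-code rule can be sketched as a simple filter over log records. This is a minimal illustration only: the record layout (a dict with a `status` field) and the function names are assumptions, not part of the Code of Practice.

```python
# HTTP status codes that Section 7.1 treats as successful retrievals.
SUCCESSFUL_STATUS_CODES = frozenset({200, 304})


def is_countable(status_code: int) -> bool:
    """Return True when the status code indicates successful retrieval."""
    return status_code in SUCCESSFUL_STATUS_CODES


def filter_successful(records):
    """Keep only records whose (hypothetical) 'status' field is 200 or 304."""
    return [r for r in records if is_countable(r["status"])]
```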

7.2 Double-click Filtering

The intent of double-click filtering is to prevent over-counting, which may occur when a user clicks the same link multiple times in succession, e.g. when frustrated by a slow internet connection. Double-click filtering applies to all metric types. The double-click filtering rule is as follows:

A “double-click” is defined as repeated access to a web accessible resource by the same user within a session, within a time period. Double-clicks on a link by the same user within a 30-second period MUST be counted as one action. For the purposes of the Code of Practice for Research Data Usage Metrics, the time window for a double-click on any page is set at a maximum of 30 seconds between the first and second mouse clicks. For example, a click at 10.01.00 and a second click at 10.01.29 would be considered a double-click (one action); a click at 10.01.00 and a second click at 10.01.35 would count as two separate single clicks (two actions).

A double-click may be triggered by a mouse-click or by pressing a refresh or back button. When two actions are made for the same URL within 30 seconds the first request MUST be removed and the second retained.

Any additional requests for the same URL within 30 seconds (between clicks) MUST be treated identically: always remove the first and retain the second.
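The double-click rule above can be sketched as follows. This is one possible reading, not a reference implementation: the event layout (a tuple of user/session key, URL, timestamp) and the function name are assumptions, and a gap of exactly 30 seconds is treated as a double-click.

```python
from datetime import datetime, timedelta

DOUBLE_CLICK_WINDOW = timedelta(seconds=30)


def filter_double_clicks(events):
    """Collapse double-clicks, retaining the last click of each run.

    `events` is an iterable of (user_key, url, timestamp) tuples; this
    layout is an assumption made for the sketch.
    """
    kept = []
    last_index = {}  # (user_key, url) -> position in `kept` of the latest retained click
    for user_key, url, ts in sorted(events, key=lambda e: e[2]):
        key = (user_key, url)
        idx = last_index.get(key)
        if idx is not None and ts - kept[idx][2] <= DOUBLE_CLICK_WINDOW:
            # Within the window: remove the earlier click, retain this one.
            kept[idx] = (user_key, url, ts)
        else:
            last_index[key] = len(kept)
            kept.append((user_key, url, ts))
    return sorted(kept, key=lambda e: e[2])
```

Because each retained click replaces its predecessor, a third click within 30 seconds of the second is compared against the second click, which matches the "always remove the first and retain the second" wording.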

There are different ways to track whether two requests for the same URL are from the same user and session. These options are listed in order of increasing reliability, with Option 4 being the most reliable.

  1. If the user is identified only through their IP address, that IP combined with the browser’s user-agent (presented in the HTTP header) MUST be used to trace double-clicks. Multiple users on a single IP address with the same browser user-agent can occasionally lead to separate clicks from different users being logged as a double-click from one user. This will only happen if the multiple users are clicking on exactly the same content within a few seconds of each other. One-hour slices MUST be used as sessions.
  2. When a session cookie is implemented and logged, the session cookie MUST be used to identify double-clicks.
  3. When a user cookie is available and logged, the user cookie MUST be used to identify double-clicks.
  4. When an individual has logged in with their own profile, their username MUST be used to trace double-clicks.
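One way to apply the four options is to pick the most reliable identifier present in each log record. The sketch below assumes a dict-shaped record with hypothetical field names (`username`, `user_cookie`, `session_cookie`, `ip`, `user_agent`, `timestamp`); choosing the most reliable available option is an interpretation of the list above, not a stated requirement.

```python
from datetime import datetime


def double_click_key(record):
    """Return the identifier used to trace double-clicks for one record.

    Field names are hypothetical; options are tried from most to least reliable.
    """
    if record.get("username"):        # Option 4: personal login
        return ("username", record["username"])
    if record.get("user_cookie"):     # Option 3: user cookie
        return ("user_cookie", record["user_cookie"])
    if record.get("session_cookie"):  # Option 2: session cookie
        return ("session_cookie", record["session_cookie"])
    # Option 1: IP address + user agent, with one-hour slices as sessions.
    ts = record["timestamp"]
    return ("ip_ua", record["ip"], record["user_agent"], ts.strftime("%Y-%m-%d"), ts.hour)


# Example record with only an IP address and user agent available.
example = {
    "ip": "192.1.1.168",
    "user_agent": "Mozilla/5.0",
    "timestamp": datetime(2017, 6, 15, 13, 35),
}
example_key = double_click_key(example)
```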

7.3 Counting Unique Datasets

Some metric types count the number of unique items that had a certain activity, such as Unique_Dataset_Requests or Unique_Dataset_Investigations.

For the purpose of metrics, a dataset is the typical unit of content being accessed by users. The dataset MUST be identified using a unique identifier such as a DOI, regardless of format.

The rules for calculating the unique dataset counts are as follows:

Multiple activities qualifying for the metric type in question representing the same dataset and occurring in the same user-session MUST be counted as only one “unique” activity for that dataset.

A “User Session” is defined as activity by a user in a period of one hour. It may be identified in any of the following ways: by a logged session ID + transaction date, by a logged user ID (if users log in with personal accounts) + transaction date + hour of day (day is divided into 24 one-hour slices), by a logged user cookie + transaction date + hour of day, or by a combination of IP address + user agent + transaction date + hour of day.
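The unique-counting rule can be sketched as de-duplication on the pair (user-session, dataset). The input layout and function name below are assumptions for illustration; events are expected to have already been filtered to those qualifying for the metric type being computed.

```python
def count_unique_datasets(events):
    """Count each dataset at most once per user-session.

    `events` is an iterable of (session_id, dataset_id) tuples; this layout
    is an assumption of the sketch.
    """
    seen = set()   # (session_id, dataset_id) pairs already counted
    counts = {}    # dataset_id -> number of user-sessions with qualifying activity
    for session_id, dataset_id in events:
        if (session_id, dataset_id) in seen:
            continue
        seen.add((session_id, dataset_id))
        counts[dataset_id] = counts.get(dataset_id, 0) + 1
    return counts
```

For example, three requests for the same dataset in one user-session and one request in another yield a unique count of 2 for that dataset.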

To allow for simplicity in calculating User Sessions when a session ID is not explicitly tracked, the day will be divided into 24 one-hour slices and a surrogate session ID will be generated by combining the transaction date + hour time slice + one of the following: user ID, cookie ID, or IP address + user agent. For example, consider the following transaction:

  • Transaction date/time: 2017-06-15 13:35
  • IP address: 192.1.1.168
  • User agent: Mozilla/5.0
  • Generated session ID: 192.1.1.168|Mozilla/5.0|2017-06-15|13

The above surrogate session ID does not provide an exact analogy to a session. However, statistical studies show that using such a surrogate session ID results in unique counts within 1–2% of unique counts generated with actual sessions.
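The surrogate session ID from the example transaction can be generated along these lines. The function name and the `|` separator follow the example above; zero-padding the hour to two digits is an assumption.

```python
from datetime import datetime


def surrogate_session_id(identifier: str, timestamp: datetime) -> str:
    """Combine an identifier with the transaction date and one-hour slice.

    `identifier` is a user ID, a cookie ID, or "IP address|user agent".
    """
    date_part = timestamp.strftime("%Y-%m-%d")
    hour_slice = f"{timestamp.hour:02d}"  # zero-padding is an assumption
    return f"{identifier}|{date_part}|{hour_slice}"


# The transaction from the example above.
session_id = surrogate_session_id(
    "192.1.1.168|Mozilla/5.0", datetime(2017, 6, 15, 13, 35)
)
# session_id == "192.1.1.168|Mozilla/5.0|2017-06-15|13"
```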

7.4 Attributing Usage when Item Appears in More Than One Database

Content providers that offer databases where a given dataset is included in multiple databases MUST attribute the Investigations and Requests metrics to just one database. They could use a consistent method of prioritizing databases or pick the database randomly.
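A consistent prioritization method could look like the sketch below. The priority list, database names, and the alphabetical tie-break for unlisted databases are all hypothetical choices, shown only to illustrate attributing usage to exactly one database.

```python
def attribute_database(candidate_databases, priority):
    """Pick the single database to which usage is attributed.

    `priority` lists database names from highest to lowest priority
    (hypothetical). Databases not in the list rank last, ordered by name
    so that the choice stays deterministic.
    """
    rank = {name: i for i, name in enumerate(priority)}
    return min(candidate_databases, key=lambda name: (rank.get(name, len(rank)), name))
```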

7.5 Internet Robots and Crawlers

The intent is to exclude web robots and spiders but include usage by humans accessing content through a scripting language or automated tool, whether interactively or standalone.

Web robots and crawlers intended for search indexing and related applications SHOULD be excluded via the application of a blacklist of known user agents for these robots. This blacklist MUST NOT include general purpose user agents that are commonly used by researchers (e.g., python, curl, wget, and Java), and the blacklist will be maintained as a subset of the COUNTER Code of Practice Release 5 list of internet robots and crawlers (COUNTER-Robots, 2017). Generally, user agents reflecting programmatic access to specific datasets will not be included in the blacklist.

Usage counts by scripted and automated processes MUST NOT be excluded unless they can demonstrably be shown to originate from a blacklisted agent, such as an IP address of a known search agent. New or unknown user agents SHOULD be counted unless there is demonstrable evidence that they represent solely a web indexing agent.
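Blacklist-based exclusion can be sketched as a pattern match on the user agent. The two patterns below are placeholders chosen for illustration and are not taken from the COUNTER robots list; a real deployment would load the maintained list. The record layout (a dict with a `user_agent` field) is also an assumption.

```python
import re

# Placeholder patterns for illustration only, NOT the maintained blacklist.
ROBOT_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (r"googlebot", r"bingbot")]


def is_blacklisted_robot(user_agent: str) -> bool:
    """Return True only when the user agent matches a blacklisted pattern."""
    return any(p.search(user_agent) for p in ROBOT_PATTERNS)


def exclude_robots(records):
    """Drop records from blacklisted agents; unknown agents are kept."""
    return [r for r in records if not is_blacklisted_robot(r.get("user_agent", ""))]
```

General-purpose agents such as python, curl, wget, and Java are absent from the placeholder list, so their usage is retained, as are new or unknown agents.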

7.6 Machine Access

Many researchers access and analyze data using scripts or automated tools, especially large data sets, and excluding those uses would be inaccurate and bias the counts. The Access_Method of type Machine is used to distinguish this kind of access.

7.6.1 Principles for reporting usage

  • The Code of Practice for Research Data Usage Metrics does not record machine use itself, as most of this activity takes place after a dataset has been downloaded. All we can do is track the count of datasets downloaded using machines.
  • Usage associated with machine access activity MUST be tracked by assigning an Access_Method of Machine.
  • Usage associated with machine activity MUST be reported using the Dataset Master Report by identifying such usage as “Access_Method=Machine”.

7.6.2 Detecting machine activity

For the purpose of reporting usage according to the Code of Practice for Research Data Usage Metrics, machine access does not require prior permission and/or the use of specific endpoints or protocols. This is in contrast to the COUNTER Code of Practice Release 5.

The distinction between legitimate machine use and robot or web crawler traffic is made based on the user agent (see Section 7.5).
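A user-agent-based classification might be sketched as below. Both patterns are assumptions: the robot pattern is a placeholder rather than the maintained blacklist, and the machine pattern simply looks for the general-purpose agents named in Section 7.5 at the start of the user-agent string. Treating every other agent as Regular is likewise a simplification.

```python
import re

# Placeholder robot pattern for illustration, NOT the maintained blacklist.
ROBOT_PATTERN = re.compile(r"googlebot|bingbot", re.IGNORECASE)
# Hypothetical heuristic: general-purpose agents named in Section 7.5.
MACHINE_PATTERN = re.compile(r"^(python|curl|wget|java)", re.IGNORECASE)


def classify_access(user_agent: str):
    """Return None for blacklisted robots, else the Access_Method value."""
    if ROBOT_PATTERN.search(user_agent):
        return None  # excluded from usage counts
    if MACHINE_PATTERN.match(user_agent):
        return "Machine"
    return "Regular"
```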

 

