The Code of Practice for Research Data Usage Metrics standardizes the generation and distribution of usage metrics for research data, enabling for the first time the consistent and credible reporting of research data usage.
COUNTER welcomes input and feedback from the community on this first iteration, so that it can be further developed and refined.
This glossary is aligned as much as possible with the COUNTER Code of Practice Release 5 glossary.
|Access_Method||A COUNTER attribute indicating whether the usage related to investigations and requests was generated by a human user browsing and searching a website (Regular) or by a computer (Machine).|
|Collection||A curated collection of metadata about content items.|
|Component||A uniquely identifiable constituent part of a content item composed of more than one file (digital object).|
|Content item||A generic term describing a unit of content accessed by a user of a content host. Typical content items include articles, books, chapters, datasets, multimedia, etc.|
|Content provider||An organization whose function is to commission, create, collect, validate, host, distribute, and trade information in electronic form.|
|Creator(s)||The person/people who wrote/created the datasets whose usage is being reported.|
|Data repository||A content provider that provides access to research data.|
|Data type||The field identifying type of content. The Code of Practice for Research Data Usage Metrics only recognizes the Data type Dataset.|
|Dataset||An aggregation of data, published or curated by a single agent, and available for access or download in one or more formats, with accompanying metadata. Other term: data package.|
|Description||A short description of a dataset. Accessing the description falls into the usage category of Investigations.|
|DOI (digital object identifier)||The digital object identifier is a means of identifying a piece of intellectual property (a creation) on a digital network, irrespective of its current location (IDF).|
|Double-click||A repeated click or repeated access to the same resource by the same user within a period of 30 seconds. COUNTER requires that double-clicks must be counted as a single click.|
|Host types||A categorization of Content Providers used by COUNTER. The Code of Practice for Research Data Usage Metrics uses the following host types:
● Data Repository
|Internet robot, crawler, spider||An identifiable, automated program or script that visits websites and systematically retrieves information from them, often to provide indexes for search engines rather than for research. Not all programs or scripts are classified as robots.|
|Investigation||A category of COUNTER metric types that represent a user accessing information related to a dataset (i.e. a description or detailed descriptive metadata) or the content of the dataset itself.|
|Log file analysis||A method of collecting usage data in which the web server records all of its transactions.|
|Machine||A category of COUNTER Metric Types that represents a machine accessing content, e.g. a script written by a researcher. This does not include robots, crawlers and spiders.|
|Master reports||Reports that contain additional filters and breakdowns beyond those included in the standard COUNTER reports.|
|Metadata||A series of textual elements that describes a content item but does not include the item itself. For example, metadata for a dataset would typically include publisher, a list of names and affiliations of the creators, the title and description, and keywords or other subject classifications.|
|Metric types, Metric_Type||An attribute of COUNTER usage that identifies the nature of the usage activity.|
|ORCID (Open Researcher and Contributor ID)||An international standard identifier for individuals (i.e. authors) to use with their name as they engage in research, scholarship, and innovation activities.|
|Persistent Identifier (PID)||Globally unique identifier and associated metadata for research data, or other entities (articles, researchers, scholarly institutions) relevant in scholarly communication.|
|Platform||An interface from an aggregator, publisher, or other online service that delivers the content to the user and that counts and provides the COUNTER usage reports.|
|Provider ID||A unique identifier for a Content Provider and used by discovery services and other content sites to track usage for content items provided by that provider.|
|Publication date, Publication_Date||An optional field in COUNTER item reports and Provider Discovery Reports. The date of release by the publisher to customers of a content item.|
|Publisher||An organization whose function is to commission, create, collect, validate, host, distribute and trade information online and/or in printed form.|
|Regular||A COUNTER Access_Method. Indicates that usage was generated by a human user browsing/searching a website, rather than by a computer.|
|Reporting period, Reporting_Period||The total time period covered in a usage report.|
|Request||A category of COUNTER Metric Types that represents a user accessing the dataset content.|
|Session||A successful request of an online service. A single user connects to the service or database and ends by terminating activity that is either explicit (by leaving the service through exit or logout) or implicit (timeout due to user inactivity). (NISO).|
|SUSHI||An international standard (Z39.93) that describes a method for automating the harvesting of reports. The Research Data SUSHI API Specification is an implementation of this standard for harvesting Code of Practice for Research Data Usage Metrics reports.|
|Total_Dataset_Investigations||A COUNTER Metric_Type that represents the number of times users accessed the content of a dataset, or information describing that dataset (i.e. metadata).|
|Total_Dataset_Requests||A COUNTER Metric_Type that represents the number of times users requested the content of a dataset. Requests may take the form of viewing, downloading, or emailing the dataset provided such actions can be tracked by the content provider’s server.|
|Transactions||A usage event.|
|Unique_Dataset_Investigations||A COUNTER Metric_Type that represents the number of unique datasets investigated in a user-session.|
|Unique_Dataset_Requests||A COUNTER Metric_Type that represents the number of unique datasets requested in a user-session.|
|User||A person who accesses the online resource.|
|User agent||An identifier that is part of the HTTP/S protocol that identifies the software (e.g. a browser) being used to access the site. May be used by robots to identify themselves.|
|Version||Multiple versions of a dataset are defined by significant changes to the content and/or metadata, associated with changes in one or more components.|
|Year of publication||Calendar year in which a dataset is published.|
Master Reports include all relevant metrics and attributes; they are intended to be customizable through the application of filters and other configuration options, allowing users to create a report specific to their needs. The Master Report used in the Code of Practice for Research Data Usage Metrics is shown in Table 3.1, along with its Report ID, Report Name, and the Host Types that are expected to provide it (see Section 3.3.1 for details on Host Types).
MUST be provided
|DSR||Dataset Master Report||A granular customizable report showing activity at the level of the dataset that allows the user to apply filters and select configuration options.||Repository
Code of Practice for Research Data Usage Metrics reports can be delivered in tabular form or as a machine-readable JSON file via the SUSHI protocol. The tabular form MUST be a tab-separated-value Unicode text file. The machine-readable format MUST comply with the Research Data SUSHI API Specification (see Section 8).
All reports have the same layout and structure. Note that the Research Data SUSHI API Specification includes the same elements with the same or similar names; therefore, understanding the tabular reports translates to an understanding of what is REQUIRED in reports retrieved via SUSHI.
All reports have a header. In tabular reports, the header is separated from the body with a blank row. Beneath that is the body of the report with column headings. The contents of the body will vary by report. All of this is discussed in more detail below.
The first 10 rows of a tabular report contain the header, and the 11th row is always blank. The COUNTER Code of Practice Release 5 rows Institution_Name and Institution_ID are not used. The header information is presented as a series of name-value pairs, with the names appearing in Column A and the corresponding values appearing in Column B. All tabular reports have the same names in Column A. Column B entries will vary by report.
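The header layout described above can be read mechanically. The following is a minimal sketch, assuming the report is already available as a Unicode string; the function name `read_report_header` and the sample values are illustrative and not part of the Code of Practice.

```python
HEADER_ROWS = 10  # rows 1-10 hold the header; row 11 MUST be blank


def read_report_header(tsv_text):
    """Return the header as a dict mapping Column A names to Column B values."""
    lines = tsv_text.splitlines()
    if len(lines) <= HEADER_ROWS:
        raise ValueError("report too short: expected 10 header rows and a blank row")
    if lines[HEADER_ROWS].strip():
        raise ValueError("row 11 must be blank")
    header = {}
    for line in lines[:HEADER_ROWS]:
        cells = line.split("\t")
        # Column A is the element name, Column B (if present) is its value.
        header[cells[0]] = cells[1] if len(cells) > 1 else ""
    return header


# Illustrative values only, modeled on the examples given for the header elements.
SAMPLE_REPORT = "\n".join([
    "Report_Name\tDataset Report",
    "Report_ID\tdsr-12hd-zt65",
    "Release\tRD1",
    "Metric_Types\tUnique_Dataset_Investigations; Unique_Dataset_Requests",
    "Report_Filters\tAccess_Method=Regular",
    "Report_Attributes\t",
    "Exceptions\t",
    "Reporting_Period\tbegin_date=2016-01-01; end_date=2016-08-30",
    "Created\t2016-10-11",
    "Created_By\tDataONE",
    "",
    "(column headings and report body would follow here)",
])

header = read_report_header(SAMPLE_REPORT)
print(header["Release"])
```

A consumer could run this check before parsing the body, rejecting files whose 11th row is not blank.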
|Element Name||Description of value to provide||Example|
|Report_Name||The name of the report as it appears in Sections 3.1 and 3.2 of this document. Must be Dataset Report.||Dataset Report|
|Report_ID||The unique identifier for the report that is used in SUSHI requests.||dsr-12hd-zt65|
|Release||The Code of Practice for Research Data Usage Metrics release this report complies with. Must be RD1.||RD1|
|Metric_Types||A semicolon-space (“; ”) delimited list of metric types requested for this report. Note that even though a Metric Type was requested, it might not be included in the body of the report if no report items had usage of that type.||Unique_Dataset_Investigations; Unique_Dataset_Requests|
|Report_Filters||A series of zero or more report filters applied on the reported usage, excluding metric types (which appear in a separate row). Typically, a filter affects the amount of usage reported. Entries appear in the form of “filter_Name=filter_Value” with multiple filter name-value pairs separated with a semicolon-space (“; ”) and multiple filter values for a single filter name separated by the vertical pipe (“|”) character.||Access_Method=Regular;
|Report_Attributes||A series of zero or more report attributes applied to the report. Typically, an attribute affects how the usage is presented but does not change the numbers.
Entries appear in the form of “attribute_name=attribute_value” with multiple attribute name-value pairs separated with a semicolon-space (“; ”) and multiple attribute values for a single attribute name separated by the vertical pipe (“|”) character.
|Exceptions||An indication of some difference between the usage that was created and the usage that is being presented in the report. The format for the exception values is: “Error_No: Exception_Description” (Data). The Error_No and Exception_Description MUST match values provided in Table B.1 of Appendix B. The data is OPTIONAL.
Note that for tabular reports, only the limited set of exceptions where usage is returned will apply.
|3040: Partial Data Returned (request was for 2016-01-01 to 2016-12-31; however, usage is only available to 2016-08-30).
3040: Partial Data Returned
|Reporting_Period||The date range for the usage represented in the report, in the form of: “begin_date=yyyy-mm-dd; end_date=yyyy-mm-dd”. Should conform with ISO 8601 (ISO 8601:2004 – Data elements and interchange formats, 2004).
The begin_date MUST be the first day of the month, whereas the end_date can be the last day of the month for a complete monthly report, or any other day in the month for a partial monthly report (see Section 3.3.7).
|Created||The date the usage was prepared, in the form of “yyyy-mm-dd” according to ISO 8601 (ISO 8601:2004 – Data elements and interchange formats, 2004).||2016-10-11|
|Created_By||The name of the organization or system that created the report||DataONE|
|(blank row)||Row 11 MUST be blank|
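The Report_Filters and Report_Attributes cells share one name-value syntax: pairs separated by semicolon-space, with multiple values for one name separated by a vertical pipe. The sketch below shows one way to parse and re-serialize such a cell. The function names are hypothetical, the sample filter names are used for illustration only, and the parser deliberately tolerates a missing space after the semicolon, which is an assumption rather than something the Code of Practice permits.

```python
def parse_name_value_list(cell):
    """Parse 'name=value; name=v1|v2' into {name: [values]}."""
    result = {}
    # Split on ";" and strip, so both "; " and a bare ";" are accepted.
    for pair in cell.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # empty cell or trailing delimiter
        name, sep, values = pair.partition("=")
        if not sep:
            raise ValueError(f"expected name=value, got {pair!r}")
        result[name] = values.split("|")
    return result


def format_name_value_list(mapping):
    """Serialize {name: [values]} back to the semicolon-space delimited form."""
    return "; ".join(f"{name}={'|'.join(values)}" for name, values in mapping.items())


filters = parse_name_value_list("Access_Method=Regular|Machine; YOP=2016")
print(filters)
print(format_name_value_list(filters))
```

Keeping every value as a list, even when only one value is present, avoids special cases when a filter later gains a second value.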
COUNTER Code of Practice Release 5 introduced several new elements and attributes in order to help organize the information in a single, consistent, and coherent Code of Practice. The Code of Practice for Research Data Usage Metrics uses a subset of these elements and attributes relevant for research data.
Research data usage reports are provided by different types of content hosts, and the usage reporting needs vary by host type. Although the “Host Type” does not appear on the report, the Code of Practice uses “Host Types” throughout this document to help content providers identify which reports, elements, metric types, and attributes are relevant to them.
The Code of Practice for Research Data Usage Metrics uses the following host types:
|Host Type Category||Description||Example|
|Repository||A content provider that hosts multiple research output types including research data. Institutional repositories are typically in this category.||Figshare
|Data Repository||A repository hosting only research data. Disciplinary repositories are typically in this category.||CDL Dash,
Dryad Digital Repository
The COUNTER Code of Practice Release 5 reports scholarly information in many ways. These major groupings are referred to as Data Types. Only the Dataset Data Type is used by the Code of Practice for Research Data Usage Metrics. Reporting of collections is restricted to pre-set collections that are defined like databases.
|Data Type||Description||Host Types||Reports (Abbrev)|
The following metric types are defined to enable reporting. They do not differ significantly from those in the COUNTER Code of Practice Release 5.
Investigations and Requests of Items and Titles
This group of Metric Types represents activities where datasets were retrieved (Requests) or information about a dataset (e.g. metadata) was examined (Investigations). Any user activity that can be attributed to a Dataset will be considered an Investigation, including downloading or viewing the Dataset. Requests are limited to activity related to retrieving or viewing the Dataset itself.
For the metric types that begin with Total_, if a dataset was accessed multiple times in a session, the metric increases by the number of times the Dataset was accessed (minus any adjustments for double-clicks).
Unique_Dataset metrics help eliminate the effect that different styles of user interface may have on usage counts. If the same dataset was accessed multiple times in a given session, the corresponding metric can only increase by 1, simply indicating that the dataset was accessed in the session.
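The difference between Total_ and Unique_ counting can be illustrated with a small sketch. It assumes events have already been reduced to `(session_id, dataset_id, timestamp_in_seconds)` tuples for one metric family (Investigations or Requests), that the session stands in for the user when detecting double-clicks, and that a chain of clicks each within 30 seconds of the previous one collapses into a single count. The event shape and the function name `count_usage` are hypothetical.

```python
from collections import defaultdict

DOUBLE_CLICK_WINDOW = 30  # seconds, per the glossary definition of Double-click


def count_usage(events):
    """Return {dataset_id: {"total": int, "unique": int}} for the given events."""
    last_seen = {}                # (session, dataset) -> time of the previous click
    totals = defaultdict(int)     # dataset -> Total_ count after double-click filtering
    sessions = defaultdict(set)   # dataset -> sessions in which it was accessed
    for session_id, dataset_id, ts in sorted(events, key=lambda e: e[2]):
        key = (session_id, dataset_id)
        previous = last_seen.get(key)
        last_seen[key] = ts
        sessions[dataset_id].add(session_id)
        if previous is not None and ts - previous <= DOUBLE_CLICK_WINDOW:
            continue  # double-click: counted together with the previous click
        totals[dataset_id] += 1
    return {d: {"total": totals[d], "unique": len(sessions[d])} for d in sessions}


events = [
    ("s1", "ds1", 0),
    ("s1", "ds1", 10),    # within 30 seconds of the previous click: double-click
    ("s1", "ds1", 100),   # same session, later access: adds to Total_ only
    ("s2", "ds1", 105),   # new session: adds to both Total_ and Unique_
    ("s1", "ds2", 200),
]
usage = count_usage(events)
print(usage)
```

In this sample, `ds1` ends with a total of 3 and a unique count of 2, because the second click is folded into the first and the two sessions each contribute one unique access.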
|Metric Type||Description||Host Type||Reports|
|Total_Dataset_Investigations||Total number of times a dataset or information related to a dataset was accessed, and the volume of data in megabytes that was transferred. Double-click filters are applied to these transactions. Investigations (counts and volume) are reported for each version of the dataset and for the cumulative total across versions.||Repository
|Unique_Dataset_Investigations||Number of datasets investigated in unique user-sessions. If investigations for multiple components of the same dataset occur in the same user-session, there MUST be only one “unique” activity counted for that Dataset. Investigations (counts and volume) are reported for each version of the dataset and for the cumulative total across versions.||Repository
|Total_Dataset_Requests||Total number of times a dataset was retrieved (the content was accessed or downloaded in full or a section of it), and the volume of data in megabytes that was transferred. Double-click filters are applied. Requests (counts and volume) are reported for each version of the dataset and for the cumulative total across versions.||Repository
|Unique_Dataset_Requests||Number and data volume of datasets requested in unique user-sessions. If requests for multiple components of the same dataset occur in the same user-session, there MUST be only one “unique” activity counted for that Dataset. Requests (counts and volume) are reported for each version of the dataset and for the cumulative total across versions.||Repository
In order to track content usage by machines, and to keep that usage separate from usage by humans, the Access_Method attribute is used.
|Regular||Refers to activities on a platform or content host that represent typical user behavior.||Repository
|Machine||Refers to activities on a platform or content host that represent typical machine behavior. This includes only legitimate machine access and excludes internet robots and crawlers (see Section 7.8).||Repository
Analyzing usage by the age of the content is also desired. The “YOP” usage attribute represents year of publication.
|yyyy||The Year of Publication for the dataset as a four-digit year. If the year of publication is not known, use a value of 0001.||Repository
The reporting period can end before the last day of the month, in which case the report for that month will be partial. This enables incremental updates of usage reporting during the course of a month. These incremental updates always replace the previous report for that month. Reporting of usage broken down by day is not supported in this release of the Code of Practice for Research Data Usage Metrics.
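Whether a report is complete or partial follows directly from the end date. The sketch below is one possible check, assuming the begin and end dates have already been parsed into `datetime.date` values; the function name `final_month_status` is illustrative.

```python
import calendar
from datetime import date


def final_month_status(begin_date, end_date):
    """Return "complete" or "partial" for the last month of a reporting period."""
    if begin_date.day != 1:
        raise ValueError("begin_date MUST be the first day of the month")
    if end_date < begin_date:
        raise ValueError("end_date must not precede begin_date")
    # monthrange returns (weekday of first day, number of days in the month).
    last_day = calendar.monthrange(end_date.year, end_date.month)[1]
    return "complete" if end_date.day == last_day else "partial"


print(final_month_status(date(2016, 1, 1), date(2016, 8, 30)))
print(final_month_status(date(2016, 2, 1), date(2016, 2, 29)))
```

A harvester could use the "partial" result to decide that a later report for the same month should replace the stored one.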
Inclusion of zero-usage reporting for everything, including unsubscribed content, could make reports unmanageably large.