Annex J: List of internet robots, crawlers and spiders

The growing use of web crawler robots has the potential to inflate usage statistics. Only genuine, user-driven usage should be reported in COUNTER usage reports. Usage of full-text articles that is initiated by automatic or semi-automatic bulk download tools (such as Quosa or Pubget) should only be recorded when the user has clicked on the downloaded full-text article in order to open it.

Activity generated by internet robots and crawlers must be excluded from all COUNTER usage reports.

This list of internet robots, crawlers and spiders was published in April 2016 and updated in July 2016. Please note that it has been rationalised by removing some redundant entries (e.g. entries containing the text ‘bot’, such as msnbot, awbot, bbot and turnitinbot, are now collapsed into a single entry, ‘bot’). The list is displayed below and is also available as a downloadable file: counter_robots_list_oct-2016. Each line in the file contains a regular expression (regex), a text string that describes a search pattern. When using the exclusion list, all the regexes should be matched case-insensitively. For further information on regular expression matching, see: http://www.regular-expressions.info/quickstart.html.
Please use the form below to let us know of any user agents that should be included in this list or to suggest other amendments.
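
As an illustration of the matching rule described above, the following is a minimal Python sketch, not an official COUNTER implementation. It compiles a small sample of entries from the list case-insensitively and tests a user-agent string against them. The names SAMPLE_PATTERNS, compile_patterns and is_robot are hypothetical, and two points are assumptions rather than stated requirements: that a pattern without a ^ or $ anchor may match anywhere in the user-agent string, and that a line which fails to compile is skipped.

```python
import re

# A small sample of patterns copied from the list; a real deployment would
# read every line of the published file instead.
SAMPLE_PATTERNS = [
    r"bot",
    r"spider",
    r"crawl",
    r"^.?$",
    r"[^a]fish",
    r"curl\/7",
    r"^Mozilla$",
    r"Teleport(\s|\+)Pro",
]


def compile_patterns(patterns):
    """Compile each regex for case-insensitive matching."""
    compiled = []
    for pattern in patterns:
        try:
            compiled.append(re.compile(pattern, re.IGNORECASE))
        except re.error:
            # Assumed policy: skip a line that is not a valid regex
            # (for example, a truncated pattern) rather than fail.
            continue
    return compiled


def is_robot(user_agent, compiled):
    """Return True if any pattern is found in the user-agent string."""
    # search() lets unanchored patterns match anywhere (an assumption);
    # anchored patterns such as ^Mozilla$ still apply to the whole string.
    return any(regex.search(user_agent) for regex in compiled)


ROBOT_PATTERNS = compile_patterns(SAMPLE_PATTERNS)
```

Under these assumptions, is_robot("MSNBOT", ROBOT_PATTERNS) is true because ‘bot’ is matched case-insensitively, and an empty user-agent string is also treated as a robot through the entry ^.?$. Requests flagged in this way would then be left out of the usage counts.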

bot

spider

crawl

^.?$

[^a]fish

^IDA$

^ruby$

^voyager\/

^@ozilla\/\d

^ÆƽâºóµÄ$

alexa

Alexandria(\s|\+)prototype(\s|\+)project

AllenTrack

almaden

appie

Arachmo

architext

aria2\/\d

arks

^Array$

asterias

atomz

BDFetch

Betsie

biadu

biglotron

BingPreview

bjaaland

Blackboard[\+\s]Safeassign

blaiz\-bee

bloglines

blogpulse

boitho\.com\-dc

bookmark\-manager

Brutus\/AET

bwh3_user_agent

CakePHP

celestial

cfnetwork

checkprivacy

China\sLocal\sBrowse\s2\.6

cloakDetect

coccoc\/1\.0

Code\sSample\sWeb\sClient

ColdFusion

combine

contentmatch

ContentSmartz

core

CoverScout

curl\/7

cursor

custo

DataCha0s\/2\.0

daumoa

^\%?default\%?$

Dispatch\/\d

docomo

Download\+Master

DSurf

easydl

EBSCO\sEJS\sContent\sServer

ELinks\/

EmailSiphon

EmailWolf

EndNote

EThOS\+\(British\+Library\)

facebookexternalhit\/

favorg

FDM(\s|\+)\d

feedburner

FeedFetcher

feedreader

ferret

Fetch(\s|\+)API(\s|\+)Request

findlinks

^FileDown$

^Filter$

^firefox$

^FOCA

Fulltext

Funnelback

GetRight

geturl

GLMSLinkAnalysis

Goldfire(\s|\+)Server

google

grub

gulliver

gvfs\/

harvest

heritrix

holmes

htdig

htmlparser

HttpComponents\/1.1

HTTPFetcher

http.?client

httpget

httrack

ia_archiver

ichiro

iktomi

ilse

Indy Library

^integrity\/\d

internetseer

intute

iSiloX

java

jeeves

jobo

kyluka

larbin

libcurl

libhttp

libwww

lilina

link.?check

LinkLint-checkonly

^LinkParser\/

^LinkSaver\/

linkscan

linkwalker

livejournal\.com

LOCKSS

LongURL.API

ltx71

lwp

lycos[\_\+]

mail.ru

MarcEdit.5.2.Web.Client

mediapartners\-google

megite

MetaURI[\+\s]API\/\d\.\d

Microsoft(\s|\+)URL(\s|\+)Control

Microsoft Office Existence Discovery

Microsoft Office Protocol Discovery

Microsoft-WebDAV-MiniRedir

mimas

mnogosearch

moget

motor

^Mozilla$

^Mozilla.4\.0$

^Mozilla\/4\.0\+\(compatible;\)$

^Mozilla\/4\.0\+\(compatible;\+ICS\)$

^Mozilla\/4\.5\+\[en]\+\(Win98;\+I\)$

^Mozilla.5\.0$

^Mozilla\/5.0\+\(compatible;\+MSIE\+6\.0;\+Windows\+NT\+5\.0\)$

^Mozilla\/5\.0\+like\+Gecko$

^Mozilla\/5.0(\s|\+)Gecko\/20100115(\s|\+)Firefox\/3.6$

^MSIE

MuscatFerre

myweb

 

nagios

^NetAnts\/\d

netcraft

netluchs

ng\/2\.

Ning

no_user_agent

nomad

nutch

ocelli

Offline(\s|\+)Navigator

onetszukaj

^Opera\/4$

OurBrowser

parsijoo

pear.php.net

perman

PHP\/

pioneer

playmusic\.com

playstarmusic\.com

^Postgenomic(\s|\+)v2

powermarks

PycURL

python

Qwantify

rambler

Readpaper

redalert|robozilla

rss

scan4mail

scientificcommons

scirus

scooter

Scrapy\/1\.0\.3

^scrutiny\/\d

SearchBloxIntra

shoutcast

slurp

sogou

speedy

Strider

sunrise

T\-H\-U\-N\-D\-E\-R\-S\-T\-O\-N\-E

tailrank

Teleport(\s|\+)Pro

Teoma

titan

^Traackr\.com$

twiceler

ucsd

ultraseek

^undefined$

^unknown$

URL2File

urlaliasbuilder

urllib

^user.?agent$

validator

virus.detector

voila

^voltron$

w3af.org

w3c\-checklink

Wanadoo

Web(\s|\+)Downloader

WebCloner

webcollage

WebCopier

Webinator

weblayers

Webmetrics

webmirror

webreaper

WebStripper

WebZIP

Wget

wordpress

worm

www.gnip.com

WWW\-Mechanize

xenu

Xenu(\s|\+)Link(\s|\+)Sleuth

y!j

yacy

yahoo

yandex

zeus

zyborg

^\

Report a Change

Help us keep the information on this page accurate.
Publishers, please tell us if an update is needed; libraries, please let us know if you spot an issue.

 
 
 