Scientometrics 2.0: Toward new metrics of scholarly impact on the social Web
The growing flood of scholarly literature is exposing the weaknesses of current, citation–based methods of evaluating and filtering articles. A novel and promising approach is to examine the use and citation of articles in a new forum: Web 2.0 services like social bookmarking and microblogging. Metrics based on these data could build a “Scientometrics 2.0,” supporting richer and more timely pictures of articles’ impact. This paper develops the most comprehensive list of these services to date, assessing the potential value and availability of data from each. We also suggest the next steps toward building and validating metrics drawn from the social Web.

Contents

1. Introduction
2. Scientometrics: The state of the art
3. Science and Web 2.0
4. Sources for scientometrics 2.0
5. Conclusions

 


 

1. Introduction

One of the key problems facing scholarship today is the growth in the size of its literature (Rowlands and Nicholas, 2005). Scientists read 50 percent more papers than they did in the 1970s, spending less time on average with each one (Renear and Palmer, 2009). “Article overload” hits evaluators like tenure and promotion (T&P) committees as well; often there is simply too much work being published, in fields too specialized, for evaluators to fairly examine a scientist’s work (Monastersky, 2005). Evaluators often rely on numerically–based shortcuts drawn from the closely related fields (Hood and Wilson, 2001) of bibliometrics and scientometrics — in particular, Thomson Scientific’s Journal Impact Factor (JIF). However, despite the popularity of this measure, it is slow (Brody and Harnad, 2005); narrow (Anderson, 2009); secretive and irreproducible (Rossner, et al., 2007); open to gaming (Falagas and Alexiou, 2008); and based on journals, not the articles they contain.

Today, citations are no longer the only source of impact metrics. The Web can be mined for impact indicators, just as the JIF mines the citations recorded in the Journal Citation Report (Thelwall, 2008). Under the banner of “webometrics,” researchers have examined links to scholarly articles from sources like personal Web pages (Vaughan and Shaw, 2005) and online syllabi (Kousha and Thelwall, 2008). Others have exploited the migration of articles from paper–based to electronic representations to build metrics of influence based on articles’ downloads (Kurtz, et al., 2005; Bollen, et al., 2009). Both approaches take advantage of the pervasiveness and importance of new tools (Web pages, search engines, e–journals) to inform broader, faster, and more open metrics of impact.

Just as the early growth of the Web supported webometrics and usage–based metrics, the current emergence of “Web 2.0” presents a new window through which to view the impact of scholarship. These days, scholars who would not cite an article or add it to their Web pages may bookmark, tweet, or blog it. Arguably, these and related activities reflect impact and influence in ways that have until now eluded measurement. Many Web 2.0 tools offer real–time access to structured data via application programming interfaces (APIs) and capture diverse types of usage and diverse audiences. These qualities give Web 2.0–based metrics unique value for the construction of real–time filters to help tame article overload. In addition, this approach empowers evaluators and evaluated alike with broader, more finely textured, and more up–to–date pictures of articles’ impact. Using data from Web 2.0 tools, researchers could investigate models that distill multiple metrics, or build visualizations of activity across tools; ultimately, the study of these data can build a rich “scientometrics 2.0.”
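As a concrete (if purely illustrative) sketch of what distilling multiple metrics might look like, the short example below combines normalized counts from several sources into a single composite indicator. The source names, corpus maxima, and weights are all assumptions for demonstration, not a proposed standard.

```python
# Illustrative sketch only: combine counts from several (hypothetical) Web 2.0
# sources into a single composite impact indicator. Source names and weights
# are assumptions for demonstration, not a proposed standard.

def composite_score(counts, corpus_max, weights):
    """Normalize each raw count against the corpus maximum for that source,
    then return a weighted sum in the range [0, 1]."""
    score = 0.0
    for source, weight in weights.items():
        raw = counts.get(source, 0)
        ceiling = corpus_max.get(source, 1) or 1
        score += weight * min(raw / ceiling, 1.0)
    return score

# Example: one article's activity across three sources (invented numbers).
counts = {"bookmarks": 42, "tweets": 17, "blog_posts": 3}
corpus_max = {"bookmarks": 300, "tweets": 500, "blog_posts": 20}
weights = {"bookmarks": 0.5, "tweets": 0.2, "blog_posts": 0.3}

print(round(composite_score(counts, corpus_max, weights), 3))
```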

While several writers have recognized the potential of scientometrics 2.0, none have set out a comprehensive and detailed list of sources aimed at informing future empirical work. This article attempts to provide such a list. After a brief literature review, we present seven categories of Web 2.0 tools that might be productively mined: bookmarking, reference managers, recommendation services, comments on articles, microblogging, Wikipedia, and blogging. In each category, we sample research in that area, list specific tools, and assess the availability of data. In concluding the article, we offer a map for future work, including the mining, aggregation, and description of data from Web 2.0 tools.

 


2. Scientometrics: The state of the art

2.1. Citation counting and the JIF

Created by Garfield (1972), the Journal Impact Factor (JIF) is a measure of a journal’s average citations per article. Though originally conceived as a way to assess journals, it is now often used to establish the value of the articles published in those journals, and by extension the quality of individual scientists’ work (Fuyuno and Cyranoski, 2006). It is becoming increasingly apparent, however, that the JIF has serious shortcomings when used for any of these purposes. Perhaps the most important weakness is that the JIF examines journals, not articles; Seglen [1] noted “[t]he citedness of journal articles … does not seem to be detectably influenced by the status of the journal in which they are published.” The JIF is a proprietary measure whose results have defied duplication (Rossner, et al., 2007; Rossner, et al., 2008); moreover, results can be — and are — easily gamed (Falagas and Alexiou, 2008). As many have shown (see MacRoberts and MacRoberts, 2009, for review), much scientific impact goes uncited; the JIF ignores this. Most importantly, perhaps, the timeliness of the JIF is limited by the long time it takes for an article to accumulate citations. Alternative citation–based metrics like the Eigenfactor (Bergstrom, 2007) and H–index (Hirsch, 2005) correct many of the JIF’s flaws, but still suffer from this delay.

2.2. Metrics on Web 1.0: Webometrics and usage–based metrics

Researchers interested in scholarly communication were quick to recognize that the Web, as a “nutrient–rich resource space for scholars” [2], offered new opportunities to examine and measure scholarly impact. The Web would open up our ability to measure researchers’ “scientific ‘street cred’” [3] and develop “… an embryology of learned inquiry.” [4] Data from the Web would “… give substance to modes of influence which have historically been backgrounded in narratives of science.” [5] Pursuit of these goals has followed two main strands: the analysis of Web citations and of article usage data.

Scholars in the field of webometrics have undertaken Web citation analysis, “using the Web to count how often journal articles are cited.” [6] For instance, Vaughan and Shaw (2005; 2008) look at the relationship between Web mentions of articles’ titles and traditional text citations, finding significant correlation. Kousha and Thelwall (2007) also uncover moderate correlation in seven distinct disciplines between Web/URL citations (mentions of either an article’s title or URL) and text citations as recorded by the Institute for Scientific Information, whose numbers inform the JIF. More focused work has specifically investigated Web citations from syllabi to articles, suggesting that this could measure an article’s impact on teaching (Kousha and Thelwall, 2008).

A related approach is the analysis of server download data for articles. The migration of academic literature to the Web allows us to examine views or downloads for most articles; instead of measuring an article’s impact on authors (who may or may not cite it), usage data supports measurement of impact on readers. Marek and Valauskas (2002) examined the logs of the journal First Monday to identify classic papers that were repeatedly downloaded between 1999 and 2001 as an alternative to citation analysis. Reporting on the work of the MESUR project, a comprehensive examination of over 300 million user interaction events across a wide range of disciplines and publishers, Bollen, et al. (2009) argue persuasively for the robustness and broad usefulness of metrics based on this usage data. Several investigations have found articles’ early downloads to correlate well with later citation, including Brody, et al. (2006), Watson (2009), Perneger (2004), and a Nature Neuroscience editorial (Anonymous, 2008).
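Studies of this kind generally report rank correlations between early usage and later citation. The sketch below shows how such a correlation might be computed with Spearman’s rho; the download and citation counts are invented for illustration.

```python
# Illustrative only: compute a rank correlation between early download counts
# and later citation counts for a set of articles. The data are invented.
from scipy.stats import spearmanr

early_downloads = [120, 45, 300, 10, 80, 150, 60]   # e.g., first six months
later_citations = [15, 3, 40, 1, 12, 9, 5]          # e.g., after two years

rho, p_value = spearmanr(early_downloads, later_citations)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```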

Despite these valuable and encouraging results, however, both these approaches are constrained by the weaknesses of the Web 1.0 paradigm. Web citation analysis relies on search engines for data, and these may or may not have usable APIs (Google does not); they return unstructured HTML, requiring laborious hand processing to “… extract meaning from the chaos” (Thelwall, 2003). Likewise, usage data may be difficult to obtain from publishers; moreover, the data are difficult to interpret, given the ease of generating artificial downloads and the problem of downloads that are never read. Both these approaches are valuable, and continue to present exciting possibilities for both research and the creation of practical metrics. However, their shortcomings also suggest the need for more structured, distributed, and easily accessed data sources.
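To illustrate why this processing is laborious, the sketch below counts exact–phrase mentions of an article title in a small set of already–retrieved Web pages. The page URLs are placeholders, and in practice one would still need a search engine or crawl to discover candidate pages in the first place.

```python
# Illustrative sketch of Web citation counting: count exact-phrase mentions of
# an article title in a set of retrieved HTML pages. URLs are placeholders;
# real studies must first discover candidate pages via search engines or crawls.
import re
import urllib.request

title = "Scientometrics 2.0: Toward new metrics of scholarly impact on the social Web"
candidate_pages = [
    "http://example.org/some-syllabus.html",       # placeholder URL
    "http://example.org/a-personal-homepage.html",  # placeholder URL
]

pattern = re.compile(re.escape(title), re.IGNORECASE)
mentions = 0
for url in candidate_pages:
    try:
        html = urllib.request.urlopen(url, timeout=10).read().decode("utf-8", "ignore")
    except OSError:
        continue  # skip unreachable pages
    mentions += len(pattern.findall(html))

print(f"Web citations found: {mentions}")
```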

 


3. Science and Web 2.0

3.1. Science and the social Web

Though the term “Web 2.0” has been derided as a marketing invention, it does to some extent express a new, cohesive idea (Cormode and Krishnamurthy, 2008). There are many ways of defining Web 2.0 (O’Reilly, 2005), but perhaps its most important feature is participation (Ding, et al., 2009), with applications like Twitter, Digg, and Delicious as its archetypes; the term “social Web” has become more or less a synonym. Given the social and communicative nature of science, it is little surprise that many scientists have become active participants in this new Web, often using services and tools created specifically for scholarship. Table 1 lists a sample of such tools (note that this list is only meant to give a rough idea; many more services exist, and categories could easily be subdivided or swapped in many instances).

 

Table 1: A partial list of popular Web 2.0 tools, and similar tools aimed at scholars.

| Description | General–use application | Scholarship–specific application |
| --- | --- | --- |
| Social bookmarking | Delicious (delicious.com/) | CiteULike (www.citeulike.org/), Connotea (www.connotea.org/) |
| Social collection management | iTunes (www.apple.com/itunes/) | Mendeley (www.mendeley.com/), Zotero (www.zotero.org/) [reference managers] |
| Social news/recommendations | Digg (digg.com/), Reddit (www.reddit.com/), FriendFeed (friendfeed.com/) | Faculty of 1000 (facultyof1000.com/) [similar, but curated] |
| Publisher–hosted comment spaces (e.g., blog comments) | Most Web 2.0 applications | British Medical Journal (www.bmj.com/), PLoS (www.plos.org/), BioMed Central (www.biomedcentral.com/), Bioinformatics (Oxford University Press journal) (bioinformatics.oxfordjournals.org/) |
| Microblogging | Twitter (twitter.com/) | |
| User–edited reference | Wikipedia (www.wikipedia.org/) | Encyclopedia of Life (www.eol.org/), Scholarpedia (www.scholarpedia.org/), Citizendium (en.citizendium.org/) |
| Blogs | Wordpress.com (wordpress.com/), Blogger (https://www.blogger.com) | Research Blogging (researchblogging.org/) |
| Social networks | Facebook (www.facebook.com/), MySpace (www.myspace.com/), Orkut (www.orkut.com/) | Nature Networks (network.nature.com/), VIVOweb (vivoweb.com/) |
| Data repositories | DBPedia (dbpedia.org/About) | GenBank (www.ncbi.nlm.nih.gov/genbank/) |
| Social video | YouTube (www.youtube.com/), Vimeo (www.vimeo.com/) | SciVee (www.scivee.tv/) |

 

Some have lamented that scholars’ participation in Web 2.0 is surprisingly low, perhaps because the incentive structure of science fails to reward this sort of communication (Nielsen, 2009). Even so, scientists’ participation in Web 2.0 is not insignificant, and is likely to continue to increase. Many of the scholarly tools listed above are reporting dramatic growth (described in more detail below), and it seems likely that this growth will continue as a “born–digital” generation moves into tenured positions.

3.2. Calls for metrics based on Web 2.0 tools

A growing number of commentators are calling for measures of scholarly impact drawn from Web 2.0 data. In a relatively early piece, Jensen (2007) argued that a variety of Web measures will need to be compiled to establish scholarly “Authority 3.0.” Taraborelli (2008) noted that social media, especially social bookmarking, create a form of “soft peer review” to supplement traditional, labor–intensive review practices. He predicted that “popularity indicators from online reference managers will eventually become a factor as crucial as citation analysis for evaluating scientific content.” [7] Patterson (2009) presented the efforts of the online publisher PLoS to aggregate and display article–level metrics from a variety of sources including downloads, citations, social bookmarks, blog comments, article comments, and “star” ratings. Neylon and Wu (2009) argued for using social Web metrics to power article filters to deal with information overload. They decried the slowness of citation measures, and proposed measuring citation counts, download statistics, comments, bookmarking, and expert ratings. Quoted in Cheverie, et al. (2009), Norman proclaimed the arrival of digital scholarship and, with it, broader Web–based evaluation of impact. He encouraged the creation of metrics using data from “downloads and link indexes, reviews, publication awards, scholastic bookmarking, and tagging (e.g., the ‘Slashdot index’), or … academic networks like LinkedIn,” [8] and suggested that these metrics could well be automated. Unsurprisingly, this has been a topic of interest to academic bloggers, as well. Anderson (2009) observed that “citation is occurring in new ways, and scientific thinking is not always propagated via the published scientific article.” He proposed the measurement of articles’ impact using Twitter, blogs, video, and Wikipedia.

 

Table 2: Calls for Web 2.0 metrics of scholarship.

| Source | Suggested Web 2.0 sources for metrics | Main use |
| --- | --- | --- |
| M. Jensen (2007) | Tags, “discussions in blogspace, comments in posts, reclarification, and continued discussion.” | Establishing scholars’ authority |
| Taraborelli (2008) | Social bookmarking: CiteULike, Connotea | Augmenting or replacing peer review |
| Anderson (2009) | Twitter, blogs, video, and “Wikipedia, or any of the special ‘–pedias’ out there” | Broadening the scope of the JIF |
| Neylon and Wu (2009) | Zotero, Mendeley, CiteULike, Connotea, Faculty of 1000, article comments | Filtering articles |
| Norman in Cheverie, et al. (2009) | “scholastic bookmarking, and tagging (e.g., the ‘Slashdot index’) … academic networks like LinkedIn” | Tenure and promotion |
| Patterson (2009) | “… social bookmarks; blog coverage; and the Comments, Notes and ‘Star’ ratings that have been made on the article.” | “[A]ssessing research articles on their own merits …” |

 

3.3. Gaming social metrics

Wherever there are metrics, there will be attempts to game them. Because it is so easy to participate in social media, the gaming of metrics based on these tools is of particular concern. What is to keep an eager author from giving her own article hundreds of Diggs or Wikipedia citations? What will keep publishers from contracting with companies like “Subvert and Profit” (subvertandprofit.com), which sell votes from registered users of Digg, Facebook, and other services for between 40 cents and a dollar apiece? These “pay–to–say” (Blackshaw, 2006) campaigns yield artificially–generated grassroots enthusiasm, or “astroturf” (Klotz, 2007), and they are a significant concern for academic metrics based on social media. A complete answer to the problem of social media spam is well beyond the scope of this paper. It is important to note, however, that the concern is not new, that there are established solutions, and that research in this area remains active.

History suggests that while the gaming of social metrics may never be eliminated, it can be controlled. For example, advertisers have assaulted Google search results with “black–hat SEO” (Ntoulas, et al., 2006) and “Google bombing” (Tatum, 2005), and have attacked e–mail with automatically generated spam; legitimate users and administrators have successfully responded to both with statistical filters of increasing subtlety and complexity. Similar statistical techniques can help control social media gaming. For instance, Digg uses statistical techniques and a vigilant community to spot users abusing the system, often with great success; “Spike the Vote,” a service claiming to be a “bulletproof” way to game Digg, ended up sold on eBay for less than US$1,500 (Arrington, 2007). The automated WikiScanner tool (wikiscanner.virgil.gr) exposed and helped correct corporate tampering with Wikipedia articles (Borland, 2007).

Research is continuing in ways to apply and improve validation techniques for social media; for instance, Yardi, et al. (2010) find “structural network differences between spam accounts and legitimate users.” [9] One particular virtue of an approach examining multiple social media ecosystems is that data from different sources could be cross–calibrated, exposing suspicious patterns invisible in any single source. While additional work in this area is certainly needed, there is evidence to suggest that social metrics, properly and cautiously interpreted, could be relatively robust despite attempts to game them.
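A minimal sketch of such cross–calibration, assuming per–article counts from several sources are already in hand, might flag articles whose activity on one service is wildly out of line with their activity elsewhere. The source names, counts, and threshold below are illustrative assumptions, not calibrated values.

```python
# Illustrative sketch: flag articles whose activity on one source is far out of
# line with their activity on other sources, a crude signal of possible gaming.
# Source names, counts, and the threshold are assumptions for demonstration.

def suspicious_sources(article_counts, corpus_medians, threshold=10.0):
    """Return sources where this article's median-normalized count exceeds
    `threshold` times its average normalized level on the other sources."""
    normalized = {s: article_counts[s] / max(corpus_medians.get(s, 1), 1)
                  for s in article_counts}
    flagged = []
    for source, value in normalized.items():
        others = [v for s, v in normalized.items() if s != source]
        if not others:
            continue
        baseline = max(sum(others) / len(others), 1e-9)
        if value > threshold * baseline:
            flagged.append(source)
    return flagged

# Example: ordinary bookmark and blog activity, but a huge Digg count.
counts = {"citeulike": 4, "blog_posts": 2, "diggs": 900}
medians = {"citeulike": 3, "blog_posts": 1, "diggs": 5}
print(suspicious_sources(counts, medians))  # likely flags "diggs"
```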

 


4. Sources for scientometrics 2.0

While these broad calls to action demonstrate a growing scholarly interest in Web 2.0 metrics, none of them have included a practical list of data sources aimed at directly supporting research. Such a list would include the technical possibilities of each system, evidence of its use among scholars, and its face validity as a measure of impact. We give this list in sections 4.1 through 4.7 below. Each of these seven sections examines a single category of scholarly applications from Table 1: bookmarking, reference management, recommendation, comments on articles, microblogging, Wikipedia, and blogging. An eighth section (4.8) more briefly discusses the potential of social networks, open data repositories, and social video. While a single–article format requires a somewhat preliminary treatment, this can nonetheless serve as a useful jumping–off point for future work.

4.1. Bookmarking

Social bookmarking may be the best–developed scholarly Web 2.0 application. Connotea, one of the two main services in this space, launched in 2004, “[u]nashamedly inspired by del.icio.us” (Lund, et al., 2005). The other service, CiteULike, was also launched in 2004 (Hammond, et al., 2005). Today, about 1/6 of new MEDLINE articles are bookmarked in CiteULike (Good, et al., 2009). According to Kevin Emamy of CiteULike, the database includes over two million posts (compared to around 650,000 for Connotea) (Fenner, 2009). Impact extends beyond registered users; Emamy claims that five people browse the site for every one registered member (Fenner, 2009). The general–audience service Delicious (delicious.com) may also be used for scholarship (Hammond, et al., 2005); Ding, et al., (2009) found that “[s]cientific domains, such as bioinformatics, biology, and ecology are also among the most frequently occurring tags” [10], suggesting at least some scholarly use.

Much research into social bookmarking has examined tags and tagging. Ding, et al. (2009) explored tags and their change over time in Delicious as well as Flickr and YouTube; they found tags showed changing interests from year to year, and between services. It might well be productive to track scholars’ interests the same way; CiteULike already does this with its “Citegeist” service. Other research has examined the value of tags as annotations; Good, et al. (2009) compare tags to MeSH headings, finding that coverage is much sparser, but that in some instances tags provided a richer description of articles. Beltrão (2006) created a mock journal filled entirely with articles tagged “evolution” on Connotea; although 50 percent of articles came from lower–impact, “specialized journals,” the imaginary journal had more citations per paper than Nature or Science. Other work has investigated collaborative filtering algorithms to make recommendations based on shared CiteULike bookmarks (Bogers and van den Bosch, 2008). All this work suggests that the act of social bookmarking carries some significance that most likely reflects scholarly impact in some way. Social bookmarking datasets are highly accessible; Connotea has an API (www.connotea.org/wiki/WebAPI) and CiteULike offers database dumps for researchers (www.citeulike.org/faq/data.adp).
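To give a sense of what mining these datasets might involve, the sketch below counts the bookmark records a service reports for a single article. The endpoint URL and JSON response shape are hypothetical stand–ins rather than the actual Connotea or CiteULike interfaces; the real services should be consulted through their documented API and database dumps.

```python
# Illustrative sketch: count bookmarks for one article from a social bookmarking
# service. The endpoint URL and JSON response shape are hypothetical stand-ins,
# not the actual Connotea or CiteULike interfaces; consult each service's
# documentation (or data dumps) for the real formats.
import json
import urllib.parse
import urllib.request

def bookmark_count(article_url, api_base="http://bookmarks.example.org/api/posts"):
    """Return the number of bookmark records the (hypothetical) service reports
    for `article_url`, or 0 if the request fails."""
    query = urllib.parse.urlencode({"uri": article_url})
    try:
        with urllib.request.urlopen(f"{api_base}?{query}", timeout=10) as resp:
            posts = json.load(resp)  # assume a JSON list of bookmark posts
    except OSError:
        return 0
    return len(posts)

print(bookmark_count("http://www.example.org/some-article"))
```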

4.2. Reference managers

Although many scholars still use text files to store references (Marshall, 2008), reference managers like EndNote and RefWorks are becoming common. While many such tools are strictly Web 1.0, others are “defrosting” (Hull, et al., 2008) their frozen contents by unwrapping social features like public collections. Mendeley (Henning and Reichelt, 2008) is an excellent example; it provides a free client that indexes and organizes a user’s collection of PDF articles. At the same time, the software collects data on the user’s library that can be used to recommend new articles and potential collaborators. Mendeley has experienced incredible growth over the last year; the company claims a database of 100,000 users and eight million research papers, and at its current claimed growth rate (doubling every 10 weeks), it will own a database larger than Thomson Reuters’ Web of Science in 2010 (O’Hear, 2009). A similar tool is Zotero, a plug–in for the Firefox browser. Earlier versions without sharing features have been favorably reviewed (Lucas, 2008), but there is little scholarly investigation of the current, more social version. However, the value of both Zotero and Mendeley for metrics of impact seems clear, given the significance of a scholar’s decision to include a resource in her personal library. According to the Zotero forum, an API is forthcoming (Stillman, 2009), and the Mendeley forum has likewise announced an API under construction (Reichelt, n.d.).
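The arithmetic behind that growth claim can be checked in a few lines; the sketch below takes the cited figures at face value and assumes a Web of Science size of roughly 40 million records (an assumption for illustration only).

```python
# Rough check of the growth claim, not a verified figure: 8 million papers
# doubling every 10 weeks, compared against an assumed Web of Science size
# of about 40 million records (assumption for illustration only).
import math

papers_now = 8_000_000
doubling_period_weeks = 10
web_of_science_size = 40_000_000  # assumed, for illustration

doublings_needed = math.log2(web_of_science_size / papers_now)
weeks_needed = doublings_needed * doubling_period_weeks
print(f"~{weeks_needed:.0f} weeks at the claimed rate")  # roughly half a year
```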

4.3. Recommendation systems

Recommendation systems can be somewhat artificially split into two sub–areas: general Web site recommendation tools, and domain–specific academic ones.

The prototypical service in the general Web site recommendation space is Slashdot (slashdot.org); founded in 1997 (Gómez, et al., 2008), Slashdot is notable for applying a Web 2.0–style model very early. Reddit (www.reddit.com) and Digg (digg.com) are newer entries. Digg in particular has enjoyed success, its “Digg this!” button becoming part of the landscape of the Web. StumbleUpon (www.stumbleupon.com) works similarly, allowing users to share recommendations for Web sites; however, it tailors its recommendations to individual users.

A related application is the “social aggregation service” (Gupta, et al., 2009), FriendFeed (friendfeed.com). This tool aggregates users’ postings from a variety of social media, supporting discussions around each. Although aggregated items like Twitter posts are better mined at their sources, the number and type of comments for each posting might be examined, as well as items posted with FriendFeed’s browser bookmarklet. FriendFeed might be especially interesting to investigate, as our experience indicates it supports a particularly active community of scientists.

Among services that recommend Web sites, Slashdot has attracted the lion’s share of scholarly investigation, likely because of its longevity. In an early effort, Baoill (2000) applied Habermas’ model of idealized public debate; later, Gómez, et al. (2008) applied social network analysis to Slashdot threads and users, finding similarities to previously studied networks. Looking at Digg, Lerman and Galstyan (2008) demonstrated that early “Diggs” predict later importance of news stories, encouraging inquiry into similar predictive validity for scholarly publications. However, while Norman suggests using a “Slashdot index” to measure scholarly impact (Cheverie, et al., 2009), we know of no research specifically aimed at tracking scholarly articles’ mentions on recommendation sites like these. Neither StumbleUpon nor Slashdot list public APIs; however, FriendFeed does.
