EPrints.org

Self-Archiving FAQ

for the Budapest Open Access Initiative (BOAI)

ROARMAP Registry of Open Access Mandatory Archiving Policies  -- 
Model institutional self-archiving policy --  ROAR Registry of Open Access Repositories -- 
Bibliography of Findings on the Open Access Impact Advantage -- 
Citebase the citation-based scientometric search engine --  Paracite citation seeker -- 
Powerpoints (to be used in promoting open-access provision)

What-is/why/how FAQs:

What is self-archiving?
What is the Open Archives Initiative (OAI)?
What is OAI-compliance?
What is an Eprint Archive?
How can I or my institution create an Eprint Archive?
How can an institution facilitate the filling of its Eprint Archives?
What is the purpose of self-archiving?
What is the difference between distributed and central self-archiving?
What is the difference between institutional and central Eprint Archives?
Who should self-archive?
What is an Eprint?
Why should one self-archive?
What should be self-archived?
Is self-archiving publication?
What about copyright?
What if my copyright transfer agreement explicitly forbids self-archiving?
Peer-review reform: Why bother with peer review?
Is self-archiving legal?
What if the publisher forbids preprint self-archiving?

What-to-do FAQs:

What can researcher/authors do to facilitate self-archiving?
What can researchers' institutions do to facilitate self-archiving?
What can libraries do to facilitate self-archiving?
What can research funders do to facilitate self-archiving?
What can publishers do to facilitate self-archiving?

"I-worry-about..." 38 prima facie concerns (subgrouped thematically):

I. 10. Copyright
            32. Poisoned Apple
            36. Re-Use
            37. Permissions
II. 7. Peer review
            5. Certification
            6. Evaluation
            22. Tenure/Promotion
            13. Censorship
III. 29. Sitting Pretty
            4. Navigation (info-glut)
            34. Priorities
            35. Royalties
IV. 1. Preservation
            2. Authentication
            3. Corruption
            23. Version control
            25. Mark-up
            26. Classification
            16. Graphics
            15. Readability
            21. Serendipity
            18. Libraries'/Librarians' future
            33. IRs: OA or DL?
            38. Locus
V. 19. Learned Societies' future
VI. 17. Publishers' future
            9. Downsizing
            8. Paying the piper
            14. Capitalism
            24. Napster
            31. Waiting for Gold
VII. 20. University conspiracy
            30. Rechanneling toll-savings
            28. Affordability
VIII. 12. Priority
            27. Secrecy
IX. 11. Plagiarism

What is self-archiving?

To self-archive is to deposit a digital document in a publicly accessible website, preferably an OAI-compliant Eprint Archive. Depositing involves a simple web interface where the depositor copy/pastes in the "metadata" (date, author-name, title, journal-name, etc.) and then attaches the full-text document. Self-archiving takes only about 10 minutes for the first paper and even less time for all subsequent papers. Some institutions even offer a proxy self-archiving service, to do the keystrokes on behalf of their researchers. Software is also being developed to allow documents to be self-archived in bulk, rather than just one by one.

What is the Open Archives Initiative (OAI)?

The Open Archives Initiative (OAI) has designed a shared code for metadata tags (e.g., "date," "author," "title," "journal," etc.). See the OAI FAQ. The full-text documents may be in different formats and locations, but if they use the same metadata tags they become "interoperable." Their metadata can be "harvested" and all the documents can then be jointly searched and retrieved as if they were all in one global collection, accessible to everyone.
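As an illustration of what shared metadata tags make possible, here is a minimal sketch of reading harvested records. It assumes the archive answers in the OAI protocol's baseline Dublin Core format (oai_dc); the sample response, identifiers, and the function name `parse_records` are hypothetical and stand in for what a real archive would return.

```python
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH responses that carry Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, hand-written sample response (not data from a real archive).
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:archive.example.org:101</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Preprint</dc:title>
          <dc:creator>Doe, J.</dc:creator>
          <dc:date>2004-05-01</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:archive.example.org:102</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Postprint</dc:title>
          <dc:creator>Roe, R.</dc:creator>
          <dc:creator>Poe, E.</dc:creator>
          <dc:date>2004-09-15</dc:date>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Return one dict of metadata per <record> found in the response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        dc = record.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            continue  # a record without a metadata block is skipped
        records.append({
            "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": dc.findtext("dc:title", namespaces=NS),
            "creators": [c.text for c in dc.findall("dc:creator", NS)],
            "date": dc.findtext("dc:date", namespaces=NS),
        })
    return records


if __name__ == "__main__":
    for rec in parse_records(SAMPLE_RESPONSE):
        print(rec["identifier"], "|", rec["title"])
```

Because every compliant archive uses the same tag names, the same parsing code would work unchanged on responses from any of them, which is the practical meaning of "interoperable" here.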

What is OAI-compliance?

OAI-compliance means using the OAI metadata tags. A document can be OAI-compliant and an Eprint archive can be OAI-compliant. All OAI-compliant documents in OAI-compliant archives are interoperable. This means distributed documents can be treated as if they were all in one place and one format.

What is an Eprint Archive?

An Eprint Archive is a collection of digital documents. OAI-compliant Eprint Archives share the same metadata, making their contents interoperable with one another. Their metadata can then be harvested into global "virtual" archives, such as OAIster, that are seamlessly navigable by any user (just as a commercial index or abstract database is navigable, but with full-text access).

How can I or my institution create an Eprint Archive?

Free Eprints software (itself using only free software) has been designed so institutions or even individuals can create their own OAI-compliant Eprint Archives. Setting up the archive only needs some space on a web server. Installing the Eprints software is relatively easy, and is being made easier with each successive release of the software. It requires a little webmaster time to set up, and a little webmaster time to maintain. This investment is very small. The real challenge is not creating or maintaining an Eprint Archive, but ensuring that it is promptly filled with its target contents, which, for the BOAI, consist of pre-peer-review preprints and peer-reviewed, accepted postprints.

See the Institutional Archives Registry and List as well as the Registry of Institutional Self-Archiving Policies.

How can an institution facilitate the filling of its Eprint Archives?

(1) Install OAI-compliant Eprint Archives.

(2) Adopt a university-wide policy that all faculty maintain and update a standardised online curriculum vitae (CV) for institutional record-keeping and annual performance review.

See the Institutional Archives Self-Archiving Policy Registry and List

(3) Mandate that the full digital text of all refereed publications should be deposited in the Institution's OA Repository and linked to their entry in the author's online CV. (Make it clear to all faculty how self-archiving is in the interest of their own research and standing, maximizing the visibility, accessibility and impact of their work.)

(4) Offer trained digital librarian help in showing faculty how to self-archive their papers in their own university Eprint Archive (it is very easy).

(5) Offer trained digital librarian help in doing "proxy" self-archiving, on behalf of any authors who feel that they are personally unable (too busy or technically incapable) to self-archive for themselves. They need only supply their digital full-texts in word-processor form: the digital archiving assistants can do the rest (usually only a few dozen keystrokes per paper).

(A policy of mandated self-archiving for all refereed research output and a trained proxy self-archiving service, which ensures that lack of time or skill does not become grounds for non-compliance, are the most important ingredients in a successful self-archiving program. The proxy self-archiving will only be needed to set the first wave of self-archiving reliably in motion. The rewards of self-archiving -- in terms of visibility, accessibility and impact -- will maintain the momentum once the archive has reached critical mass. And even students can do for faculty the few keystrokes needed for each new paper thereafter.)

(6) Digital librarians, collaborating with web system staff, should be involved in ensuring the proper maintenance, backup, mirroring, upgrading, and migration that ensure the perpetual preservation of the university Eprint Archives. Mirroring and migration should be handled in collaboration with counterparts at all other institutions supporting OAI-compliant Eprint Archives.

See the Institutional Archives Registry and List

What is the purpose of self-archiving?

The purpose of self-archiving is to make the full text of the peer-reviewed research output of scholars/scientists and their institutions visible, accessible, harvestable, searchable and useable by any potential user with access to the Internet. The purpose of thus maximizing public access to research findings online is that this in turn maximizes its visibility, usage and impact -- which in turn not only maximizes its benefits to researchers and their institution in terms of prestige, prizes, salary, and grant revenue but it also maximizes its benefits to research itself (and hence to the society that funds it) in terms of research dissemination, application and growth, hence research productivity and progress. This is why open access is both optimal and inevitable.

See the Institutional Archives Registry and List

What is the difference between distributed and central self-archiving?

All OAI-compliant Eprint Archives are interoperable. This means their contents are harvestable by cross-archive search engines such as OAIster or citebase into global virtual archives. Hence OAI has eliminated the difference between self-archiving documents in one central archive or many distributed archives. Users need not know where documents are located in order to find, browse and retrieve them (any more than they do when they are using commercial indexing or abstracting services); and the full texts are all retrievable.
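A minimal sketch of why the location of a document stops mattering once metadata is harvested: records from several archives are merged into one "virtual" archive and searched together. The archive contents, identifiers, and function names below are hypothetical, and a real cross-archive service would do far more (indexing, ranking, de-duplication of versions).

```python
# Hypothetical metadata already harvested from two separate archives.
ARCHIVE_A = [
    {"id": "oai:a.example.org:1", "title": "Open access and citation impact"},
    {"id": "oai:a.example.org:2", "title": "Peer review in practice"},
]
ARCHIVE_B = [
    {"id": "oai:b.example.org:7", "title": "Measuring research impact online"},
    # The same record harvested twice should appear only once.
    {"id": "oai:a.example.org:1", "title": "Open access and citation impact"},
]


def build_virtual_archive(*archives):
    """Merge harvested record lists, keeping one record per identifier."""
    merged = {}
    for records in archives:
        for record in records:
            merged.setdefault(record["id"], record)
    return list(merged.values())


def search(virtual_archive, term):
    """Case-insensitive title search across all merged records."""
    term = term.lower()
    return [r for r in virtual_archive if term in r["title"].lower()]


if __name__ == "__main__":
    virtual = build_virtual_archive(ARCHIVE_A, ARCHIVE_B)
    for hit in search(virtual, "impact"):
        print(hit["id"], "|", hit["title"])
```

The user of `search` never names an archive; the identifier of each hit is enough to locate the full text at its home institution.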

What is the difference between institutional and central Eprint Archives?

Because of OAI-compliance, it no longer matters whether documents are archived in one central Eprint Archive or in many distributed ones. They are all interoperable and harvestable into one virtual "central" archive in which all contents are seamlessly navigable and retrievable. Strategically, however, there is a difference between institutional and central self-archiving.

Self-archiving is done in order to maximize the visibility and accessibility of refereed research, and hence to maximize its usage by researchers and its impact on research. The benefits of maximizing research impact are felt by the researcher and the researcher's institution, rather than by some more central entity (such as the research discipline or learned society). The academic reward system (salaries, research funding) is centered on the researcher's institution. Publishing and impact confer advantages on both researcher and institution. Hence the researcher's institution is the natural one to host self-archiving and ensure that its archives are filled with its annual research output.

See the Institutional Archives Registry and List.

Who should self-archive?

The Budapest Open Access Initiative is focussed specifically on the refereed research literature, across all disciplines. It is the authors of these articles who should self-archive them, in order to maximize the visibility, accessibility, uptake and impact of their work. The self-archiving itself, however, though rapid and simple, can be done by "proxy," by digital archivers in the researcher's institution or its library. It can also be done in bulk, by (free) software (under development).

See the Institutional Archives Registry and List.

What is an Eprint?

Eprints are the digital texts of peer-reviewed research articles, before and after refereeing. Before refereeing and publication, the draft is called a "preprint." The refereed, accepted final draft is called a "postprint." (Note that this need not be the publisher's proprietary PDF version!) Eprints include both preprints and postprints (as well as any significant drafts in between, and any postpublication updates). Researchers are encouraged to self-archive them all. The OAI tags keep track of all versions. All versions should contain links to the publisher's official version of record.

Why should one self-archive?

In order to maximize the visibility and accessibility of one's research, and hence the usage and impact of one's work. Merely publishing it provides minimal impact; also self-archiving it provides maximal impact.

What should be self-archived?

All significant stages of one's work, from the pre-refereeing preprint to the peer-reviewed, published postprint, to postpublication updates should be self-archived. The OAI tags keep track of all versions. (Note that the postprint need not be the publisher's proprietary PDF: there should always be a link to the publisher's official version, however, for scholarly purposes.)

Is self-archiving publication?

Self-archiving is not publication in the scholarly sense. For purposes of establishing priority and asserting copyright, anything that is made public, even on a single piece of paper, meets the legal definition of "publication," and hence so does self-archiving. But for scholarly and scientific purposes, only meeting the quality standards of peer review, hence acceptance for publication by a peer-reviewed journal, counts as publication. Self-archiving should on no account be confused with self-publication (vanity press). (Self-archiving pre-refereeing preprints, however, is an excellent way of establishing priority and asserting copyright.)

What about copyright?

The author holds the copyright for the pre-refereeing preprint, so that can be self-archived without seeking anyone else's permission. Sixty-eight percent of journals already give their green light to postprint self-archiving. With the remaining 32%, the author can either try to modify the copyright transfer agreement to reserve the right to self-archive the postprint, or, failing that, can append or link a corrigenda file to the already self-archived preprint. See "Is self-archiving legal?," "What if the publisher forbids self-archiving the preprint?" and the Rights MEtadata for Open archiving Project and Directory of Journals' Policies on Author Self-Archiving.

What if my copyright transfer agreement explicitly forbids self-archiving?

See "Is self-archiving legal?," "What if the publisher forbids self-archiving the preprint?" and the Rights MEtadata for Open archiving Project and Directory of Journals' Policies on Author Self-Archiving.

Peer-review reform: Why bother with peer review?

Peer review is not without its flaws, but improving peer review first requires careful testing of alternative systems, and demonstrating empirically that these alternatives are at least as effective as classical peer review in maintaining the quality of the refereed literature (such as it is).  No alternatives have yet been tested or demonstrated effective.

Hence current peer review reform or elimination proposals are merely speculative hypotheses at this time, and red herrings insofar as the freeing of the peer-reviewed literature is concerned: The self-archiving initiative is directed at freeing the current peer-reviewed literature, such as it is, from the impact/access barriers of Subscription/License/Pay-per-view access-tolls, now. It is not directed at freeing the literature from peer review, or at testing or implementing untested alternatives to peer review (cf. library.caltech.edu/publications/ScholarsForum/042399sharnad.htm and www.ecs.soton.ac.uk/~harnad/Ebiomed/com0509.htm#harn45).

The benefits of freeing the refereed literature now are a sure thing; the benefits (if any) from future alternatives to peer review (if any) are purely hypothetical, and certainly no reason to hold back from self-archiving in the meantime.

Is self-archiving legal?

Texts that an author has himself written are his own intellectual property. The author holds the copyright and is free to give away or sell copies, on-paper or on-line (e.g., by self-archiving), as he sees fit. For example, the pre-refereeing preprint can always be legally self-archived.

Self-archiving of one's own, non-plagiarized texts is in general legal in all cases but two. The first of these two exceptions is irrelevant to the kind of self-archiving BOAI is concerned with, and for the second there is a legal alternative.

Exception 1: Where exclusive copyright in a "work for hire" has been transferred by the author to a publisher -- i.e., the author has been paid (or will be paid royalties) in exchange for the text -- the author may not self-archive it. The text is still the author's "intellectual property," in the sense that authorship is retained by the author, and the text may not be plagiarized by anyone, but the exclusive right to sell or give away copies of it has been transferred to the publisher.

Exception 1 is irrelevant to BOAI, because BOAI is concerned only with peer-reviewed research, for which the author is paid nothing, and no royalty revenue or author fee is expected, sought, or paid.

Exception 2: Where exclusive copyright has been assigned by the author to a journal publisher for a peer-reviewed draft, copy-edited and accepted for publication by that journal, then that draft may not be self-archived by the author (without the publisher's permission).

The pre-refereeing preprint, however, has already been (legally) self-archived. (No copyright transfer agreement existed at that time, for that draft.)

So in those cases where the copyright transfer agreement does not yet give the author the green light to self-archive the refereed final draft ("postprint"), there is always the alternative of self-archiving a corrigenda file alongside the already archived preprint, listing the changes that need to be made to make the pre-refereeing preprint conform to the refereed postprint.
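The FAQ does not prescribe any format for a corrigenda file. As one possible sketch, assuming the author has both drafts as plain text, the list of changes could be generated mechanically with Python's standard `difflib` module. The function name `make_corrigenda` and the sample sentences are hypothetical.

```python
import difflib


def make_corrigenda(preprint_text, postprint_text):
    """List the line-level changes that turn the preprint into the postprint.

    Returns a unified diff as a single string; an empty string means the
    two drafts are identical.
    """
    diff = difflib.unified_diff(
        preprint_text.splitlines(),
        postprint_text.splitlines(),
        fromfile="preprint",
        tofile="postprint",
        lineterm="",
    )
    return "\n".join(diff)


if __name__ == "__main__":
    # Hypothetical drafts, one sentence per line.
    preprint = "The effect was large.\nWe tested 10 subjects."
    postprint = "The effect was moderate.\nWe tested 10 subjects."
    print(make_corrigenda(preprint, postprint))
```

Lines prefixed with "-" appear only in the preprint and lines prefixed with "+" only in the refereed draft. Whether a machine-generated listing of this kind is appropriate in a given case is a judgment for the author; a hand-written list of corrections serves the same purpose.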

See the Directory of Journal Self-Archiving Policies. Of the nearly 10,000 journals surveyed, over 90% are already "green" (i.e., they have already given their official green light to author self-archiving: 62% for postprints, 29% for preprints). Many of the remaining 9% "gray" journals will agree if the author asks.

Perhaps the most sensible default strategy of all is the one that the physicists have been successfully practicing since 1991 and computer scientists have been practicing since even earlier: "don't-ask/don't-tell": Simply self-archive your preprint as well as your postprint, and wait to see whether the publisher ever requests removal. After nearly a decade and a half of practicing this default strategy, and at least a million and a half self-archived papers in physics and computer science, only a handful of papers have ever been removed because a publisher requested it. On the contrary, virtually all physics journals and most computer science journals have since become officially "green" in response to the physics and computer science community's evident desire and determination to enjoy the research benefits of providing open access to their own papers by self-archiving them, and they now even encourage the self-archiving. In contrast, those researchers who during that decade and a half have not been practicing this default strategy have instead needlessly lost a decade and a half's worth of cumulative research impact.

Another alternative is to provide "almost-OA" by depositing embargoed articles as Closed Access instead of Open Access. Institutional Repositories can all implement the semi-automatic "email eprint request" Button, which provides almost-immediate access even to these embargoed deposits. When individual users reach a Closed Access item, they paste their email addresses in a box provided by the IR, click, and the author receives an instant email request for the eprint. With one click, the author authorizes fulfilling the eprint request, and the IR automatically emails the eprint to the requester.
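The request-and-approve cycle described above can be sketched as a small in-memory model. This is only an illustration of the workflow, not the implementation used by any repository software; the class, function names, and addresses are hypothetical, and real email delivery is replaced by returned message strings.

```python
from dataclasses import dataclass, field


@dataclass
class ClosedAccessItem:
    """A deposit whose full text is not publicly downloadable."""
    title: str
    author_email: str
    pending: list = field(default_factory=list)   # requests awaiting the author
    sent_to: list = field(default_factory=list)   # requesters already served


def request_eprint(item, requester_email):
    """Step 1: a user enters an address and clicks; the author is notified."""
    item.pending.append(requester_email)
    return f"To {item.author_email}: {requester_email} requests '{item.title}'"


def approve_request(item, requester_email):
    """Step 2: the author clicks once; the repository mails the eprint."""
    if requester_email not in item.pending:
        raise ValueError("no pending request from this address")
    item.pending.remove(requester_email)
    item.sent_to.append(requester_email)
    return f"To {requester_email}: eprint of '{item.title}' attached"


if __name__ == "__main__":
    item = ClosedAccessItem("An Embargoed Postprint", "author@univ.example.org")
    print(request_eprint(item, "reader@other.example.org"))
    print(approve_request(item, "reader@other.example.org"))
```

The point of the design is that the author's decision is the only manual step: the repository holds the file, queues the request, and sends the copy.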

What if the publisher forbids preprint self-archiving?

The right to self-archive the refereed postprint is a legal matter, because the copyright transfer agreement pertains to that text. But the pre-refereeing preprint is self-archived at a time when no copyright transfer agreement exists and the author holds exclusive and full copyright to that draft. So publisher policy forbidding prior self-archiving of preprints is not a legal matter, but merely a journal policy matter (just as it would be if the journal were to forbid the submission of papers by authors with blue-eyed uncles!). (It would become a legal matter -- but a contractual matter, not a copyright one -- if the author were to sign a contract explicitly stating that the unrefereed preprint had not been previously self-archived online. Obviously an author should strike such arbitrary stipulations out of any contract.)

This policy goes by the name of the "Ingelfinger Rule," originally invoked by the Editor of the New England Journal of Medicine (NEJM), Franz Ingelfinger, in order to protect public health (and the NEJM's priority) from any publicity about unrefereed findings prior to publication.

The Ingelfinger Rule (sometimes also referred to as a "prepublication embargo") is accordingly not a copyright matter, but a journal submission policy: "We will not consider for publication any preprint that has been previously self-archived."

BOAI makes no recommendations to authors regarding compliance with such policies, except to note that (1) the Ingelfinger Rule is not a legal matter, (2) the number of journals invoking the Ingelfinger Rule is rapidly diminishing in the face of self-archiving pressure from authors in the interests of research progress (Nature, for example, has dropped it, and most other journals are following suit) and (3) the Ingelfinger Rule was probably never enforceable in any case.





What-to-do FAQs

What can researcher/authors do to facilitate self-archiving?

Make sure that your university or research institution has installed OAI-compliant Eprint Archives.

Self-archive your pre-peer-review preprints in your institutional (or central) Eprint Archives.

Self-archive your post-peer-review postprints (or corrigenda file) in your institutional (or central) Eprint Archives.

See the Institutional Archives Registry and List.

What can researchers' institutions do to facilitate self-archiving?

See "How can an institution facilitate the filling of its Eprint Archives?"

See the Institutional Archives Registry and List.

Sign the Declaration of Institutional Commitment to Providing OA.

What can libraries do to facilitate self-archiving?

Digital librarians are the natural candidates for maintaining the Eprint Archives, their institution's outgoing collection of peer-reviewed research output.

(1) Offer trained digital librarian help in showing faculty how to self-archive their papers in the university Eprint Archive (it is very easy).

(2) Offer trained digital librarian help in doing "proxy" self-archiving, on behalf of any authors who feel that they are personally unable (too busy or technically incapable) to self-archive for themselves. Authors need only supply their digital full-texts in word-processor form: the digital archiving assistants can do the rest (usually only a few dozen key/mouse-strokes per paper).

(The proxy self-archiving will only be needed to set the first wave of self-archiving reliably in motion. The rewards of self-archiving -- in terms of visibility, accessibility and impact -- will maintain the momentum once the archive has reached critical mass. And even students can do for faculty the few keystrokes needed for each new paper thereafter.)

(3) Digital librarians, collaborating with web system staff, should be involved in ensuring the proper maintenance, backup, mirroring, upgrading, and migration that ensure the perpetual preservation of the university Eprint Archives. Mirroring and migration should be handled in collaboration with counterparts at all other institutions supporting OAI-compliant Eprint Archives.

See the Institutional Archives Registry and List.

What can research funders do to facilitate self-archiving?

Mandate that the research that is publicly funded must not merely be published but it must be publicly accessible online (whether through self-archiving, open-access journals, or both) as recommended by the UK Government Science and Technology Committee as well as the Berlin Declaration.

Make it part of grant applications that CVs and bibliographies citing the applicant's prior work should contain links to the online full-text (whether self-archived or in open-access journals, or both).

Sign the Declaration of Institutional Commitment to Providing OA.

What can publishers do to facilitate self-archiving?

Support Open Access by adopting a "green" author self-archiving policy, i.e. giving your green light to author self-archiving of preprints and postprints (not necessarily the publisher's PDF) as over ninety percent of journals sampled (8,000+) have already done. See the Directory of Journals' Policies on Author Self-Archiving.
See also FOS policy statements by learned societies and professional associations and "The Green Road to Open Access: A Leveraged Transition".

Publishers are encouraged to fill out the SHERPA/ROMEO webform describing their self-archiving policy statement for inclusion in the Romeo publishers directory and to email their journals list (with ISSNs and URLs) to Maria at romeo.eprints.org/corrections.php for inclusion in the Romeo journals directory.



1. Preservation

"I worry about self-archiving because archived eprints may not continue to exist or to be accessible in perpetuum on-line, the way they were on-paper."

This worry is misplaced. It is not really a worry about self-archiving at all, but about the online medium itself. As such, it needs to be directed toward the primary database in question, which is the toll-access refereed journal literature, currently in the hands of publishers and libraries, and most of it already in both paper and digital form. That is the official version of record. If you are worried about the preservation of the online version, it is to its publishers and subscribing/licensing librarians that your worry needs to be addressed. The preprints and postprints that are being self-archived by their authors in their institutional eprint archives today are intended to maximize impact by providing immediate open access; they are merely open-access supplements to that toll-based primary literature at this time, not substitutes for it.

To put even this misdirected worry into perspective, we must remember that print-on-paper is not permanent either. The only relevant parameter is the probability of future access. The on-paper probability, such as it is, is achieved by generating (a) multiple copies that are (b) geographically distributed (c) in a (relatively) robust medium and can be made (d) visible to the human eye.

All four of these properties can be achieved (and have been) on-line too, and the resulting preservation probability can be made as good as, or even better than, the current probability on-paper.
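To make the role of multiple, distributed copies concrete, here is a small worked calculation. It rests on a simplifying assumption that the FAQ does not state: each copy survives independently with the same probability. Under that assumption, the chance that at least one of n copies survives is 1 - (1 - p)^n. The function name and the figures are illustrative only.

```python
def survival_probability(p_copy_survives, n_copies):
    """Probability that at least one of n independent copies survives.

    Assumes (for illustration) that copies are lost independently and that
    each has the same survival probability p_copy_survives.
    """
    if not 0.0 <= p_copy_survives <= 1.0:
        raise ValueError("p_copy_survives must be between 0 and 1")
    if n_copies < 0:
        raise ValueError("n_copies must not be negative")
    return 1.0 - (1.0 - p_copy_survives) ** n_copies


if __name__ == "__main__":
    # Illustrative figures only: a copy with a 50% chance of surviving.
    for n in (1, 2, 5, 10):
        print(n, "copies:", survival_probability(0.5, n))
```

Geographic distribution matters because it is what makes the independence assumption more plausible: copies in one building share the same fire, flood, or power failure, while mirrored copies at different institutions mostly do not.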

That should be the end of the story: For once this concern is no longer grounded in actual, objective probabilities, but only in prior habits and attendant intuitions, then we are talking about biases and superstitions and not about actual risks.

There are a few side issues: People worry about global power-failures, or global dictatorships. They should remind themselves that these are matters of probability too, and have their equivalents in paper.

People also, by analogy with current unreadable documents in obsolete word-processors or peripherals, worry about whether the digital code, even if preserved, will always be accessible and visible to the eye.

The answer is again probability: The reason print-on-paper has been faithfully preserved across generations (when it has been) is that the literate world's collective interests were vested in ensuring that it should do so. This same continuity of collective interests will exist for the digital corpus too, for the same reasons, except that digital code will be much easier to keep migrating to every successive new technology than print on-paper to every successive building or regime ever was.

(And there is always the option for those who are still not confident enough in the technology, despite the odds, of printing out hard copies as back-up: Indeed, that is a good way to put the magnitude of one's preservation worries to the test: Who will still feel the need to keep hard copies, and of how much of the corpus, once it's all on-line and accessible to everyone, everywhere, at all times?)

In short, setting up active preservation programs implemented by digital librarians is indeed important and necessary; but it would be completely irrational to interpret the need for robust preservation programs as a reason for any hesitation or delay whatsoever about proceeding with self-archiving right now -- a fortiori, because, for the time being, self-archiving is merely a supplement to, not a substitute for, the existing modes of preservation, on paper and online. If and when the day should ever come when primary journal publishers decide to downsize and become peer-review service-providers only, cutting costs by offloading the access and archiving burden entirely onto the network of institutional archives, then that institutional network will be quite ready, willing and able to take over the distributed digital preservation burden for its collective research legacy. But that time is not now, hence this worry (about self-archiving now) is misplaced.

2. Authentication

"I worry about self-archiving because you can never be sure whether you are reading the definitive version of an