Can repositories solve the access problem?

March 2, 2013

The progressive RCUK policy on open access has recently come under fire, particularly from humanities scholars, for favouring Gold OA over Green. For various reasons — and I won’t, for now, go into the question of which of these reasons are and aren’t sound — they favour an approach to open access where publishers keep final versions of their papers behind paywalls, but drafts are deposited in institutional repositories (IRs) and people who want to read the paper can have access to the drafts.

It’s appealing to think that this relatively lightweight way of solving the access problem can work. Unfortunately, I’m not convinced it can, for several reasons. I’ll discuss these below, not so much with the intention of persuading people that Gold is a better approach, but with the hope that those of you who are Green advocates have seen things that I’ve missed and you’ll be able to explain why it can work after all.


1. Two-class system

Most fundamentally, I worry that Green OA creates a two-class system which I can’t approve of. It does this in two ways.

First, it necessarily creates two classes of papers: authors’ drafts and publishers’ final versions. These will differ in some respects, and it’s hard in general to know what those respects are. Of course pagination will differ, which means you can’t cite page-numbers reliably. But other changes are possible as well. For example, Matt and I have a paper in press now for which a whole additional figure (an important one) was added at the proofing stage. In our case, the paper will be OA anyway, but if it were not then the authors’ manuscript would be a poor substitute.

And by implication, Green OA creates two classes of researchers — a potentially harmful division between those privileged few who have the “proper” papers and an underclass who have only manuscripts. (It doesn’t help that for stupid historical reasons, our manuscripts are often butt-ugly: double-spaced, line-numbered, all the figures at the end instead of where they’re needed, etc.)

Admittedly, the two-classes-of-researcher problem is not created by Green OA: it already exists, in a worse form where the underclass doesn’t have access to any version of the paper. But whereas Gold OA solves this problem (everyone has exactly the same access to PLOS and PeerJ papers), Green doesn’t.

(To me, it’s obvious that democratising access is a good thing. But now that I’ve made the notion explicit, I can’t help the uncharitable thought that there may be those out there who want to maintain a two-class system — to retain a notion that they are “in” while others are “out”. I hope I’m wrong, and I’m certainly not accusing Green OA advocates of having this motivation. It just seems like it might be implicit in some of the broader struggles over access. Anyway, let’s not confuse this separate potential problem with access with the actual problems with Green OA I’m addressing in this post.)


2. Expense of continuing subscriptions

I find it baffling that people keep talking as though Green OA is cheaper than Gold. It isn’t, at all. As I’ve shown previously, the cost to the world of a paywalled paper (aggregated across all subscriptions) is about $5333. There is no reason to think that will change under the Green model, in which we continue to give the final and best version of our work to publishers.

By contrast, even the publisher-influenced Finch estimates typical Gold APCs as £1500-£2000 (about $2270-$3030), which amounts to 43%-57% as much. (Conveniently, the midpoint of the Finch range, £1750, is about $2650, which is almost exactly half of what we pay by the subscription model.)

But the true cost of Gold OA is much, much less. Follow the link for the detail, but one credible banner figure starts with the observation that half of all Gold OA articles are published at no cost to the author and that the average APC of the other half is $906, to arrive at a true average APC of $453 — about one twelfth of the cost for a paywalled article.
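For anyone who wants to check the arithmetic, here is a minimal Python sketch that uses only the figures quoted above. The exchange rate is an assumption: it is simply the rate implied by converting £1500 to about $2270, not an official one, and the variable names are purely illustrative.

```python
# Figures quoted in the post. Nothing here is new data.
PAYWALLED_COST_USD = 5333      # cost to the world of one paywalled paper
FINCH_APC_GBP = (1500, 2000)   # Finch's estimate of typical Gold APCs
PAID_APC_USD = 906             # average APC among Gold journals that charge
FREE_SHARE = 0.5               # half of Gold OA articles carry no author fee

# Assumption: the exchange rate implied by the post's own conversion
# (GBP 1500 is about USD 2270), not an official rate.
USD_PER_GBP = 2270 / 1500


def share_of_paywalled(cost_usd):
    """Return cost_usd as a fraction of the per-paper subscription cost."""
    return cost_usd / PAYWALLED_COST_USD


# Finch APC range converted to dollars, then expressed as a share of $5333.
finch_usd = [gbp * USD_PER_GBP for gbp in FINCH_APC_GBP]
finch_share = [share_of_paywalled(usd) for usd in finch_usd]

# Average APC once the fee-free half of Gold OA articles is included.
true_average_apc = (1 - FREE_SHARE) * PAID_APC_USD

# How many times more a paywalled paper costs than the average Gold paper.
paywalled_multiple = PAYWALLED_COST_USD / true_average_apc

print(f"Finch APCs: ${finch_usd[0]:.0f} to ${finch_usd[1]:.0f}")
print(f"Share of paywalled cost: {finch_share[0]:.0%} to {finch_share[1]:.0%}")
print(f"True average APC: ${true_average_apc:.0f}")
print(f"A paywalled paper costs about {paywalled_multiple:.1f} times as much")
```

Changing any of the constants at the top (a different exchange rate, say, or a different share of fee-free journals) shows how sensitive the comparison is to each assumption.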

So for purely pragmatic financial reasons, Green seems like a silly path. There’s a very short-term saving, sure, as we avoid paying APCs. But we have to look further ahead than the next five years.


3. Embargoes

Now there is nothing intrinsic to Green OA that means embargoes must be in place. It’s perfectly possible, and manifestly desirable, that a no-embargo Green-OA mandate should be enacted, requiring that authors’ final manuscripts become available immediately on publication. But for whatever historical reasons (and I admit I find this baffling) there are few or no Green-OA mandates that do this. Even the best of them seem to allow a six-month delay; twelve months is not uncommon (and Michael Eisen worries that the new White House policy will further establish twelve months as the norm).

I will have more to say about embargoes in a subsequent post. (SPOILER: it’s not going to be pretty.) But for now it suffices to say that any system that makes research freely available only a year after it’s published is wholly inadequate. Not to mention stupid. Stupid and inadequate.

So if Green OA is going to be the solution we need, it has to break free from embargoes.


4. Non-open licences

Similarly, there is no intrinsic reason why Green OA should mean non-open licences and Gold OA should mean truly open (BOAI-compliant) open access. And yet history has brought us to a point where that is often how things are. For example, the RCUK policy (even before its progressive erosion got properly under way) says of its Gold arm that “The CC-BY license should be used in this case”, but contains weasel words in its Green arm:

the journal must allow deposit of Accepted Manuscripts that include all changes resulting from peer review (but not necessarily incorporating the publisher’s formatting) in other repositories, without restrictions on non-commercial re-use.

This just won’t do. It’s not open access. To quote Heather Piwowar’s pithy statement once more, “We do basic research not only to know more, but to do more”. Non-commercial licences impede the use of research, and that’s not to the benefit of wider society. (I won’t labour this point now, because I’ll have more to say on non-commercial clauses in a subsequent post.)

So as with embargoes, if Green OA is going to be the solution we need, it has to break free from its habitual acceptance of non-commercial clauses.


5. Practical failings

On top of the fundamental problems already discussed (two-class system, expense of continuing subscriptions, embargoes and non-open licences), the repository system as it exists today suffers from a suite of practical problems that render it pretty inadequate.

  • Many institutions don’t even have an IR; or if they do it doesn’t work.
  • Many scholars aren’t associated with an institution and so don’t know where they should reposit their manuscripts. (That this is overlooked is a symptom of an unfortunate elitist tendency among academics.) [UPDATE 4th March: thanks to Neil Stewart, whose comment below points out Open Depot as a solution to this.]
  • The use of IRs involves an institution-by-institution fragmentation, with different user interfaces, policies, etc.
  • For whatever reasons, many scholars do not bother to reposit their manuscripts in institutional repositories.
  • Even when mandates are in place, compliance is often miserable, to the point where Peter Suber considers the 80% NIH compliance rate as “respectable”. It really isn’t. 100% is acceptable; 99% is respectable.
  • Many IRs have abject search facilities, often for example lacking the ability to restrict searches to papers that are actually available.
  • Many IRs impose unnecessary restrictions on the use of the materials they contain: for example, Bath’s repo prohibits further redistribution.
  • There is no central point for searching all IRs (at least not one that is half-decent; I know about OAIster).
  • The quality of metadata within most IRs is variable at best.
  • Use of metadata across IRs is inconsistent — hence many of the problems that render OAIster near-useless.

… and, I am sure, many more that I’ve not thought of right now.

Could these issues be addressed? Yes, probably; but ten years have unfortunately not done much to resolve them, so I don’t feel all that confident that the next ten will.

Do the IR advocates have a plan for solving these problems? Because they are much more political/sociological than technical, and those always seem to be the hardest ones to solve.

Posted by Mike Taylor
Filed in CC BY-NC, Green open access, open access, repositories

41 Responses to “Can repositories solve the access problem?”

  1. Mark C. Wilson Says:

    March 2, 2013 at 6:01 pm

    No doubt Stevan Harnad will be along soon, but I will take the chance to comment first. I have been concerned about 1) for a long time. I think that 2) is perhaps misleading. Presumably if bundling can be solved, then sufficient uptake of Green will impose substantial downward price pressure on subscriptions, and the chance of dropping them altogether will be taken. If this in turn spreads then the subscription model will die, and some kind of Gold model will take over. So Gold should be the end result.

    My main complaint about the current Gold offerings is the price – I am still strongly of the opinion that they are an order of magnitude too high. Of course, I am a mathematician, and, essentially, we don’t have money. I am still pretty sure that good “Diamond/Platinum” journals (Gold with author fee approaching zero) are possible, and they need to be explored more. With luck, increasing competition among Gold offerings will drive down prices.

    3), 4), 5) are certainly relevant in the short term, but if the above scenario comes to pass, they probably won’t be.

  2. klausgraf2001 Says:

    March 2, 2013 at 8:48 pm

    Great text, I agree.

    I wrote a similar blog entry in German some days ago:
    archiv.twoday.net/stories/285824796/

    Summary:

    Green is OA for the poor

    1) The scholar needs access IMMEDIATELY

    2) The format problem generates costs if the scholar believes that he needs the version of record.

    3) No libre OA

    And two minor points.

  3. petermurrayrust Says:

    March 3, 2013 at 4:00 pm

    Full agreement Mike,
    I admire the effort you put in to reporting this in a systematic manner while the “Green” advocates are often reluctant to analyze the minuses of Green as well as the pluses.
    Another serious negative of Green is that it legitimizes current publishing practice and treats the publisher as an equal co-partner. In many cases they aren’t: they make Green as difficult to create as possible. And some will fight to the death rather than allow Green.

  4. lescarr (@lescarr) Says:

    March 3, 2013 at 4:14 pm

    Can you quantify “many” in the first two bullet points under #5? Also why you don’t accept “Google” as a central point for search, or why there should be a central point for IR search when there isn’t for publisher search?

  5. Mike Taylor Says:

    March 3, 2013 at 5:00 pm

    Can you quantify “many” in the first two bullet points under #5?

    No. I’d welcome any data anyone has.

    Why you don’t accept “Google” as a central point for search […]

    Well, I certainly wouldn’t want to enshrine any for-profit corporation as a single central point of control. But assuming you meant third-party crawler-based search-engines in general, I would find that approach tolerable — though far from optimal, as such engines nearly always work only on text, so that they wouldn’t be able to support searches like “studies on stegosaurs published between 1990 and 1995”.

    But my sense is that not all repositories expose all their content for crawling anyway. If anyone has any data on this, I’d be happy to see it.

    […] or why there should be a central point for IR search when there isn’t for publisher search

    I’d hope we can aspire to a higher goal than “no worse than what publishers do”.

  6. Heather Piwowar Says:

    March 3, 2013 at 5:39 pm

    Google isn’t an acceptable answer because it doesn’t support a way for people to build on the search results, add value, and apply the results in new and innovative ways. Google search results can only be used on Google’s website manually, or embedded as-is in other websites.

    Neither Google nor Google Scholar offers an API (for love nor money, as far as I can tell; point me to it if I am wrong) that would let us do a Google Search and then sort/filter/enhance the results to add value and use in research and scholarly tools.

    Totally unacceptable as a search solution for the scholarly literature.

    It doesn’t have to be this way. PLOS and PMC searches do not have this problem; they offer open APIs with broad reuse terms.

  7. Heather Piwowar Says:

    March 3, 2013 at 5:46 pm

    BTW great post, Mike, thanks for articulating these issues so clearly.

  8. Mike Taylor Says:

    March 4, 2013 at 7:59 am

    Thanks, Heather. This one gave me no pleasure at all to write, but I think the questions need asking.

  9. cityopenaccess Says:

    March 4, 2013 at 9:54 am

    Hi Mike, thanks for an interesting post. Full disclosure: I am an IR manager, so am invested in Green OA. I’m not going to comment on points 1 – 4 because I agree these are issues, what I would say about them is that Gold OA is problematic principally because of publisher double dipping. I don’t expect publishers’ subs costs to come down any time soon.

    I will comment on point 5, though, as follows.
    – Can you give an example of a repository that “doesn’t work”? What does “not working” in this context mean? I know Bristol has had problems, are they being sorted out now?
    – For unaffiliated scholars, there’s Open Depot opendepot.org/, a green repository for anyone to use.
    – Fragmentation is an issue, yes, one effort to address this is JISC’s UK RepNet project www.repositorynet.ac.uk/blog/ More does need to be done though.
    – Scholars not depositing is also an issue, yes, despite my and colleagues’ efforts. The evidence shows that mandates do work with this though, Harnad is right on that one!
    – Can you give an example of an IR with this “bad” search functionality you mention? Search within IRs could be improved IMO, but there are some pretty sophisticated advanced search menus associated with many IRs that allow you to, e.g., restrict results to full text only (my IR is full text only so this shouldn’t be a problem for us!)
    – Re-use rights are tricky, and the legal picture is murky (though I would love to have it clarified)- I don’t think repositories are in a position to effectively re-licence papers they hold as CC-BY or similar, though perhaps this is something IRs should do as a condition of submission.
    – On central search: Google Scholar. If you don’t like the big G (understandable enough) I would recommend Base Search, an excellent OA search engine www.base-search.net/
    – On metadata: could bang on about this for a long time, but most (all?) IRs use Dublin Core and OAI-PMH. Most (all?) IRs allow re-use of their metadata in a variety of formats (XML, RDF, as a CSV file etc. etc.) and if they don’t do this, they should.

    Finally, on your answer to Les Carr: IRs should certainly expose all their metadata to Google (and Bing, Baidu etc.) Google does pretty well at identifying IRs as good sources of research information, and the structured metadata does help with Google crawling. I would be interested to see an example of Google failing to crawl repository metadata.

    Okay that’s probably enough! Final thought: IRs aren’t perfect, certainly, but then what method of OA currently is?

  10. Ross Mounce (@rmounce) Says:

    March 4, 2013 at 11:27 am

    re: GreenOA, I have one interesting thing to add from the recent Institute of Historical Research meeting on Open Access:

    ‘The Finch Report, open access and the historical community’
    Friday 1 March 2013, Chancellor’s Hall, Senate House, University of London

    Whilst some of the historians were still harping on that they ‘need’ 36 month embargoes for their journals (with an admitted lack of hard supporting evidence for this), it emerged that Cambridge University Press currently operate a 0-month embargo for pre-prints and 12-months for archiving the final version. This was revealed by Daniel Pearce (Commissioning Editor, HSS journals, CUP).

    I raised this matter at the end to the panel. Their whole argument seemed to rest on the assertion that an embargo length of anything less than 36 months would ‘imperil’ their journals as subscriptions *might* be cut.

    Yet CUP and many other journals in HSS are *currently* operating very successfully, and have been for years, with embargoes of much less than 36 months! Oh the irony…

    Even more astoundingly, it was Daniel Pearce (again) who leapt-in to defend the panel from my point, stating (I paraphrase) “we only allow these short-embargo lengths at the moment because virtually no-one is using institutional repositories. If more historians did actually self-archive their works we’d increase our embargo lengths to protect our journals.”

    Now, I have to admit I haven’t exactly been brilliantly positive about ‘GreenOA’ in the past. But the above quote particularly horrifies me — a publisher on Friday afternoon, nakedly admitted that as soon as the ‘green route’ to OA actually starts to become effective the publishers *will* move to block it & dilute its utility.

    That’s the most frightening thing of all about the ‘green route’ in my opinion. It only works as a patchy solution and I doubt it could achieve 100% gratis access to all scholarship on its own. At an institutional level, early, strong mandates like the University of Liege can achieve very high compliance. But as more and more self-archiving mandates come-in, the publishers are unfortunately well within their rights to block, delay or confound this approach, if it actually gains traction. Sure, the publishers can’t block the self-archiving of pre-submission manuscripts but as stated in the post, these are 2nd class inferior versions in many disciplines.

    That’s why I’m glad RCUK has a preference for gold. The green route would seem to be a rather risky strategy to me if pursued entirely on its own. A mixed strategy of gold & green allows us an alternative route to use if publishers start blocking the green route.

  11. Mike Taylor Says:

    March 4, 2013 at 11:37 am

    Right. A very fundamental problem with Green is that it relies on the benevolence of publishers. And publishers have shown again and again that they are not benevolent. I am actually rather glad that Daniel Pearce said what he did in a public forum. Because we’ve all known or suspected it for a long time, but now we don’t have to speculate any more.

  12. cityopenaccess Says:

    March 4, 2013 at 11:54 am

    I don’t think publishers would be likely to withdraw well-established green archiving rights, personally, since it would be another PR disaster for them. Though if they did I don’t think authors would stop posting to ArXiv, SSRN, RePEc and IRs anyway.

  13. brembs Says:

    March 4, 2013 at 12:28 pm

    Well done! These are exactly the points why I don’t see myself as pushing for green, even though I champion libraries as publishers.

    The only thing that occurred to me as potentially contentious is the cost argument (#2): If you assume that in the short term, no subscriptions will be cut, gold is more expensive than green, as the APCs add up on top of subscriptions, while green doesn’t cost anything in addition (other than perhaps $7 per article for hosting/bandwidth and such).
    It gets a
