Science's broken publishing model
Posted by ekr on 30 Jun 2021
Matt Ridley has an article over at CAPX about how science journals -- in this case Nature -- are modifying their coverage to avoid antagonizing China. Most of the story is about some reporting by Amy Maxmen on the "lab leak hypothesis", but Ridley also writes:
One of the subtexts of the debate over the origin of the pandemic concerns the role of the scientific journals. The magazines that publish scientific papers have become increasingly dependent on the fees that Chinese scientists pay to publish in them, plus advertisements from Chinese firms and subscriptions from Chinese institutions. In recent years observers have noticed that the news coverage of China in these magazines has begun to look a little less objective than it once did.
I'm not that interested in the details of Nature's behavior in this case, but what Ridley is bringing up goes to some fairly fundamental issues in scientific publishing.
What are Scientific Journals?
For those of you who aren't familiar with scientific publishing, many of the prestige publication venues are journals. These range from relatively niche publications you've probably never heard of, such as Condensed Matter Physics, to top field-specific publications like the New England Journal of Medicine, to top general scientific publications like Nature and Science. The top publications like Science and Nature are really two magazines in one:
- A general interest science magazine with articles written by professional science journalists and targeted at a scientifically trained but non-specialist audience -- kind of like a high-end version of Scientific American.
- A collection of actual scientific papers that are deemed to be particularly important/worthy/impactful.
Like any magazine, these journals have subscriptions and advertising. The subscriptions can be quite expensive. For instance, an individual Nature subscription is $199/year but an institutional subscription (e.g., for a university) is over $10,000/year. Many journals also have what's called a "page fee" or an "article processing charge", where authors pay to publish. An interesting wrinkle here is that some journals charge more to make your article "open access". The way this works is that ordinarily, when you submit to a journal, it requires you to assign it your copyright, so that the journal is the only one who can distribute the paper. However, if you pay extra (€9500 for Nature) you can retain the copyright and publish your paper "open access", in which case it will be freely redistributable under a generous license (Nature uses the CC-BY license).
At this point you should be thinking "this all sounds pretty expensive", and you'd certainly be right. On the other hand, it's also quite prestigious to appear in Science or Nature along with all that other great research, so maybe it's worth it. Here's the thing, though: what you're paying for is primarily the right to put "Science" on your CV. To see why, we'll need to take a bit of a detour into how scientific publishing works.
The publication process
Publication processes vary dramatically between fields, but at a high level, here's how journal publication works:
1. You submit your paper.
2. The editor sends it out for review to some reviewers in your field.
3. Time passes.
4. The reviewers eventually send back their reviews.
5. On the basis of those reviews, you are either accepted, rejected, or told to revise.
   - If you are rejected, you take it somewhere else.
   - If you are accepted, go to step 6.
   - If you are told to revise, you go back to step 1.
6. Usually there is some back and forth with the editor about what exactly you have to change, and then you submit a new manuscript.
7. More time passes while the journal copy edits your paper, typesets it in their particular format, etc.
8. Eventually they send you page proofs.
9. You approve/revise the page proofs and send them back.
10. The paper is published.
There are two important things to note here. First, this all takes a fantastically long time. I've only published in CS journals, but I remember it being on the order of a year or so. During all this time, your paper is just kind of sitting there in some liminal state. These days what really happens is that it's usually circulating as a "preprint". It used to be that people posted these on their Web sites and tweeted them or whatever, but now there are a number of "preprint" sites such as arXiv or ePrint which will just let you distribute your paper as long as it meets some minimal criteria like being apparently topical, non-libelous, etc.
Second, most of the work of reviewing the content of the paper is being done by the reviewers, i.e., your peers (hence peer review), who are generally anonymous and uncompensated, although the journal's editor might be paid (I believe this varies). But at the end of all this, it's the journal that gets paid. So, what exactly is it that they are being paid for? I'll get to that in a moment, but first I want to get to an even more egregious case, which is computer science conferences.
CS Publication
Computer Science -- and especially security and networking, which is my area of focus -- does much of its publication at conferences, with journals being seen as where you send your "expanded" paper that was too long to fit into the conference proceedings rather than a primary publication venue.
Historically the way that conferences work is that there is a "program committee" (PC) chaired by a "program chair" (again, all these people are unpaid faculty members, researchers, and the like; being on the PC of a good conference looks good on your CV). They issue a call for submissions with a deadline at some point in the future. Once all the submissions are in, each PC member is then assigned some subset of them to review, and on the basis of those reviews and further discussion (traditionally at a PC meeting, but now often online) the PC accepts some papers and rejects the rest. If you're accepted, you get to present your paper at the conference; if you're not, you go submit it somewhere else.
We're getting really off topic here, but the dynamics of a PC meeting are worth spending a little time on to see how the sausage is made. The conference has a roughly fixed number of agenda slots and that tells you how many papers you can have. So, then the PC needs to pick out the top 20 papers or so out of a stack of say 150. There are a lot of ways to do this, of course, each of which has its own problems. A common practice would be to say "OK, we're going to reject anything below a given threshold unless someone wants to advocate for it". That might get you down to 2 or 3 times what you need. Then you just have to go through the papers one at a time, which can be pretty entertaining.
Say, for instance, you go top down, discussing the highest rated papers first. In theory these are all easy accepts. At this point, people are pretty fresh and often want to show how smart they are, so it's not too uncommon for one or more of these papers to get torn apart and, if not rejected, then put into the "if space" pile (more on this in a bit). It's pretty easy to get fairly far down into the pile with only a few outright accepts, at which point people start to notice that at this rate you won't have enough papers and might get a bit more generous. Lots of times, though, you get through the whole pile and you still need a lot more papers, so you start turning to the "if space" pile, which mostly consists of adequate but imperfect papers (aren't they all) which someone doesn't like for some reason. If there's room, they'll often just get pulled in, and then you end up trying to pick a few more papers which everyone knows aren't that great but seem like the best of the rest. This isn't the only thing that happens: you can also -- though rarely in my experience -- have more good papers than you can accept, at which point you have the even more unpleasant task of rejecting some good papers. There is a little bit of slack in the system here, because conferences often have "invited talks" (ITs), so if you really need to add one more paper you can say "we'll have one less IT" or if, conversely, there just aren't any good ones left, you can have some extra ITs.
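To make that triage concrete, here is a toy sketch in Python of the process described above: cut everything below a score threshold unless someone advocates for it, go through the rest top down, park contested papers in the "if space" pile, and then backfill from that pile if slots remain. Everything here is an assumption made for illustration (the `Paper` fields, the `select_papers` function, the score scale, and the idea that "contested" is a flag known in advance rather than something that emerges in discussion); real PC meetings are a lot messier.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    score: float                 # average reviewer score (hypothetical scale)
    has_advocate: bool = False   # a PC member wants to argue for it
    contested: bool = False      # someone objects during discussion


def select_papers(papers, slots, threshold):
    """Toy model of PC triage; returns the accepted papers in order."""
    # First cut: drop anything below the threshold unless it has an advocate.
    shortlist = [p for p in papers if p.score >= threshold or p.has_advocate]

    # Discuss the survivors top down, highest rated first.
    shortlist.sort(key=lambda p: p.score, reverse=True)

    accepted, if_space = [], []
    for paper in shortlist:
        if paper.contested or len(accepted) >= slots:
            # Torn apart in discussion, or simply no room left.
            if_space.append(paper)
        else:
            accepted.append(paper)

    # Backfill any remaining slots from the "if space" pile, best first.
    remaining = slots - len(accepted)
    accepted.extend(if_space[:remaining])
    return accepted


if __name__ == "__main__":
    stack = [
        Paper("A", 4.5),
        Paper("B", 4.0, contested=True),
        Paper("C", 3.5, contested=True),
        Paper("D", 1.0),
    ]
    print([p.title for p in select_papers(stack, slots=3, threshold=3.0)])
```

Note that in this model a contested paper still gets in whenever the slots would otherwise go unfilled, which is the "best of the rest" effect described above.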
Once you've been accepted or rejected, you get some time to submit your "camera ready" version, which is the version that's actually going to be published in the "proceedings" (i.e., the book of papers that the conference distributes, assuming they distribute one), or just published on the conference Web site. This is nominally supposed to take into account the reviewer comments, but as a practical matter once you're accepted you can mostly do whatever you want. You may have noticed that the publication process is even more self-serve than in the journal case: you don't get any copy-editing or proof-reading and do all your own typesetting. Indeed, the term "camera ready" comes from the idea that the proceedings would be produced by photographing your final copy for reproduction, though of course this is all now done with PDF. The only part of this process that is compensated is that the staff who actually run the conference (rent the hotels, register people, etc.) are paid. But the program chair and the PC are all volunteers. But don't think this necessarily stops the conference from charging. For instance, if you go to the proceedings for ACM CCS 2020, some papers can be downloaded while others cannot. As far as I can tell, this comes down to whether the authors paid ACM the article processing charge of $1000 or so to make them free.
Adding value
As you should have gathered from the above, the scientific work is mostly done on an unpaid basis (of course, the researchers and reviewers get paid by their institutions but not by the journal), but it's the journal that collects the money and maybe even charges the researchers to have their own papers published. This seems kind of backwards -- after all, book authors earn royalties -- and it's not like the actual publication is expensive, because it just goes on a Web site. So, why do authors put up with it?
The simple reason is signaling: an enormous number of papers are published every year and having one in one of the top venues is prestigious. And what the publishers have is the name of the prestige venue: go ahead and publish in a free journal if you want, but if you want to publish in Nature you need to deal with its publisher, Springer Nature. And while we would collectively be better off with a completely open access system that didn't shovel piles of money to the publishers, individually people are a lot better off publishing in the best (i.e., most famous) venue they can get into because -- rightly or wrongly -- people use venue as a quick indicator of paper quality, so we're kind of in an equilibrium that's hard to get out of without a lot of collective action. For instance, Dan Wallach has been talking for years about rebooting CS publication by replacing the whole system with one of open publication and post-publication rankings. There are some good ideas here, but they have yet to take off.
I do see a few reasons for hope here. The first is that there is increasing pressure for some form of open access within the traditional publication structure. This comes in a number of forms, ranging from funding requirements for open access such as Plan S (though this still allows for article publication fees) to initiatives such as Research Without Walls, in which reviewers commit not to review for non-open-access venues. Moreover, as more and more publication moves to electronic media, charging large amounts of money for access to those publications becomes extremely hard to justify (though Nature tries here).
The second is the rise of direct-to-Web publication, either by preprint services like arXiv, ePrint or NBER or just by Twitter. The major rationale for this practice is getting work out fast, but increasingly it's just how people disseminate their work, with most of the impact happening before you even know if the paper has been accepted anywhere. I don't see this trend reversing, and once people have their work out there, charging for the version that happened to be accepted at the conference looks increasingly silly, and the publishers will need to adapt somehow. Of course, this all comes at a cost: these papers haven't been peer reviewed, and while one of the functions of the journal/conference review process is to determine if the paper is one of the better papers submitted, it also serves as a partial check on whether it's right (note that many papers are right but not exciting). We'll eventually need some way to address that issue, but I expect we're going to go a bit further down the path of a preprint free-for-all before the situation becomes so untenable that that actually happens.
Or, if it was a really cool result, especially an attack on something, you'd invent a cutesy name and logo and have your own Web site just for the result. ↩︎
So why do people do this work? It's seen as a service contribution, but at least in some cases, the fact that someone is a reviewer in general is public even if the papers they reviewed were not, and so it's a career milestone to be invited to review. ↩︎
Why too long, you say? Most conferences have page limits, even when the proceedings are entirely online. Many are the hours that my collaborators and I have spent messing with LaTeX source to get our paper under the page limit. ↩︎
It's fairly traditional to extend the deadline if they aren't getting enough submissions, people feel like they're running late, or just because. It's a particular mixture of relief and burning rage to have the submission deadline extended by a week 48 hours before the original deadline. On the one hand, you have more time; on the other, you've spent the past 72 hours cramming for no reason. ↩︎
Some conferences have now gone to "rolling submission" where you can submit multiple times a year, but everything is presented at the end of the year. That at least lets you get an answer faster. ↩︎
This is all a bit of a random process; I've seen at least one paper rejected at one conference and go on to win Best Paper at another, equally good conference. ↩︎
There are 4-5 big computer security conferences a year: ISOC NDSS, USENIX Security, ACM CCS, IEEE S&P (often called "Oakland" because that's where it historically took place, but now it is in San Jose), and arguably Euro S&P, and then a giant pile of lower-prestige conferences. The usual practice is to submit to one of the big conferences and if it gets rejected then you try another, but eventually you give up and go to a smaller conference, or a workshop. ↩︎
The exception here is that sometimes papers will be "shepherded" which means that the PC thought that the paper was only acceptable with some specific changes and has delegated a PC member to make sure you make them. In this case, if you don't satisfy the shepherd your paper won't appear. ↩︎
ACM's system is particularly goofy here, because they allow you to do self-archiving which means you post it on your Web site or on arXiv, but it's not free on their site. They also have some system in which your ACM conference proceedings can link to the ACM's site and if people go there from your site, then it will be free, but otherwise they'll get charged. True story: this works by using the Referer[sic] header, but Firefox and Chrome recently changed the default for the Referer header, which caused this to break for some conferences, such as ANRW. The whole system seems to be designed to be nominally open access but in practice to make it harder for people to actually find the free version. ↩︎
In fact, much of the complexity of the site seems to go to controlling access to the papers so you can charge for them. ↩︎
And of course because "selectivity" (the fraction of papers that get accepted) is used as a proxy for conference quality, this is a self-perpetuating process. ↩︎
And nobody, AFAIK, requires you to withdraw your preprint. ↩︎
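The Referer-based scheme mentioned in the note on ACM's system can be sketched roughly as follows. This is a guess at the general shape of such a check, not ACM's actual implementation: the function names, the exact-match rule, and the `author.example` URL are all made up for illustration. What is real is the browser change: Chrome and Firefox moved their default referrer policy from `no-referrer-when-downgrade` to `strict-origin-when-cross-origin`, so cross-origin requests now carry only the origin rather than the full page URL.

```python
from urllib.parse import urlsplit


def referer_sent(page_url, policy):
    """Referer value a browser sends for a cross-origin HTTPS-to-HTTPS link.

    Only the two policies relevant here are modeled.
    """
    parts = urlsplit(page_url)
    if policy == "no-referrer-when-downgrade":
        # Old default: the full URL of the linking page, minus any fragment.
        return parts._replace(fragment="").geturl()
    if policy == "strict-origin-when-cross-origin":
        # New default: just the origin, with the path and query stripped.
        return f"{parts.scheme}://{parts.netloc}/"
    raise ValueError(f"policy not modeled: {policy}")


def is_free_download(referer, registered_pages):
    """Hypothetical publisher-side check: exact match on the author's page."""
    return referer in registered_pages


if __name__ == "__main__":
    author_page = "https://author.example/papers.html"  # made-up URL
    registered = {author_page}
    for policy in ("no-referrer-when-downgrade",
                   "strict-origin-when-cross-origin"):
        referer = referer_sent(author_page, policy)
        print(policy, referer, is_free_download(referer, registered))
```

Under these assumptions, a check that expects the full URL of the author's page stops matching once the browser trims the header to the origin, which is consistent with the breakage described in the note.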