I don’t post very often on Twitter, but I do read things that others post, especially over the past couple of years since I stopped using Facebook. I can’t remember when I first became aware of the open science ‘movement’ as it’s often called. I suppose that I read about the various components separately at different times. Much like religious extremists who ‘self-radicalise’ by reading material online and watching videos, over the past couple of years I’ve gradually come around to this point of view. I now intend to dramatically change how we do research in my lab, and have put together a ‘roadmap’, detailed in the companion post.
But this change hasn’t happened all at once, and several aspects of open science previously seemed either unnecessary or unappealing. Here are some of my previous objections, and why I changed my mind:
I always felt that preregistration was appropriate for clinical trials, but not much else. I remember being shocked to find out that the Declaration of Helsinki was changed in 2008 (I still don’t really understand why a declaration can be changed and updated) to mandate preregistration of all studies involving human participants. Much like the current objections to the NIH proposal to treat all such studies as ‘clinical trials’, I thought this was an error based on a lack of understanding of fundamental laboratory science. Much of what I do is basic experimental work, and it’s often exploratory – sometimes we don’t have a very strong prediction about exactly what the results will be, aside from thinking that they will be interesting.
Moreover, I think there’s also a psychological issue here. The implication that a study which is not preregistered is somehow ‘dubious’ or ‘suspicious’ feels slightly offensive, as though one’s honour as a scientist is being called into question. But given the severity of the replication crisis, and numerous cases of fraud and malpractice across psychology as a whole, I don’t think we can just assume that everyone’s intentions are pure. Science is full of perverse incentives, and the incentive to publish is top of the list. So although it might sometimes feel like virtue signalling, preregistration is probably the most important structural change currently being introduced.
Having recently been involved in several preregistered studies, I now realise that the process need not be as restrictive as I had always assumed. The preregistration documents describe the methods and give an outline of the planned analyses. It’s just not necessary to predict every result and finding in detail, and there is room for serendipitous discoveries and exploratory analyses. At least a couple of our recent papers would have had greater face validity (and probably an easier time in review) if we’d preregistered. I also realised that preregistration can happen after initial pilot work, which is invaluable for scoping out the parameter space and fine-tuning an experiment. This is just good scientific practice, and particularly important when working in a new area or using a novel paradigm – it isn’t “cheating”!
Sample size and replication
When I was a postdoc, I published a study on individual differences in suppression between the eyes. We found a significant correlation between the rate of binocular rivalry and the amount of dichoptic masking. But our sample size wasn’t huge – about 40 participants. I always wondered if this effect would replicate, and felt faintly nervous in case it was a false positive. Luckily (for me!) a team at Cambridge incorporated similar conditions into the Pergenic study, which tested over 1000 participants, and replicated the effect in this much larger sample.
Of course, we can’t always rely on others to set our minds at ease in this way. Preregistration might make a finding more convincing if we’ve predicted it in advance, but really there is no substitute for large sample sizes and (ideally) internal replication, particularly for studies looking at individual differences. Sometimes this might be a conceptual replication, rather than a direct replication, where the stimuli or dependent variables might change, or a different population is tested (adults vs children for example). If a phenomenon is worth investigating, it’s worth investigating thoroughly and rigorously, and if that means spending more time testing more people, then I think that’s a price worth paying.
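To make the sample-size point concrete, here is a minimal Python sketch – the population correlation of 0.4 and the two sample sizes are illustrative assumptions of mine, not values taken from the studies above – showing how much sample correlations vary from study to study at n = 40 versus n = 1000:

```python
import numpy as np

rng = np.random.default_rng(0)
TRUE_R = 0.4  # hypothetical population correlation, chosen for illustration

def spread_of_sample_r(n, reps=2000):
    """Simulate `reps` studies of size `n` and return the standard
    deviation of the sample correlations across those studies."""
    cov = [[1.0, TRUE_R], [TRUE_R, 1.0]]
    rs = np.empty(reps)
    for i in range(reps):
        xy = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        rs[i] = np.corrcoef(xy[:, 0], xy[:, 1])[0, 1]
    return rs.std()

print(f"spread of r with n=40:   {spread_of_sample_r(40):.3f}")   # roughly 0.13
print(f"spread of r with n=1000: {spread_of_sample_r(1000):.3f}") # roughly 0.03
```

With 40 participants, a true correlation of 0.4 can easily come out anywhere between roughly 0.1 and 0.7 in a single study; with 1000 participants the estimate barely moves.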
Many journals now mandate making data openly available for all articles they publish. I always thought this was pointless because I couldn’t imagine that anybody would ever be interested in accessing the raw data. But my opinion on this completely changed recently because I did a meta-analysis. I realised that pulling together all of the data across dozens of studies would have been much easier if it had all been freely available online. But crucially, it would have been incrementally easier for each study with open data – it’s not an all-or-nothing proposition. We can’t really predict which data sets will be useful to other people, or even how they might be used, so a blanket policy of posting data online is the only sensible solution.
I’ve heard people talking about Bayesian stats for ages, but I never really ‘got it’. A few years ago, I decided to include a lecture on Bayesian methods in my final year advanced module. So I did some reading. I’m not a mathematician and I don’t really understand the maths behind the more sophisticated Bayesian methods. But one thing really hit home and convinced me of the problems with frequentist methods. It’s something I’ve always sort of known and thought was a bit strange, but never really questioned or thought about in much detail. It’s that with frequentist stats (t-tests, ANOVAs, correlations and so on) the false positive rate is constant, regardless of sample size. That means that even a study with an impossibly large sample (say a million subjects) will still produce apparently significant (but actually spurious) results for 5% of tests where no true effect exists! To my mind, this just can’t be OK. Bayesian versions of traditional tests accrue more evidence in support of either the null or experimental hypothesis with each new participant tested, meaning that a larger sample size will always give you a better estimate of reality. Regardless of any mathematical justification, this just seems right. Additionally, the barrier to entry for using Bayesian techniques is now significantly lower than it used to be, with easy-to-use software being freely available.
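The constant false positive rate is easy to demonstrate with a simulation. This is a minimal sketch (the sample sizes and repetition counts are arbitrary choices of mine): when the null hypothesis is true, the proportion of ‘significant’ t-tests sits at roughly 5% whether each simulated study tests 50 people or 2,000:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def false_positive_rate(n, reps=2000, alpha=0.05):
    """Run `reps` simulated studies in which the null is true
    (population mean is exactly 0) and return the fraction that
    nonetheless reach p < alpha on a one-sample t-test."""
    data = rng.normal(0.0, 1.0, size=(reps, n))
    pvals = stats.ttest_1samp(data, 0.0, axis=1).pvalue
    return float(np.mean(pvals < alpha))

print(f"false positive rate, n=50:   {false_positive_rate(50):.3f}")   # about 0.05
print(f"false positive rate, n=2000: {false_positive_rate(2000):.3f}") # about 0.05
```

Bayes factors behave differently in this situation: as the sample grows with no real effect present, they tend to accumulate evidence in favour of the null, rather than delivering a constant stream of false positives.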
I’ve always favoured open access (OA) journals, particularly since one of the main journals in my field (Journal of Vision) has been free to read since it launched in 2001. But at the same time I’ve often felt that hybrid journal OA charges were a total rip off! The thinking behind hybrid journals was that they would act as a stepping stone, with an increasing proportion of articles becoming open access, and subscription fees falling as a consequence. But this just hasn’t happened, and traditional publishers seem to view gold OA fees as an additional revenue stream to be exploited. Furthermore, gold OA fees for most hybrid journals are higher than those for online-only ‘pure’ OA journals, which seems wrong to me. Surprisingly, a consortium of research funders have similar views, and from 2020 will no longer fund hybrid OA charges (“Plan S”). My intuition is that traditional subscription journals will survive in some form, though probably with reduced prestige given that well-funded labs will often be barred from publishing in them (though there is a green OA loophole that might get around this, particularly if publishers relax their embargo periods). But financial aspects aside, making all publications public is just self-evidently the right thing to do, regardless of how this is achieved.
I never bothered with posting preprints because nobody else did. Whilst it was common practice in fields like physics, almost nobody in the life sciences posted them, and so neither did I. I guess there were also residual worries that a journal might reject a paper on the basis that it was already available. But now preprint use is widespread, with funders even allowing citation of preprints in grant proposals, and so none of these objections are valid. Even if nobody reads your preprint, it’s still there, still part of the scientific record, and is a useful form of green open access.
Avoiding commercial software
This is a tricky one. I was raised (scientifically speaking) using Matlab, and it’s still the core programming language in my lab for running experiments and analysing data. Sticking with what you know is attractive, particularly as there’s a lot of legacy code that currently makes for quick progress when working on something new. And yet, Matlab itself is problematic, particularly when updates change the way built-in functions work and break old code. That’s not very future-proof, and most of my code from when I was a PhD student doesn’t actually work anymore without modification. Over the past few years I’ve started using R for creating figures, and increasingly for other types of data analysis. I think shifting away from Matlab will be a gradual transition, with new members of the lab bringing new skills and experience, so that over time we shift to languages like Python and R.
Another bit of commercial software that I use quite heavily is Adobe Illustrator, which is great for compositing multi-part figures, particularly when it’s important to keep images in vector format. But it’s expensive, and Adobe’s subscription licensing model means the costs recur every year. None of the open source alternatives I’ve looked at are really up to the job yet. However, I recently discovered that an R package exists for combining multiple EPS files (grImport). I haven’t used this properly yet, and it doesn’t deal well with transparency, but it looks like the way forward.
Now that I’m a convert, I’ve devised a roadmap to make all of the work that happens in my lab more open. I’ll go over the details of that roadmap in part 2 of these posts. But just to conclude, I think that the various components of the open science movement add up to a fundamental paradigm shift. Most of the time we think of paradigm shifts as centred on a particular theoretical idea, like Newtonian physics or evolution. But the term can be applied more generally to refer to “a fundamental change in the basic concepts and experimental practices of a scientific discipline”. In this sense, the open science movement represents a major paradigm shift for all of modern scientific research. Right now we are at the transition point between the old, closed system, where scientists guard their data and publishers restrict access to publications, and the system of the future, where knowledge is shared. In ten years’ time the old system will seem absurdly outdated, and the sooner we routinely adopt open practices the better for everyone.