Open EEG data


I’ve been thinking recently about how best to share data from the lab, as part of our adoption of open science practices. In particular, we generate quite a lot of EEG data. Unlike the MRI community, which has a universal file format (NIfTI), the EEG community has no widely agreed standard format. Almost all systems use a proprietary file format; for example, our main EEG systems are ANT Neuroscan systems, which use a format called EEProbe. For long-term data accessibility this is not ideal – companies go bust, and file formats get forgotten. Reading these file formats into languages like R is also problematic: unless someone has written code to read the file type, the files are next to useless (see this blog post by Matt Craddock for further discussion).

So, I have decided that the raw data files generated by the EEG system should be converted to another file format so that they can be shared more easily. Previously, much of our analysis was done in Matlab, and so the .mat file format was a possibility. However, this is just another proprietary format (owned by The MathWorks), so it might not still be widely readable in a few decades’ time. The simplest thing I could think of was a comma separated value (csv) file – a plain text format for storing data, in which each field is separated by a comma.

For this to work, the data layout needs to be logical, standardised, and adaptable to different EEG systems and electrode montages. For the main data files, the first column should show the time of each sample (in ms), and the second column should contain trigger values. Each subsequent column will contain the data from one electrode (with the column header giving the electrode name). Here is an example of some data in this format:
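For illustration, the first few rows of such a file might look like this (the electrode names and values are invented, and a 500 Hz sample rate, i.e. 2 ms steps, is assumed):

```
time,trigger,Fz,Cz,Pz,Oz
0,0,1.24,-0.87,2.01,0.33
2,0,1.19,-0.92,1.95,0.41
4,7,1.31,-0.78,2.10,0.29
6,0,1.28,-0.81,2.04,0.35
```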


Most of the trigger values turn out to be zero, because trigger events happen only rarely. However, I think it’s better to store the triggers alongside the data, rather than in a separate file (which is what the EEProbe format does). The electrodes can appear in any order – these get matched up with the montage later. The big advantage of this format is simplicity – it’s clear what is stored in each column of the spreadsheet and what to do with it. However, there is one big disadvantage: the raw csv files are at least 10x larger than the original files from the EEG system. This is because the EEProbe file format uses some form of compression to reduce the file size, whereas csv files are uncompressed.

We can also compress the csv files, using something like gzip. The compressed files are still quite a lot larger than the originals – approximately 4.2x larger – but I think that’s manageable because these days storage is cheap, and the aim here is to host the data files on a website like the OSF or Figshare, which don’t charge for hosting publicly available data anyway. Crucially, the gzip compression is transparent to software like R, which can load a csv.gz file using the same command as you would use to load an uncompressed csv file. This can be done using native R functions, and is as simple as:

data <- read.csv('file.csv.gz', header = TRUE)

Of course data can also be read into other programming languages, or even packages like Excel.
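For example, the same transparent decompression works in Python via pandas, which infers gzip compression from the .gz extension. A minimal sketch (the file name and contents here are made up for illustration):

```python
import gzip

import pandas as pd

# Write a small example file, then read it back.
# Like R's read.csv, pandas needs no special arguments for .gz files.
with gzip.open('file.csv.gz', 'wt') as f:
    f.write('time,trigger,Fz,Cz\n0,0,1.2,-0.9\n2,0,1.1,-0.8\n')

data = pd.read_csv('file.csv.gz')  # decompression is automatic
print(data.shape)
```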

I have written a Matlab script to convert cnt files to gzipped csv files, which is linked to below. This function uses the EEProbe CNT reader plugin from EEGlab, and so requires EEGlab to be installed, and visible on the Matlab path. The idea is that you give it the path to a folder as an input, and it will convert all the cnt files it finds in there into csv format. I’m not particularly intending for others to use this script as is, but rather it’s a useful template to adapt to process data from different systems.
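The overall shape of such a conversion loop can be sketched in Python as follows. This is not the Matlab script itself – `read_cnt` is a hypothetical stand-in for whatever reader function is available for a given EEG system:

```python
import glob
import gzip
import os

def convert_folder(folder, read_cnt):
    """Convert every .cnt file in `folder` to a gzipped csv.

    `read_cnt` stands in for whatever reader your EEG system
    provides; it should return sample times (in ms), trigger
    values, and a dict mapping electrode names to data vectors.
    """
    for path in glob.glob(os.path.join(folder, '*.cnt')):
        times, triggers, channels = read_cnt(path)
        outfile = os.path.splitext(path)[0] + '.csv.gz'
        with gzip.open(outfile, 'wt') as f:
            # First two columns: time and trigger; one column per electrode
            f.write('time,trigger,' + ','.join(channels) + '\n')
            for i, (t, trig) in enumerate(zip(times, triggers)):
                values = [str(channels[name][i]) for name in channels]
                f.write(','.join([str(t), str(trig)] + values) + '\n')
```

Swapping in a different reader function is all that should be needed to adapt this template to data from another system.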

Besides the raw data, it is also often useful to have meta-data associated with the study. Whereas the raw data files will usually involve one file per recording session (block), the meta-data should be common to all participants and sessions in a study, so only needs to be generated once. I also used the csv format for this, and laid it out as follows:


The first two columns contain useful information about the study, including a description, the year it took place, and parameters of the EEG system and experimental conditions. The next two columns contain all legal trigger codes, along with a description of the conditions they indicate. Next comes a list of participant numbers for whom complete data sets exist. The following three columns give the labels and x and y positions of each electrode in the montage, and all remaining columns are used to draw a cartoon head, nose and ears, to permit the creation of scalp plots. I think this is more or less everything one would need to process the results of a typical EEG experiment. Again, example code to create the header file is linked below.
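Because the column groups have different lengths, the header file is a ragged csv, with short columns padded by blanks. A sketch of how such a file could be assembled in Python (all column contents here are made-up placeholders, not values from the actual study):

```python
import csv
from itertools import zip_longest

# All values below are illustrative placeholders
info_keys  = ['description', 'year', 'sample_rate']
info_vals  = ['SSVEP contrast study', '2016', '1000']
trig_codes = ['7', '11']
trig_descr = ['low contrast', 'high contrast']
subjects   = ['1', '2', '3']
elec_names = ['Fz', 'Cz', 'Pz']
elec_x     = ['0', '0', '0']
elec_y     = ['0.4', '0', '-0.4']
head_x     = ['1.00', '0.98']   # dummy cartoon-outline coordinates

columns = [info_keys, info_vals, trig_codes, trig_descr, subjects,
           elec_names, elec_x, elec_y, head_x]

# Pad the shorter column groups with blanks so every row has the
# same number of fields
with open('header.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for row in zip_longest(*columns, fillvalue=''):
        writer.writerow(row)
```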

As a first step, I thought it would be a good idea to upload one of our largest data sets in this format. It is from a steady-state EEG experiment measuring contrast response functions, in which we tested N=100 participants. The results are reported in a recent paper (Vilidaite et al., 2018). We plan to do some secondary analyses on this data set, but I think it’s an unusual resource that others might find interesting as well. It can be accessed here:

Example script to convert to csv format:
Example script to create a header file:


How many trials should each participant do in an experiment?


Whenever we design a new experiment, we have to specify how many times each participant should repeat each condition. But how do we decide this? I think most researchers base this decision on things like the amount of time available, what they did in their last study, and what seems ‘about right’. We all know that running more trials gets us ‘better’ data, but hey, you’ve got to be pragmatic as well right? Nobody would expect their participants to do an experiment lasting hours and hours (except psychophysicists…).

I started thinking about this a few months ago, and discovered that the number of trials has a surprisingly direct effect on the statistical power of a study design. Power is the probability that a study design will be able to detect an effect of a particular size. Most people know that, for a given effect size, power increases as a function of sample size (see the figure below). But it turns out that under certain conditions power can also depend on the number of trials each participant completes.


Power as a function of sample size. For each effect size (curve), power increases monotonically as a function of the number of participants.

So what are these conditions? Well first let’s imagine a situation where the number of trials doesn’t matter. This would be the case if we could very precisely estimate the true value for each participant of whatever it is we’re measuring. Let’s imagine we have some extremely accurate and well-calibrated scales, probably using lasers or something, which measure the participant’s weight to within a fraction of a gram. Under these circumstances, the differences between people (formally the between participants standard deviation σb) will be much larger than the differences between repeated measurements of the same participant (formally the within participant standard deviation, σw, which will be much less than 1g). Here, the spread of values in a sample of people (the sample standard deviation, σs) will be determined primarily by the between participants standard deviation, and it won’t matter how many times we measure each individual. We might get a distribution that looks something like this:


Distribution of values in a sample of participants. Each point represents one individual (N=50), and the curve shows the overall sample standard deviation for an infinite sample size.

In this situation, it just doesn’t matter how many trials we run, because each estimate is very accurate, and the participants’ weights are very stable from moment to moment. But what would happen if neither of these conditions were met? In psychology and human neuroscience we are trying to measure things that are not very stable, because brain activity changes from moment to moment, as participants’ concentration fluctuates, as they move around, think about other things, and so on. We are also often making measurements using equipment that is itself subject to noise, for example in neuroimaging studies. All of these sources of noise make it much harder to measure the true (mean) value for an individual participant, and so each estimate ends up having a greater variance. This is why we typically conduct many (similar or identical) trials in each condition. In the left hand figure below, the individual points now have an associated (horizontal) error, and this increases the spread of the sample standard deviation (curve).


Distributions of values in a sample of noisier participants. Here each point has an associated variance (expressed by the horizontal standard errors), and the sample standard deviation is affected by the within-participant variance. In the left panel k=20 trials were simulated for each participant, in the right panel there were k=200 trials. Increasing the number of trials reduces the sample standard deviation.

An increased sample standard deviation will affect statistical power by reducing the effect size. Effect size measures such as Cohen’s d are calculated by dividing the mean effect by the sample standard deviation, so a bigger standard deviation results in a smaller effect size. This means we can derive a set of power curves analogous to those in the first figure above, but as a function of the number of trials per participant (k) for a fixed sample size:


The effect of number of trials on sample standard deviation (left) and power (right) for a range of within-participant standard deviations. We assumed σb=0, M=0.2, and N=200 for these calculations.
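One way to compute such curves is as follows (a sketch of my own, assuming a one-sample design and using scipy; it is not the code behind the figure). With between- and within-participant standard deviations σb and σw, the expected standard deviation of participant means based on k trials is √(σb² + σw²/k), giving an effective effect size d = M/√(σb² + σw²/k), from which power follows via the noncentral t distribution:

```python
import math

from scipy import stats

def power_fixed_N(M, sigma_b, sigma_w, k, N, alpha=0.05):
    """Power of a two-sided one-sample t-test on N participant
    means, each based on k trials."""
    sigma_s = math.sqrt(sigma_b**2 + sigma_w**2 / k)  # expected sample SD
    d = M / sigma_s                                   # effective effect size
    nc = d * math.sqrt(N)                             # noncentrality parameter
    tcrit = stats.t.ppf(1 - alpha / 2, df=N - 1)
    # Probability that |T| exceeds the critical value under H1
    return (1 - stats.nct.cdf(tcrit, df=N - 1, nc=nc)
            + stats.nct.cdf(-tcrit, df=N - 1, nc=nc))

# With sigma_b = 0, M = 0.2, N = 200 (as in the caption) and an
# illustrative sigma_w = 1, power rises steeply with k
curve = [power_fixed_N(0.2, 0, 1, k, 200) for k in (1, 5, 25, 100)]
```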

I suspect the above plots will be rather surprising to a lot of people. They make explicit the vague heuristic that ‘more trials is better’ by showing how data quality has a direct effect on statistical power. Most a priori power analyses assume that effect size is constant, because it is invariant to sample size (though the accuracy with which effect size is measured increases with N). For this reason, power calculations typically optimise the sample size (N), and ignore the number of trials. But we could perform complementary calculations, where we assume a fixed sample size, and manipulate the number of trials to achieve a desired level of power. Or, more realistically, since both N and k are degrees of freedom available to the experimenter, we should consider them together when designing a study.

This is the aim of a recent paper (preprint) which proposes to represent statistical power as the joint function of sample size (N) and number of trials (k). The two-dimensional ‘power contour’ plots below are hypothetical examples for different values of within- and between-participant standard deviations. In the left panel, the within-participant standard deviation is negligible, and increasing the number of trials does not affect power. The vertical lines are iso-power contours – combinations of values which produce the same level of power. It’s clear for the left example that power is invariant with k. However, in the right hand panel, the within-participant standard deviation is large, and the power contours become curved. Now there are many combinations of N and k that will provide 80% power (thick blue line). In principle any of these combinations might constitute a valid study design, and experimenters can choose a combination based on other constraints, such as the time available for testing each participant, or how easy it is to recruit from the desired sample. Power contours can be generated using an online Shiny app, available here:


Example simulated power contours for different values of σw. In both panels, the sample mean was M=1 and the between-participants standard deviation was σb = 2. In the left panel the within-participant standard deviation was σw = 0, and in the right panel it was σw = 10. The contours show combinations of N and k which give a constant statistical power.
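The contours themselves can be approximated by brute-force simulation. Here is a minimal sketch (my own illustration, not the paper's code) using the parameter values from the right-hand panel above – each simulated experiment draws N participant means, each averaged over k trials, and runs a one-sample t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulated_power(M, sigma_b, sigma_w, N, k, nsims=2000, alpha=0.05):
    """Monte-Carlo power estimate for a two-sided one-sample t-test."""
    # True participant means vary with the between-participants SD
    true_means = rng.normal(M, sigma_b, size=(nsims, N))
    # Averaging k trials shrinks the within-participant noise by sqrt(k)
    observed = true_means + rng.normal(0, sigma_w / np.sqrt(k),
                                       size=(nsims, N))
    t, p = stats.ttest_1samp(observed, 0, axis=1)
    return np.mean(p < alpha)

# Sweep a small N-by-k grid (M=1, sigma_b=2, sigma_w=10); iso-power
# contours could then be drawn with e.g. matplotlib's contour function
grid = {(N, k): simulated_power(1, 2, 10, N, k)
        for N in (10, 20, 40) for k in (10, 40, 160)}
```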

Of course, for this method to be useful, we need to check that real power contours look more like the second plot above than the first. We also need some idea of the likely within- and between-participant standard deviations for the technique we plan to use. To this end, we reanalysed 8 existing data sets from a range of widely used methods, including reaction times, sensory thresholds, EEG, MEG, and fMRI. In all cases the within-participant variance was greater than the between-participants variance, and the power contours (generated by repeatedly subsampling the data) had the expected curved shape.

Consideration of the power contour plot is instructive when thinking about different experimental traditions. In some sub-disciplines it is commonplace to test large numbers of participants on relatively few trials, occupying the region in the lower right hand corner of the space. Other experimental traditions (for example psychophysics) go to the other extreme – a small number of participants complete very large numbers of trials each. Both approaches have their advantages and disadvantages, but it is clear that, under reasonable assumptions, high statistical power can be achieved by either route.

Overall, we think that calculating power contours offers a useful framework for thinking in more detail about the design of future studies. Of course, as with any type of power analysis, the outcome depends on the assumptions that we make about the likely difference between means, and the variances involved. We will never know these values for sure and can only estimate them once we have conducted a study. However the values in the paper are plausible, and we have made all scripts and data available so that others can see how to conduct similar analyses on their own data.

Some useful links:

A lab roadmap (an open science manifesto, part 2)


Following my ‘conversion’ to open science practices, detailed in my previous post, I have put together a roadmap for transitioning research in my lab to be more open. Whilst many of these changes won’t happen immediately, we’re at a point in time where we’re wrapping up many studies before starting new ones. So my intention is that all new studies will follow as many of these guidelines as possible. Studies which are completed but not yet published will also aim to use as much of the roadmap as is practical. Within 2-3 years this backlog of current work should be cleared, paving the way for a more open future.

Study design: Routinely pilot new paradigms

Pilot work is an important part of the scientific process. Without adequate piloting, experimental paradigms will be poorly understood, stimuli may not be optimised, and much time can be wasted by conducting full studies that contain a basic (avoidable) error. We are lucky in psychology and cognitive neuroscience that our participants are humans, and we are also humans. So it’s completely reasonable to run ourselves through several iterations of a pilot experiment, to get a feel for what is happening and make sure that experimental code is working and producing sensible results. This isn’t “cheating”, it’s just good scientific practice.

Study design: Larger sample sizes and internal replication

My background is in low-level vision, where it is quite common to run a very small number of participants (usually about N=3) on a very large number of trials (usually several thousand per experiment, taking many hours across multiple sessions). For some types of study, this is the only realistic way of completing an experiment – we simply cannot expect dozens of volunteers to each do 40 hours of psychophysics. But even in these sorts of studies, it is often possible to confirm our main results using a subset of critical conditions, with a much larger sample size (see e.g. this paper). This constitutes a form of internal replication, where the basic effect is confirmed, and sometimes the findings are extended to additional conditions. Other types of internal replication we have used in the past include running a similar experiment using a different imaging modality (EEG and MRI, or EEG and MEG), or replicating in a different population (e.g. adults vs children). These types of study design help us to move away from the single-experiment paper, and guard against false positives and other statistical anomalies, to make sure that the work we publish is robust and reproducible. I’m not sure it’s helpful to specify minimum sample sizes, as this will vary depending on the paradigm and the relevant effect sizes, but I’d expect to at least double the sample size of all new studies going forwards. A few years back we ran an EEG project where we tested 100 people using a steady-state paradigm. The average data you get with such a large sample size are incredibly clean, and can be used for answering any number of secondary questions.

Study design: Include meta-analyses where appropriate

In many areas of research, there are already lots of studies investigating the same basic phenomenon. A meta-analysis is a systematic, empirical method for summarising all of these results. Sometimes this will answer a research question for you, without requiring any new data collection. In other situations, the meta-analysis can make explicit what is currently unknown. In visual psychophysics, we have a huge body of existing research using consistent methods, and spanning many decades. Yet meta-analyses are rarely used. Where appropriate, we will conduct meta-analyses to complement empirical work.

Study design: Preregister all studies

Preregistration requires very similar information to an ethics proposal, so these two steps should be done at around the same time, before any formal data collection begins. Importantly, the preregistration documents should detail both the design of the experiment, and also how the data will be analysed (and what hypotheses are being tested). This helps to guard against the common, but problematic, practice of “HARKing” (Hypothesising after the results are known), though of course it is still acceptable to perform additional and exploratory analyses after the data have been collected. The Open Science Framework provides a straightforward platform for preregistration, and will also host analysis scripts and data once the study has been conducted.

Study design: Move from Matlab/PTB to Python/Psychopy

For over a decade, I have used a combination of Matlab and Psychtoolbox to run most experiments. This works well, and has many advantages, not least that my familiarity with these tools makes setting up new experiments very quick. But Matlab is a closed commercial language, and Psychtoolbox development has slowed in recent years. In contrast, Jon Peirce’s Psychopy is completely open source, and is under intensive development by many people. It also benefits from a graphical interface, making it easier for project students to set up their own experiments. I’ve made some inroads into learning Python, though I’m aware that I still have a very long way to go on that front. But Rome wasn’t built in a day, as they say, and I’m sure I’ll get there in a few years.

Analysis: Script all data analysis in R

Although graphical interfaces in packages such as SPSS are intuitive and straightforward, statistical analyses performed in this way are not easy to reproduce. Creating a script in R means that others (including yourself in the future) can reproduce exactly what you did, and get the same results. Since every aspect of R is open source, sharing analysis scripts (and data, see below) means that others can reproduce your analyses. R also produces excellent figures, and copes well with alpha transparency, meaning it can be used for the entire analysis pipeline. There are some down sides to doing this – R is sometimes slower than Matlab, it has less provision for parallel computing, and far fewer specialised toolboxes exist (e.g. for EEG, MRI or MEG analysis). So the intention to do all analyses in R might not be realised for every single study, but it will be increasingly possible as more tools become available.

Analysis: Create a lab R toolbox

Some things need doing the same way every time, and it makes sense to write some robust functions to perform these operations. In R you can create a custom package and share it through GitHub with others in the lab (or elsewhere). I’ve already started putting one together and will post it online when the first iteration is finished.

Analysis: Level-up data visualisation

I’ve been working pretty hard on improving data visualisation over the past few years. I like including distributions wherever possible, and am a fan of using things like violin plots, raincloud plots and the like to replace bar graphs. Showing individual participant data is pretty standard in threshold psychophysics (where often N=3!), and I think this is generally worthwhile in whatever form is appropriate for a given data set. In many studies we measure some sort of function, either by parametrically varying an independent variable, or because measures are made at multiple time points. Superimposing each participant’s function in a single plot, along with the average (as is typical for grand mean ERPs), or showing individual data in a supplementary figure are both good ways to present data of this type. Of course sometimes there will be outliers, and sometimes data are noisy, but that’s OK!

Analysis: Use Bayesian statistics

Being from a visual psychophysics background, I probably use fewer traditional statistical tests than a lot of researchers. But I do still use them, and with them come all the well-established problems with false positives and an over-reliance on p-values. I think the Bayesian approach is more rational, and I’d like to use it more in the future. At the moment I’m only really comfortable using the basic features of packages like BayesFactor, but over time I’d like to learn how to create more complex Bayesian models to understand our results. So the plan for the moment is to use Bayesian versions of tests where possible, and to embrace the Bayesian philosophy and approach to data analysis.

Publication: Always post preprints

There’s just no reason not to do this anymore. Preprints increase citations and visibility of publications, and they’re a form of green open access. BioRxiv is good for neuroscience papers, PsyArXiv for more psychology-related work. Everything should be posted as a preprint before submission to a journal. A really neat idea I saw recently was to include the DOI of the preprint in the abstract of the submitted paper – that way there is a direct link to an open access version of the work in all versions of the abstract that get indexed by services like PubMed.

Publication: Always share raw data

At the same time as a preprint is posted, all data involved in the study will also be made available online. In the past I’ve sometimes posted data (and all studies with available data now have a link on the Publications page), but this has often been partly processed data – detection thresholds, or steady state amplitudes. For open data to be truly useful, it should be as comprehensive as possible. So, wherever possible, we will post the raw data, along with the scripts used to perform the analyses reported in the study. There are several potential hurdles to this. Most importantly, the intention to make data available should be stated explicitly in the initial ethics proposal, as well as in all information and consent materials, so that participants in the study are agreeing for their data to become public (albeit in anonymised form). Next, the data should be stored in an open format, which is particularly problematic for EEG data, as EEG systems often use proprietary file formats. Solving this problem will be the topic of a future post. Finally, large data files need to be stored somewhere accessible online. Fortunately a number of sites such as the Open Science Framework and Figshare offer unlimited storage space for publicly available files in perpetuity. Meta-data should be included with all data files to make them comprehensible.

Publication: Aim for true open access

Where possible I’ll aim to publish in pure open access journals. There are lots of good examples of these, including eLife, Nature Communications, Scientific Reports, the PLoS journals, Journal of Vision, iPerception and Vision. Unfortunately there are also many dubious journals which should be avoided. Paying the article processing fees can get expensive, and so this might not always be possible, particularly for work that is not grant funded. In that situation, green OA is an acceptable substitute, particularly when a preprint has already been posted making the work available. But I find it morally dubious at best to pay gold OA charges for publishing in a subscription journal, so will try my hardest to avoid doing so.

Publication: Disseminate work through Twitter and blog posts

I think it’s important to try and disseminate work widely, so each new publication should be publicised on Twitter, and usually with its own blog post. This will allow us to link the final published article with all of the related resources, and also include an accessible summary of the work, and part of the story about how it came about. In the past when papers have generated media interest, this is also a useful place to collate links to the coverage.

Overall ethos: Aim for fewer, larger studies

As discussed in the sections on sample size and replication, going forward I’m planning to aim for much larger sample sizes. A lot of work these days seems to test ‘just enough’ participants to be publishable. But data sets are so much richer and more informative, and less likely to generate spurious findings, when the sample size is larger. The cost here is that bigger studies take more time and resources, and so this means that probably fewer experiments can be done in total. I feel OK about that though. I think I’ve reached a point in my career where I’ve published plenty of papers, and going forward I’d prefer to aim for quality over quantity (not to say that any of my existing work is of low quality!).

This will mean changing the way some studies are conducted. Traditional threshold psychophysics experiments involve testing a small number of participants on many conditions. It might be necessary to flip this paradigm around, and test many participants on a small subset of conditions each (which I’ve done in the past). For studies that can be run outside of a highly controlled lab setting, online recruitment tools will be worth investigating. Student projects can be run in groups and across multiple years to increase the sample size. And when writing grant proposals, funds can be requested to cover the costs of testing a larger sample. Many funders (for example the MRC) explicitly support reproducible, open science, and would rather fund something that will deliver a definitive answer, even if it costs a bit more.

Overall ethos: Pass on these habits to the next generation

All of the above activities should, in time, become part of the lab culture. This means that all project students, PhD students and postdocs working in the lab should pick up the general habit of openness, and take this with them to whatever they do next. I’ll also try to spread these practices to people I collaborate with, so hopefully this series of blog posts will help make the case for why it’s important and worth the effort!

How I self-radicalised (an open science manifesto, part 1)


I don’t post very often on Twitter, but I do read things that others post, especially over the past couple of years since I stopped using Facebook. I can’t remember when I first became aware of the open science ‘movement’ as it’s often called. I suppose that I read about the various components separately at different times. Much like religious extremists who ‘self-radicalise’ by reading material online and watching videos, over the past couple of years I’ve gradually come around to this point of view. I now intend to dramatically change how we do research in my lab, and have put together a ‘roadmap’, detailed in the companion post.

But this change hasn’t happened all at once, and several aspects of open science previously seemed either unnecessary or unappealing. Here are some of my previous objections, and why I changed my mind:


Preregistration

I always felt that preregistration was appropriate for clinical trials, but not much else. I remember being shocked to find out that the Declaration of Helsinki changed in 2008 (I still don’t really understand why a declaration can change and be updated) to mandate preregistration of all studies involving human participants. Much like the current objections to the NIH proposal to treat all such studies as ‘clinical trials’, I thought this was an error based on a lack of understanding of fundamental laboratory science. Much of what I do is basic experimental work, and it’s often exploratory – sometimes we don’t have a very strong prediction about exactly what the results will be, aside from thinking that they will be interesting.

Moreover, I think there’s also a psychological issue here. The implication that if a study is not preregistered it is somehow ‘dubious’ or ‘suspicious’ feels slightly offensive, as though one’s honour as a scientist is being called into question. But given the severity of the replication crisis, and numerous cases of fraud and malpractice across psychology as a whole, I don’t think we can just assume that everyone’s intentions are pure. Science is full of perverse incentives, and the incentive to publish is top of the list. So although it might sometimes feel like virtue signalling, preregistration is probably the most important structural change being currently introduced.

Having recently been involved in several preregistered studies, I now realise that the process need not be as restrictive as I had always assumed. The preregistration documents describe the methods, and an outline of the analyses. It’s just not necessary to predict every result and finding in detail, and there is room for serendipitous discoveries and exploratory analyses. At least a couple of our recent papers would have had greater face validity (and probably an easier time in review) if we’d preregistered. I also realised that preregistration can happen after initial pilot work, which is invaluable for scoping out the parameter space and fine-tuning an experiment. This is just good scientific practice, and particularly important when working in a new area or using a novel paradigm – it isn’t “cheating”!

Sample size and replication

When I was a postdoc, I published a study on individual differences in suppression between the eyes. We found a significant correlation between the rate of binocular rivalry and the amount of dichoptic masking. But our sample size wasn’t huge – about 40 participants. I always wondered if this effect would replicate, and felt faintly nervous in case it was a false positive. Luckily (for me!) a team at Cambridge incorporated similar conditions into the Pergenic study, which tested over 1000 participants, and replicated the effect in this much larger sample.

Of course, we can’t always rely on others to set our minds at ease in this way. Preregistration might make a finding more convincing if we’ve predicted it in advance, but really there is no substitute for large sample sizes and (ideally) internal replication, particularly for studies looking at individual differences. Sometimes this might be a conceptual replication, rather than a direct replication, where the stimuli or dependent variables might change, or a different population is tested (adults vs children for example). If a phenomenon is worth investigating, it’s worth investigating thoroughly and rigorously, and if that means spending more time testing more people, then I think that’s a price worth paying.

Open data

Many journals now mandate making data openly available for all articles they publish. I always thought this was pointless because I couldn’t imagine that anybody would ever be interested in accessing the raw data. But my opinion completely changed recently when I did a meta-analysis. I realised that pulling together all of the data across dozens of studies would have been much easier if it had all been freely available online. Crucially, the task would have become incrementally easier with each study that had open data – it’s not an all-or-nothing benefit. We can’t really predict which data sets will be useful to other people, or even how they might be used, so a blanket policy of posting data online is the only sensible solution.

Bayesian statistics

I’ve heard people talking about Bayesian stats for ages, but I never really ‘got it’. A few years ago, I decided to include a lecture on Bayesian methods in my final year advanced module. So I did some reading. I’m not a mathematician and I don’t really understand the maths behind the more sophisticated Bayesian methods. But one thing really hit home and convinced me of the problems with frequentist methods. It’s something I’ve always sort of known and thought was a bit strange, but never really questioned or thought about in much detail. It’s that with frequentist stats (t-tests, ANOVAs, correlations and so on) the false positive rate is constant, regardless of sample size. That means that even a study with an impossibly large sample (say a million subjects) will still produce apparently significant (but actually spurious) results for 5% of tests! To my mind, this just can’t be OK. Bayesian versions of traditional tests accrue more evidence in support of either the null or experimental hypothesis with each new participant tested, meaning that a larger sample size will always give you a better estimate of reality. Regardless of any mathematical justification, this just seems right. Additionally, the barrier to entry for using Bayesian techniques is now significantly lower than it used to be, with easy-to-use software being freely available.
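This property of frequentist tests is easy to demonstrate with a quick simulation, sketched here in Python: run thousands of t-tests on pure noise, and roughly 5% come out ‘significant’ whether you test ten people or ten thousand.

```python
import numpy as np
from scipy import stats

# Simulate one-sample t-tests on pure noise (the null is true by construction)
rng = np.random.default_rng(1)

def false_positive_rate(n_subjects, n_experiments=2000, alpha=0.05):
    """Proportion of null experiments that reach p < alpha."""
    hits = 0
    for _ in range(n_experiments):
        data = rng.normal(0, 1, n_subjects)   # no real effect present
        _, p = stats.ttest_1samp(data, 0)
        hits += (p < alpha)
    return hits / n_experiments

for n in (10, 10000):
    print(f"N = {n:>5}: false positive rate = {false_positive_rate(n):.3f}")
```

Both rates hover around 0.05 regardless of sample size – exactly the property described above.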

Open access

I’ve always favoured open access (OA) journals, particularly since one of the main journals in my field (Journal of Vision) has been free to read since it launched in 2001. But at the same time I’ve often felt that hybrid journal OA charges were a total rip off! The thinking behind hybrid journals was that they would act as a stepping stone, with an increasingly larger proportion of articles being open access, and subscription fees reducing as a consequence. But this just hasn’t happened, and traditional publishers seem to view gold OA fees as an additional revenue stream to be exploited. Furthermore, gold OA fees for most hybrid journals are higher than those for online-only ‘pure’ OA journals, which seems wrong to me. Surprisingly, a consortium of research funders have similar views, and from 2020 will no longer fund hybrid OA charges (“Plan S”). My intuition is that traditional subscription journals will survive in some form, though probably with reduced prestige given that well-funded labs will often be barred from publishing in them (though there is a green OA loophole that might get around this, particularly if publishers relax their embargo periods). But financial aspects aside, making all publications public is just self-evidently the right thing to do, regardless of how this is achieved.

Posting preprints

I never bothered with posting preprints because nobody else did. Whilst it was common practice in fields like physics, nobody in the life sciences seemed to bother, and so neither did I. I guess there were also residual worries that a journal might reject a paper on the basis that it was already available. But now preprint use is widespread, with funders even allowing citation of preprints in grant proposals, and so none of these objections are valid. Even if nobody reads your preprint, it’s still there, still part of the scientific record, and is a useful form of Green open access.

Avoiding commercial software

This is a tricky one. I was raised (scientifically speaking) using Matlab, and it’s still the core programming language in my lab for running experiments and analysing data. Sticking with what you know is attractive, particularly as there’s a lot of legacy code that makes for quick progress when working on something new. And yet, Matlab itself is problematic, particularly when updates change the way built-in functions work and break old code. That’s not very future-proof, and most of my code from when I was a PhD student doesn’t actually work anymore without modification. Over the past few years I’ve started using R for creating figures, and increasingly for other types of data analysis. I think shifting away from Matlab will be a gradual transition, with new members of the lab bringing new skills and experience, so that over time we shift to languages like Python and R.

Another bit of commercial software that I use quite heavily is Adobe Illustrator, which is great for compositing multi-part figures, particularly when it’s important to keep images in vector format. But it’s expensive, and Adobe’s subscription licensing approach means that the costs are annualised. None of the open source alternatives I’ve looked at are really up to the job yet. However, I recently discovered that a toolbox exists in R for combining multiple EPS files (the grImport library). I haven’t used this properly yet, and it doesn’t deal well with transparency, but it looks like the way forward.

Concluding remarks

Now that I’m a convert, I’ve devised a roadmap to make all of the work that happens in my lab more open. I’ll go over the details of that roadmap in part 2 of these posts. But just to conclude, I think that the various components of the open science movement add up to a fundamental paradigm shift. Most of the time we think of paradigm shifts as focussing around a particular theoretical idea, like Newtonian physics or evolution. But it can be applied more generally to refer to “a fundamental change in the basic concepts and experimental practices of a scientific discipline”. In this sense, the open science movement represents a major paradigm shift for all of modern scientific research. Right now we are at the transition point between the old, closed system where scientists guard their data, and publishers restrict access to publications, and the system of the future, where knowledge is shared. In ten years time the old system will seem absurdly outdated, and the sooner we routinely adopt open practices the better for everyone.

Psychophysical meta-analysis


Much of our understanding of sensory systems comes from psychophysical studies conducted over the past century. This work provides us with an enormous body of information that can guide contemporary research. Meta-analysis is a widely used method in biomedical research that aims to quantitatively summarise the effects from a collection of studies on a given topic, often producing an aggregate estimate of effect size. Yet whilst these tools are commonplace in some areas of psychology, they are rarely employed to understand sensory perception. This may be because psychophysics has some idiosyncratic properties that make generalisation difficult: many studies involve very few participants (frequently N<5), and most use esoteric methods and stimuli aimed at answering a single question. Here I suggest that in some domains, the tools of meta-analysis can be employed to overcome these problems to unlock the knowledge of the past.

In previous publications, I have occasionally aggregated data across previous studies to address a specific question. For example, in 2012 I published a paper that plotted the slope of the psychometric function with and without external noise, collated from 18 previous studies. This revealed a previously unreported effect of the dimensionality of the noise on the extent to which psychometric functions are linearised. Then in 2013 I aggregated contrast discrimination ‘dipper’ functions from 18 studies and 63 observers, to attempt to understand individual differences in detection threshold. This data set was also averaged to characterise discrimination performance in terms of the placement of the dip and the steepness of the handle.

These examples added value to the papers they were included in by reanalysing existing data in a novel way. But they are not traditional examples of meta-analysis, as they focussed on the (threshold and slope) data of individual participants from the studies included, instead of averaging measures of effect size across studies.

An excellent example of a study that collates effect size measures (Cohen’s d) across multiple psychophysical studies is an authoritative and detailed meta-analysis by Hedger et al. (2016). This paper investigates how visually threatening stimuli (such as fearful faces) are processed in the absence of awareness, when the stimuli were rendered invisible by manipulations such as masking and binocular rivalry. This is a heavily researched area, and the studies included contained a total of 2696 participants. Overall, this study concludes that masking paradigms produce convincing effects, binocular rivalry produces medium effects, and that effects are inconsistent using a continuous flash suppression paradigm. Additional analyses drill down into the specifics of each study, exploring how stimuli and experimental designs influence outcomes.

Inspired by this exemplary work, my collaborators and I recently undertook a meta-analysis of binocular summation – the improvement in contrast sensitivity when stimuli are viewed with two eyes instead of one. This is also a heavily investigated topic because of its clinical utility as an index of binocular health and function, and we included 65 studies with a total sample size of 716 participants. Our central question was whether the summation ratio (an index of the binocular advantage) significantly exceeded the canonical value of √2 first reported by Campbell and Green (1965). Many individual studies reported ratios higher than this, but sample sizes were often small (median N=5 across the 65 studies) meaning that individual variability could have a substantial effect. We averaged the mean summation ratios using three different weighting schemes (giving equal weight to studies, weighting by sample size, and weighting by the inverse variance). Regardless of weighting, the lower bound of the 95% confidence interval on the mean summation ratio always exceeded √2, conclusively overturning a long established psychophysical finding, with implications for our understanding of nonlinearities early in the visual system.
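For illustration, the three weighting schemes can be sketched in a few lines of Python. The numbers below are made up for the example – they are not the actual 65-study dataset:

```python
import numpy as np

# Hypothetical study-level data (illustrative only -- NOT the real 65 studies)
ratios = np.array([1.55, 1.70, 1.45, 1.62])      # mean summation ratio per study
n      = np.array([5, 12, 3, 8])                 # participants per study
var    = np.array([0.020, 0.010, 0.040, 0.015])  # within-study variance of the ratio

means = {
    "equal weight":     ratios.mean(),                   # each study counts once
    "weight by N":      np.average(ratios, weights=n),   # bigger studies count more
    "inverse variance": np.average(ratios, weights=1/var),  # precise studies count more
}

for scheme, m in means.items():
    print(f"{scheme:>16}: {m:.3f} (exceeds sqrt(2) = {np.sqrt(2):.3f}? {m > np.sqrt(2)})")
```

Whatever the weights, the logic is the same: each study contributes its mean ratio, and the schemes differ only in how much say each study gets.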

We also performed additional analyses to explore the effect of stimulus spatiotemporal frequency, and the difference in sensitivity across the eyes, confirming our findings with new data. This work reveals an effect of stimulus speed (the ratio of temporal to spatial frequency), suggesting that neural summation varies according to stimulus properties, and meaning that there is no ‘true’ value for binocular summation, rather a range of possible values between √2 and 2. Our analysis of monocular sensitivity differences leads to a deeper understanding of how best to analyse the data of future studies.

Although the summation meta-analysis was conducted using the summation ratio as the outcome variable, it is possible to convert the aggregate values to more traditional measures of effect size. Doing this revealed an unusually large effect size (Cohen’s d=31) for detecting the presence of binocular summation, and another large effect size (Cohen’s d=3.22) when comparing to the theoretical value of √2. These very large effects mean that even studies with very few participants (N=3) have substantial power (>0.95). In many ways, this can be considered a validation of the widespread psychophysical practice of extensively testing a small number of observers using very precise methods.
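As a rough check, the power of a one-sample t-test can be computed from the noncentral t distribution. This is the standard textbook calculation (assuming a two-sided α of 0.05), not necessarily the exact procedure used in the paper; with d = 31 and N = 3 it gives power of essentially 1.

```python
import numpy as np
from scipy import stats

def one_sample_power(d, n, alpha=0.05):
    """Power of a two-sided one-sample t-test for effect size d and sample size n."""
    df = n - 1
    crit = stats.t.ppf(1 - alpha / 2, df)   # critical t value under the null
    ncp = d * np.sqrt(n)                    # noncentrality parameter
    # probability of landing beyond either critical value under the alternative
    return (1 - stats.nct.cdf(crit, df, ncp)) + stats.nct.cdf(-crit, df, ncp)

print(f"d = 31, N = 3: power = {one_sample_power(31.0, 3):.4f}")
```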

Overall, meta-analysis can reveal important psychophysical effects that were previously obscured by the limitations of individual studies. This provides opportunities to uncover findings involving large aggregate sample sizes that will inspire new experiments and research directions. The binocular summation meta-analysis is now available online, published in Psychological Bulletin [DOI].

Marmite, and the spread of misinformation


Last week we published a study about Marmite affecting brain function in the Journal of Psychopharmacology. Perhaps unsurprisingly, this got a huge amount of media attention, with coverage on radio, television and in print. Anika and I did a range of interviews, which was an interesting and exhausting experience!

What was really striking was watching how the echo chamber of the internet handled the story. We were very careful in our press release and interviews not to name any specific diseases or disorders that might be affected by our intervention. What we think is happening is that the high levels of vitamin B12 in Marmite are stimulating the production of GABA in the brain, leading to a reduction of neural activity in response to visual stimuli. Now it happens that GABA deficits are implicated in a whole range of neurological diseases and disorders, but since we haven’t tested any patients we can’t say whether eating Marmite could be a good thing, a bad thing, or have no effect on any diseases at all.

But to the media, this somehow became a study about trying to prevent dementia! Headlines like “Marmite may boost brain and help stave off dementia” (Telegraph) were exactly what we wanted to avoid, particularly because of the risk that some patient somewhere might stop taking their medication and eat Marmite instead, which could be very dangerous. We even stated very clearly in our press release:

“Although GABA is involved in various diseases we can make no therapeutic recommendations based on these results, and individuals with a medical condition should always seek treatment from their GP.”

But these cautions were roundly ignored by most of the reporters who covered the piece (even those who interviewed us directly), as amusingly and irreverently explained in an article from Buzzfeed. I think a big part of the problem is that it is not routine practice for scientists whose work is covered in the media to approve the final version of a story before it is published (or even to see it). Maybe a mechanism by which authors can grant some sort of stamp of approval to a story needs to be developed, to prevent this sort of thing and avoid the spread of misinformation. In the meantime, it’s been an amazing example of how, despite our best efforts, the media will just report whatever they want to, however tenuously it’s linked to the underlying findings.

The paper:
Smith, A.K., Wade, A.R., Penkman, K.E.H. & Baker, D.H. (2017). Dietary modulation of cortical excitation and inhibition. Journal of Psychopharmacology, in press, [DOI].

Repository version (open access)

University of York press release

A selection of media coverage:

The Independent
The Telegraph
The Times
Sky News
Sky News Facebook Live
The Mirror
The Express
The Sun
The Jersey Evening Post
The Daily Maverick
Japan Times
Yorkshire Post
Eagle FM
Stray FM
New Zealand Herald
Huffington Post
Science Focus
Science Media Centre
Neuroscience News
Daily Star
Boots WebMD
Pakistan Today
Washington Times
Men’s Health
South China Morning Post
Good Housekeeping
Medical News Today
Daily Mail


Estimating Oculus Rift pixel density


A few months ago I bought an Oculus Rift DK2. Although these are designed for VR gaming, they’re actually pretty reasonable stereo displays. They have several desirable features, particularly that the OLED display is pulsed stroboscopically each frame to reduce motion blur. This also means that every pixel is updated at the same time, unlike on most LCD panels, so they can be used for timing-sensitive applications. As of a recent update they are also supported by Psychtoolbox, which we use to run the majority of experiments in the lab. Lastly, they’re reasonably cheap, at about £300.

In starting to set up an experiment using the goggles I thought to check what their effective pixel resolution was in degrees of visual angle. Because the screens are a fixed distance from the wearer’s eye, I (foolishly) assumed that this would be a widely available value. Quite a few people simply divided the vertical monocular resolution (1200 of the 1080 x 1200 pixels per eye) by the nominal vertical field of view (110°), producing an estimate of about 10.9 pixels per degree. As it turns out, this is pretty much bang on, but that wasn’t necessarily the case, because the lenses produce increasing levels of geometric distortion (bowing) at more eccentric locations. This might have the effect of concentrating more pixels in the centre of the display, increasing the number of pixels per degree.

Anyway, I decided it was worth verifying these figures myself. Taking a cue from methods we use to calibrate mirror stereoscopes, here’s what I did…

First I created two calibration images, consisting of a black background, and either one central square, or two lateralised squares. All the squares were 200 pixels wide (though this isn’t crucial), and the one with two squares was generated at the native resolution of the Oculus Rift (2160×1200). Here’s how the first one looks:


And here’s how the other one, with only one square looked:


These images were created with a few lines of Matlab code:

ORw = 2160; % full width of the Oculus Rift display in pixels
ORh = 1200; % height of the Oculus Rift display in pixels
CSw = 1440; % width of the other computer's display in pixels
CSh = 900;  % height of the other computer's display in pixels
ORs = 200;  % width of the squares shown on the Rift
CSs = 200;  % width of the square shown on the computer's display

% two lateralised squares (one per eye) at the Rift's native resolution
a = zeros(ORh,ORw);
a((1+ORh/2-ORs/2):(ORh/2+ORs/2),(1+ORw/4-ORs/2):(ORw/4+ORs/2)) = 1;
a((1+ORh/2-ORs/2):(ORh/2+ORs/2),(1+3*ORw/4-ORs/2):(3*ORw/4+ORs/2)) = 1;

% single central square for the other computer's display
b = zeros(CSh,CSw);
b((1+CSh/2-CSs/2):(CSh/2+CSs/2),(1+CSw/2-CSs/2):(CSw/2+CSs/2)) = 1;

I then plugged in the Rift, and displayed the two-square image on it, and the one-square image on an iPad (though in principle this could be any screen, or even a printout). Viewed through the Rift, each square goes to only one eye, and the binocular percept is of a single central square.

Now comes the clever bit. The rationale behind this method is that we match the perceived size of a square shown on the Rift with one shown on the iPad. We do this by holding the goggles up to one eye, with the other eye looking at the iPad. It’s necessary to do this at a bit of an angle, so the square gets rotated to be a diamond, but we can rotate the iPad too to match the orientation. I found it pretty straightforward to get the sizes equal by moving the iPad forwards and backwards, and using the pinch-to-zoom operation.

Once the squares appeared equal in size I put the Rift down, but kept the iPad position fixed. I then measured two things: the distance from the iPad to my eye, and the width of the square on the iPad screen. The rest is just basic maths:

The iPad square was 7.5cm wide, and matched the Rift square at 24cm from the eye. At that distance an object 1cm wide subtends 2.4° of visual angle (because at 57cm, 1cm=1°). [Note, for the uninitiated, the idea of degrees of visual angle is that you imagine a circle that goes all the way around your head, parallel to your eyes. You can divide this circle into 360 degrees, and each individual degree will be about the size of a thumbnail held at arm’s length. The reason people use this unit is that it can be calculated for a display at any distance, allowing straightforward comparison of experimental conditions across labs.] That means the square is 2.4*7.5=18° wide. Because this is matched with the square on the Rift, the Rift square is also 18° wide. We know the square on the Rift is 200 pixels wide, so that means 18° = 200 pix, and 1° = 11 pixels. So, the original estimates were correct, and the pixel density at the centre of the screen is indeed 11 pixels/deg.
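The same arithmetic can be written out in a few lines of Python, here using the exact arctangent formula rather than the 57cm rule of thumb, alongside the nominal estimate other people derived:

```python
import numpy as np

# Measurements from the matching procedure
square_cm   = 7.5    # width of the matched square on the iPad
distance_cm = 24.0   # viewing distance from eye to iPad
square_px   = 200    # width of the square on the Rift, in pixels

# Exact visual angle: 2 * atan(half-width / distance), converted to degrees
angle_deg = 2 * np.degrees(np.arctan(square_cm / 2 / distance_cm))

pixels_per_degree = square_px / angle_deg

# Nominal estimate others used: vertical monocular resolution / vertical FOV
nominal = 1200 / 110

print(f"matched square subtends {angle_deg:.1f} deg")
print(f"measured: {pixels_per_degree:.1f} pix/deg; nominal: {nominal:.1f} pix/deg")
```

The exact formula gives a square of about 17.8°, and a density just over 11 pixels per degree, agreeing with both the rule-of-thumb calculation and the nominal estimate.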

This is actually quite a low resolution, which isn’t surprising since the screen is close to the eye, individual pixels are easily visible, and the whole point of the Rift is to provide a wide field of view rather than a high central resolution. But it’s sufficient for some applications, and its small size makes it a much more portable stereo display than either a 3D monitor or a stereoscope. I’m also pleased I was able to independently verify other people’s resolution estimates, and have developed a neat method for checking the resolution of displays that aren’t as physically accessible as normal monitors.