Psychophysical meta-analysis


Much of our understanding of sensory systems comes from psychophysical studies conducted over the past century. This work provides us with an enormous body of information that can guide contemporary research. Meta-analysis is a widely used method in biomedical research that aims to quantitatively summarise the effects from a collection of studies on a given topic, often producing an aggregate estimate of effect size. Yet whilst these tools are commonplace in some areas of psychology, they are rarely employed to understand sensory perception. This may be because psychophysics has some idiosyncratic properties that make generalisation difficult: many studies involve very few participants (frequently N<5), and most use esoteric methods and stimuli aimed at answering a single question. Here I suggest that in some domains, the tools of meta-analysis can be employed to overcome these problems to unlock the knowledge of the past.

In earlier publications, I have occasionally aggregated data across previous studies to address a specific question. For example, in 2012 I published a paper that plotted the slope of the psychometric function with and without external noise, collated from 18 previous studies. This revealed a previously unreported effect of the dimensionality of the noise on the extent to which psychometric functions are linearised. Then in 2013 I aggregated contrast discrimination ‘dipper’ functions from 18 studies and 63 observers, to attempt to understand individual differences in detection threshold. This data set was also averaged to characterise discrimination performance in terms of the placement of the dip and the steepness of the handle.

These examples added value to the papers they were included in by reanalysing existing data in a novel way. But they are not traditional examples of meta-analysis, as they focussed on the (threshold and slope) data of individual participants from the studies included, instead of averaging measures of effect size across studies.

An excellent example of a study that collates effect size measures (Cohen’s d) across multiple psychophysical studies is an authoritative and detailed meta-analysis by Hedger et al. (2016). This paper investigates how visually threatening stimuli (such as fearful faces) are processed in the absence of awareness, when the stimuli were rendered invisible by manipulations such as masking and binocular rivalry. This is a heavily researched area, and the studies included contained a total of 2696 participants. Overall, this study concludes that masking paradigms produce convincing effects, binocular rivalry produces medium effects, and that effects are inconsistent using a continuous flash suppression paradigm. Additional analyses drill down into the specifics of each study, exploring how stimuli and experimental designs influence outcomes.

Inspired by this exemplary work, my collaborators and I recently undertook a meta-analysis of binocular summation – the improvement in contrast sensitivity when stimuli are viewed with two eyes instead of one. This is also a heavily investigated topic because of its clinical utility as an index of binocular health and function, and we included 65 studies with a total sample size of 716 participants. Our central question was whether the summation ratio (an index of the binocular advantage) significantly exceeded the canonical value of √2 first reported by Campbell and Green (1965). Many individual studies reported ratios higher than this, but sample sizes were often small (median N=5 across the 65 studies) meaning that individual variability could have a substantial effect. We averaged the mean summation ratios using three different weighting schemes (giving equal weight to studies, weighting by sample size, and weighting by the inverse variance). Regardless of weighting, the lower bound of the 95% confidence interval on the mean summation ratio always exceeded √2, conclusively overturning a long established psychophysical finding, with implications for our understanding of nonlinearities early in the visual system.
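The three weighting schemes can be illustrated with a minimal Python sketch. The numbers below are hypothetical, not data from the meta-analysis, and the inverse-variance weight is assumed here to be the inverse of each study's squared standard error (n / sd²), which may differ from the exact scheme used in the paper.

```python
import math

# Hypothetical per-study summaries (NOT data from the meta-analysis):
# (mean summation ratio, standard deviation across observers, sample size)
studies = [
    (1.50, 0.20, 4),
    (1.70, 0.30, 6),
    (1.60, 0.10, 10),
]

def weighted_mean(values, weights):
    """Weighted arithmetic mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

means = [m for m, sd, n in studies]
equal_w = [1.0 for _ in studies]                   # every study counts equally
size_w = [float(n) for m, sd, n in studies]        # weight by sample size
invvar_w = [n / sd ** 2 for m, sd, n in studies]   # assumed: 1 / squared standard error

estimates = {
    "equal": weighted_mean(means, equal_w),
    "sample size": weighted_mean(means, size_w),
    "inverse variance": weighted_mean(means, invvar_w),
}
for name, est in estimates.items():
    print(f"{name}: {est:.3f} (exceeds sqrt(2): {est > math.sqrt(2)})")
```

With real data, each estimate would also need a confidence interval before comparing its lower bound with √2.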

We also performed additional analyses to explore the effect of stimulus spatiotemporal frequency, and the difference in sensitivity across the eyes, confirming our findings with new data. This work reveals an effect of stimulus speed (the ratio of temporal to spatial frequency), suggesting that neural summation varies according to stimulus properties, and meaning that there is no ‘true’ value for binocular summation, rather a range of possible values between √2 and 2. Our analysis of monocular sensitivity differences leads to a deeper understanding of how best to analyse the data of future studies.

Although the summation meta-analysis was conducted using the summation ratio as the outcome variable, it is possible to convert the aggregate values to more traditional measures of effect size. Doing this revealed an unusually large effect size (Cohen’s d=31) for detecting the presence of binocular summation, and another large effect size (Cohen’s d=3.22) when comparing to the theoretical value of √2. These very large effects mean that even studies with very few participants (N=3) have substantial power (>0.95). In many ways, this can be considered a validation of the widespread psychophysical practice of extensively testing a small number of observers using very precise methods.

Overall, meta-analysis can reveal important psychophysical effects that were previously obscured by the limitations of individual studies. This provides opportunities to reveal findings based on large aggregate sample sizes that will inspire new experiments and research directions. The binocular summation meta-analysis is now available online, published in Psychological Bulletin [DOI].


Estimating Oculus Rift pixel density


A few months ago I bought an Oculus Rift DK2. Although these are designed for VR gaming, they’re actually pretty reasonable stereo displays. They have several desirable features, particularly that the OLED display is pulsed stroboscopically each frame to reduce motion blur. This also means that every pixel is updated at the same time, unlike on most LCD panels, so the goggles can be used for timing-sensitive applications. As of a recent update they are also supported by Psychtoolbox, which we use to run the majority of experiments in the lab. Lastly, they’re reasonably cheap, at about £300.

In starting to set up an experiment using the goggles I thought to check what their effective pixel resolution was in degrees of visual angle. Because the screens are a fixed distance from the wearer’s eye, I (foolishly) assumed that this would be a widely available value. Quite a few people simply took the monocular resolution (1080 x 1200) and divided its vertical extent (1200 pixels) by the nominal field of view (110° vertically), producing an estimate of about 10.9 pixels per degree. As it turns out, this is pretty much bang on, but that wasn’t guaranteed, because the lenses produce increasing levels of geometric distortion (bowing) at more eccentric locations. This might have the effect of concentrating more pixels in the centre of the display, increasing the number of pixels per degree there.

Anyway, I decided it was worth verifying these figures myself. Taking a cue from methods we use to calibrate mirror stereoscopes, here’s what I did…

First I created two calibration images, consisting of a black background, and either one central square, or two lateralised squares. All the squares were 200 pixels wide (though this isn’t crucial), and the one with two squares was generated at the native resolution of the Oculus Rift (2160×1200). Here’s how the two-square image looks:


And here’s how the other one, with only one square looked:


These images were created with a few lines of Matlab code:

ORw = 2160; % full width of the Oculus Rift display in pixels
ORh = 1200; % height of the Oculus Rift display in pixels
CSw = 1440; % width of the other computer's display in pixels
CSh = 900;  % height of the other computer's display in pixels
ORs = 200;  % width of the squares shown on the Rift
CSs = 200;  % width of the square shown on the computer's display

% two-square image for the Rift: one square centred in each eye's half of the display
riftimage = zeros(ORh,ORw);
riftimage((1+ORh/2-ORs/2):(ORh/2+ORs/2),(1+ORw/4-ORs/2):(ORw/4+ORs/2)) = 1;
riftimage((1+ORh/2-ORs/2):(ORh/2+ORs/2),(1+3*ORw/4-ORs/2):(3*ORw/4+ORs/2)) = 1;

% one-square image for the other display
screenimage = zeros(CSh,CSw);
screenimage((1+CSh/2-CSs/2):(CSh/2+CSs/2),(1+CSw/2-CSs/2):(CSw/2+CSs/2)) = 1;

I then plugged in the Rift, and displayed the two-square image on it, and the one-square image on an iPad (though in principle this could be any screen, or even a printout). Viewed through the Rift, each square goes to only one eye, and the binocular percept is of a single central square.

Now comes the clever bit. The rationale behind this method is that we match the perceived size of a square shown on the Rift with one shown on the iPad. We do this by holding the goggles up to one eye, with the other eye looking at the iPad. It’s necessary to do this at a bit of an angle, so the square gets rotated to be a diamond, but we can rotate the iPad too to match the orientation. I found it pretty straightforward to get the sizes equal by moving the iPad forwards and backwards, and using the pinch-to-zoom operation.

Once the squares appeared equal in size I put the Rift down, but kept the iPad position fixed. I then measured two things: the distance from the iPad to my eye, and the width of the square on the iPad screen. The rest is just basic maths:

The iPad square was 7.5cm wide, and matched the Rift square at 24cm from the eye. At that distance an object 1cm wide subtends 2.4° of visual angle (because at 57cm, 1cm=1°). [Note, for the uninitiated, the idea of degrees of visual angle is that you imagine a circle that goes all the way around your head, parallel to your eyes. You can divide this circle into 360 degrees, and each individual degree will be about the size of a thumbnail held at arm’s length. The reason people use this unit is that it can be calculated for a display at any distance, allowing straightforward comparison of experimental conditions across labs.] That means the square is 2.4*7.5=18° wide. Because this is matched with the square on the Rift, the Rift square is also 18° wide. We know the square on the Rift is 200 pixels wide, so that means 18° = 200 pix, and 1° = 11 pixels. So, the original estimates were correct, and the pixel density at the centre of the screen is indeed 11 pixels/deg.
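As a cross-check on the arithmetic, here is a small Python sketch that computes the same quantity two ways: with the 57cm rule of thumb used above, and with the exact trigonometric formula for an object centred on the line of sight. The function and variable names are illustrative only; the measurements are the ones reported in the text.

```python
import math

def visual_angle_deg(size_cm, distance_cm):
    """Exact visual angle (degrees) subtended by an object centred on the line of sight."""
    return math.degrees(2 * math.atan((size_cm / 2) / distance_cm))

square_cm = 7.5      # measured width of the square on the iPad
distance_cm = 24.0   # measured distance from the iPad to the eye
square_px = 200      # width of the matched square on the Rift, in pixels

rule_of_thumb_deg = square_cm * 57.0 / distance_cm   # 1 cm = 1 deg at 57 cm
exact_deg = visual_angle_deg(square_cm, distance_cm)

print(f"rule of thumb: {rule_of_thumb_deg:.1f} deg, {square_px / rule_of_thumb_deg:.1f} pixels/deg")
print(f"exact: {exact_deg:.1f} deg, {square_px / exact_deg:.1f} pixels/deg")
```

Both routes give a little over 11 pixels per degree, so the approximation makes no practical difference at this angle.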

This is actually quite a low resolution, which isn’t surprising since the screen is close to the eye, individual pixels are easily visible, and the whole point of the Rift is to provide a wide field of view rather than a high central resolution. But it’s sufficient for some applications, and its small size makes it a much more portable stereo display than either a 3D monitor or a stereoscope. I’m also pleased I was able to independently verify other people’s resolution estimates, and have developed a neat method for checking the resolution of displays that aren’t as physically accessible as normal monitors.

Aesthetically pleasing, publication quality plots in R


I spend a lot of my time making graphs. For a long time I used a Unix package called Grace. This had several advantages, including the ability to create grids of plots very easily. However it also had plenty of limitations, and because it is GUI-based, one had to create each plot from scratch. Although I use Matlab for most data analysis, I’ve always found its plotting capabilities disappointing, so a couple of years ago I bit the bullet and started learning R, using the RStudio interface.

There are several plotting packages for R, including things like ggplot2, which can automate the creation of some plots. Provided your data are in the correct format, this can make plotting really quick, and tends to produce decent results. However, for publication purposes I usually want to have more control over the precise appearance of a graph. So, I’ve found it most useful to construct graphs using the ‘plot’ command, but customising almost every aspect of the graph. There were several things that took me a while to work out, as many functions aren’t as well documented as they could be. So I thought it would be helpful to share my efforts. Below is some code (which you can also download here) that demonstrates several useful techniques for plotting, and should create something resembling the following plot when executed.

Example plot created by the script.


My intention is to use this script myself as a reminder of how to do different things (at the moment I always have to search through dozens of old scripts to find the last time I did something), and copy and paste chunks of code into new scripts each time I need to make a graph. Please feel free to use parts of it yourself, to help make the world a more beautiful place!

# this script contains examples of the following:
# outputting plots as pdf and eps files
# creating plots with custom tick mark positioning
# drawing points, bars, lines, errorbars, polygons, legends and text (including symbols)
# colour ramps, transparency, random numbers and density plots

# Code to output figures as either an eps or pdf file. Note that R’s eps files appear not to cope well with transparency, whereas pdfs are fine
outputplot <- 0
if(outputplot==1){postscript("filename.eps", horizontal = FALSE, onefile = FALSE, paper = "special", height = 4.5, width = 4.5)}
if(outputplot==2){pdf("filename.pdf", bg="transparent", height = 5.5, width = 5.5)}
# all the code to create the plot goes here
if(outputplot>0){dev.off()}  # this line goes after you've finished plotting (to output the example below, move it to the bottom of the script)

# set up an empty plot with user-specified axis labels and tick marks
plotlims <- c(0,1,0,1)  # define the x and y limits of the plot (minx,maxx,miny,maxy)
ticklocsx <- (0:4)/4    # locations of tick marks on x axis
ticklocsy <- (0:5)/5    # locations of tick marks on y axis
ticklabelsx <- c("0","0.25","0.5","0.75","1")        # set labels for x ticks
ticklabelsy <- c("0","0.2","0.4","0.6","0.8","1")    # set labels for y ticks

par(pty="s")  # make axis square
plot(x=NULL,y=NULL,axes=FALSE, ann=FALSE, xlim=plotlims[1:2], ylim=plotlims[3:4])   # create an empty axis of the correct dimensions
axis(1, at=ticklocsx, tck=0.01, labels=FALSE, lwd=2)     # plot tick marks (no labels)
axis(2, at=ticklocsy, tck=0.01, labels=FALSE, lwd=2)
axis(3, at=ticklocsx, tck=0.01, labels=FALSE, lwd=2)
axis(4, at=ticklocsy, tck=0.01, labels=FALSE, lwd=2)
mtext(text = ticklabelsx, side = 1, at=ticklocsx)     # add the tick labels
mtext(text = ticklabelsy, side = 2, at=ticklocsy, line=0.2, las=1)  # the 'line' argument moves away from the axis, the 'las' argument rotates to vertical
box(lwd=2)      # draw a box around the graph
title(xlab="X axis title", col.lab=rgb(0,0,0), line=1.2, cex.lab=1.5)    # titles for axes
title(ylab="Y axis title", col.lab=rgb(0,0,0), line=1.5, cex.lab=1.5)

# create some synthetic data to plot as points and lines
datax <- sort(runif(10,min=0,max=1))
datay <- sort(runif(10,min=0.2,max=0.8))
SEdata <- runif(10,min=0,max=0.1)
lines(datax,datay, col='red', lwd=3, cex=0.5)     # draw a line connecting the points
arrows(datax,datay,x1=datax, y1=datay-SEdata, length=0.015, angle=90, lwd=2, col='black')  # add lower error bar
arrows(datax,datay,x1=datax, y1=datay+SEdata, length=0.015, angle=90, lwd=2, col='black')  # add upper error bar
points(datax,datay, pch = 21, col='black', bg='cornflowerblue', cex=1.6, lwd=3)   # draw the data points themselves

# create some more synthetic data to plot as bars
datax <- 0.1*(1:10)
datay <- runif(10,min=0,max=0.2)
SEdata <- runif(10,min=0,max=0.05)
ramp <- colorRamp(c("indianred2", "cornflowerblue"))  # create a ramp from one colour to another
colmatrix <- rgb(ramp(seq(0, 1, length = 10)), max = 255)   # index the ramp at ten points
barplot(datay, width=0.1, col=colmatrix, space=0, xlim=1, add=TRUE, axes=FALSE, ann=FALSE)  # add some bars to an existing plot
arrows(datax-0.05,datay,x1=datax-0.05, y1=datay-SEdata, length=0.015, angle=90, lwd=2, col='black')  # add lower error bar
arrows(datax-0.05,datay,x1=datax-0.05, y1=datay+SEdata, length=0.015, angle=90, lwd=2, col='black')  # add upper error bar

coltrans=rgb(1,0.5,0,alpha=0.3)             # create a semi-transparent colour (transparency is the alpha parameter, from 0-1)
a <- density(rnorm(100,mean=0.75,sd=0.1))   # make a density distribution from some random numbers
a$y <- 0.2*(a$y/max(a$y))                   # rescale the y values for plotting
polygon(a$x, 1-a$y, col=coltrans,border=NA) # plot upside down hanging from the top axis with our transparent colour

# create a legend that can contain lines, points, or both
legend(0, 1, c("Lines","Points","Both"), cex=1, col=c("darkgrey","black","black"), pt.cex=c(0,1.8,1.8), pt.bg=c("black","violet","darkgreen"), lty=c(1,0,1), lwd=c(5,3,3), pch=21, pt.lwd=3, box.lwd=2)
# add text somewhere, featuring symbols and formatting
text(0.8,0.95,substitute(paste(italic(alpha), " = 1" )),cex=1.2,adj=0)

A first look at the Olimex EEG-SMT


Last week I ordered and received a small EEG device manufactured by a Bulgarian company called Olimex. Called the EEG-SMT, it is part of the OpenEEG project, and is a small USB device that looks like this:

The Olimex EEG device.


It has five audio jacks for connecting custom electrodes. The ground electrode is passive, and the other four electrodes are active and comprise two bipolar channels. The system is very basic, and at around €150 (including the electrodes) is obviously not going to compete with high end multi-channel EEG rigs.  But, I’m interested in running some steady state VEP experiments that can be run with a single channel, and in principle are quite robust to lower signal to noise ratios from lower quality equipment. Given the price, I thought it was worth a shot.

Although there are several PC packages capable of reading data from the device, I ideally want to integrate EEG recording into the Matlab code I use for running experiments. So, I decided to try and directly poll the USB interface.

The first stage was to install a driver for the device. I’m using a Mac running OS X 10.8, so I went with the FTDI virtual COM port driver. I also found it useful to check the device was working with this serial port tool. The driver creates a virtual serial port, the location of which can be discovered by opening a Terminal window and entering:

    ls -l /dev/tty.*

On my machine this lists a couple of bluetooth devices, as well as the serial address of the Olimex device:


Matlab has its own tool for polling serial ports (Serial). I was able to read from the device this way, but I found it less flexible than the IOPort function that comes with Psychtoolbox 3. The rest of this post uses that function.

First we open the serial port and give it a handle:

    [h,e] = IOPort('OpenSerialPort','/dev/tty.usbserial-A9014SQP');

Then we can set a few parameters, including the baud rate for data transmission, buffer size etc:


To start recording, we purge the buffer and then send this command.


We wait for a while, then we check how much data is waiting for us in the buffer and read it out into a vector:

    bytestoget = IOPort('BytesAvailable',h);
    [longdata,when,e] = IOPort('Read',h,1,bytestoget);

Finally, we stop recording, purge the buffer and close the port:


I had some trouble initially streaming data from the device. If you forget to purge the buffer it can cause your entire system (not just Matlab) to hang and restart. This is very annoying, and slows development progress.

Now that we have some data, we need to process it. The vector is a stream of bytes in packets of 17. We can separate it out like this:

    for n = 1:17
        parseddata(n,:) = longdata(n:17:end);
    end

And plot each signal separately:

Outputs from the Olimex serial interface


According to the device’s firmware, the first two plots are control lines that always output values of 165 and 90. This provides an anchor that lets us know the order of the signals. The next plot tells us the firmware version (version 2), and the fourth plot is a sample counter that increases by 1 each time the device samples the electrodes. The sampling happens at a fixed frequency of 256Hz, so 256 samples represent one second of activity. Plots 5-16 are the outputs of the electrodes (this is what we’re interested in), and I don’t really understand plot 17 yet.
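The packet layout described above suggests a way to cope with a read that starts part-way through a packet: search for the two anchor bytes and split the stream from there. The Python sketch below is illustrative only. It assumes the layout exactly as described in this post (165, 90, version, counter, twelve data bytes, one further byte), uses made-up packets rather than real recordings, and the helper names `split_packets` and `make_packet` are hypothetical.

```python
PACKET_LEN = 17
SYNC = (165, 90)  # the two constant bytes that start every packet

def split_packets(stream):
    """Return complete 17-byte packets, starting at the first sync pair found."""
    start = 0
    while start + 1 < len(stream) and tuple(stream[start:start + 2]) != SYNC:
        start += 1
    packets = []
    for i in range(start, len(stream) - PACKET_LEN + 1, PACKET_LEN):
        packet = list(stream[i:i + PACKET_LEN])
        if tuple(packet[:2]) == SYNC:   # keep only packets that begin with the anchor bytes
            packets.append(packet)
    return packets

def make_packet(counter):
    """Build a synthetic packet (hypothetical all-zero payload) for demonstration."""
    return [165, 90, 2, counter % 256] + [0] * 12 + [0]

# two stray bytes, two complete packets, then a truncated third packet
stream = [7, 9] + make_packet(0) + make_packet(1) + make_packet(2)[:5]
packets = split_packets(stream)
print(len(packets), [p[3] for p in packets])
```

Note that the pair 165, 90 could also occur by chance inside the data bytes, so a more careful parser would confirm that the sample counter increases by one from packet to packet.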

Each channel gets 2 bytes (i.e. 16 bits), but only uses 10 of those bits. This means that to get the actual output, we need to combine the data from two adjacent bytes (paired by colour in the above plots). The data are in big-endian format, which means that the first byte contains the most significant bits, and the second byte the least significant. We can combine them by converting each byte to binary notation, sticking them together, and then converting back:

    for l = 1:6
        for m = 1:size(parseddata,2)
            % pad the second byte to 8 bits so that the two bytes join up correctly
            trace(l,m) = bin2dec(strcat(dec2bin(parseddata(lineID(l,1),m)),dec2bin(parseddata(lineID(l,2),m),8)))./1023;
        end
    end
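Combining a big-endian byte pair is equivalent to multiplying the first byte by 256 and adding the second. The Python sketch below (the function name is illustrative) shows this, and also shows one pitfall of joining binary strings: the second byte has to be zero-padded to eight digits, otherwise small values come out wrong.

```python
def combine_bytes(high, low):
    """Join two bytes, most significant first, into one integer."""
    return (high << 8) | low

value = combine_bytes(1, 5)              # 1*256 + 5
scaled = combine_bytes(3, 255) / 1023    # largest 10-bit value, scaled to the range 0-1

# Joining binary strings only works if the low byte is padded to 8 digits:
unpadded = int(bin(1)[2:] + bin(5)[2:], 2)        # '1' + '101'
padded = int(bin(1)[2:] + format(5, "08b"), 2)    # '1' + '00000101'
print(value, scaled, unpadded, padded)
```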

We now have six ten bit signals, which we can plot as follows:

Channel outputs


Although the waveforms look exciting, they aren’t very informative because most of what we’re seeing is an artefact from the ‘hum’ of AC mains electricity. We can see this if we examine the Fourier spectrum of one of our waveforms:

Example EEG fourier spectrum


It is clear that much of the energy is concentrated at 0, and at 50Hz. We can remove these using a bandpass filter, that includes only frequencies between (approximately) 1 and 49Hz. Taking the inverse Fourier transform then gives us a more sensible waveform:

Bandpass filtered waveform
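The filtering step can be sketched in Python as follows. This is a toy illustration rather than the code used for the figure: the trace is synthetic (a DC offset, a small 10Hz component and a large 50Hz component standing in for mains hum), and a slow textbook Fourier transform is written out so that the example needs no extra libraries. In practice a fast Fourier transform routine would be used instead.

```python
import cmath
import math

def dft(x):
    """Discrete Fourier transform (slow textbook version)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]

def idft(X):
    """Inverse transform, returning the real part of each sample."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real for t in range(n)]

def bandpass(signal, fs, low_hz, high_hz):
    """Zero every Fourier component outside [low_hz, high_hz], then invert."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        freq = min(k, n - k) * fs / n   # frequency of bin k (negative frequencies mirrored)
        if not (low_hz <= freq <= high_hz):
            spectrum[k] = 0
    return idft(spectrum)

fs = 256                          # sampling rate in Hz, as stated for the device
t = [i / fs for i in range(fs)]   # one second of samples
# synthetic trace: DC offset + small 10 Hz component + large 50 Hz "hum"
raw = [0.5 + 0.1 * math.sin(2 * math.pi * 10 * s) + math.sin(2 * math.pi * 50 * s) for s in t]
clean = bandpass(raw, fs, 1, 49)

target = [0.1 * math.sin(2 * math.pi * 10 * s) for s in t]
max_error = max(abs(c - g) for c, g in zip(clean, target))
print(f"largest deviation from the 10 Hz component: {max_error:.2e}")
```

Zeroing components abruptly like this is the simplest possible filter. Here it recovers the 10Hz component almost exactly because every component falls on an exact frequency bin, which real recordings will not do.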


Actually though, I’m more interested in what is happening in the frequency domain. This is because I want to run experiments to measure the response of visual cortex to gratings flickering at a particular frequency. However, there are some problems to overcome first. Critically, I don’t understand how the four active electrodes on the device map onto the six channel outputs that I read over the serial connection. They all seem to produce a signal, and my initial thought was that the first four must be the outputs of individual electrodes, and the final two the differences between positive and negative electrodes for channels 1 & 2. As far as I can tell, that isn’t what’s actually happening though. I have posted on the OpenEEG mailing list, so hopefully someone with experience of using these devices will get back to me.

If anyone is interested, I have put a version of the code outlined above here (with a few extra bells and whistles). Note that it may require some modifications on your system, particularly the serial address of the device. You will also need to have Matlab (or maybe Octave), Psychtoolbox and the driver software installed. Finally, your system may hang if there are problems, and I hereby absolve myself of responsibility for any damage, loss, electrocution etc. that results from your using my code. However, I’d be very interested to hear from anyone else using one of these devices!



The previous post outlined a basic experiment for measuring sensitivity to contrast at detection threshold for a simple target. In this post, I’ll describe how detection thresholds can be affected by other stimuli, which are termed ‘masks’. You can think of a mask as ‘getting in the way’ of a target, and making it harder to detect. I’ll describe two varieties of masking (though several others exist) – when the mask is similar or identical to the target (pedestal masking) and when the mask is very different from the target (cross-channel masking).

Pedestal masking

In a standard 2AFC detection experiment, the target is shown in one temporal (or spatial) interval, but is absent in the other. This is still the case in a masking experiment, but there is also a mask, which is presented in both intervals. When the mask is spatially identical to the target (i.e. it’s the same image, but probably at a different contrast) it is known as a pedestal. The pedestal affects detection thresholds in interesting ways. For low contrast pedestals, thresholds are reduced (i.e. performance gets better). A good analogy is with height – if detection occurs when someone’s head is visible above a wall, standing them on a box (a pedestal) will make it more likely that their head pops up over the wall. Contrast detection works in a similar way, and the improvement in threshold is often referred to as facilitation. For high contrast pedestals, the task becomes contrast discrimination: the pedestal is visible in both intervals, but the target is added only in one. This is like judging the height of two huge skyscrapers – you’ll only notice the difference if one is substantially taller than the other. So, with high contrast pedestals, thresholds increase, and performance gets worse than at detection threshold (with no pedestal) – this is masking. The interaction of these two effects (masking and facilitation) produces a characteristic ‘dipper’ shaped function, as shown by the red symbols below. In the figure, the dashed horizontal line indicates detection threshold. Note that DHB (left panel) shows a clearer dip, whereas LP (right panel) shows clearer masking.

If the visual system were entirely linear, detection would be unaffected by a pedestal. So, the presence of the dipper reveals a nonlinearity of some kind in the system. Two main candidates for this nonlinearity have been proposed over the years, and there is not yet a consensus amongst researchers over which is truly responsible. It is likely that both are correct to some degree, or perhaps that they are both equally valid descriptions at different levels of analysis. The first is that there is a nonlinear transducer (e.g. Legge & Foley, 1980) or gain control, of the form:

$$\frac{C^{2.4}}{1 + C^{2}}$$

where C is the input contrast. This equation produces a sigmoidal (s-shaped) contrast response function (see panel B below). The dipper function is determined by the gradient of the contrast response at a given input (pedestal) contrast, because this governs how much contrast the target must add to the pedestal to produce a given increase in output. When the contrast response function is steep, thresholds are low (detection, and the dip region of the dipper). When the function is shallow, target contrast must be higher to produce the same increase in response, so thresholds increase (the handle region of the dipper).
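To see how a dipper falls out of this transducer, the Python sketch below finds, for each pedestal contrast, the smallest target contrast that raises the response by a fixed amount. The criterion value (k = 0.1), the contrast units and the pedestal levels are arbitrary assumptions chosen for illustration, not values fitted to any data.

```python
def response(c):
    """Transducer from the text: C^2.4 / (1 + C^2)."""
    return c ** 2.4 / (1 + c ** 2)

def threshold(pedestal, k=0.1):
    """Smallest target contrast that raises the response by k (found by bisection)."""
    base = response(pedestal)
    lo, hi = 0.0, 1.0
    while response(pedestal + hi) - base < k:   # expand until the bracket contains the answer
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if response(pedestal + mid) - base < k:
            lo = mid
        else:
            hi = mid
    return hi

pedestals = [0.0, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8]
thresholds = [threshold(p) for p in pedestals]
for p, th in zip(pedestals, thresholds):
    print(f"pedestal {p:5.1f} -> threshold {th:.3f}")
```

Thresholds first fall below the no-pedestal value (the dip) and then rise above it at high pedestal contrasts (the handle), following the gradient of the response function.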

Model dipper function and contrast response function.

The other explanation for dipper functions was proposed by Pelli (1985). This account has two parts: the first explains facilitation (the dip) and the second explains masking (the handle). Pelli proposed that observers are uncertain about exactly which internal detecting mechanism(s) will respond to the target. Their strategy is to monitor many (linear) mechanisms, and select the most responsive. Because the mechanisms are noisy, when the target contrast is low observers will often select the wrong one. However, the pedestal raises the activity of the correct detecting mechanism above the background noise level (right panel below). This improves thresholds, producing the facilitation effect, as shown by the red curve in the left panel below.

In this scheme, masking occurs because each detecting mechanism is noisy, with the amount of noise being proportional to the activity in the channel (called signal-dependent, or multiplicative noise). So, for a high pedestal contrast, the mechanism will be more noisy, meaning more target contrast is required to overcome the noise. This produces masking, as shown by the blue curve in the left panel below. The dipper function (green) comes from the combination of uncertainty and multiplicative noise.

Cross-channel masking

Example of a vertical mask (left) and mask + horizontal target (right)

When the mask is very different from the target, they will activate different detecting mechanisms. This means that the within-channel processes which produce dipper functions do not occur. But masking still happens, for example when the mask is orthogonal (at 90 degrees) to the target (see above stimulus example, and green data points in the top graphs). The most common explanation for this masking is that mechanisms sensitive to different stimuli inhibit each other. This inhibition can be modeled as a divisive process as part of a gain control equation (Heeger, 1992),

$$\frac{C^{2.4}}{1 + C^{2} + wX}$$

where X refers to activity in mechanisms other than that which responds to the target, and w is a weight which determines the level of suppression. Note that this form of masking is very different from masking by a pedestal, and does not typically include facilitation.
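A short Python sketch of this equation shows why there is no dip. Because the mask only appears in the denominator, increasing its activity can only raise the target contrast needed to reach a fixed response. The criterion (k = 0.1), the weight (w = 1) and the mask levels are arbitrary assumptions for illustration, and X is treated as a single number standing for the mask-driven activity.

```python
def response(c, x=0.0, w=1.0):
    """Gain control from the text: C^2.4 / (1 + C^2 + w*X)."""
    return c ** 2.4 / (1 + c ** 2 + w * x)

def detection_threshold(x, w=1.0, k=0.1):
    """Target contrast at which the response reaches k, found by bisection."""
    lo, hi = 0.0, 1.0
    while response(hi, x, w) < k:   # expand until the bracket contains the answer
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if response(mid, x, w) < k:
            lo = mid
        else:
            hi = mid
    return hi

mask_levels = [0.0, 1.0, 4.0, 16.0]
thresholds = [detection_threshold(x) for x in mask_levels]
for x, th in zip(mask_levels, thresholds):
    print(f"mask activity {x:5.1f} -> threshold {th:.3f}")
```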


Masking experiments are important, because they reveal nonlinear properties of the visual system. Pedestal masking tells us about the gradient of the contrast response function, whereas cross-channel masking tells us about interactions between different detecting mechanisms (or channels). Masking also occurs in other sensory domains, such as hearing or touch, and along other visual dimensions, like spatial frequency (size) or speed discrimination.


Heeger, D.J. (1992). Normalization of cell responses in cat striate cortex. Vis Neurosci, 9, 181-197.

Legge, G.E. & Foley, J.M. (1980). Contrast masking in human vision. J Opt Soc Am, 70, 1458-1471.

Pelli, D.G. (1985). Uncertainty explains many aspects of visual contrast detection and discrimination. J Opt Soc Am A, 2, 1508-1532.

Visual psychophysics for beginners


This post will outline some of the basic methods vision scientists use in experiments, in plain everyday language. I’m anticipating that future posts will refer back to this as a sort of glossary, so pay attention, this will be on the test…

Aims of vision research

Although it seems effortless, vision is pretty complicated. Your brain processes a huge amount of information very, very quickly to allow you to interpret the world. We’re years away from building a computer that can do the same job anywhere near as fast or as well. So, the aim of vision science is to understand the processing (or algorithms) that the brain uses to simplify and interpret the input from the eye. If we can understand this, we’ll know a lot more about how the brain works in general, and also be able to reproduce its functions in a computer program or even a robot. There are many potential applications for an accurate computer model of human vision, in areas such as advertising, image and movie compression, airport security, photography – even building artificial eyes for the blind.

Psychophysical methods: overview

Some vision research uses direct methods, such as single cell recording (usually in animals) or imaging (e.g. fMRI). However, a less invasive (and expensive!) method is to present observers with visual stimuli (pictures or movies) and ask them questions about them. These questions are not typically open-ended (how does this picture make you feel . . .) but are instead very simple. For example, we might show an observer two images of some stripes, and ask which has the higher contrast (i.e. the bigger difference in brightness between the dark and light regions). In the example below, it’s clear that the stimulus on the left is higher in contrast, though in real experiments the judgement might be much more difficult. Such simple questions have the advantage that they take very little time to respond to, and responses can often be given via a mouse or computer keyboard and repeated many times (often many thousands of times).

Example visual stimuli of different contrasts. The image on the left is higher in contrast than the one on the right.

Using these responses, we can gain insight into how the brain is working. Sometimes, an experiment might be designed to distinguish between two alternative theories. Other experiments might report on a surprising new finding, and still others collect data to inform the construction of computer models of the visual system. By systematically varying the stimuli in a controlled manner, we can find out about the limitations on performance (i.e. how good subjects are at doing a task) as well as their subjective experience of perception. This works for all sorts of stimulus dimensions: contrast, luminance, colour, motion, depth, size, tilt and many others. Psychophysical methods are used for other senses too – hearing and touch in particular.

A simple detection experiment

A fundamental question we can ask is how intense a target stimulus must be before we can see it. This intensity could be along lots of possible dimensions (luminance, colour, motion etc.), but the example here will be for contrast. So, how high must the contrast of a stimulus be before it can be reliably detected? We can find this out by presenting the target at a range of contrasts, and seeing how accurate an observer’s performance is at each contrast level. When the contrast is high, they should get it right all the time, as the target will be clearly visible. When the contrast is very low (definitely invisible) the observer will be guessing, so performance will be at chance levels. Somewhere in between these two extremes we should be able to find the contrast level where the target is just able to be detected.

One possible method is to show the observer a single presentation, and ask whether or not they saw the target (and repeat many times for different contrast levels). This is called a yes/no task (for obvious reasons) and under some circumstances it is an OK method to use. However, there are a few technical reasons why it might not be the best choice of task. Instead, a technique called two-alternative forced choice (2AFC) is often better. This is very similar, except that there are two intervals (indicated by beeps), one of which contains the target, while the other contains a blank screen. The observer says which interval they thought contained the target (actually, this can also be done by using different areas of the screen, like left and right sides, instead of different temporal intervals), and the computer records their response. We repeat this for many, many trials, over a range of contrasts, and calculate the percentage of trials on which the observer was correct at each contrast level.
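To make the procedure concrete, here is a minimal Python sketch of a simulated observer doing a 2AFC task. It is an illustration under stated assumptions, not the software used in real experiments: the function name simulate_2afc is made up, contrast is in arbitrary units, and the observer is assumed to pick whichever interval produced the larger noisy internal response.

```python
import random

def simulate_2afc(contrast, n_trials=200, noise_sd=1.0, rng=None):
    """Return the proportion of correct responses at one contrast level.

    Assumption: the internal response in each interval is Gaussian noise,
    with the target contrast added in the interval that contains the target.
    """
    rng = rng or random.Random()
    n_correct = 0
    for _ in range(n_trials):
        target_interval = rng.choice([1, 2])  # target is equally likely in either interval
        r1 = rng.gauss(0.0, noise_sd) + (contrast if target_interval == 1 else 0.0)
        r2 = rng.gauss(0.0, noise_sd) + (contrast if target_interval == 2 else 0.0)
        response = 1 if r1 > r2 else 2  # choose the interval with the larger response
        if response == target_interval:
            n_correct += 1
    return n_correct / n_trials

rng = random.Random(1)  # fixed seed so the run is repeatable
contrasts = [0.0, 0.5, 1.0, 2.0, 4.0]  # arbitrary units, for illustration only
results = {c: simulate_2afc(c, n_trials=500, rng=rng) for c in contrasts}
for c, p in results.items():
    print(f"contrast {c:4.1f}: {100 * p:5.1f}% correct")
```

At zero contrast the simulated observer should hover around 50% correct (pure guessing), and at the highest contrast it should be close to 100%.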

The psychometric function

The graph above shows some example data (circles) from an experiment like the one described. The y-axis tells us what percentage of trials were answered correctly at each contrast level (given on the x-axis in logarithmic (dB) units). The function the data describe is called a psychometric function. Because there are two intervals in our task, even when the target is invisible and the observer is guessing they will still be right half (50%) of the time. You can see that at the left-hand side of the graph, where the contrast is low, the data points cluster around 50% correct. At the other extreme, when contrast is high, the observer is right all of the time – on 100% of the trials. The intermediate contrasts are the interesting ones, as here performance is somewhere between chance and perfect.
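This post doesn’t spell out how the dB units are defined, so treat the following as an assumption: one common convention in contrast research is dB = 20 × log10(contrast in percent). Under that assumption, converting back and forth looks like this (the function names are made up for illustration):

```python
import math

def percent_to_db(contrast_percent):
    """Convert percent contrast to dB, assuming dB = 20 * log10(percent)."""
    return 20.0 * math.log10(contrast_percent)

def db_to_percent(contrast_db):
    """Inverse of percent_to_db, under the same assumed convention."""
    return 10.0 ** (contrast_db / 20.0)

for percent in [1.0, 2.0, 10.0, 100.0]:
    print(f"{percent:6.1f}% contrast = {percent_to_db(percent):5.1f} dB")
```

With this convention, 0 dB is 1% contrast, 20 dB is 10% contrast, and every extra 6 dB roughly doubles the contrast.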

We usually decide to call a specific level of performance the ‘threshold’. A good level (for 2AFC) is 75% correct, as it is halfway between chance and perfect performance – it’s when the observer can just see the target. You might notice that there is a data point very close to 75% correct, at about 6 dB of contrast. If we didn’t care about details, this would be a good approximation of threshold. However, sometimes we want to be a bit more exact, so we fit a curve to the data points (using a computer program) and find out where the curve passes through 75% correct. For this example, it’s just under 6 dB – this is our threshold.
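For the curious, here is a rough Python sketch of what “fitting a curve” can involve. It is only one of several ways to do it, and everything in it is an assumption made for illustration: the curve is a logistic function running from 50% to 100%, the fit is a simple grid search for the most likely parameters, and the data are generated from a known curve (threshold 6 dB) rather than taken from the graph.

```python
import math

def psychometric(x_db, alpha, beta):
    """Logistic curve from 50% (chance in 2AFC) to 100% correct.

    alpha is the contrast (dB) at which the curve passes through 75%;
    beta controls how shallow the curve is.
    """
    return 0.5 + 0.5 / (1.0 + math.exp(-(x_db - alpha) / beta))

def log_likelihood(levels, n_correct, n_trials, alpha, beta):
    """Binomial log-likelihood of the data (constant term omitted)."""
    total = 0.0
    for x, k in zip(levels, n_correct):
        p = min(max(psychometric(x, alpha, beta), 1e-9), 1.0 - 1e-9)
        total += k * math.log(p) + (n_trials - k) * math.log(1.0 - p)
    return total

# Illustrative data only: counts generated from a known curve, not real results.
levels = [-6, -3, 0, 3, 6, 9, 12, 15]  # contrast levels in dB
n_trials = 100                         # trials per level
n_correct = [round(n_trials * psychometric(x, 6.0, 2.0)) for x in levels]

# Grid search: try many (alpha, beta) pairs and keep the most likely one.
candidates = ((a / 10, b / 10) for a in range(0, 121) for b in range(5, 51))
threshold_db, spread = max(
    candidates,
    key=lambda ab: log_likelihood(levels, n_correct, n_trials, ab[0], ab[1]),
)
print(f"estimated 75% threshold: {threshold_db:.1f} dB")
```

Because of how this particular curve is written, the fitted alpha is itself the 75% correct point, so the estimate should land close to the 6 dB used to generate the data. Real analyses typically use dedicated fitting routines rather than a coarse grid.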

Thresholds for different targets

Measuring just one threshold on its own isn’t really very interesting. But if we vary something about the target we can see how performance gets better or worse by measuring lots of thresholds. A classic example is the contrast sensitivity function (CSF). This measures thresholds (or sensitivity, which is 1/threshold) for grating stimuli (like the ones above) at a range of bar sizes (spatial frequencies). Low spatial frequencies are big, with wide bars, whereas high spatial frequencies are small, with very narrow bars. The graph below plots sensitivity as a function of spatial frequency. You can see there is a peak between 1 and 4 c/deg – these are the frequencies at which we are most sensitive. On either side of this, sensitivity falls off, meaning we need more contrast to reach threshold. The contrast sensitivity function is our window of visibility on the world – stimuli within the window are visible, those outside it are not. It is often used in clinical research to understand the source of a visual problem, or the limitations it causes for a patient.
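As a small worked example of the sensitivity calculation, the Python sketch below converts a set of thresholds into sensitivities (1/threshold) and finds the peak. The threshold values are invented purely for illustration and are not measurements from any study.

```python
# Spatial frequencies in cycles per degree (c/deg) and thresholds in percent contrast.
# These numbers are made up for illustration only.
spatial_frequencies = [0.5, 1, 2, 4, 8, 16]
thresholds = [1.0, 0.5, 0.4, 0.5, 1.25, 5.0]

# Sensitivity is the reciprocal of threshold: low threshold means high sensitivity.
sensitivities = [1.0 / t for t in thresholds]

# Find the spatial frequency with the highest sensitivity.
peak_index = max(range(len(sensitivities)), key=lambda i: sensitivities[i])
peak_sf = spatial_frequencies[peak_index]

for sf, s in zip(spatial_frequencies, sensitivities):
    print(f"{sf:5.1f} c/deg: sensitivity {s:.2f}")
print(f"peak sensitivity at {peak_sf} c/deg")
```

Plotting sensitivity against spatial frequency (usually on logarithmic axes) gives the familiar inverted-U shape of the CSF.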