Capturing Digital Images (The Bayer Filter) – Computerphile

We talked a little bit about how images, particularly RGB images, are stored in memory, but one interesting question is how we obtain those images to begin with. Obviously, we used to use photographic film. Now we've got a huge number of consumer cameras on every device that we have, and they almost all use the same technique to obtain their RGB images. All the cameras that we own will have some kind of CCD or CMOS sensor on [them], which is essentially a photosensitive layer that tells the camera how much light has hit a certain position. And that will be arranged in a grid so that each position represents a pixel in our image.

So from the top we might have something like this: we have some CCD or CMOS elements, and then light from our scene is going to come in like this. Now if we just leave it at that, then we're going to get a grayscale image out, because there's no way of knowing what proportion of this light is red, what proportion is blue, and what proportion is green. That's not how these sensors work. So what we instead do is put a sort of filter over each of these, but each one filters a different colour. So this one will filter red, this one will filter green, and this one will filter blue. And if we do that over a whole image we can start to recompute our actual pixel values and work out what colour we were actually supposed to be looking at.

Sean: That filter in the camera – it's a physical thing, right?

Mike: Yes, it's a physical set of small elements that intercept certain wavelengths. It's like a pair of those 3D glasses where one side's red and one side's blue, but you've also got green ones, and you've got them in a grid arrangement in front of your camera's eye.

Sean: If I've bought a 10-megapixel camera, does that mean only three of the megapixels are doing green, and three of them are different?

Mike: It does.
Different camera manufacturers may have different ways of doing this, but in general they split the megapixels available on their sensor between green, red, and blue as appropriate, and then interpolate the values they're missing. The technique used for this is called the Bayer filter. There are other filters, but the Bayer filter is by far the most common.

So, from the top, your CCD sensor will look a little bit like this. Each of these represents a photosensitive element and a part of our filter. They come in groups of four: we start off with green, then blue, then another green in this corner and a red in this corner. So you can see immediately that there are two greens for every blue and red. That's because our eyes are more sensitive to green than they are to blue and red, and we also perceive luminance – i.e. brightness – much more strongly, roughly speaking, in the green channel. So if you have an image captured using two green elements rather than, say, two blue elements, it will look sharper to us. And of course, this is all about how it looks to us.

This pattern is repeated, but the problem is that you've got, say, 10 megapixels available on the sensor, yet you've only captured half of them as green and the other half as either blue or red. So the amount of red you've got is not 10 megapixels. But this exploits a nice quality of our eyes, which is that we don't really see colour that well. We see it okay, but we see grayscale – luminance – much, much better. So if we can use the green, and to an extent the red and the blue, to create a nice, sharp luminance channel, the fact that the colour is a little lower-resolution won't matter to us, and the image will still look nice and sharp. So all we need to do is look at the nearby pixels that have the colour we're looking for and interpolate that value.
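As a concrete illustration (not code from the video), the sampling a Bayer sensor performs can be simulated by keeping just one colour channel per pixel of a full RGB image. The 2×2 GBRG tile below matches the green/blue over red/green sketch described here; this is an assumed layout – real sensors also ship RGGB and other orderings:

```python
import numpy as np

def bayer_mosaic(rgb):
    """Simulate a Bayer sensor: keep one colour sample per pixel.

    Assumes a 2x2 GBRG tile, as in the sketch:
        G B
        R G
    rgb: (H, W, 3) array (H and W even), channels ordered R, G, B.
    Returns an (H, W) single-channel mosaic.
    """
    h, w, _ = rgb.shape
    mosaic = np.empty((h, w), dtype=rgb.dtype)
    mosaic[0::2, 0::2] = rgb[0::2, 0::2, 1]  # top-left of each tile: green
    mosaic[0::2, 1::2] = rgb[0::2, 1::2, 2]  # top-right: blue
    mosaic[1::2, 0::2] = rgb[1::2, 0::2, 0]  # bottom-left: red
    mosaic[1::2, 1::2] = rgb[1::2, 1::2, 1]  # bottom-right: green
    return mosaic
```

Note that exactly half of the retained samples are green, matching the two-greens-per-tile point above.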
So in this case, we don't have a green value here, but we know what this green value is, and we know what this one is. So on a very simple level we could just pick a green value halfway between the two, and assume that there's nothing complicated going on – that it's a nice clean slope. And it's the same for blue and the same for red.

The process of turning a CCD or CMOS image captured through a Bayer filter into an RGB image, where red, green and blue appear at every pixel, is called demosaicing. So this is a mosaic: we've got some samples of green, some samples of blue, and some samples of red, and we want samples of green, blue and red everywhere. And we're going to make some assumptions about what happens in the image. We're going to assume that nothing particularly complex is going on between these two pixels, because they're very close together, so this green is probably halfway between these ones, and this red here in this pixel is probably halfway between these two red ones. And you've also got other red ones nearby that you could use.

Now, modern consumer cameras will do more complicated demosaicing, and in fact if you shoot in the raw format, you can control the demosaicing algorithm yourself in some software packages. Raw will literally be the raw output of the sensor, including any weird colour effects caused by the Bayer filter in front of it. So you can run more complicated demosaicing algorithms. For example, if we're trying to recover our blue channel and we've got values of 200, 200 and 200 in our neighbouring pixels, a value of 50 here, and we don't know what this one is, we could assume it's an average of those four values – but we could also assume that this represents an edge, and that it should be 200, because there's a lot of consensus in that direction that we've got an edge.
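The "halfway between" idea above is bilinear demosaicing. This is a deliberately naive sketch (assuming the same hypothetical GBRG layout; production demosaicers are vectorised and much cleverer):

```python
import numpy as np

def demosaic_bilinear(mosaic):
    """Naive bilinear demosaicing of a GBRG Bayer mosaic.

    Keeps each captured sample, and fills in the two missing colours
    at every pixel by averaging whichever of its 3x3 neighbours
    captured that colour. A teaching sketch, not a production algorithm.
    """
    h, w = mosaic.shape
    # Which colour (0=R, 1=G, 2=B) each sensor position captured.
    colour = np.empty((h, w), dtype=int)
    colour[0::2, 0::2] = 1  # green
    colour[0::2, 1::2] = 2  # blue
    colour[1::2, 0::2] = 0  # red
    colour[1::2, 1::2] = 1  # green

    rgb = np.zeros((h, w, 3), dtype=float)
    for y in range(h):
        for x in range(w):
            for c in range(3):
                if colour[y, x] == c:
                    rgb[y, x, c] = mosaic[y, x]  # captured directly
                    continue
                # Average the same-colour samples in the 3x3 window.
                samples = [
                    mosaic[ny, nx]
                    for ny in range(max(0, y - 1), min(h, y + 2))
                    for nx in range(max(0, x - 1), min(w, x + 2))
                    if colour[ny, nx] == c
                ]
                rgb[y, x, c] = sum(samples) / len(samples)
    return rgb
```

For any mosaic of at least 2×2 pixels, every 3×3 (or clipped edge) window contains at least one sample of each colour, so the averages are always defined.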
So more complicated demosaicing algorithms will try to preserve edge detail, which is something you will classically lose in a naive demosaicing approach: it will go a little bit fuzzy. And it may not matter, because you've got, let's say, 16 or 20 megapixels at your disposal, and it's only when you zoom right in that you're going to see these kinds of problems. But people who are really interested in image quality spend a lot of time looking into this.

The downside of the Bayer filter approach – or any filter you put in front of your camera – is that you get decreased chrominance resolution. Chrominance is what we call our red and blue channels; luminance is, generally speaking, green – although obviously they all represent colours. Some types of images, like ones with fine, repeating stripy patterns, will look extremely bad after you apply a demosaicing algorithm that hasn't been tailored to them. And that's just because we're making assumptions about the smoothness between nearby blue pixels, and those assumptions don't hold for certain types of images. When taking videos, you might find that certain textures look particularly bad, and it's these kinds of things that are causing that problem.
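The edge heuristic described above (three neighbours agreeing on 200 versus one outlier at 50) is the core idea of gradient-based demosaicing: interpolate along the direction where values agree, not across it. A minimal sketch for one interior pixel follows; this is a hypothetical helper illustrating the principle, not any particular camera's algorithm:

```python
def interpolate_edge_aware(mosaic, y, x):
    """Edge-aware interpolation of a missing sample at interior
    position (y, x), using its four axis-aligned neighbours.

    Compares the horizontal and vertical gradients and averages along
    the direction with the smaller gradient, so we interpolate along
    an edge rather than smearing across it.
    """
    left, right = mosaic[y][x - 1], mosaic[y][x + 1]
    up, down = mosaic[y - 1][x], mosaic[y + 1][x]
    grad_h = abs(left - right)
    grad_v = abs(up - down)
    if grad_h < grad_v:
        return (left + right) / 2            # structure runs horizontally
    if grad_v < grad_h:
        return (up + down) / 2               # structure runs vertically
    return (left + right + up + down) / 4    # no preferred direction

# With neighbours up=200, down=200, right=200 and the outlier left=50,
# the vertical gradient is 0, so we interpolate vertically and get 200
# instead of the blurry four-way average of 162.5.
```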

Reader Comments

  1. It might be a good idea to explain the antialiasing filter that tries to prevent demosaicing artifacts by blurring the lines between the color pixels. 

  2. Sounds like we need a camera that detects the exact color for every pixel instead of creating the color by mixing nearby rgb values.
    I wonder if that's possible. A sensor that detects the exact color frequency hitting its sensor, not relying on rgb filters to determine the rgb values.

  3. No, a camera with x megapixels isn't x/3 for red, x/3 for blue, x/3 for green… Every pixel is composed of an amount of g/r/b…

    So he's kind of contradicting himself since when the interviewer asks him he says yes, but then explains it differently.

  4. Huh.  Always thought cameras captured the whole of the image in front of them, but it seems it's a game of knowing how human eyes work and how that can be applied to make image capturing easier.

    Does that mean that other species might see pictures far different from how we do?

  5. Why are raw images so huge in file size? It seems like there should be less data than in a demosaiced image, where each and every pixel has three values, not just the one in the raw.

  6. So are we interpolating pixels just for the sake of cramming more of them in? Couldn't we just use fewer pixels and store only the ones whose values we truly know?

  7. I think you should have shown examples of the demosaicing process. This is a purely technical theoretical explanation. I knew and understood that explanation for years before I first saw an actual visual demonstration of it. That visual completely changed my understanding of it, because just the technical story doesn't really explain it all that well.

    Wikipedia has a very nice demonstrative visual.

  8. I always thought each of these r/g/b sensors were grouped into 1 pixel which is independent from the other pixels, and the brightness gets averaged out from these grouped sensors…

  9. Some higher end manufacturers, like Hasselblad and Phase One, used to be a bit more honest about the actual interpolation breakdown in their technical spec documents, but for some reason I can't find that data anymore – I've spent a lot of time googling to put some more data into my comment but have to go from memory, so please excuse that.

    Most 35mm format digital cameras are in fact as many people ask/suggest offering about 1/3 of the reported resolution due to interpolation and then also make it worse by using AA filters to get rid of interpolation artifacts. So a 30 MP camera is closer to 10MP as far as the physical sensors capability is concerned regarding the colour with the least amount of pixels dedicated to it and you still have the AA edge issues which are a lot more noticeable than the marketing would make you believe – it's not just fine patterns that are affected, those are just the extreme examples – all edge quality is visibly affected if you don't downsample.

    If you downsample a 36MP nikon d800 shot 2-3x then you can correct the edge detail and counter the quality lost due to interpolation, especially with the excellent downsampling that you can do nowadays (choose correct technique depending on image content etc.) and arrive at a resolution of around 16MP or what was the standard for MF digital backs many years ago.

    The problem is that even the most expensive MF digital backs do not report interpolation figures as honestly as they used to and when judging progress of digital sensor technology we often look at the MF market, at the expensive solutions!
    I have no idea how many physical "pixels" are on the sensor of, say, a PhaseOne IQ260 which talks of 60MP resolution. By old high quality standards for interpolated MF backs, 60MP reported would suggest there are more than 60MP "pixels" in total on the sensor – maybe 30MP for the colour with the least amount of pixels, but definitely not 1/3. But the lack of data might suggest that it is in fact the same breakdown as in the small format full frame stuff from Canon and Nikon.

    It is the standard that small frame digital bodies are interpolated and the resolution is over reported and almost all have AA filters but that is not what we expect of medium format backs. Most of these campaigns against film photography are based on figures and opinions stemming from use of the medium format stuff not small frame.

    So not to start a debate between medium format and small frame – what I am trying to say is that we are all getting shafted nowadays with this ridiculous marketing, especially against film and the data to tell how much shafted we actually are is no longer made available.

    These interpolation techniques which are not an issue with say 3k dollars for a nikon body are an issue when we are talking about a 50k dollar medium format back, which is at the end of the day just a big sensor you add to a system you already own and paid for.
    For those of you who never used MF, my feelings translate like this – it's as if you would have to buy a 3k dollar sensor that you would put in your analog Nikon F and it would offer maybe 2x time the resolution of an iphone camera interpolated up 3 times – you would be very unhappy, especially given that film is being destroyed by this kind of marketing.

    You can take amazing digital photos with just a pinhole attached to an old digital camera – the point is that the marketing used to be a bit more honest about the resolution. The old expensive MF backs offered maybe 4-12 MP but that was without interpolation or AA and what is left nowadays is still thankfully lack of AA but the hidden interpolation details and lack of reporting in regards to how the sensor is physically divided between the colours is all about hiding the true progress of technology.

    Unlike many seemingly commenting on the web about too many MP, I actually feel there are not enough MP – I want more. I want about 120MP on a full frame MF sensor if it has to be interpolated (that is at least 56mm×56mm to cover the film area in my system) and I don't want less than 80MP per colour.
    This kind of digital back is maybe 15 years in the future.

    An interpolated file without the AA filter, where you get say at least 16MP for the colour chosen to be least important resolution-wise, is of course going to surpass a non-interpolated file where you had 16MP for each colour, because the interpolation techniques and filtering are very good nowadays – it's just that we are not at the high MP numbers we think we are, because through the years the reporting and measuring changes.

    This might be irrelevant to most, but those of us who are attacked all the time about our film being outdated have to deal with all kinds of crazy arguments. One person told me my slow speed MF film is at most 50MP. Well, I disagree, but due to interpolation even if my film were 50MP he won't be able to find a MF back that actually approaches that, save for maybe the newest 80MP offering from Phase One.

    The same problem is with scanners for film and general reproduction setups – they also use interpolation and are even more ridiculous in their marketing. 

    Arguing that interpolating something to large sizes is as good as natively reproducing it at those sizes requires tinfoil hats – it's like saying on the other side of that argument, that a slide enlarged to cover a whole building magically increased in resolution – we might perceive more resolution and fantastic detail, the colours might be vibrant with that super expensive projector and so on and so forth but it's still not a ground to argue against a competing technology.

  10.  @Computerphile  Nice to know how all of this works, thanks for the post. I have a question about these filters: how possible would it be to have a sensor with a specific pattern of R, G, and B filters that shift 3 times during the taking of a picture, each time capturing the specific amount of R, G, B for each separate pixel in the sensor? I understand this would have to be a very precise movement of the filters relative to the sensor, and that this would only work for still images or for very high shutter speeds; but would it work at all? I thought about this because I was thinking about HDR pictures and how, when using this setting, cameras take multiple pictures at different brightness settings and make a composite one based on what the software thinks is the best HDR picture. So thinking along this same line, with the shifting filters you record separate R, G, and B values based on three different pictures and make a composite RGB value. 

  11. I'm just getting into photography and astrophotography so this is of extra interest to me now (on top of the generally interesting topics that you tend to cover). Thanks! 

  12. Watching this made me try to think of how to do that process (well, in a way) on each captured pixel. Granted it would be more expensive, but could there be a mini prism that "stretches" the light like a prism and then use that data? I'm guessing no because of the size of the photon packet, but it sounded good in my head.

  13. will you guys please explain why some video cameras record moving bars of light and dark when filming computer screens?

  14. Hmm, I always thought the actual sensor's raw data was the frequency and intensity of the light that hit the sensor. Seems I was wrong.

  15. How about RGBC sensors? (C = Clear, not sure if it's the same as White as in RGBW)
    They're rare, but I know some things use them, like for example some Motorola phones.

    Great video btw. Very interesting topic.

  16. Just to nitpick 2:16 – we don't distinguish luminance with much more intensity in the green channel, as lumen refers to how bright the human eye will perceive light; 100 lumens of green, blue or red would look identically intense, as is the purpose.

  17. Preamble: I am really not trying to be sycophantic.  But this is just another truly wonderful example of an Entertaining, thoughtful video whereby I really learnt something useful.  Kudos++ to Computerphile and Numberphile and the University of Nottingham.   Only one mystery remains: where do they get that line-printer paper from!

  18. In film making the debate between film v digital is still very much alive and well.  My Super 8 camera can give professional cinematic images. – The way the film is made and processed has improved dramatically over the years; however, to gain that high widescreen quality is expensive.  The company Pro-8 is famous worldwide for its Super 8 processing (and all other film formats). Film is still largely used by Hollywood. – Most directors are anti digital. TV networks often still use film. – PBS has a strong relationship with Pro-8.

  19. Excellent video!! Please bring Mike back for more. He is a very articulate presenter and I would love to see more of him.

  20. You know, when I was doing research into how they made color movies before color film was developed, I found out that they would split the light beam into its component colors and they would have 2 or 3 rolls of film all going through the camera at once, taking black and white film.
    They would then tint the film with its respective color and then put them together – sometimes gluing them, sometimes using a process to make a kind of stamp. It's pretty interesting.

    I'm sure if you wanted to get a more accurate picture you could do something similar and have a beam splitter with 3 separate sensors.

  21. Rough idea: Build a camera with a semi-transparent mirror which transmits 67% of the light into a plain B&W sensor. The other 33% gets reflected into another sensor with a checkerboard-like filter of just blue and red pixels. A chip could guess the green channel and you get the best of two worlds: great looking edges without texture aberrations and just two cheap sensors (the blue/red filter and the B&W sensor are simpler than the Bayer R/G/B filter), not three sensors. Because it would be a YCbCr camera, the YCbCr 4:2:2 mp4 compression process would be easier as well.

  22. Had an assignment last year to implement a Bayer filter on an FPGA which I failed because my professor couldn't explain what the filter was doing in terms I could understand – he insisted on trying to explain everything in terms of matrices: You've just explained it with perfect clarity in about 5 minutes.
    [Tears out hair] University sucks, Computerphile & YouTube FTW.

  23. Aren't there 3 subpixels for each pixel? This guy says there are actually 3 megapixels in a 10 megapixel sensor, but it seems more plausible to me that there are actually 30 megasubpixels and therefore 10 megapixels. Can someone clarify?

  24. You should include more simplified diagrams & visuals for better learning. It's a huge topic to cover in a 6 minute video but I think it could've been clearer. Thanks for such informative videos & keep up the good work. (y)

  25. Interesting.
    But what happens if, for example, a yellow photon hits one of these filters? Will it get through or get discarded? 
    Is there a range of frequencies these R, G & B filters let through, and if so, do these ranges cover the whole spectrum of visible light?
    So many questions, so little time. 🙂 

  26. Is that the reason why taking a picture of a fine mesh, or looking at one on a screen, can sometimes cause weird colour artifacts?

  27. So our eyes don't see color that well… Does this mean that pictures taken with our cameras would look quite strange to animals that see color better?

  28. 8640000 bits per color in a 1 megapixel image. Tell me if I am wrong, but at 8 bits per pixel, one megapixel is 1200 × 900, so 8×(1200×900) should be 8640000.

  29. 1. Why not CMYK? It would fit the 4 pixels perfectly, and CMYK is used for colour work, no?

    2. Insensitive to colour? Human eye will still see if you used rec.2020, if not, why is 2020 needed to begin with?

  30. The problem of figuring out what goes in the gap of the color sensors seems like something that a neural network would work well to solve. Train a neural network to recognize patterns like edges, curves, and grain, using incomplete color information. And maybe try to shape its network structure so that it corroborates its guess for one color of sensor with the given data from the other available sensor colors, rather than just a single type. I bet you could get an astounding quality image using this processing technique. (Come to think of it, I think that "RAW" image data may actually be the data from the individual color sensors, before the camera tries to squash them into normal pixels. So the neural network could be used well after taking the picture, if your camera saves the raw data.)

  31. Well, this says nothing about the sensor size in relation to the lens, which is also important to actually see anything, or even to know the optimum amount of sensors.

  33. I wonder if anyone has made a mosaic pattern that's not a grid of squares. If you did triangles you could get equal proportions of red green and blue.

  34. One thing that's great about this approach to learning is that, after you understand it on this "higher" level, it's much easier to get into the algorithm implementation part and the mathematical part. You get into it with a purpose, with a visual idea of your goal, and spend much more time in the abstract world without losing focus (pun intended).

Leave a Reply

Your email address will not be published. Required fields are marked *