Do Grayscale Images Take Less Space?

Published on 30 June 2024

TL;DR

Grayscale images usually take less space than their counterparts in the more common sRGB space. However, the subsampling and optimizations applied to the chroma (color) channels can be so aggressive, without visible impact, that a 4:2:0-subsampled image is sometimes quite close to a grayscale image in size.

Grayscale and sRGB

Some formats like JPEG/JFIF allow saving images represented in a grayscale space, i.e., with each pixel encoded as a single 8-bit value from 0 (black, full dark) to 255 (white, full bright). For color images, common formats like JPEG generally represent images in the sRGB space, i.e., with each pixel encoded as a 3×8-bit triple (24 bits per pixel total).
(We assume we are not talking about HDR images here.)

Question: can we therefore save space by using grayscale (8 bpp) instead of sRGB (24 bpp)? Let's see with a B&W image (the original output is already B&W).

sRGB: 183 kB

Grayscale: 124 kB

We do save space with grayscale, but I was expecting something more spectacular. How can such a small gain be explained?

The Y'C_bC_r Space

In fact, sRGB is not how pixels are stored internally. It is how the raw input (to be encoded) and the raw output (once decoded) must be represented.

JPEG/JFIF (and other common formats) don't process pixels encoded as RGB triples directly. Instead, they process Y'C_bC_r triples, where:

Y', the luma channel: the "brightness", from black (0) to white (255)
C_b and C_r, the two chroma channels: the "colors"

A formula converts an RGB triple to a Y'C_bC_r triple, and another does the reverse. A JPEG encoder applies the conversion as its first step, and a JPEG decoder applies the reverse as its last. We can think of Y' as the axis of the RGB triples (r,g,b) such that r=g=b. The two chroma channels encode the rest of the information carried by the RGB triples.
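As an illustration, here is a minimal sketch of that conversion for a single pixel, assuming the full-range coefficients of the JFIF convention. The function name `rgb_to_ycbcr` is made up for this example, and real encoders use optimized integer arithmetic rather than this direct floating-point form.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB triple to a full-range Y'CbCr triple (JFIF convention)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round and clamp each component to the 8-bit range.
    return tuple(max(0, min(255, round(v))) for v in (y, cb, cr))


# Any gray pixel (r == g == b) lands on the luma axis: both chroma values stay at 128.
print(rgb_to_ycbcr(200, 200, 200))
# A saturated red pixel has moderate luma and a strongly off-center Cr.
print(rgb_to_ycbcr(255, 0, 0))
```

For a B&W picture every pixel satisfies r=g=b, so both chroma planes are a constant 128 and carry no information.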

Separating the luma channel from the chroma channels is useful because it allows two optimizations leading to substantial gains:

For color images, applying aggressive optimizations on the two chroma channels without visual degradation
Keeping only luma for B&W images, instead of having RGB triples where r=g=b (redundant)

Subsampling

Human eyes are great at perceiving subtle variations in brightness. They are not as good at perceiving variations in color.

Only Y'

C_b + C_r

Only C_b

Only C_r

Without the two chroma channels, the luma-only (Y') image looks fine, just like a B&W picture, because it is a regular B&W picture! But the chroma-only images (C_b + C_r, only C_b, only C_r) look weird. Without the luma component, we only see flat and ill-defined shapes.

That's how our eyes work: they are more sensitive to variations in brightness and less sensitive to variations in color. Discarding too much brightness information is likely to be noticeable, while we tolerate much greater losses in color information. This property is exploited to reduce image file size without unacceptable visual degradation, thanks to a simple yet effective technique: subsampling.
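To make the idea concrete, here is a minimal sketch of 4:2:0-style subsampling on one chroma plane. It assumes a plain 2x2 box average and even dimensions; actual encoders may use other filters and handle odd sizes. `subsample_420` is a hypothetical helper, not part of any library.

```python
def subsample_420(plane):
    """Halve a chroma plane in both directions by averaging each 2x2 block.

    `plane` is a list of rows of 8-bit values with even width and height.
    """
    height, width = len(plane), len(plane[0])
    return [
        [
            # "+ 2" rounds to nearest instead of truncating.
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


# A tiny 4x4 chroma plane (made-up values).
cb = [
    [100, 102, 200, 202],
    [104, 106, 204, 206],
    [50, 50, 60, 60],
    [50, 50, 60, 60],
]
small = subsample_420(cb)
print(small)  # a 2x2 plane: a quarter of the original samples
```

The luma plane is left at full resolution; only C_b and C_r go through this reduction.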

4:4:4 - 150 kB

4:2:2 - 107 kB

4:2:0 - 82 kB

4:0:0 / Grayscale - 50 kB

Delivering 4:4:4 or 4:2:2-subsampled images on the Web is not something we see much, for obvious reasons. The vast majority of distributed images are 4:2:0-subsampled. The lossy WebP format, designed for the Web, does not even allow anything else, for instance.

From the manual of the reference JPEG encoder (cjpeg):

since the human eye is more sensitive to spatial changes in brightness than spatial changes in color, the chrominance components can be quantized more than the luminance components without incurring any visible image quality loss.

As you can see, subsampling is a decisive step for the final image size. A 4:2:0 image is already so stripped down that going luma-only (4:0:0) does not make a great difference.

That is, a grayscale image encoded in JPEG contains only the one channel that leaves little room for aggressive optimization. It does not differ much from a color version subsampled to 4:2:0 and then optimized for details we cannot perceive.
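A quick back-of-the-envelope count shows why. The sketch below computes the average number of stored samples per pixel for each scheme, before any compression. The names `CHROMA_FACTORS` and `samples_per_pixel` are made up for this example, and compressed file sizes do not scale linearly with these raw counts.

```python
from fractions import Fraction

# Scheme -> (horizontal, vertical) reduction factors applied to each chroma plane.
# 4:0:0 is handled separately since it carries no chroma at all.
CHROMA_FACTORS = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
}


def samples_per_pixel(scheme):
    """Average number of stored samples per pixel, before compression."""
    if scheme == "4:0:0":
        return Fraction(1)  # luma only
    h, v = CHROMA_FACTORS[scheme]
    # One full-resolution luma sample plus two reduced chroma planes.
    return 1 + Fraction(2, h * v)


for scheme in ("4:4:4", "4:2:2", "4:2:0", "4:0:0"):
    print(scheme, float(samples_per_pixel(scheme)))
```

With 4:2:0, the two chroma planes together add only half a sample per pixel on top of luma, compared with two extra samples per pixel for 4:4:4.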

Code

I used OpenCV with Python to split my picture into the three channels.

import cv2

img = cv2.imread("/tmp/pigeon.jpg")

# OpenCV represents images as BGR (not RGB) triples.
# We convert BGR to YCrCb (note the channel order) and then split the three channels.
y, cr, cb = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb))

# Write each channel to its own image file to inspect it
cv2.imwrite("/tmp/pigeon-y.jpg", y)
cv2.imwrite("/tmp/pigeon-cb.jpg", cb)
cv2.imwrite("/tmp/pigeon-cr.jpg", cr)

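# Combined Cb+Cr version
gray = y.copy()
gray.fill(128) # set constant luma, midgray
# Merge in OpenCV's Y, Cr, Cb channel order, then convert back to BGR,
# since imwrite interprets 3-channel data as BGR.
cbcr = cv2.cvtColor(cv2.merge([gray, cr, cb]), cv2.COLOR_YCrCb2BGR)
cv2.imwrite("/tmp/pigeon-cbcr.jpg", cbcr)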

Re-encoding and subsampling JPEG images:

# 4:4:4
djpeg -scale 1/12 pigeon.jpg |
  cjpeg -quality 75 -optimize -rgb -sample 1x1 > pigeon-444.jpg

# 4:2:2
djpeg -scale 1/12 pigeon.jpg |
  cjpeg -quality 75 -optimize -rgb -sample 2x1 > pigeon-422.jpg

# 4:2:0
djpeg -scale 1/12 pigeon.jpg | 
  cjpeg -quality 75 -optimize -rgb -sample 2x2 > pigeon-420.jpg

About pictures

Acros film on digital Acros simulation, 2023.
Fujifilm X-S10, Fujinon XF35mm F2 R WR (35mm) - 3200 ISO - 1/10s - f4
Pigeon, La Villette (Paris), 2023.
Fujifilm X-S10, Ге́лиос 44-2 (58mm) - 160 ISO - 1/640s