Continuing the announced series of my original manuscripts for 3D World magazine.
Thinking of images as data containers.
Although those raster image files filling our computers and lives are most commonly used to represent pictures (surprisingly), I find it useful for a CG artist to have yet another perspective – a geekier one. And from that perspective a raster image is essentially a set of data organized into a particular structure, to be more specific — a table filled with numbers (a matrix, mathematically speaking).
The number in each table cell can be used to represent a color, and this is how the cell becomes a pixel (stands for “picture element”). Many ways exist to encode colors numerically. For instance (probably the most straightforward one) to explicitly define a number-to-color correspondence for each value (i.e. 3 stands for dark red, 17 for pale green and so on). Such method was frequently used in the older formats like GIF as it allows for certain size benefits at the expense of a limited palette.
Another way (the most common one) is to use a continuous range from 0 to 1 (not 255!), where 0 stands for black, 1 for white, and numbers in between denote the shades of gray of the corresponding lightness. (A 0-255 range of integers is only an 8-bit representation of zero-to-one, popularized by certain software products and harmfully misleading in understanding many concepts such as color math or blending modes.) This way we get a logical and elegantly organized way of representing a monochrome image with a raster file. The term “monochrome” happens to be a more appropriate than “black-and-white” since the same data set can be used to depict gradations from black to any other color depending on the output device – like many old monitors were rather black&green than black&white.
This system, however, can be easily extended to the full-color case with a simple solution – each table cell can contain several numbers, and again there are multiple ways of describing the color with few (usually three) numbers each in 0-1 range. In RGB model they stand for the amounts of Red, Green and Blue light, in HSV - for hue, saturation and brightness accordingly. But most importantly – those are still nothing but numbers which encode a particular meaning, but don't have to be interpreted that way.
Now to the “why it is not a square” part. Because the table, which a raster image is, tells us how many elements are in each row and column, in which order they are placed, but nothing about what shape or even proportion they are. We can form an image from the data in a file by various means, not necessarily with a monitor, which is only one option for an output device. For example, if we would take our image file and distribute pebbles of sizes proportional to pixel values on some surface – we shall still form essentially the same image.
Computer monitor is only one out of many possible
ways to visualize the raster image data.
And even if we'd take only half of the columns, but instruct ourselves to use the stones twice wider for the distribution – the result would still show principally the same picture with the correct proportions, only lacking half of the horizontal details. “Instruct” is the key word here. This instruction is called “pixel aspect ratio”, which describes the difference between the image's resolution (number of rows and columns) and proportions. It allows to store frames stretched or compressed horizontally and is used in certain video and film formats.
In this example of an image stored with
the pixel aspect ratio of 2.0, representing pixels
as squares results in erroneous proportions (top).
Correct representation needs to rely
on the stretched elements like below.
Since we started on resolution – it shows the maximum amount of detail which an image can hold, but says nothing about how much does it actually hold. A badly focused photograph won't get improved no matter how many pixels the camera sensor has. Same way upscaling a digital image in Photoshop or any other editor will increase the resolution without adding no detail or quality to it – the extra rows and columns would be just filled with interpolated (averaged) values of originally neighboring pixels.
In a similar fashion a PPI (pixels per inch) parameter (commonly mistakenly called DPI – dots per inch) is only an instruction establishing the correspondence between the image file's resolution and the output's physical dimensions. Thus it is pretty much meaningless on its own, without either of those two.
Returning to the numbers stored in each pixel, of course they can be any, including so called out-of range (values above 1 and negative). And of course there can be more than 3 numbers stored in each cell. These features are limited only by the particular file format definition and are widely utilized in OpenEXR to name one.
The great aspect of storing several numbers in each pixel is their independence. Obviously, each of them can be studied and manipulated individually as a monochrome image called Channel – a sub-raster if you want. Additional channels to the usual color-describing Red, Green and Blue can carry all kinds of information. The default fourth channel is Alpha which encodes opacity (0 denotes a transparent pixel, 1 stands for completely opaque). ZDepth, Normals, Velocity (Motion Vectors), World Position, Ambient Occlusion, IDs and anything else you could think of can be stored in either additional or the main RGB channels – it is only data and the way to store it. Every time you render something out, you decide which data to include and where to place it. Same way you decide in compositing how to manipulate the data you possess to achieve the result pursued.
This is the numerical way of image-thinking, and I would like to wrap this article up with few examples of where it comes beneficial.
We've just mentioned understanding and using render passes, but this aside, it is pretty much all of the compositing that requires this perspective. The basic color-corrections for example are nothing but elementary math operations on pixel values, and seeing through it is quite essential for productive work. Furthermore, math operations like addition, subtraction or multiplication can be performed on pixel values, and with data like Normals and Position many 3D shading tools can be mimicked in 2D.
The described perspective is also how programmers see image files, thus especially in game industry it can help artists achieve a better mutual understanding with developers, resulting in better custom tools and cutting corners with various tricks like using textures for non-image data.
And of course the visual effects and motion design. Texture maps controlling properties of a particles emission, RGB displacements forming 3D shapes, encoding multiple passes within RGBA with custom shaders, and on, and on... All these techniques become much more transparent after you start seeing the digits behind the pixels, which is essentially what a pixel is – a number in its place.