Sunday, December 28, 2014

Bit Depth - color precision in raster images

Bit depth diagram

Last time we talked about encoding color information in pixels with numbers from a zero-to-one range, where 0 stands for black, 1 for white and the numbers in between represent the corresponding shades of gray. (The RGB model uses three such numbers to store the brightness of the Red, Green and Blue components, representing a wide range of colors by mixing them.) This time let's address the precision of such a representation, which is defined by the number of bits a particular file format dedicates to describing that 0-1 range: the bit depth of a raster image.

Bits are the most basic units of information storage. Each can take only two values, which can be thought of as 0 or 1, off or on, absence or presence of a signal, or black or white in our case. Therefore using 1 bit per pixel (a 1-bit image) gives us a picture consisting only of black and white elements with no shades of gray.*

*Of course, the two values can be interpreted as anything (for instance, you can encode two arbitrary colors with them, like brown and violet, but only those two, with no gradations in between). For the most common purpose, which is representing a 0 to 1 gradation, 1 bit means black or white, and higher bit depths serve to increase the number of possible gray sub-steps.

But the great thing about bits is that grouping them together yields far more than the simple sum of the individuals, as each new bit does not add 2 more values to the group, but instead doubles the number of available unique combinations. It means that if we use 3 bits to describe each pixel value, we get not 6 (=2x3) but 8 (=2^3) possible combinations. 5 bits can produce 32, and 8 bits grouped together result in 256 different numbers (see the quick sketch below the illustration).
Possible values represented by 1 and 3 bits
Although each bit can represent only 2 values, 
even 3 of them grouped together would already 
result in 8 possible combinations.
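
A quick Python sketch, purely illustrative, confirming the arithmetic: n bits give 2^n combinations, not 2xn.

    # Each extra bit doubles the number of unique combinations.
    for bits in (1, 3, 5, 8, 16):
        print(bits, "bits ->", 2 ** bits, "possible values")
    # 1 bits -> 2, 3 bits -> 8, 5 bits -> 32, 8 bits -> 256, 16 bits -> 65536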

A group of 8 bits is typically called a byte, another standard unit computers use to store data. This makes it convenient (although not necessary) to assign whole bytes to describe the color of a pixel, and one byte per channel is the most common choice. This is true for the majority of digital images existing today, giving us a precision of 256 possible gradations from black to white (in either a monochrome picture or each of the Red, Green and Blue channels of RGB). This is what is called an 8-bit image in computer graphics, where bit depth is traditionally measured per color component. In consumer electronics the same 8-bit RGB image would be called 24-bit (True Color), simply because the sum of all 3 channels is counted together (higher numbers must seem cooler for marketing). An 8-bit RGB image can reproduce 16777216 (=256^3) different colors, a color fidelity normally sufficient to avoid visible artifacts. Moreover, regular consumer monitors are physically not designed to display more gradations (in fact they may be limited to even fewer, like 6 bits per channel). So why would someone bother and waste disk space/memory on files of higher bit depths?

The most basic example of 256 gradations not being enough is heavy color correction of an 8-bit image, which can quickly produce artifacts known as banding. Rendering to a higher bit depth solves this issue, and normally 16-bit formats, with their 65536 distinct levels of gray, are used for the purpose. But even 10 bits, as in the Cineon/DPX format, give 4 times the precision of the standard 8. Going above 2 bytes per channel, on the other hand, becomes impractical, as file size increases proportionally to the bit depth.*

Insufficient bit depth of an output device can be another 
cause of banding artifacts, especially in gradients. 
Adding noise can help fight this issue through dithering. 
A kind of fighting fire with fire...
*No matter whether float or integer, the size of a raster image in memory can be calculated as the product of the number of pixels (horizontal times vertical resolution), the bit depth and the number of channels. This way a 320x240 8-bit RGBA image occupies 320x240x8x4=2457600 bits, or 320x240x4=307200 bytes of memory. This does not give the exact file size on disk though: first, an image file stores additional data like a header and other metadata; second, image file formats normally utilize some kind of compression (lossless, like internal archiving, or lossy, like in JPEG) to save disk space.
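
A minimal Python sketch of that footnote formula (the function name is mine, just for illustration):

    def raster_size_bytes(width, height, bits_per_channel, channels):
        # Raw, uncompressed in-memory size: pixels x bit depth x channels.
        return width * height * bits_per_channel * channels // 8

    print(raster_size_bytes(320, 240, 8, 4))  # 307200 bytes, as in the example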

But regardless of the number of gradations (2, 4, 256, 65536, etc.), as long as we are using an integer file format, these numbers all describe values within the range from 0 to 1. For instance, the middle gray value in the sRGB color space (the color space of a regular computer monitor, not to be confused with the RGB color model) is around 0.5, not 128, and white is 1, not 255. It is only because the 8-bit representation is so popular that many programs measure color in it by default. But this is not how the underlying math works, and the habit can cause problems when trying to make sense of it... Take the Multiply blending mode, for example: it's easy to learn empirically that it preserves the color of the underlying layer in white areas of the overlay and darkens the picture under the dark areas. But what exactly is happening, and why is it called "multiply"? With black it makes sense: you multiply the underlying color by 0 and get 0, black. But why would it preserve white areas if white is 255? Multiplying something by 255 should make it way brighter... Well, because white is 1, not 255 (nor 4, nor 16, nor 65536...). And so with the rest of the CG math: white means one.
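
To see the "white means one" point in code, here is a toy Multiply blend on normalized values (a sketch assuming NumPy; real compositing packages do effectively the same math):

    import numpy as np

    base = np.array([0.25, 0.5, 0.75])  # underlying layer pixels, 0-1 range

    print(base * 1.0)  # overlay white (1): underlying color preserved
    print(base * 0.0)  # overlay black (0): everything goes to black
    print(base * 0.5)  # overlay mid-gray: the picture darkens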

The above paragraphs described how bit depth works in integer formats: defining only the number of possible gradations between 0 and 1. Floating-point formats are of a different kind. Bit depth does pretty much the same thing here, defining the color precision, but the stored numbers can be anything and may well lie outside of the 0 to 1 range: brighter than white (above 1) or darker than black (negative). Internally this works by splitting the bits between an exponent and a mantissa, which distributes the precision roughly logarithmically and requires higher bit depths to achieve the same fidelity within the usually most important [0,1] range. Normally at least 16 or even 32 bits per channel are used to represent floating-point data with enough precision. At the cost of memory usage, this allows for representing High Dynamic Range imagery, gives additional freedom in compositing, and makes it possible to store arbitrary numerical data in image files, the World Position pass to name one.

This also means that integer formats always clip out-of-range pixel values. A quick way to test for clipping is to lower the brightness of a picture and see if any details get revealed in the overbright areas.
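
A small NumPy sketch of both points: integer formats clipping out-of-range values, and the lower-the-brightness test revealing (or not) the lost detail. The quantization step here simulates saving to an 8-bit file.

    import numpy as np

    hdr = np.array([3.0, 1.0, -0.2], dtype=np.float32)  # out-of-range values

    # Simulate saving to an 8-bit integer format: clip to [0, 1], quantize.
    ldr = np.round(np.clip(hdr, 0.0, 1.0) * 255).astype(np.uint8)
    print(ldr)               # [255 255   0] -- the out-of-range data is gone

    # The clipping test: lower the brightness and look at the overbrights.
    print(hdr * 0.25)        # [ 0.75  0.25 -0.05] -- detail comes back
    print(ldr / 255 * 0.25)  # [ 0.25  0.25  0.  ] -- flat, nothing revealed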

A 3D render of a sphere used to illustrate artifacts of insufficient color precision
Source image

Color banding and clipping artifacts
The same source image rendered to 8-bit integer, 16-bit integer and 16-bit float with 2 different color corrections applied. Notice the color banding in the 8-bit version and the clipped highlights in the integer versions.
It is natural for a 3D renderer to work in floating point internally, so most often the risk of clipping arises when choosing a file format to save the final image. But even when dealing with already given low bit-depth or clipped integer files, there are certain benefits in increasing their color precision inside the compositing software. (To the best of my knowledge, Nuke converts any imported source into a 32-bit floating-point representation internally and automatically.) Such a conversion won't add any extra details or qualities to the existing data, but the results of your further manipulations will live in a better color space with fewer quantization errors (and a wider luminance range, if you also convert integer to float). Moreover, you can quickly fake HDR data by converting an integer image to float and gaining up the highlights (bright areas) of the picture. This won't give you a real replacement for properly acquired HDR, but should suffice for many purposes like diffuse IBL (Image Based Lighting). In other words, regardless of the output requirements, do your compositing in at least 16 bits, with float highly preferable; final downsampling and clipping for the output delivery is never a problem.
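
A rough NumPy sketch of that fake-HDR trick (the threshold and gain values are made up; tune to taste):

    import numpy as np

    def fake_hdr(img8, threshold=0.8, gain=4.0):
        f = img8.astype(np.float32) / 255.0  # integer -> float, 0-1 range
        f[f > threshold] *= gain             # highlights now extend above 1
        return f                             # e.g. usable for diffuse IBL tests

    # A hard threshold like this leaves a visible step around the highlights;
    # a smooth rolloff would be preferable in practice.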

It is important to have a clear understanding of bit depth and integer/float differences to deliver renders of adequate quality and not get caught out during the post-processing stage later. Read up on the file formats and options available in your software. For instance, 16 bits can refer to both integer and floating-point formats, which may be distinguished as "Short" (integer) and "Half" (float) in Maya. As a general rule of thumb, use 16 bits if you plan for extensive color grading/compositing, and make sure you render to a floating-point format to avoid clipping if any out-of-range values need to be preserved (like details in the highlights, negative values in Z-depth, or simply if you use a linear workflow). 16-bit OpenEXR files can be considered a good color precision/file size compromise for the general case.

Happy and Merry everyone!

Monday, November 3, 2014

Pixel Is Not a Color Square

Raster images contain nothing but numbers in the table cells

Continuing the announced series of my original manuscripts for 3D World magazine.
Thinking of images as data containers.

Although the raster image files filling our computers and lives are most commonly used to represent pictures (surprisingly), I find it useful for a CG artist to have yet another perspective, a geekier one. From that perspective, a raster image is essentially a set of data organized into a particular structure; to be more specific, a table filled with numbers (a matrix, mathematically speaking).

The number in each table cell can be used to represent a color, and this is how the cell becomes a pixel (short for "picture element"). Many ways exist to encode colors numerically. Probably the most straightforward one is to explicitly define a number-to-color correspondence for each value (i.e. 3 stands for dark red, 17 for pale green and so on). This method was frequently used in older formats like GIF, as it allows for certain size benefits at the expense of a limited palette.
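
A toy sketch of such an indexed-color raster in Python (NumPy assumed; the palette values are arbitrary):

    import numpy as np

    palette = np.array([[0.0, 0.0, 0.0],    # 0: black
                        [0.5, 0.1, 0.1],    # 1: dark red
                        [0.7, 0.9, 0.7]])   # 2: pale green

    indices = np.array([[0, 1, 2],
                        [2, 1, 0]])  # the image file stores only these numbers

    rgb = palette[indices]  # lookup expands it into a regular RGB raster
    print(rgb.shape)        # (2, 3, 3): 2x3 pixels, 3 color components each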

Another way (the most common one) is to use a continuous range from 0 to 1 (not 255!), where 0 stands for black, 1 for white, and the numbers in between denote shades of gray of the corresponding lightness. (The 0-255 range of integers is only an 8-bit representation of zero-to-one, popularized by certain software products and harmfully misleading when it comes to understanding many concepts such as color math or blending modes.) This way we get a logical and elegantly organized way of representing a monochrome image with a raster file. The term "monochrome" happens to be more appropriate than "black-and-white", since the same data set can be used to depict gradations from black to any other color depending on the output device; many old monitors, for example, were rather black-and-green than black-and-white.

Encoding custom data with images
A raster may contain data of a totally different kind. As an example, let's fill one table with the digits of pi divided by ten, and another with random values, and present both as images. Each data set has a particular meaning different from the other, yet visually they represent the same thing: noise. And while the visual sense matches the numeric one in the second case, there is almost no chance to correctly interpret the meaning of the first data set purely visually (as an image).
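
The experiment is easy to reproduce; a sketch with NumPy (display the two arrays with any image viewer of choice):

    import numpy as np

    digits = "314159265358979323846264338327950288419716939937"  # pi, 48 digits
    pi_img = np.array([int(d) for d in digits]).reshape(6, 8) / 10.0

    rand_img = np.random.default_rng(0).random((6, 8))  # genuinely random

    # Both are valid monochrome rasters in the 0-1 range; shown as images,
    # both read as plain noise, although only one of them actually is.
    print(pi_img)
    print(rand_img)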

This system, however, can be easily extended to the full-color case with a simple solution: each table cell can contain several numbers. Again, there are multiple ways of describing a color with a few (usually three) numbers, each in the 0-1 range. In the RGB model they stand for the amounts of Red, Green and Blue light; in HSV, for hue, saturation and brightness accordingly. But most importantly, those are still nothing but numbers which encode a particular meaning, but don't have to be interpreted that way.
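
Python's standard library happens to include exactly this kind of conversion, all components staying in the 0-1 range:

    import colorsys

    r, g, b = 0.8, 0.4, 0.2                 # the same color...
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # ...as hue/saturation/value
    print(h, s, v)
    print(colorsys.hsv_to_rgb(h, s, v))     # round-trips back to RGB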

Now to the "why it is not a square" part. The table which a raster image is tells us how many elements are in each row and column and in which order they are placed, but nothing about their shape or even proportions. We can form an image from the data in a file by various means, not necessarily with a monitor, which is only one option for an output device. For example, if we took our image file and distributed pebbles of sizes proportional to the pixel values on some surface, we would still form essentially the same image.

Displaying raster image data with a set of pebbles
A computer monitor is only one of many possible 
ways to visualize raster image data.

And even if we took only half of the columns, but instructed ourselves to use stones twice as wide for the distribution, the result would still show principally the same picture with the correct proportions, only lacking half of the horizontal detail. "Instruct" is the key word here. This instruction is called the "pixel aspect ratio", and it describes the difference between the image's resolution (number of rows and columns) and its proportions. It allows frames to be stored stretched or compressed horizontally and is used in certain video and film formats (a numeric sketch follows after the illustration below).

Pixel aspect ratio explained in a diagram with pebbles
In this example of an image stored with 
the pixel aspect ratio of 2.0, representing pixels 
as squares results in erroneous proportions (top). 
Correct representation needs to rely 
on the stretched elements like below.
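
In numbers the instruction is trivial; a sketch (the 720-column figure is just an illustrative example):

    def display_width(stored_columns, pixel_aspect_ratio):
        # Width the image should be shown at, in square-pixel units.
        return round(stored_columns * pixel_aspect_ratio)

    print(display_width(720, 2.0))  # half the columns stored, pixels 2x wide
    # -> 1440: the pebble example from the caption above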

Since we started on resolution: it shows the maximum amount of detail an image can hold, but says nothing about how much it actually holds. A badly focused photograph won't get improved no matter how many pixels the camera sensor has. In the same way, upscaling a digital image in Photoshop or any other editor will increase the resolution without adding any detail or quality to it; the extra rows and columns are just filled with interpolated (averaged) values of the originally neighboring pixels.

In a similar fashion, the PPI (pixels per inch) parameter (commonly misnamed DPI, dots per inch) is only an instruction establishing the correspondence between the image file's resolution and the output's physical dimensions. Thus it is pretty much meaningless on its own, without either of those two.
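
Again, trivial in numbers (a sketch):

    def print_size_inches(pixels, ppi):
        return pixels / ppi  # physical size implied by resolution and PPI

    print(print_size_inches(3000, 300))  # 10.0 inches
    print(print_size_inches(3000, 72))   # ~41.7 inches -- same pixels,
                                         # different instruction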

Returning to the numbers stored in each pixel: of course they can be anything, including so-called out-of-range values (above 1 and negative). And of course there can be more than 3 numbers stored in each cell. These features are limited only by the particular file format definition and are widely utilized in OpenEXR, to name one.

The great aspect of storing several numbers in each pixel is their independence. Each of them can be studied and manipulated individually as a monochrome image called a channel; a sub-raster, if you want. Channels additional to the usual color-describing Red, Green and Blue can carry all kinds of information. The default fourth channel is Alpha, which encodes opacity (0 denotes a transparent pixel, 1 stands for completely opaque). ZDepth, Normals, Velocity (Motion Vectors), World Position, Ambient Occlusion, IDs and anything else you can think of can be stored in either additional channels or the main RGB ones: it is only data and a way to store it. Every time you render something out, you decide which data to include and where to place it. The same way, you decide in compositing how to manipulate the data you possess to achieve the result pursued.
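
In code this independence is quite literal; a NumPy sketch with hypothetical data:

    import numpy as np

    rgba = np.zeros((240, 320, 4), dtype=np.float32)  # rows x columns x channels

    red, alpha = rgba[..., 0], rgba[..., 3]  # each channel is a monochrome image
    print(red.shape, alpha.shape)            # (240, 320) each, a "sub-raster"

    # Nothing stops us from writing non-color data into a channel,
    # e.g. a left-to-right depth ramp stored in red:
    rgba[..., 0] = np.linspace(0.0, 1.0, 320, dtype=np.float32)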

This is the numerical way of image-thinking, and I would like to wrap this article up with a few examples of where it proves beneficial.

We've just mentioned understanding and using render passes, but beyond that, pretty much all of compositing requires this perspective. Basic color corrections, for example, are nothing but elementary math operations on pixel values, and seeing through them is quite essential for productive work. Furthermore, with math operations like addition, subtraction or multiplication performed on data like Normals and Position, many 3D shading tools can be mimicked in 2D.
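
As an illustration of the latter, here is a minimal Lambert-style relighting from a normals pass, sketched in NumPy (assuming the normals are stored as raw XYZ in the [-1, 1] range, not remapped for display):

    import numpy as np

    def relight(normals, light_dir):
        l = np.asarray(light_dir, dtype=np.float32)
        l /= np.linalg.norm(l)                      # normalize the light vector
        ndotl = np.einsum('hwc,c->hw', normals, l)  # per-pixel dot product N.L
        return np.clip(ndotl, 0.0, 1.0)             # basic diffuse shading in 2D

    normals = np.zeros((4, 4, 3), dtype=np.float32)
    normals[..., 2] = 1.0                       # a flat pass facing the camera
    print(relight(normals, (0.0, 0.0, 1.0)))    # fully lit everywhere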

The described perspective is also how programmers see image files; thus, especially in the game industry, it can help artists achieve a better mutual understanding with developers, resulting in better custom tools and in cutting corners with various tricks like using textures for non-image data.

And of course, visual effects and motion design. Texture maps controlling the properties of particle emission, RGB displacement forming 3D shapes, encoding multiple passes within RGBA with custom shaders, and on, and on... All these techniques become much more transparent once you start seeing the digits behind the pixels, which is essentially what a pixel is: a number in its place.

Procedural Clouds

Sample outputs of the self-made procedural cloud generators

I've been playing around with generating procedural clouds lately, and this time, before turning to the heavy artillery of full-scale 3D volumetrics, I spent some time with good old fractal noises in the good old Fusion.

So row by row, top to bottom:

The base fractus cloudform generator, assembled from several noise patterns: from the coarsest one defining the overall random shape to the smallest for the edge work. It is used as a building block in the setups below. The main trick here was not to rely on a single noise pattern, but rather to look for a way of combining several sizes which would maximize the variation of shapes. The quality of the generator seems to be in direct correlation with the time, tenderness and attention spent on fine-tuning the parameters; the setup itself is not really sophisticated.
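
For the curious, the coarse-to-fine idea itself fits in a few lines of Python (NumPy and SciPy assumed; this shows only the general principle, not the actual Fusion setup, and all the constants are arbitrary):

    import numpy as np
    from scipy.ndimage import zoom

    def fractal_noise(size=256, octaves=5, gain=0.5, seed=0):
        rng = np.random.default_rng(seed)
        result, amplitude, cells = np.zeros((size, size)), 1.0, 4
        for _ in range(octaves):
            layer = zoom(rng.random((cells, cells)), size / cells, order=3)
            result += amplitude * layer[:size, :size]
            amplitude *= gain  # finer octaves contribute less...
            cells *= 2         # ...but bring in smaller details
        return result / result.max()

    # Thresholding carves random fractus-like shapes out of the noise field:
    cloud = np.clip(fractal_noise() - 0.45, 0.0, None) * 2.5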

Another thing was not to aim for a universal solution, but to design a separate setup for each characteristic cloud type. Good reference is a must, of course. Keeping such a system modular helps as well, so that the higher-level assets rely on properly tuned base elements. The second and third rows are nothing more than different modifications of the base shapes into cirrus through warping. All 3 top types are then put onto a 3D plane and slightly displaced for a more volumetric feeling.

Clouds in the fourth row are merely 3D bunches of randomized fractus sprites output from the base generator. The effect of shading is achieved through the variance in tones of the individual sprites.

The lowest samples are more stylized experiments in distorting the initial sphere geometry and cloning secondary elements over its surface.

Sunday, September 28, 2014

On Anatomy of CG Cameras

Diagram of the main anatomical elements of a virtual camera
Anatomy of a CG Camera

The following article first appeared in issue 180 and was the first in the series of pieces I've been writing for 3D World magazine for some time now; the later ones should follow at a (very) roughly monthly pace as well. The versions I'm going to be posting here are my initial manuscripts, and they typically differ (like having worse English and sillier pictures) from what makes it to print after editing. Try to enjoy.

Anatomy of a CG camera by Denis Kozlov - page 1

Anatomy of a CG camera by Denis Kozlov - page 2

Anatomy of a CG camera by Denis Kozlov - page 3

Anatomy of a CG camera by Denis Kozlov - page 4

Anatomy of a CG camera by Denis Kozlov - page 5

Wednesday, June 11, 2014

Typography Basics for Artists. Part 2 - Matching the Typeface

Anatomic parts of a glyph according to Wiki
Anatomic parts of a glyph according to Wiki:
1) x-height; 2) ascender line; 3) apex; 4) baseline; 5) ascender; 6) crossbar; 7) stem; 8) serif; 9) leg; 10) bowl; 11) counter; 12) collar; 13) loop; 14) ear; 15) tie; 16) horizontal bar; 17) arm; 18) vertical bar; 19) cap height; 20) descender line.
And here it comes, finally: the second part of the typography basics for artists, where we're going to address the very common and practical task of matching a typeface to some pre-existing reference. The first part can be found here, and again, the material of these posts should be considered no more than a starting point for further investigation; a hopefully useful introduction to the boundless world of typography, aimed at those who do not necessarily inhabit it full-time.

So we have a reference text and want to match its look as closely as possible. First of all, we need something to match it with. Adobe users have access to a great library of typefaces, which is a blessing on a budget, but even with no budget at all there are online collections out there to browse ("download fonts free for commercial use" seems to be a nice search line to start with). The "free for commercial use" part is quite important, as many typefaces are freely available only for personal use; fonts are usually distributed with a license text file which is always worth studying. For that reason in particular, my preferred online collection is Font Squirrel.

As soon as we have a typeface library and a quick way of browsing through it, it only takes looking and comparing to find the closest match. Here are a few things to look at.

1) The sample text. I personally find it most transparent and convenient to use the reference text (or a part of it) itself as the sample line when trying candidate typefaces on. Making sure the test string has some digits and special symbols is a good idea too. Another useful and beautiful tool is the pangram: a phrase containing every letter of an alphabet. Wikipedia offers a quite comprehensive list for numerous languages (including Klingon); some of my favorites for English:

Public junk dwarves quiz mighty fox.
Cozy sphinx waves quart jug of bad milk.
Bored? Craving a pub quiz fix? Why, just come to the Royal Oak!

typographic variants of lowercase "a" grapheme
Image by GearedBull Jim Hood
typographic variants of minuscule "g" grapheme
2) One reason to compare the look of all the characters is that even though the other visual parameters (addressed below) of two typefaces might match quite closely, the same symbol can still be represented with different graphemes, like the alternative versions of a and g shown on the right. Numbers and special characters allow various visual interpretations as well.

3) Identifying the typeface in question within a broad classification as the first step considerably speeds up the comparison: from then on we can quickly skip the non-relevant styles and focus on closer examination of candidates from the same group only (like Script or Serif).

4) The next level of precision is considering the contrast (the thickness ratio between the main and supplementary strokes of a typeface) and the other proportions of the characters (both overall, like wide or tall letters, and between the elements within each letter, like ascenders, descenders and counters). These qualities play a big part in defining the look of a font, and the habit of thinking of typefaces in terms of their contrast speeds up navigation over the typographic ocean considerably.
The contrast of a typeface is the thickness ratio of main and supplementary strokes

5) And then the details. Typography is all about balance in proportion and fine finishing, so what would be considered minor in most other visual arts becomes diverse and intricately nuanced here. Shapes of the serifs, ending elements, connections between strokes: all have space for diversity. Here is a very cool PDF listing the typographic elements. The style of those elements is also subject to fashion, and certain details can attribute a typeface to a particular temporal or stylistic group.

Different versions of serif "T" letter

The next part, whenever it chooses to arrive, is going to cover the basics of display typesetting.