Monday, November 3, 2014

Pixel Is Not a Color Square

Raster images contain nothing but numbers in their table cells

Continuing the announced series of my original manuscripts for 3D World magazine.
 
Thinking of images as data containers.


Although those raster image files filling our computers and lives are most commonly used to represent pictures (surprisingly), I find it useful for a CG artist to have yet another perspective – a geekier one. And from that perspective a raster image is essentially a set of data organized into a particular structure, to be more specific — a table filled with numbers (a matrix, mathematically speaking).

The number in each table cell can be used to represent a color, and this is how the cell becomes a pixel (short for “picture element”). Many ways exist to encode colors numerically. For instance (probably the most straightforward one), we can explicitly define a number-to-color correspondence for each value (i.e. 3 stands for dark red, 17 for pale green and so on). Such a method was frequently used in older formats like GIF, as it allows for certain size benefits at the expense of a limited palette.
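A minimal sketch of this indexed-color idea in Python (the index-to-color table below is made up purely for illustration, it is not an actual GIF palette):

```python
# Indexed color: each cell stores a palette index, not a color itself.
# The palette values here are hypothetical, just to show the lookup.
palette = {
    0: (0, 0, 0),         # black
    3: (139, 0, 0),       # dark red
    17: (152, 251, 152),  # pale green
}

indexed_image = [
    [3, 3, 17],
    [0, 17, 3],
]

# Decoding: replace each index with the color it stands for.
decoded = [[palette[i] for i in row] for row in indexed_image]
print(decoded[0][2])  # (152, 251, 152) - the pale green pixel
```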

Another way (the most common one) is to use a continuous range from 0 to 1 (not 255!), where 0 stands for black, 1 for white, and numbers in between denote shades of gray of the corresponding lightness. (A 0-255 range of integers is only an 8-bit representation of zero-to-one, popularized by certain software products and harmfully misleading when it comes to understanding many concepts such as color math or blending modes.) This way we get a logical and elegantly organized way of representing a monochrome image with a raster file. The term “monochrome” happens to be more appropriate than “black-and-white”, since the same data set can be used to depict gradations from black to any other color depending on the output device – many old monitors, for instance, were black-and-green rather than black-and-white.
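The relationship between the two scales is a simple division, which a short sketch makes explicit:

```python
# The 8-bit 0-255 range is just one quantized representation
# of the underlying 0.0-1.0 scale.
def to_float(value_8bit):
    return value_8bit / 255.0

def to_8bit(value_float):
    return round(value_float * 255)

print(to_float(255))  # 1.0 -> white
print(to_float(128))  # ~0.502 -> roughly middle gray
print(to_8bit(0.5))   # 128
```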

Encoding custom data with images
A raster may contain data of a totally different kind. As an example, let's fill one table with the digits of pi, each divided by ten, and another with random values, then present both as images. The two data sets have quite different meanings, yet visually they represent the same thing — noise. And while the visual sense matches the numeric one in the second case, there is almost no chance of correctly interpreting the meaning of the first data set purely visually (as an image).
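The experiment itself fits in a few lines; here is a sketch building both tables as small 4x4 grids of 0-1 values:

```python
import random

# First 16 digits of pi, each divided by ten so the values fit the 0-1 range.
PI_DIGITS = "3141592653589793"
pi_image = [[int(d) / 10 for d in PI_DIGITS[r * 4:(r + 1) * 4]]
            for r in range(4)]

# A second table of the same size filled with random values.
random.seed(0)
noise_image = [[random.random() for _ in range(4)] for _ in range(4)]

# Displayed as grayscale, both look like noise -
# yet only one of them actually *is* noise.
print(pi_image[0])  # [0.3, 0.1, 0.4, 0.1]
```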

This system, however, can be easily extended to the full-color case with a simple solution – each table cell can contain several numbers, and again there are multiple ways of describing a color with a few (usually three) numbers, each in the 0-1 range. In the RGB model they stand for the amounts of Red, Green and Blue light; in HSV – for hue, saturation and brightness respectively. But most importantly – those are still nothing but numbers which encode a particular meaning, but don't have to be interpreted that way.
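Python's standard library happens to include conversions between these models, which makes the "same color, different triple of numbers" point easy to demonstrate:

```python
import colorsys

# One pixel, three numbers in the 0-1 range.
r, g, b = 1.0, 0.0, 0.0  # pure red in RGB

# The very same color described by a different triple of numbers.
h, s, v = colorsys.rgb_to_hsv(r, g, b)
print(h, s, v)  # 0.0 1.0 1.0  (hue 0 = red, fully saturated, full brightness)
```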

Now to the “why it is not a square” part. The table which a raster image is tells us how many elements are in each row and column and in which order they are placed, but nothing about their shape or even proportions. We can form an image from the data in a file by various means, not necessarily with a monitor, which is only one option for an output device. For example, if we took our image file and distributed pebbles of sizes proportional to the pixel values on some surface – we would still form essentially the same image.

Displaying raster image data with a set of pebbles
A computer monitor is only one of many possible 
ways to visualize raster image data.


And even if we took only half of the columns, but instructed ourselves to use stones twice as wide for the distribution – the result would still show essentially the same picture with the correct proportions, only lacking half of the horizontal detail. “Instruct” is the key word here. This instruction is called the “pixel aspect ratio”, and it describes the difference between the image's resolution (number of rows and columns) and its proportions. It allows frames to be stored stretched or compressed horizontally and is used in certain video and film formats.
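The arithmetic behind this instruction is trivial; a sketch of the pebble example in code:

```python
def displayed_size(columns, rows, par):
    # The pixel aspect ratio (PAR) tells the output device how wide
    # each stored pixel should be relative to its height.
    return columns * par, rows

# The pebble example: keep only half of the columns, but mark the file
# with a pixel aspect ratio of 2.0 - the displayed proportions stay the same.
print(displayed_size(8, 8, 1.0))  # (8.0, 8) - the original square frame
print(displayed_size(4, 8, 2.0))  # (8.0, 8) - same proportions, half the columns
```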

Pixel aspect ratio explained in a diagram with pebbles
In this example of an image stored with 
the pixel aspect ratio of 2.0, representing pixels 
as squares results in erroneous proportions (top). 
Correct representation needs to rely 
on the stretched elements like below.


Since we have touched on resolution – it sets the maximum amount of detail which an image can hold, but says nothing about how much it actually holds. A badly focused photograph won't get any better no matter how many pixels the camera sensor has. In the same way, upscaling a digital image in Photoshop or any other editor will increase the resolution without adding any detail or quality to it – the extra rows and columns are simply filled with interpolated (averaged) values of the originally neighboring pixels.
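A one-row sketch of linear interpolation makes it obvious why no detail appears – every "new" pixel is computed entirely from the old ones:

```python
# Upscaling by interpolation: new samples are averages of existing
# neighbors, so the resolution grows but no information is added.
def upscale_row(row):
    result = []
    for a, b in zip(row, row[1:]):
        result.append(a)
        result.append((a + b) / 2)  # the "new" pixel is just an average
    result.append(row[-1])
    return result

row = [0.0, 1.0, 0.5]
print(upscale_row(row))  # [0.0, 0.5, 1.0, 0.75, 0.5]
```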

In a similar fashion, the PPI (pixels per inch) parameter (commonly misnamed DPI – dots per inch) is only an instruction establishing the correspondence between the image file's resolution and the output's physical dimensions. Thus it is pretty much meaningless on its own, without either of those two.
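Again the relationship is a single division, which a sketch shows directly – the same file prints at different physical sizes depending only on the PPI instruction:

```python
# PPI only ties pixel resolution to physical output size;
# on its own it tells us nothing.
def print_size_inches(columns, rows, ppi):
    return columns / ppi, rows / ppi

print(print_size_inches(3000, 2000, 300))  # (10.0, ~6.67) - a 10 x 6.67 inch print
print(print_size_inches(3000, 2000, 150))  # (20.0, ~13.33) - twice as large
```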

Returning to the numbers stored in each pixel: of course they can be anything, including so-called out-of-range values (above 1 or negative). And of course there can be more than 3 numbers stored in each cell. These features are limited only by the particular file format's definition and are widely utilized in OpenEXR, to name one.

The great aspect of storing several numbers in each pixel is their independence. Each of them can be studied and manipulated individually as a monochrome image called a Channel – a sub-raster if you want. Channels in addition to the usual color-describing Red, Green and Blue can carry all kinds of information. The default fourth channel is Alpha, which encodes opacity (0 denotes a transparent pixel, 1 stands for completely opaque). ZDepth, Normals, Velocity (Motion Vectors), World Position, Ambient Occlusion, IDs and anything else you could think of can be stored in either additional channels or the main RGB ones – it is only data and a way to store it. Every time you render something out, you decide which data to include and where to place it. In the same way, in compositing you decide how to manipulate the data you possess to achieve the result you are after.
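Pulling a channel out of an image is nothing more than taking the same position from every pixel's tuple of numbers – a sketch with a tiny RGBA raster:

```python
# Each pixel holds several independent numbers; extracting one of them
# from every pixel gives a monochrome image - a channel.
image = [
    [(0.9, 0.2, 0.1, 1.0), (0.0, 0.0, 0.0, 0.0)],
    [(0.5, 0.5, 0.5, 1.0), (0.2, 0.8, 0.3, 0.5)],
]

def extract_channel(img, index):
    return [[pixel[index] for pixel in row] for row in img]

alpha = extract_channel(image, 3)  # 0 = transparent, 1 = opaque
print(alpha)  # [[1.0, 0.0], [1.0, 0.5]]
```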

This is the numerical way of image-thinking, and I would like to wrap this article up with a few examples of where it proves beneficial.

We've just mentioned understanding and using render passes, but beyond that, pretty much all of compositing requires this perspective. Basic color corrections, for example, are nothing but elementary math operations on pixel values, and seeing through them is quite essential for productive work. Furthermore, math operations like addition, subtraction or multiplication can be performed on pixel values, and with data like Normals and Position many 3D shading tools can be mimicked in 2D.
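As an illustration of "color correction is just math", here are two classic blend-mode formulas applied to a single 0-1 pixel value – this is the whole trick behind Multiply darkening and Screen brightening an image:

```python
# Blend modes as elementary math on 0-1 pixel values.
def multiply(a, b):
    return a * b                    # always darkens (or keeps) in 0-1

def screen(a, b):
    return 1 - (1 - a) * (1 - b)    # always brightens (or keeps) in 0-1

pixel = 0.5
print(multiply(pixel, 0.5))  # 0.25 - darker
print(screen(pixel, 0.5))    # 0.75 - brighter
```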

The described perspective is also how programmers see image files, so, especially in the games industry, it can help artists achieve a better mutual understanding with developers, resulting in better custom tools and in cutting corners with various tricks like using textures for non-image data.

And of course, visual effects and motion design. Texture maps controlling the properties of particle emission, RGB displacements forming 3D shapes, encoding multiple passes within RGBA with custom shaders, and on, and on... All these techniques become much more transparent once you start seeing the digits behind the pixels, which is essentially what a pixel is – a number in its place.


Procedural Clouds

Sample outputs of self-made procedural clouds generators

I've been playing around with generating procedural clouds lately, and this time, before turning to the heavy artillery of full-scale 3D volumetrics, I spent some time with good old fractal noises in the good old Fusion.

So row by row, top to bottom:

The base fractus cloudform generator, assembled from several noise patterns: from the coarsest one defining the overall random shape to the smallest for the edge work. It is used as a building block in the setups below. The main trick here was not to rely on a single noise pattern, but rather to look for a way to combine several sizes which would maximize the variation of shapes. The quality of the generator seems to be in direct correlation with the time, tenderness and attention spent on fine-tuning the parameters – the setup itself is not really sophisticated.
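The octave-combination idea itself is simple enough to sketch outside of Fusion. The code below is not the actual node setup – just a rough illustration of summing random grids of different scales with decreasing weights, coarse overall shape first, finer detail on top:

```python
import random

def value_grid(size, seed):
    # One random grid = one noise "octave" at a given scale.
    rng = random.Random(seed)
    return [[rng.random() for _ in range(size)] for _ in range(size)]

def fractal_value(x, y, size=64, octaves=4):
    # Sample several grids at the same point, coarse to fine,
    # each finer octave contributing half as much as the previous one.
    total, weight, norm = 0.0, 1.0, 0.0
    for o in range(octaves):
        cells = 4 * 2 ** o
        grid = value_grid(cells, seed=o)
        gx = int(x / size * cells)
        gy = int(y / size * cells)
        total += grid[gy][gx] * weight
        norm += weight
        weight *= 0.5
    return total / norm  # normalized back into the 0-1 range

print(fractal_value(10, 10))
```

A real generator would interpolate between grid cells instead of the blocky nearest-cell sampling used here, but the weighting of coarse versus fine octaves is the part that shapes the clouds.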

Another thing was not to aim for a universal solution, but to design a separate setup for each characteristic cloud type. Good reference is a must, of course. Keeping such a system modular helps as well, so that the higher-level assets rely on the properly tuned base elements. The second and third rows are nothing more than different modifications of the base shapes into cirrus through warping. All three top types are then put onto a 3D plane and slightly displaced for a more volumetric feeling.

Clouds in the fourth row are merely 3D bunches of randomized fractus sprites output from the base generator. The effect of shading is achieved through the variance in tones of the individual sprites.

The lowest samples are more stylized experiments in distorting the initial sphere geometry and cloning secondary elements over its surface.

Sunday, September 28, 2014

On Anatomy of CG Cameras

Diagram of the main anatomical elements of a virtual camera
Anatomy of a CG Camera

The following article first appeared in issue 180 and was the first in the series of pieces I've been writing for 3D World magazine for some time now – the later ones should follow at a (very) roughly monthly pace as well. The versions I'm going to be posting here are my initial manuscripts, and they typically differ (like having worse English and sillier pictures) from what makes it to print after editing. Try to enjoy.


Anatomy of a CG camera by Denis Kozlov - page 1

Anatomy of a CG camera by Denis Kozlov - page 2

Anatomy of a CG camera by Denis Kozlov - page 3

Anatomy of a CG camera by Denis Kozlov - page 4

Anatomy of a CG camera by Denis Kozlov - page 5



Wednesday, June 11, 2014

Typography Basics for Artists. Part 2 - Matching the Typeface

Anatomic parts of a glyph according to Wiki
Anatomic parts of a glyph according to Wiki:
1) x-height; 2) ascender line; 3) apex; 4) baseline; 5) ascender; 6) crossbar; 7) stem; 8) serif; 9) leg; 10) bowl; 11) counter; 12) collar; 13) loop; 14) ear; 15) tie; 16) horizontal bar; 17) arm; 18) vertical bar; 19) cap height; 20) descender line.
And here it comes, finally – the second part of typography basics for artists, where we're going to address a very common and practical task: matching a typeface to some pre-existing reference. The first part can be found here, and again, the material of these posts should be considered no more than a starting point for further investigation – a hopefully useful introduction to the boundless world that typography is, aimed at those who do not necessarily inhabit it full-time.

So we have a reference text and want to match its look as closely as possible. First of all, we need something to match it with. Adobe users have access to a great library of typefaces, which is a blessing on a budget, but even with no budget at all there are online collections to browse out there (“download fonts free for commercial use” seems to be a nice search line to start with). The “free for commercial use” part is quite important, as many typefaces are freely available only for personal use – fonts are usually distributed with a license text file which is always worth studying. For that reason in particular, my preferred online collection is Font Squirrel.

As soon as we have a typeface library and a quick way of browsing through it – it only takes looking and comparing to find the closest match. Here are a few things to look at.

1) The sample text. I personally find it most transparent and convenient to use the reference text (or a part of it) itself as a sample line when trying candidate typefaces on. Making sure the test string has some digits and special symbols is a good idea too. Another useful and beautiful tool is the pangram – a phrase containing every letter of an alphabet. Wikipedia offers a quite comprehensive list for numerous languages (including Klingon); some of my favorites for English:

Public junk dwarves quiz mighty fox.
Cozy sphinx waves quart jug of bad milk.
Bored? Craving a pub quiz fix? Why, just come to the Royal Oak!


typographic variants of lowercase "a" grapheme
Image by GearedBull Jim Hood
typographic variants of minuscule "g" grapheme
2) One reason to compare the look of all the characters is that even though the other visual parameters (addressed below) of two typefaces might match quite closely, the same symbol can still be represented with different graphemes, like the alternative versions of a and g shown on the right. Numbers and special characters allow various visual interpretations as well.

3) Identifying the typeface in question within a broad classification as the first step speeds up the comparison considerably, since from then on we can quickly identify and skip the non-relevant styles and focus on closer examination of candidates from the same group only (like Script or Serif).

4) The next level of precision would be considering the contrast (the thickness ratio between the main and supplementary strokes in a typeface) and other proportions of the characters (both overall, like wide or tall letters, and between the elements within each letter, like ascenders, descenders and counters). These qualities play a big part in defining the look of a font, and the habit of thinking of typefaces in terms of their contrast speeds up navigation over the typographic ocean considerably.
 
The contrast of a typeface is the thickness ratio of main and supplementary strokes

5) And then the details. Typography is all about balance in proportion and fine finishing, so what could be considered minor in most other visual arts becomes diverse and intricately nuanced here. Shapes of the serifs, ending elements, connections between strokes – all have space for diversity. Here is a very cool PDF listing the typographic elements. The style of those elements is also subject to fashion, and certain details can attribute a typeface to a particular temporal or stylistic group.

Different versions of serif "T" letter



The next part, whenever it chooses to arrive, is going to cover the basics of display typesetting.

Monday, March 10, 2014

My article on CG cameras in 3D World magazine

It should be out and on the shelves by now. Unfortunately, a few errors sneaked into the printed version of the article. However, the editors promised me to fix those in the digital edition and to put the edited PDF into the online 'Vault', which all print readers have access to when they buy the issue.

3D World Website

A little preview of the article below.