Recently I've been doing some presentations as well as just general sketches of some things I've been thinking about regarding optimization, concurrency and data design. I've been posting them on Twitter to gather feedback from my pals there. A couple have caused a little controversy, but remember that all of them are given in the simple spirit of sharing ideas among peers. And don't forget it's all in good fun!

Three Big Lies

This is a repost of a blog entry I wrote for the Insomniac R&D site (Three Big Lies). It represents what I believe are some of the fundamental problems in the culture of software development in general, and games in particular. Some fundamental truths seem to be forgotten often: for example, that the point of any program is simply to transform data from one form into another, and nothing else. When one "solution" that ignores the real core problems of development is adopted, and others are built on top of it over time, we're left with systems that are over-designed, perform poorly and simply do not accomplish what they were intended to in the first place, and certainly not well. I continue to suggest that we all take a step back from what we're doing and the methods we're using to solve problems, and try to remember what the real issues are.

One of the things we talked about this year at GDC was what we called the "Three Big Lies of Software Development." How much programmers buy into these "lies" has a pretty profound effect on the design (and performance!) of an engine, or any high-performance embedded system for that matter.

Utility: match

Update! I've fixed up all the greater-than and less-than symbols in this entry. It didn't make much sense before. I always forget to change those in the HTML.
I'm just sharing a little utility I use all the time called match.

Usage: ./match [-h] <source_file> <uniq_file>

For each line in <source_file> print the index to the
first matching line in <uniq_file>.

[-h] Print results in 32 bit hexadecimal (default is decimal)

Note: The max line width supported is 4095 characters.
Note: Maximum number of lines supported is (2^32)
If I have a source file of data represented as text (as I often do, because it's easier for me to read binary dumps in a text editor than in a special "hex editor"), I use match to create a table of indices to unique lines (often these correspond to 128 bits, since that's the size of an SPU register).

I commonly use it like so (given a file called "source_file"):
sort source_file | uniq > uniq_file
match source_file uniq_file
Now I have a handy table of indices!
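For readers who want to see the idea behind match, here is a minimal sketch of the core loop in C. It is not the code in match.c: the function names (read_lines, match_files), the one-index-per-line output format and the linear search are assumptions made to keep the example short. A tool meant for very large files would sort or hash the unique lines instead of scanning them.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MATCH_MAX_LINE 4096  /* roughly mirrors the 4095-character limit noted above */

/* Read every line of f into a heap array, stripping the trailing newline. */
static char **read_lines(FILE *f, size_t *count)
{
    char buf[MATCH_MAX_LINE];
    char **lines = NULL;
    size_t n = 0, cap = 0;

    while (fgets(buf, sizeof buf, f)) {
        buf[strcspn(buf, "\n")] = '\0';
        if (n == cap) {
            size_t new_cap = cap ? cap * 2 : 64;
            char **grown = realloc(lines, new_cap * sizeof *lines);
            if (!grown) break;               /* out of memory: keep what we have */
            lines = grown;
            cap = new_cap;
        }
        size_t len = strlen(buf);
        char *copy = malloc(len + 1);
        if (!copy) break;
        memcpy(copy, buf, len + 1);
        lines[n++] = copy;
    }
    *count = n;
    return lines;
}

static void free_lines(char **lines, size_t n)
{
    for (size_t i = 0; i < n; i++) free(lines[i]);
    free(lines);
}

/* For each line of src, print the index of the first identical line in uniq
   (decimal, or 8-digit hex when hex != 0). Returns 0 on success, -1 if some
   source line has no match in uniq. Hypothetical helper, not match.c's API. */
int match_files(FILE *src, FILE *uniq, FILE *out, int hex)
{
    size_t nu = 0, ns = 0;
    char **u = read_lines(uniq, &nu);
    char **s = read_lines(src, &ns);
    int rc = 0;

    for (size_t i = 0; i < ns; i++) {
        size_t j = 0;
        while (j < nu && strcmp(s[i], u[j]) != 0) j++;  /* first match wins */
        if (j == nu) { rc = -1; break; }
        fprintf(out, hex ? "%08lx\n" : "%lu\n", (unsigned long)j);
    }
    free_lines(u, nu);
    free_lines(s, ns);
    return rc;
}
```

Because uniq_file comes from sort | uniq, each source line has exactly one match, so "first matching line" and "the matching line" are the same thing in this workflow.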

Download: match.c

Handy PS3 Linux Framebuffer Utilities

While the documentation within Sony's vsync example should be enough to get you started with writing to the framebuffer, here are a couple of handy functions to test the framebuffer settings, open the virtual terminal and get access to the framebuffer.
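As a rough idea of what "test the settings and get access to the framebuffer" involves, here is a minimal sketch using the generic Linux fbdev interface from <linux/fb.h> (FBIOGET_VSCREENINFO, FBIOGET_FSCREENINFO and mmap). It does not use the PS3-specific ioctls from Sony's vsync example, and it does not switch the virtual terminal into graphics mode. The device path (typically "/dev/fb0") and the names fb_handle, fb_open and fb_close are assumptions for illustration.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/fb.h>

/* Hypothetical handle bundling the fd, the queried settings and the mapping. */
struct fb_handle {
    int fd;
    struct fb_var_screeninfo var;  /* resolution, bits per pixel, ... */
    struct fb_fix_screeninfo fix;  /* line length, size of framebuffer memory */
    void *pixels;
    size_t size;
};

/* Open the framebuffer device, query its settings and map its memory.
   Returns 0 on success, -1 on any failure (nothing is left open). */
int fb_open(const char *path, struct fb_handle *fb)
{
    fb->fd = open(path, O_RDWR);
    if (fb->fd < 0) return -1;

    if (ioctl(fb->fd, FBIOGET_VSCREENINFO, &fb->var) < 0 ||
        ioctl(fb->fd, FBIOGET_FSCREENINFO, &fb->fix) < 0) {
        close(fb->fd);
        return -1;
    }

    fb->size = fb->fix.smem_len;
    fb->pixels = mmap(NULL, fb->size, PROT_READ | PROT_WRITE, MAP_SHARED, fb->fd, 0);
    if (fb->pixels == MAP_FAILED) {
        close(fb->fd);
        return -1;
    }
    return 0;
}

/* Unmap and close a handle opened with fb_open. */
void fb_close(struct fb_handle *fb)
{
    munmap(fb->pixels, fb->size);
    close(fb->fd);
}
```

After a successful fb_open, fb.var.xres, fb.var.yres, fb.var.bits_per_pixel and fb.fix.line_length are the values worth printing to confirm the mode before drawing.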

HowTo: Huge TLB pages on PS3 Linux

Updated! (22 Mar 07) Minor edits. Added notes for YellowDog Linux. Added source code for using huge page allocation.
Updated! (30 Mar 07) A couple minor fixes. Thanks to Guénaël Renault for pointing them out!
Updated! (15 July 07) Added notes for kernel 2.6.21
Guest article: Understanding the TLB and minimizing misses is a critical part of high performance Cell programming. Unfortunately some PS3 kernels do not come with huge page support enabled. Jakub Kurzak and Alfredo Buttari step through the details of recompiling the kernel for huge page support.
The availability of huge TLB pages depends on how the Linux kernel was configured before compilation. The default kernel that ships with Fedora Core 5 (and most likely with any other distribution that provides binary kernel packages) doesn't include this option. So, to get huge TLB pages, it is necessary to reconfigure the kernel, recompile it, and tell the boot loader about the newly created kernel image. Finally, we will also show a way to allocate the huge TLB pages automatically at boot time.

[Mike Acton] This process also works with YellowDog Linux virtually unchanged.
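Once the kernel has huge page support, a common way to use it from a program is to map a file that lives on a mounted hugetlbfs. The sketch below assumes a kernel built with CONFIG_HUGETLBFS and CONFIG_HUGETLB_PAGE, some huge pages reserved (for example with the hugepages= boot parameter or /proc/sys/vm/nr_hugepages), and hugetlbfs mounted at a directory such as /huge (the path and the function names are hypothetical, not the allocation code from the article). The huge page size should be read from the Hugepagesize line of /proc/meminfo; on 64-bit PowerPC it is commonly 16 MB.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round size up to a whole number of huge pages.
   Assumes page is a nonzero power of two (e.g. 16 MB). */
static size_t huge_round(size_t size, size_t page)
{
    return (size + page - 1) & ~(page - 1);
}

/* Map at least `size` bytes backed by huge pages, using a file created in a
   mounted hugetlbfs directory (e.g. "/huge/buf"). Returns NULL on failure. */
void *huge_alloc(const char *path, size_t size, size_t page)
{
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0) return NULL;

    void *p = mmap(NULL, huge_round(size, page), PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);    /* the mapping stays valid after the fd is closed */
    unlink(path); /* drop the name; pages are freed when the mapping goes away */
    return p == MAP_FAILED ? NULL : p;
}
```

If the mmap fails even though the file was created, the usual cause is that no free huge pages are reserved; HugePages_Free in /proc/meminfo shows how many remain.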

Cross-compiling for PS3 Linux

Now that the PS3 is out and multiple Linux-based distributions are available which can be installed using Open Platform [playstation.com], it's time to start developing on some publicly available hardware!

Although the PPU and SPU compilers can be installed and used on the PS3 directly, I find it much more familiar and convenient to cross-compile on my desktop and just ship the resulting executables over to the target (PS3).

In this article, I will detail the basic steps I used to get started building on a host PC and running on the PS3.

atan2 on SPU

On March 3, 2006, an interesting question was asked on the IBM developerWorks Cell Broadband Engine Architecture forum [ibm.com]:
"I am trying to port an application from an older version of SDK to SDK 1.0. It uses atan2(.....) function, which is causing trouble... This code worked fine on SDK28, but now it looks like the new functions dont have this particular function defined..
I did change the makefile to include $(SDKLIB)/libmath.a

I searched in ./sysroot/usr/spu/include/* and src/include/spu/* but couldn't find a headerfile that has it defined.

Can anyone please suggest if I should just change the code to not use that function or is there a way to invoke it still?

Thanks!"

It turned out this function was not available in the SDK.

The following is a branch-free implementation of atan2 for vector floats on the SPU. A scalar version, which simply casts to a vector and back, is also provided. This implementation is fairly quick-and-dirty and no particular level of accuracy is guaranteed, but it should be usable for many purposes.
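To show the structure of a branch-free atan2 without needing SPU intrinsics, here is a portable scalar sketch in plain C. It is not the code in cp_fatan-cbe-spu.c: it uses the approximation atan(z) ≈ (π/4)z + 0.273z(1 − z) on [0, 1] (maximum error roughly 0.004 rad) and then reflects the result into the correct octant. The data-dependent choices are written as conditional selects rather than if/else, mirroring what spu_cmpgt/spu_sel would do in vector code; whether a given compiler emits them without branches is target-dependent, so check the disassembly. fast_atan2f and atan_unit are hypothetical names, and signed zeros, NaNs and infinities are not handled exactly like the C library's atan2f.

```c
#include <float.h>
#include <math.h>

#define CP_PI_F   3.14159265f
#define CP_PI_2_F 1.57079633f
#define CP_PI_4_F 0.78539816f

/* atan(z) for z in [0,1]: pi/4*z + 0.273*z*(1-z), max error about 0.004 rad. */
static inline float atan_unit(float z)
{
    return CP_PI_4_F * z + 0.273f * z * (1.0f - z);
}

/* Approximate atan2(y, x) using only selects, no if/else. */
float fast_atan2f(float y, float x)
{
    float ax = fabsf(x);
    float ay = fabsf(y);
    int   steep = ay > ax;              /* is the angle above 45 degrees? */
    float hi = steep ? ay : ax;         /* larger magnitude */
    float lo = steep ? ax : ay;         /* smaller magnitude */
    float z  = lo / (hi + FLT_MIN);     /* ratio in [0,1]; FLT_MIN avoids 0/0 */
    float a  = atan_unit(z);            /* angle in [0, pi/4] */

    a = steep ? CP_PI_2_F - a : a;      /* reflect about 45 degrees */
    a = (x < 0.0f) ? CP_PI_F - a : a;   /* reflect into the left half-plane */
    return (y < 0.0f) ? -a : a;         /* reflect into the lower half-plane */
}
```

A vector version applies the same steps to four lanes at once, with each select becoming a compare plus spu_sel.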


Or download the source files:
cp_fatan-cbe-spu.h
cp_fatan-cbe-spu.c

Open Source and Console Games

On August 16, 2006 I participated in a panel discussion on Open Source and media as part of Digital Hollywood's Building Blocks 2006 conference.

Here is the description of the panel [from digitalhollywood.com]
The Open Source movement began during the dot.com rise with young companies developing great tools to deliver applications and services across multiple platforms. The consumer's appetite for new content driven experiences has expanded to include ways to view, manage, and share content across devices. With the changing landscape around the home, Open Source promises to power a new generation of applications running over today's high-speed networks and the systems used to create, manage, and distribute that content.

Come join key leaders in the global electronics, online, and media communities to discuss Open Source's definition, and learn how companies will create systems, infrastructure, and applications for the next generation of the Consumer Entertainment Experience.


For those of you who did not attend, I would like to take an opportunity to discuss here my personal opinions on these issues.
Update! (19 July 06) Added Multiply. Fixed a problem with using __builtin_clz().
Update! (17 July 06) The code has been considerably refactored. Decided to go with single function per expression. The expressions have been reduced as a first optimization pass.
Project
The goal of this project is to serve as an example of developing some relatively complex operations completely without branches: a software implementation of half-precision floating point numbers (one that does not use floating point hardware). This example should follow the IEEE 754 standard for floating point numbers as closely as reasonable, including support for +/- INF, QNaN, SNaN, and denormalized numbers. However, exceptions will not be implemented.
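The IEEE 754 half-precision (binary16) layout is 1 sign bit, 5 exponent bits with a bias of 15, and 10 mantissa bits. As a point of reference, here is a deliberately simple, branchy decoder from half bits to a 32-bit float. It is not the branch-free code in half.c and half_to_float is a hypothetical name; it assumes float is IEEE 754 single precision. A readable version like this is handy for checking a branch-free implementation against.

```c
#include <stdint.h>
#include <string.h>

/* Decode IEEE 754 binary16 bits into a float (reference version, uses branches). */
float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {
        /* INF or NaN: max float exponent, NaN payload shifted into place */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {
        /* Normal number: rebias exponent from 15 to 127 */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {
        /* Signed zero */
        bits = sign;
    } else {
        /* Denormal half (mant * 2^-24): normalize into a normal float.
           113 is the float exponent field for 2^-14. */
        uint32_t e = 113u;
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            e--;
        }
        mant &= 0x3FFu;  /* drop the now-implicit leading 1 */
        bits = sign | (e << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);  /* reinterpret bits without aliasing issues */
    return f;
}
```

For example, 0x3C00 decodes to 1.0, 0x7BFF to 65504 (the largest finite half) and 0x0001 to 2^-24 (the smallest denormal).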

Half-precision floats are used where neither the range nor the precision of 32 bit floating point numbers is needed, but some dynamic precision is still required. Two common uses are image transformation, where each component (e.g. red, green, blue, alpha) is typically limited to or near [0.0,1.0], and vertex data (e.g. position, texture coordinates, color values, etc.).

The main advantage of half-precision floats is their size. Beyond the considerable potential for memory savings, processing a large number of half-precision values is more cache-friendly than using 32 bit values.

The current released version (including tests) can be downloaded here: half.c half.h

Increment And Decrement Wrapping Values

Small code, big impact
Occasionally you have a set of values that you want to wrap around as you increment and decrement them. For example, in a GUI menu where the user presses right or left, the selection should wrap around from one end to the other.

A typical implementation:
static inline int wrap_inc( int value, int min, int max )
{
  return ( value == max ) ? min : value + 1;
}

static inline int wrap_dec( int value, int min, int max )
{
  return ( value == min ) ? max : value - 1;
}

But on processors (such as the PowerPC) where compare and branch is very costly, these small one-liners can have a significant impact on performance when used in critical code. They also make it harder for the compiler to optimize the surrounding code.
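One way to remove the branch is to turn the comparison into an all-ones or all-zeros mask and use it to select between the two candidate results. This is a minimal sketch of that general masking technique, not necessarily the version this article arrives at; the _nobranch names are hypothetical. Most compilers can materialize a comparison result as 0 or 1 without branching, but it is worth checking the generated code on the target.

```c
/* Branch-free wrapping increment/decrement using a select mask.
   Assumes min <= value <= max and max < INT_MAX (so value + 1 cannot overflow)
   and min > INT_MIN (so value - 1 cannot overflow). */
static inline int wrap_inc_nobranch(int value, int min, int max)
{
    int at_max = -(value == max);  /* all ones if value == max, else 0 */
    return (min & at_max) | ((value + 1) & ~at_max);
}

static inline int wrap_dec_nobranch(int value, int min, int max)
{
    int at_min = -(value == min);  /* all ones if value == min, else 0 */
    return (max & at_min) | ((value - 1) & ~at_min);
}
```

Both candidate results are always computed and the mask picks one, so the cost is a few integer operations with no branch to mispredict.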