CellPerformance: Forums
Share your opinions and questions related to the Cell Broadband Engine (CBE) processor. (Forums hosted by Beyond3D.com)
CellPerformance is proud to be an Official Partner of the IGDA.
Fast Matrix Multiplication on Cell (SMP) Systems
Daniel Hackenberg wrote to tell me about some matrix multiply code he has written for the Cell.
Cleaning House
I'm working on a plan that will make the forums better and more useful. And hopefully, I can get a little help from some friends.
Handy PS3 Linux Framebuffer Utilities
While the documentation within Sony's vsync example should be enough to get you started with writing to the framebuffer, here are a couple of handy functions to test the framebuffer settings, open the virtual terminal, and get access to the framebuffer.
HowTo: Huge TLB pages on PS3 Linux
Understanding the TLB and minimizing misses is a critical part of high-performance Cell programming. Unfortunately, some PS3 kernels do not come with huge page support enabled. Jakub Kurzak and Alfredo Buttari step through the details of recompiling the kernel for huge page support.
Cross-compiling for PS3 Linux
In this article, I will detail the basic steps I used to get started building on a host PC and running on the PS3.
Unaligned scalar load and store on the SPU
An example of unaligned loads and stores on the SPU. The key to solving this problem is to remember that the SPU has no scalar instruction set and can only access local memory in 16-byte quadwords.
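A portable C sketch of the idea, not the article's code: local store is modeled as a byte array that can only be read or written one whole 16-byte quadword at a time, the way SPU quadword loads and stores work (the low four address bits are ignored). A scalar load fetches the containing quadword and extracts the word; a scalar store must read, modify and write back the entire quadword. The names `qword_t`, `ls_load_qword`, `ls_load_u32` and friends are hypothetical, and the sketch assumes 4-byte-aligned words so a word never straddles two quadwords. Real SPU code would do the extract and merge steps with rotates and shuffles rather than `memcpy`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of SPU local store: only whole 16-byte quadwords move. */
typedef struct { uint8_t b[16]; } qword_t;

/* Quadword load: the low 4 address bits are ignored, as on the SPU. */
static qword_t ls_load_qword(const uint8_t *ls, uint32_t addr)
{
    qword_t q;
    memcpy(q.b, ls + (addr & ~15u), 16);
    return q;
}

/* Quadword store: always writes all 16 bytes. */
static void ls_store_qword(uint8_t *ls, uint32_t addr, const qword_t *q)
{
    memcpy(ls + (addr & ~15u), q->b, 16);
}

/* Scalar load of a 4-byte-aligned word that need not be 16-byte aligned. */
static uint32_t ls_load_u32(const uint8_t *ls, uint32_t addr)
{
    qword_t q = ls_load_qword(ls, addr);
    uint32_t v;
    /* Stands in for rotating the word into the preferred slot. */
    memcpy(&v, q.b + (addr & 15u), sizeof v);
    return v;
}

/* Scalar store: read-modify-write of the containing quadword. */
static void ls_store_u32(uint8_t *ls, uint32_t addr, uint32_t v)
{
    qword_t q = ls_load_qword(ls, addr);          /* read   */
    memcpy(q.b + (addr & 15u), &v, sizeof v);     /* modify */
    ls_store_qword(ls, addr, &q);                 /* write  */
}
```

The read-modify-write in `ls_store_u32` is why scattered scalar stores cost noticeably more than a single quadword store.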
atan2 on SPU
A branch-free implementation of atan2 on vector floats for the SPU.
Branch-free implementation of half-precision (16 bit) floating point
The goal of this project is to serve as an example of developing some relatively complex operations completely without branches: a software implementation of half-precision floating point numbers.
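A small sketch of the branch-free style, assuming the standard IEEE 754 binary16 layout (1 sign bit, 5 exponent bits with bias 15, 10 mantissa bits). It converts half to float using comparison masks instead of `if` statements. To stay short it flushes half denormals to zero, which is a simplification; a complete conversion must also handle denormals. `half_to_float_ftz` and `mask_eq` are hypothetical names, not the project's API.

```c
#include <stdint.h>
#include <string.h>

/* All ones if a == b, else all zeros; no branch needed. */
static uint32_t mask_eq(uint32_t a, uint32_t b)
{
    return (uint32_t)0 - (uint32_t)(a == b);
}

/* Half -> float without branches; half denormals are flushed to zero. */
static float half_to_float_ftz(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;

    uint32_t is_zero = mask_eq(exp, 0);   /* zero or denormal half */
    uint32_t is_max  = mask_eq(exp, 31);  /* infinity or NaN half  */

    uint32_t exp32 = exp + (127 - 15);                  /* rebias normal numbers */
    exp32 &= ~is_zero;                                  /* 0 for zero/denormal   */
    exp32 = (exp32 & ~is_max) | (0xFFu & is_max);       /* 255 for inf/NaN       */
    uint32_t man32 = (man << 13) & ~is_zero;            /* flush denormals       */

    uint32_t bits = sign | (exp32 << 23) | man32;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```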
Better Performance Through Branch Elimination
An introduction to branch penalties: Why it's a good idea to avoid branchy code.
Box Overlap
A look at a function to test for overlap between 3D boxes, and how to optimize it for the CBE.
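The blurb doesn't say which kind of box; assuming axis-aligned boxes stored as min/max corners, a common branch-free formulation tests all three axes and combines the results with bitwise `&` rather than `&&`, so there is no short-circuit evaluation for the compiler to turn into branches. `aabb_t` and `aabb_overlap` are hypothetical names, and boxes that merely touch count as overlapping in this sketch.

```c
#include <stdbool.h>

/* Assumption: axis-aligned box described by its min and max corners. */
typedef struct { float min[3]; float max[3]; } aabb_t;

static bool aabb_overlap(const aabb_t *a, const aabb_t *b)
{
    /* Each comparison yields 0 or 1; '&' evaluates every term, unlike '&&'. */
    int x = (a->min[0] <= b->max[0]) & (b->min[0] <= a->max[0]);
    int y = (a->min[1] <= b->max[1]) & (b->min[1] <= a->max[1]);
    int z = (a->min[2] <= b->max[2]) & (b->min[2] <= a->max[2]);
    return (x & y & z) != 0;
}
```

Whether the compiler actually emits no branches depends on the target and flags; on the SPU the same test would naturally be written with vector compares on all three axes at once.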
A 4x4 Matrix Inverse
A case study on converting scalar code into SIMD code for the PPU and SPU, using the matrix inverse as an example.
Avoiding Microcoded Instructions On The PPU
Executing instructions from microcode can wreak havoc on inner-loop performance. Find out which instructions are microcoded and how to avoid them.
Choosing to Avoid Branches: A Small Altivec Example
An example of why fewer instructions doesn't always mean faster code.
More Techniques for Eliminating Branches
Some additional examples for eliminating integer and floating-point branches.
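A minimal sketch of the mask-and-select pattern behind many of these techniques, in portable C: a comparison result is turned into an all-ones or all-zeros mask, and the mask picks one of two values with AND/OR instead of a branch. The float version applies the same select to the raw bit pattern. All helper names are hypothetical; with AltiVec and on the SPU the same idea maps onto vector compare and select instructions (`vec_sel`, `spu_sel`).

```c
#include <stdint.h>
#include <string.h>

/* All ones when cond is nonzero, all zeros otherwise. */
static inline uint32_t mask_from_bool(int cond)
{
    return (uint32_t)0 - (uint32_t)(cond != 0);
}

/* Picks a where mask bits are 1, b where they are 0. */
static inline uint32_t select_u32(uint32_t mask, uint32_t a, uint32_t b)
{
    return (a & mask) | (b & ~mask);
}

/* Integer minimum without a branch. */
static inline int32_t min_i32(int32_t a, int32_t b)
{
    return (int32_t)select_u32(mask_from_bool(a < b), (uint32_t)a, (uint32_t)b);
}

/* Float select done on the bit patterns, so no float branch is needed. */
static inline float select_f32(int cond, float a, float b)
{
    uint32_t ua, ub, ur;
    float r;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    ur = select_u32(mask_from_bool(cond), ua, ub);
    memcpy(&r, &ur, sizeof r);
    return r;
}
```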
Programming with Branches, Patterns and Tips
GCC follows some straightforward rules that are useful to know when programming with branches.
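The rules themselves aren't listed here. One GCC feature commonly used when programming with branches is `__builtin_expect`, which tells the compiler which way a condition is expected to go so it can keep the common path as the fall-through. This is not necessarily what the article covers; the `LIKELY`/`UNLIKELY` macros and `sum_positive` are hypothetical names.

```c
#include <stddef.h>

/* GCC builtin: the second argument is the expected value of the first. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Sums the positive elements; returns 0 for a NULL array. */
static int sum_positive(const int *v, size_t n)
{
    int sum = 0;
    if (UNLIKELY(v == NULL))      /* rare error path */
        return 0;
    for (size_t i = 0; i < n; ++i) {
        if (LIKELY(v[i] > 0))     /* expected common case */
            sum += v[i];
    }
    return sum;
}
```

The hint only affects code layout and prediction, never the result, so a wrong hint costs speed but not correctness.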