Tuesday, February 10, 2009

AMD64 Function Calling Sequence

Gone are the days when function parameters were passed on the stack. In earlier architectures, the stack frame looked like this:
[local variables]
old ebp
return address
parameter 1
parameter 2
parameter 3
The function parameters were pushed onto the stack starting from the rightmost parameter. This was followed by the return address pushed by the call instruction, and the old ebp register value pushed by the function prolog. Then came the stack space for the local variables.

With the advent of 64-bit architectures such as the AMD-64, this scheme has changed. There are 8 more general purpose registers in the 64-bit architecture (labeled r8 through r15) which can be used for parameter passing. These eight registers are labeled r8d through r15d if only 32 of their 64 bits need to be accessed.

The AMD64 Application Binary Interface (ABI for short) defines detailed rules about how these registers are to be used. We will look at only two important rules:
  1. Registers rbp, rbx, r12, r13, r14, r15 belong to the calling function. Mnemonic: the B registers (rBp, rBx) and r12-r15 belong to the caller.
  2. Integer arguments are passed in the following registers in order: rdi, rsi, rdx, rcx, r8, r9. Integer values are generally returned in the register rax.
Let us see what these rules imply.

Rule number 1. If function bar is called by function foo, then foo expects the values of the specified registers to be unchanged when bar returns. To adhere to this convention, the compiler generates code to preserve the values of these registers in the function prolog of bar. If you look at the disassembly of function bar, then you will see a few push instructions at the beginning; the number of push instructions depends on how many of the caller-owned registers are actually modified in the callee. Correspondingly the function epilog contains instructions to pop the saved values back into these registers before the function returns.

The other registers belong to the called function. Which means that the called function is not required to preserve the other registers and is free to modify their values.

Rule number 2. Notice that upto six integer arguments can be passed in registers. This covers the most common parameter types and eliminates parameter passing on the stack. This speeds up function calls quite a bit since there is no need to do memory reads and writes - just fill up the registers and call the function. Remember that memory access is two or more orders of magnitude slower than register access.

Neat, eh? Note that there are several other details involved that have been skipped in this description. You can refer to the AMD64 ABI for more information.

Monday, February 9, 2009

Magic Squares

A magic square is a square whose cells are filled with numbers such that the sum of each row, column and diagonal is the same. Here's how to create a magic square having three rows. Fill the 9 cells with the numbers 1-9 sequentially. Start with a 1 in the center of the first row (x denotes an as yet unfilled cell):
x1x
xxx
xxx
Now there are two simple rules to follow.
1) Try to move North-East. If there is no available cell there, then wrap around. If we go out of the top of the square, then we wrap downwards; if we go out of the right of the square, then we wrap leftwards. To understand "wrapping" better, imagine the square rolled into a cylinder behind the screen from top down or right to left.
2) If a cell is present to the north-east and if it is already occupied, then enter the next number in the cell directly below the current one. Similar action is taken after filling the cell at the top-right corner of the square.
Using these two simple rules, the cells get filled up like this:
(x1x)---(x1x)---(x1x)---(x1x)
(xxx)---(3xx)---(3xx)---(35x)
(xx2)---(xx2)---(4x2)---(4x2)

(x16)---(x16)---(816)---(816)
(35x)---(357)---(357)---(357)
(4x2)---(4x2)---(4x2)---(492)
The same procedure applies to any magic square having an odd number of rows.
Happy squaring!