Tuesday, February 10, 2009

AMD64 Function Calling Sequence

Gone are the days when function parameters were passed on the stack. In earlier architectures such as 32-bit x86, the stack frame looked like this (lowest address first):
[local variables]
old ebp
return address
parameter 1
parameter 2
parameter 3
The function parameters were pushed onto the stack starting from the rightmost parameter. This was followed by the return address pushed by the call instruction, and the old ebp register value pushed by the function prolog. Then came the stack space for the local variables.

With the advent of 64-bit architectures such as AMD64, this scheme has changed. The 64-bit architecture adds eight general purpose registers (labeled r8 through r15), and with this many registers available it becomes practical to pass parameters in registers. The lower 32 bits of these eight registers can be accessed as r8d through r15d when the full 64 bits are not needed.

The AMD64 Application Binary Interface (ABI for short) defines detailed rules about how these registers are to be used. We will look at only two important rules:
  1. Registers rbp, rbx, r12, r13, r14, r15 belong to the calling function. Mnemonic: the B registers (rBp, rBx) and r12-r15 belong to the caller.
  2. Integer arguments are passed in the following registers in order: rdi, rsi, rdx, rcx, r8, r9. Integer values are generally returned in the register rax.
Let us see what these rules imply.

Rule number 1. If function bar is called by function foo, then foo expects the values of the specified registers to be unchanged when bar returns. To adhere to this convention, the compiler generates code to preserve the values of these registers in the function prolog of bar. If you look at the disassembly of function bar, then you will see a few push instructions at the beginning; the number of push instructions depends on how many of the caller-owned registers are actually modified in the callee. Correspondingly the function epilog contains instructions to pop the saved values back into these registers before the function returns.
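
As a rough illustration of rule 1, here is a minimal sketch in C using the foo and bar of the paragraph above. The function bodies are made up for this example, and the noinline attribute is a GCC/Clang extension used only to force a real call. What the compiler actually emits depends on the compiler and the optimization level, so treat the comments as a description of typical output, not a guarantee.

```c
/* Hypothetical callee. noinline (a GCC/Clang extension) keeps the
   compiler from folding bar() into foo(), so a real call is emitted. */
__attribute__((noinline)) long bar(long x)
{
    return x * 3 + 1;
}

/* Hypothetical caller. 'keep' is live across the call to bar(), so it
   cannot stay in a register that bar() is free to overwrite. A compiler
   will typically either place it in a caller-owned register such as rbx
   (saving the old rbx with a push in foo's own prolog and restoring it
   with a pop in the epilog), or spill it to foo's stack frame. */
long foo(long a)
{
    long keep = a + 10;   /* must survive the call below */
    long r = bar(a);      /* bar may modify rax, rcx, rdx, rsi, rdi, r8-r11 */
    return keep + r;
}
```

Compiling this with something like gcc -O2 -S and reading the generated assembly is a quick way to see which registers get pushed and popped on a given toolchain.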

The other registers belong to the called function, which means that the called function is not required to preserve them and is free to modify their values.

Rule number 2. Notice that up to six integer arguments can be passed in registers. This covers the most common parameter types and, for such calls, eliminates parameter passing on the stack. This speeds up function calls quite a bit since there is no need to do memory reads and writes: just fill up the registers and call the function. Remember that an access that goes all the way to main memory can be two or more orders of magnitude slower than register access.
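
Rule 2 can be sketched in C as follows. The function names sum6 and sum8 are invented for this example, and the register assignments in the comments assume the System V flavor of the AMD64 ABI (the one used on Linux and other Unix-like systems); other platforms use different registers.

```c
#include <stdint.h>

/* Hypothetical six-argument function. Under the assumed ABI the
   arguments arrive as:
     a -> rdi, b -> rsi, c -> rdx, d -> rcx, e -> r8, f -> r9
   and the result is returned in rax. No argument touches the stack. */
int64_t sum6(int64_t a, int64_t b, int64_t c,
             int64_t d, int64_t e, int64_t f)
{
    return a + b + c + d + e + f;
}

/* Hypothetical eight-argument function. The first six arguments still
   travel in the six registers listed above; g and h do not fit and are
   passed on the stack by the caller. */
int64_t sum8(int64_t a, int64_t b, int64_t c, int64_t d,
             int64_t e, int64_t f, int64_t g, int64_t h)
{
    return a + b + c + d + e + f + g + h;
}
```

A call such as sum6(1, 2, 3, 4, 5, 6) therefore needs only register loads before the call instruction, while sum8 additionally needs two stores to the stack.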

Neat, eh? Note that there are several other details involved that have been skipped in this description. You can refer to the AMD64 ABI for more information.

11 comments:

  1. I am from a networking domain... Not in touch with assembly level programming concepts and processor specifications..
    This feature of amd64 looks amazing.. But the number of integer registers also seems to have doubled.. Hence they are using them for all these cool things to make function calls faster! :D
    6 integer arguments can be passed because 6 registers are used, right? But each register is a 64-bit register.. maybe we would not want to pass a 64-bit integer. Can this be used to raise the number from 6 to 12 ;) ?
    Just wondering.. I really don't know if it even makes sense to you.. as I am not used to all this much :D
    Would love to see you write a blog on networking advancements :D
    But my blogging is restricted to poems, stories and entertainment stuff :)

  2. Yes, processors are becoming more powerful by the day. From large L1 and L2 caches and even larger L3 caches to complex instruction sets and large register files, CPUs have become very sophisticated. The trend everywhere seems to be to throw more and more hardware at the problem to speed things up.

    I'm not sure if I got your question right, but if you are referring to 6 to 12 *bits*, then yes, that's possible. The high order bits of the register would be zero, that's all.

    Future processors will all be 64-bit. Main memory size is increasing, and 32-bit addresses are no longer sufficient. Even now, there are special features such as Physical Address Extensions (PAE) to enable 32-bit processors to access more memory. Native 64-bit processors don't require those tweaks. Another advantage of these gizmos is the wide 64-bit data path. This means more data can be transferred from memory to CPU (and vice versa) in one cycle.

    I haven't even begun to touch the amazing features of modern computers. For example, there are powerful instruction sets (MMX, SSE, SSE2) that support high-end games and other cpu-hungry applications.

    I'll keep your point about networking advancements in mind. Maybe you can suggest some topics that would be of general interest...

  3. Ya. Super if everything becomes based on 64-bit processors... as you said!
    Ya, you got one half of my question right.. but I meant.. can we increase the number of parameters we pass from 6 to 12?
    Regarding the networking topics..
    Network Virtualization... I don't know much about it..
    Data Center Networking facts!
    etc.. I suppose you would be in touch with all of these!

  4. Also, I feel good to see that the interface is named ABI :D

  5. If you have more than 6 integral parameters to pass, then the first six can be passed using the six registers. The remaining ones have to be passed on the stack like before.

    As an aside, if you ask a software engineering purist, (s)he may reply that if you are passing six parameters to a function, then you should probably break the function down into smaller functions. Another possibility could be to pack the parameters in a structure (if they are all related) and pass a pointer to the structure. Yet another way, which is probably not very desirable, is to use global variables.

    It may be interesting to know that none of the Unix system calls take more than six parameters!
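
The structure-pointer idea mentioned in this reply might look like the sketch below. The struct box_params and the function total_volume are invented purely for illustration. Only one pointer is passed (in rdi under the System V AMD64 ABI), at the cost of the callee reading the fields from memory.

```c
/* Hypothetical parameter block: seven related values that would
   otherwise need seven separate arguments. */
struct box_params {
    long width, height, depth;   /* inner dimensions */
    long pad_x, pad_y, pad_z;    /* padding added on each side */
    long count;                  /* number of identical boxes */
};

/* The callee receives a single pointer argument and loads the fields
   it needs through that pointer. */
long total_volume(const struct box_params *p)
{
    long w = p->width  + 2 * p->pad_x;
    long h = p->height + 2 * p->pad_y;
    long d = p->depth  + 2 * p->pad_z;
    return w * h * d * p->count;
}
```

A caller fills in a struct box_params once and passes its address, which also keeps the call site readable when the parameter list grows.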

  6. Narasimha, it's nice to see you writing about function call stacks. It looks elegant at first glance, but here are some catches.
    - In real-world applications, function call depth goes at least 5-6 levels down. So, reuse of the registers is unavoidable. In those cases, I believe, it pushes the old register values. Won't this effort cancel the benefit of not pushing the parameters in the beginning?
    - It may sound like accessing variables in memory is expensive compared to registers. But I think when a function is called, most of its activation record (on the stack) is in cache (mostly in L1). So, the expense may not be as bad as a difference of 2-3 orders of magnitude.
    - We could have used these registers for other purposes which could have increased performance. So, we should consider that compromise too.

    Cheers,
    Channa

  7. Splitting into different functions will again involve stack pushes and pops.. not desirable just to avoid the memory reads/writes of 6 extra parameters..!

    Rather, we left shift the second parameter by 32 bits and OR it with the first parameter and copy the result to one register.. :D Now we can have 2 parameters being passed in 1 register :P !!! Isn't it beautiful :D ?

  8. Thanks for the comments, guys. These are what keep a blog alive.

    @Channa:
    Let me try to express my thoughts on the three points that you have made:
    1) Only caller-owned registers are pushed and popped by the callee. Further, out of these registers, only the ones that are actually modified by the callee are pushed and popped. I would expect the compiler to try to minimize using the caller-owned registers as much as possible in the callee, unless of course there is register pressure. If you exclude the six caller-owned registers, that leaves 10 more registers available for the callee. This may cover most small functions.

    2) I think you have a point here. The difference in accessing the parameters may not be as high as 2 or 3 orders of magnitude in practice.

    3) To access a parameter, it has to be brought into a register finally. Unless each parameter is accessed in code that is far apart, we would need more or less as many registers as there are parameters anyway.

    The general statement about registers is true. However, 16 general purpose registers is a LOT, unless you have very big functions with lots of local variables whose live ranges span large blocks of code.

    The general point I want to make is the following. Every architecture provides certain beneficial features that can help performance. However, these must be utilized by software too. Understanding the processor architecture while writing programs helps a great deal.

    @Abhishek:
    Good design always comes first. Performance tuning comes later. I have seen several instances where one function does several logical tasks; this requires passing several parameters. Too often this ends up in difficult-to-detect bugs.

    Shifting and OR'ing is fine as long as one makes sure that the code is understandable!
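
For readers who want to see the shifting and OR'ing spelled out, here is a minimal sketch in C. The helper names pack_pair, pair_lo and pair_hi are made up for this example. Keeping the packing behind small named helpers is one way to keep such code understandable.

```c
#include <stdint.h>

/* Pack two 32-bit values into one 64-bit value: 'hi' goes into the
   upper 32 bits and 'lo' into the lower 32 bits. The cast to uint64_t
   must happen before the shift, otherwise the shift would be done in
   32 bits. */
static inline uint64_t pack_pair(uint32_t lo, uint32_t hi)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Recover the lower 32 bits. */
static inline uint32_t pair_lo(uint64_t packed)
{
    return (uint32_t)packed;
}

/* Recover the upper 32 bits. */
static inline uint32_t pair_hi(uint64_t packed)
{
    return (uint32_t)(packed >> 32);
}
```

A function taking one uint64_t built by pack_pair receives both values in a single register, and unpacks them with pair_lo and pair_hi.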

  9. I meant, can that "Shifting and OR'ing " be done INTERNALLY.. so that code remains unchanged!

  10. Was there a vegetarian hotel there?

  11. Hi Srikrishna, maybe you wanted to comment on my Shillong post. I'll post a reply there itself.
