C -> ASM
- "Programming" means "writing C code".
- On Linux-like (UNIX, *nix, POSIX) systems, the operating system
API is primarily exposed to C programs through the system C library.
- The hardware doesn't run C though - it runs amd64 machine code (on your
laptop) or ARM machine code (on your phone) or maybe some other machine
code.
- Machine code is for machines, not humans, so it's hard to read.
- Machine code is a series of instructions. If you write the instructions
down as text, you get assembly language.
- To run a C program, you need to translate to machine code (or "binary").
- Conceptually, and historically, you first translate C -> ASM, then
ASM -> binary.
- You can still do this if you explicitly ask for it.
Note: For the first few homeworks you will be writing ASM programs. You
should not have a compiler do this for you. Submitting compiler output
for an assembly assignment is worth zero points.
Example:
// A C program is a collection of functions.
// Here's a minimal program with one function
int
main(int argc, char* argv[])
{
printf("Hello C program\n");
return 0;
}
# Direct C => binary
$ gcc -o hello hello.c
$ ./hello
# C => asm
$ gcc -S -o hello.s hello.c
# take a look at hello.s
# asm => binary
$ gcc -o hello hello.s
$ ./hello
Interesting stuff in hello.s:
- The string is there, but no newline.
- The main function exists
- Starts at label "main"
- Ends at "ret".
- Declared ".globl"
- In the main function another function is called - not printf, but puts.
- The optimizer got to us.
Let's tell it to be less clever:
# C => asm
$ gcc -fno-builtin -S -o hello.s hello.c
# take a look at hello.s
- Now the string has a newline.
- And the function called is "printf".
How about with two functions:
// add1.c
long
add1(long x)
{
return x + 1;
}
int
main(int _ac, char* _av[])
// initial _ marks args as not used
{
long x = add1(5);
printf("%ld\n", x);
return 0;
}
# C => asm
$ gcc -S -o add1.s add1.c
# take a look at hello.s
- Two functions: add1, main
- each starts at label, ends at "ret"
- In main, the value 5 is moved to "%rdi"
- That must be where the function's first argument goes.
- No, that's "%edi"
- I said "%rdi", wait a second...
- Then add1 is called
- In add1, the value from %rdi goes to some places.
- Eventually, "addq $1, ..." happens to it.
- Back in main, %rax is moved to %rsi, and printf is called.
This almost makes sense, but it's a bit of a mess. Let's figure it out.