Welcome to 3650
- Instructor: Nat Tuck
- Course: CS3650 - Computer Systems
Where does this course fit in?
- You're a CS major, or maybe from COE.
- You can write computer programs.
- In this course, we explore some of the details of how
actual programs run on concrete computers.
- To do things, programs need to use hardware resources.
- 1980 personal computer: one program at a time.
- Two programs at a time means conflicts (who gets input from
keyboard? don't want to mix output to line printer!)
- Add a dedicated program to talk to the hardware: the OS. Other
programs ask the OS to access shared resources for them.
- To ask the OS to do stuff for you, you make a system call.
- This class is about writing programs that use system calls.
- System calls are different on different operating systems,
so we need to pick a specific one to use.
- We're using Linux. More specifically, Debian 10.
- Even with an OS, programs are still written to target a specific kind of processor.
- Compiled programs are binary data - machine code - and different
kinds of processors have different machine codes.
- We'll be using the usual architecture for desktop and laptop computers,
the AMD64 architecture.
- A platform is the combination of processor architecture and OS,
for us that's AMD64 Linux.
- My site: http://khoury.neu.edu/~ntuck
- Course Site / Syllabus
- If you get stuck, you can ask questions here.
- You shouldn't generally post code.
- Not for direct messages to course staff: use email for email.
- AMD64 hints links
- Office Hours start Monday.
- Show Bottlenose
- Show HW01 A and B.
- Explain HW01a
- Delay HW01b
- There's a schedule. It may resemble what happens.
- Grades: Assignments.
- Homework: These are difficult programming assignments.
- Challenges: These are very difficult programming assignments that
you are not expected to get 100% on.
Copying code without clear, written attribution is plagiarism.
If you submit plagiarized work, you fail the course.
You're not allowed to share solution code with other students either.
If you cheat, you get reported to the college, which is bad.
You are given starter code for assignments, you can use that.
There is code shown in lecture. It's not starter code, so using
it without attribution is plagiarism. This is the one case where I
might be lenient on the policy, but I also may just give you an
F for the semester on the first offense.
The best way to avoid cheating (and the best way to learn the content
in this course) is to personally type your own code. Don't download
other people's solutions, don't copy and paste other people's code, etc.
C -> ASM
- "Programming" means "writing C code".
- On Linux-like (UNIX, *nix, POSIX) systems, the operating system
API is primarily exposed to C programs through the system C library.
- The hardware doesn't run C though - it runs amd64 machine code (on your
laptop) or ARM machine code (on your phone) or maybe some other machine code.
- Machine code is for machines, not humans, so it's hard to read.
- Machine code is a series of instructions. If you write the instructions
down as text, you get assembly language.
- To run a C program, you need to translate to machine code (or "binary").
- Conceptually, and historically, you first translate C -> ASM, then
ASM -> binary.
- You can still do this if you explicitly ask for it.
Note: For the first few homeworks you will be writing ASM programs. You
should not have a compiler do this for you. Submitting compiler output
for an assembly assignment is worth zero points.
// A C program is a collection of functions.
// Here's a minimal program with one function.
#include <stdio.h>
int main(int argc, char* argv[]) {
    printf("Hello C program\n");
    return 0;
}
# Direct C => binary
$ gcc -o hello hello.c
# C => asm
$ gcc -S -o hello.s hello.c
# take a look at hello.s
# asm => binary
$ gcc -o hello hello.s
Interesting stuff in hello.s:
- The string is there, but no newline.
- The main function exists
- Starts at label "main"
- Ends at "ret".
- Declared ".globl"
- In the main function another function is called - not printf, but puts.
- GCC treats printf as a built-in and swapped in the simpler puts (this happens even without optimization flags).
Let's tell it to be less clever:
# C => asm
$ gcc -fno-builtin -S -o hello.s hello.c
# take a look at hello.s
- Now the string has a newline.
- And the function called is "printf".
How about with two functions:
#include <stdio.h>
long add1(long x) { return x + 1; }
int main(int _ac, char* _av[]) {
    // initial _ marks args as not used
    long x = add1(5);
    printf("%ld\n", x);
    return 0;
}
# C => asm
$ gcc -S -o add1.s add1.c
# take a look at add1.s
- Two functions: add1, main
- each starts at label, ends at "ret"
- In main, the value 5 is moved to "%rdi"
- That must be where the function's first argument goes.
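- Actually, the compiler wrote "%edi", the low 32 bits of "%rdi".
- Writing a 32-bit register clears the upper 32 bits, so this still sets all of %rdi to 5.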
- Then add1 is called
- In add1, the value from %rdi gets copied around (without optimization, onto the stack and then into %rax).
- Eventually, "addq $1, ..." happens to it.
- Back in main, %rax is moved to %rsi, and printf is called.
This almost makes sense, but it's a bit of a mess. Let's figure it out.
AMD64: ISA and ASM
Intel released the 8086 processor in 1978. It was based on the earlier 8008
processor from 1972, but...
The 8086 was a 16-bit microprocessor. That means:
- It had a 16-bit data bus connecting it to memory and maybe other stuff.
- That means a processor and RAM connected by 16 wires.
- How much RAM can we address with 16 bits? (The real 8086 combined a segment register with a 16-bit offset to get 20-bit addresses.)
- In addition to RAM, this system gives us another place to put stuff called
registers. For a 16-bit processor, each register is 16 bits.
- The 8086 had 9-ish registers:
- "general purpose": ax, cx, dx, bx, si, di, bp, sp
- "special purpose": ip, (segment registers, status register)
- What processors do is execute instructions. Kinds of instructions:
- Arithmetic: Example: add $5, %cx
- Test: cmp $5, %cx
- Conditional branch: jge bigger_label
- Movement instruction: mov (%bx), %dx
- A bunch of other stuff. You'll want to have a reference sheet.
- Instructions tend to operate on at least one register.
- Instructions can operate on memory addresses. If they do, the CPU needs
to stop and read or write from RAM.
The 80386 or i386 was a 32-bit microprocessor, backwards compatible with
the 8086. This was the first 32-bit x86 processor (the "i386" or IA-32 architecture):
- It had a 32-bit data bus.
- How much RAM can we address with 32 bits?
- It had 32-bit registers.
- If you used the old names (e.g. %ax), you got the least significant
16 bits of the register.
- Each register got a new name with an "e" at the front to refer to
the full 32-bit "extended" register: %eax, %ecx, %edx, %ebx, %esi, %edi, %ebp, %esp.
The AMD Athlon 64 was a 64-bit microprocessor, backwards compatible with the Intel
8086 and i386. It was one of the first "AMD64" processors:
- It used 48-bit virtual addresses, designed to be extended up to 64 bits later.
- How much RAM can we address with 64 bits?
- How about 48 bits?
- It had 64-bit registers.
- If you used the old names (e.g. %ax, %eax), you got the least significant
16 or 32 bits of the register.
- Each register got a new name with an "r" at the front to refer to
the full 64-bit register.
- 8 new general purpose registers were added: %r8, %r9, ..., %r15
And that's where we are today. Let's write an add2 program by hand in amd64 assembly:
    .global main
    .text

# long add2(long x)
# - the argument comes in in %rdi
# - we return the result by putting it in %rax
add2:
    enter $0, $0
    # long y = x;
    mov %rdi, %rax
    # y = y + 2;
    add $2, %rax
    # return y;
    leave
    ret

main:
    enter $0, $0
    # long x = 5;
    mov $5, %rdi
    # y = add2(x)
    # result in %rax
    call add2
    # printf("%ld\n", y)
    # - first arg goes in %rdi
    # - second arg goes in %rsi
    # - for a variable arg function, we need to zero %al
    # - %al is the bottom 8 bits of %ax/%eax/%rax
    mov $long_fmt, %rdi
    mov %rax, %rsi
    mov $0, %al
    call printf
    # return 0;
    mov $0, %rax
    leave
    ret

    .data
long_fmt: .string "%ld\n"
To compile this simple hand-written assembly, we use:
$ gcc -no-pie -o add2 add2.s
A local Linux VM:
- The easiest way to do programming work is to have the development
environment installed locally on your personal computer.
- For Linux systems programming, Linux is our development environment.
- Having it installed as your main OS is probably best.
- But, for consistency, the assignment is for everyone to install
exactly Debian 10 64-bit in a VirtualBox virtual machine.
- If you aren't developing on the VM and you run into weird problems later in
the semester, use this VM to rule out configuration issues.
The CCIS server:
This is a shared Linux server.
This is a generally useful tool, and it will be possible
to do some of your homework on this server.
Working directly on a remote server is a good reason to learn
a command line editor like vim.
Show Putty, WinSCP web page
Show ssh and scp on Linux
Show Cyberduck page
HW01b - First Programming HW
- Download starter code.
- Write some simple C and ASM code.
- Make sure it compiles and runs.
- Pack it back up and submit.
This assignment is mostly about structure, process, and getting annoyed
at the autograder.
Keep in mind:
- A C (or asm) program is a collection of functions.
- These functions can be in one source file or in a bunch of different files.
- C functions and ASM functions are the same thing. You can mix them together
in the same program.
- It's easiest if each file is all-C or all-ASM.
Object file example:
$ gcc -c -o add1.o add1.c
$ gcc -c -o add2.o add2.c
$ gcc -c -o main.o main.c
$ gcc -o example add1.o add2.o main.o
Another Assembly Example
- Scan through the AMD64 instruction list on course site.
- Example: cond_br
$ gcc -no-pie -o prog prog.s