Fo CA PART 1 x86 Assembler Programming with C++ & Visual Studio booklet 2018 PDF

Title Fo CA PART 1 x86 Assembler Programming with C++ & Visual Studio booklet 2018
Course fundamentals of computer architecture
Institution Sheffield Hallam University
Pages 26

Fundamentals of Computer Architecture

x86 Assembly-Level Programming & Debugging with C++ & Visual Studio

Part 1 Student name:

Course: Group:

• This document contains the practical exercises for the coming weeks; please take care to read it and practise the procedures introduced. There is no substitute for practice, so continue to experiment with the techniques outside of the practical session.
• Feel free to annotate these with your own notes, and let me know of any errors you find.
• Bring this along to each week's session and work through the exercises at your own pace. Please fill in the boxes above in case you lose it or leave it lying around.

There are Tasks to do, Questions to think about and answer, and things to take Note of.

As always, if you miss a session through pestilence or disease there are several other sessions during the week - you are welcome to use one of those to catch up or to have more time to work on the material - just ask, or look on Blackboard for times of other sessions.

Dr Adrian Oram, [email protected]. Sheffield Hallam University, 2017


Section 1

Visual Studio and x86 Assembler

What & Why

These tutorials are about starting to make you familiar with assembly-level languages. Assembly is a language for programming a computer. The difference between assembly languages (such as the so-called x86 assembler on the Core 2 Duo and similar classes of processor) and high level languages (HLLs) such as C++ or Java is how close you are to the native machine language (as executed by the actual hardware) of the processor. Assembly is a low level language (LLL) and each instruction or statement you write in the language corresponds one-to-one with an instruction (or machine code) that the processor can actually execute. This contrasts with HLLs, which have to be put through a complicated process of compilation and linking to get them into machine code format. There isn't a CPU in existence that can execute C++ statements directly, for example. If you look at the machine code a compiler produces it often seems to bear little relationship to the HLL statements you originally typed (but to an experienced eye, the relationship becomes more obvious).

Programs written in assembly are quickly and easily translated to machine code by a simple assembler (comparable in function to a compiler, but less complex) - for example, the assembler for the Pentium and equivalent class of processors produced by Microsoft is called MASM (no prizes for originality there, then). Because we will be writing programs in a language 'nearer' to the machine's, you will find it is more cryptic and difficult to use (the so-called Semantic Gap) - and this is why HLLs were developed! However, the advantages of knowing assembly and being able to use it as a programming tool - especially in demanding environments such as games programming and real-time systems (e.g. aircraft control) - are manifold, for example:

• being able to optimise programs for speed and/or memory usage by fiddling with the code generated by a HLL compiler,
• developing low level, efficient sections of code, such as device drivers, that need access to hardware registers and the like,
• serious debugging of subtle run-time bugs,
• when there isn't a compiler available for a specialist or dedicated piece of hardware or processor (assemblers are easy to develop compared to a compiler for a HLL).

Learning to program in assembly also increases your understanding of the hardware features that are normally hidden from you by a HLL - registers, stacks, memory usage, cache memories, input-output device control, and so on. It is also a useful skill to put on your programming CV (few programmers are skilled in assembly language!)

The Programming Environment

To make life easier we will use the facilities available to us in the Visual Studio (VS) package to develop assembly programs - C++ and other high level languages allow us to write sections of assembly language code within the framework of a high level language. We shall be using the x86 assembly language as this is the ISA (Instruction Set Architecture) for the vast majority of PCs in use. The same ISA is used by AMD and Intel even though they are competitors in the processor marketplace. Other ISAs exist, of course, such as those used by IBM's PowerPC range, MIPS processors and ARM devices, and code written in x86 cannot be run by non-x86 processors such as these. However, there is a great deal of similarity in the types of instructions each of these separate ISAs provides, and the assembly language programming skills you will acquire are transferable.

Note: MASM is available for your use through the Academic Alliance scheme (DreamSpark), should you wish to experiment with it.

Getting Started

You should be familiar with the VS environment from your other programming modules. Note - in what follows many of the screenshots originate from earlier versions of VS, but the differences are generally cosmetic and the same functionality is available in VS2017, so you should easily become familiar with the more recent version.

Task

Start VS and create a new Empty Project called "FoCA first x86" (or similar).

Add a new C++ source file called Hello World.cpp, into which you can paste the following:

    // FoCA Hello World in C++
    #include <stdio.h>

    int main(void)
    {
        printf ( "Hello World!\n" );
        return 0;
    }

Build it as usual using 'Debug' mode, not 'Release'*. You will probably be familiar with this commonly-used introductory program, but make sure it works. Press F5 to build and run it.

*Note: Release mode is used when you are confident that your code is working properly and is ready to be used by, or sold to, clients. We're using Debug mode as in this configuration the compiler generates assembly code that is potentially readable and debuggable. In Release mode the code is obfuscated by the compiler as it attempts optimisation and reformats your original program in the quest for better performance or code density.

To see the output before it disappears, place a Breakpoint by clicking in the left margin of the source next to the return 0 statement - you should get a red dot as shown above (you can set more than one breakpoint too, and just about anywhere in your programs).

When you re-run the program now it will pause at this point, allowing you to click on the black Console window in the bottom task bar to see the 'Hello World!' string.

Note: if you wish to alter the Console screen display, right-click on the top bar and select Properties. Change the screen to suit yourself -- below the Font has been changed to Lucida Console and made much larger, Size 28:

A yellow arrow will appear on top of the red Breakpoint dot; this arrow indicates the next instruction to be executed - remember this! (For those who know, the yellow arrow depicts the Instruction Pointer, specifically, the EIP register.) Breakpoints are easily removed by clicking on them again. Right-clicking on a breakpoint gives you more options to control these features; take a look at them.

Press the Continue button or F5 to continue running the program - it will terminate.

What has this C++ program got to do with assembly?

As mentioned, all programs are converted to machine code (VS calls these 'code bytes') and we can use the debugger facility of VS to look at this. A listing follows that shows what the Hello World program looks like in x86 machine code, byte by byte (in red), with, on the left, the actual 32-bit addresses in memory where the codes are stored - notice that all numbers are displayed in hexadecimal!

_main:
00E31270  68 8C 31 E3 00
00E31275  FF 15 B4 30 E3 00
00E3127B  83 C4 04
00E3127E  33 C0
00E31280  C3
--- No source file ----------------------------------------------

As you can see, looking at machine code bytes (which are actually just binary numbers!) could be more than a little tedious, but in common with many IDEs the debugger comes with a disassembler that converts machine code back to assembly language so that we can read it (try doing that with a HLL: a 'dis-compiler' would be very hard to develop!). Using the disassembly feature the Hello World code can be viewed like this, except this time it's showing x86 assembly language instructions (again, in red):

_main:
00E31270  68 8C 31 E3 00     push        0E3318Ch
00E31275  FF 15 B4 30 E3 00  call        dword ptr ds:[00E330B4h]
00E3127B  83 C4 04           add         esp,4
00E3127E  33 C0              xor         eax,eax
00E31280  C3                 ret
--- No source file ---------------------------------------------

To see this for yourself, re-start your program and check it has stopped at your breakpoint (as per the previous page), then right-click in the main window, and choose Go To Disassembly from the pop-up menu. Right-clicking in this new disassembly window gives you options to turn on or off various attributes, such as Show Code Bytes, and Show Source Code; try them. Be aware that the addresses and list of instructions you see on your PC will differ from those shown here – compilers produce different code bytes depending on local settings and other options you can change or are defaulted to by VS. This disassembly display clearly illustrates the one-to-one relationship between machine code bytes and assembly instructions. For example, for the 'add' instruction:

    83 C4 04    add esp,4

• the 83 means do an 'add' operation,
• the C4 means use the register called 'ESP' to add something to, and…
• the 04 means, well, 4! This is the something, or immediate value, that is to be added to the ESP register's value.

ESP and the 04 are called the add operation's operands. Every combination of operation plus operands has a unique set of code bytes, and it is this that distinguishes one CPU's ISA from another. An ADD operation in an ARM processor, for example, will have very different code bytes - this is why your program will not run on an Intel based PC one moment and an ARM based mobile device the next - it needs to be recompiled for the different target ISA each time.


Usually we don't bother looking at the code bytes, so we'll only concern ourselves with the assembly language listing part. In fact we can also view the original C++ source that the compiler has used to generate the assembly instructions, and thus tie them together – use right-click, Show Source Code to see the C++ (shown in red below, it’s black on the screen) and x86 assembly together:

    printf ( "Hello World!\n" );
00E31270  push        0E3318Ch
00E31275  call        dword ptr ds:[00E330B4h]
00E3127B  add         esp,4
    return 0;
00E3127E  xor         eax,eax
}
00E31280  ret
--- No source file -----------------------------------------------

The assembly instructions following each C++ statement are the actual instructions that do the work. Notice that no C++ statements are being stored in memory here (no memory addresses are showing alongside those)! Assembly instruction names such as 'add' and 'call' are often referred to as 'mnemonics'. A list of those mnemonics you will see most frequently is given on page 23. Inspecting the assembly code produced by simple programs like the Hello World program is a good way of becoming familiar with the type of instructions available in x86 assembly programming (or any other ISA, in fact).

An assembly program is a sequence of statements just like you may write in C++ except that the instructions are simpler, do well-defined tasks that relate to the instruction name (e.g. add, mov, call), require a knowledge of the register names available in the CPU (e.g. EAX, ESP) and their roles, and require operands on which to operate. It's the operands that can cause the most difficulty as there is such a variety of combinations possible.

Note: writing programs exclusively in x86 assembly is rare these days, and requires very good programming skills as it is error prone and distinctly user-unfriendly. However, learning assembly language gives valuable insights into how your C++ or other language code actually works, AND can often be used to improve performance or to access CPU instructions that even the compiler doesn't know about! Occasionally, specialist applications, such as embedded control systems, must be written in assembly language to gain the advantages offered by particular processor families or variants.


The Visual Studio C++ run-time debugger environment (pressing F11)

Task

Instead of simply running the Hello World program, select Step Into from the Debug menu bar (or use the F11 key), as shown below. Note: if the debug menu bar doesn't appear, right-click in the menu bar and choose Debug from the pop-up menu, as shown.

This starts the single stepping debugger, which means that your program runs only when you press buttons - it's currently paused. A small yellow arrow should appear in the left-hand margin - this indicates the next line of your program to be executed when you single step using F11. Note: single stepping through C++ or assembly programs is extremely useful as a means of debugging faulty code, hence the names of the Debug menu, etc!

Once started the debugger gives you many options - either use the main Debug menu or right click in the main window to explore others. Depending on your local settings the debug menu bar should appear - you can drag this onto the menu bars at the top to get it out of the main screen (in the previous screenshot it is in the top-right corner). You can tailor this by adding a few extra buttons. Click the small down arrow indicated on the screenshot below and select "Add or Remove Buttons". Click on the green 'Continue' arrow icon, and the 'Windows' options, and then click outside of the pop up area. The extra buttons should appear in the bar. (You can do this kind of tailoring with the other menus too.) You will use F5 (Continue), F11 (Step Into, also known as Single Step) and the Windows buttons most often, but notice the others too.


Use this button to access other debug windows, later.


Tasks

Use the Windows button on the bar to display a Registers window and a Memory1 window, and re-size these so that they look like those shown below. Remember that registers are the smallest but fastest form of memory in a processor (built from D-type flip-flops). The memory displayed is not part of the processor, but is external to it - RAM.

The memory pane shows the 32-bit memory addresses on the left as 8 hex digits, followed by the data contents at those addresses, byte by byte (hence the 2 hex digits per memory cell). Here they are arranged as 4 columns - yours may show a different number, so enter 4 for the number of Columns to match the screenshot. This is sensible as x86 generally works with 32-bit values (four bytes) - this is represented by the 8 hexadecimal digits in four pairs, each representing a byte. You could use 8 columns, or any other width of course; right-click in the memory window for other options, and experiment with them (see opposite). Note that signed and unsigned, as well as integer and floating point numbers are displayed differently, as you might expect! The ASCII text in the right of the memory display can be turned off, but leave it here as we'll look for our 'Hello World' string in this memory window (as illustrated above).


Right-click in the Registers pane and select Flags - the most important flags for us are indicated in the screen shot below:

• Overflow (OV)
• Direction up/down (UP)
• Enable Interrupts (EI)
• Sign 'PLus' (PL)
• Zero (ZR)
• Aux. Carry (AC)
• Parity Even/Odd (PE)
• Carry (CY)

Flags are 1-bit flip-flops - you are familiar with the idea of the Carry flag already. The others will become familiar. These flags are fundamentally important - they allow our programs to do what they do! Note: there are actually many more CPU flags than are shown here, but they are of less interest to us.

Note the three-lettered names of the registers in this window as you'll see them frequently. The initial 'E' stands for 'Extended'. The first six (EAX, EBX, ECX, EDX, ESI and EDI) are termed general purpose registers, whereas the others have more specialist purposes:

• EIP is the Instruction Pointer; it holds the address of the next instruction to be executed,
• ESP is the Stack Pointer register, used for addressing memory,
• EBP is the Base Pointer register, also used for addressing memory, and
• EFL is the flags register (and it is the hex form of the individual 1-bit flags shown).

You can see the 32-bit value contained in each register, displayed as 8 hexadecimal digits - this is very useful information! Without these registers your programs would run extremely slowly!

Tasks

Remove the breakpoint if it's still in your program. Your program is still in a paused state (as with an MP3 player). The yellow arrow represents the value of the Instruction Pointer register - we'll look more closely at that later. Hover your mouse over the yellow arrow and read the information displayed.

Now single step through your program by pressing F11 (Step Into). Watch the yellow arrow in the left margin move down your program. As you do, also notice how the hardware registers change colour (to red) to indicate they have changed value. As the EIP register contains the address of the next instruction it is always red! We'll look at this aspect of execution more closely later.

What happens when you reach the call to printf? Keep pressing F11 - you are now deeply in the C++ libraries, specifically those that implement the printf function (notice how long they go on for - they represent quite a few instructions!). To escape from this no-man's land use the Step Out button in the debug bar (or Shift-F11) until you get back to the original Hello World listing. Check that the "Hello World" message has appeared on the console. Press F5 (Continue) to run the program normally to completion.


Task

Press F11 once again to step into your code. You now need to disassemble it: press the appropriate button on the debug bar (or right-click and select Go To Disassembly). This is what we want - the assembly instructions that have been generated by the compiler, the mnemonic form of the machine code.

(Screenshot annotations: original C++ code in bold; assembly language mnemonics in grey; each instruction is stored in memory, with its address shown on the left.)

This looks complicated but with a bit of description should become clearer. The original C++ lines are numbered 1:, 2:, 3: etc and are shown emboldened (if the numbers haven't appeared take the Show Line Numbers option on the pop-up debug menu). Otherwise, the left hand side of the screen shows the instruction address in memory. The x86 instruction (such as push, mov) is then given and finally the operands (data) used by the instruction are shown. The assembler instructions are slightly greyed out to distinguish them from the C++ statements from which they were generated - the assembly code follows the corresponding C++ statement. The original C++ lines are there purely as comments to help us understand the assembly code – remember, they are not stored in memory and hence no address is shown for these lines. The full assembly program is listed overleaf for convenience - for our purposes you can ignore those parts indicated as the Prologue and Epilogue - these are added automatically by the compiler (along with many other bits of code) and are needed to allow your program to be called from - and return to - the Visual Studio IDE. Again, the addresses on the left and other hexadecimal numbers in the instructions will be different on your displays. Note: you will see that compilers do lots of things on our behalf - most are beneficial ...

