4. chapter 4-loaders-and-linkers System program and compiler PDF

Title	4. chapter 4-loaders-and-linkers System program and compiler
Course	Computer Science
Institution	Pillai College of Engineering
Pages	23

Summary

Lecture notes on linkers and loaders from the SPCC Mumbai University syllabus...

Description

Chapter 4: Loaders & Linkers

Topics in this Chapter:
● What is a Loader?
  o Basic Functions of a Loader
● What is a Linker?
  o Functions of a Linker
● General schematic of Linking / Loading function
● Loading Schemes:
  o Translate-and-go Loader
  o Absolute Loader
  o Relocation and Relocating Loaders
    ▪ Relocation Concept
    ▪ Relocation Bits
    ▪ Types of Programs w.r.t Relocation
    ▪ Relocating Loader
  o Direct Linking Loader
● Loader Design Options
● Chapter Summary
● Expected Viva Questions

4.1 What is a Loader?

Dr.Sharvari Govilkar TE COMP A

Page 1

A Loader is a system program that prepares an object program for execution, places it into main memory and initiates its execution.


Following figure depicts the general working of a Loader:

3. Relocation: As its needs dictate, the OS may move (i.e. relocate) one or more segments of the program from one area of memory to another.

When the program is relocated, instructions that refer to code or data in the relocated segments must also be changed. Instructions that must be changed when relocation occurs are called “address-sensitive” instructions.

The job of the OS is to adjust the addresses in all such address-sensitive instructions whenever it relocates one or more program segments.

(Note: The relocation function must be carried out every time the OS relocates program segments; relocation is often performed by a Linker on behalf of the Loader.)

4. Loading: Finally, the Loader places the executable code in main memory and initiates execution by transferring control to the starting location of the program in memory.

4.2 What is a Linker?

A Linker is a system program that resolves all address references within and among several object modules of a program and combines them to produce a single “monolithic” .exe file.

A Linker essentially performs the linking and relocation functions for a Loader.

4.2.1 Functions of a Linker

A Linker performs the following three functions:

1. Linking Object Files: A linker links multiple relocatable object files used by a program and generates a single .exe file that can be loaded and executed by the Loader.

2. Resolving External References: While linking those object files, the linker resolves inter-segment and inter-program references to generate a single continuous executable file.

3. Relocating Symbols: A linker relocates symbols from their relative locations in the input object files to new absolute positions in the executable file.

4.3 General Schematic of Linking / Loading

The following figure gives a general idea of a Linking / Loading scheme:

4.4 Loading Schemes

There are four basic loading schemes:
1. Assemble (or Compile) – and – Go Loader
2. Absolute Loader
3. Relocating Loader
4. Direct Linking Loader (DLL)

4.4.1 Assemble (or Compile) – and – Go Loader

Here the Loader function is simply an extension of the Assembler. (This scheme is sometimes also called a Translate-and-Go Loader.) The following figure depicts the working of an Assemble-and-Go Loader:


The address of storage in memory is decided by programmer and stays fixed (i.e. static).

The Loader simply takes each line of machine code generated by Assembler and places it into the main memory at a fixed location. This process continues until all of machine code has been placed into the memory.

Finally, the Loader initiates the execution by transferring control of execution to the starting location of the program.

As seen in the figure, the Assemble-and-Go Loader does not generate an object file.

The advantage of this loading scheme is that it is very simple to implement, as the only function of the Loader is to place the object code in memory and initiate execution.

Disadvantages are:

1. Since no object file is generated in this scheme, the program must be re-assembled for every run, even if no change has been made to the source code. As a result, the overall execution time is high.

2. Relocation would require the programmer to change the start address of the program (by using START pseudo-opcode) and re-assemble it in order to relocate it. So, relocation is a tedious and time-consuming job.

3. Also, this loading scheme cannot handle multiple program segments and subroutines in the program; the entire program must be written as a single piece of code.

4.4.2 Absolute Loader

In this scheme, the assembler generates the object file which can be stored on secondary storage instead of being directly loaded into the memory.


Along with each object file, the assembler also gives information about starting address and length of that object file.

Here, the programmer performs the allocation and linking functions explicitly for the Loader, i.e. the programmer must know where the program is to be placed in memory and must link the subroutines explicitly and manually (by using the available instructions appropriately).

The loader simply does the task of loading the object file into the memory and initiating the execution.
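The loading step described above can be sketched in a few lines of Python. This is only an illustrative model, not code from the notes: the `ObjectFile` record, the `absolute_load` function and the 64-byte `memory` array are hypothetical names chosen for the example, and the object file is assumed to carry just its start address and code bytes.

```python
from dataclasses import dataclass


@dataclass
class ObjectFile:
    start: int   # load address fixed by the programmer at assembly time
    code: bytes  # machine code; its length is len(code)


def absolute_load(memory: bytearray, obj: ObjectFile) -> int:
    """Copy the object code to its fixed address and return the entry point."""
    end = obj.start + len(obj.code)
    if end > len(memory):
        raise ValueError("object file does not fit in memory")
    memory[obj.start:end] = obj.code
    return obj.start  # execution begins by transferring control here


# Hypothetical 64-byte memory and a three-byte program assembled for address 16.
memory = bytearray(64)
program = ObjectFile(start=16, code=bytes([0x08, 0x17, 0x12]))
entry = absolute_load(memory, program)
```

Note that the loader makes no decisions of its own: the load address comes entirely from the object file, which is why allocation and linking remain the programmer's job in this scheme.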

Following figure indicates general schematic of an Absolute Loader:

procedures, subroutines etc.) with actual usable addresses in the memory before running the program.


Relocation is first done at compile time, typically by a Linker. During execution, the OS may move one or more segments of the program to some other memory area (i.e. relocate them). In that case, the loader performs relocation before the program is executed.

An .exe file contains the following information:
1. Machine code and data with relative addresses
2. Relocation Bits
3. Length of file

4.4.3.1 Relocation Bits

When relocation is to be performed, the ‘new’ starting location of the program is added to the relative address of all relocatable bytes.

To identify which byte of object file is relocatable and which is not, relocation bits are used.

Typically, for each byte of machine code, we have a single bit that is set to 1 if it is relocatable (i.e. if it’s an address-sensitive entity) or 0 if it is not relocatable.

There are two ways in which these relocation bits can be added to the object file:

1. At the end of each instruction – As shown in the following example:

Relative   Instruction   Machine Code       Relocation Bits
Address                  Byte 1   Byte 2    Byte 1   Byte 2
00         LOAD A        08       07        0        1
02         ADD B         01       08        0        1
04         STORE C       09       09        0        1
06         STOP          12                 0
07         A DB 09       09                 0
08         B DB 18       18                 0
09         C DW 0000     00       00        0        0

As we can see in above code, opcodes and data definitions are not relocatable since they do not include any address-sensitive entities; only symbolic references, to variable names or subroutine / procedure names (not in the example), are relocatable.

One drawback of this way of supplying relocation information is that storage and retrieval become complex.

2. By using a Relocation Bit Mask – A relocation bit is associated with each byte of object code, and all such bits together form a relocation bit mask. This mask is appended at the end of the machine code.

So, to retrieve this mask, length of machine code should be supplied in the header.

For example, the machine code for the same source code as in the previous example can be paired with a relocation bit mask as follows:

Machine Code          08  07  01  08  09  09  12  09  18  00  00
Relocation Bit Mask   0   1   0   1   0   1   0   0   0   0   0

The advantage here is that storage and retrieval of relocation bits is simplified.
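As a rough illustration of how a loader might apply such a mask, the Python sketch below adds a new start address to every byte whose relocation bit is 1. The function name `relocate`, the new start address `0x40`, the reading of the example bytes as hexadecimal and the restriction to one-byte addresses are all assumptions made for this example.

```python
def relocate(code: bytes, mask: list, new_start: int) -> bytes:
    """Add new_start to every byte whose relocation bit is set."""
    if len(code) != len(mask):
        raise ValueError("mask must hold one bit per byte of machine code")
    relocated = bytearray(code)
    for index, bit in enumerate(mask):
        if bit:  # address-sensitive byte
            value = code[index] + new_start
            if value > 0xFF:
                raise ValueError("relocated address does not fit in one byte")
            relocated[index] = value
    return bytes(relocated)


# Machine code and mask from the example above, read as hexadecimal bytes.
code = bytes.fromhex("0807010809091209180000")
mask = [0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0]
relocated = relocate(code, mask, 0x40)
print(relocated.hex(" "))  # 08 47 01 48 09 49 12 09 18 00 00
```

Only the three operand bytes (the addresses of A, B and C) change; opcodes and data values pass through untouched because their mask bits are 0.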

4.4.3.2 Types of Programs w.r.t Relocation

Based on relocation, programs can be broadly classified as:

1. Non-relocatable Programs – These are static programs whose memory area is fixed at the time of coding and remains static i.e. cannot be changed thereafter. (For example, the OS)

2. Relocatable Programs – These programs can be relocated to different memory areas as and when memory storage is needed by the OS.

With the help of relocation information in the .exe file, the linker (at compile time) or relocating loader (at run time) will perform the functions needed to relocate the program.

3. Self-relocatable Programs – Such programs have a small piece of code (or a subprogram) embedded in them which handles the operations needed to relocate the program.

When the OS relocates some (or all) of the code, control is transferred to this “relocating subprogram”, which adjusts the addresses in the address-sensitive portions of the code.

4.4.3.3 Relocating Loaders

In all the previous loading schemes, allocation and linking had to be done explicitly and manually by the programmer. Also, relocation would require re-assembling of all segments (even if one segment is changed).


To overcome this problem, a general class of relocating loaders was introduced, which allows multiple procedure segments and one data segment shared by all of them. For each program, the assembler produces the following information to be used by the Relocating Loader:
1. Assembled version (machine code) of all segments
2. Inter-segment References (if needed)
3. Relocation Information
4. Length of Program
5. Transfer Vector

The Transfer Vector (TV) is a global table for the entire program; it contains a list of the external subroutines referenced by the program. The structure of the Transfer Vector is as follows:

Subroutine Name    Transfer Instruction
. . .              . . .

For each subroutine, the transfer vector contains a transfer instruction that branches the execution flow to the start address of subroutine in the memory.

The Assembler replaces call to each subroutine in the program with a branch to its corresponding entry in the Transfer Vector.

After machine code and TV have been loaded into the memory, the loader loads all the subroutines in the memory.

So, for executing the call to a subroutine, first its corresponding entry in TV is referred and from there, a branch is taken to actual location of the subroutine in the main memory.

This “double-branch” is needed because TV is generated by Assembler, but the assembler doesn’t know beforehand where the subroutines will be loaded in the memory.

So, it simply replaces the call to subroutines with a branch to its corresponding location in TV and Loader fills the corresponding entry in TV with the actual address where subroutine is stored in the memory.
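The “double-branch” can be modelled with a small Python sketch. It is a simplified illustration under stated assumptions: the `TransferVector` class, its method names, the subroutine names `SQRT` and `PRINT` and the addresses `0x300` and `0x380` are all hypothetical, and a "branch" is reduced to looking up a target address.

```python
class TransferVector:
    def __init__(self, names):
        # Built at assembly time: one slot per external subroutine.
        # The target addresses are not yet known.
        self.slot_of = {name: i for i, name in enumerate(names)}
        self.target = [None] * len(names)

    def fill(self, name, address):
        # Done by the loader once the subroutine has been placed in memory.
        self.target[self.slot_of[name]] = address

    def branch(self, slot):
        # Second branch: from the TV slot to the subroutine itself.
        address = self.target[slot]
        if address is None:
            raise RuntimeError("subroutine has not been loaded yet")
        return address


tv = TransferVector(["SQRT", "PRINT"])

# Assembler: a call to PRINT becomes a branch to PRINT's TV slot (first branch).
call_site = tv.slot_of["PRINT"]

# Loader: after loading the subroutines, record where each one actually is.
tv.fill("SQRT", 0x300)
tv.fill("PRINT", 0x380)

destination = tv.branch(call_site)
```

The call site never changes after assembly; only the TV entry is filled in at load time, which is what lets the assembler work without knowing the final subroutine addresses.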

In short, a relocating loader uses:
● Program Length – For Allocation
● Relocation Bits – For performing Relocation
● Transfer Vector (TV) – For Linking (by “double-branch”)

Use of the Transfer Vector has the following advantages for a relocating loader:

1. Any change in program size can be taken care of dynamically.
2. Only the required subroutines need be kept in main memory, rather than keeping all subroutines there at the same time.

But the relocating loader has some disadvantages too:

1. The Transfer Vector (TV) based approach is suitable for resolving external subroutine linkages and not references to external program data.

2. If there are several subroutine linkages in the program, size of TV (and therefore the object code) increases. So, it occupies more space, as TV is always needed in the main memory.

4.4 Direct-Linking Loader (DLL)

The Direct-Linking Loader is the most widely used loading scheme; it belongs to the class of relocating loaders, i.e. it performs all four main functions of a loader (allocation, linking, relocation, loading).

It has two significant advantages over the general Relocating Loader scheme: 1. It allows use of multiple program segments as well as multiple data segments.

2. External program references i.e. references to data (and subroutines) defined in other programs can also be resolved.

The Assembler generates following information for every program segment and passes on to the DLL:

1. Machine Code

2. Relocation Information

3. Length of each segment

4. PUBLIC Table – includes the name, type and (relative) definition address of each symbol defined by the current program that can also be referred to by other program segments (i.e. symbols specified by the PUBLIC keyword). Structure of the PUBLIC Table:

Symbol Name    Type     Definition Address
. . .          . . .    . . .

5. EXTERN Table – includes the name, type and (relative) usage address of each symbol used by the current program that has been defined externally in some other program (i.e. symbols specified by the EXTERN keyword). Structure of the EXTERN Table:

Symbol Name    Type     Usage Address
. . .          . . .    . . .

(Note that all of the above entities are generated by the Assembler separately for each program segment.)

The addresses in these PUBLIC and EXTERN tables are all relative to the starting location at which their respective program is loaded in main memory.

DLL will concatenate machine code of all program and data segments and generate a single “load module”.

So, the basic tasks of DLL can be given as follows:

1. Replacing the relative addresses in PUBLIC and EXTERN tables with their actual (offset) locations within the concatenated load module.

2. While object modules are being concatenated, DLL also replaces all external references in machine code of the program with their actual (offset) locations within the load module.

3. Finally, load the concatenated load module.
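A compact Python sketch of these three tasks follows. It is a simplified model rather than the actual DLL: each module is a hypothetical dictionary with a `code` list (one integer per word), a `public` mapping and an `extern` list, and an external reference is assumed to occupy exactly one word that is overwritten with the symbol's offset.

```python
def direct_link(modules):
    """Concatenate modules and patch external references (tasks 1 to 3)."""
    definitions = {}  # symbol -> offset within the load module
    usages = []       # (symbol, offset within the load module)
    load_module = []
    for module in modules:
        base = len(load_module)  # offset at which this module starts
        # Task 1: turn module-relative table addresses into load-module offsets.
        for symbol, address in module["public"].items():
            definitions[symbol] = base + address
        for symbol, address in module["extern"]:
            usages.append((symbol, base + address))
        load_module.extend(module["code"])
    # Task 2: replace each external reference with the symbol's actual offset.
    for symbol, where in usages:
        if symbol not in definitions:
            raise KeyError(f"undefined external symbol: {symbol}")
        load_module[where] = definitions[symbol]
    return load_module  # Task 3 would copy this into main memory


# Hypothetical modules: words 1 and 3 of `main` refer to symbols defined in `lib`.
main = {"code": [10, 0, 11, 0, 99], "public": {},
        "extern": [("TOTAL", 1), ("PRINT", 3)]}
lib = {"code": [0, 20, 21, 99], "public": {"TOTAL": 0, "PRINT": 1},
       "extern": []}

image = direct_link([main, lib])
print(image)  # [10, 5, 11, 6, 99, 0, 20, 21, 99]
```

Because `lib` starts at offset 5 in the concatenated module, `TOTAL` (relative address 0) resolves to 5 and `PRINT` (relative address 1) resolves to 6.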

DLL works on a Two-pass Algorithm:

Purpose of Pass 1 is to construct a Global External Symbol Table (GEST) by gathering all externally referenced symbols.

Purpose of Pass 2 is to use GEST and generate a single concatenated load module.

4.4.1 Basic Data Structures of DLL

DLL uses the following data structures:

1. PUBLIC and EXTERN Tables for every program segment (supplied by the Assembler)

2. Global External Symbol Table (GEST)

The structure of GEST is as follows:

Symbol Name    Type     Usage Address    Definition Address
. . .          . . .    . . .            . . .

GEST is simply a global collection of external references that gathers all symbols from all PUBLIC and EXTERN tables into one table.

3. Intermediate Load Module (ILM) – Pass 1 of DLL dumps the machine code of all input object modules into the ILM and reads only the PUBLIC and EXTERN tables of each program segment to construct GEST. The ILM and GEST are then passed on to Pass 2, which actually performs relocation.

Apart from these data structures, DLL uses the following pointers:

1. GEST_PTR – To keep track of the location being written into GEST (in Pass 1) or being read from GEST (in Pass 2).

2. Location Counter (LC) – To keep track of the location being written into the Intermediate Load Module, i.e. ILM (in Pass 1), or into main memory (in Pass 2).

4.4.2 Pass 1 of DLL Algorithm

Initializations in Pass 1:
1. Open the Source Object Files in Read mode and the Intermediate Load Module (ILM) in Write mode.
2. Set up GEST.
3. Initialize GEST_PTR to the first entry of GEST.

Working of DLL Pass 1:

Step 1: After initializations, read the main module.

Step 2: Read the length of the module. Copy all code statements to the Intermediate Load Module (ILM) till the end of the machine code.

Step 3: Read the PUBLIC table.
3.1 Read a record of the PUBLIC table.
3.2 Search for the symbol in GEST.
3.3 If found, add the current LC to the definition address and update the new (modified) definition address in the corresponding GEST entry.
3.4 If not found, first copy its entry from the PUBLIC table to GEST, then add the current LC to the definition address and update the new definition address in the corresponding GEST entry; only the usage address is not updated here.
3.5 Is the END of the PUBLIC table reached? If NO, then increment GEST_PTR and repeat from Step 3.1. If YES, then continue.

Step 4: Read the EXTERN table.
4.1 Read a record of the EXTERN table.
4.2 Search for the symbol in GEST.
4.3 If found, add the current LC to the usage address and update the new (modified) usage address in the corresponding GEST entry.
4.4 If not found, first copy its entry from the EXTERN table to GEST, then add the current LC to the usage address and update the new usage address in the corresponding GEST entry; only the definition address is not updated here.
4.5 Is the END of the EXTERN table reached? If NO, then increment GEST_PTR and repeat from Step 4.1. If YES, then continue.

Step 5: Is the END of all source files (i.e. END of the input object modules) reached? If NO, then repeat from Step 2. If YES, then go to Pass 2.
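The Pass 1 steps can be approximated in Python as shown below. This is a hedged sketch, not the algorithm exactly as the notes state it: GEST is modelled as a dictionary keyed by symbol name instead of a table walked with GEST_PTR, each entry keeps a list of usage addresses (an assumption, so that a symbol may be used in more than one place), and the "current LC" added to each address is taken to be the offset at which the module starts in the ILM. The names `GestEntry` and `pass1` and the sample modules are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GestEntry:
    type: str
    definition: Optional[int] = None            # offset within the ILM
    usages: list = field(default_factory=list)  # offsets within the ILM


def pass1(modules):
    gest = {}  # Global External Symbol Table: name -> GestEntry
    ilm = []   # Intermediate Load Module
    for module in modules:            # Step 5: repeat for every object module
        lc = len(ilm)                 # offset at which this module starts
        ilm.extend(module["code"])    # Step 2: copy the machine code
        for name, typ, rel in module["public"]:   # Step 3: PUBLIC records
            entry = gest.setdefault(name, GestEntry(typ))
            entry.definition = lc + rel
        for name, typ, rel in module["extern"]:   # Step 4: EXTERN records
            entry = gest.setdefault(name, GestEntry(typ))
            entry.usages.append(lc + rel)
    return gest, ilm


# Hypothetical object modules; each table record is (name, type, relative address).
main = {"code": [10, 0, 11, 0, 99],
        "public": [("MAIN", "proc", 0)],
        "extern": [("TOTAL", "data", 1), ("PRINT", "proc", 3)]}
lib = {"code": [0, 20, 21, 99],
       "public": [("TOTAL", "data", 0), ("PRINT", "proc", 1)],
       "extern": []}

gest, ilm = pass1([main, lib])
```

After this pass, every symbol in `gest` carries addresses relative to the start of the ILM, which is the form Pass 2 needs in order to patch the external references.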

4.4.3 Flowchart for Pass 1 of DLL

4.4.5 Flowchart for Pass 2 of DLL

and writing the linked program into an executable image.

2. Linkage Editor – It performs all linking and (some) relocation prior to load time and writes the linked program into an “executable image”. This approach is suitable when the program has to be re-executed many times without being re-assembled. (The basic difference between Linking Loader and Linkage Editor is analogous to the difference between Assemble-and-Go Loader and Absolute Loader)

3. Dynamic Linking – Linking function is performed at run time and facilities of the OS are used to load and link subprograms at the time when they are first called. Dynamic Linking is often used to allow several executing programs to share one copy of a subroutine through a library; all supporting run-time routines common to a set of executing programs (e.g. programs of same language) can be packed into a single “library” of subroutines, which is shared by all programs. Dynamic Linking avoids the necessity of loading the entire library for each execution; subroutines in the library can be loaded only when required.
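The load-on-first-call behaviour can be illustrated with a short Python sketch. It is only an analogy for what the OS does: the `DynamicLinker` class, the `library` dictionary and the `square` and `cube` subroutines are hypothetical, and "loading" is simulated by calling a factory function that returns the subroutine.

```python
class DynamicLinker:
    def __init__(self, library):
        self.library = library  # name -> function that "loads" the subroutine
        self.loaded = {}        # subroutines already present in memory
        self.load_count = 0

    def call(self, name, *args):
        if name not in self.loaded:
            # First call: load and link the subroutine now, at run time.
            self.loaded[name] = self.library[name]()
            self.load_count += 1
        return self.loaded[name](*args)


# Hypothetical shared library with two subroutines.
library = {
    "square": lambda: (lambda x: x * x),
    "cube": lambda: (lambda x: x ** 3),
}

linker = DynamicLinker(library)
first = linker.call("square", 4)   # loads "square", then runs it
second = linker.call("square", 5)  # already loaded, runs directly
```

Here `square` is loaded once and reused, while `cube` is never loaded because nothing calls it, which mirrors the point that the entire library need not be loaded for each execution.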


4.5.1 Comparison: Linking Loader vs. Linkage Editor

1. Linking Loader: A linking loader performs linking and relocation at run time and directly loads the linked program into the memory.
   Linkage Editor: A Linkage Editor performs linking and (some) relocation operations prior to load time and writes the linked program to a file for later execution.

2. Linking Loader: It resolves external references and performs library searching every time the program is executed.
   Linkage Editor: External references are resolved and library searching is performed only once.

3. Linking Loader: Executable image is not generated; the linked program is directly loaded into memory.
   Linkage Editor: The linked program is written into an executable image, which is later giv...