C compilation steps

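Compiling a C program takes four steps: preprocessing, compilation, assembling, and linking. Use this command to keep the intermediate file produced by each step: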

gcc -Wall -save-temps cprogram.c -o cprogram

These are the files that the command generates:

  • cprogram.i: generated by preprocessor
  • cprogram.s: generated by compiler
  • cprogram.o: generated by assembler
  • cprogram: generated by linker

Now let's take this piece of code, compile it with the gcc command mentioned above, and see what each step produces.

#include<stdio.h>
#define x 10
int main(){
    //this is a comment
    printf("%d", x);
}

Step 1: preprocessor

Preprocessed code:

# 0 "cprogram.c"
# 0 "<built-in>"
# 0 "<command-line>"
....
....
extern FILE *fopen (const char *__restrict __filename,
      const char *__restrict __modes)
  __attribute__ ((__malloc__)) __attribute__ ((__malloc__ (fclose, 1))) ;
....
....
extern int __uflow (FILE *);
extern int __overflow (FILE *, int);
# 967 "/usr/include/stdio.h" 3 4
# 2 "cprogram.c" 2
# 3 "cprogram.c"
int main(){
                      // comment is gone
    printf("%d", 10); // x is replaced
}

We can also extract the preprocessed code with this gcc command:

gcc -E cprogram.c

or with this one:

cpp cprogram.c

Here cpp stands for C preprocessor.

So what does the preprocessor do?

  • Removal of comments
  • Macro expansion (here x is replaced by 10)
  • Expansion of the included files
  • Handling conditional compilation

What is conditional compilation? Conditional compilation lets us include or exclude specific sections of code based on conditions that are evaluated by the preprocessor. We can use directives like #ifdef, #ifndef, #if, #elif, #else, and #endif to conditionally compile code blocks. This feature allows us to adapt our code to different platforms or configurations.

Step 2: compilation

Compiled code (assembly):

	.file	"cprogram.c"
	.text
	.section	.rodata
.LC0:
	.string	"%d"
	.text
	.globl	main
	.type	main, @function
main:
.LFB0:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	$10, %esi
	leaq	.LC0(%rip), %rax
	movq	%rax, %rdi
	movl	$0, %eax
	call	printf@PLT
	movl	$0, %eax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	main, .-main
	.ident	"GCC: (GNU) 13.2.1 20230801"
	.section	.note.GNU-stack,"",@progbits

Use this command to generate assembly code using gcc:

gcc -S cprogram.c

So what does the compilation step do?

  • It checks the syntax and semantics of your code and performs type checking.
  • It generates assembly code for the target architecture (the .s file).

Step 3: Assembling

Assembler-generated code (a binary ELF object file, shown here as raw text, which is why it is unreadable):

ELF>X@@
UH��
H�H�Ǹ��]�%dGCC: (GNU) 13.2.1 20230801 GNU��zRx�$A�C
_��$cprogram.cmainprintf���������������� .symtab.strtab.shstrtab.rela.text.data.bss.rodata.comment.note.GNU-stack.note.gnu.property.rela.eh_frame @$@�0&d,d1d90gB�R�0j�8e@�	��	��t

Create an object file from the assembly file:

as cprogram.s -o cprogram.o

Or we can let gcc do it, starting from the C source and stopping after the assembling step:

gcc -c cprogram.c

Note: The object file contains the binary machine code that was generated from your C source code.

This object file is relocatable.

What does relocatable mean? The term relocatable indicates that the object file is not yet tied to specific memory addresses in the final executable program. It contains the code and data sections, but their addresses are not fixed yet. This flexibility allows the linker to assign the appropriate addresses to the different sections of code and data when it creates the final executable.

What does the assembler do? The assembler translates the assembly code into machine code instructions and writes them to an object file. This code is not easily readable by humans.

Step 4: linker

What does the Linker do?

  • Combining object files: The linker combines multiple object files into a single output file. It also links our program with the precompiled libraries that ship with the C toolchain, such as the C standard library, which contains printf().

  • Resolving External References: When you use functions or variables defined in other source files or libraries, your program relies on external references. The linker connects the function calls and variable references in one object file with the corresponding definitions in other object files or libraries.

  • Symbol Table Management: The linker maintains a symbol table, which serves as a map connecting symbols (such as function and variable names) to their memory locations in the object files. The symbol table enables the linker to resolve references and establish the correct associations between the symbols used in different parts of the program.

  • Handling Library Dependencies: Many programs make use of external libraries that contain precompiled code for common functions or specialized tasks. The linker handles the inclusion of these libraries by linking them with the object files.

  • After all of this, it generates the executable.

Example program to understand linker better:

Let's create our own header file, test.h. This file contains a function prototype like this:

// test.h
void testBegin();

Let's provide the definition of the function declared in the header file in test.c:

//test.c
#include<stdio.h>
#include "test.h"

void testBegin(){
    printf("Starting the test...");
}

Let's call this function from main:

//cprogram.c
#include<stdio.h>
#define x 10
#include "test.h"
int main(){
    //this is a comment
    printf("%d", x);
    testBegin();
}

Here is the folder structure: cprogram.c, test.c, and test.h all sit in the same folder.

Now that we have two .c files, how are we going to compile them? If you’re thinking of something like gcc cprogram.c, you’re mistaken, because test.c is never compiled and nothing provides the definition of testBegin. Here’s what it will result in:

/usr/bin/ld: /tmp/ccNAESVa.o: in function `main':
cprogram.c:(.text+0x23): undefined reference to `testBegin'
collect2: error: ld returned 1 exit status

What is the solution?

We need to compile each source file into its own object file and then let the linker combine them into the final executable, like this:

gcc -c cprogram.c
gcc -c test.c

The above commands create two distinct relocatable object files. Now, let’s proceed with linking these object files. Calling ld directly on just these two .o files does not work, because ld on its own adds neither the C startup code (which provides _start) nor the C library (which provides printf). If we try the command ld test.o cprogram.o, here is what we get:

ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000
ld: test.o: in function `testBegin':
test.c:(.text+0x14): undefined reference to `printf'
ld: cprogram.o: in function `main':
cprogram.c:(.text+0x19): undefined reference to `printf'

So our best bet is to use gcc, which invokes the linker with the C library and startup files for us. Here is how to do it:

gcc test.o cprogram.o
./a.out

Output:

10Starting the test...

To summarize what we’ve done:

cprogram.c -> cprogram.o \
                          \
                    linker + ---> a.out
                          /
                         /
        test.c -> test.o

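All the commands:

  • gcc -E cprogram.c: stops after preprocessing
  • gcc -S cprogram.c: stops after compilation and writes cprogram.s
  • gcc -c cprogram.c: stops after assembling and writes cprogram.o
  • gcc test.o cprogram.o: links the object files into a.out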

Where in Linux are the standard libraries located? Typically under /usr/lib (the exact directory varies by distribution).
Where are the include directories located in Linux? Typically under /usr/include.
