Sunday, September 15, 2013

ARM Bare Metal Programming

Embedded systems programming has been a passion of mine for a couple of years now. I really enjoy bringing a processor online and making it dance to the beat of my drum. Over the past few days I have taken an interest in writing my own linker scripts, start code and a CRT0 (C Runtime). In this article I will give you some background as to why this is important. I will guide you through the steps that I have taken to bring an ARM processor from reset to main() and beyond.

I am using the SAM4E16E processor on a SAM4E-EK development board. I have the Atmel SAM-ICE debugging tool to facilitate loading code. The steps that I am taking in this article will be very similar for other ARM processors. I did some bare metal programming on the Raspberry Pi about a year ago and it was similar. My development host is an up-to-date Linux Mint 15 system.

Debugger (front) and Evaluation Kit (back)
SAM-ICE and SAM4E-EK

Toolchain


I have used the arm-none-eabi-gcc toolchain available from the GNU Tools for ARM Embedded Processors. I am using this alternate PPA because it supports more versions of Ubuntu (and by extension, Mint) but is based on the same source tree.
Follow these steps in a shell to install the arm-none-eabi toolchain.
sudo add-apt-repository ppa:terry.guo/gcc-arm-embedded
sudo apt-get update
sudo apt-get install gcc-arm-none-eabi

Compilation/Linking


In this section I will give you some background information about compilation and linking. From this simplified discussion, you will understand why a linker script is necessary.

The Anatomy of a Relocatable Object


There are numerous steps involved in compiling a C program. The first few steps include preprocessing, tokenization, syntax checking, building a syntax tree and symbol table and more. Once the compiler has built an architecture-independent representation of your program, it generates code for the target processor and emits a relocatable object that contains the instructions and data needed to implement your program.

I will present a simple program to show the various sections within an object file.
/*
 * Sample Program to Demonstrate Object Sections
 */

#include <stdint.h>

uint32_t i = 0x00;                  // .bss
uint32_t j;                         // .bss
uint32_t k = 100;                   // .data
char *phrase = "Andrew Rossignol";  // pointer in .data, string in .rodata

int main(void) {                    // .text (all executable code)
    int z = 0;                      // local variable, lives on the stack
    while(1);
    
    return 0;
}
An object file will have a number of sections, but we are specifically interested in:
  • .bss
    • Global and static variables that start out as zero (uninitialized or explicitly set to 0)
  • .data
    • Global and static variables initialized to non-zero values
  • .rodata
    • Read-only data such as const globals and string literals
  • .text
    • Executable machine code
The remaining sections contain debugging information that I have safely managed to avoid for the sake of this project.
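
To make the list above concrete, here is a small sketch of where a typical GCC build for a bare-metal ARM target places different kinds of globals. The variable names and the startup_sum() helper are made up for illustration, and exact placement can vary with compiler version and flags, so treat the comments as the usual case rather than a guarantee.

```c
#include <assert.h>
#include <stdint.h>

uint32_t zeroed;                             /* .bss: no initializer              */
uint32_t also_zeroed = 0;                    /* .bss: explicitly zero             */
uint32_t counter = 100;                      /* .data: non-zero initializer       */
const uint32_t limit = 5;                    /* .rodata: read-only object         */
char *name = "Andrew";                       /* pointer in .data, text in .rodata */
const char *const fixed_name = "Rossignol";  /* pointer and text both in .rodata  */

/* Hypothetical helper: adds up the values the numeric globals hold at startup. */
uint32_t startup_sum(void) {
    /* Objects without a non-zero initializer must read as zero before main(). */
    assert(zeroed == 0 && also_zeroed == 0);
    return zeroed + also_zeroed + counter + limit;
}
```

The const qualifier on the pointer itself, not on the characters it points to, is what moves a pointer variable out of .data.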

The program above can be compiled and disassembled using the following commands. The -c flag instructs the compiler to produce an object file that is not linked (more on this later). I have set the -g flag to enable debugging symbols. This will make our disassembly listing much easier to work with. I have disabled optimizations with -O0. The -nostartfiles flag only takes effect at link time, where it tells GCC to leave out the standard startup files, so it changes nothing in this compile-only step. The second command disassembles all sections (-D), interleaves the source code (-S) and saves the output to a file.
arm-none-eabi-gcc -O0 -g -c -nostartfiles main.c -o main.o
arm-none-eabi-objdump -DS main.o > main.o.S
If you open main.o.S in a text editor, you will see the following disassembly listing. I have omitted the debug sections from this article as they would make the listing quite long.
main.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <main>:
uint32_t i = 0x00;                  // .bss
uint32_t j;                         // .bss
uint32_t k = 100;                   // .data
char *phrase = "Andrew Rossignol";  // .rodata

int main(void) {                    // .text (all executable code)
   0: e52db004  push {fp}  ; (str fp, [sp, #-4]!)
   4: e28db000  add fp, sp, #0
   8: e24dd00c  sub sp, sp, #12
    int z = 0;
   c: e3a03000  mov r3, #0
  10: e50b3008  str r3, [fp, #-8]
    while(1);
  14: eafffffe  b 14 <main+0x14>

Disassembly of section .data:

00000000 <k>:
   0: 00000064  andeq r0, r0, r4, rrx

00000004 <phrase>:
   4: 00000000  andeq r0, r0, r0

Disassembly of section .bss:

00000000 <i>:
uint32_t i = 0x00;                  // .bss
uint32_t j;                         // .bss
uint32_t k = 100;                   // .data
char *phrase = "Andrew Rossignol";  // .rodata

int main(void) {                    // .text (all executable code)
   0: 00000000  andeq r0, r0, r0

Disassembly of section .rodata:

00000000 <.rodata>:
   0: 72646e41  rsbvc r6, r4, #1040 ; 0x410
   4: 52207765  eorpl r7, r0, #26476544 ; 0x1940000
   8: 6973736f  ldmdbvs r3!, {r0, r1, r2, r3, r5, r6, r8, r9, ip, sp, lr}^
   c: 6c6f6e67  stclvs 14, cr6, [pc], #-412 ; fffffe78 <phrase+0xfffffe74>
  10: 00000000  andeq r0, r0, r0
You will see the phrase "Disassembly of section" numerous times in this file. It is easy to see how the variables and executable code relate between main.c and main.o.S. Take note that the addresses in each section start at 0x00000000. This is because the object has not been "located" yet. Each of these sections must be inserted into Flash Memory or SRAM at specific locations. This is the job of the linker and the linker script.

Linking


Now that you have an understanding of why linking is important, I can explain what a linker script is. When you compile your program, you will end up with numerous object files that must be combined into one executable that shares a single memory space. This is known as linking. The linker needs to know how big your flash memory is, how much RAM you have and where it should put the various sections. It is the job of the linker script to tell the GNU Linker where to locate the various sections of your binary. I will present a linker script that I have written for my C Runtime (more on this later).
/*
 * Linker script for SAM4E16E
 */

MEMORY {
    rom (rx)  : ORIGIN = 0x00400000, LENGTH = 0x00100000
    ram (rwx) : ORIGIN = 0x20000000, LENGTH = 0x00002000
}

_stackStart = ALIGN (ORIGIN(ram) + LENGTH(ram), 8);

...
In the MEMORY section we tell the linker what types of memory we have available. In the SAM4E we have ROM that is read-execute only. We also have RAM that is read-write-execute. We also indicate the start addresses and length of these memory segments. This information comes from the SAM4E datasheet (page 58).

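I have also defined the initial value of the stack pointer to be the end of RAM, aligned to 8 bytes. The stack on ARM processors grows downwards, so it starts at the top of RAM and grows towards .data and .bss at the bottom. I have not reserved any space in between, which means that I essentially have no heap.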
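
A small host-side sketch can show what "grows downwards" means in practice. It assumes a full-descending stack, which is what ARM's push and pop instructions implement. The tiny_machine_t type and its eight-word RAM are hypothetical stand-ins, not part of the SAM4E code.

```c
#include <assert.h>
#include <stdint.h>

#define RAM_WORDS 8u  /* hypothetical tiny RAM, measured in 32-bit words */

typedef struct {
    uint32_t ram[RAM_WORDS];
    uint32_t *sp;  /* stack pointer */
} tiny_machine_t;

/* Point SP one past the last word of RAM, like _stackStart in the script. */
static void machine_reset(tiny_machine_t *m) {
    m->sp = m->ram + RAM_WORDS;
}

/* Full-descending push: decrement SP first, then store. */
static void push(tiny_machine_t *m, uint32_t value) {
    assert(m->sp > m->ram);  /* real hardware would silently overflow */
    *--m->sp = value;
}

/* Pop is the mirror image: load first, then increment SP. */
static uint32_t pop(tiny_machine_t *m) {
    assert(m->sp < m->ram + RAM_WORDS);
    return *m->sp++;
}
```

Because SP is decremented before the first store, the initial value may legally point just past the end of RAM; nothing is ever written there.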
...

SECTIONS {
    .text : {
        *(.vectorTable*);
        *(.text*);
        *(.rodata*);
     } > rom
    
    . = ALIGN(4);
    _startFlashData = .;

...
I have defined three output sections in my linker script. The '.text : {' line creates the first one, named .text. Into this code segment I first insert a user-defined section named '.vectorTable'. You will see this again in the C Runtime. I am also inserting the .text and .rodata sections that I presented earlier. The closing '} > rom' instructs the linker to place these three input sections into ROM, which was defined in the MEMORY area.

The two lines that follow, '. = ALIGN(4);' and '_startFlashData = .;', define a variable indicating the start of the data segment in flash. This may sound confusing at first glance. The data section contains globally defined variables that must exist in RAM before the C program executes. There is no way to permanently store these variables in RAM as SRAM is volatile storage. Instead, the initial contents of the .data section are stored in flash and then copied to SRAM when the system boots. This is accomplished in the C Runtime (keep reading).

The special '.' variable (the location counter) refers to the current address in memory, and ALIGN(4) rounds it up to the next multiple of 4 bytes, that is, the next 32-bit word boundary. An address that is already a multiple of 4 is left unchanged. Assigning '.' to a name creates a "linker defined symbol" that can be referred to in C using the extern keyword or within the linker script itself.
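
The arithmetic behind ALIGN can be sketched in C. The align_up() helper below is purely illustrative, not something the linker exposes, and it assumes the alignment is a power of two, which is how ALIGN is used in this script.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of alignment (must be a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

For example, align_up(0x00400123, 4) yields 0x00400124, while an address that is already aligned, such as 0x20002000 with an alignment of 8, comes back unchanged.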
...
    .data : AT(_startFlashData) {
        . = ALIGN(4);
        _startData = .;
        *(.data*);
        . = ALIGN(4);
        _endData = .;
    } > ram
    
    .bss (NOLOAD) : {
        . = ALIGN(4);
        _startBss = .;
        *(.bss*);
        . = ALIGN(4);
        _endBss = .;
    } > ram
}

...
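Next, I am instructing the linker to store the initial contents of the .data section in flash, at the address held in the _startFlashData variable. This is the section's load address, set with AT, while '> ram' makes every reference to these variables resolve to an address in RAM. I define two variables within the .data section, the starting address and the ending address of the .data section in RAM. I will use these symbols in the C Runtime.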

The last section, .bss, is very similar to the .data section except I am instructing the linker not to store it (NOLOAD) anywhere in the image. This makes sense as the .bss segment consists only of zero valued variables.
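
One consequence of this layout is that .data costs space twice, once in flash for its initial values and once in RAM, while .bss costs only RAM. The following sketch does that bookkeeping. The section_sizes_t type and the byte counts in the note afterwards are hypothetical, and alignment padding is ignored.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t text;  /* vector table, code and .rodata, in bytes */
    uint32_t data;  /* initialized variables, in bytes */
    uint32_t bss;   /* zero-valued variables, in bytes */
} section_sizes_t;

/* Flash holds everything in .text plus the initial image of .data. */
static uint32_t flash_bytes(const section_sizes_t *s) {
    assert(s->text <= UINT32_MAX - s->data);  /* guard against wrap-around */
    return s->text + s->data;
}

/* RAM holds the live copy of .data plus .bss (the stack is not counted). */
static uint32_t ram_bytes(const section_sizes_t *s) {
    assert(s->data <= UINT32_MAX - s->bss);
    return s->data + s->bss;
}
```

With 1024 bytes of code, 8 bytes of .data and 16 bytes of .bss, this gives 1032 bytes of flash and 24 bytes of RAM.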

C Runtime (crt0)


As I mentioned earlier, a C program has a few dependencies that must be satisfied before it can execute. This happens in the C Runtime, also known as crt0. The C Runtime is executed before your program starts. It takes care of setting up the stack pointer, initializing RAM, setting up the standard library and calling your main(). I have taken some hints from the Atmel Software Framework when writing this code.
The following is a C Runtime that I have written. In the "Linker Generated Symbols" block I declare the symbols generated in the linker script so that the C code can take their addresses.

I have used a compiler directive (__attribute__) in the Interrupt Vector Table to instruct the compiler to place the IVT array in its own section. The IVT must be located at the beginning of flash memory in these processors.
/*
 * C Runtime for the SAM4E Processor
 *
 * Initializes the processor by setting the stack pointer, initializing the
 * vector table, copying data into sram, clearing bss and calling main.
 */

#include <stdint.h>

/* Linker Generated Symbols ***************************************************/

extern uint32_t _stackStart;     // Start of the stack

extern uint32_t _startFlashData; // Beginning of the .data segment in flash
extern uint32_t _startData;      // Start of the .data segment in sram
extern uint32_t _endData;        // End of the .data segment in sram

extern uint32_t _startBss;       // Beginning of the .bss segment
extern uint32_t _endBss;         // End of the .bss segment

/* Interrupt Vector Table *****************************************************/

void crt0(void);

__attribute__ ((section(".vectorTable")))
const void *VectorTable[] = {
    &_stackStart, // Stack Pointer
    &crt0         // Reset Vector
};

/* C Runtime ******************************************************************/

extern int main(void);

void crt0(void) {
    // Copy data
    uint32_t *pSrc = &_startFlashData;
    uint32_t *pDest = &_startData;
    
    while(pDest < &_endData) {
        *pDest = *pSrc;
        
        pDest++;
        pSrc++;
    }
    
    // Zero bss
    pDest = &_startBss;
    while(pDest < &_endBss) {
        *pDest = 0;
        pDest++;
    }
    
    // Call through to main
    main();
    
    // Trap a main that returns
    while(1);
}

The last bit of C code initializes RAM and calls main. The first thing I do is copy the .data segment from flash into RAM. I then write zeros to the entire .bss region and call main.
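
The same two loops can be tried on a host machine by letting ordinary arrays stand in for flash and SRAM. This is only a sketch: fake_flash, fake_sram and fake_startup() are hypothetical names, and the array indices play the role of the linker defined symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Copy initial values from "flash" into the .data region, one word at a time. */
static void copy_data(const uint32_t *src, uint32_t *dest, const uint32_t *dest_end) {
    assert(dest <= dest_end);
    while (dest < dest_end) {
        *dest++ = *src++;
    }
}

/* Clear the .bss region so zero-valued globals really start at zero. */
static void zero_bss(uint32_t *start, const uint32_t *end) {
    assert(start <= end);
    while (start < end) {
        *start++ = 0;
    }
}

/* Hypothetical stand-ins: three words of .data followed by two words of .bss. */
static const uint32_t fake_flash[3] = { 100u, 0xDEADBEEFu, 7u };
static uint32_t fake_sram[5] = { 0x55u, 0x55u, 0x55u, 0x55u, 0x55u };

static void fake_startup(void) {
    copy_data(fake_flash, fake_sram, fake_sram + 3);  /* .data */
    zero_bss(fake_sram + 3, fake_sram + 5);           /* .bss  */
}
```

The 0x55 fill pattern mimics the arbitrary contents of SRAM after power-up, so it is easy to see that both regions were actually written.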

In embedded systems, it is atypical to have a main return. I have added a while(1) catch at the end of the runtime. This ensures that the processor is never in an undefined or unknown state.
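
On Cortex-M parts such as the SAM4E, the core reads the first word of the vector table into the stack pointer at reset and then jumps to the address in the second word, which is why the runtime above never sets the stack pointer itself. The sketch below models just those two entries with a typed struct instead of an array of void pointers. The vector_table_t type, the fake stack and emulate_reset() are hypothetical and exist only so that the idea can run on a host machine.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*handler_t)(void);

typedef struct {
    uint32_t *initial_sp;  /* word 0: initial stack pointer value */
    handler_t reset;       /* word 1: reset handler address */
} vector_table_t;

static uint32_t fake_stack[16];
static int reset_calls = 0;

static void fake_reset_handler(void) {
    reset_calls++;
}

static const vector_table_t fake_vectors = {
    fake_stack + 16,       /* one past the end: the stack grows downwards */
    fake_reset_handler
};

/* Mimic the reset sequence: load SP from word 0, then call word 1. */
static uint32_t *emulate_reset(const vector_table_t *vt) {
    assert(vt->initial_sp != 0 && vt->reset != 0);
    uint32_t *sp = vt->initial_sp;
    vt->reset();
    return sp;
}
```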

Test Program!


I completed this project incrementally, testing at each step to ensure that the final executable image contained the correct values at the correct memory addresses. I was able to declare global variables and use them at runtime (tested using the LEDs and by inspection of the disassembly listing).

I wrote a header file to describe the SAM4E PIO controller and used it to enable the blue and amber LEDs. I also made the green LED blink with a crude software delay.
/*
 * Header file for the SAM4E Processor
 *
 * Defines registers and macros for use of the SAM4E processor.
 */

#ifndef SAM4E_H
#define SAM4E_H

#include <stdint.h>

/* PIO Controller *************************************************************/

// Base Bank Addresses
#define PIO_A       (uint32_t)0x400E0E00
#define PIO_B       (uint32_t)0x400E1000
#define PIO_C       (uint32_t)0x400E1200
#define PIO_D       (uint32_t)0x400E1400
#define PIO_E       (uint32_t)0x400E1600

// Register Offsets
#define PIO_PER     (uint32_t)0x00000000
#define PIO_PDR     (uint32_t)0x00000004

...

#define PIO_PCISR   (uint32_t)0x00000160
#define PIO_PCRHR   (uint32_t)0x00000164

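/*
 * Sets a PIO register given a bank, an offset and a new value.
 * The volatile access keeps the compiler from dropping or reordering it.
 */
#define pio_set_register(bank, offset, value) do {           \
    (*(volatile uint32_t *)((bank) + (offset))) = (value);   \
} while (0)

/*
 * Gets a PIO register given a bank and an offset.
 * Written as an expression so that it can be used as a value.
 */
#define pio_get_register(bank, offset) \
    (*(volatile uint32_t *)((bank) + (offset)))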

#endif
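
Function-like macros are one way to reach the registers; static inline functions are a common alternative that adds type checking on the arguments. The reg_write() and reg_read() names below are hypothetical and not part of the header above. The volatile qualifier tells the compiler that every access must really happen, which matters for hardware registers. Addresses are passed as uintptr_t so that the sketch can also be tried on a host machine against an ordinary array.

```c
#include <assert.h>
#include <stdint.h>

/* Write a 32-bit value to the register at bank + offset. */
static inline void reg_write(uintptr_t bank, uintptr_t offset, uint32_t value) {
    assert(((bank + offset) & 3u) == 0);  /* registers are word aligned */
    *(volatile uint32_t *)(bank + offset) = value;
}

/* Read the 32-bit register at bank + offset. */
static inline uint32_t reg_read(uintptr_t bank, uintptr_t offset) {
    assert(((bank + offset) & 3u) == 0);
    return *(volatile uint32_t *)(bank + offset);
}
```

On the target, the bank argument would be one of the PIO base addresses; on a host, the address of a small uint32_t array cast to uintptr_t works as a stand-in.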
I make use of this header file in my simple test program.
/*
 * Basic SAM4E Test File
 */

#include <stdint.h>

#include "sam4e.h"

// Watchdog Defines
#define WDT_MR (*(volatile uint32_t *)0x400E1854)

int main(void) {
    // Disable the watchdog
    WDT_MR = 0;
 
    // Configure the PIO lines
    pio_set_register(PIO_A, PIO_OER, (1 << 0));
    pio_set_register(PIO_D, PIO_OER, (1 << 20) | (1 << 21));
    
    // Enable the blue and amber LEDs
    pio_set_register(PIO_A, PIO_CODR, (1 << 0));
    pio_set_register(PIO_D, PIO_CODR, (1 << 20));
    
    // Blink the green LED
    while(1) {
        pio_set_register(PIO_D, PIO_CODR, (1 << 21));
        
        volatile uint32_t cnt = 1;
        while(cnt++ < 100000);
        
        pio_set_register(PIO_D, PIO_SODR, (1 << 21));
        
        cnt = 1;
        while(cnt++ < 100000);
    }
    
    
    return 0;
}
Everything works just as I had expected it to. It took me just over a week, working a few hours here and there, to put all of the pieces of this puzzle together.

The CRT0 Grin :]
If you have any questions at all, do not hesitate to leave them in the comments below. I have tried to write this as a guide for someone who was in my position a year ago. I am sure that there are some improvements that I could make to my CRT and I would really like to see them.

Here are the source files used for this post.

Thanks for reading!

13 comments :

  1. This is a good starting introduction, well done. The only improvement I'd suggest is looking into the arm ldm and stm instructions (load and store multiple) which can move upwards of 40 bytes per instruction to/from sequential memory addresses. They also put the bus into burst mode (double data rate). A memcpy implemented with these would speed up the loading of large programs by your runtime (rather than 4 bytes at a time in your loop).

    1. Thanks!

      I am just getting into ARM architecture and I will definitely keep this in mind.

  2. As someone who is starting to mess with embedded systems using ARM micros, I really liked this post! Simple and easy to understand!

    There is just one thing I don't understand. In the vector table (in the C Runtime, not the linker script), shouldn't the reset vector be in the first position?

    1. I work with a senior embedded systems designer and he told me that the stack pointer will automatically be pulled off of the first location in the vector table.

      Pretty awesome.

    2. According to ARM's documentation, the first location in the vector table is actually the initial stack pointer. The reset vector comes second.

      http://infocenter.arm.com/help/topic/com.arm.doc.dui0552a/BABIFJFG.html

  3. I've been wanting to do bare metal for the arduino due(ATSAM3X8E) and this article is exactly what I was looking for. This will help immensely.

  4. Thanks for this! I've been dabbling around with ARM development for a while now, but I'm always getting hung up on unnecessary things like getting debugging working and picking the best IDE for the job. This got me ready to just drop all that and try writing some low level code for the TI Stellaris and LPCXpresso.

  5. Hello Andrew, it's a nice blog the one you opened. Do you have a twitter account so I can follow your posts? Thanks!

    1. Hi! Unfortunately I do not use Twitter at this time. You can subscribe to an RSS feed or just bookmark my site and come back once in a while.

      Thanks for the comments!

  6. Just a comment on where you say the character string goes.

    The string "Andrew Rossignol" goes into .rodata but the variable phrase actually goes into .data. This is because it wasn't declared the right kind of const. Looking at the linker map will show you where the variables end up. Below is a demo that will show you this.

    Additionally, if you were to declare phrase as a char[] instead of a char*, the full string would end up in .data: its initial contents are stored in flash and copied into RAM during init, just like any other initialized variable.

    /* Compile with:
    gcc -O0 -g -Wl,-Map=main.map main.c -o main
    */
    #include <stdio.h>

    char* phrase1 = "phrase 1 is in .data";
    char* otherphrase = "where did phrase 1 go?";
    const char* phrase2 = "phrase2 is also in .data";
    char* const phrase3 = "phrase3 is in .rodata";
    const char* const phrase4 = "phrase4 is in .rodata";

    int main( int argc , char** argv ){
        (void)argc;
        (void)argv;

        phrase1 = otherphrase;

        printf( "%s\n" , phrase1 );
        printf( "%s\n" , phrase2 );
        printf( "%s\n" , phrase3 );
        printf( "%s\n" , phrase4 );

        return 0;
    }

  7. the pio_get_register define isn't currently used but probably should not have the return?
