
The ATMega328P From (Almost) Nothing

IDENTIFIER ef5362ed-1b16-4632-92b5-9d2f49526423
TITLE The ATMega328P From (Almost) Nothing
CREATOR Mark Raynsford
DATE 2021-08-28T09:00:00Z
LANGUAGE en
RIGHTS Public Domain
This book describes a process that can be used to configure a factory-fresh ATMega328p IC such that it provides a convenient platform for microcontroller development. The intention is to fully document the process of extracting information from datasheets, configuring components on a breadboard, and writing code in a manner that fosters a real understanding of the underlying hardware. The process uses raw, individually-sourced electronics components, and does not use any third-party software outside of the C compiler used to compile code. The C code is written such that convenient platform "helper" header files are eschewed in favour of writing new definitions based on the contents of the hardware datasheets[1].
The reader is expected to be familiar with the following subjects: C programming (although no particular experience in embedded programming is required), and the basics of plugging components into a breadboard.

Footnotes

1
As opposed to simply copying and pasting Arduino sketches and hoping for the best.
References to this footnote: 1
The following components are required:

2.1.2. BOM

Quantity  Item                         Description
1         Microchip ATMega328p         Microcontroller IC.
1         USB ISP                      In-system programmer with USB connection.
1         FTDI LC234X                  USB ↔ USART adapter.
1         5V breadboard power supply   Breadboard power supply.
2         22 pF ceramic capacitor      Capacitors for clock signals.
1         16 MHz crystal oscillator    16 MHz microcontroller clock.
1         20 mA LED                    Status LED.
1         220 Ω resistor               Resistor to limit the current through the LED.
10+       Breadboard wires             Required to connect components together.
The Microchip ATMega328p is widely available from electronics component stores. Be sure to order the 28 SPDIP package in order to make it easy to insert the chip into a breadboard.
There are many USB ISP programmers available, but the exact model used in this book is the DFRobot USBtinyISP - Arduino Bootloader Programmer.
The FTDI LC234X USB ↔ USART adapter is widely available from electronics components stores. Any USB ↔ USART adapter will be usable and they are largely interchangeable.
The breadboard power supply can be any power supply that can supply 5V to a breadboard.
The first step required in any electronics project is to obtain datasheets for all of the components. Whenever this book makes references to "the datasheet", it is referring to the official datasheet for the ATMega328P. The datasheet is available on the Microchip site. Specifically, this book was written against the DS40002061B datasheet (which contains Microchip branding as opposed to the older Atmel branding).
The first step required is to insert the microcontroller onto the breadboard. In the datasheet, on page 12, we find a diagram of the pin configuration for the dual in-line package form of the chip:
The chip features a small indentation at one end, and a small dot next to Pin 1. It doesn't matter how we orient the chip in the breadboard as long as we understand where Pin 1 is; if we can locate that pin, we can follow the pins counter-clockwise around the chip to locate any other numbered pin. It's common practice to insert dual in-line package chips across the center divider in a given breadboard, as this allows for the row of pins on one side of the chip to be electrically disconnected from the row on the other side.
On page 13, the datasheet indicates that we need to connect the IC to a voltage source, and also to ground. Specifically, it indicates that the Vcc pin must be connected to a voltage source, and also that the AVcc pin "should be externally connected to Vcc, even if the ADC is not used." The datasheet also indicates that the AVcc pin should be connected to the voltage source via some kind of low-pass filter if the ADC functionality in the microcontroller is going to be used [1]. Because the circuit in this example will not be making use of an ADC, this pin can be safely connected directly to the voltage source without using a filter.

4.2.2. Connections

  1. Connect pin 7 to the breadboard 5V rail.
  2. Connect pin 8 to the breadboard ground rail.
  3. Connect pin 22 to the breadboard ground rail.
  4. Connect pin 20 to the breadboard 5V rail.
By default, ATMega328p chips are configured to use an internal 8 MHz oscillator. The chip is factory configured to divide the ticks of the internal oscillator by 8, yielding a configuration that runs the chip at 1 MHz. We, however, want to run the chip at a higher clock speed of 16 MHz, and therefore need to connect an external oscillator crystal.
Page 36 of the datasheet indicates how all of the various system clocks are derived:
Pages 39 and 40 of the datasheet indicate that we can't just connect an oscillator directly to the chip; we must include a pair of capacitors it refers to as C1 and C2. The reason for this, as ever, is noise: If a circuit is very noisy, then small spikes of noise could be misinterpreted as oscillator pulses, yielding a system clock that behaves erratically. The datasheet gives suggested values for ceramic capacitors for a 16 MHz clock, and we're using the highest suggested value of 22 pF. The datasheet indicates that the crystal should be connected across the two clock pins, with each of those pins also connected to ground through one of the capacitors:
Looking back at the pin diagram, we can see that the two clock pins, XTAL1 and XTAL2, are on pins 9 and 10, respectively. Neither the crystal oscillator nor the ceramic capacitors are polarized components, so don't be concerned about inserting them the wrong way round; they can be inserted in any orientation.

4.3.7. Connections

  1. Connect the two pins of the crystal oscillator to pins 9 and 10.
  2. Connect C1 to pin 9 and to the breadboard ground rail.
  3. Connect C2 to pin 10 and to the breadboard ground rail.
At this point, it might be surprising to learn that this is actually all that's required to run the chip. If we were to supply voltage at this point, the microcontroller would power up and would begin executing code. The problem, obviously, is that the microcontroller doesn't contain any code. The next step, therefore, is to set up the programmer required to actually get code onto the chip.

Footnotes

1
A low-pass filter is required in order to reduce any noise that may be introduced into the circuit from the voltage source.
References to this footnote: 1
By default, ATMega328p chips contain no code. It's therefore required to use some sort of hardware tool to actually get code onto the chip. The ATMega328p contains 32KBytes of programmable flash memory, and it's this flash memory that will contain the code we want to execute. According to the datasheet, there are multiple methods that can be used to write to the flash memory. However, because we're trying to put together a system upon which we can experiment and develop, we in particular don't want to have to use something where we have to continuously unplug and replug the chip into some kind of external programmer tool every time we want to try a new version of the code we're writing. We, ideally, want something we can leave plugged in on the breadboard, if possible. We also don't want to have to use a system that consumes all of the pins on the microcontroller; if we're going to leave whatever device we end up using plugged in, then we need to have at least some pins left over for our own applications! Additionally, we don't want to have to use any kind of proprietary software to program the chip. The ideal solution for our needs is to have some kind of device where one end of the device is plugged into an ordinary workstation, and the other end is connected to the breadboard, and we can execute a simple command-line tool to upload code to the microcontroller.
In the datasheet, on page 303 in the "Memory Programming" chapter, we find the following paragraph:
"Both the Flash and EEPROM memory arrays can be programmed using the serial SPI bus while RESET is pulled to GND. The serial interface consists of pins SCK, MOSI (input) and MISO (output). After RESET is set low, the Programming Enable instruction needs to be executed first before program/erase operations can be executed."
In other words, if we can connect together some sort of device that is capable of setting the microcontroller's RESET pin low, and then sending a series of programming commands over the wire using the Serial Peripheral Interface protocol, then this will be sufficient to get code onto the chip. This would appear to be the best choice for our needs as the SPI protocol only requires three pins to function.
The device we need is a USB in-system programmer (ISP). There are numerous inexpensive USB ISPs available as they're trivial to manufacture and numerous open source firmware distributions exist. Some people choose to use an existing Arduino board to act as an ISP. Alternatively, there are DIY kits available to self-build an ISP programmer. There are also pre-assembled devices available. For this book, the assumption is that the reader will be using the USBtinyISP programmer. The instructions differ only slightly for different programmers, and attempts are made to indicate where this may occur.
As mentioned earlier, we need to locate the correct pins on the microcontroller to which to connect the pins on the USB ISP. Typically, if one was using an existing manufactured development board, there would be a six-pin connector already connected to the board, with the pins already wired to the correct pins on the microcontroller. Such a connector might look like this:
This kind of connector is typically attached to a ribbon cable that may have been supplied with the ISP. The arrangement of the pins in the connector is specified in the AVRISP User Guide, and all ISP programmers tend to be compatible with this. The pins on the connector are described in the following diagram taken from the guide:
One needlessly frustrating aspect of this diagram is that although the pins are numbered, the diagram gives no indication as to the orientation of the connector. A pair of diagrams taken from avrfreaks indicates that pin 3 is closest to the open side of the connector:
We need to connect the MOSI, MISO, SCK, and RESET pins on the ISP to the corresponding pins on the microcontroller. We do not need to connect the Vcc or GND pins as we will be relying on a breadboard power supply to power the entire board. As such, this would also be a good time to connect a power supply to the breadboard.
Looking back at the pin diagram for the microcontroller, we can easily find the MOSI, MISO, SCK, and RESET pins.

5.2.11. Connections

  1. Connect the breadboard power supply to the breadboard.
  2. Connect the MOSI pin of the ISP to pin 17 on the microcontroller.
  3. Connect the MISO pin of the ISP to pin 18 on the microcontroller.
  4. Connect the SCK pin of the ISP to pin 19 on the microcontroller.
  5. Connect the RESET pin of the ISP to pin 1 on the microcontroller.
All AVR microcontrollers have so-called fuse bits that can be programmed using an ISP. The fuse bits are essentially software configuration bits that can be used to control, for example, whether the chip will use an internal or external oscillator, or whether EEPROM values will be preserved across a chip erase operation, or a large number of other configuration values. Fuse bits are stored in some kind of non-volatile memory inside the controller, and the values set will survive indefinitely without a power supply. A good test to determine whether we've wired up the chip correctly is to use an open-source command-line tool to attempt to read the current values of the fuse bits using the ISP.
The command-line tool we'll be using is avrdude [1]. Most Linux distributions and the BSDs come with precompiled packages of avrdude, so install it using whatever mechanism is appropriate for your system. The avrdude tool knows how to send all of the commands necessary to program an AVR microcontroller using a wide range of different programmers. The tool refers to the USBtinyISP programmer we're using as usbtiny, but you should be able to find whatever programmer you're using in the list of programmers that can be viewed by executing avrdude -c ?:

5.3.3. Supported Programmers

$ avrdude -c ?

Valid programmers are:
  2232HIO          = FT2232H based generic programmer
  4232h            = FT4232H based generic programmer
  89isp            = Atmel at89isp cable
  abcmini          = ABCmini Board, aka Dick Smith HOTCHIP
  alf              = Nightshade ALF-PgmAVR, http://nightshade.homeip.net/
  arduino          = Arduino
  arduino-ft232r   = Arduino: FT232R connected to ISP
  atisp            = AT-ISP V1.1 programming cable for AVR-SDK1 from <http://micro-research.co.th/>
  atmelice         = Atmel-ICE (ARM/AVR) in JTAG mode
  atmelice_dw      = Atmel-ICE (ARM/AVR) in debugWIRE mode
  atmelice_isp     = Atmel-ICE (ARM/AVR) in ISP mode
  atmelice_pdi     = Atmel-ICE (ARM/AVR) in PDI mode
  avr109           = Atmel AppNote AVR109 Boot Loader
  avr910           = Atmel Low Cost Serial Programmer
  avr911           = Atmel AppNote AVR911 AVROSP
  avrftdi          = FT2232D based generic programmer
  avrisp           = Atmel AVR ISP
  avrisp2          = Atmel AVR ISP mkII
  avrispmkII       = Atmel AVR ISP mkII
  avrispv2         = Atmel AVR ISP V2
  bascom           = Bascom SAMPLE programming cable
  blaster          = Altera ByteBlaster
  bsd              = Brian Dean's Programmer, http://www.bsdhome.com/avrdude/
  buspirate        = The Bus Pirate
  buspirate_bb     = The Bus Pirate (bitbang interface, supports TPI)
  butterfly        = Atmel Butterfly Development Board
  butterfly_mk     = Mikrokopter.de Butterfly
  bwmega           = BitWizard ftdi_atmega builtin programmer
  C232HM           = FT232H based module from FTDI and Glyn.com.au
  c2n232i          = serial port banging, reset=dtr sck=!rts mosi=!txd miso=!cts
  dapa             = Direct AVR Parallel Access cable
  dasa             = serial port banging, reset=rts sck=dtr mosi=txd miso=cts
  dasa3            = serial port banging, reset=!dtr sck=rts mosi=txd miso=cts
  diecimila        = alias for arduino-ft232r
  dragon_dw        = Atmel AVR Dragon in debugWire mode
  dragon_hvsp      = Atmel AVR Dragon in HVSP mode
  dragon_isp       = Atmel AVR Dragon in ISP mode
  dragon_jtag      = Atmel AVR Dragon in JTAG mode
  dragon_pdi       = Atmel AVR Dragon in PDI mode
  dragon_pp        = Atmel AVR Dragon in PP mode
  dt006            = Dontronics DT006
  ere-isp-avr      = ERE ISP-AVR <http://www.ere.co.th/download/sch050713.pdf>
  flip1            = FLIP USB DFU protocol version 1 (doc7618)
  flip2            = FLIP USB DFU protocol version 2 (AVR4023)
  frank-stk200     = Frank STK200
  ft232r           = FT232R Synchronous BitBang
  ft245r           = FT245R Synchronous BitBang
  futurlec         = Futurlec.com programming cable.
  jtag1            = Atmel JTAG ICE (mkI)
  jtag1slow        = Atmel JTAG ICE (mkI)
  jtag2            = Atmel JTAG ICE mkII
  jtag2avr32       = Atmel JTAG ICE mkII im AVR32 mode
  jtag2dw          = Atmel JTAG ICE mkII in debugWire mode
  jtag2fast        = Atmel JTAG ICE mkII
  jtag2isp         = Atmel JTAG ICE mkII in ISP mode
  jtag2pdi         = Atmel JTAG ICE mkII PDI mode
  jtag2slow        = Atmel JTAG ICE mkII
  jtag3            = Atmel AVR JTAGICE3 in JTAG mode
  jtag3dw          = Atmel AVR JTAGICE3 in debugWIRE mode
  jtag3isp         = Atmel AVR JTAGICE3 in ISP mode
  jtag3pdi         = Atmel AVR JTAGICE3 in PDI mode
  jtagkey          = Amontec JTAGKey, JTAGKey-Tiny and JTAGKey2
  jtagmkI          = Atmel JTAG ICE (mkI)
  jtagmkII         = Atmel JTAG ICE mkII
  jtagmkII_avr32   = Atmel JTAG ICE mkII im AVR32 mode
  lm3s811          = Luminary Micro LM3S811 Eval Board (Rev. A)
  mib510           = Crossbow MIB510 programming board
  mkbutterfly      = Mikrokopter.de Butterfly
  nibobee          = NIBObee
  o-link           = O-Link, OpenJTAG from www.100ask.net
  openmoko         = Openmoko debug board (v3)
  pavr             = Jason Kyle's pAVR Serial Programmer
  pickit2          = MicroChip's PICkit2 Programmer
  picoweb          = Picoweb Programming Cable, http://www.picoweb.net/
  pony-stk200      = Pony Prog STK200
  ponyser          = design ponyprog serial, reset=!txd sck=rts mosi=dtr miso=cts
  siprog           = Lancos SI-Prog <http://www.lancos.com/siprogsch.html>
  sp12             = Steve Bolt's Programmer
  stk200           = STK200
  stk500           = Atmel STK500
  stk500hvsp       = Atmel STK500 V2 in high-voltage serial programming mode
  stk500pp         = Atmel STK500 V2 in parallel programming mode
  stk500v1         = Atmel STK500 Version 1.x firmware
  stk500v2         = Atmel STK500 Version 2.x firmware
  stk600           = Atmel STK600
  stk600hvsp       = Atmel STK600 in high-voltage serial programming mode
  stk600pp         = Atmel STK600 in parallel programming mode
  ttl232r          = FTDI TTL232R-5V with ICSP adapter
  tumpa            = TIAO USB Multi-Protocol Adapter
  UM232H           = FT232H based module from FTDI and Glyn.com.au
  uncompatino      = uncompatino with all pairs of pins shorted
  usbasp           = USBasp, http://www.fischl.de/usbasp/
  usbasp-clone     = Any usbasp clone with correct VID/PID
  usbtiny          = USBtiny simple USB programmer, http://www.ladyada.net/make/usbtinyisp/
  wiring           = Wiring
  xil              = Xilinx JTAG cable
  xplainedmini     = Atmel AVR XplainedMini in ISP mode
  xplainedmini_dw  = Atmel AVR XplainedMini in debugWIRE mode
  xplainedpro      = Atmel AVR XplainedPro in JTAG mode
We need to tell avrdude what kind of microcontroller we're using so that it knows what programming commands to send. In this case, we're using an atmega328p. We can now put all of this together and execute a command to read the current fuse bit values on the microcontroller. Ensure that your programmer is connected, and that the breadboard power supply is switched on, and execute the following command:

5.3.5. Reading The Fuse Bits

$ avrdude -c usbtiny -p atmega328p
avrdude: AVR device initialized and ready to accept instructions
Reading | ################################################## | 100% 0.01s
avrdude: Device signature = 0x1e950f (probably m328p)
avrdude: safemode: Fuses OK (E:FF, H:D9, L:62)
avrdude done.  Thank you.
If we take a look at the datasheet, we can see that the fuse bits are divided up into extended, high, and low fuse bits. The default values for the fuse bits are given on pages 292 and 293 (the table for the extended fuse is not reproduced here for reasons of brevity):
We can see that the default values for the extended, high, and low fuses are 0b11111111 (0xff), 0b11011001 (0xd9), and 0b01100010 (0x62), respectively. These match the values returned by avrdude, and therefore we can be confident that both the programmer and the chip are working correctly.
As mentioned earlier, the ATMega328p chip is configured by default to use an internal 8 MHz oscillator effectively reduced to 1 MHz. We've connected an external 16 MHz oscillator crystal and therefore we need to set some fuse bits in order to tell the microcontroller to actually use it. The information on this in the datasheet is, to say the least, arduous. Working through the "System Clock and Clock Options" section of the datasheet, starting on page 36, our first task is to disable the clock divider so that our 16 MHz external clock isn't reduced to 2 MHz. We quickly reach the following paragraph:
"The device is shipped with internal RC oscillator at 8.0MHz and with the fuse CKDIV8 programmed, resulting in 1.0MHz system clock. The startup time is set to maximum and time-out period enabled. (CKSEL = "0010", SUT = "10", CKDIV8 = "0"). The default setting ensures that all users can make their desired clock source setting using any available programming interface."
Therefore, we need to set the CKDIV8 bit to 1. Somewhat counter-intuitively, fuse bits are considered "enabled" or "programmed" if they are set to 0, and "unprogrammed" or "disabled" if they are set to 1. Searching the datasheet for CKDIV8 eventually leads us back to page 292 where we can see that bit 7 of the low fuse is the CKDIV8 bit and is set to 0 (enabled) by default. Fuse values cannot be programmed one bit at a time; it's necessary to set all eight bits of a given fuse in a single operation. We therefore need to know what all of the bits are going to be before we can set them. We at least now know that our final low fuse value will have to be 0b1???????, where ? indicates not-yet-known values.
Next, we need to tell the microcontroller to actually use our external clock. Looking back at page 37 of the datasheet, we can see that there are four "clock select" bits named CKSEL0, CKSEL1, CKSEL2, and CKSEL3. We happen to be using a 16 MHz low power crystal oscillator, and we can see from the table that we should set CKSEL1..3 to 0b111:
We're also required to set the start-up time. This is required because some types of oscillators and clocks take longer than other types to stabilize and output a consistent pulse rate. There appear to be no downsides to picking the most conservative (highest) values, which in this case means a 14 tick + 65 millisecond delay when resetting the microcontroller. We can see from the table that we need to set the two "start up time" bits, SUT0 and SUT1, and the remaining CKSEL0 bit to 0b11 and 0b1, respectively.
Putting all of this together, we have CKSEL0..3 = 0b1111, SUT0..1 = 0b11, CKDIV8 = 0b1, which leaves only one remaining bit in the low fuse value: CKOUT. The purpose of the CKOUT bit is to instruct the microcontroller to output the clock pulse it is receiving on a separate pin. We have no use for this, and therefore CKOUT = 0b1. This, somewhat anti-climactically given the amount of datasheet scanning it took to get here, means that our resulting low fuse value will be 0b11111111. We can instruct avrdude to set this value:

5.4.9. Setting The Low Fuse

$ avrdude -c usbtiny -p atmega328p -U lfuse:w:0xff:m
avrdude: AVR device initialized and ready to accept instructions
Reading | ################################################## | 100% 0.01s

avrdude: Device signature = 0x1e950f (probably m328p)
avrdude: reading input file "0xff"
avrdude: writing lfuse (1 bytes):
Writing | ################################################## | 100% 0.00s

avrdude: 1 bytes of lfuse written
avrdude: verifying lfuse memory against 0xff:
avrdude: load data lfuse data from input file 0xff:
avrdude: input file 0xff contains 1 bytes
avrdude: reading on-chip lfuse data:
Reading | ################################################## | 100% 0.00s

avrdude: verifying ...
avrdude: 1 bytes of lfuse verified
avrdude: safemode: Fuses OK (E:FF, H:D9, L:FF)

avrdude done.  Thank you.
The -U lfuse:w:0xff:m option specifies that we want to perform a write operation (w) on the low fuse value (lfuse), and we want to specify an immediate value (m) of 0xff. It's also possible to read values from files if the m flag is not used.

Footnotes

1
"AVR Downloader/UploaDEr" for the morbidly curious.
References to this footnote: 1
The canonical "hello world" program for microcontrollers is commonly known as Blink. The program takes many forms, but generally amounts to this:

6.1.2. Blink

  1. Turn on an LED.
  2. Wait a second or so.
  3. Turn off the LED.
  4. Go to step 1.
We're going to write increasingly sophisticated versions of Blink, referencing the datasheets for all necessary information. The first version will be written in AVR assembler (but assembled using the free GCC compiler), and we'll proceed to the C version afterwards. Assembly language examples are written using the GNU Assembler.
The AVR architecture used in the ATMega328p is slightly atypical in that it is an unmodified Harvard architecture with completely separate address spaces for code and data. C/C++ programmers are, at the time of writing, accustomed to having code and data live within the same address space. This is important to mention because it's often necessary to specify the addresses of functions and objects when programming microcontrollers, and it's critical to understand that address 0x0000 in program space is not the same as address 0x0000 in data space! The AVR actually exposes different instructions to read from and write to locations in program space as opposed to data space. This can have practical consequences when programming in C on this architecture, because although the ATMega328p has 32 kilobytes of code space, it only has 2 kilobytes of data space. A programmer who declares a variable of type const unsigned char[2048] might be dismayed to realize that they've just consumed the entirety of the working memory on the system, despite the fact that the variable is const and could fit comfortably into the 32 kilobytes of code space. In order for a const variable to be placed into code space, it's necessary to use Named Address Spaces from the current N1275 Embedded C draft standard. Named address spaces are available as a GCC extension, and we'll be revisiting them later on. The AVR microcontrollers actually use many different address spaces, but we'll only be seeing a few of them in this book.
Throughout this document, the following notation will be used:

6.2.3. Address Spaces

  • code@0xNNNN denotes address 0xNNNN in code space.
  • data@0xNNNN denotes address 0xNNNN in data space.
  • io@0xNNNN denotes address 0xNNNN in I/O space.
As soon as the microcontroller powers on, it begins executing code from a location specified by the fuse bit BOOTRST. The default configuration for the ATMega328p specifies that BOOTRST = 1 (unprogrammed; it is bit 0 of the default high fuse value 0xd9), and page 276 of the datasheet indicates that this means that execution will start at program address code@0x0000 when the microcontroller is powered on. We won't be changing this default setting.
The code that lives at code@0x0000 represents the interrupt table. The interrupt table is a 26-element array where each element consists of two instruction words. The instructions will, in practice, almost always perform an unconditional jump to some subroutine in memory.
The full table of interrupts is described on page 74 of the datasheet:
The first element of the interrupt table is executed in response to a RESET interrupt, and therefore the instructions in this element of the table will always be the first instructions executed when the microcontroller powers on. Because, in this example, we aren't planning to use interrupts at all, we can actually provide a very simple interrupt table that simply jumps to a function we provide called __avr_setup:

6.3.6. AVR Interrupt Table

.text

.global __avr_setup
.global __avr_interrupt_vectors
.global __avr_unexpected_interrupt

.org 0x0000

__avr_interrupt_vectors:
  jmp __avr_setup
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt

__avr_unexpected_interrupt:
  jmp __avr_interrupt_vectors
Essentially, the first entry of the __avr_interrupt_vectors table performs an unconditional jump to a not-yet-defined function called __avr_setup. Every other entry in the table first jumps to __avr_unexpected_interrupt, and then jumps back to the first entry of the __avr_interrupt_vectors table. In effect, this causes any interrupt raised to cause the microcontroller to behave as if it had been reset. The intermediate __avr_unexpected_interrupt function purely exists to assist with debugging; the user can set a breakpoint on the __avr_unexpected_interrupt function to be notified whenever the program receives an interrupt it wasn't expecting.
Now that interrupts have been configured, and the execution path of the code leads to the __avr_setup function, it's time to actually define that function. The function has three responsibilities:

6.4.2. Setup Actions

  1. The function must clear the AVR status register.
  2. The function must set up the AVR stack pointer.
  3. The function must begin executing the programmer's application code.
The AVR status register (referred to as SREG in the datasheet) is a register that enables and disables interrupts, and provides information about the most recently executed arithmetic operation such as indicating overflows, carries, and so on. The status register should be manually cleared on startup in order to ensure that the microcontroller has a clean slate with regards to execution state. The status register is accessible at address io@0x003f, and can be assigned using the special instruction out, which can store a value in the address range dedicated to the microcontroller's I/O registers.
The AVR stack pointer points to the top of the execution stack. As with most architectures, the AVR execution stack grows downwards from higher addresses to lower addresses. The AVR push instruction pushes data onto the stack, which results in the stack pointer being decremented by 1. In other words, if the stack pointer is currently pointing at address data@0x03ff, and a push instruction is executed, the stack pointer will now be pointing at address data@0x03fe. The AVR pop instruction is the exact inverse; the stack pointer will be incremented. It's necessary, on startup of the microcontroller, for the programmer to initialize the stack pointer to a sensible value. In practice, this value is always the address of the top of the SRAM. On the ATMega328p, the datasheet indicates on page 28 that the internal SRAM ranges from data@0x0100 to data@0x08ff inclusive. The 16-bit stack pointer is implemented as a pair of 8-bit registers, with the high 8 bits at io@0x003e and the low 8 bits at io@0x003d.
Given all of this information, we can now write the __avr_setup function:

6.4.6. AVR Setup

__avr_stack_pointer_h = 0x3e
__avr_stack_pointer_l = 0x3d
__avr_status_register = 0x3f

__avr_setup:
  # Clear the status register
  ldi r16,0x0
  out __avr_status_register,r16

  # Configure the stack pointer to start at 0x08ff
  ldi r16,0x08
  out __avr_stack_pointer_h,r16
  ldi r16,0xff
  out __avr_stack_pointer_l,r16

  # Execute main
  call main
The function loads 0 into register r16, and then stores the value of r16 into I/O location 0x3f. This clears the status register.
The function then loads 0x08 into register r16, and stores the value of r16 into I/O location 0x3e. It immediately follows by storing 0xff into I/O location 0x3d. This has the effect of setting the stack pointer to 0x08ff - the top of SRAM.
The function then calls a yet-to-be-defined function called main that will eventually contain our Blink code.
In order to implement Blink, we first need to connect an LED that we can turn on and off from the microcontroller. On page 13 of the datasheet, we can see numerous ports listed that span sets of pins. A port is, essentially, an I/O register: Setting a single bit in the register to 1 will set the corresponding pin voltage high, whilst setting the same bit to 0 will set the corresponding pin voltage low. The first port listed on page 13 is Port B, and we can see that this consists of eight pins/bits named PB0 to PB7. If we try to find PB0 on the pin configuration diagram on page 12, we'll find that PB0 is associated with pin 14, on the bottom left corner of the IC. We can see that pin 14 can also be configured to provide a number of different functions, such as CLKO (the pin used for the clock output we briefly encountered earlier). We can search the datasheet for the definitions of all of these things but, as we don't need them and they're all disabled by default, it's clear that we can safely use pin 14 to control our LED.
We can see on page 323 of the datasheet that the minimum output voltage any given ATMega328p will provide on an output pin when the pin is set high is 4.2V. Connected directly to a pin at this voltage, the LED specified in the bill of materials would draw more current than it is rated for and would be damaged, so we need to connect it in series with a 220Ω resistor to limit the current. Note that LEDs are polarized components and therefore must be connected in the correct orientation. The negative side or cathode of the LED must be connected to ground. LED components always provide some way to indicate orientation, either by making the positive or anode leg of the LED longer, or by angling the internal leadframe such that it points towards the positive side[1]:

6.5.4. Connections

  1. Connect a 220Ω resistor to pin 14 on the microcontroller.
  2. Connect an LED to the resistor and to ground.
Note that, in the image above, the anode leg of the LED has been soldered to the resistor rather than both the resistor and the anode leg being plugged into the breadboard. This is purely for convenience and isn't necessary for the circuit to function.
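To see why a 220Ω resistor is a reasonable choice, we can estimate the current using Ohm's law: the current is the voltage across the resistor divided by its resistance. The following is a minimal sketch intended to be run on a workstation, not on the microcontroller. The function name led_current_microamps is hypothetical, and the LED forward voltage is an assumption that should be checked against the datasheet of the actual LED used.

```c
#include <stdint.h>

/*
 * Estimate the LED current in microamps. The pin voltage and the
 * LED forward voltage are given in millivolts, and the series
 * resistance is given in ohms. The forward voltage is an assumed
 * figure; take the real value from the LED's own datasheet.
 *
 * Returns 0 if the inputs do not describe a conducting LED.
 */
uint32_t
led_current_microamps (uint32_t pin_mv, uint32_t forward_mv, uint32_t ohms)
{
  if (ohms == 0 || pin_mv <= forward_mv) {
    return 0;
  }

  /* (millivolts * 1000) / ohms = microamps */
  return ((pin_mv - forward_mv) * 1000u) / ohms;
}
```

With an assumed forward voltage of 2.0V, led_current_microamps(4200, 2000, 220) evaluates to 10000, or 10mA, which is below the 20mA rating of the LED in the bill of materials.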
Now that we have an LED connected, it's time to write the actual function that will turn the LED on and off at a fixed interval. We know that the LED is connected to Port B, pin PB0, and page 84 of the datasheet tells us that there are three I/O memory address locations associated with any given port: A DDR register that controls whether a pin is used as an input or an output, a PORT register that allows for setting a pin high or low when the pin is used as an output, and a PIN register that is used to read the value of a pin when the pin is used as an input. We can largely ignore the PIN register as we're only concerned with output at the moment. The datasheet, over the next few pages, describes the method to use for reading or writing pins. Reduced to the essentials: We need to set PB0 as an output pin by setting bit 0 of the DDRB register to 1, and then we can set the pin high or low by setting bit 0 of the PORTB register to 1 or 0, respectively. We can jump to the complete summary of all registers on the microcontroller on page 624 to determine the I/O locations of these registers:
We can see that DDRB is at io@0x04 and PORTB is at io@0x05 [2]. All of this amounts to the following assembler instructions:

6.6.4. Setting LED On/Off

__DDRB = 0x04
__PORTB = 0x05

  # Set the PB0 pin as an output pin
  ldi r16,1
  out __DDRB,r16

  # Set the PB0 pin high
  ldi r16,1
  out __PORTB,r16

  # Set the PB0 pin low
  ldi r16,0
  out __PORTB,r16
Note that, by assigning to the DDRB and PORTB registers like this, we're actually setting all eight bits of each register. In a more complex program that used a mix of input and output pins in each port, we would want to carefully shift and mask bits in order to avoid disturbing the existing values in the registers. In our Blink circuit, however, all of the other pins in Port B are unused, so we can recklessly assign them without worrying.
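As a rough illustration of the shifting and masking mentioned above, the following sketch shows one way to change a single bit of a register whilst leaving the other seven bits untouched. The function names register_set_bit and register_clear_bit are hypothetical and are not used elsewhere in this book. The functions take a pointer so that they can be tried on a workstation against an ordinary variable.

```c
#include <stdint.h>

/*
 * Set a single bit (0 to 7) in an 8-bit register, leaving all of
 * the other bits at their existing values.
 */
void
register_set_bit (volatile uint8_t *reg, uint8_t bit)
{
  *reg = (uint8_t) (*reg | (1u << bit));
}

/*
 * Clear a single bit (0 to 7) in an 8-bit register, leaving all of
 * the other bits at their existing values.
 */
void
register_clear_bit (volatile uint8_t *reg, uint8_t bit)
{
  *reg = (uint8_t) (*reg & ~(1u << bit));
}
```

Each function performs a read of the register followed by a write, so the existing values of the other bits are preserved.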
We still have one piece of the puzzle remaining: We need to wait for a fixed interval before turning the LED on and/or off, otherwise the microcontroller will flash the LED on and off too quickly for any human to perceive. At this point, the simplest way we can achieve this is to waste CPU time executing instructions that otherwise do nothing. In other words, we want the moral equivalent of this C function:

6.6.7. Spin Uselessly

void pause (void) {
  for (volatile long index = 0; index < 1000000; ++index) {
    // Do nothing
  }
}
Now, given that we only have 8-bit registers to work with, the simplest way to implement a loop like this is to implement three nested loops that each count to 100:

6.6.9. Spin Uselessly In 8 Bits

void pause (void) {
  for (volatile uint8_t z = 0; z < 100; ++z) {
    for (volatile uint8_t y = 0; y < 100; ++y) {
      for (volatile uint8_t x = 0; x < 100; ++x) {
        // Do nothing
      }
    }
  }
}
In AVR assembler, the pause function looks like this:

6.6.11. Spin Uselessly In Assembler

pause:
  ldi r16,0
.LZ:
  cpi r16,100
  breq .LZend
  inc r16
  ldi r17,0
.LY:
  cpi r17,100
  breq .LYend
  inc r17
  ldi r18,0
.LX:
  cpi r18,100
  breq .LXend
  inc r18
  jmp .LX
.LXend:
  jmp .LY
.LYend:
  jmp .LZ
.LZend:
  ret
It's not too critical to understand how this code achieves the above loop, although the implementation is straightforward given the definitions in the AVR instruction set manual. We will, in later revisions of Blink, be replacing this code with code that uses the hardware timers for accurate delays.
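For the curious, we can estimate how long a pause implemented in this way takes by adding up instruction cycles. The following is a hedged sketch intended to be run on a workstation. It assumes that each of the three loops runs 100 times, as in the C version, and it assumes the cycle counts given in the enum, which should be checked against the AVR instruction set manual. All of the names used are hypothetical.

```c
#include <stdint.h>

/* Assumed cycle costs; verify against the AVR instruction set manual. */
enum {
  CYCLES_LDI = 1,
  CYCLES_CPI = 1,
  CYCLES_INC = 1,
  CYCLES_JMP = 3,
  CYCLES_RET = 4,
  CYCLES_BREQ_NOT_TAKEN = 1,
  CYCLES_BREQ_TAKEN = 2
};

/*
 * Cycles consumed by one counting loop of the form used in pause:
 * cpi, breq, inc, some extra work, then jmp back to the cpi. The
 * last comparison takes the branch and leaves the loop.
 */
static uint32_t
loop_cycles (uint32_t iterations, uint32_t body)
{
  const uint32_t per_iteration =
    CYCLES_CPI + CYCLES_BREQ_NOT_TAKEN + CYCLES_INC + body + CYCLES_JMP;
  const uint32_t final_check = CYCLES_CPI + CYCLES_BREQ_TAKEN;
  return (iterations * per_iteration) + final_check;
}

/* Estimated cycles for one call to pause, excluding the call itself. */
uint32_t
pause_cycles (void)
{
  const uint32_t x = loop_cycles (100, 0);
  const uint32_t y = loop_cycles (100, CYCLES_LDI + x);
  const uint32_t z = loop_cycles (100, CYCLES_LDI + y);
  return CYCLES_LDI + z + CYCLES_RET;
}

/* Estimated duration of one pause in whole milliseconds. */
uint32_t
pause_milliseconds (uint32_t cpu_clock_hz)
{
  return (uint32_t) (((uint64_t) pause_cycles () * 1000u) / cpu_clock_hz);
}
```

Under these assumptions, a single pause costs a little over six million cycles, or roughly 0.38 seconds with a 16mhz clock. The estimate also makes the weakness of this approach obvious: The duration changes whenever the clock speed or the generated instructions change.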
Our complete Blink program in assembler now looks like this:

6.6.14. Blink In Assembler

.text

.global __avr_setup
.global __avr_interrupt_vectors
.global __avr_unexpected_interrupt
.global main

.org 0x0000

__avr_interrupt_vectors:
  jmp __avr_setup
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt
  jmp __avr_unexpected_interrupt

__avr_unexpected_interrupt:
  jmp __avr_interrupt_vectors

__avr_stack_pointer_h = 0x3e
__avr_stack_pointer_l = 0x3d
__avr_status_register = 0x3f

__avr_setup:
  # Clear the status register
  ldi r16,0x0
  out __avr_status_register,r16

  # Configure the stack pointer to start at 0x08ff
  ldi r16,0x08
  out __avr_stack_pointer_h,r16
  ldi r16,0xff
  out __avr_stack_pointer_l,r16

  # Execute main
  call main

pause:
  ldi r16,0
.LZ:
  cpi r16,100
  breq .LZend
  inc r16
  ldi r17,0
.LY:
  cpi r17,100
  breq .LYend
  inc r17
  ldi r18,0
.LX:
  cpi r18,100
  breq .LXend
  inc r18
  jmp .LX
.LXend:
  jmp .LY
.LYend:
  jmp .LZ
.LZend:
  ret

__DDRB = 0x04
__PORTB = 0x05

main:
  ldi r16,1
  out __DDRB,r16
.LED_ON:
  ldi r16,1
  out __PORTB,r16
  call pause
  ldi r16,0
  out __PORTB,r16
  call pause
  jmp .LED_ON
With that code placed in a file named blink.s, we can compile the code using avr-gcc and produce an ihex file that can be flashed directly to the ATMega328p using avrdude:

6.6.16. Compiling Blink Assembler

# Compile the code
$ avr-gcc -nostartfiles -nodefaultlibs -nolibc -nostdlib -ffreestanding -mmcu=atmega328p -o blink blink.s

# Convert the executable to ihex format
$ avr-objcopy -j .text -j .data -O ihex blink blink.hex

# Flash the code to the ATMega328p
$ avrdude -p atmega328p -c usbtiny -U flash:w:blink.hex:i
The first command compiles the code using avr-gcc. We're required to specify that we're compiling for the ATMega328p so that the compiler doesn't produce any instructions that the ATMega328p does not support[3]. We're required to specify all of the -nostartfiles -nodefaultlibs -nolibc -nostdlib -ffreestanding options because gcc is, after all, a C compiler, and by default it will try to insert its own version of all of the AVR platform-specific startup code that we've already written ourselves in assembler. See the GCC manual for the definitions of these options; they largely amount to saying "I've written all of the setup code myself so don't try to generate anything for me".
The second command uses avr-objcopy to copy the .text and .data sections from the resulting blink executable, and to convert the result into ihex format. This is the input format that avrdude is typically configured to use in order to flash code to the microcontroller. In our case, the .data section is actually empty because our assembler program doesn't declare any variables in memory.
The final command flashes the code contained in blink.hex to the microcontroller. We indicate that we want to write (w), to flash memory (flash), the file blink.hex, and the input format is ihex (i). Note that avrdude erases the contents of flash memory, writes the specified data, and then reads it back from flash memory and verifies that the data was written as expected. Chips have a limited number of times (typically in the tens of thousands) that they can be written before the flash memory begins to exhibit errors, so this verification step is critical. As soon as the flash operation has completed, the microcontroller will be reset, and you should be presented with an LED blinking at a rate of about twice per second.

6.6.20. Flashing Blink Assembler

$ avrdude -p atmega328p -c usbtiny -U flash:w:blink.hex:i
avrdude: AVR device initialized and ready to accept instructions
Reading | ################################################## | 100% 0.01s

avrdude: Device signature = 0x1e950f (probably m328p)
avrdude: NOTE: "flash" memory has been specified, an erase cycle will be performed
         To disable this feature, specify the -D option.
avrdude: erasing chip
avrdude: reading input file "blink.hex"
avrdude: writing flash (186 bytes):
Writing | ################################################## | 100% 0.64s

avrdude: 186 bytes of flash written
avrdude: verifying flash memory against blink.hex:
avrdude: load data flash data from input file blink.hex:
avrdude: input file blink.hex contains 186 bytes
avrdude: reading on-chip flash data:
Reading | ################################################## | 100% 0.46s

avrdude: verifying ...
avrdude: 186 bytes of flash verified
avrdude: safemode: Fuses OK (E:FF, H:D9, L:FF)
avrdude done.  Thank you.

Footnotes

1
Do not be concerned about damaging the LED by connecting it in the wrong orientation. Most LEDs will not be damaged, but will simply fail to light up when connected in the wrong orientation.
References to this footnote: 1
2
The datasheet seems to assume that people will magically know that the values in parentheses are data address space locations, and the values to the left of those are I/O space locations.
References to this footnote: 1
3
Given that our input is a single assembler file, the compiler won't actually be generating any new instructions. We're required to specify the microcontroller regardless.
References to this footnote: 1
In the previous section, we put together a working Blink program using assembler. However, the program has a number of limitations:

7.1.2. Limitations

  • The program requires a lot of setup code. Roughly 50% of the actual assembler text has nothing to do with Blink.
  • The program doesn't use any kind of accurate timer for pausing between toggling the LED on and off. It simply wastes execution time in a manner that's highly dependent on the microcontroller clock speed.
  • The Blink program is usually accompanied by some kind of text debugging output that can be observed on a serial console or some similar connection when the program is running. Our Blink program doesn't do any of this.
Largely, the first problem can be eliminated by writing the program in C and allowing the compiler to generate its own platform initialization code. Essentially, we'll allow the compiler to generate __avr_interrupt_vectors and __avr_setup for us. We can inspect the generated code as a learning exercise to see if there's anything in the compiler-generated version that differs from our own.
The second problem can be eliminated by using the dedicated timer hardware present on the microcontroller.
The third problem can be eliminated by using the USART hardware included on the microcontroller to provide output that can be observed using a serial console.
We'll fix each problem one at a time, yielding a final program that has accurate timing and produces debugging output.
Rewriting our original pause function in C is trivial, as we actually specified it in C originally and then wrote the assembler version. However, rewriting the code that actually toggles the LED is somewhat more difficult, because we don't have direct access to the out instruction required to write to addresses in I/O space.
Thankfully, there's a solution to this. Page 30 of the datasheet has this to say:
"When using the I/O specific commands IN and OUT, the I/O addresses 0x00 - 0x3F must be used. When addressing I/O Registers as data space using LD and ST instructions, 0x20 must be added to these addresses."
What the data sheet is implicitly stating is that the registers in I/O space are also accessible in data space at addresses 32 bytes higher. This can be observed directly if we turn once again to the register summary on page 624: We can see that, for example, the address of PORTB is io@0x05 and data@0x25. This means that we can actually access these registers using volatile uint8_t pointers in C. The pointers must be volatile because reading or writing to the target addresses produces I/O effects and the compiler must not be allowed to omit or reorder those operations for the purposes of optimization.
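The translation between the two address spaces is simple enough to express as a one-line function. The following sketch is purely illustrative and is intended to be run on a workstation. The names io_to_data_address and IO_TO_DATA_OFFSET are hypothetical, and the function is only meaningful for I/O addresses in the range 0x00 to 0x3f.

```c
#include <stdint.h>

/* The offset quoted by the datasheet: 0x20, or 32 bytes. */
#define IO_TO_DATA_OFFSET 0x20u

/*
 * Translate an address in I/O space (0x00 to 0x3f) into the
 * equivalent address in data space.
 */
uint16_t
io_to_data_address (uint8_t io_address)
{
  return (uint16_t) (io_address + IO_TO_DATA_OFFSET);
}
```

For example, io_to_data_address(0x05) evaluates to 0x25, matching the two addresses listed for PORTB.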
Given all of this information, we can trivially rewrite the Blink program in C to be behaviour-compatible with the assembler version:

7.2.6. Blink In C Poorly

#include <stdint.h>

void
pause (void)
{
  for (volatile uint8_t z = 0; z < 100; ++z) {
    for (volatile uint8_t y = 0; y < 100; ++y) {
      for (volatile uint8_t x = 0; x < 100; ++x) {
        // Do nothing
      }
    }
  }
}

volatile uint8_t * const PORTB = (volatile uint8_t *) 0x0025;
volatile uint8_t * const DDRB = (volatile uint8_t *) 0x0024;

int
main (void)
{
  *DDRB = 1;

  for (;;) {
    *PORTB = 1;
    pause();
    *PORTB = 0;
    pause();
  }
}
Assuming that we placed the code into a file called blinkBad.c, we can compile the program, asking the compiler to optimize for size with -Os:

7.2.8. Blink In C Poorly

$ avr-gcc -Os -mmcu=atmega328p -o blinkBad blinkBad.c
$ avr-objcopy -j .text -j .data -O ihex blinkBad blinkBad.hex
We can use the avr-objdump tool to disassemble the executable and view the resulting machine code:

7.2.10. Blink In C Object Code

$ avr-objdump -d blinkBad

blinkBad:     file format elf32-avr


Disassembly of section .text:

00000000 <__vectors>:
   0:	0c 94 34 00 	jmp	0x68	; 0x68 <__ctors_end>
   4:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
   8:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
   c:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  10:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  14:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  18:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  1c:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  20:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  24:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  28:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  2c:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  30:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  34:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  38:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  3c:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  40:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  44:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  48:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  4c:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  50:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  54:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  58:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  5c:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  60:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>
  64:	0c 94 49 00 	jmp	0x92	; 0x92 <__bad_interrupt>

00000068 <__ctors_end>:
  68:	11 24       	eor	r1, r1
  6a:	1f be       	out	0x3f, r1	; 63
  6c:	cf ef       	ldi	r28, 0xFF	; 255
  6e:	d8 e0       	ldi	r29, 0x08	; 8
  70:	de bf       	out	0x3e, r29	; 62
  72:	cd bf       	out	0x3d, r28	; 61

00000074 <__do_copy_data>:
  74:	11 e0       	ldi	r17, 0x01	; 1
  76:	a0 e0       	ldi	r26, 0x00	; 0
  78:	b1 e0       	ldi	r27, 0x01	; 1
  7a:	e6 ef       	ldi	r30, 0xF6	; 246
  7c:	f0 e0       	ldi	r31, 0x00	; 0
  7e:	02 c0       	rjmp	.+4      	; 0x84 <__do_copy_data+0x10>
  80:	05 90       	lpm	r0, Z+
  82:	0d 92       	st	X+, r0
  84:	a4 30       	cpi	r26, 0x04	; 4
  86:	b1 07       	cpc	r27, r17
  88:	d9 f7       	brne	.-10     	; 0x80 <__do_copy_data+0xc>
  8a:	0e 94 6f 00 	call	0xde	; 0xde <main>
  8e:	0c 94 79 00 	jmp	0xf2	; 0xf2 <_exit>

00000092 <__bad_interrupt>:
  92:	0c 94 00 00 	jmp	0	; 0x0 <__vectors>

00000096 <pause>:
  96:	cf 93       	push	r28
  98:	df 93       	push	r29
  9a:	00 d0       	rcall	.+0      	; 0x9c <pause+0x6>
  9c:	0f 92       	push	r0
  9e:	cd b7       	in	r28, 0x3d	; 61
  a0:	de b7       	in	r29, 0x3e	; 62
  a2:	1b 82       	std	Y+3, r1	; 0x03
  a4:	8b 81       	ldd	r24, Y+3	; 0x03
  a6:	84 36       	cpi	r24, 0x64	; 100
  a8:	30 f0       	brcs	.+12     	; 0xb6 <pause+0x20>
  aa:	0f 90       	pop	r0
  ac:	0f 90       	pop	r0
  ae:	0f 90       	pop	r0
  b0:	df 91       	pop	r29
  b2:	cf 91       	pop	r28
  b4:	08 95       	ret
  b6:	1a 82       	std	Y+2, r1	; 0x02
  b8:	8a 81       	ldd	r24, Y+2	; 0x02
  ba:	84 36       	cpi	r24, 0x64	; 100
  bc:	20 f0       	brcs	.+8      	; 0xc6 <pause+0x30>
  be:	8b 81       	ldd	r24, Y+3	; 0x03
  c0:	8f 5f       	subi	r24, 0xFF	; 255
  c2:	8b 83       	std	Y+3, r24	; 0x03
  c4:	ef cf       	rjmp	.-34     	; 0xa4 <pause+0xe>
  c6:	19 82       	std	Y+1, r1	; 0x01
  c8:	89 81       	ldd	r24, Y+1	; 0x01
  ca:	84 36       	cpi	r24, 0x64	; 100
  cc:	20 f0       	brcs	.+8      	; 0xd6 <pause+0x40>
  ce:	8a 81       	ldd	r24, Y+2	; 0x02
  d0:	8f 5f       	subi	r24, 0xFF	; 255
  d2:	8a 83       	std	Y+2, r24	; 0x02
  d4:	f1 cf       	rjmp	.-30     	; 0xb8 <pause+0x22>
  d6:	89 81       	ldd	r24, Y+1	; 0x01
  d8:	8f 5f       	subi	r24, 0xFF	; 255
  da:	89 83       	std	Y+1, r24	; 0x01
  dc:	f5 cf       	rjmp	.-22     	; 0xc8 <pause+0x32>

000000de <main>:
  de:	81 e0       	ldi	r24, 0x01	; 1
  e0:	84 b9       	out	0x04, r24	; 4
  e2:	c1 e0       	ldi	r28, 0x01	; 1
  e4:	c5 b9       	out	0x05, r28	; 5
  e6:	0e 94 4b 00 	call	0x96	; 0x96 <pause>
  ea:	15 b8       	out	0x05, r1	; 5
  ec:	0e 94 4b 00 	call	0x96	; 0x96 <pause>
  f0:	f9 cf       	rjmp	.-14     	; 0xe4 <main+0x6>

000000f2 <_exit>:
  f2:	f8 94       	cli

000000f4 <__stop_program>:
  f4:	ff cf       	rjmp	.-2      	; 0xf4 <__stop_program>
A number of similarities and differences stand out. Firstly, our original __avr_interrupt_vectors table is replaced with a compiler-generated version called __vectors which performs largely the same tasks. Our __avr_setup function is replaced with a compiler-generated version called __ctors_end that performs the exact same tasks, including clearing the status register and setting up the stack pointer.
The execution of the __ctors_end function leads directly to a function called __do_copy_data that does not correspond to anything we originally wrote in the assembler version of Blink. The purpose of this function is to support programming in C. We mentioned previously that AVR uses multiple address spaces for code and data, whilst C programmers are accustomed to working in a single address space on typical hardware architectures. The __do_copy_data function exists to copy data from the code space into the data space in order to initialize the values of any variables defined in C. Without going into too much detail, the code uses the lpm instruction to read bytes from code space, and the st instruction to store them into locations in data space. The linker defines two symbols __data_start and __data_end that give the start and end addresses of the destination region in data space (0x0100 and 0x0104 in the listing above), and a third symbol (named __data_load_start in the avr-gcc runtime) that gives the address of the initial values in code space (0x00f6 in the listing above). The values of these symbols are inlined into the generated code of __do_copy_data and used to perform the copying operation. Given that there is very little to be learned by writing this code by hand, and given that it only exists to support programming in C on the microcontroller, we're satisfied with allowing the compiler to generate it.
Once the __do_copy_data function has completed, it calls our main function. The object code generated for main is surprisingly almost identical to our assembler code. The reason that this is surprising is that we explicitly decided to write to PORTB and DDRB using addresses in data space, but the compiler was intelligent enough to translate this code to executing out instructions on addresses in I/O space!
The __do_copy_data function also includes a jump to a generated _exit function that turns off all interrupts and then continues to a function called __stop_program that simply loops forever and does nothing. The purpose of these two functions is to "halt" the microcontroller should the main function ever return.
Lastly, the code generated for the pause function was the largest difference. The code is similar, but has been reorganized to execute the same algorithm but with more in the way of stack manipulation, and with use of the ability to treat certain pairs of 8-bit registers as single 16-bit registers. The differences here are only of interest to assembler programmers, and we won't bother to discuss them any further.
Flashing the resulting blinkBad.hex file to the microcontroller with avrdude will result in an LED that blinks in the same manner as the assembler version, except that it will almost certainly blink slightly more slowly due to the generated pause function implementation wasting more time than the pure assembler version.
The next step will be to use a hardware timer to precisely control the LED blink periods.
On page 120 of the datasheet, we can see that the ATMega328p comes equipped with a 16-bit timer unit. The timer unit can act as a counter that ticks at a rate we specify, and we can choose to act when the counter reaches whatever tick count we require. We'll use this timer to count out a period of one second so that we can turn the LED on and off at a rate that is independent of the CPU clock speed.
The first part of configuring the timer on the microcontroller is determining the clock prescaler value. The way the timer on the ATMega328p works is that it will tick at a rate that is some division of the system clock. The datasheet refers to this as CLKi/o. If the clock prescaler value is set to 1, then the timer will tick at a rate of 16000000hz / 1 = 16000000hz = 16mhz. If the clock prescaler value is set to 8, then the timer will tick at a rate of 16000000hz / 8 = 2000000hz = 2mhz. If the clock prescaler value is set to 1024 then the timer will tick at a rate of 16000000hz / 1024 = 15625hz = 15.625khz. The prescaler value can only be set to 1, 8, 64, 256, or 1024.
Why would we pick one prescaler value over another? The timer, as mentioned, is a 16-bit counter. Therefore it can only count 65535 ticks before it overflows. At 16mhz, 65535 / 16000000 = 0.0040959375, meaning that we'd be able to count out approximately 4ms before the timer overflowed. However, with the prescaler set at 1024, we get 65535 / (16000000 / 1024) = 4.19424. This means that we could count out approximately four seconds before the timer overflowed. Larger prescaler values, however, make the timer less precise. A 16mhz clock with no prescaling effectively counts individual periods of 1 / 16000000 = 0.0000000625 seconds or 62.5 nanoseconds. A 16mhz clock with a prescaler of 1024, however, effectively counts individual periods of 1 / (16000000 / 1024) = 0.000064 seconds or 64 microseconds. We trade the ability to measure smaller slices of time for the ability to measure longer overall periods before the timer overflows.
Given that our blink program works in periods of one second, and that we don't care about millisecond precision, we can safely use a prescaler value of 1024. The prescaler value is specified using the lowest 3 bits of the TCCR1B register as described on pages 142 and 143. We want to use a value of 0b101 to select a 1024 prescaler.
All of the other bits in the register can be left at 0.
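The prescaler arithmetic can be checked on a workstation with a few lines of C. The following is a minimal sketch using hypothetical function names. It assumes a 16-bit counter that overflows after 65535 ticks, and it uses integer division, so the results are truncated to whole numbers.

```c
#include <stdint.h>

/* Returns 1 if the prescaler is one of the permitted values. */
int
timer_prescaler_valid (uint32_t prescaler)
{
  switch (prescaler) {
    case 1:
    case 8:
    case 64:
    case 256:
    case 1024:
      return 1;
    default:
      return 0;
  }
}

/* Timer ticks per second for a given system clock and prescaler. */
uint32_t
timer_tick_hz (uint32_t cpu_clock_hz, uint32_t prescaler)
{
  return cpu_clock_hz / prescaler;
}

/* Whole milliseconds that elapse before a 16-bit counter reaches 65535. */
uint32_t
timer_max_ms (uint32_t cpu_clock_hz, uint32_t prescaler)
{
  const uint64_t ticks = 65535u;
  return (uint32_t) ((ticks * prescaler * 1000u) / cpu_clock_hz);
}
```

Evaluating timer_max_ms(16000000, 1024) gives 4194, which corresponds to the figure of roughly four seconds calculated above.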
The second part of configuring the timer simply involves initializing the counter to a known initial value. On page 143 of the datasheet, the 16-bit timer value is exposed using a pair of 8-bit registers TCNT1H and TCNT1L, containing the high and low 8 bits of the 16-bit counter, respectively. We can simply initialize these to 0 every time we want to start counting, but the datasheet does specify on page 122 that:

7.3.4. 16-bit Register Access

  • When writing to a 16-bit register, we must write the high byte followed by the low byte.
  • When reading from a 16-bit register, we must read the low byte followed by the high byte.
As long as we take care to get the order of operations right, there won't be any problems.
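One way to avoid getting the order wrong is to hide it behind a pair of small helper functions. The following is a sketch only: The names register16_write and register16_read are hypothetical, and the functions take pointers to the high and low registers so that they can be tried on a workstation against ordinary variables.

```c
#include <stdint.h>

/*
 * Write a 16-bit value to a pair of 8-bit registers.
 * The high byte is written first, followed by the low byte.
 */
void
register16_write (volatile uint8_t *high, volatile uint8_t *low, uint16_t value)
{
  *high = (uint8_t) (value >> 8);
  *low = (uint8_t) (value & 0xffu);
}

/*
 * Read a 16-bit value from a pair of 8-bit registers.
 * The low byte is read first, followed by the high byte.
 */
uint16_t
register16_read (volatile uint8_t *high, volatile uint8_t *low)
{
  const uint8_t l = *low;
  const uint8_t h = *high;
  return (uint16_t) (((uint16_t) h << 8) | l);
}
```

Because each access is a separate statement on a volatile pointer, the compiler is not permitted to swap the two accesses.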
Turning once again to the register summary on page 622, we can see that the TCCR1B register is an 8-bit register at data@0x0081, TCNT1H is an 8-bit register at data@0x0085, and TCNT1L is an 8-bit register at data@0x0084. We can therefore use the rather unsurprising declarations in C to access them:

7.3.7. Timer Registers

volatile uint8_t * const TCCR1B = (volatile uint8_t *) 0x0081;
volatile uint8_t * const TCNT1L = (volatile uint8_t *) 0x0084;
volatile uint8_t * const TCNT1H = (volatile uint8_t *) 0x0085;
We can select a prescaler and initialize the counter with the following equally unsurprising statements:

7.3.9. Timer Register Configuration

  // Select a /1024 prescaler.
  *TCCR1B = 0b00000101;
  *TCNT1H = 0;
  *TCNT1L = 0;
Note that we're careful to write the high byte of the counter first, followed by the low byte of the counter.
Now, we simply need to sit in a loop, checking the counter on every iteration to see if the desired number of ticks has elapsed. How many ticks do we need to count out one second? Remember that with a 1024 prescaler at 16000000hz, 1 / (16000000 / 1024) ≈ 0.000064 seconds, so we need 1 / 0.000064 = 15625 ticks to make one second. Writing the pause function is now straightforward, and we can fill in the entirety of the improved Blink program.

7.3.12. Blink Better

#include <stdint.h>

volatile uint8_t * const TCCR1B = (volatile uint8_t *) 0x0081;
volatile uint8_t * const TCNT1L = (volatile uint8_t *) 0x0084;
volatile uint8_t * const TCNT1H = (volatile uint8_t *) 0x0085;

static const uint16_t ticks_per_second = 15625;

void
pause (void)
{
  // Select a /1024 prescaler.
  *TCCR1B = 0b00000101;
  *TCNT1H = 0;
  *TCNT1L = 0;

  for (;;) {
    // Read the low byte first, followed by the high byte.
    uint16_t time = 0;
    time |= *TCNT1L;
    time |= (uint16_t) *TCNT1H << 8;

    if (time >= ticks_per_second) {
      return;
    }
  }
}

volatile uint8_t * const PORTB = (volatile uint8_t *) 0x0025;
volatile uint8_t * const DDRB = (volatile uint8_t *) 0x0024;

int
main (void)
{
  *DDRB = 1;

  for (;;) {
    *PORTB = 1;
    pause();
    *PORTB = 0;
    pause();
  }
}
An even better version of this program would define a function that takes the microcontroller speed in hz, and the prescaler value, and returns the number of ticks required for one second. This function would be called at run-time rather than hardcoding a value of 15625 ticks. This is left as an exercise for the reader!
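For readers who would like a starting point, one possible shape for such a function is sketched below. This is not the only way to write it. The name timer_ticks_for_ms is hypothetical, and the function is generalized slightly to accept an interval in milliseconds and to report failure when the required tick count will not fit into the 16-bit counter.

```c
#include <stdint.h>

/*
 * Calculate the number of timer ticks needed to count out the
 * given number of milliseconds. On success, the tick count is
 * stored in *ticks_out and 1 is returned. If the prescaler is
 * zero, or the tick count will not fit into a 16-bit counter,
 * 0 is returned and *ticks_out is left untouched.
 */
int
timer_ticks_for_ms (uint32_t cpu_clock_hz,
                    uint32_t prescaler,
                    uint32_t ms,
                    uint16_t *ticks_out)
{
  if (prescaler == 0) {
    return 0;
  }

  const uint64_t ticks =
    ((uint64_t) cpu_clock_hz * ms) / ((uint64_t) prescaler * 1000u);

  if (ticks > UINT16_MAX) {
    return 0;
  }

  *ticks_out = (uint16_t) ticks;
  return 1;
}
```

Calling timer_ticks_for_ms(16000000, 1024, 1000, &ticks) stores 15625 into ticks, matching the hardcoded value used in the program above. The same call with a prescaler of 1 fails, because sixteen million ticks cannot be represented in 16 bits.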
The last part of the improved Blink program will be to produce output from the program that can be observed on a serial console. This will require both hardware and software components to achieve.
The ATMega328p has dedicated hardware for sending and receiving data using the USART protocol. Additionally, there are extremely inexpensive USB ↔ USART adapters available. The adapter specified in the bill of materials is the FTDI LC234X, and is actually sold as a development board in order to demonstrate the capabilities of the onboard FT234XD IC. Any USB ↔ USART adapter will work, but you'll need to adapt the instructions here slightly when it comes to actually physically connecting the device. The approach we're going to take is to have the ATMega328p print messages over a USART connection, and use a USB ↔ USART adapter to allow an ordinary Linux/BSD workstation to access the adapter as a serial console. We can actually use this connection to both send and receive data to and from the microcontroller, but we'll only use it in a receiving capacity on the workstation side for this book.
As usual, we're faced with the problem of working out which pins must be connected. Consulting the pin diagram for the ATMega328p shows that pin 2 is the RXD pin for the USART, and pin 3 is the TXD. The terms RXD and TXD were found by reading the section of the datasheet on the USART on page 179. The USART protocol actually dictates that the RXD pin on the sending device must be connected to the TXD pin on the receiver, and vice versa.
Given this information, connecting the device to the breadboard is straightforward.

7.4.7. Connections

  1. Connect the TXD pin on the LC234X to pin 2 on the ATMega328p.
  2. Connect the RXD pin on the LC234X to pin 3 on the ATMega328p.
Note that two pairs of green and yellow wires have been used for the RXD and TXD pins. This was simply to make the board connections easier to see in the photograph.
It's now necessary to configure the USART on the microcontroller, and start printing messages to the serial console. In order to do this, we need to make some decisions as to the parameters we're going to use for serial communication, and then work out which registers we need to use in order to actually configure the hardware.
The first parameter we need to decide upon is the transfer speed that will be used. This is known as the baud rate, and is expressed in bits per second. A baud rate of 9600 has been a common convention for low power serial devices for many years, and is sufficient for our needs.
The next parameter we need to decide upon is the size of a single character in bits. There is very little reason to use anything other than 8 bits per character, as we can match these to the 8 bit bytes used on almost all existing computer hardware.
The next parameter we need to decide upon is whether we'll include parity bits in the stream. Parity bits are an error detection mechanism that can detect transmission errors due to noise and interference. For simplicity, we won't use parity bits.
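These parameters determine how many characters per second the connection can carry. Each character is sent as a frame consisting of a start bit, the data bits, an optional parity bit, and one or more stop bits. The sketch below, which uses hypothetical function names, works through the arithmetic on a workstation. Note that it assumes a single stop bit, which is a parameter we have not discussed.

```c
#include <stdint.h>

/*
 * The total number of bits needed to send one character: one
 * start bit, the data bits, any parity bits, and the stop bits.
 */
uint32_t
usart_frame_bits (uint32_t data_bits, uint32_t parity_bits, uint32_t stop_bits)
{
  return 1u + data_bits + parity_bits + stop_bits;
}

/* The number of whole characters that can be sent per second. */
uint32_t
usart_chars_per_second (uint32_t baud, uint32_t frame_bits)
{
  return baud / frame_bits;
}
```

With 8 data bits, no parity, and an assumed single stop bit, each frame is 10 bits long, so a baud rate of 9600 carries at most 960 characters per second. That is more than enough for the occasional debugging message.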
Reading the datasheet section on the USART registers, on page 200, shows us that there are three 8-bit control registers (UCSRnA, UCSRnB and UCSRnC), one 16-bit register to specify the baud rate (UBRRn), and an 8-bit register for sending and receiving data (UDRn). The datasheet is written in a style where the register names include a lowercase n that denotes the nth instance of the register in question on the microcontroller. The larger microcontrollers have multiple USART devices, and so you will see UCSR0A, UCSR1A, UCSR2A, and so on. On the ATMega328P, however, we only have one USART device, so the only registers we will see are numbered at 0.
Jumping ahead to the register summary on page 621, we can immediately extract the following C definitions in the same manner as we did for the I/O ports and timer registers:

7.5.7. USART Registers

volatile uint8_t * const UCSR0A = (volatile uint8_t *) 0x00c0;
volatile uint8_t * const UCSR0B = (volatile uint8_t *) 0x00c1;
volatile uint8_t * const UCSR0C = (volatile uint8_t *) 0x00c2;
volatile uint8_t * const UBRR0L = (volatile uint8_t *) 0x00c4;
volatile uint8_t * const UBRR0H = (volatile uint8_t *) 0x00c5;
volatile uint8_t * const UDR0 = (volatile uint8_t *) 0x00c6;
Inspecting the datasheet for the UCSR0A register shows us that we don't need to touch the register for the initial setup, but we will need to use it during transmission. For example, we're required to check the UDRE0 bit (bit 5) before we attempt to send any data.
Looking at the datasheet for the UCSR0B register indicates that we will need to set several of the bits in order to configure the transmission parameters we decided upon, so let's do that first. Firstly, we need to enable the transmitter by setting bit TXEN0 (bit 3) to 1. In order to select 8-bit characters, we need to set bits in both the UCSR0B and UCSR0C registers:
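We also need to calculate the value that will be placed into the UBRR0 register in order to set the baud rate. The table on page 182 gives the equations that describe how to get from a baud rate in bits per second to a value suitable to be inserted into the UBRR0 register. For the asynchronous normal mode used here, the equation is UBRRn = (f_OSC / (16 * BAUD)) - 1, where f_OSC is the clock frequency in Hz and BAUD is the baud rate in bits per second.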
We can encapsulate this definition in a function:

7.5.14. Baud Calculation

static uint16_t
usart_ubrr(uint32_t cpu_clock_hz, uint32_t baud) {
  return (cpu_clock_hz / (16 * baud)) - 1;
}
Evaluating usart_ubrr(16000000, 9600) yields 103, which matches the value in the table on page 199. We can now put together all of the initialization code:

7.5.16. USART Init

static const uint8_t TXEN_BIT = 0b00001000;
static const uint8_t UCSZn0_BIT = 0b00000010;
static const uint8_t UCSZn1_BIT = 0b00000100;

void
usart_init(uint32_t baud)
{
  /*
   * Configure the baud rate based on a 16mhz clock.
   */

  const uint16_t ubrr = usart_ubrr(16000000, baud);
  *UBRR0H = (ubrr >> 8);
  *UBRR0L = ubrr & 0xff;

  /*
   * Enable the sender.
   */

  *UCSR0B = TXEN_BIT;

  /*
   * Specify 8-bit bytes.
   */

  *UCSR0C = UCSZn0_BIT | UCSZn1_BIT;
}
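Because usart_ubrr uses truncating integer division, the rate the hardware actually generates is not exactly the rate requested. The following host-side sketch inverts the same formula to show the difference. It assumes the asynchronous normal mode equation used above; usart_actual_baud is a hypothetical helper that is not part of the book's code, and usart_ubrr is repeated only so that the sketch stands alone.

```c
#include <stdint.h>

/* The same calculation as usart_ubrr above, repeated for completeness. */
static uint16_t
usart_ubrr(uint32_t cpu_clock_hz, uint32_t baud)
{
  return (uint16_t) ((cpu_clock_hz / (16 * baud)) - 1);
}

/*
 * Hypothetical helper: the baud rate (rounded down to a whole number)
 * that results from a given UBRR value, obtained by rearranging the
 * equation UBRR = (f_OSC / (16 * BAUD)) - 1.
 */
static uint32_t
usart_actual_baud(uint32_t cpu_clock_hz, uint16_t ubrr)
{
  return cpu_clock_hz / (16 * ((uint32_t) ubrr + 1));
}
```

With a 16mhz clock, a UBRR value of 103 gives usart_actual_baud(16000000, 103) == 9615, which is within about 0.2% of the requested 9600. The truncation matters more at higher rates: usart_ubrr(16000000, 115200) evaluates to 7, which corresponds to 125000 bits per second (roughly 8.5% too fast), whereas a value of 8 would give 111111 bits per second (roughly 3.5% too slow). Rounding to the nearest integer instead of truncating is one possible adjustment if higher rates are needed.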
The process for sending a single character over the USART interface is fairly simple. We wait for the UDRE0 bit in the UCSR0A register to be set to 1 by the underlying hardware, indicating that the transmit buffer is empty, and then we place the character we want transmitted into the UDR0 register. This can be encapsulated into a function, and we can add another function that allows for sending entire strings:

7.5.18. USART Transmission

static const uint8_t UDREn_BIT = 0b00100000;

void usart_put_char(uint8_t data) {

  /*
   * Wait for the transmission buffer to become empty.
   */

  while ((*UCSR0A & UDREn_BIT) == 0)
    ;

  *UDR0 = data;
}

void usart_put_string(const char *text)
{
  const char *ptr = text;
  for (;;) {
    if (*ptr == 0) {
      break;
    }
    usart_put_char(*ptr);
    ++ptr;
  }
}
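Since usart_put_char busy-waits on the hardware, sending a string takes a measurable amount of time. The following host-side sketch estimates how long a string occupies the wire. It assumes the configuration made in usart_init, where each character is framed as 1 start bit, 8 data bits and 1 stop bit with no parity, giving 10 bit times per character. The names FRAME_BITS and usart_wire_time_us are hypothetical and do not appear elsewhere in this book.

```c
#include <stdint.h>

/*
 * Assumption: each character occupies 10 bit times on the wire
 * (1 start bit, 8 data bits, 1 stop bit, no parity).
 */
static const uint32_t FRAME_BITS = 10;

/*
 * Hypothetical helper: the number of microseconds (rounded down) needed
 * to shift out `length` characters at the given baud rate.
 */
static uint32_t
usart_wire_time_us(uint32_t length, uint32_t baud)
{
  /* Use 64-bit arithmetic so long strings cannot overflow. */
  const uint64_t bits = (uint64_t) length * FRAME_BITS;
  return (uint32_t) ((bits * 1000000u) / baud);
}
```

For example, an 8-character string at 9600 baud gives usart_wire_time_us(8, 9600) == 8333, or about 8.3 milliseconds. Because the USART buffers one character while another is being shifted out, usart_put_string will typically return slightly before the last character has finished transmitting, so this figure is an upper estimate of how long the function blocks.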
With these changes made, we can now update our Blink program to send messages on startup and each time the LED is turned on and off.

7.5.20. Blink USART

#include <stdint.h>

volatile uint8_t * const UCSR0A = (volatile uint8_t *) 0x00c0;
volatile uint8_t * const UCSR0B = (volatile uint8_t *) 0x00c1;
volatile uint8_t * const UCSR0C = (volatile uint8_t *) 0x00c2;
volatile uint8_t * const UBRR0L = (volatile uint8_t *) 0x00c4;
volatile uint8_t * const UBRR0H = (volatile uint8_t *) 0x00c5;
volatile uint8_t * const UDR0 = (volatile uint8_t *) 0x00c6;

static const uint8_t TXEN_BIT = 0b00001000;
static const uint8_t UCSZn0_BIT = 0b00000010;
static const uint8_t UCSZn1_BIT = 0b00000100;
static const uint8_t UDREn_BIT = 0b00100000;

static uint16_t
usart_ubrr(uint32_t cpu_clock_hz, uint32_t baud) {
  return (cpu_clock_hz / (16 * baud)) - 1;
}

void
usart_init(uint32_t baud)
{
  /*
   * Configure the baud rate based on a 16mhz clock.
   */

  const uint16_t ubrr = usart_ubrr(16000000, baud);
  *UBRR0H = (ubrr >> 8);
  *UBRR0L = ubrr & 0xff;

  /*
   * Enable the sender.
   */

  *UCSR0B = TXEN_BIT;

  /*
   * Specify 8-bit bytes.
   */

  *UCSR0C = UCSZn0_BIT | UCSZn1_BIT;
}

void usart_put_char(uint8_t data) {

  /*
   * Wait for the transmission buffer to become ready.
   */

  while ((*UCSR0A & UDREn_BIT) == 0)
    ;

  *UDR0 = data;
}

void usart_put_string(const char *text)
{
  const char *ptr = text;
  for (;;) {
    if (*ptr == 0) {
      break;
    }
    usart_put_char(*ptr);
    ++ptr;
  }
}

volatile uint8_t * const TCCR1B = (volatile uint8_t *) 0x0081;
volatile uint8_t * const TCNT1L = (volatile uint8_t *) 0x0084;
volatile uint8_t * const TCNT1H = (volatile uint8_t *) 0x0085;

static const uint16_t ticks_per_second = 15625;

void
pause (void)
{
  // Select a /1024 prescaler.
  *TCCR1B = 0b00000101;
  *TCNT1H = 0;
  *TCNT1L = 0;

  for (;;) {
    uint16_t time = 0;
    time |= *TCNT1L;
    time |= *TCNT1H << 8;

    if (time >= ticks_per_second) {
      return;
    }
  }
}

volatile uint8_t * const PORTB = (volatile uint8_t *) 0x0025;
volatile uint8_t * const DDRB = (volatile uint8_t *) 0x0024;

int
main (void)
{
  *DDRB = 0b11111111;

  usart_init(9600);
  usart_put_string("start\n");

  for (;;) {
    *PORTB = 1;
    usart_put_string("led on\n");
    pause();
    *PORTB = 0;
    usart_put_string("led off\n");
    pause();
  }
}
When the LC234X is connected to a Linux workstation, the operating system will typically create a tty device with a name similar to /dev/ttyUSB0. It's possible to use any serial console application to observe data being sent over the LC234X connection. One such application is moserial, shown here receiving data from the serial connection:
This book contains images of many of the tables in the Microchip ATMega328p Datasheet. They are reproduced here to ensure that this book remains usable even if the datasheet is withdrawn from circulation.