Lab 1 - Calculating Performance

January 23, 2025

2. The following code fills the emulator's bitmapped display with the colour yellow. Paste this code into the emulator:

The code runs correctly after pressing the Assemble button and then the Run button, with no assembly errors.

4. Calculate how long it takes for the code to execute, assuming a 1 MHz clock speed.

The code works by initializing memory pointers and then running a loop that fills the 32x32 pixel display (1024 pixels spread across 4 memory pages) with a single color. The setup phase runs once and takes 16 cycles to prepare the pointers and initial values. The main loop repeats 1024 times (once per pixel): it stores the color, updates the index, and checks whether the loop should continue, for a total of 10232 cycles. After completing each page, the program increments the memory page and checks whether all pages are done, which adds 47 more cycles. In total, the program takes 16 + 10232 + 47 = 10295 cycles. At a clock speed of 1 MHz, one cycle lasts 1 microsecond, so the program completes in 10.295 milliseconds.
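The conversion from cycles to time can be checked with a few lines of arithmetic. The following is a minimal sketch in Python (not part of the lab itself); the per-phase totals are copied from the paragraph above rather than re-derived, and the names `phase_cycles`, `total_cycles`, and `cycles_to_ms` are made up for illustration.

```python
# Sketch: add up the per-phase cycle totals quoted in the text and
# convert the result to milliseconds at the stated clock speed.

CLOCK_HZ = 1_000_000  # 1 MHz clock, as given in the lab question

# Totals per phase, taken as-is from the write-up (assumed, not re-derived).
phase_cycles = {
    "setup": 16,
    "main_loop": 10232,
    "page_handling": 47,
}


def total_cycles(phases):
    """Return the sum of all per-phase cycle counts."""
    return sum(phases.values())


def cycles_to_ms(cycles, clock_hz=CLOCK_HZ):
    """Convert a cycle count to milliseconds for a given clock frequency."""
    return cycles / clock_hz * 1000


total = total_cycles(phase_cycles)
print(total, "cycles")                  # 10295 cycles
print(f"{cycles_to_ms(total):.3f} ms")  # 10.295 ms
```

Passing a different `clock_hz` to `cycles_to_ms` gives the time at other clock speeds; the cycle count itself does not change.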

Performance Improvement

        lda #$07      ; color number
        ldy #$00      ; set index to 0
loop:   sta $0200,y   ; write to $0200 + Y
        sta $0300,y   ; write to $0300 + Y
        sta $0400,y   ; write to $0400 + Y
        sta $0500,y   ; write to $0500 + Y
        iny           ; increment index
        bne loop      ; repeat until Y wraps back to 0 (256 iterations)

How this version works and how I calculated its execution time

Instead of relying on pointers, this version of the program puts the target addresses directly in its instructions. It starts by loading the color value $07 into the accumulator (LDA #$07) and setting the Y register to 0 (LDY #$00) to serve as the loop index. Inside the loop, four STA instructions write the color to $0200, $0300, $0400, and $0500, each offset by Y, so every pass colors one pixel in each of the four pages. After each pass, INY increments Y, and BNE repeats the loop until Y wraps around to 0, which happens after 256 passes. This skips the overhead of indirect addressing and of the per-page bookkeeping.

Each STA (absolute,Y) takes 5 cycles and INY takes 2. BNE takes 3 cycles when the branch is taken and 2 when it falls through, assuming the branch does not cross a page boundary. The setup costs 2 + 2 = 4 cycles, and the loop costs 256 x (4 x 5 + 2) + 255 x 3 + 2 = 6399 cycles, for a total of 6403 cycles, or 6.403 milliseconds at a 1 MHz clock speed. Compared with the 10295 cycles of the indirect addressing version, this takes about 38% less time, which makes it roughly 1.6 times faster.
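The 6403-cycle figure can be reproduced step by step. The sketch below is in Python (not part of the lab itself) and rests on a few assumptions: the standard documented 6502 timings (STA absolute,Y always 5 cycles, INY 2, immediate loads 2), a BNE that costs 3 cycles when taken and 2 when it falls through, and a branch that does not cross a page boundary. The baseline of 10295 cycles is the figure reported earlier for the pointer-based version, and all constant names are made up for illustration.

```python
# Sketch: cycle count for the absolute-addressing fill loop, under the
# assumptions stated in the lead-in (standard 6502 timings, no page-crossing
# penalty on the branch).

CLOCK_HZ = 1_000_000     # 1 MHz clock

LDA_IMM = 2              # lda #$07
LDY_IMM = 2              # ldy #$00
STA_ABS_Y = 5            # sta $0200,y (always 5 cycles for a store)
INY = 2                  # iny
BNE_TAKEN = 3            # bne loop, branch taken
BNE_NOT_TAKEN = 2        # bne loop, final fall-through

ITERATIONS = 256         # Y runs 0..255 and then wraps to 0
STORES_PER_PASS = 4      # one store per display page

setup = LDA_IMM + LDY_IMM
body = STORES_PER_PASS * STA_ABS_Y + INY
# The branch is taken on every pass except the last one.
loop = ITERATIONS * body + (ITERATIONS - 1) * BNE_TAKEN + BNE_NOT_TAKEN
total = setup + loop

print(total, "cycles")                       # 6403 cycles
print(f"{total / CLOCK_HZ * 1000:.3f} ms")   # 6.403 ms

# Comparison against the cycle count reported for the pointer-based version.
BASELINE_CYCLES = 10295
reduction = (BASELINE_CYCLES - total) / BASELINE_CYCLES
speedup = BASELINE_CYCLES / total
print(f"{reduction:.1%} fewer cycles, {speedup:.2f}x speedup")
```

Keeping the taken and not-taken branch costs separate is what produces 6403 rather than a round multiple of 256.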
