Writing a SNES assembler compiler/disassembler - Day 4

Testing

It's time to test what we have written so far. If you look at the asar project, there are already some test files, and they come with their own test syntax. It's actually pretty neat: the tests are embedded in the comments of the ASM files, so you don't need to write separate test files.

The format is documented at https://github.com/RPGHacker/asar#test-format, but for now we will only keep the offset and byte value parts.

Testing grammar

We could handle this with a simple loop over something like files.IO.slurp.lines and be done. But let's use a grammar again. There is no action class associated with it since it's very basic.

    grammar Test-File {
        token TOP {
            <thing>+
        }
        token thing {
            || <test-line> <.eol>
            || <other> <.eol>
        }
        token test-line {
            ';`' (<offset> | <byte>) (<.ws> <byte>)*
            {
                %test-data-array{$offset}[$pos-tab] = $dataas;
                $dataas = buf8.new;
                $pos-tab++;
            }
        }
        token byte {
            $<value> = (<xdigit><xdigit>)
            {
                %test-data{$offset}.append($<value>.Str.parse-base(16));
                $dataas.append($<value>.Str.parse-base(16));
            }
        }
        token offset {
            $<value> = <xdigit> ** 5..6
            {
                $offset = $<value>.Str.parse-base(16);
                %test-data{$offset} = buf8.new;
            }
        }
        token other {
            <-[\n]>*
        }
        ...
    }

We fill two hashes. The first is just a pairing of offset => all bytes for this offset. The second is a table used for my own tests, where there is only one instruction per line and one line of bytes corresponding to that instruction. This makes assembly errors easier to spot, since I can check each instruction against its own expected bytes.
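For comparison, the plain-loop alternative mentioned earlier could look roughly like the sketch below. It only builds the first hash (offset => bytes). The sub name read-test-bytes is hypothetical, and the sketch assumes, as the grammar does, that a first field of 5 or 6 hex digits is an offset and that every other field is a two-digit byte.

```raku
# Hypothetical helper, not part of the project: collect the bytes
# of every ;` test line, grouped by the last offset seen.
sub read-test-bytes(Str $source) {
    my %test-data;      # offset => all bytes expected from that offset
    my $offset = 0;     # assumption: start at offset 0 until one is given
    for $source.lines -> $line {
        next unless $line.starts-with(';`');
        my @fields = $line.substr(2).words;
        # A first field of 5 or 6 hex digits is an offset, not a byte.
        if @fields && @fields[0] ~~ /^ <xdigit> ** 5..6 $/ {
            $offset = @fields.shift.parse-base(16);
        }
        %test-data{$offset} //= buf8.new;
        %test-data{$offset}.append(.parse-base(16)) for @fields;
    }
    return %test-data;
}

my $asm = q:to/END/;
    ;`008000 61 D2
    ADC ($D2, X)
    ;`63 F2
    ADC $F2, S
    END

my %test-data = read-test-bytes($asm);
say %test-data{0x8000}.list;   # (97 210 99 242)
```

The grammar version does the same job but also gives a natural place to hang the per-line table used for the second hash.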

    15: ;`71 35
    16: ADC ($37), Y

    skarsnik@DESKTOP-UIA12T1:/mnt/f/Project/SnesASM$ raku -I lib bin/test-snes-asm.raku t/asm/04-all-instruction.asm
    Data stopped matching at line 16 for : ADC ($37), Y
    expected :Buf[uint8]:0x<71 35> got :Buf[uint8]:0x<71 37>
    Test data does not match

Generating a test file

I generated a file with the 255 instructions supported by the SNES CPU and manually filled in the expected bytecode. It was not perfect, but it was still very useful for fixing a lot of errors in the sub that generates the operand text.

The file looks like this

    ;`61 D2
    ADC ($D2, X)
    ;`63 F2
    ADC $F2, S
    ;`65 85
    ADC $85
    ;`67 A5
    ADC [$A5]
    ;`69 F3

Starting to test...

And then I immediately had to fix the ASM grammar. First, let's add some very basic error handling at the TOP token:

    token TOP {
        :my Int $*line-number = 1;
        (<.eol>* <thing>+ <.ws> <eol>* $) || <.error>
    }
    ...
    token eol {
        $<lines> = \n[\h*\n]*
        {
            $*line-number += $<lines>.lines.elems;
        }
    }
    method error {
        say "Error parsing the ASM: Error at line: ", $*line-number;
        exit 1;
    }
    }

Later, this will probably need to throw an exception instead of exiting, so that the error can be propagated to the caller.
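A minimal sketch of what throwing could look like, on a toy grammar rather than the real one. The exception class X::SnesASM::Parse and the Tiny-ASM grammar are hypothetical names made up for this illustration; only the $*line-number idea comes from the code above.

```raku
# Hypothetical exception class; the name is an assumption.
class X::SnesASM::Parse is Exception {
    has Int $.line;
    method message { 'Error parsing the ASM: Error at line: ' ~ $!line }
}

# Toy grammar: one word per line, just enough to trigger an error.
grammar Tiny-ASM {
    token TOP {
        :my Int $*line-number = 1;
        [ [ <word> <.eol> ]+ $ ] || <.error>
    }
    token word { \w+ }
    token eol  { \n { $*line-number++ } }
    # Throw instead of calling exit, so the caller decides what to do.
    method error { X::SnesASM::Parse.new(line => $*line-number).throw }
}

try Tiny-ASM.parse("nop\nnop\n???\n");
with $! { say .message }   # Error parsing the ASM: Error at line: 3
```

The caller can then catch the exception, report the line number, and carry on with the next file instead of having the whole process exit.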

Ordering issue

If you look at the first few lines of the file and at the way my instruction tokens are written, you may notice an issue with this:

;`63 F2 ADC $F2, S

If you don't see it, remember that you can write ADC $42, which is matched by token instruction:sym<ADC-DIRECT-PAGE> {:i "ADC"<.ws><DIRECT-PAGE>}. The Raku grammar engine does not backtrack much, so when it sees ADC $F2, S the grammar matches an <ADC-DIRECT-PAGE>; then, when it tries to continue the match for <one-instruction>, it fails because nothing matches the comma. It then tries <instruction-line>, matches <ADC-DIRECT-PAGE> again, still can't make sense of the comma, and stops. (Thanks to the Grammar::Tracer module from Grammar::Debugger for showing that.)

There is a simple fix if you have an issue like this in your grammar: order your alternatives with ||. But I would need to write something like token instruction-adc {<adc-stack-relative> || <adc-direct-page>}, and since I did not want to write all the instruction grammar by hand, I chose to define an end-of-instruction token and use a <?before> lookahead assertion, which lets you specify what must come after the thing you want to match.
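Here is a reduced illustration of the ordering problem and of the || fix. The two grammars and their simplified addressing tokens are made up for the example (they only understand two forms of ADC with a one-byte operand) and are not the project's real tokens.

```raku
# Shorter form first: direct-page matches "ADC $F2" and, because
# tokens do not backtrack, stack-relative is never tried afterwards.
grammar Bad-Order {
    token TOP            { <instruction> \n? $ }
    token instruction    { <direct-page> || <stack-relative> }
    token direct-page    { :i 'ADC' \h+ '$' <xdigit> ** 2 }
    token stack-relative { :i 'ADC' \h+ '$' <xdigit> ** 2 \h* ',' \h* 'S' }
}

# Same tokens, but the longer form is tried first.
grammar Good-Order is Bad-Order {
    token instruction    { <stack-relative> || <direct-page> }
}

say so Bad-Order.parse('ADC $F2, S');    # False
say so Good-Order.parse('ADC $F2, S');   # True
say so Good-Order.parse('ADC $F2');      # True
```

This works, but it means keeping every family of instructions ordered by hand, which is exactly the chore described above.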

The changed instruction grammar looks like this:

    token eoi { <.ws> [\n | ';' | ':' | $] }
    proto token instruction {*}
    token instruction:sym<ADC-DP-INDEXED-INDIRECT-X> {:i "ADC"<.ws><DP-INDEXED-INDIRECT-X> <?before <eoi>>}
    token instruction:sym<ADC-STACK-RELATIVE> {:i "ADC"<.ws><STACK-RELATIVE> <?before <eoi>>}
    token instruction:sym<ADC-DIRECT-PAGE> {:i "ADC"<.ws><DIRECT-PAGE> <?before <eoi>>}
    ...
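To see the effect of the lookahead in isolation, here is a reduced sketch. The grammar With-Lookahead and its simplified tokens are hypothetical; they keep the "wrong" order on purpose (shorter form first) to show that the <?before <eoi>> assertion makes the order irrelevant.

```raku
grammar With-Lookahead {
    token TOP            { <instruction> \n? $ }
    # End of instruction: optional blanks, then newline, comment,
    # label separator or end of input.
    token eoi            { \h* [ \n | ';' | ':' | $ ] }
    # Shorter form still listed first, on purpose.
    token instruction    { <direct-page> || <stack-relative> }
    token direct-page    { :i 'ADC' \h+ '$' <xdigit> ** 2 <?before <eoi>> }
    token stack-relative { :i 'ADC' \h+ '$' <xdigit> ** 2 \h* ',' \h* 'S' <?before <eoi>> }
}

my $match = With-Lookahead.parse('ADC $F2, S');
say so $match;                                         # True
say $match<instruction><stack-relative>.defined;       # True
say so With-Lookahead.parse('ADC $F2');                # True
```

With ADC $F2, S the direct-page token now fails on its own, because what follows $F2 is a comma and not an end of instruction, so the next alternative gets its chance.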

Finalizing

I had to tweak some instructions. The COP and BRK instructions can optionally take a byte operand to identify them. It was mostly a matter of changing the <STACK-RELATIVE> token in the addressing grammar and providing a default value when no byte is specified.

I also have to fix all the instructions that take <IMMEDIATE> addressing. Since the accumulator and the two index registers X/Y can work in 8-bit or 16-bit mode (depending on the CPU flags), both LDA #$42 and LDA #$4269 are valid. But LDA #$42 alone is ambiguous: you could assemble it into A9 42 00 or A9 42. I will fix this later by adding disambiguating suffixes as asar chose to do, e.g. LDA.w and LDA.b. For now, they only accept a byte.
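To make the ambiguity concrete, here is a small sketch of how the two encodings differ once the width is known. The sub assemble-lda-immediate and its :width parameter are hypothetical and only stand in for the future LDA.b / LDA.w handling; the only facts relied on are that A9 is the opcode for LDA immediate and that 16-bit operands are stored low byte first.

```raku
# Hypothetical helper: encode LDA #value for an explicit operand width.
sub assemble-lda-immediate(Int $value, Str :$width = 'b') {
    given $width {
        # LDA.b: opcode followed by one byte
        when 'b' { buf8.new(0xA9, $value +& 0xFF) }
        # LDA.w: opcode followed by two bytes, low byte first
        when 'w' { buf8.new(0xA9, $value +& 0xFF, ($value +> 8) +& 0xFF) }
        default  { die "Unknown width suffix: $width" }
    }
}

say assemble-lda-immediate(0x42).list;                 # (169 66)
say assemble-lda-immediate(0x42, :width<w>).list;      # (169 66 0)
say assemble-lda-immediate(0x4269, :width<w>).list;    # (169 105 66)
```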

Let's run the test one last time:

    skarsnik@DESKTOP-UIA12T1:/mnt/f/Project/SnesASM$ raku -I lib bin/test-snes-asm.raku t/asm/04-all-instruction.asm
    Compiled : 257 instructions
    Compiled succefully

Note that there are 257 instructions since there are two versions each of COP and BRK.

Not bad, but we will need things like labels support to have a usable assembler :)

About Sylvain Colinet

I blog about Perl 6.