Writing a SNES assembler compiler/disassembler - Day 2

First look at generating grammars

This will be very short, even though it took me a lot of time to figure this part out.

In my ASM65816Grammar.rakumod I manually wrote the Number and Addressing grammars, but obviously that's not really practical for the instructions.

General ASM grammar

First let's focus on parsing something simple.

The basic gist of what you can write in an asm file is very short:

    lda $42
    clc
    adc #3
    cmp #0005:beq $4855 ; if $42 + 3 is 5 branch to $4855

You have one instruction per line, or several instructions on the same line separated by a :, and a ; starts a comment.

I don't handle labels for now, since I just want to dumbly generate a grammar for all the valid instructions from my instruction list and see if that works well.

The final grammar looks like this:

    grammar GrammarASM65816 is InstructionGrammar is export {
        token TOP { <thing>+ }
        token thing {
            || <asm-comment>
            || <instruction-line> <.ws> <.asm-comment>*
            || <instruction> <.ws> <.asm-comment>*
        }
        token instruction-line { <instruction> (<.ws> ':' <.ws> <instruction>)+ }
        token asm-comment { ';' .* }
    }

Today I learned

Don't use <.ws>*. <.ws> already matches any amount of whitespace (including none at all), so adding a * makes Raku go on forever. Maybe there should be a warning about that?
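To convince myself that a bare <.ws> is enough, here is a minimal sketch. WsDemo and its name token are made up for this example. As far as I know the built-in ws is defined roughly as <!ww> \s*, so it happily matches zero characters as long as we are not in the middle of a word.

```raku
# Hypothetical mini grammar, only here to show how <.ws> behaves.
grammar WsDemo {
    # A single <.ws> on each side of the ':' is enough:
    # it accepts no whitespace at all as well as several spaces.
    token TOP  { <name> <.ws> ':' <.ws> <name> }
    token name { \w+ }
}

say so WsDemo.parse('a:b');        # no whitespace around the ':'
say so WsDemo.parse('a   :   b');  # several spaces around the ':'
```

Both calls should print True, which is why quantifying <.ws> again is never needed.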

InstructionGrammar

Yes, I like to compose my grammar from multiple pieces, since I plan to support 2-3 ASM grammars: my own for my SLANG, and one compatible with xkas/Asar, a tool used in ROM hacking (this includes instructions to specify where to inject code in a ROM).

Before generating each token, let's manually write 1-2 tokens to see if that works. I use a proto token for instruction since I will not really care about individual instructions in the Action class.

    grammar InstructionGrammar is Addressing {
        proto token instruction {*}
        token instruction:sym<LDA> {:i "LDA"<.ws><word>}
        token instruction:sym<RTL> {:i "RTL"}
    }

The :i adverb makes the token ignore the case.
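Here is a self-contained sketch of the same idea that can be run without my Addressing grammar. MiniInstructions and its operand token are hypothetical stand-ins (my real grammar uses <word> from Addressing); the point is only to show the proto token dispatching to the right candidate and :i ignoring the case.

```raku
# Hypothetical stand-alone version of the proto token idea.
grammar MiniInstructions {
    token TOP { <instruction> }

    # One proto, one candidate per mnemonic.
    proto token instruction {*}
    token instruction:sym<LDA> { :i 'LDA' <.ws> <operand> }
    token instruction:sym<RTL> { :i 'RTL' }

    # Simplified operand: an optional '$' followed by hex digits.
    token operand { '$'? <[0..9 A..F a..f]>+ }
}

say MiniInstructions.parse('rtl');      # lower case is accepted thanks to :i
say MiniInstructions.parse('LDA $42');  # candidate with an operand
say MiniInstructions.parse('nop');      # no candidate for this one, so Nil
```

In the Action class a single method named instruction would then see every mnemonic, which is why I don't need one action per token.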

Let's run this to see if that works:

    $ raku -I lib -e 'use ASM65816Grammar; say GrammarASM65816.parse("RTL")'
    「RTL」
     thing => 「RTL」
      instruction => 「RTL」

    $ raku -I lib -e 'use ASM65816Grammar; say GrammarASM65816.parse("RTL:lda 42;piko")'
    「RTL:lda 42;piko」
     thing => 「RTL:lda 42;piko」
      instruction-line => 「RTL:lda 42」
       instruction => 「RTL」
       0 => 「:lda 42」
        instruction => 「lda 42」
         word => 「42」

We don't see a separate entry for ;piko since it's not captured in the grammar (<.asm-comment> is a non-capturing call). Also, my grammar for word is not really right: 42 is not a word but a byte. This is fine for now :)

Generating the instruction tokens

According to this advent post, https://perl6advent.wordpress.com/2015/12/08/day-8-grammars-generating-grammars/, which generates a Raku grammar from a BNF grammar, it looks possible. We need to use ^add_method and EVAL to add our tokens.

Let's do only simple instructions like TXA that do not take an argument.

    sub gen-instru {
        for @ASM65816::instructions -> $instruct {
            my $token-name = $instruct.inst ~ '-' ~ $instruct.addressing.Str;
            if $instruct.addressing == IMPLIED {
                InstructionGrammar.^add_method(
                    "instruction:sym<{ $instruct.inst }>",
                    EVAL "my token instruction:sym<{ $instruct.inst }>" ~ '{:i "' ~ $instruct.inst ~ '"}'
                );
            }
        }
    }

And....

    $ raku -I lib -e 'use ASM65816Grammar; say GrammarASM65816.parse("TXA")'
    TOP
    |  thing
    |  |  asm-comment
    |  |  * FAIL
    |  |  instruction-line
    |  |  |  instruction
    |  |  |  * FAIL
    |  |  * FAIL
    |  |  instruction
    |  |  * FAIL
    |  * FAIL
    * FAIL
    Nil

I added Grammar::Tracer to get more details, and as you can see it does not work. You could probably tell me, "But wait, in the article they create a new grammar and compose it at the end, maybe that's why it does not work?"

I don't really know. Let's have a look at whether both grammars have the right method:

    # After the grammar definitions
    say "Instruction Grammar - Name : ", $_.name, " Method :", $_ if $_.name ~~ /TXA/ for InstructionGrammar.^methods;
    say "Grammar ASM65816 - Name : ", $_.name, " Method :", $_ if $_.name ~~ /TXA/ for GrammarASM65816.^methods;

    $ raku -I lib -e 'use ASM65816Grammar; say GrammarASM65816.parse("TXA")'
    Instruction Grammar - Name : instruction:sym<TXA> Method :token instruction:sym<TXA>{:i "TXA"}
    Grammar ASM65816 - Name : instruction:sym<TXA> Method :token instruction:sym<TXA>{:i "TXA"}

So yes, the tokens are there, but something is probably missing for them to behave like real tokens.

So for now I opted to generate the lines defining the instruction tokens and copy/paste them into the file. It's not really great if I want to tweak the whole grammar.
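Generating those lines can be as simple as the sketch below. The @implied list is a hypothetical stand-in for the IMPLIED entries of @ASM65816::instructions, so the example runs on its own; the real script would loop over the instruction list instead.

```raku
# Hypothetical stand-in for the IMPLIED instructions of the real list.
my @implied = <TXA RTL CLC>;

# Build one token definition per mnemonic, as plain source text.
my @lines = @implied.map: -> $inst {
    sprintf 'token instruction:sym<%s> {:i "%s"}', $inst, $inst
};

# These are the lines to paste into InstructionGrammar.
.say for @lines;
```

The first printed line is token instruction:sym<TXA> {:i "TXA"}, which is the same shape as the tokens I wrote by hand earlier.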

About Sylvain Colinet

I blog about Perl 6.