Writing a SNES assembler compiler/disassembler - Day 1

Why? Because I can. More seriously, I have a project where I need to inject new SNES code into a running game, and I want to express this new code directly in my Raku component (a web server service). I want a special kind of sub that contains SNES assembly and returns SNES bytecode.

I already tried injecting a slang into Raku, to write something like my $byte-code = SNES lda $42; sta $54; rtl; but it is rather tricky, so I will probably just have an additional slang with its own grammar in a dedicated file.

    use SNES-ASM;
    sub unlock-door (%door-id) {
        lda #%door-id
        sta $12
        jmp $4565
        rtl
    }

And later in code, I can just do my $unlock-bytecode = SNES::unlock-door(42)

I could just write a custom grammar and have an existing library (libasar) generate the bytecode for me. But since I will be writing the first part of an assembler anyway (parsing and validating code), why not write a complete assembler?

A byte on the Snes ASM

The SNES CPU has only one accumulator (A) and two index registers (X and Y). Most instructions work on these three registers, in 8-bit or 16-bit mode.

lda loads a value into A; sta stores the value of A into memory. A number can be written in decimal, like 42, or more commonly with a leading $ to mark it as hexadecimal, like $20. A word is 2 bytes long, a long is 3 bytes long.
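To make the sizes concrete, here is a small sketch of how a value could be split into the bytes that follow an opcode. The helper names encode-le and hex-bytes are mine, not part of the project; the only fact assumed is that the 65816 stores multi-byte values least significant byte first.

```raku
# Hypothetical helper: split $value into $size bytes, low byte first,
# which is the byte order the 65816 uses for words and longs.
sub encode-le(Int $value, Int $size) {
    (^$size).map({ ($value +> (8 * $_)) +& 0xFF }).List
}

# Hypothetical helper: render a list of bytes as two-digit hex.
sub hex-bytes(@bytes) {
    @bytes.map(*.fmt('%02X')).join(' ')
}

say hex-bytes(encode-le(0x20, 1));       # a byte
say hex-bytes(encode-le(0x4565, 2));     # a word
say hex-bytes(encode-le(0x7E1234, 3));   # a long
```

So the word $4565 ends up as 65 45 in the bytecode, not 45 65.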

Generating instructions

Since I don't want to type out the whole instruction set and its associated bytecode, I will use the table I refer to when I write SNES code. (Sometimes I question my sanity.)

With Gumbo and the XML module, I generate a list of instructions from https://wiki.superfamicom.org/65816-reference

    @instructions.push(Instruction.new(:inst("ADC"), :addressing(DP-INDEXED-INDIRECT-X), :description("Add With Carry"), :byte(0x61), :alias("")));
    @instructions.push(Instruction.new(:inst("ADC"), :addressing(STACK-RELATIVE), :description("Add With Carry"), :byte(0x63), :alias("")));
    @instructions.push(Instruction.new(:inst("ADC"), :addressing(DIRECT-PAGE), :description("Add With Carry"), :byte(0x65), :alias("")));

This table is not complete because some instructions are ambiguous in their normal form. Something like ldx 42 can be compiled differently depending on whether 42 is encoded as a byte or as a word, so I will need to add some stuff later.
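Here is a rough sketch of how such a table could be queried, and why ldx 42 is ambiguous. The enum and class below are cut-down stand-ins with assumed shapes (the real ones are generated and have more fields), and find-instruction is a hypothetical helper. The three LDX opcodes are the standard 65816 values.

```raku
# Cut-down stand-ins for the generated enum and class (assumed shapes).
enum Addressing <IMMEDIATE DIRECT-PAGE ABSOLUTE>;

class Instruction {
    has Str        $.inst;
    has Addressing $.addressing;
    has Int        $.byte;
}

my @instructions =
    Instruction.new(:inst("LDX"), :addressing(IMMEDIATE),   :byte(0xA2)),
    Instruction.new(:inst("LDX"), :addressing(DIRECT-PAGE), :byte(0xA6)),
    Instruction.new(:inst("LDX"), :addressing(ABSOLUTE),    :byte(0xAE));

# Hypothetical lookup: first entry matching mnemonic and addressing mode.
sub find-instruction(Str $inst, Addressing $mode) {
    @instructions.first({ .inst eq $inst.uc && .addressing === $mode })
}

# `ldx 42` fits two entries: 42 as a byte is Direct Page,
# 42 as a word is Absolute, and the opcodes differ.
for DIRECT-PAGE, ABSOLUTE -> $mode {
    my $found = find-instruction('ldx', $mode);
    say $mode, ' => opcode ', $found.byte.fmt('%02X');
}
```

The table alone cannot pick between the two entries; something else has to decide the operand size.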

Addressing is a generated Enum

Everything is put in an ASM65816.rakumod file.

Rant Time - HashSet

When generating this, the addressing part was put in a Set since I want to generate an enumeration from it.

Why does a mutable Set have to be a Hash and not just a regular Array? It makes sense if you look at how to implement it, since in a Hash each key is unique, but I don't get why this has to be exposed to the user this way. Having to write for $myset.keys -> $entry { do stuff } feels so wrong and dumb.
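For reference, this is roughly what that step looks like with a SetHash. It is a sketch, not the actual generator: the mode names are a hand-picked sample, and building the enum declaration as a string is my assumption about how the generated file gets written.

```raku
# A SetHash is the mutable set type; duplicates collapse into one key.
my $modes = SetHash.new;
$modes{$_} = True for <DIRECT-PAGE IMMEDIATE DIRECT-PAGE ABSOLUTE>;

say $modes.elems;   # three distinct modes remain

# Going through .keys is exactly the part the rant is about.
my $enum-source = 'enum Addressing <' ~ $modes.keys.sort.join(' ') ~ '>;';
say $enum-source;
```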

Addressing the Addressing

An instruction is basically something like <keyword> <addressing>. Addressing is what you are trying to affect with the instruction. Some examples:

  • Nothing : rtl
  • Constant/Immediate : lda #42 puts 42 in A
  • Address/Absolute : lda $4545 puts the value stored at address $4545 in A
  • Indirect : lda ($45) puts in A the value stored at the address that $45 points to (for lda the pointer has to be in Direct Page)
  • Indexed X : lda $42, X puts in A the value stored at address $42 plus the value of the X register
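To show what the assembler has to produce for some of these forms, here is a small sketch that glues an opcode to its operand bytes. The assemble sub is hypothetical, the opcodes are the standard 65816 ones, and lda #42 is assumed to run with A in 8-bit mode (in 16-bit mode the operand would take two bytes).

```raku
# Hypothetical helper: opcode followed by the operand, low byte first.
sub assemble(Int $opcode, Int $operand = 0, Int :$size = 0) {
    my @operand-bytes = (^$size).map({ ($operand +> (8 * $_)) +& 0xFF });
    Blob.new($opcode, |@operand-bytes)
}

# Hypothetical helper: render the bytes as two-digit hex.
sub show(Blob $code) {
    $code.list.map(*.fmt('%02X')).join(' ')
}

say show(assemble(0x6B));                      # rtl
say show(assemble(0xA9, 42,     :size(1)));    # lda #42    (8-bit A assumed)
say show(assemble(0xAD, 0x4545, :size(2)));    # lda $4545
say show(assemble(0xB5, 0x42,   :size(1)));    # lda $42, X (Direct Page indexed)
```

The same mnemonic gives a different opcode and a different operand length for each addressing mode, which is why the parser has to identify the mode first.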

You probably noticed DP/Direct Page in the instruction table example. Direct Page is a special address range that is basically the beginning of the SNES RAM (WRAM).

We can already write the grammar for all the addressing modes and for what I called Number (byte, word, dp, etc.).

    grammar Number {
        token byte {
            | '$' <xdigit> ** 1..2
            | \d+<?{ $/ < 0x100}>
        }
        token word {
            | '$' <xdigit> ** 3..4
            | \d+<?{ $/ < 0x10000}>
        }
        token long {
            | '$' <xdigit> ** 1..6
            | \d+<?{ $/ < 0x1000000}>
        }
        token bank { <byte> }
        token dp { <byte> }
        token pc-relative {
            | '$' <xdigit> ** 1..4
            | \d+<?{ $/ < 0x10000}>
        }
        token pc-relative-long {
            | '$' <xdigit> ** 4..6
            | \d+<?{ $/ < 0x1000000}>
        }
    };
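A quick way to poke at tokens like these is to call .parse with an explicit :rule. The sketch below uses a cut-down copy of the grammar under a different name (Operand) so it stands alone; size-of and to-int are hypothetical helpers, not part of the module.

```raku
# Cut-down copy of two tokens from the grammar above.
grammar Operand {
    token byte { '$' <xdigit> ** 1..2 | \d+ <?{ $/ < 0x100 }> }
    token word { '$' <xdigit> ** 3..4 | \d+ <?{ $/ < 0x10000 }> }
}

# Hypothetical helper: numeric value of a '$'-prefixed hex or decimal literal.
sub to-int(Str $text) {
    $text.starts-with('$') ?? $text.substr(1).parse-base(16) !! $text.Int
}

# Hypothetical helper: smallest size whose token accepts the whole string.
sub size-of(Str $text) {
    return 'byte' if Operand.parse($text, :rule<byte>);
    return 'word' if Operand.parse($text, :rule<word>);
    'neither'
}

for '$20', '42', '$4565', '300' -> $src {
    say $src, ' is a ', size-of($src), ' with value ', to-int($src);
}
```

Trying byte before word matters here, because a decimal literal such as 42 satisfies both tokens.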

I will probably rename these, because they are basically what I need to encode after the instruction opcode.

This is part of the Addressing grammar. Absolute is a word, since an address that is < $100 is Direct Page.

    grammar Addressing is Number {
        token ABSOLUTE { <word> }
        token ABSOLUTE-INDEXED-INDIRECT { '(' <word> ',' 'X' ')' }
        token ABSOLUTE-INDEXED-X { <word> ',' 'X' }
        token ABSOLUTE-INDIRECT { '(' <word> ')' }
        token ABSOLUTE-LONG { <long> }
        token ACCUMULATOR { 'A' }
        token DP-INDIRECT-LONG { '[' <dp> ']' }
        token DP-INDIRECT-LONG-INDEXED-Y { '[' <dp> ']' ',' 'Y' }
        token DIRECT-PAGE { <dp> }
        token IMMEDIATE {
            |'#'<word>
            |'#'<byte>
        }
        token IMMEDIATE-BYTE { '#'<byte> }
        token IMMEDIATE-WORD { '#'<word> }
        token PROGRAM-COUNTER-RELATIVE { <pc-relative> }
        ....
    }
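One way to use a grammar like this is to try each addressing token against an operand and collect the ones that accept it. This is only a sketch: Operand and Modes are cut-down copies under other names so the example stands alone, and modes-for is a hypothetical helper. Tokens do not skip whitespace, so the operands are written without spaces.

```raku
# Cut-down copies of a few tokens from the two grammars above.
grammar Operand {
    token byte { '$' <xdigit> ** 1..2 | \d+ <?{ $/ < 0x100 }> }
    token word { '$' <xdigit> ** 3..4 | \d+ <?{ $/ < 0x10000 }> }
    token dp   { <byte> }
}

grammar Modes is Operand {
    token ABSOLUTE           { <word> }
    token ABSOLUTE-INDEXED-X { <word> ',' 'X' }
    token ABSOLUTE-INDIRECT  { '(' <word> ')' }
    token DIRECT-PAGE        { <dp> }
    token IMMEDIATE-BYTE     { '#' <byte> }
}

my @mode-names = <ABSOLUTE ABSOLUTE-INDEXED-X ABSOLUTE-INDIRECT DIRECT-PAGE IMMEDIATE-BYTE>;

# Hypothetical helper: every mode whose token matches the whole operand.
sub modes-for(Str $operand) {
    @mode-names.grep({ Modes.parse($operand, :rule($_)).so }).List
}

for '$42', '$4565', '$4565,X', '($4565)', '#42', '42' -> $operand {
    say $operand, ' => ', modes-for($operand).join(', ');
}
```

The last operand, a decimal 42, is accepted by both ABSOLUTE and DIRECT-PAGE, which is the ambiguity mentioned earlier for ldx 42.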

About Sylvain Colinet

I blog about Perl 6.