Writing a SNES assembler compiler/disassembler - Day 1

By Sylvain Colinet on December 3, 2021 8:02 PM under Perl 6


Why? Because I can. More seriously, I have a project where I need to inject new SNES code into a running game, and I want to express this new code directly in my Raku component (a web server service). I want a special kind of sub whose body is SNES assembly and which returns SNES bytecode.

I already tried injecting a slang into Raku, to write something like my $byte-code = SNES lda $42; sta $54; rtl; but it's rather tricky, and I will probably just have an additional slang with its own grammar in a dedicated file.

use SNES-ASM;

sub unlock-door (%door-id) {
    lda #%door-id
    sta $12
    jmp $4565
    rtl
}

And later in the code, I can just do my $unlock-bytecode = SNES::unlock-door(42)

I could just write a custom grammar and have an existing library (libasar) generate the bytecode for me. But since I will be writing the first part of an assembler anyway (parsing and validating code), why not write a complete assembler?

A byte on the Snes ASM

The SNES CPU has only one accumulator (A) and two index registers (X and Y). Most instructions work on these three registers, in 8-bit or 16-bit mode.

lda loads a value into A, sta stores the value of A at an address. Numbers can be written in decimal like 42 or, more commonly, with a leading $ to mark a hexadecimal value, like $20. A word is 2 bytes long, a long is 3 bytes long.

Generating instructions

Since I don't want to type the whole instruction set and its associated bytecode by hand, I will use the table I refer to when I write SNES code (sometimes I question my sanity).

With Gumbo and the XML module, I generate a list of instructions from https://wiki.superfamicom.org/65816-reference

@instructions.push(Instruction.new(:inst("ADC"), :addressing(DP-INDEXED-INDIRECT-X), :description("Add With Carry"), :byte(0x61), :alias("")));
@instructions.push(Instruction.new(:inst("ADC"), :addressing(STACK-RELATIVE), :description("Add With Carry"), :byte(0x63), :alias("")));
@instructions.push(Instruction.new(:inst("ADC"), :addressing(DIRECT-PAGE), :description("Add With Carry"), :byte(0x65), :alias("")));

This table is not complete because some instructions are ambiguous in their normal form. Something like ldx 42 can be assembled differently depending on whether you encode 42 as a byte or as a word, so I will need to add some entries later.
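To make the ambiguity concrete, here is a minimal sketch (not taken from the actual assembler). It assumes a hypothetical helper named encode-operand and relies on two facts from the 65816 opcode table: LDX is 0xA6 in direct page form and 0xAE in absolute form, and operands are stored little-endian.

```raku
# Hypothetical helper: encode an operand as $size little-endian bytes.
sub encode-operand(Int $value, Int $size --> Blob) {
    Blob.new((^$size).map({ ($value +> (8 * $_)) +& 0xFF }).list);
}

# "ldx 42" read as a direct page address (one operand byte)...
my $as-dp  = Blob.new(0xA6) ~ encode-operand(42, 1);
# ...or as an absolute address (two operand bytes).
my $as-abs = Blob.new(0xAE) ~ encode-operand(42, 2);

say $as-dp.list.fmt('%02X', ' ');   # A6 2A
say $as-abs.list.fmt('%02X', ' ');  # AE 2A 00
```

The same source line yields two different byte sequences, which is why the table needs extra information to pick one.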

Addressing is a generated Enum

Everything is put in an ASM65816.rakumod file
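The generated lines above imply an Instruction class and an Addressing enum. The sketch below is only a guess at their shape: the attribute types and the %opcode-for lookup are assumptions made for illustration, not the content of the real ASM65816.rakumod.

```raku
# Assumed shape of the generated enum (only three entries shown).
enum Addressing <DP-INDEXED-INDIRECT-X STACK-RELATIVE DIRECT-PAGE>;

# Assumed shape of the class behind Instruction.new(...).
class Instruction {
    has Str        $.inst;
    has Addressing $.addressing;
    has Str        $.description;
    has Int        $.byte;
    has Str        $.alias;
}

my @instructions;
@instructions.push(Instruction.new(:inst("ADC"), :addressing(DP-INDEXED-INDIRECT-X), :description("Add With Carry"), :byte(0x61), :alias("")));
@instructions.push(Instruction.new(:inst("ADC"), :addressing(DIRECT-PAGE), :description("Add With Carry"), :byte(0x65), :alias("")));

# Hypothetical lookup: "mnemonic addressing" => opcode byte.
my %opcode-for = @instructions.map(-> $i { "{$i.inst} {$i.addressing}" => $i.byte });

say %opcode-for{'ADC DIRECT-PAGE'}.fmt('%02X');
```

Once the grammar has identified the mnemonic and the addressing mode, a lookup like this is one way to get the opcode byte.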

Rant Time - SetHash

When generating this, the addressing part was put in a Set since I want to generate an enumeration from it.

Why does a mutable Set have to be a Hash and not just a regular Array? It makes sense if you look at how it is implemented, since each key in a Hash is unique, but I don't get why this has to be exposed this way to the user. Having to write for $myset.keys -> $entry { do stuff } feels so wrong and dumb.
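For readers who have not used it, this is roughly what the pattern looks like. It is a small sketch with made-up variable names ($addressings, $enum-source); the real generator script is not shown in this post.

```raku
# Collect addressing names; duplicates collapse because keys are unique.
my $addressings = SetHash.new;
$addressings{$_} = True for <DIRECT-PAGE STACK-RELATIVE DIRECT-PAGE>;

say $addressings.elems;

# Iteration goes through .keys, as with a Hash.
for $addressings.keys.sort -> $entry {
    say $entry;
}

# Turn the set into the source text of an enum declaration.
my $enum-source = 'enum Addressing <' ~ $addressings.keys.sort.join(' ') ~ '>;';
say $enum-source;
```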

Addressing the Addressing

An instruction is basically something like <keyword> <addressing>. Addressing is what you are trying to affect with the instruction. Some examples:

Nothing : rtl
Constant/Immediate : lda #42 puts 42 in A
Address/Absolute : lda $4545 puts the value at address $4545 in A
Indirect : lda ($4545) puts in A the value at the address stored at $4545
Indexed X : lda $42, X puts in A the value at address $42 + the value of the X register

You probably saw DP/Direct-page in the instruction table example. Direct Page is a special range of addresses that is basically the beginning of the RAM (WRAM) of the SNES.

We can already write the grammar for all the addressing modes and for what I called Number (byte, word, dp, etc.)

grammar Number {
    token byte {
        | '$' <xdigit> ** 1..2
        | \d+<?{ $/ < 0x100}>
    }
    token word {
        | '$' <xdigit> ** 3..4
        | \d+<?{ $/ < 0x10000}>
    }
    token long {
        | '$' <xdigit> ** 1..6
        | \d+<?{ $/ < 0x1000000}>
    }
    token bank {
        <byte>
    }
    token dp {
        <byte>
    }
    token pc-relative {
        | '$' <xdigit> ** 1..4
        | \d+<?{ $/ < 0x10000}>
    }
    token pc-relative-long {
        | '$' <xdigit> ** 4..6
        | \d+<?{ $/ < 0x1000000}>
    }
};
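As a quick check of how these tokens behave, the sketch below copies only the byte and word tokens and tries each one with the :rule argument of .parse. The loop and the 'no match' fallback are illustrative assumptions, not part of the assembler.

```raku
# Subset of the Number grammar above, repeated so the example is self-contained.
grammar Number {
    token byte {
        | '$' <xdigit> ** 1..2
        | \d+ <?{ $/ < 0x100 }>
    }
    token word {
        | '$' <xdigit> ** 3..4
        | \d+ <?{ $/ < 0x10000 }>
    }
}

# Try the narrowest size first, then fall back to the wider one.
sub size-of(Str $text) {
    return 'byte' if Number.parse($text, :rule<byte>);
    return 'word' if Number.parse($text, :rule<word>);
    return 'no match';
}

for '$42', '$4545', '255', '256' -> $text {
    say $text, ' -> ', size-of($text);
}
```

With these tokens, '$42' and '255' should come out as byte, while '$4545' and '256' fall through to word, because .parse only succeeds when the whole string is consumed.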

I will probably rename these, because they are basically what I need to encode after the instruction opcode.

This is part of the Addressing grammar. Absolute is a word, since an address below $100 is Direct Page.

grammar Addressing is Number {
    token ABSOLUTE {
        <word>
    }
    token ABSOLUTE-INDEXED-INDIRECT {
        '(' <word> ',' 'X' ')'
    }
    token ABSOLUTE-INDEXED-X {
        <word> ',' 'X'
    }
    token ABSOLUTE-INDIRECT {
        '(' <word> ')'
    }
    token ABSOLUTE-LONG {
        <long>
    }
    token ACCUMULATOR {
        'A'
    }
    token DP-INDIRECT-LONG {
        '[' <dp> ']'
    }
    token DP-INDIRECT-LONG-INDEXED-Y {
        '[' <dp> ']' ',' 'Y'
    }
    token DIRECT-PAGE {
        <dp>
    }
    token IMMEDIATE {
        |'#'<word>
        |'#'<byte>
    }
    token IMMEDIATE-BYTE {
        '#'<byte>
    }
    token IMMEDIATE-WORD {
        '#'<word>
    }
    token PROGRAM-COUNTER-RELATIVE {
        <pc-relative>
    }
    ....
}
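One possible way to use such a grammar is to try each addressing token in turn against an operand and keep the first that matches the whole string. The sketch below does that with a reduced, hex-only copy of the grammars; the sub addressing-of and the order in which rules are tried are assumptions for illustration, not the author's design.

```raku
# Reduced copies of the grammars above (hex notation only).
grammar Number {
    token byte { '$' <xdigit> ** 1..2 }
    token word { '$' <xdigit> ** 3..4 }
    token dp   { <byte> }
}

grammar Addressing is Number {
    token ABSOLUTE           { <word> }
    token ABSOLUTE-INDEXED-X { <word> ',' 'X' }
    token ABSOLUTE-INDIRECT  { '(' <word> ')' }
    token DIRECT-PAGE        { <dp> }
    token IMMEDIATE-BYTE     { '#' <byte> }
}

# Hypothetical helper: name of the first addressing token that matches.
sub addressing-of(Str $operand) {
    for <ABSOLUTE ABSOLUTE-INDEXED-X ABSOLUTE-INDIRECT DIRECT-PAGE IMMEDIATE-BYTE> -> $rule {
        return $rule if Addressing.parse($operand, :$rule);
    }
    return Nil;
}

for '$4545', '$4545,X', '($4545)', '$42', '#$42' -> $operand {
    say $operand, ' -> ', addressing-of($operand);
}
```

Because .parse must consume the whole operand, '$4545,X' is rejected by ABSOLUTE and picked up by ABSOLUTE-INDEXED-X, and '$42' only matches DIRECT-PAGE.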

