Writing a SNES assembler compiler/disassembler - Day 1
Why? Because I can. More seriously, I have a project where I need to inject new SNES code into a running game, and I want to express this new code directly in my Raku component (a web server). I want a special sub that contains SNES assembly and returns the corresponding SNES bytecode.
I already tried injecting a Slang into Raku, to be able to write something like my $byte-code = SNES lda $42; sta $54; rtl;
But it's rather tricky, so I will probably just write an additional Slang with its own grammar in a dedicated file:
use SNES-ASM;
sub unlock-door (%door-id) {
    lda #%door-id
    sta $12
    jmp $4565
    rtl
}
And later in the code, I can just write my $unlock-bytecode = SNES::unlock-door(42)
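To make the goal concrete, here is a hand-assembled sketch of what SNES::unlock-door(42) would have to return. unlock-door-bytes is a hypothetical plain Raku sub (no Slang involved), and it assumes an 8-bit accumulator so that lda # takes a one-byte operand.

```raku
# Hypothetical sub (not part of SNES-ASM): hand-assembled bytes for unlock-door.
# Assumes an 8-bit accumulator, so `lda #` takes a one-byte operand.
sub unlock-door-bytes(Int $door-id where 0..0xFF --> Buf) {
    Buf.new(
        0xA9, $door-id,      # lda #door-id  (LDA immediate)
        0x85, 0x12,          # sta $12       (STA direct page)
        0x4C, 0x65, 0x45,    # jmp $4565     (JMP absolute, low byte first)
        0x6B,                # rtl
    )
}

say unlock-door-bytes(42).list.map({ sprintf '%02X', $_ }).join(' ');
# prints: A9 2A 85 12 4C 65 45 6B
```

This is exactly the kind of byte juggling the assembler should do for me.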
I could just write a custom grammar and let an existing library (libasar) generate the bytecode for me. But since I will write the first part of an assembler anyway (parsing and validating the code), why not write a complete assembler?
A byte on the Snes ASM
The SNES has only one accumulator (A) and two index registers (X and Y). Most instructions work on these three registers, in either 8-bit or 16-bit mode.
lda loads a value into A, and sta stores the value of A in memory. Numbers can be written in decimal, like 42, or more commonly with a $ prefix to mark a hexadecimal value, like $20. A word is 2 bytes long, and a long is 3 bytes long.
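A small sketch of these sizes in Raku, using the same upper limits as the Number grammar further down. size-of-operand is a hypothetical helper, not something from the final assembler.

```raku
# Hypothetical helper: how many bytes an operand value needs
# (byte = 1, word = 2, long = 3).
sub size-of-operand(Int $value --> Int) {
    die "negative value $value" if $value < 0;
    return 1 if $value < 0x100;       # byte: $00-$FF
    return 2 if $value < 0x10000;     # word: up to $FFFF
    return 3 if $value < 0x1000000;   # long: up to $FFFFFF
    die "value $value does not fit in 24 bits";
}

# $20 in SNES assembly is hexadecimal: it is the same value as decimal 32
say '20'.parse-base(16);          # 32
say size-of-operand(42);          # 1
say size-of-operand(0x4545);      # 2
say size-of-operand(0x7E0042);    # 3
```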
Generating instructions
Since I don't want to type the whole instruction set and its associated opcodes by hand, I will use the table I refer to when I write SNES code (sometimes I question my sanity).
With the Gumbo and XML modules, I generate a list of instructions from https://wiki.superfamicom.org/65816-reference:
@instructions.push(Instruction.new(:inst("ADC"), :addressing(DP-INDEXED-INDIRECT-X), :description("Add With Carry"), :byte(0x61), :alias("")));
@instructions.push(Instruction.new(:inst("ADC"), :addressing(STACK-RELATIVE), :description("Add With Carry"), :byte(0x63), :alias("")));
@instructions.push(Instruction.new(:inst("ADC"), :addressing(DIRECT-PAGE), :description("Add With Carry"), :byte(0x65), :alias("")));
This table is not complete, because some instructions are ambiguous in their normal form. Something like ldx 42 could be compiled differently depending on whether you encode 42 as a word or a byte, so I will need to add some stuff later.
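Here is what that ambiguity looks like in bytes. The opcodes are the real 65816 ones for LDX; to-hex and ldx-immediate are hypothetical helpers written only for this illustration.

```raku
# Hex dump helper for the examples
sub to-hex(Buf $bytes --> Str) {
    $bytes.list.map({ sprintf '%02X', $_ }).join(' ')
}

# `ldx 42` (no #) can be read as a direct-page or an absolute operand:
say to-hex(Buf.new(0xA6, 42));       # A6 2A     (LDX dp)
say to-hex(Buf.new(0xAE, 42, 0));    # AE 2A 00  (LDX abs, low byte first)

# Hypothetical helper for the immediate form: the opcode is always $A2, but
# the operand length depends on the X flag (changed with rep/sep), which the
# text `ldx #42` alone does not tell us.
sub ldx-immediate(Int $value, Bool :$wide-index = False --> Buf) {
    $wide-index
        ?? Buf.new(0xA2, $value +& 0xFF, ($value +> 8) +& 0xFF)
        !! Buf.new(0xA2, $value +& 0xFF)
}

say to-hex(ldx-immediate(42));                 # A2 2A
say to-hex(ldx-immediate(42, :wide-index));    # A2 2A 00
```

So the assembler will need either an explicit size hint in the source or some tracking of the register width state.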
Addressing is a generated enum.
Everything is put in an ASM65816.rakumod file.
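The post does not show the Instruction class or the enum itself, so here is a guess at a minimal version, only to show how the generated list could be turned into an opcode lookup. The enum subset, the attribute types and the %opcode index are assumptions.

```raku
# Hypothetical subset of the generated Addressing enum
enum Addressing <DIRECT-PAGE DP-INDEXED-INDIRECT-X STACK-RELATIVE>;

# Hypothetical Instruction class, with the named attributes used in the
# generated list
class Instruction {
    has Str        $.inst;
    has Addressing $.addressing;
    has Str        $.description;
    has Int        $.byte;
    has Str        $.alias;
}

my @instructions;
@instructions.push(Instruction.new(:inst("ADC"), :addressing(DP-INDEXED-INDIRECT-X), :description("Add With Carry"), :byte(0x61), :alias("")));
@instructions.push(Instruction.new(:inst("ADC"), :addressing(STACK-RELATIVE), :description("Add With Carry"), :byte(0x63), :alias("")));
@instructions.push(Instruction.new(:inst("ADC"), :addressing(DIRECT-PAGE), :description("Add With Carry"), :byte(0x65), :alias("")));

# Index the opcodes by mnemonic, then by addressing mode (enum values
# stringify to their names when used as hash keys)
my %opcode;
%opcode{.inst}{.addressing} = .byte for @instructions;

say %opcode<ADC><DIRECT-PAGE>.base(16);   # 65
```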
Rant Time - HashSet
When generating this, the addressing part was put in a Set, since I wanted to generate an enumeration from it.
Why does a mutable Set (a SetHash) have to be a Hash and not just a regular Array? It makes sense if you look at how it is implemented, since each key in a Hash is unique, but I don't get why this has to be exposed to the user this way. Having to write for $myset.keys -> $entry { do stuff } feels so wrong and dumb.
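For the record, this is roughly how the SetHash dance looks when generating the enum source. The mode names and the generated line are illustrative assumptions, not the actual generator output. When only the distinct values are needed as a list, .unique avoids the Hash interface altogether.

```raku
# Collect addressing modes in a SetHash: each element is a key mapped to True,
# so adding the same mode twice keeps a single copy.
my $modes = SetHash.new;
for <DIRECT-PAGE STACK-RELATIVE DIRECT-PAGE IMMEDIATE> -> $mode {
    $modes{$mode} = True;
}

say $modes.elems;            # 3
say $modes<DIRECT-PAGE>;     # True
say $modes<ABSOLUTE>;        # False

# The elements themselves are reached through .keys
my @sorted = $modes.keys.sort;

# Generate the enum declaration as source text for the module file
my $enum-source = "enum Addressing is export <{ @sorted.join(' ') }>;";
say $enum-source;   # enum Addressing is export <DIRECT-PAGE IMMEDIATE STACK-RELATIVE>;

# If all you want is the distinct values as a list, .unique keeps first-seen order
my @unique = <DIRECT-PAGE STACK-RELATIVE DIRECT-PAGE IMMEDIATE>.unique;
say @unique.elems;   # 3
```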
Addressing the Addressing
An instruction is basically something like <keyword> <addressing>. The addressing mode is what you are trying to affect with the instruction. Some examples:
- Nothing: rtl
- Constant/Immediate: lda #42 puts 42 in A
- Address/Absolute: lda $4545 puts the value stored at address $4545 in A
- Indirect: lda ($45) puts in A the value stored at the address that $45 points to (for lda the pointer has to be in the direct page; a full-word pointer such as ($4545) only exists for jumps)
- Indexed X: lda $42, X puts in A the value stored at address $42 + the value of the X register
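To see how the addressing mode picks both the opcode and the operand size, here is a sketch that encodes the lda examples above. The %lda table and encode-lda are hypothetical; the opcodes are the 65816 ones, and IMMEDIATE assumes an 8-bit accumulator.

```raku
# Hypothetical opcode table: LDA opcode and operand size (in bytes) for the
# addressing modes listed above. IMMEDIATE assumes an 8-bit accumulator.
my %lda =
    IMMEDIATE    => (0xA9, 1),   # lda #42
    ABSOLUTE     => (0xAD, 2),   # lda $4545
    DP-INDIRECT  => (0xB2, 1),   # lda ($45)
    DP-INDEXED-X => (0xB5, 1);   # lda $42, X

sub to-hex(Buf $bytes --> Str) {
    $bytes.list.map({ sprintf '%02X', $_ }).join(' ')
}

sub encode-lda(Str $mode, Int $value --> Buf) {
    my $opcode = %lda{$mode}[0];
    my $size   = %lda{$mode}[1];
    # Operand bytes come low byte first: the 65816 is little-endian
    Buf.new($opcode, |(^$size).map({ ($value +> (8 * $_)) +& 0xFF }))
}

say to-hex(encode-lda('IMMEDIATE', 42));        # A9 2A
say to-hex(encode-lda('ABSOLUTE', 0x4545));     # AD 45 45
say to-hex(encode-lda('DP-INDIRECT', 0x45));    # B2 45
say to-hex(encode-lda('DP-INDEXED-X', 0x42));   # B5 42
say to-hex(Buf.new(0x6B));                      # 6B (rtl has no operand)
```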
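You probably noticed DP/Direct Page in the instruction table example. The Direct Page is a special 256-byte range of addresses that can be reached with a single-byte operand; by default it is basically the beginning of the SNES RAM (WRAM), although its location is actually set by the D register.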
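A quick sketch of how a one-byte DP operand becomes a real address, assuming native mode (where the operand is simply added to D and the access lands in bank $00). dp-effective-address is a hypothetical helper.

```raku
# Sketch, assuming native mode: the one-byte DP operand is added to the
# 16-bit Direct Page register (D) and the result is an address in bank $00.
sub dp-effective-address(Int $d, Int $dp-operand --> Int) {
    ($d + $dp-operand) +& 0xFFFF
}

say dp-effective-address(0x0000, 0x42).fmt('$%04X');   # $0042 (D at its reset value)
say dp-effective-address(0x1E00, 0x42).fmt('$%04X');   # $1E42 (D moved to $1E00)
```

This is why the assembler only has to emit one byte for a DP operand: the CPU adds D at run time.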
We can already write the grammar for all the addressing modes and for what I called Number (byte, word, dp, etc.):
grammar Number {
    # 8-bit value: $00-$FF or decimal 0-255
    token byte {
        | '$' <xdigit> ** 1..2
        | \d+ <?{ $/ < 0x100 }>
    }
    # 16-bit value
    token word {
        | '$' <xdigit> ** 3..4
        | \d+ <?{ $/ < 0x10000 }>
    }
    # 24-bit value (bank + address)
    token long {
        | '$' <xdigit> ** 1..6
        | \d+ <?{ $/ < 0x1000000 }>
    }
    token bank {
        <byte>
    }
    token dp {
        <byte>
    }
    token pc-relative {
        | '$' <xdigit> ** 1..4
        | \d+ <?{ $/ < 0x10000 }>
    }
    token pc-relative-long {
        | '$' <xdigit> ** 4..6
        | \d+ <?{ $/ < 0x1000000 }>
    }
};
I will probably rename these, because they are basically the operands I need to encode after the instruction opcode.
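Since these tokens are what gets encoded after the opcode, an actions class can attach the numeric value to each match. This is a sketch with a trimmed copy of the byte and word tokens so it runs on its own; NumberValue and literal-to-int are hypothetical names.

```raku
# Trimmed standalone copy of the byte and word tokens above.
grammar Number {
    token byte {
        | '$' <xdigit> ** 1..2
        | \d+ <?{ $/ < 0x100 }>
    }
    token word {
        | '$' <xdigit> ** 3..4
        | \d+ <?{ $/ < 0x10000 }>
    }
}

# Hypothetical helper: '$2A' is hexadecimal, anything else is decimal.
sub literal-to-int(Str $text --> Int) {
    $text.starts-with('$')
        ?? $text.substr(1).parse-base(16)
        !! $text.Int
}

# Hypothetical actions class: attaches the numeric value to each match
class NumberValue {
    method byte($/) { make literal-to-int(~$/) }
    method word($/) { make literal-to-int(~$/) }
}

say Number.parse('$2A', :rule<byte>, :actions(NumberValue)).made;   # 42
say Number.parse('300', :rule<word>, :actions(NumberValue)).made;   # 300
say so Number.parse('300', :rule<byte>);                            # False
```

The last line shows the code assertion at work: 300 is made of digits, but it does not fit in a byte.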
This is part of the Addressing grammar. ABSOLUTE takes a word, since an address below $100 is Direct Page.
grammar Addressing is Number {
    token ABSOLUTE {
        <word>
    }
    token ABSOLUTE-INDEXED-INDIRECT {
        '(' <word> ',' 'X' ')'
    }
    token ABSOLUTE-INDEXED-X {
        <word> ',' 'X'
    }
    token ABSOLUTE-INDIRECT {
        '(' <word> ')'
    }
    token ABSOLUTE-LONG {
        <long>
    }
    token ACCUMULATOR {
        'A'
    }
    token DP-INDIRECT-LONG {
        '[' <dp> ']'
    }
    token DP-INDIRECT-LONG-INDEXED-Y {
        '[' <dp> ']' ',' 'Y'
    }
    token DIRECT-PAGE {
        <dp>
    }
    token IMMEDIATE {
        | '#' <word>
        | '#' <byte>
    }
    token IMMEDIATE-BYTE {
        '#' <byte>
    }
    token IMMEDIATE-WORD {
        '#' <word>
    }
    token PROGRAM-COUNTER-RELATIVE {
        <pc-relative>
    }
    ....
}
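One way to find out which addressing mode an operand uses is to try the tokens one by one with .parse(:rule), which only succeeds when the whole operand matches. This is a sketch with a trimmed, standalone copy of some tokens above; Operand and addressing-of are hypothetical names, and the order of the tries is an assumption (DIRECT-PAGE before ABSOLUTE, so $42 is treated as DP while $0042 stays absolute). Note that tokens do not skip whitespace, so $4545,X matches but $4545, X does not.

```raku
# Trimmed standalone copy of some Number and Addressing tokens from above.
grammar Operand {
    token byte {
        | '$' <xdigit> ** 1..2
        | \d+ <?{ $/ < 0x100 }>
    }
    token word {
        | '$' <xdigit> ** 3..4
        | \d+ <?{ $/ < 0x10000 }>
    }
    token dp                 { <byte> }
    token ACCUMULATOR        { 'A' }
    token IMMEDIATE          { | '#' <word> | '#' <byte> }
    token ABSOLUTE-INDIRECT  { '(' <word> ')' }
    token ABSOLUTE-INDEXED-X { <word> ',' 'X' }
    token DIRECT-PAGE        { <dp> }
    token ABSOLUTE           { <word> }
}

# Hypothetical helper: try each addressing token in a fixed order and return
# the first one that matches the whole operand (Nil if none does).
sub addressing-of(Str $operand) {
    for <ACCUMULATOR IMMEDIATE ABSOLUTE-INDIRECT ABSOLUTE-INDEXED-X
         DIRECT-PAGE ABSOLUTE> -> $mode {
        return $mode if Operand.parse($operand, :rule($mode));
    }
    Nil
}

say addressing-of('A');          # ACCUMULATOR
say addressing-of('#$42');       # IMMEDIATE
say addressing-of('$42');        # DIRECT-PAGE
say addressing-of('$0042');      # ABSOLUTE
say addressing-of('($4545)');    # ABSOLUTE-INDIRECT
say addressing-of('$4545,X');    # ABSOLUTE-INDEXED-X
```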