Originally I was planning on using the C lzss implementation you use as-is. However, in my initial tests I found that decompressing the NDRV after compressing it with the same implementation resulted in garbage.
Not sure why the decode didn't work for you. Did you try the C decode algorithm or only the fcode decode algorithm? With the default C encode/decode, I found that decoding the encoded ndrv correctly produced bytes that match the original ndrv:
/Volumes/Work/Programming/lzss/lzss/lzss e driver,AAPL,MacOS,PowerPC.bin driver,AAPL,MacOS,PowerPC.lzss
text: 54648 bytes
code: 30527 bytes (55%)
/Volumes/Work/Programming/lzss/lzss/lzss d driver,AAPL,MacOS,PowerPC.lzss driver,AAPL,MacOS,PowerPC_reverse.bin
bbdiff driver,AAPL,MacOS,PowerPC.bin driver,AAPL,MacOS,PowerPC_reverse.bin
/Volumes/Work/Open Firmware and Name Registry/ROM Firmtek 1S2/dosdude/driver,AAPL,MacOS,PowerPC_reverse.bin and /Volumes/Work/Open Firmware and Name Registry/ROM Firmtek 1S2/dosdude/driver,AAPL,MacOS,PowerPC.bin are identical.
md5 driver,AAPL,MacOS,PowerPC.bin driver,AAPL,MacOS,PowerPC_reverse.bin
MD5 (driver,AAPL,MacOS,PowerPC.bin) = dc939ef1b4a4145f552c012be20211d7
MD5 (driver,AAPL,MacOS,PowerPC_reverse.bin) = dc939ef1b4a4145f552c012be20211d7
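The same round-trip check can be scripted so it is easy to repeat after every change to the encoder. This is only a minimal sketch in Python, not part of the lzss tool: the function names md5_hex and round_trip_ok are made up for illustration, and it assumes the encode and decode steps have already been run as in the commands above.

```python
import hashlib


def md5_hex(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip_ok(original_path, decoded_path):
    """True when the decoded file is byte-for-byte identical to the original."""
    with open(original_path, "rb") as a, open(decoded_path, "rb") as b:
        return a.read() == b.read()
```

Calling round_trip_ok("driver,AAPL,MacOS,PowerPC.bin", "driver,AAPL,MacOS,PowerPC_reverse.bin") would play the role of the bbdiff step, and comparing the two md5_hex results the role of the md5 step.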
Also, I intentionally left the named FCode words in there as I thought it would help prevent a potential issue where tokenized FCode numbers may conflict with those already in the ROM
Fcode words get fcode numbers regardless of whether they are external, headers, or headerless.
(as I wasn't able to detokenize the ROM into a re-tokenizable format).
You manually inserted the fcode of the compression algorithm and compressed ndrv into the fcode of the rom and then updated the checksum?
My DumpPCIRom.sh script can create tokenizable Forth text from fcode. (I need to add a small fix for non-PCI ROM fcode files, then I should make a GitHub repository). It's part of the process I used for modifying Nvidia GPU firmwares so they can work in Old World Macs for example.
https://forums.macrumors.com/threads/question-how-powerful-of-a-graphics-card-will-work-in-a-beige-power-macintosh-g3.2303689/
It was also used nearly 20 years ago in the work for flashing PC Radeon graphics cards (7000, 8500, 9000, 9100) for Mac. Back then it was an MPW script.
I got around this issue by making a slight modification to the tokenizer source code, starting the FCode number at 0xA00 instead of 0x800, to ensure the generated FCode numbers would not conflict with those already there.
My tokenizer has a tokenizer command that you can put into the Forth text that changes the fcode number that will be used for the next word.
tokenizer[ a00 next-fcode ]tokenizer
I should fix my DumpPCIRom.sh script to add that line when it encounters a word definition that is not using the expected next fcode number as in the case with your compressed rom.
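To make the idea concrete, here is a rough sketch in Python of how such a check could work. It is not the DumpPCIRom.sh logic; the function name next_fcode_directives and the input format (a plain list of fcode numbers in definition order) are assumptions for illustration. It relies only on the tokenizer numbering new words sequentially from 0x800 unless told otherwise.

```python
FIRST_PROGRAM_FCODE = 0x800  # tokenizers normally start numbering new words here


def next_fcode_directives(fcodes, start=FIRST_PROGRAM_FCODE):
    """Map the index of each out-of-sequence word to the line to emit before it.

    `fcodes` lists the fcode number of each word definition, in file order.
    """
    directives = {}
    expected = start
    for index, fcode in enumerate(fcodes):
        if fcode != expected:
            # The tokenizer would assign `expected` here, so override it.
            directives[index] = "tokenizer[ %x next-fcode ]tokenizer" % fcode
        expected = fcode + 1
    return directives
```

For a ROM whose words are numbered 0x800, 0x801, 0xa00, 0xa01, this returns a single directive, placed before the third definition, which is the a00 line shown above.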
next-fcode isn't usually necessary. It's there so that the tokenization can more closely match the original fcode, since the names produced by DumpPCIRom.sh for headerless words include the fcode number as part of the name, like in this example:
: colon_definition_function_801 \ (801) [0b5 0b7]
    my-space \ [103]
    9 \ [010]
    + \ [01e]
    dup \ [047]
    " config-b@" \ [012]
    $call-parent \ [209]
    5 \ [010]
    or \ [024]
    swap \ [049]
    " config-b!" \ [012]
    $call-parent \ [209]
; \ [0c2]
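For anyone not used to reading stack code, the word above reads one configuration byte at my-space + 9, ORs it with 5, and writes it back. A rough Python equivalent follows, as a sketch only: the dict config_space is a hypothetical stand-in for the parent's config-b@ and config-b! methods, and the my_space value is whatever address the device node reports.

```python
def colon_definition_function_801(my_space, config_space):
    """Set bits 0 and 2 of the config byte at my_space + 9.

    `config_space` is a dict of address -> byte, standing in for the
    parent's config-b@ / config-b! methods (an assumption for this sketch).
    """
    addr = my_space + 9                # my-space 9 +
    value = config_space.get(addr, 0)  # dup " config-b@" $call-parent
    config_space[addr] = value | 5     # 5 or swap " config-b!" $call-parent
```

With a byte of 0x80 at that address, the call leaves 0x85 there; running it a second time changes nothing.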