Author Topic: Decompiling Forth Words  (Read 4547 times)

Offline Daniel

  • Gold Member
  • *****
  • Posts: 300
  • Programmer, Hacker, Thinker
Decompiling Forth Words
« on: June 18, 2017, 08:38:41 PM »
Hi everyone,

Does anyone know how to decompile Open Firmware words and retrieve values from them? I need to get the addresses of certain variables that are named on some bootroms and are just addresses on others. I found a suitable word that has the addresses contained in it, but I have no idea how to get them out. Any suggestions?

If it helps, I am trying to decompile the word called 'source' to get the addresses of 'xib' and '#xib' (they are only called this on some machines). I may also have to find a way to get the variable '>in' if it is not named on all New World Macs.

Offline nanopico

  • Moderator
  • Platinum Member
  • *****
  • Posts: 767
Re: Decompiling Forth Words
« Reply #1 on: June 19, 2017, 12:11:12 PM »
Good luck.  Let me know when you figure it out.
Due to the way the implementation works, everything is a word, and words are just built from other words until you get down to the handful of basic operators.  So you would probably need to find other words to get at what you are after.
I just found out that a guy who recently started at my employer used to do all sorts of stuff in Forth.  It's about the only language he knows.
Kind of neat actually. I'll check with him.
If it ain't broke, don't fix it, or break it so you can fix it!

Offline Daniel

Re: Decompiling Forth Words
« Reply #2 on: June 19, 2017, 12:27:21 PM »
I think I worked out my problem this time, but it would still be nice to know how to do it. Every word and variable in Forth is part of a dictionary (or package, as they are called in Open Firmware), which is a linked list of words. Each dictionary entry uses some format that I am not aware of to store the word's name, flags indicating what kind of word it is, and raw data such as code or elements of an array. By using ' (single quote) or $find, you can get an execution token, which points to the dictionary entry for that word. In theory, you can scan the dictionary entry to retrieve constants or other execution tokens, but I have no idea how to do this. I just realized that the format is probably specified somewhere, but I have a bit too much to do right now to hunt it down.

Offline nanopico

Re: Decompiling Forth Words
« Reply #3 on: June 20, 2017, 05:49:29 AM »
My brain's a little broken at the moment, but ultimately all of that is stored as fcode in the end, I believe.
In the source for BootX there is a small C utility that converts fcode into C.
May help a bit.
May not.

Offline powermax

  • Enthusiast Member
  • ***
  • Posts: 80
  • Hobbyist programmer
Re: Decompiling Forth Words
« Reply #4 on: August 22, 2017, 04:16:41 AM »
Quote from Daniel: "Does anyone know how to decompile open firmware words and retrieve values from them?"

I have a good knowledge of the Apple's OF internals (plus there is some documentation available). Which OF version are you trying to understand?

Offline Daniel

Re: Decompiling Forth Words
« Reply #5 on: August 22, 2017, 07:23:04 AM »
I am mostly concerned with the New World OF implementations (all version numbers >= 3?). I want the bootscript stuff I do to work across all New World systems, but there can be a lot of variation between them, especially in the components that I try to mess with.

Information about OF would be appreciated. I am mostly interested in the binary format of dictionary entries and what the 'code' functions look like internally.

Offline powermax

Re: Decompiling Forth Words
« Reply #6 on: August 30, 2017, 12:17:26 PM »
Quote from Daniel: "Information about OF would be appreciated. I am mostly interested in the binary format of dictionary entries and what the 'code' functions look like internally."

I don't know what OF versions later than 1 look like internally because I have never seen a dump. But I know the internal structure of OF v1.0.5 sitting in the TNT ROM (1st generation of the PCI PowerMacs). I assume that the internal structure of Apple's later OF implementations mostly resembles the first implementation, just with fewer bugs and more words.

Apple's OF implementation doesn't contain any interpreter. It contains a small setup program for initializing OF memory and runtime environment as well as low-level words compiled into small chunks of PowerPC machine code. High-level words are in the FCODE byte format.

OF execution works by passing control from one low-level word (that's a piece of machine code) to another. FCODE programs will be converted from byte code to a series of branches to the low-level words before actual execution. Each low-level word passes control to the next low-level word when ready. This runtime model relies on the following register conventions:

Register  Name     Purpose
r16       rHere    Heap pointer
r19       rTOR     Top value of return stack
r20       rTOS     Top value of data stack
r21       rNOS     2nd value of data stack
r22       rTOL     Top value of loop stack
r23       rNOL     2nd value of loop stack
r24       rLP      Loop stack pointer
r25       rSV      Pointer to OF kernel globals (the so-called "start vector")
r26       rFP      Frame pointer, no clue what it means
r27       rEP      Exception frame pointer, used by CATCH
r28       rTTP     Token table pointer for translating FCODE token number to code address
r29       rMyself  Execution context
r30       rRP      OF return stack pointer (rstack)
r31       rDP      OF data stack pointer (dstack)

Please note that this execution model is specific to Apple's OF implementation.
« Last Edit: August 31, 2017, 03:47:22 AM by powermax »

Offline powermax

Re: Decompiling Forth Words
« Reply #7 on: August 30, 2017, 12:45:21 PM »
The internal layout of the low-level words in Apple's OF looks like this:

Code: [Select]
Header
Padding to the 4byte boundary
PowerPC machine code for low-level words

The header has the following format expressed using a C-structure (everything is big-endian):

Code: [Select]
int32_t    link;  // offset to the previous word if any, otherwise NULL
uint8_t    flags; // flags describing properties of this word (see below)
uint8_t    type;  // word type (see below)
uint16_t   token; // FCODE token number
PStr       name;  // Pascal (length-prefixed) name of this word if bit 5 of the flags is unset

Now the header flags:

Code: [Select]
Bit 7 - isFindable
Bit 6 - isImmediate
Bit 5 - isHeaderless
Bit 4 - isAlias
Bit 3 - isInstance
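
To make the flag list concrete, here is a minimal Python sketch that turns a flags byte into flag names. It assumes bit 7 is the most significant bit of the byte (mask 0x80), which matches the 0x80 = isFindable byte in the pvr@ example in this post. The names FLAG_BITS and decode_flags are made up for illustration.

```python
# Flag bit positions as listed in the post.
# Assumption: bit 7 is the most significant bit of the byte (mask 0x80).
FLAG_BITS = {
    7: "isFindable",
    6: "isImmediate",
    5: "isHeaderless",
    4: "isAlias",
    3: "isInstance",
}


def decode_flags(flags):
    """Return the names of all flags set in a header flags byte, highest bit first."""
    return [
        name
        for bit, name in sorted(FLAG_BITS.items(), reverse=True)
        if flags & (1 << bit)
    ]


print(decode_flags(0x80))
print(decode_flags(0xC0))
```

The lower three bits are not described in the post, so the sketch simply ignores them.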

Now the word types:

Code: [Select]
0xB7 = subroutine
0xB8 = value
0xB9 = variable
0xBA = constant
0xBC = defer (?)
0xBD = buffer (?)
0xBE = field (?)
0xBF = low-level word, contains raw PowerPC machine code following the header

Here is an example of an internal low-level word from OF 1.0.5:

Code: [Select]
dl      0xFFFFFFD8; // offset to previous word
db      0x80; // isFindable
db      0xBF; // type = low-level word
dw      0x0422; // FCODE token number
pstr    "pvr@"; // name = read processor version register
db      0,0,0; // padding bytes for code alignment

stwu   rTOS, -4(rDP); // push cached top of stack value to data stack
mfpvr  rTOS; // load processor version register into rTOS
blr        ; // jump to next word
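
For anyone who wants to pick such headers apart from a ROM dump on a desktop machine, the following Python sketch parses the header layout described above and is run on the pvr@ example bytes. It is only a sketch under assumptions: the header is taken to start on a 4-byte boundary, so the code offset is computed relative to the header start, and the names WORD_TYPES and parse_header are invented for this example. What the link offset is relative to is not stated here, so the value is just reported as-is.

```python
import struct

# Word type codes as listed in the post; the "(?)" entries are the poster's guesses.
WORD_TYPES = {
    0xB7: "subroutine",
    0xB8: "value",
    0xB9: "variable",
    0xBA: "constant",
    0xBC: "defer (?)",
    0xBD: "buffer (?)",
    0xBE: "field (?)",
    0xBF: "low-level word",
}


def parse_header(data, offset=0):
    """Parse one big-endian dictionary header starting at data[offset]."""
    # link (int32), flags (uint8), type (uint8), token (uint16)
    link, flags, wtype, token = struct.unpack_from(">iBBH", data, offset)
    pos = offset + 8
    name = None
    if not flags & 0x20:  # bit 5 (isHeaderless) clear: a Pascal string follows
        length = data[pos]
        name = data[pos + 1 : pos + 1 + length].decode("ascii")
        pos += 1 + length
    # Assumption: padding rounds up to the next multiple of 4 from the header start.
    code_offset = (pos + 3) & ~3
    return {
        "link": link,
        "flags": flags,
        "type": WORD_TYPES.get(wtype, hex(wtype)),
        "token": token,
        "name": name,
        "code_offset": code_offset,
    }


# Header bytes of the pvr@ example: link, flags, type, token, name, padding.
example = bytes.fromhex("FFFFFFD8" "80" "BF" "0422") + b"\x04pvr@" + b"\x00\x00\x00"
header = parse_header(example)
print(header)
```

For the example, the name ends at byte 13, so the three padding bytes put the machine code at offset 16.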

Now, armed with this information you could try to find out what your precise OF implementation does.
« Last Edit: August 30, 2017, 03:19:48 PM by powermax »

Offline Daniel

Re: Decompiling Forth Words
« Reply #8 on: August 31, 2017, 07:50:45 AM »
The dictionary structure seems to be the same for the mac I tested it on. I can't really test the register assignments right now but they are presumably also correct.

Offline Daniel

Re: Decompiling Forth Words
« Reply #9 on: October 09, 2017, 07:32:10 PM »
I finally figured out how to do this. I can now extract arbitrary addresses from forth words with little effort.
As far as I can tell, all words in the OF dictionary have at least some PowerPC code in them, even words such as value and buffer. Calls to another word are usually done with either the b or the bl instruction.
Code: [Select]
: get-bl-adr ( adr -- target ) dup @ dup 3fffffc and dup 2000000 and if fc000000 or then swap 2 and if swap drop else + then ;
This useful function takes the address of a b or bl instruction and decodes it to get the address it points to. It is meant to be used like this:
Code: [Select]
' .registers c + get-bl-adr
This retrieves the execution token of the word called ci-regs. ci-regs returns the address of the buffer used to hold the registers of the client interface program in OF. This lets you read and/or modify those registers easily.
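
For checking the decoding off the machine, here is a rough Python equivalent of get-bl-adr. It relies on the standard PowerPC I-form branch layout (6-bit opcode 18, 24-bit LI field, then the AA and LK bits). The function name branch_target is made up, and the opcode check is an extra safety net that the Forth word does not have.

```python
def branch_target(addr, insn):
    """Return the target of a PowerPC b/ba/bl/bla instruction word found at addr."""
    if insn >> 26 != 18:
        # Extra check, not present in the Forth word: reject non-branch words.
        raise ValueError("not a b/bl instruction: %08x" % insn)
    li = insn & 0x03FFFFFC          # 24-bit displacement field, already shifted left by 2
    if li & 0x02000000:             # top bit of the field set: negative displacement
        li -= 0x04000000
    if insn & 2:                    # AA bit set: the displacement is an absolute address
        return li & 0xFFFFFFFF
    return (addr + li) & 0xFFFFFFFF  # otherwise relative to the instruction itself


print(hex(branch_target(0x1000, 0x48000011)))  # bl, 0x10 bytes forward
print(hex(branch_target(0x1000, 0x4BFFFFF1)))  # bl, 0x10 bytes backward
print(hex(branch_target(0x1000, 0x48000102)))  # ba to absolute address 0x100
```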

Annoyingly, ci-regs does not have an official name on most NewWorld systems. But the code I figured out lets me access it anyways. This means that the trick I am about to describe will (probably) work on all NewWorld Macs.

The following code displays register and call stack information every time the Trampoline calls the client interface function 'open':
Code: [Select]
: get-bl-adr ( adr -- target ) dup @ dup 3fffffc and dup 2000000 and if fc000000 or then swap 2 and if swap drop else + then ;
' .registers c + get-bl-adr 8 + @ value reg-buf
: print-frame-info dup dup ." @" 8 u.r @ ."  SP: " 8 u.r ."  CR: " dup 4 + @ 8 u.r ."  LR: " dup 8 + @ 8 u.r cr @ ;

: dumpcistack  reg-buf 4 + @ begin print-frame-info dup 0= until drop ;
dev /openprom/client-services
: open .registers cr reg-buf 4 + @ ." cur-stack is " 8 u.r cr dumpcistack open ;
mac-boot
Here is an example of its output:
Code: [Select]
Client's Fix Pt Regs:
 00 00000080 0010fcf8 deadbeef 001198c8 001198c4 00000001 0010ff34 001198c8
 08 001198d4 0010fd5c 0010fda4 0010ff65 001198b8 00000000 00104800 001047e0
 10 00000000 00104798 00000000 001170a0 9e9044d8 9e9044d8 00106b34 00218288
 18 0026f1e0 00000000 00000000 001170a4 ffffffff 00119890 001198d8 0010fd60
Special Regs:
    %IV: 00000300   %SRR0: 0020dc68   %SRR1: 00003030
    %CR: 4884204c     %LR: 0020dc68    %CTR: ff80a290    %XER: 00000000
   %DAR: 00b94000  %DSISR: 42000000   %SDR1: 1ffe0000

cur-stack is 0010fcf8
@0010fcf8 SP: 0010fd38 CR: 00104798 LR: 00000000
@0010fd38 SP: 00110268 CR: 28242087 LR: 0020a910
@00110268 SP: 00110798 CR: 2884204c LR: 0020b558
@00110798 SP: 00110cc8 CR: 2824204c LR: 0020b480
@00110cc8 SP: 001111f8 CR: 28242086 LR: 0020b558
@001111f8 SP: 00111728 CR: 28242086 LR: 0020b480
@00111728 SP: 00111c58 CR: 28242044 LR: 0020b558
@00111c58 SP: 00112188 CR: 28442087 LR: 0020b558
@00112188 SP: 001126b8 CR: 28242087 LR: 0020b558
@001126b8 SP: 00112be8 CR: 28242087 LR: 0020b558
@00112be8 SP: 00113118 CR: 28242087 LR: 0020b558
@00113118 SP: 00113648 CR: 28242087 LR: 0020b558
@00113648 SP: 00113b78 CR: 28242087 LR: 0020b558
@00113b78 SP: 001140a8 CR: 28242087 LR: 0020b558
@001140a8 SP: 001145d8 CR: 28242087 LR: 0020b558
@001145d8 SP: 00114b08 CR: 28242087 LR: 0020b558
@00114b08 SP: 00115038 CR: 28242087 LR: 0020b558
@00115038 SP: 00115568 CR: 28242082 LR: 0020b558
@00115568 SP: 00115a98 CR: 28242082 LR: 0020b558
@00115a98 SP: 00115fc8 CR: 28242082 LR: 0020b558
@00115fc8 SP: 001164f8 CR: 28242082 LR: 0020b558
@001164f8 SP: 00116a28 CR: 28244082 LR: 0020b480
@00116a28 SP: 00116f78 CR: 8440004c LR: 00205644
@00116f78 SP: 00000000 CR: 00000000 LR: 00000000
I haven't gotten around to it yet, but the stack also stores function parameters and saved registers...
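
The frame walk that dumpcistack performs can be modeled in a few lines of Python, which may help when reading the dump above. This is a sketch only: the layout (saved SP at offset 0, CR at +4, LR at +8) is taken from what print-frame-info reads, the memory contents are the last two frames of the output above, and the names walk_frames and read32 are hypothetical. Unlike the Forth loop, it stops immediately if the starting SP is already 0.

```python
def walk_frames(read32, sp):
    """Follow the back chain from sp, returning (sp, saved_sp, cr, lr) per frame."""
    frames = []
    while sp != 0:
        saved_sp = read32(sp)  # word 0 of a frame points at the caller's frame
        frames.append((sp, saved_sp, read32(sp + 4), read32(sp + 8)))
        sp = saved_sp
    return frames


# Memory image rebuilt from the last two lines of the posted output.
memory = {
    0x00116A28: 0x00116F78, 0x00116A2C: 0x8440004C, 0x00116A30: 0x00205644,
    0x00116F78: 0x00000000, 0x00116F7C: 0x00000000, 0x00116F80: 0x00000000,
}

frames = walk_frames(memory.__getitem__, 0x00116A28)
for sp, saved_sp, cr, lr in frames:
    print("@%08x SP: %08x CR: %08x LR: %08x" % (sp, saved_sp, cr, lr))
```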