I wanted to add a bit of information. I did some pretty extensive exploration of the PowerSurge family (the whole x500/x600 line is called PowerSurge) a while back. My main interest at the time was the Umax S900, but much of the info carries over.
Disclaimer: Some of this stuff is NITPICKING. I admit it up front. My goal is not to take away from the excellent information that Syntho took the time and trouble to post. I just want to clarify a few points and add some detail that might be useful.
1. Speculative Processing:
First of all, you should not have any trouble with Speculative Processing being enabled on a Kansas motherboard. Apple fixed the SP bug between the 9600 and the 9600 Enhanced (Kansas) models.
In fact, if one builds a ROM module with Kansas ROM code on it and installs it in an earlier machine, it will fix the Speculative Processing bug. My current S900 has Kansas ROMs installed.
It sounds like some problems that got blamed on SP were actually incompatibility between Sonnet's drivers and MOTU.
2. PCI Slots and "Bus Mastering"
There's been a lot of confusion about this over the years, because the belief that slots A1 and D2 were "bus mastering" slots was repeated on Mike's XLR8yourmac.com website a lot. A lot.
All of the PCI slots are bus mastering slots.
Excursion to background material:
The PCI system is a bus. Busses support two or more devices. Deciding which device gets to send information onto the bus at any given time is called bus arbitration. The bus arbiter receives requests (BUS REQUEST) to master the bus and grants (BUS GRANT) mastery of the bus to only one device at a time. The device which is currently sending information is the Bus Master. Any card/device which can request the bus, and any slot from which mastery of the bus can be requested, is a Bus Mastering Device or Slot.
On all consumer PCs and Macintoshes, all the PCI slots support Bus Request and Bus Grant from the bus arbiter. They are all Bus Mastering slots.
On the 9500/9600 the two little square chips labeled 343S0182 are the bus arbiters. If you trace the connections to these chips, you will find that they connect to every BR (Bus Request) and BG (Bus Grant) pin on all of the PCI slots, plus one each to the Bandit chips (343S0020).
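To make the arbitration idea concrete, here is a toy model in Python. It is only a sketch: it assumes a plain round-robin grant order and made-up burst lengths, and it says nothing about how the 343S0182 actually behaves. The function name and the numbers are mine, for illustration only. The point is that every slot can master the bus, yet a card that holds the bus for long bursts still ends up with most of the bus time.

```python
from collections import Counter

def simulate_round_robin(requesters, burst_len, total_cycles):
    """Toy round-robin arbiter (hypothetical model, not the real chip).

    requesters   -- device names in grant order
    burst_len    -- cycles each device holds the bus once granted
    total_cycles -- how long to run the simulation
    Returns a dict of bus cycles used per device.
    """
    used = Counter()
    elapsed = 0
    turn = 0
    while elapsed < total_cycles:
        dev = requesters[turn % len(requesters)]
        # The bus master keeps the bus for its whole burst (or until time runs out).
        hold = min(burst_len[dev], total_cycles - elapsed)
        used[dev] += hold
        elapsed += hold
        turn += 1
    return dict(used)

# Made-up numbers: one card doing long bursts, two doing short ones.
print(simulate_round_robin(["A1", "B1", "C1"],
                           {"A1": 64, "B1": 4, "C1": 4},
                           720))
# -> {'A1': 640, 'B1': 40, 'C1': 40}
```

All three devices get the same number of grants, but the long-burst card gets roughly 89% of the cycles.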
End Excursion
So, when you have PCI issues it is not because of a lack of bus mastering capability. There are three primary things that can cause slot order to affect compatibility.
First, the bus arbiter starts with slot A1 (D2) or perhaps the Bandit chip when choosing priority for Bus Grants. It's supposed to use a Round Robin scheme after that, but a card that monopolizes the bus could prevent other cards which, perhaps, request and release the bus frequently, from getting enough access.
Second, card order affects what order the drivers are loaded into memory from the cards' firmware chips. This gets you into the realm of software weirdness. Who knows what the effects are of the order the firmware loads, plus the order the various extensions and CPs load.
Third, there is a software artifact either in the ROM or the OS by Apple that has to do with support for 32 byte cache line transfers. If your card doesn't need these transfers, it won't care whether it's in the top slot or not. This issue is documented in Apple's Tech Note TN_1008.pdf, "Understanding PCI Bus Performance":
===========================
As an example, if two cards (card x and card y) have addresses mapped into segment 8, one at 0x80800000 and another at 0x80801000, the first call to SetProcessorCacheMode from the driver of card x to make a cacheable address space in segment 8 will work. A second call, say from the driver of card y, to modify the cache setting in segment 8 will not work nor will it report an error. This scenario will most likely result in a lower than expected performance for card y, because card y address space is actually cache inhibited which disables PCI transactions of 32-byte cache lines. If the two cards are mapped into different segments, such as 8 and A, then they both can modify the cache settings within their perspective segments. This limitation will be relaxed in the future.
============================
So, if a card or its driver wants to make its address space cacheable, the slot it is in affects the order in which its request is made, which in turn determines whether the request takes effect.
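Here is a small Python model of the behavior the Tech Note describes. It is a sketch under assumptions: I am treating the "segment" as the top four bits of the 32-bit address (which matches the 0x80800000 = segment 8 example), and the class and method names are hypothetical stand-ins, not Apple's API. The real SetProcessorCacheMode reports no error on the second call; the model returns True/False only so you can see which request took effect.

```python
def segment_of(addr):
    """Top four bits of a 32-bit address (assumed meaning of 'segment')."""
    return (addr >> 28) & 0xF

class SegmentCacheModel:
    """Hypothetical model: the first cache-mode request per segment wins."""

    def __init__(self):
        self.owner = {}  # segment number -> card whose request took effect

    def set_cache_mode(self, card, base_addr):
        seg = segment_of(base_addr)
        if seg in self.owner:
            # Silently ignored on the real machine; flagged here for visibility.
            return False
        self.owner[seg] = card
        return True

model = SegmentCacheModel()
print(model.set_cache_mode("card x", 0x80800000))  # True: first request in segment 8
print(model.set_cache_mode("card y", 0x80801000))  # False: segment 8 already set
```

If card y were instead mapped at an address in segment A, its request would take effect too.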
Finally, last thought about PCI slots -- In theory, slot D2 should offer slightly better performance than A1. The 9500/9600 has two separate/independent PCI busses (hence two separate arbiter chips). However, the PCI bus which supports A1, B1 and C1 also supports Grand Central which is Apple's catch-all I/O interface. The floppy, SCSI busses, ethernet, sound, etc. all hang off of Grand Central.
The caveat to the above is that the two PCI busses interface with the CPU through the CPU bus which Hammerhead (343S1190) arbitrates. It is possible that Hammerhead gives Bandit 1 (PCI bus 1) higher priority on the CPU bus than it gives Bandit 2, but I don't know. The reason they might have done that is that all the I/O is hanging off of Bandit 1. However, I doubt that they did.
The PowerSurge architecture was meant to support up to 4 Bandit-type devices with I/O arbitrarily spread amongst them, so it probably would not have made sense to give one device on the CPU bus higher priority than the others.
3. Twin Turbo Video Card:
The Twin Turbo card was sold in two different versions. The OEM version sold by Apple only works with Apple's 9600 Graphics Extension and limits one to Apple's selection of resolutions in the Monitors CP. IxMicro's retail version of the card works with IxMicro's Extensions and Control Panel and provides much more extensive control of the card's abilities.
The only difference between them (two differences? does the IxMicro version have a VGA connector? can't remember) is the ROM on board the card, and it is easily swapped, as it is in a socket. If one copies the IxMicro ROM to a Flash or EEPROM chip, that chip can then be installed in the Apple version of the card, making it more versatile.
4. SCSI:
The last device on the chain must always be terminated. No exceptions.
If you don't terminate the last device and your SCSI chain works, that's lucky. That's also what leads to SCSI voodoo.
Look, when configured according to the fairly simple rules, SCSI works reliably and predictably. SCSI voodoo happens when someone configures SCSI wrong, then it works for a while, then it stops working, usually when they change something, and then they declare SCSI voodoo. The problem wasn't that it didn't work when it should have worked. The problem was that it worked sometimes when it should not have.
SCSI rules:
1) Every device must have a unique ID -- usually set by jumpers on the device.
2) Both ends of the SCSI bus must be terminated. On Macs with internal-only busses, the logic board is one end of the SCSI bus (the cable starts plugged in there) and the logic board contains termination.
2b) The End of the SCSI bus must be terminated. Not the next device, nor n devices towards the middle. The end of the cable is where the terminator goes.
3) SCSI busses may only have two ends. No Ys allowed.
There's a lot of detail that can go in those rules, but that's the basics.
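For anyone who likes to see rules as code, here is a rough Python checker for the rules above. It is a sketch with hypothetical names: the chain is a plain list of devices in physical cable order, so rule 3 (no Ys) is baked into the data structure. The "terminator in the middle" check is my reading of rule 2b, that termination belongs at the ends and nowhere else.

```python
def check_scsi_chain(chain):
    """Return a list of rule violations for a chain given in cable order.

    Each device is a dict with 'name', 'id' and 'terminated' keys
    (a made-up representation for this example).
    """
    problems = []
    ids = [dev["id"] for dev in chain]
    if len(set(ids)) != len(ids):
        problems.append("rule 1: duplicate SCSI ID")
    if not chain or not chain[0]["terminated"] or not chain[-1]["terminated"]:
        problems.append("rule 2: both ends of the bus must be terminated")
    if any(dev["terminated"] for dev in chain[1:-1]):
        problems.append("rule 2b: terminator in the middle of the chain")
    return problems

chain = [
    {"name": "logic board", "id": 7, "terminated": True},   # one end of the bus
    {"name": "hard disk",   "id": 0, "terminated": False},
    {"name": "CD-ROM",      "id": 3, "terminated": True},   # last device on the cable
]
print(check_scsi_chain(chain))  # -> []
```

Move the terminator from the CD-ROM to the hard disk and the checker reports both a rule 2 and a rule 2b problem, which is exactly the kind of setup that sometimes works anyway and gets blamed on voodoo later.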
5. Differences in the 9500/9600 Family:
A) The 9500 was first. Most people dislike the case. The logic board still has six slots.
B) The original 9600 has four differences from the 9500.
1) The power supply connector(s) have been changed a little bit.
2) The case and power supply are different.
3) The ROM changes from $77D.28F2 to $77D.34F2. Also, the ROM chips change from 341S0169 - 341S0172 to 341S0280 - 341S0283 (four ROM chips per board).
4) The CPU card options are upgraded.
So really, the only real differences between the logic boards of the 9500 and the original 9600 are the ROMs and the power supply connector, and the latter doesn't affect performance.
C) The Power Macintosh 9600 Enhanced (Apple's public name for Kansas) has 3 differences from the original 9600:
1) The L2 cache on the logic board has been removed.
2) The pins in the CPU slot have been changed slightly. I haven't traced them, but I'm pretty sure there are more 3.3V supply pins and these were taken from other supply pins.
3) The ROMs were updated to $77D.34F5 with chips labeled 341S0380 - 341S0383. This is where the fix for Speculative Processing resides. Note that the 8600 Enhanced also has these ROMs. No other PowerSurge machine has them. The 7300, e.g., was left with the $77D.34F2 ROMs.
Everything else on the 9500/9600 boards is identical. They all used the same chipset and the same logic board layout.
If all else fails, you can distinguish the three boards by reading the part numbers off of the ROM chips. The ROM chips are about 1.1" x 0.5" and have 44 pins, 22 down each long side.
The lack of cache chips on the 9600 Enhanced is a fairly obvious way to identify the Kansas board, though.
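If you would rather not memorize the part numbers, the ranges listed above fit in a tiny Python lookup. This is just the list from this post turned into a table. The function name is made up, and it only covers the three 9500/9600 boards, so don't read anything into a None result for other machines.

```python
# (first chip, last chip, board, ROM version) as listed in this post.
ROM_SETS = [
    (169, 172, "9500", "$77D.28F2"),
    (280, 283, "original 9600", "$77D.34F2"),
    (380, 383, "9600 Enhanced (Kansas)", "$77D.34F5"),
]

def identify_board(part_number):
    """Map a ROM chip label such as '341S0382' to (board, ROM version).

    Returns None if the label is not one of the chips listed above.
    """
    part = part_number.strip().upper()
    if not part.startswith("341S") or not part[4:].isdigit():
        return None
    number = int(part[4:])
    for first, last, board, rom in ROM_SETS:
        if first <= number <= last:
            return (board, rom)
    return None

print(identify_board("341S0382"))  # -> ('9600 Enhanced (Kansas)', '$77D.34F5')
```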