I haven't done much with the 32X, but my impression is that you can set up DMA to the PWM registers, so for basic sample playback the CPU overhead should be quite low. If you want to do synthesis in a full 3D game, it seems to me the most reasonable choice is to have the 68K do most of the work. If you're not using the Genesis video hardware much (if at all), it doesn't have a whole lot else to do.
My thinking was that unless you're playing back samples, reading from the cartridge isn't really necessary. For FM+PSG, fitting all the song data in Z80 RAM is pretty doable. Nothing wrong with supporting the 32X use case of course, but if you're trying to shave cycles it seems like a pretty reasonable thing to ignore given the availability of the PWM hardware.
I was under the impression that Z80 access to Word RAM did not work properly (or at all). I suppose $400000-$7FFFFF is useful for a cartridge with !CART disconnected.
I remember fixing that. I think ld (hl), h was just plain broken in my Z80 core. IIRC, the bug came from the fact that you can't mix the old high-byte registers (e.g. ah, bh, ch, dh) with the registers added in x86-64 (e.g. r8-r15) in the same instruction. In retrospect, trying to use the legacy register pairs was probably a bad idea, but the workaround I ended up with was to rotate the 16-bit version of the register by 8 whenever I need to use the high byte of one of those pairs in combination with one of the new registers. This works fine in most cases, but ld (hl), h needs both the high byte and the full word at the same time. Before I fixed the bug, the write was going to $160 instead of $6001 since H and L were flipped.
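To make the failure mode concrete, here is a rough sketch in plain C rather than the x86-64 code the core actually emits. The struct and function names (z80_regs, rot8, ld_hl_h_buggy, ld_hl_h_fixed) are made up for illustration, and the model assumes HL lives in a single 16-bit host register that gets rotated by 8 to reach H.

```c
#include <stdint.h>

/* Hypothetical model: the Z80 HL pair is kept in one 16-bit host register. */
typedef struct {
    uint16_t hl;
} z80_regs;

/* Rotate a 16-bit value by 8 bits, i.e. swap its high and low bytes. */
static uint16_t rot8(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Buggy ld (hl), h: HL is rotated so H sits in the low byte, but the
 * still-rotated register is then used as the address as well. */
void ld_hl_h_buggy(z80_regs *r, uint8_t *mem)
{
    r->hl = rot8(r->hl);              /* H is now in the low byte */
    uint8_t value = (uint8_t)r->hl;   /* value to store = H */
    mem[r->hl] = value;               /* BUG: address has H and L flipped */
    r->hl = rot8(r->hl);              /* restore HL */
}

/* Fixed ld (hl), h: take a copy of the full word before rotating. */
void ld_hl_h_fixed(z80_regs *r, uint8_t *mem)
{
    uint16_t addr = r->hl;            /* full 16-bit address, saved first */
    r->hl = rot8(r->hl);              /* H is now in the low byte */
    uint8_t value = (uint8_t)r->hl;   /* value to store = H */
    r->hl = rot8(r->hl);              /* restore HL */
    mem[addr] = value;                /* write goes to the intended address */
}
```

With HL = $6001 and a 64KB mem array, the buggy version stores $60 at $0160, while the fixed version stores it at $6001. Both leave HL unchanged afterwards.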
I remember some discussion around that, but has that ever been confirmed? Do you remember which revisions? I seem to remember there's at least one commercial game that depends on Z80 writes to work RAM working, though I forget which one.

