The table of tracks is at offset $1210F0 in the ROM. Each entry is four LONG values (in big endian format... everything in the Jaguar is big endian). The first LONG holds the track's global volume as a WORD, followed by a flag WORD (either FFFF or FF05); the second is a pointer to the compressed track score (AND with $3FFFFF to get the offset in the ROM); the third is a bitmask of envelopes used by the track; and the fourth is a bitmask of waveforms used by the track. A bit set for an envelope without the corresponding waveform bit set indicates an FM instrument as opposed to a wavetable instrument. A track score pointer of 0 indicates the end of the table.
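As a rough illustration of the table layout described above, here is a minimal Python sketch that walks the table in a ROM image already loaded into memory. The function name, the dictionary keys, and the choice to test the raw (unmasked) pointer against 0 are my own assumptions, not anything taken from the game code.

```python
import struct

TRACK_TABLE_OFFSET = 0x1210F0   # table offset given in the text
ENTRY_FORMAT = ">HHIII"         # big endian: volume WORD, flag WORD, then three LONGs
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 16 bytes per entry


def read_track_table(rom, table_offset=TRACK_TABLE_OFFSET):
    """Return one dict per track, stopping at a score pointer of 0.

    Hypothetical helper: names and dict keys are illustrative only.
    """
    tracks = []
    offset = table_offset
    while offset + ENTRY_SIZE <= len(rom):
        volume, flag, score_ptr, env_mask, wave_mask = struct.unpack_from(
            ENTRY_FORMAT, rom, offset)
        if score_ptr == 0:      # assumption: the raw pointer is what gets tested
            break
        tracks.append({
            "volume": volume,
            "flag": flag,                            # FFFF or FF05 per the text
            "score_offset": score_ptr & 0x3FFFFF,    # offset of the score in the ROM
            "envelope_mask": env_mask,
            "waveform_mask": wave_mask,
            # envelope bit set without a waveform bit = FM instrument
            "fm_mask": env_mask & ~wave_mask & 0xFFFFFFFF,
        })
        offset += ENTRY_SIZE
    return tracks
```

With a real ROM you would call something like read_track_table(open(path, "rb").read()), where the path is whatever your own dump is called.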
Track scores are compressed using Rob Northen Compression mode 2. Once decompressed, they consist of a number of pairs of LONG values. The first LONG is the time at which the second LONG in the pair should be processed. It's an absolute time, not a relative delay, measured against a counter that timer 2 increments at the tempo rate of the song. The second LONG is the score event. One field is the voice number - if the voice is currently off, the event is always a note on event with the following form:
b31-29 = unused
b28-26 = voice #
b25-21 = patch # (instrument to play on this voice)
b20-07 = pitch
b06-00 = volume (0 to 127)
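The pair layout and the note on fields listed above can be sketched in Python like this. It assumes the score has already been decompressed (no RNC decoder is shown here), and the function names are made up for the example.

```python
import struct


def iter_score_pairs(score):
    """Yield (time, event) pairs from an already-decompressed score.

    Each pair is two big endian LONGs; trailing bytes that do not
    form a whole pair are ignored.
    """
    for offset in range(0, len(score) - 7, 8):
        yield struct.unpack_from(">II", score, offset)


def decode_note_on(event):
    """Split a note on event into its fields (b31-29 are unused)."""
    return {
        "voice": (event >> 26) & 0x7,      # b28-26
        "patch": (event >> 21) & 0x1F,     # b25-21
        "pitch": (event >> 7) & 0x3FFF,    # b20-07
        "volume": event & 0x7F,            # b06-00, 0 to 127
    }
```

Remember that this decoding only applies when the voice named in the event is currently off, so a player has to track the on/off state of each voice itself.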
There are only 29 instruments used by Rayman; they are, in order: acoustic bass, clarinet, kick drum 2, open hihat, electric snare 2, fretless bass, electric piano 1, closed hihat 2, percussive organ, acoustic nylon guitar, rock organ, a custom FM instrument, kalimba, contrabass, synth pad 2, harmonica, oboe, pizzicato strings, distortion guitar, synth brass 1, acoustic grand piano, vibraphone (or is it a vibra slap?), cello, soprano sax, percussive conga (not sure about this?), sitar, xylophone, melodic toms, and some instrument called a hulotte (no idea - but it's French for a tawny owl). No bits are set in the bitmasks for bits 29 to 31, so those are unused.
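To see which instruments a track uses, the two bitmasks from the track table can be turned into names. The sketch below assumes that bit 0 corresponds to the first instrument in the list above and bit 28 to the last; that mapping is my guess from "in order" and is not confirmed. It only reports instruments whose envelope bit is set.

```python
# Instrument names in the order given in the text (29 entries).
INSTRUMENTS = [
    "acoustic bass", "clarinet", "kick drum 2", "open hihat",
    "electric snare 2", "fretless bass", "electric piano 1",
    "closed hihat 2", "percussive organ", "acoustic nylon guitar",
    "rock organ", "custom FM instrument", "kalimba", "contrabass",
    "synth pad 2", "harmonica", "oboe", "pizzicato strings",
    "distortion guitar", "synth brass 1", "acoustic grand piano",
    "vibraphone", "cello", "soprano sax", "percussive conga", "sitar",
    "xylophone", "melodic toms", "hulotte",
]


def instruments_used(env_mask, wave_mask):
    """List (bit, name, kind) for every envelope bit that is set.

    Assumption: bit n of each mask refers to INSTRUMENTS[n].
    An envelope bit without a waveform bit is reported as FM.
    """
    used = []
    for bit, name in enumerate(INSTRUMENTS):
        if (env_mask >> bit) & 1:
            kind = "wavetable" if (wave_mask >> bit) & 1 else "fm"
            used.append((bit, name, kind))
    return used
```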
If a voice is on, the event has a different form:
b31-29 = event, 0 = note off, 1 = pitch change, 2 = change controller, 3 = jump, 4 = pitch change, 5 to 7 unused
For everything but jump, the following fields are defined:
b28-26 = voice
b25-21 = patch #
b20 = flag, 1 = modify value in patch table, 0 = modify value in voice
b19-15 = controller # (for change controller, defined controls are 7 = volume, 9 = pitch bend, 10 = pan)
b14-00 = value (volume, pitch, or pan depending on the event)
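The non-jump fields above unpack the same way as the note on event. This is only a sketch; the names in the two lookup tables come from the lists above, and everything else (function name, dict keys, raising an error for jumps) is an arbitrary choice for the example.

```python
EVENT_NAMES = {
    0: "note off",
    1: "pitch change",
    2: "change controller",
    3: "jump",
    4: "pitch change",
}
CONTROLLERS = {7: "volume", 9: "pitch bend", 10: "pan"}


def decode_voice_event(event):
    """Decode an event for a voice that is currently on (not for jumps)."""
    kind = (event >> 29) & 0x7                 # b31-29
    if kind == 3:
        raise ValueError("jump events use a different field layout")
    controller = (event >> 15) & 0x1F          # b19-15
    return {
        "event": kind,
        "name": EVENT_NAMES.get(kind, "unused"),
        "voice": (event >> 26) & 0x7,          # b28-26
        "patch": (event >> 21) & 0x1F,         # b25-21
        "to_patch_table": bool((event >> 20) & 1),  # b20: 1 = patch table, 0 = voice
        "controller": controller,
        "controller_name": CONTROLLERS.get(controller, "undefined"),
        "value": event & 0x7FFF,               # b14-00
    }
```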
Note that there are three ways defined to set the pitch (events 1 and 4, and controller 9)... they all do the same thing. In the tracks I've seen so far, the pitch is set using event 4.
For jump, the following fields are defined:
b28-08 = signed offset of jump divided by 8; multiply value by 8 and add to current pointer to get the jump address
b07-00 = count; if b7 is set, ignore the count and always jump; if b7 is not set, skip the jump if b6-0 is 0, otherwise decrement b6-0 and do the jump
After a jump, the time is reset to the time stored at the target address of the jump (the first LONG of the pair there).
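Here is a small Python sketch of the jump handling described above: sign-extending the 21-bit offset, applying the count rule, and reading the new time at the target. Exactly which address "current pointer" refers to (the jump event itself, its pair, or the next pair) is not pinned down, so the caller passes it in; all function names are invented for the example.

```python
import struct


def decode_jump(event):
    """Return (byte_offset, count) from a jump event."""
    raw = (event >> 8) & 0x1FFFFF      # b28-08, 21-bit field
    if raw & 0x100000:                 # sign bit set: negative offset
        raw -= 0x200000
    return raw * 8, event & 0xFF       # offset is stored divided by 8


def resolve_jump(event, pointer):
    """Return (target, updated_event); target is None if the jump is skipped.

    'pointer' is whatever the player treats as the current pointer
    (an assumption left to the caller).
    """
    byte_offset, count = decode_jump(event)
    if count & 0x80:                   # b7 set: always jump, count ignored
        return pointer + byte_offset, event
    if count == 0:                     # count exhausted: skip the jump
        return None, event
    # decrement b6-0 and take the jump
    return pointer + byte_offset, (event & ~0x7F) | (count - 1)


def time_at(score, target):
    """Read the time (first LONG) of the pair at the jump target."""
    return struct.unpack_from(">I", score, target)[0]
```

After a taken jump, a player would set its clock to time_at(score, target) and carry on from there.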
Note that the count isn't very useful as you'd have to reload the count when replaying the music. None of the tracks I've examined so far use the count - they set b7 so that it always jumps... in fact, it's always the last event in the score - jump back to the very beginning of the score.