Goals:

Ideas for ALSA sequencer kernel

draft 0.01, 26 April 1998
updated with:
- Comments Paul Leonard, April 29 1998
- Comments Frank van de Pol, April 30, 1998

$Id: idea1.html,v 1.1 1998/05/22 09:54:28 frank Exp $

Frank van de Pol (F.K.W.van.de.Pol@inter.nl.net)
Rough version: this version is not yet well structured, and the subjects are presented in no particular order.

Current Situation

The currently available sequencer interfaces for Linux are /dev/sequencer and /dev/music from OSS. Though these interfaces are sufficiently useful for most sequencer applications, they have a few shortcomings:

  1. Only one application at a time can have access
  2. Because of the non-realtime character of a time-shared system like Linux, the driver offers a queue in the kernel, which is needed to prevent events from being scheduled too late. This queue introduces a large latency in event processing.
  3. It's one big monolithic driver.
Especially the 2nd issue restricts building MIDI-oriented applications that can perform on par with applications on Apple Macintoshes and Atari STs regarding real-time response. Examples:

  1. If one wants to have a sequencer perform a MIDI THRU function, it will suffer from large delays, and using the OUT_OFF_BAND ioctl() results in hanging notes because the driver has no clue which events are playing.
  2. Events have to be enqueued ahead of time. Parameter changes (eg. volume control, muting tracks or instruments) will not take effect instantly.

New Sequencer

To overcome these disadvantages I'd like to propose a new architecture for scheduling and dispatching MIDI and MIDI oriented events within the Linux sound driver. Note that this is still 'Paper ware', and has yet to be developed.

Some of the ideas I'll present are inspired by the MidiShare "MIDI operating system" (http://www.grame.fr/english/MidiShare.html), which exists for Apple Macs.

This new sequencer is intended as a replacement for /dev/music.

Highlights:

ASCII graphic of the architecture:

  =========================
  ||                     ||
  ||                    \||/                 \
  ||  +--------+      +--\/------------+      |
  ||  |        |      |                |      |
  ||  | Timing | -->- | Priority Queue |      |
  ||  |        |      |                |      |
  ||  +--------+      +--------||------+      |
  ||                           ||             \ Sequencer
  ||                          \||/            /   Core
  ||  +--------+  +------------\/--------+    |
  ||  | Client |  |                      |    |
  /\  | Manager|  |     Event Router     |    |
 /||\ |        |  |                      |    |
  ||  +--------+  +---||-----------||----+    |
  ||                 \||/         \||/        /
  ||               +--\/--+     +--\/--+
  ||               |Client|     |Client|   
  ||               |  1   | ... |  n   |
  ||               +--||--+     +--||--+
  ||                  ||           ||
  ||                 \||/         \||/
  ||                  \/           \/
  ===================================


All events flow through the queue. Because a priority queue is used instead of a simple FIFO, it's no problem to accept events from multiple clients. (As long as the queue is not full, that is; a well-behaved application should send only as many events as are needed to achieve tight playback (eg. 1 second ahead), and should not try to keep the queue overflowing.)
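To make the point about the priority queue concrete, here is a minimal C++ sketch (all names are hypothetical, not part of any existing driver): events from several clients may arrive in any order, but always leave in timestamp order. Events with equal timestamps keep their arrival order through a sequence number, and a fixed capacity models the "queue full" case.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical event: only the fields needed for scheduling.
struct Event {
    std::uint32_t tick;   // timestamp (song position in ticks)
    int source;           // client that enqueued the event
    std::uint64_t seq;    // arrival order, used to break ties
};

// Puts the *earliest* tick on top of the heap; equal ticks keep arrival order.
struct Later {
    bool operator()(const Event& a, const Event& b) const {
        if (a.tick != b.tick) return a.tick > b.tick;
        return a.seq > b.seq;
    }
};

class EventQueue {
public:
    explicit EventQueue(std::size_t capacity) : capacity_(capacity) {}

    // Returns false when the queue is full; a well-behaved client only
    // enqueues about one second ahead, so this should rarely happen.
    bool enqueue(std::uint32_t tick, int source) {
        if (heap_.size() >= capacity_) return false;
        heap_.push(Event{tick, source, next_seq_++});
        return true;
    }

    // Pops the next event if it is due at or before `now`.
    bool dequeue_due(std::uint32_t now, Event& out) {
        if (heap_.empty() || heap_.top().tick > now) return false;
        out = heap_.top();
        heap_.pop();
        return true;
    }

    std::size_t size() const { return heap_.size(); }

private:
    std::priority_queue<Event, std::vector<Event>, Later> heap_;
    std::size_t capacity_;
    std::uint64_t next_seq_ = 0;
};
```

Usage: if client 1 enqueues an event at tick 768 and client 2 then enqueues one at tick 384, the tick 384 event is still dispatched first.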

Clients can either be user-land applications accessing /dev/seq, or kernel modules that can submit events directly and are called directly when an event is dispatched to them (even from interrupt context).

To change internal parameters of the sequencer, eg. tempo and synchronisation, a special client will (always) be present within the system. If an application wants a tempo change, it can then simply send a TEMPO event into the system.

Paul Leonard's comments:

I think we may need several schedulers to allow different clients to vary their tempos independently.

Frank van de Pol's comments:

Good point.

The architecture was designed as a 'single user' system, as is common with musical workstations (and other sequencer systems). In this single-user environment it's an advantage to have a single 'master' time/song position.

If there's a need to allow multiple concurrent applications that are independent with regard to timing, this scheme can be expanded by adding a priority queue and 'master clock' (the "Timing" block from the above diagram) per 'user'. The addressing of messages will have to be extended with an additional field that determines into which priority queue (or perhaps even queues???) an event has to be put.

Regarding 2nd situation:

I also had some thoughts on this issue. When implementing a priority queue, we need to be able to compare the time stamps of events. With a constant tempo, that is easy. But when the tempo changes, the number of clock ticks (PPQ) per second changes, and events in the queue will no longer be ordered correctly. To avoid this problem, only one type of timestamp should be used. We basically have two options to choose from:

  1. real/clock time, expressed in seconds (or fractions of it of course, eg. ms or us).
  2. song position, expressed in clock ticks, which are related to the tempo the song is playing at. For an internal resolution of eg. 1920 PPQ (parts per quarter note) and a tempo of 135 BPM, there are 1920*135 = 259200 clock ticks per minute = 4320 ticks per second.
Because of the specific character of music (ie. it has tempo, groove etc.) the second one is preferred. Conversion between these two types is trivial once the currently active tempo and the timestamp (in both units!) of the last tempo change are known.
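The arithmetic from option 2 as a tiny helper (a sketch; the function names are made up for illustration): ticks per minute is PPQ times BPM, so ticks per second is that divided by 60.

```cpp
// Ticks per second for a resolution `ppq` (parts per quarter note) and a
// tempo `bpm` (quarter notes per minute): PPQ * BPM ticks per minute / 60.
double ticks_per_second(double ppq, double bpm) {
    return ppq * bpm / 60.0;
}

// Length of one tick in seconds at the same resolution and tempo.
double seconds_per_tick(double ppq, double bpm) {
    return 1.0 / ticks_per_second(ppq, bpm);
}
```

With the figures above, `ticks_per_second(1920, 135)` gives 4320.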

>  
> > If there's a need to allow multiple independent (with regarard to timing)
> > concurrent applications; this scheme can be expanded by adding a priority
> > queue and 'master clock' (the "Timing" block from above diagram) per 'user'.
> > The addressing of message will have to be extended with an additional field
> > that determines in which priority queue (or perhaps even queues???) an event
> > has to be put.
> 
>  Yep.
I personally do not see many applications for the 'multi user' architecture, but having a framework that allows routing events to more than one queue (for the multi-user system) gives us the possibility of distributing sequencing over multiple boxes!!! This is definitely not something to start with, but it is good to know that the architecture allows us to build a whole sequencer cluster somewhere in the future.

So I think "YES", the system should _allow_ multiple queues and timers (basically these are multiple sequencer systems), but we can start by supporting only one queue, and add multiple systems (and perhaps distribution over multiple nodes) later.

For terminology:

System = a sequencer Queue + Timer within the ALSA sequencer.
Node   = a computer, that is running the ALSA sequencer.
One ALSA sequencer core can have multiple 'Systems'.
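A rough sketch of the 'System' idea, assuming a hypothetical extra address field `system` in each event that picks the queue/timer pair. A std::multimap (kept sorted by tick) stands in for the priority queue, and the timer is left out. Only one System exists at start-up, matching the plan to start with a single queue.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical: one System = one queue + one timer (timer omitted here).
struct System {
    std::multimap<std::uint32_t, int> queue;  // tick -> event id, sorted by tick
};

struct AddressedEvent {
    int system;          // extra address field: which System's queue
    std::uint32_t tick;  // timestamp, interpreted by that System's timer
    int id;
};

class SequencerCore {
public:
    // Start with a single System; more can be added later.
    SequencerCore() : systems_(1) {}

    int add_system() {
        systems_.emplace_back();
        return static_cast<int>(systems_.size()) - 1;
    }

    // Route the event to the queue of the System it addresses.
    bool enqueue(const AddressedEvent& ev) {
        if (ev.system < 0 || static_cast<std::size_t>(ev.system) >= systems_.size())
            return false;  // unknown System (in future: perhaps another Node)
        systems_[ev.system].queue.emplace(ev.tick, ev.id);
        return true;
    }

    std::size_t pending(int system) const { return systems_.at(system).queue.size(); }

private:
    std::vector<System> systems_;
};
```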

New idea:

I thought my plan over, and supporting only one of the two types always seems to be a compromise. Certain timing information like MTC and SMPTE is based on real time, and pumping it through a song-position based queue while the tempo changes will result in the timing event being delivered at the wrong time...

You just mentioned the magic word: "...having 2 queues...". This concept leaves the decision of which type to use to the user or client application. The master clock will have to keep the current time AND song position, and we can just use two priority queues in parallel: one for the 'clock time' events, and one for the 'song position' events. A tempo change will then NOT upset event ordering. If an event is enqueued by a client, we can simply look at the field that determines the type of timestamp, and enqueue it in the appropriate queue.
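A sketch of the two parallel queues (hypothetical names; std::multimap used as a sorted queue). Each event carries a field saying which kind of timestamp it has, and the master clock hands both its current time and its current song position to the dispatcher, so a tempo change only affects how fast the song position advances, never the order inside either queue.

```cpp
#include <map>
#include <vector>

// Hypothetical timestamp type field.
enum class Stamp { ClockTime, SongPos };

struct TimedEvent {
    Stamp type;   // which kind of timestamp `when` holds
    double when;  // seconds (ClockTime) or ticks (SongPos)
    int id;
};

class DualQueue {
public:
    // Enqueue into the queue that matches the event's timestamp type.
    void enqueue(const TimedEvent& ev) {
        (ev.type == Stamp::ClockTime ? by_time_ : by_tick_).emplace(ev.when, ev.id);
    }

    // The master clock supplies both the current time and song position;
    // each queue is drained against its own unit.
    std::vector<int> dispatch(double now_seconds, double now_ticks) {
        std::vector<int> due;
        drain(by_time_, now_seconds, due);
        drain(by_tick_, now_ticks, due);
        return due;
    }

private:
    static void drain(std::multimap<double, int>& q, double now, std::vector<int>& out) {
        while (!q.empty() && q.begin()->first <= now) {
            out.push_back(q.begin()->second);
            q.erase(q.begin());
        }
    }

    std::multimap<double, int> by_time_;  // Priority Queue 1 = clock time
    std::multimap<double, int> by_tick_;  // Priority Queue 2 = song position
};
```

This sketch simply drains the clock-time queue before the song-position queue within one dispatch pass; how events from both queues that fall in the same clock interrupt should be interleaved is left open.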

With this new feature, combined with the event routing for multiple systems, the architectural picture looks something like this:

(Yummy, yet some more ascii art :-)

    [select on system]
 +-->-----O---->-------O------->--.........>.. 
 |        |            |  (to other systems, queue/timer combinations)
 |        V            :  (future)
 |        |
 |        O [select on timestamp type]
 |       / \
 |      /   \
 |     |     |
 |     V     V
 |   +---+ +---+
 |   | P | | P |    Priority Queue 1 = clock time
 |   | r | | r |    Priority Queue 2 = song position
 |   | i | | i |
 |   | o | | o |
 |   | Q | | Q |
 |   | 1 | | 2 |
 |   +---+ +---+
 |     |     |
  |     V     V  Dispatch event, triggered by clock    dispatchers from other
 |   +---+ +---+       +--------------+               systems on this node
 |   | D | | D |  <=== | Master clock |             | (future)
 |   +---+ +---+       +--------------+             +---+
 |     |     |                                        |    |
 |     V     V                                        V    V
 |  +---------------------------------------------------------+
 |  | Event Router                                            |
 |  +---------------------------------------------------------+
 |       |      |    |        |     |     |
 |       V      V    V        V     V     V
 |    [Client] [C]  [C]      [C]   [C]   [C]
 |       |      |    |        |     |     |
 +-<-----+---<--+-<--+----<---+--<--+--<--+
Btw: I used something similar (time stamps in two formats) in a sequencing program:

#include <iostream>

// class capable of storing time stamps, stores either absolute time (...s)
// or midi tick.
class EvTime {

public:
        enum Time_t {
                Tm_MIDI_Tick = 0,
                Tm_Abs_Time = 1
        };

private:
        // should pack in 1 machine word (32 bits)
        //
        unsigned int time:31;   // event time
        unsigned int type:1;    // type of timestamp (a Time_t): time/beats;
                                // unsigned, since a signed 1-bit field
                                // cannot hold Tm_Abs_Time (1)

public:
        // constructor
        EvTime() {
                // fill in the defaults
                time = 0;
                type = Tm_MIDI_Tick;
        }


        void SetTick(unsigned int t) {
                time = t;
                type = Tm_MIDI_Tick;
        }

        void SetTime(unsigned int t) {
                time = t;
                type = Tm_Abs_Time;
        }


        // should pass pointer to tempo track to perform time<->tick conversions
        int GetTick() const {
                if (type == Tm_MIDI_Tick) {
                        return time;
                } else {
                        std::cerr << __FILE__ << " " << __LINE__ << " time to tick conversion not yet supported" << std::endl;
                        return 0;
                }
        }

        int GetTime() const {
                if (type == Tm_Abs_Time) {
                        return time;
                } else {
                        std::cerr << __FILE__ << " " << __LINE__ << " tick to time conversion not yet supported" << std::endl;
                        return 0;
                }
        }
};
Antonio Larrosa's comments:

I'm not too sure of this being a good solution. A ms is a time unit that never changes (not applying theoretical physics :-)), but controlling clock ticks is very tricky when there are several time changes.

Suppose this situation :

Music starts playing, and you place a tempo change event at tick 5000. Before it is processed, you place another tempo change event at tick 10000. Now, what is tick 10000? Is the time of those 10000 ticks calculated in ms using the current tempo, or are you going to run over the list of queued events to take account of the tempo change at tick 5000?

That's why I think that time should be expressed in ms or us.

Frank van de Pol's comments:

When using the song position notation, a tick is and remains a fraction of a bar; changing the tempo while playing will just result in a different number of ticks per second. Events (eg. a hihat pattern) that are queued with the 'old' tempo will then follow the new tempo while staying in groove. The tick->second calculation only has to be performed for the event that is to be dequeued. If a time notation (in seconds) were used instead, we would indeed have to run over the list of queued events.

If multiple tempo changes are queued, like in your example, that will work out fine, as I'll illustrate later.

> 
> That's why I think that time should be expressed in ms. or us. 
Both methods have their advantages and disadvantages. For note events the tempo method is really the best (less trouble with tempo changes). But for events that we want to fire at an exact time (eg. synced to sample playback!), we had better have time stamps in seconds (or ms, or us). If there are no (zero, 0) tempo changes, it doesn't even matter which method is used!

To overcome this compromise, we can use two parallel queues, one for the real-time (seconds), and one for the song position (ticks).

Regarding tempo calculations:

The internal clock always runs at a constant rate (well, at least that's what we assume; every crystal deviates a few PPM :-). To illustrate what happens with tempo changes that are enqueued (like in your example), I'll give an example.

Initial tempo 120 BPM, sequencer resolution 384 PPQ, time and song position both start at 0, assume a 4/4 time signature.

A little piece of terminology: 384 PPQ means there are 384 parts ('ticks') per quarter note, and 120 BPM means there are 120 beats (= quarter notes) per minute. The 4/4 time signature only indicates that we have 4 quarter notes (/4) per bar.

Assume we want to have a note on every beat for a whole song; then the sequencer will have to enqueue the note with a 'songpos' of 0, 384, 768, 1152, 1536 etc. etc. (n * 384). Note that this is independent of tempo: the note (say a bass drum) is always triggered at the same position within a bar.

I assumed an initial tempo of 120 BPM, so there are 120/60 = 2 beats per second, or (120/60)*384 = 768 ticks per second, or 1/((120/60)*384) = 1.3E-3 seconds per tick.

If the current clock time is 5.2 seconds since the start of the song, and the tempo didn't change, the events with a timestamp of 5.2*768 = 3993.6 (or earlier) have to be played.

If a tempo change was scheduled at say time stamp 6144 (the first beat of the 4th bar, perhaps the end of an intro), and sets the new tempo to 145 BPM, the song will keep on playing with the old (last known) tempo of 120 BPM until it reaches timestamp 6144 (at t=8.0 s). So far so good. The tempo change will now be fetched from the queue and update the tempo 'variables'.

store:
time of tempo change    = 8.0 s
songpos of tempo change = 6144
new tempo = 145.
(our bass drum is still pumping every beat!)

With this new tempo we have (145/60)*384 = 928 ticks per second. If the current clock time is now 11 seconds since the start of the song, the current song position is [here comes the trick!]: 6144+((11-8.0)*928) = 8928. Calculating the song position as an offset from the last tempo change makes sure we always get the right time & timestamp.

The first beat of bar 12, with a timestamp of (12*4*384)=18432, has a schedule time of 8.0+((18432-6144)/928) = 21.24 seconds.

More (and/or more frequent) tempo changes do not matter for this scheme; if we keep the time and song position of the last tempo change, and the new tempo, we can always (very easily) calculate the song position from the clock time and the clock time from the song position.
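The bookkeeping above fits in a few lines. This is a sketch under the assumptions of the example (constant-rate internal clock, time in seconds, song position in ticks); the class and method names are hypothetical. Only three values are stored: the time and song position of the last tempo change, and the current tempo. A TEMPO event is applied when it is dequeued, which is what makes Antonio's case of two queued tempo changes (at ticks 5000 and 10000) work: the second change is computed from the state left by the first.

```cpp
#include <cmath>

class TempoState {
public:
    TempoState(double ppq, double bpm) : ppq_(ppq), bpm_(bpm) {}

    double ticks_per_second() const { return ppq_ * bpm_ / 60.0; }

    // Song position (ticks) at clock time t (seconds), as an offset from the
    // last tempo change. Valid for t at or after that change.
    double seconds_to_ticks(double t) const {
        return change_tick_ + (t - change_time_) * ticks_per_second();
    }

    // Clock time (seconds) at which song position `tick` is reached.
    // Valid for positions at or after the last tempo change.
    double ticks_to_seconds(double tick) const {
        return change_time_ + (tick - change_tick_) / ticks_per_second();
    }

    // Called when a TEMPO event queued at song position `tick` is dequeued:
    // record when (in both units) the change happens, then switch tempo.
    void apply_tempo_change(double tick, double new_bpm) {
        change_time_ = ticks_to_seconds(tick);
        change_tick_ = tick;
        bpm_ = new_bpm;
    }

private:
    double ppq_;
    double bpm_;
    double change_time_ = 0.0;  // time of last tempo change (s)
    double change_tick_ = 0.0;  // song position of last tempo change (ticks)
};
```

Replaying the worked example: at 120 BPM and 384 PPQ, 5.2 s maps to tick 3993.6; after the change to 145 BPM at tick 6144 (t = 8.0 s), 11 s maps to tick 8928 and tick 18432 to about 21.24 s.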

The internal clock can of course just keep on ticking at the same interval, 10ms for the standard system clock, or faster if another timer source is used.

Is there perhaps some subtle issue I overlooked or misunderstood from your posting?

User-land clients

The applications that are seen as user-land clients just open a /dev/seq device (or something similar), and read()/write() data from/to it, just like they used to do with the /dev/music device. For registering with and interrogating the sequencer, an ioctl() interface is used.

I'm not sure yet how to provide an interface to multiple clients; I'm thinking of a device that can be opened r/w multiple times, or, if that's not possible or very difficult, a range of devices (/dev/seq0 .../dev/seqxxxx). The latter is far uglier than the former.

The presence of multiple user-land applications allows for some nice features:

Kernel mode clients

The kernel mode clients reside in kernel modules. These can be stand-alone loadable kernel modules, or modules that offer other functionality (eg. MIDI driver, soundcard driver).

The module-level interface consists of the following:

  1. The client can register itself and provide information about itself and its capabilities to the system by calling a function exported by the sequencer.
  2. The client registers a call-back function that will be called when an event is to be dispatched to the client. This can be called from an interrupt handler. All the call-back function addresses are stored in some sort of jump table.
  3. The client informs the sequencer which broadcast messages it's interested in.
  4. The client can call an (exported) function within the sequencer to enqueue a message.
  5. Unregistering is also done by calling an unregister function.
Because events can be dispatched to kernel mode clients immediately, this offers possibilities for good MIDI thru and filtering functions.
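The five steps of the module-level interface could look roughly like this (a user-space C++ sketch; every name is hypothetical, and a real kernel module would use C function pointers rather than std::function). The map of call-backs plays the role of the jump table; for simplicity, enqueue() here dispatches immediately, as it would for an event whose timestamp is 'now'.

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

struct SeqEvent {
    int type;  // eg. NOTE_ON, CHANGE_TEMPO, ...
    int dest;  // destination client number, or BROADCAST
};

constexpr int BROADCAST = -1;

// Dispatch call-back; in the kernel this may run from an interrupt handler.
using Callback = std::function<void(const SeqEvent&)>;

class ClientTable {
public:
    // 1 + 2: register with a name and a dispatch call-back; returns the client number.
    int register_client(const std::string& name, Callback cb) {
        int id = next_id_++;
        clients_[id] = Client{name, std::move(cb), {}};
        return id;
    }

    // 3: declare interest in a broadcast event type.
    void subscribe(int id, int event_type) { clients_.at(id).wanted.insert(event_type); }

    // 4: enqueue (dispatched immediately in this sketch).
    void enqueue(const SeqEvent& ev) {
        for (auto& entry : clients_) {
            Client& c = entry.second;
            bool direct = ev.dest == entry.first;
            bool bcast = ev.dest == BROADCAST && c.wanted.count(ev.type) > 0;
            if (direct || bcast) c.cb(ev);
        }
    }

    // 5: unregister.
    void unregister_client(int id) { clients_.erase(id); }

private:
    struct Client {
        std::string name;
        Callback cb;
        std::set<int> wanted;  // broadcast event types of interest
    };
    std::map<int, Client> clients_;  // the "jump table"
    int next_id_ = 1;
};
```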

Example:

A MIDI input driver receives MIDI bytes (interrupt driven); once a complete MIDI message is received, it is sent to the sequencer. This event (which has the current timestamp) goes directly to the clients that requested reception of note events. If there is a kernel mode MIDI thru module, it is called directly, the MIDI event is perhaps transformed (swapped channel/port), and sent back to the sequencer, where it's dispatched directly to the MIDI (or synth) client, which plays the event. A nice low-latency MIDI patch bay! Because MIDI runs at 31.25 kbaud (10 bits per byte on the wire), it takes 0.96 ms to receive a 3-byte note on message (without running status). Sending this event also takes 0.96 ms, so the event arrives at the playback device 1.92 ms later, which is faster than the typical delay within synthesizers before a sound is produced.
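Two pieces of the MIDI thru example in code, as a sketch with hypothetical names: the 'swapped channel' transform only rewrites the low nibble of the status byte of a channel message, and the wire-time figures follow from 31250 baud with 10 bits (start bit, 8 data bits, stop bit) per byte, for a 3-byte note on message.

```cpp
#include <array>
#include <cstdint>

using MidiMsg = std::array<std::uint8_t, 3>;

// Hypothetical thru transform: move a channel message to another channel.
MidiMsg remap_channel(MidiMsg msg, std::uint8_t new_channel) {
    // Status bytes 0x80-0xEF: high nibble = message type, low nibble = channel.
    if (msg[0] >= 0x80 && msg[0] < 0xF0) {
        msg[0] = static_cast<std::uint8_t>((msg[0] & 0xF0) | (new_channel & 0x0F));
    }
    return msg;
}

// Time on the wire in ms: 31250 baud, 10 bits per byte.
double wire_time_ms(int bytes) {
    return bytes * 10 * 1000.0 / 31250.0;
}
```

`wire_time_ms(3)` gives 0.96 ms for one note on; receiving plus re-sending gives the 1.92 ms mentioned above.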

Other applications (apart from drivers) could be support modules for high-end sequencers that need fine-grained real-time control.

Client communication

The event passing mechanism is well suited for real-time controls, note events etc. But to access very specific functions of a device (client), like downloading samples to a sample player or changing the microcode of a DSP or onboard processor, a different interface will have to be provided: either make this a special interface for the device or application, or use the sequencer as a multiplexer to pass the data to the driver.

MIDI ports

Note that there is no such thing as a MIDI port, synth device or mixer in this picture. The idea behind this is that a driver for a MIDI port should be implemented as a 'kernel module'. This 'MIDI Port' module can of course be part of a low-level MIDI port driver, and register itself from there. There is no need to have both a MIDI low-level module and a MIDI sequencer interface module.

To prevent the misery of stuck notes, a low-level MIDI driver should keep track of which notes are active. In case a note_off message is missed (which is an application error!), the driver can shut off the notes when asked.
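A sketch of the record of active notes a low-level MIDI driver could keep (hypothetical class; one bit per key for each of the 16 channels). A note on with velocity 0 counts as a note off, as in the MIDI specification. On request, the driver emits note off messages for everything still sounding and clears its record.

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

class ActiveNotes {
public:
    // A note on with velocity 0 is treated as a note off.
    void note_on(int channel, int key, int velocity) {
        if (velocity == 0) note_off(channel, key);
        else on_[channel].set(key);
    }
    void note_off(int channel, int key) { on_[channel].reset(key); }
    bool is_on(int channel, int key) const { return on_[channel].test(key); }

    // Build note off messages (3 bytes each) for every sounding note,
    // then forget them.
    std::vector<std::uint8_t> all_notes_off() {
        std::vector<std::uint8_t> bytes;
        for (int ch = 0; ch < 16; ++ch) {
            for (int key = 0; key < 128; ++key) {
                if (on_[ch].test(key)) {
                    bytes.push_back(static_cast<std::uint8_t>(0x80 | ch));  // note off status
                    bytes.push_back(static_cast<std::uint8_t>(key));
                    bytes.push_back(0);  // release velocity
                }
            }
            on_[ch].reset();
        }
        return bytes;
    }

private:
    std::bitset<128> on_[16];  // one bit per key, per channel
};
```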

The presence of a priority queue also offers the opportunity to process NOTE events that come with a length. On reception of such an event (by eg. a MIDI driver client), the note can be started and the corresponding note off can be enqueued. Using this facility, the chance of hanging notes because of abrupt abortion of a MIDI player will be reduced to 0.
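A sketch of how a driver client might handle a NOTE event with a length (hypothetical names; lengths in ticks, std::multimap as the queue): the note on goes out immediately and the matching note off is queued in the driver itself, so it is still sent if the player that produced the NOTE event has been aborted.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

struct Note {
    int channel, key, velocity;
    std::uint32_t length;  // duration in ticks
};

using Bytes = std::vector<std::uint8_t>;

class NoteScheduler {
public:
    // Emit the note on now and queue the note off `length` ticks later.
    void play(std::uint32_t now, const Note& n, Bytes& out) {
        out.push_back(static_cast<std::uint8_t>(0x90 | n.channel));
        out.push_back(static_cast<std::uint8_t>(n.key));
        out.push_back(static_cast<std::uint8_t>(n.velocity));
        offs_.emplace(now + n.length, n);
    }

    // Called from the clock: emit every note off that is due.
    void tick(std::uint32_t now, Bytes& out) {
        while (!offs_.empty() && offs_.begin()->first <= now) {
            const Note& n = offs_.begin()->second;
            out.push_back(static_cast<std::uint8_t>(0x80 | n.channel));
            out.push_back(static_cast<std::uint8_t>(n.key));
            out.push_back(0);
            offs_.erase(offs_.begin());
        }
    }

    std::size_t pending() const { return offs_.size(); }

private:
    std::multimap<std::uint32_t, Note> offs_;  // due tick -> note to release
};
```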

OSS Compatibility

Because almost every MIDI application that exists for Unix makes use of the OSS /dev/sequencer or /dev/music interface, it's a must to provide backwards compatibility for these applications. Users can then use the new sequencer engine while keeping their old applications, and application writers can migrate to the new sequencer to make use of its improved capabilities.

To achieve the compatibility, a wrapper for the /dev/sequencer and /dev/music devices has to be implemented (as a loadable module). This wrapper can simply map the OSS events to the corresponding sequencer calls.

Using OSS as the workhorse

At the time of writing this document, the only currently available MIDI and synth driver is OSS. If a client task is developed that presents itself to the sequencer core as a bunch of input and output devices, and simply does reads/writes to the OSS /dev/music interface (or perhaps calls the exported functions directly), a sequencer can be developed and used with OSS as the workhorse / low-level driver.

Client Registration

To let an application know which other clients have registered, a few functions have to be provided to interrogate which clients are present and what capabilities they have.

eg.:

	Client 1
		Name: System Timer
		Capabilities: Tempo, Sync input, Sync output

	Client 2
		Name: GUS MIDI Port
		Capabilities: MIDI input, MIDI output, Sync input, 
			Sync output

	Client 3
		Name: GUS GF1
		Capabilities: MIDI input

	Client 4
		Name: GUS MAX Codec
		Capabilities: Sync input, Sync output

	Client 5
		Name: Timidity (Soft Synth)
		Capabilities: MIDI input

	Client 7
		Name: XG Editor
		Capabilities: MIDI output

	Client 8
		Name: My Sequencer
		Capabilities: MIDI input, MIDI output
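The interrogation functions could be as simple as a capability bitmask per registered client plus a query. The flag names and values below are hypothetical, chosen to mirror the listing above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical capability flags.
enum Capability : std::uint32_t {
    CAP_MIDI_INPUT  = 1u << 0,
    CAP_MIDI_OUTPUT = 1u << 1,
    CAP_SYNC_INPUT  = 1u << 2,
    CAP_SYNC_OUTPUT = 1u << 3,
    CAP_TEMPO       = 1u << 4,
};

struct ClientInfo {
    int number;
    std::string name;
    std::uint32_t caps;  // bitwise OR of Capability flags
};

// Numbers of all registered clients that have every requested capability.
std::vector<int> find_clients(const std::vector<ClientInfo>& registry, std::uint32_t wanted) {
    std::vector<int> found;
    for (const ClientInfo& c : registry) {
        if ((c.caps & wanted) == wanted) found.push_back(c.number);
    }
    return found;
}
```

A sequencer looking for something to play notes on would ask for `CAP_MIDI_INPUT` and get back the MIDI port and the synths.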

Event Structure

All events have, apart from their specific content, a few common fields:

  1. timestamp (in MIDI ticks), like OSS /dev/music
  2. message type/id (eg. NOTE_ON, CHANGE_TEMPO,...)
  3. destination: to which client(s) the message is to be sent. An event can be sent to either:
    1. a specific client; in this case the client number has to be given.
    2. all clients that have registered for this (class of) event; this is basically a broadcast.
    This destination can also have (eg. for note events) a port and channel.
  4. source: where this message came from
Events are modeled after MIDI, but are not restricted to MIDI events. Any event that one can think of can be fed into the sequencer and dispatched at the specified time to the specified device(s).
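One possible C++ layout for these common fields (a sketch; the field widths, the enum values and the tempo unit of microseconds per quarter note are assumptions, not a decided format). The type-specific part is a union, so an event is not limited to MIDI's 7-bit values.

```cpp
#include <cstdint>

// Hypothetical message types.
enum class EventType : std::uint8_t { NoteOn, NoteOff, ChangeTempo, Controller };

constexpr int kBroadcast = -1;  // all clients registered for this class of event

struct Address {
    int client;            // client number, or kBroadcast
    std::uint8_t port;     // eg. for note events
    std::uint8_t channel;  // 0-15 on a MIDI bus
};

struct SeqEvent {
    std::uint32_t timestamp;  // in MIDI ticks, like OSS /dev/music
    EventType type;           // message type/id
    Address dest;             // destination
    Address source;           // where the message came from
    union {                   // type-specific content
        struct { std::uint8_t key, velocity; } note;
        std::uint32_t tempo;  // assumed: microseconds per quarter note
        struct { std::uint16_t param; std::int32_t value; } control;
    } data;
};

inline bool is_broadcast(const SeqEvent& e) { return e.dest.client == kBroadcast; }
```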

Paul Leonard's comments:

MIDI events suck. MIDI should be a wrapper layer for those who have existing applications.

However, the design of a good generic interface for the different devices is still quite tricky. If I have written a piece of music for my AWE32 using some of its cool filter sweeps, how will a GUS card play this?

One solution is to have adaptors, eg. AWE32->GUS, but this explodes as the number of devices increases. A generic device would need to know about the main features of all cards. Then we can have AWE32->generic and generic->GUS. The raw AWE32 will provide the neatest possible interface to all the features of the AWE32. If you are not concerned about portability you can use this directly. If you are concerned about portability you should use the generic interface and an adaptor?

Frank van de Pol's comments:

???? I'm not sure we are talking about the same MIDI. If 'your' MIDI is the low-level MIDI protocol, with all those byte sequences, and limited to only a few standard events, I totally agree with you. But (perhaps I was not clear enough) the type of events I was thinking of are modeled after MIDI in the spirit of "Make sound with volume aaa, pitch bbb, on channel ccc of instrument (port) ddd", and "Change parameter aaa to value bbb on channel ccc of instrument dddd". MIDI is limited in many ways because it consists of short packets of 7-bit values. We do not have that limit for specifying an event.

Because I had my own home studio (with Steinberg Cubase as sequencer, an XG synth module, a Kawai K1 synth, a Novation BassStation synth and a Boss RV70 Reverb) in mind while composing my proposal, I know that even with the restricted 'bare bones' MIDI standard it is possible to access all parameters in my equipment (and there are many more parameters to fiddle with than can be found on a GUS or AWE...). When dealing with MIDI equipment there are a few commonly used ways to give access to the sound characteristics (I'm sure you are familiar with these):

  1. Program changes (only 0-127). To work around this restriction, synth vendors came up with two alternatives to enhance addressing: GS (Roland) and XG (Yamaha). Both standards use a controller to select the bank the program will be selected from.
  2. Other channel messages, eg. pitch bend, channel aftertouch, polyphonic aftertouch.
  3. Control changes (128 controllers, 7 bits). These are intended for real-time control of sound characteristics. Think of panning, volume, modulation, filter control: frequency/resonance (also known as 'Brightness' and 'Harmonic Content'), etc. To extend the number of available controllers and the range of the value (to 14 bits), Non-Registered and Registered Parameter Numbers (NRPN/RPN) are used. XG makes heavy use of NRPN.
  4. System exclusive messages. Vendor/synth specific. Used in almost all synthesizers to allow access to every configurable or programmable bit; they even allow downloading samples... These messages are mostly non-portable and require 'computer programmer musicians' or special applications (synth editors, bank managers).
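For the first point, a bank switch plus program change comes down to three MIDI messages: Bank Select MSB (controller 0), Bank Select LSB (controller 32) and a Program Change. The sketch below (hypothetical helper names) only builds the bytes; which bank numbers select what differs between GS and XG devices, so the values are left to the caller.

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Control change: status 0xB0 | channel, controller number, 7-bit value.
void control_change(Bytes& out, int ch, int controller, int value) {
    out.push_back(static_cast<std::uint8_t>(0xB0 | (ch & 0x0F)));
    out.push_back(static_cast<std::uint8_t>(controller & 0x7F));
    out.push_back(static_cast<std::uint8_t>(value & 0x7F));
}

// Bank Select MSB (CC 0) and LSB (CC 32), then Program Change
// (status 0xC0 | channel, one data byte).
Bytes select_program(int ch, int bank_msb, int bank_lsb, int program) {
    Bytes out;
    control_change(out, ch, 0, bank_msb);
    control_change(out, ch, 32, bank_lsb);
    out.push_back(static_cast<std::uint8_t>(0xC0 | (ch & 0x0F)));
    out.push_back(static_cast<std::uint8_t>(program & 0x7F));
    return out;
}
```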
I think that because of the huge installed base of MIDI equipment, that should be the first target to aim at, using the methods shown above. For the synths that are not 'stand-alone boxes hooked up with MIDI cables', like the ones integrated on soundcards (OPL3, GUS, AWE, ...), I think the best approach is to let the outside world (ie. the sequencer applications) see these devices as 'just another synth'. All the specials and extras, including programming capability, should be wrapped in the standard available control messages.

Advantages of this approach:

  1. From a musician's point of view an internal synth (eg. my GUS) doesn't look different from an external unit (both have their specific messages to massage the sound).
  2. Common interface for the application programmer.
  3. If the Linux system is only used as a patch bay, with MIDI->internal synth routing, one can use an external sequencer and make full use of the internal unit. (Great PLUS: my GUS is basically a neat sampler, but only useful for playing MOD files... If I can drive it from Cubase on my other computer, I have just another synth. The same applies if one wants to hook up a MIDI keyboard and play using internal sounds.)
> However, the design of good generic interface for the different devices
> is still quite tricky.
> If I have written a piece of music for my AWE32 using some of it's cool
> filter sweeps 
> how will a GUS card play this ? One solution is to have adaptors e.g.
If it's a special AWE32 feature that isn't supported by the GUS, it will just play, but without the great effects. The same applies when using 'normal' synthesizers. If I send data sequenced for my K1 to the XG module, it will sound very different, and the XG lacks the great analog sound of the BassStation, and my reverb has far more parameters than the reverb unit of the XG synth etc. etc. etc. etc.

Bottom line: I don't think we should make life more complicated than it is; we should just accept the differences between the devices (otherwise one would only buy 1 synth and that's it; the reason for the existence of many devices is their own strengths and weaknesses.)

To make life a little easier, we can try to standardize the internal synths to GM and perhaps extend them with some of the XG features.

> AWE32->GUS but this
> explodes as the number of devices increases. A generic device would need
> to know about main features
> of all cards. Then we can have AWE32->generic   generic->GUS
No. If you have a song composed for the AWE32, it will only sound _exactly the same_ on an AWE32 and on nothing else (ehh, perhaps on an AWE33 or AWE64 if these are enhanced versions of the former.)

I think an AWE32 can be seen as "generic + AWE32 specials". The GUS can be seen as "generic + GUS specials". A standalone XG module (eg. Yamaha MU90R) is "generic + MU90R specials". If an AWE32-specific song is played on another device, eg. a GUS, the latter will not understand the specials and will just discard the extra data; the generic events will be played though.

 
> The trouble is that to make the application software easy to use it
> needs to understand
> how to control your synth. For example some filter sweep may be
> controlled using 
> the non registered midi controls. This as you know requires sending a
> high order byte
> and a low order byte. Using cakewalk aprentice (it came with my card)
> implemeting a filter
> sweep is a long boring task because  . .  . well you can imagine why.
This can be made more user-friendly if the sequencer supports the NRPN controller as ONE event (instead of distinct messages).
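As a sketch of what 'NRPN as ONE event' could mean (hypothetical struct and function names): the application stores one event with a 14-bit parameter number and a 14-bit value, and the MIDI port client expands it into the four control changes the synth expects, NRPN MSB/LSB (controllers 99/98) followed by Data Entry MSB/LSB (controllers 6/38).

```cpp
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical single NRPN event.
struct NrpnEvent {
    int channel;    // 0-15
    int parameter;  // 0-16383
    int value;      // 0-16383
};

static void push_cc(Bytes& out, int ch, int num, int val) {
    out.push_back(static_cast<std::uint8_t>(0xB0 | (ch & 0x0F)));
    out.push_back(static_cast<std::uint8_t>(num & 0x7F));
    out.push_back(static_cast<std::uint8_t>(val & 0x7F));
}

// Expand the event into the high/low byte sequence a synth expects.
Bytes expand_nrpn(const NrpnEvent& e) {
    Bytes out;
    push_cc(out, e.channel, 99, e.parameter >> 7);    // NRPN MSB
    push_cc(out, e.channel, 98, e.parameter & 0x7F);  // NRPN LSB
    push_cc(out, e.channel, 6, e.value >> 7);         // Data Entry MSB
    push_cc(out, e.channel, 38, e.value & 0x7F);      // Data Entry LSB
    return out;
}
```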

Usage can be made even easier if there is a standard for the controller messages, and the application has predefined controller codes for the often used ones. Steinberg Cubase supports this, and lets you just pick a controller from a list (also RPN/NRPN).

But sadly there are only a few things standardised in the not-so-perfect MIDI world. To live with this, the user will always have to be given the ability to send 'just a controller' (and even the ugly high/low byte sequences like you did with Cakewalk).

As the interface to the synths that are built into the soundcards can be (and has to be) defined by ourselves, we can make them 'compatible'. For outboard MIDI equipment this is not possible. I think it is a great win if this standard is compatible with the majority of outside MIDI equipment.

> > >
> > >  MIDI events suck. MIDI should be a wrapper layer for those who have
> > > existing applications.
> > 
> > ???? I'm not sure we are talking about the same MIDI. If 'your' MIDI is the
> > low-level MIDI protocol, with all those byte sequences, and only limited to
> > a few standard events I totally agree with you. But (perhaps I was not clear
> > enough) the type of events I was thinking of are modeled after MIDI in the
> > spirit of "Make sound with volume aaa, pitch bbb, on channel ccc of
> > intrument (port) ddd", and "Change parameter aaa to value bbb on channel ccc
> > of instrument dddd". MIDI is limited in may ways because it consists of
> > short packets of 7 bit values. We do not have that limit for specifying an
> > event.
> 
> 
>  Yep I was mainly talking about the byte sequence from the external
> devices. 
> I think as far as an aplication is concerned it wants to see a Voice. A
> voice
> should support all the standard midi functions if possible. The
> device/channel
> distinction is not needed at the highest level but is required when you
> create
> allocate a voice.
I think we have a subtle misunderstanding here. I'm thinking of the MIDI concept; I guess I did not communicate that clearly when using instrument/port in my example (stupid me). I should NOT have used the word "instrument" when referring to a "port".

port    = (for MIDI) the interface to hook up the instruments (you know, the
          5 pin DIN connectors). When referring to an on-board synthesizer,
          this can be seen as one (or perhaps more) ports. Each MIDI port
          supports 16 channels. Instead of port, we also refer to it as a MIDI
          buss.

channel = one of the 16 channels on a MIDI buss. There can be multiple notes
          playing concurrently on one channel.
In a physical MIDI setup, one can daisy chain the MIDI devices. Logically it looks like all the daisy chained devices are in parallel. One can set up the devices (ie. synths) to 'listen' to a specific channel, or one can have synths that listen to many (or all) channels (most current synths do so).

A typical use is to use the MIDI channels to specify which 'instrument' should play a certain events. Eg. the piano is set to channel 1, the bass has channel 2, strings on channel 6, and drums on channel 10.

If the synth is capable of it (most are), we can send multiple note on events to a certain channel, to get the piano mentioned earlier to play some chords.

Small MIDI system: one or more synths connected to one single port. The musician has only 16 channels for addressing events to the right 'sound'. To overcome this limitation, more MIDI output ports can be added to a system, where each port drives a separate MIDI buss. On a system with N ports, the musician has access to N*16 different channels.
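The port/channel arithmetic can be sketched as follows. The sketch assumes ports are numbered from 0 and channels run 0..15 within each port. The names `struct seq_addr`, `flat_channel` and `split_channel` are hypothetical. The code maps a (port, channel) pair onto one of the N*16 flat channel numbers and back again.

```c
#include <stdbool.h>

#define CHANNELS_PER_PORT 16  /* a MIDI buss carries 16 channels */

/* Hypothetical address of one MIDI channel: which port (buss) and which of
 * its 16 channels. Both are zero-based here; musicians usually count
 * channels 1..16, so zero-based channel 9 is the GM drum channel "10". */
struct seq_addr {
    int port;
    int channel;
};

/* Map (port, channel) to a flat index in 0 .. nports*16-1.
 * Returns -1 for an out-of-range address. */
int flat_channel(struct seq_addr a, int nports)
{
    if (a.port < 0 || a.port >= nports ||
        a.channel < 0 || a.channel >= CHANNELS_PER_PORT)
        return -1;
    return a.port * CHANNELS_PER_PORT + a.channel;
}

/* Inverse mapping; returns false if the index is out of range. */
bool split_channel(int flat, int nports, struct seq_addr *out)
{
    if (flat < 0 || flat >= nports * CHANNELS_PER_PORT)
        return false;
    out->port = flat / CHANNELS_PER_PORT;
    out->channel = flat % CHANNELS_PER_PORT;
    return true;
}
```

With 4 ports, the drum channel of the third port, (2, 9), becomes flat channel 41 of 64.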

Frank van de Pol's comments:

> 
> [cut]
> 
> > To make life a little easier, we can try to standardize the internal
> > synths to GM and perhaps extend it with some of the XG features.
> [cut]
> > I think an AWE32 can be seen as: "generic + AWE32 specials". The GUS can be
> > seen as "generic + GUS specials". A standalone XG module (eg. Yamaha MU90R)
> > is "generic + MU90R specials". If an AWE32-specific song is played to another
> > device, eg. GUS, the latter will not understand the specials and just
> > discard the extra data; the generic events will be played, though.
> 
> I agree completely on both points.
> 
> However I have to note something. In the initial message (sorry, not quoted),
> you said something about multiple clients being able to open the same
> device r/w, so that many applications can use midi output at the same time.
Perhaps some more explanation is needed to clarify my idea (which is completely different from OSS's model):

"Client" in my proposal is just _anything_ that communicates with the sequencer core. The sequencer core is nothing more than a foundation that does nothing but _queueing_ and _routing messages_ between clients. These clients can reside either in userland or in kernel land. They can be applications like sequencers and MIDI players, device drivers that play sounds (MIDI, synths, soft synths) or have inputs (MIDI keyboards, drums etc.), or clients that simply 'massage' the data and act as filters or mappers.

In a typical minimal system that is simply playing a MIDI file, there are always a few clients:

The MIDI player sends events into the queue, with the destination address set to the MIDI port. The sequencer core does the scheduling and delivers each event to the MIDI port (or synthesizer) when its time has come. This is basically the same as what you get with OSS.

The difference from OSS is that there is a router after the queue, and that a priority queue is used instead of a simple FIFO. Clients can be put in chains if you want.

In the simple MIDI player example above, one can add an additional client that, for instance, changes the pitch of certain note events (only on specific channels). All this imaginary client does is shift the pitch of the received data by 2 semitones and send it to the MIDI output. If the MIDI player now sends its events to this 'pitchshift' client instead of directly to the MIDI output, we hear the same song, now with a different pitch. Okay, you'll think, what's the use of this pitch shifter? Not much, I think, but it illustrates one big strength of this concept:

The sequencing and event handling is built upon a few components and not on one big monolithic "we want it to do everything" driver. There is NO NEED to implement MIDI thru, filtering etc. in the sequencer core. If one wants that functionality, just add a client. If many users (or even all) want a certain piece of functionality, we can even include it as a separate client (either a user-land application or a kernel module) distributed with the driver.
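A chain of clients can be sketched as follows. The sketch assumes a hypothetical `struct client` with a `receive` callback and a `next` destination pointer; none of these names come from an existing API. The player delivers to the pitchshift client, which transposes note events on selected channels and forwards everything else untouched to the output client.

```c
#include <stddef.h>

/* Hypothetical minimal event; real events would also carry a timestamp,
 * source and destination addresses, and many more event types. */
enum ev_type { EV_NOTE_ON, EV_NOTE_OFF, EV_CONTROL };

struct seq_event {
    enum ev_type type;
    int channel;   /* 0..15 */
    int note;      /* 0..127 for note events */
    int value;     /* velocity or controller value */
};

/* A client is anything that receives events. 'next' is the destination it
 * forwards to; this is how clients are put in a chain. */
struct client {
    void (*receive)(struct client *self, const struct seq_event *ev);
    struct client *next;
    void *priv;    /* client-private state */
};

static void deliver(struct client *c, const struct seq_event *ev)
{
    if (c != NULL)
        c->receive(c, ev);
}

/* State of the hypothetical 'pitchshift' client. */
struct pitchshift {
    int semitones;        /* e.g. +2 */
    unsigned short mask;  /* bit n set: shift channel n */
};

static void pitchshift_receive(struct client *self, const struct seq_event *ev)
{
    const struct pitchshift *ps = self->priv;
    struct seq_event out = *ev;

    if ((ev->type == EV_NOTE_ON || ev->type == EV_NOTE_OFF) &&
        (ps->mask & (1u << ev->channel))) {
        out.note += ps->semitones;
        if (out.note < 0 || out.note > 127)
            return;             /* shifted out of MIDI range: drop it */
    }
    deliver(self->next, &out);  /* forward to the next client in the chain */
}

/* Stand-in for the MIDI output client: remembers the last event it got. */
static void recorder_receive(struct client *self, const struct seq_event *ev)
{
    *(struct seq_event *)self->priv = *ev;
}
```

From the player's point of view, the only change is that it sends to `shift` instead of `out`. Removing the filter means pointing the player back at `out`.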

Using a priority queue instead of a FIFO allows us to add events to be played now (or shortly) to a queue that already holds events scheduled for the future. No need to mess with the out-of-band ioctl()s.
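The difference between a FIFO and a priority queue can be sketched as a binary min-heap ordered by timestamp. This is a minimal sketch under stated assumptions:

- the names `struct prio_queue`, `pq_insert` and `pq_pop_due` are hypothetical;
- the queue has a fixed size;
- timestamps are plain tick counts;
- a kernel version would also need locking against the timer interrupt.

A tie-breaking sequence number keeps events with equal timestamps in arrival order.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_MAX 256   /* fixed size for the sketch */

/* Hypothetical queued event: a timestamp (in ticks) plus a payload. 'seq'
 * breaks ties so events with equal time leave in the order they arrived. */
struct qevent {
    unsigned long time;
    unsigned long seq;
    int payload;
};

struct prio_queue {
    struct qevent heap[QUEUE_MAX];
    size_t n;
    unsigned long next_seq;
};

static bool earlier(const struct qevent *a, const struct qevent *b)
{
    return a->time < b->time || (a->time == b->time && a->seq < b->seq);
}

static void swap_ev(struct qevent *a, struct qevent *b)
{
    struct qevent t = *a; *a = *b; *b = t;
}

/* Insert in O(log n): an event for "now" sifts up to the front, ahead of
 * everything already queued for the future. */
bool pq_insert(struct prio_queue *q, unsigned long time, int payload)
{
    size_t i;
    if (q->n == QUEUE_MAX)
        return false;
    i = q->n++;
    q->heap[i].time = time;
    q->heap[i].seq = q->next_seq++;
    q->heap[i].payload = payload;
    while (i > 0 && earlier(&q->heap[i], &q->heap[(i - 1) / 2])) {
        swap_ev(&q->heap[i], &q->heap[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
    return true;
}

/* Meant to be called from the timer interrupt: remove the earliest event if
 * it is due (time <= now). Returns false when nothing is due yet. */
bool pq_pop_due(struct prio_queue *q, unsigned long now, struct qevent *out)
{
    size_t i = 0;
    if (q->n == 0 || q->heap[0].time > now)
        return false;
    *out = q->heap[0];
    q->heap[0] = q->heap[--q->n];
    for (;;) {                  /* sift the moved element down */
        size_t l = 2 * i + 1, r = l + 1, m = i;
        if (l < q->n && earlier(&q->heap[l], &q->heap[m])) m = l;
        if (r < q->n && earlier(&q->heap[r], &q->heap[m])) m = r;
        if (m == i) break;
        swap_ev(&q->heap[i], &q->heap[m]);
        i = m;
    }
    return true;
}
```

An event inserted with the current time goes straight to the head of the heap. It is therefore dispatched on the next tick, even when the queue already holds seconds' worth of future events. A FIFO would make it wait behind all of them.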

         
> 
> I think this is not desirable. No, don't get me wrong, of course many
> applications should be able to open midi devices, but I think that it would
> be better done with a midi server than in kernel mode.
The reason I want the scheduling in the kernel is that it's the only place where I can expect events to be delivered on time. There, playback (and recording) can be triggered by any timer or other (MIDI rx) interrupt. That's the best timing you can get. Why does timing suck on Microsoft Windows compared to the good old Atari? Because on the Atari everything can be interrupt driven, and on Windows it can't.

The concept is not designed for (but does allow) multiple applications playing with the same devices. The basic idea is to make a framework that can be used to build sequencer systems. These can range from low-end MIDI or MOD players to high-end sequencer and real-time computer synthesis systems, possibly distributed over many nodes. The framework should not be restrictive and should allow for growth in the future (that's why it is not an OSS /dev/music emulator).

> Everything that could be outside the kernel is better out than in. Perhaps
> we could distribute the midi server with the driver, but not inside it.
It's up to the client developer where to place his or her client: inside or outside the kernel. I can imagine we would like to have a soft synth. Instead of going through the trouble of developing one in the kernel, we can start off by running user-land timidity. Our MIDI-playing applications then send their data to the 'timidity client' instead of the 'GUS MIDI' client. For the player it doesn't make a difference (only another destination), and whooom, we have a soft synth.

> 
> Another argument on this is that the driver should be kept the simpler we
> (Jaroslav) can make it :-)
Fully agreed!!! That's why the sequencer core only has a few queues and an event router.

> The raw Awe32 will provide the neatest possible interface to all the
> features of the AWE32.
Of course. That's the whole idea. And this raw AWE32 interface can be implemented as a generic set of commands extended with some AWE32-specific ones. Combined, these two should give access to every bit in the AWE.

> If you are not concerned about portability you can use this directly. If
> you are concerned about portability you should use the generic interface
> and an adaptor?
This is not an issue anymore if we use the 'generic + device specific' concept. Only one interface is needed.
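The 'generic + device specific' idea can be sketched as an event made of a generic part plus an optional extension tagged with the device class it is meant for. The device classes, field names and `synth_handle` are all hypothetical. The sketch only counts what a driver would do with the event instead of driving real hardware. Every synth plays the generic part, and a synth that is not the intended target simply discards the specials.

```c
/* Hypothetical device-class tags for the "specials" part of an event. */
enum dev_class { DEV_GENERIC = 0, DEV_AWE32, DEV_GUS, DEV_MU90R };

/* Generic part: understood by every synth. */
struct generic_part {
    int channel;
    int note;
    int velocity;
};

/* An event = generic part + optional device-specific extension. */
struct ext_event {
    struct generic_part gen;
    enum dev_class special_for;  /* DEV_GENERIC: no specials attached */
    int special_param;           /* meaning defined by that device class */
    int special_value;
};

/* What a synth driver ended up doing with events (for illustration only). */
struct synth_result {
    int notes_played;
    int specials_applied;
    int specials_ignored;
};

/* A driver of class 'self' always plays the generic part and applies the
 * specials only if they were meant for its own class; otherwise they are
 * discarded. */
void synth_handle(enum dev_class self, const struct ext_event *ev,
                  struct synth_result *res)
{
    res->notes_played++;                 /* generic part: always played */
    if (ev->special_for == DEV_GENERIC)
        return;
    if (ev->special_for == self)
        res->specials_applied++;         /* e.g. a device-specific setting */
    else
        res->specials_ignored++;         /* not understood: discard */
}
```

An AWE32-specific song sent to a GUS therefore still plays its notes; only the AWE32 extras are lost.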

Apart from this addressing scheme, a client can also request to get every message, even those meant for other clients. This promiscuous mode allows a device to snoop all data.
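Promiscuous delivery can be sketched with a hypothetical fixed-size client table in which each entry carries a `promiscuous` flag. The struct and function names are illustrative, not an existing API. The addressed client receives the event as usual, and every other registered client in promiscuous mode receives a copy.

```c
#include <stdbool.h>

#define MAX_CLIENTS 8

/* Hypothetical routing table entry. */
struct route_client {
    bool registered;
    bool promiscuous;    /* wants a copy of every event */
    int received;        /* count of events delivered (for illustration) */
};

struct router {
    struct route_client clients[MAX_CLIENTS];
};

/* Deliver one event addressed to client 'dest'. The destination gets it as
 * usual; every other registered client in promiscuous mode gets a copy.
 * Each client receives at most one copy. Returns the number of deliveries. */
int route_event(struct router *r, int dest)
{
    int i, delivered = 0;

    for (i = 0; i < MAX_CLIENTS; i++) {
        struct route_client *c = &r->clients[i];
        if (!c->registered)
            continue;
        if (i == dest || c->promiscuous) {
            c->received++;   /* a real router would call the client here */
            delivered++;
        }
    }
    return delivered;
}
```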

Apart from the obvious events like MIDI note on/off, control change etc., some other events can be thought of:

  1. Announcement that a new client has registered
  2. Announcement that a client has unregistered (is gone)
  3. Change of a client's capabilities (?)
  4. Other 'change of state' messages
  5. Change of tempo or timer resolution
  6. Trigger /dev/dsp devices for instantaneous sample starting
  7. Wake-up, for creating periodic tasks (when rescheduled) within the kernel drivers.
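As a sketch, the ordinary MIDI events and the extra events listed above could share one event-type enumeration. All names and their ordering here are hypothetical. The small helper shows one reason to keep the MIDI types grouped at the start: a router can tell channel data from sequencer announcements with a single comparison.

```c
/* Hypothetical event-type enumeration: ordinary MIDI events first, then the
 * sequencer/system events from the list above. Names are illustrative. */
enum seq_event_type {
    /* ordinary MIDI events */
    SEQ_EV_NOTE_ON,
    SEQ_EV_NOTE_OFF,
    SEQ_EV_CONTROL_CHANGE,
    SEQ_EV_PROGRAM_CHANGE,
    SEQ_EV_PITCH_BEND,
    /* sequencer/system events */
    SEQ_EV_CLIENT_START,     /* a new client has registered */
    SEQ_EV_CLIENT_EXIT,      /* a client has unregistered */
    SEQ_EV_CLIENT_CHANGE,    /* a client's capabilities changed */
    SEQ_EV_STATE_CHANGE,     /* other change-of-state notifications */
    SEQ_EV_TEMPO,            /* change tempo or timer resolution */
    SEQ_EV_DSP_TRIGGER,      /* start a /dev/dsp sample instantly */
    SEQ_EV_WAKEUP            /* wake a kernel driver for periodic work */
};

/* True for events that carry MIDI channel data, false for the
 * sequencer-internal announcements. Relies on the ordering above. */
int seq_event_is_midi(enum seq_event_type t)
{
    return t <= SEQ_EV_PITCH_BEND;
}
```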

Difference between /dev/sequencer and /dev/music

For some (historical?) reason OSS provides two different sequencer interfaces: /dev/sequencer (the old one) and /dev/sequencer2, also known as /dev/music. Is there a good reason why a new sequencer core should also provide 2 interfaces? What exactly is the difference between these interfaces? For backwards compatibility I understand why both should be implemented, but is there any reason why the functionality cannot be provided by one (good) sequencer?

Paul Leonard's comments:

_I think_ /dev/music adds some voice/patch management for the OPL3 and GUS cards. When I asked, I was told that there was no point using /dev/music for the AWE32 interface.

The AWE32 driver does its own voice management. [Within the AWE32 there are 30 sample players; when the user wants to play a note, something needs to be clever enough to allocate these sample players for the job.] The /dev/sequencer GUS interface required the user to do this.

Frank van de Pol's comments:

I got the same feeling, though I didn't know about the AWE32 voice allocation at the /dev/sequencer level. So for the AWE32 there is little difference between the two APIs.

Suppose the only real issue is that the old interface gives lower-level access to a synth device, which can't be achieved by a simple interface wrapper (e.g. access to every single voice in the GUS for playing MOD files). In that case it could be an idea to provide such synths with a CAP_LOWLEVEL_SYNTH or CAP_LOWLEVEL_AWE32 capability flag (and perhaps a message to switch from one mode to the other).

Idea: If one wants access to individual voices, these can also be addressed as a bunch of MIDI channels (e.g. 0..31), with each channel representing a single voice.
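The capability flag and the channel-per-voice idea can be sketched together. Only `CAP_LOWLEVEL_SYNTH` is named in the text. `CAP_MIDI_SYNTH`, `struct synth_info`, the `lowlevel_mode` switch and the function name are hypothetical. In low-level mode, channel n simply addresses voice n.

```c
#include <stdbool.h>

/* Capability bits a synth client could advertise. CAP_LOWLEVEL_SYNTH is the
 * flag proposed above; CAP_MIDI_SYNTH is a hypothetical companion flag. */
#define CAP_MIDI_SYNTH      (1u << 0)  /* 16-channel, MIDI-like, dynamic voices */
#define CAP_LOWLEVEL_SYNTH  (1u << 1)  /* individual voices addressable */

struct synth_info {
    unsigned caps;
    int nvoices;         /* e.g. 14..32 on a GUS */
    bool lowlevel_mode;  /* switched by a (hypothetical) mode message */
};

/* In low-level mode channel n means voice n. Returns the voice number, or
 * -1 if the synth cannot do this or the channel is out of range. */
int lowlevel_voice_for_channel(const struct synth_info *s, int channel)
{
    if (!(s->caps & CAP_LOWLEVEL_SYNTH) || !s->lowlevel_mode)
        return -1;
    if (channel < 0 || channel >= s->nvoices)
        return -1;
    return channel;
}
```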

Paul Leonard's comments:

With the AWE32 the voices are done in software; each time a note is played, the pointers to the sample in memory and all the synth parameters are loaded into the device.

I do not know how the GUS works.

Frank van de Pol's comments:

The GUS has up to 32 'sample players' that play data from the on-board memory. There are also LFOs and a volume envelope (using volume ramping in hardware).

Looking at the card directly at chip level, from the programmer's perspective, we see just up to 32 'single monophonic' voice generators. Every voice generator has to be (and can be) programmed independently. This makes the card ideal for playback of MOD files, but a pain in the ass for MIDI playback because it lacks dynamic voice allocation. This is how the GUS is used in the /dev/sequencer interface.

/dev/music tries to turn the GUS into a synth-like device, with 16 channels, something special for drums (a different sample per note), and up to 32 voices of polyphony, dynamically allocated over the 16 channels. For MOD players, there is no way to force the card to use a specific tone generator for a specific 'track'.

Comparison of both (OSS) interfaces for the GUS:

                       /dev/sequencer       /dev/music
 +--------------------+------------------+---------------+
 | total voices       |   14-32          |    14-32      |
 |                    |                  |               |
 | total channels     |   14-32          |    16         |
 |                    |                  |               |
 | voice alloc hidden |   no             |   yes         |
 | from user          |                  |               |
 |                    |                  |               |
 | Drum channel       |   none           |   channel 10  |
 +--------------------+------------------+---------------+
Note that the total number of voices (14 up to 32) is determined by the mode the card is programmed into, and directly affects the sample rate (gets lower with more voices).

/dev/sequencer is _the_ preferred interface for MOD player writers because it gives them more channels. For MIDI player writers, /dev/music is (or should be) the preferred interface, because it looks more like a normal MIDI device and the sequencer does not have to do GUS-specific voice allocation. The sequencer can then be more generic.

I would love to end up with a sequencer core that has only ONE interface, and can also be used for the MOD players and similar stuff.

Perhaps it helps to make one single GUS driver that just has more channels:

- total voices:    14-32 (yep, determined by the hardware...)
- total channels:  32, presented as a 'MIDI like' GUS synth with 2 ports 
                   of 16 channels each. (Or provide more ports? I don't think
                   that's useful.)
- voice alloc:     dynamic, over the 32 channels.
If there is a need, we can even add parameters to set the minimum and maximum number of voices to be allocated for a specific channel (like the Partial Reserve feature offered by Roland synthesizers). Set the maximum to 1 to get a monophonic channel; set it to 0 to essentially mute the channel.
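Dynamic allocation with per-channel limits can be sketched as follows. The struct and function names are hypothetical. The 'minimum' is modelled as a reservation that other channels cannot take, which is one possible reading of Partial Reserve. When no voice is available the allocator fails; a real allocator would probably steal the oldest note instead.

```c
#define NVOICES   32
#define NCHANNELS 32   /* 2 ports x 16 channels */

/* Hypothetical allocator state. voice_chan[v] is the channel voice v plays
 * for, or -1 if the voice is free. */
struct voice_alloc {
    int voice_chan[NVOICES];
    int nvoices;               /* 14..32, set by the card's mode */
    int min_voices[NCHANNELS]; /* reserved for this channel */
    int max_voices[NCHANNELS]; /* 1 = monophonic, 0 = muted */
};

void va_init(struct voice_alloc *va, int nvoices)
{
    int i;
    va->nvoices = nvoices;
    for (i = 0; i < NVOICES; i++) va->voice_chan[i] = -1;
    for (i = 0; i < NCHANNELS; i++) {
        va->min_voices[i] = 0;
        va->max_voices[i] = nvoices;
    }
}

static int in_use(const struct voice_alloc *va, int ch)
{
    int v, n = 0;
    for (v = 0; v < va->nvoices; v++)
        if (va->voice_chan[v] == ch) n++;
    return n;
}

/* Free voices still promised to channels other than 'ch'. */
static int reserved_elsewhere(const struct voice_alloc *va, int ch)
{
    int c, sum = 0;
    for (c = 0; c < NCHANNELS; c++) {
        int missing = va->min_voices[c] - in_use(va, c);
        if (c != ch && missing > 0) sum += missing;
    }
    return sum;
}

/* Allocate a voice for a note on channel 'ch'. Returns the voice number, or
 * -1 if the channel is at its maximum or only reserved voices are left. */
int va_alloc(struct voice_alloc *va, int ch)
{
    int v, nfree = 0, first_free = -1;

    if (in_use(va, ch) >= va->max_voices[ch])
        return -1;
    for (v = 0; v < va->nvoices; v++)
        if (va->voice_chan[v] < 0) {
            if (first_free < 0) first_free = v;
            nfree++;
        }
    if (first_free < 0 || nfree <= reserved_elsewhere(va, ch))
        return -1;
    va->voice_chan[first_free] = ch;
    return first_free;
}

void va_free(struct voice_alloc *va, int voice)
{
    va->voice_chan[voice] = -1;
}
```

With 4 voices, channel 0 set to monophonic, channel 5 muted and two voices reserved for channel 9: channel 0 gets one voice and channel 1 gets one. The last two voices remain available only to channel 9.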

The drum/percussion channel can default to 10 (GM compatible), but should be user configurable. The user should be able to set it to any channel he or she wants, and even switch it off. Following the GS/XG standards, multiple (e.g. 3) drum maps can be provided.

For changing the voice allocation and drum parameters we can simply use NRPNs or sysex!!!
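Setting such a parameter through an NRPN takes four standard Control Change messages. The first two select the parameter: controller 99 (NRPN MSB) and controller 98 (NRPN LSB). The last two send the value: controller 6 (Data Entry MSB) and controller 38 (Data Entry LSB). The parameter number `NRPN_MAX_VOICES` and the function name are hypothetical; no existing synth is claimed to use them. Some devices act on the Data Entry MSB alone, so sending the LSB is an assumption of this sketch.

```c
#include <stddef.h>

/* Standard MIDI controller numbers used for NRPN access. */
#define CC_NRPN_MSB        99
#define CC_NRPN_LSB        98
#define CC_DATA_ENTRY_MSB   6
#define CC_DATA_ENTRY_LSB  38

/* Hypothetical NRPN number a synth driver might assign to "maximum voices
 * for this channel"; not taken from any real synth. */
#define NRPN_MAX_VOICES  0x0101

/* Encode "set NRPN 'param' to 'value' on 'channel'" as four Control Change
 * messages (12 bytes, no running status). param and value are 14-bit
 * numbers, channel is 0..15. Returns the number of bytes written. */
size_t encode_nrpn(unsigned char *buf, int channel, unsigned param,
                   unsigned value)
{
    const unsigned char status = (unsigned char)(0xB0 | (channel & 0x0F));
    const unsigned char msgs[4][2] = {
        { CC_NRPN_MSB,       (param >> 7) & 0x7F },
        { CC_NRPN_LSB,       param & 0x7F },
        { CC_DATA_ENTRY_MSB, (value >> 7) & 0x7F },
        { CC_DATA_ENTRY_LSB, value & 0x7F },
    };
    size_t i, n = 0;

    for (i = 0; i < 4; i++) {
        buf[n++] = status;
        buf[n++] = msgs[i][0];
        buf[n++] = msgs[i][1];
    }
    return n;
}
```

For example, making (zero-based) channel 9 monophonic would mean `encode_nrpn(buf, 9, NRPN_MAX_VOICES, 1)`.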

Paul Leonard's comments:

I was doing some work on my sequencer over the weekend. I was wondering what the best way is to organise the allocation of voices:

In my scheme of things a Voice has a table of effects which implements the interface to a given device. A voice also has a lookup table which maps MIDI input events onto these effects. However, this is not really anything to do with my query.

 Plan.

 A singleton VoiceManager provides the high level interface.

 An application requests a voice using the following information.


SYNTHTYPE:VOICETYPE:patchname:id
e.g 

 AWE32:VOICE:piano2:1

 *:VOICE:harp:2

 GUS:DRUM:jazz:1
Explanation:

 - SYNTHTYPE  the synth type; the voice manager will attempt to allocate a
              voice on the requested device. Otherwise?

 - VOICETYPE  the type of channel required. This is a normal voice or drum.
              Maybe polyphonic or mono?

 - patchname  name of the patch (for example GM). It could also be a
              program id + bank id? It could also be a traditional MIDI
              port and channel.

 - id         two or more tracks may want to share the same voice
              channel. e.g. splitting the drum track
              into snare/ high hat etc. If you want to share a channel
              you use the same id. If you want
              to use the patch but have a different channel you would
              use 2 distinct ids. 
Note that a wildcard "*" could be used if we don't care.

I was going to code this up as a string?
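If the request is kept as a string, parsing it is straightforward. The sketch assumes exactly four colon-separated fields, a numeric id, and "*" as the wildcard in any text field. `struct voice_request`, the field sizes and both function names are hypothetical. Strings that do not have exactly four fields are rejected.

```c
#include <stdio.h>
#include <string.h>

/* Parsed form of the request SYNTHTYPE:VOICETYPE:patchname:id.
 * Field sizes are arbitrary limits for this sketch. */
struct voice_request {
    char synth[16];
    char voicetype[16];
    char patch[32];
    int id;
};

/* Parse e.g. "AWE32:VOICE:piano2:1". Returns 0 on success, -1 if the string
 * does not consist of exactly four fields. */
int parse_voice_request(const char *s, struct voice_request *r)
{
    char tail;
    /* %[^:] reads up to the next ':'; widths leave room for the NUL.
     * The trailing %c only succeeds if there is junk after the id. */
    int n = sscanf(s, "%15[^:]:%15[^:]:%31[^:]:%d%c",
                   r->synth, r->voicetype, r->patch, &r->id, &tail);
    return n == 4 ? 0 : -1;
}

/* "*" in the request matches anything; otherwise names must be equal. */
int field_matches(const char *want, const char *have)
{
    return strcmp(want, "*") == 0 || strcmp(want, have) == 0;
}
```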

The voice manager has the task of returning a reference to a voice when given a string. You can also free voices (the voice manager can reference-count a voice that is owned by more than one owner).
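The reference counting can be sketched with a hypothetical fixed table keyed by the request string. Because the string includes the id, two tracks using the same string share one voice. The names are illustrative. Resolving a "*" to a real device is left out: the key is simply the string as given.

```c
#include <string.h>

#define MAX_VOICES 32

/* Hypothetical voice-manager table: one entry per allocated voice, keyed by
 * the request string (which already contains the id). */
struct vm_entry {
    char key[64];
    int refcount;   /* 0 = slot free */
};

struct voice_manager {
    struct vm_entry v[MAX_VOICES];
};

/* Return a reference (slot index) for 'key'. Requests with the same key,
 * i.e. the same patch and id, share one voice, and the reference count
 * records how many holders it has. Returns -1 if the table is full or the
 * key is too long. */
int vm_get(struct voice_manager *vm, const char *key)
{
    int i, free_slot = -1;

    for (i = 0; i < MAX_VOICES; i++) {
        if (vm->v[i].refcount > 0 && strcmp(vm->v[i].key, key) == 0) {
            vm->v[i].refcount++;
            return i;
        }
        if (vm->v[i].refcount == 0 && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0 || strlen(key) >= sizeof vm->v[0].key)
        return -1;
    strcpy(vm->v[free_slot].key, key);
    vm->v[free_slot].refcount = 1;
    return free_slot;
}

/* Drop one reference; the voice is released when the last holder frees it. */
void vm_put(struct voice_manager *vm, int ref)
{
    if (ref >= 0 && ref < MAX_VOICES && vm->v[ref].refcount > 0)
        vm->v[ref].refcount--;
}
```

For example, a snare track and a hi-hat track that both request "GUS:DRUM:jazz:1" share one voice, and it is freed only after both have released it.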

Hmmmm, I am not so happy about all this. But I am not happy with the midi port + channel stuff because in my opinion it sucks. If you want a drum channel you use channel 10 (OR is it 16). The AWE32 driver supports the DRUMS on any channel. This is useful if you want to use a heavy reverb on a snare but leave the rest of the kit dry (impossible if you play the kit using a single channel).

Frank van de Pol's comments:

I'm not sure I understand your plan.

Did I read correctly that the voice allocation is completely done by your app? If so, then it looks more like a MOD player.

Is it possible to use another SYNTHTYPE? Can one also use an externally connected synth, or is it AWE32 only?

 
>  Hmmmm, I am not so happy about all this. But I am not happy with the
> midi port + channel stuff because in my opinion it sucks. If you want a
> drum channel you use channel 10 (OR is it 16). The AWE32 driver supports
> the DRUMS on any channel. This is useful if you want to use a heavy reverb
> on a snare but leave the rest of the kit dry (impossible if you play the
> kit using a single channel).
Having drums on channel 10 (as 'dictated' by General MIDI) is not a must.

The extended standards like GS & XG, and all other non-GM synths & samplers, can have the drums on any channel one wants. The capability of the AWE & GUS to play drums on every channel _is_ compatible with MIDI; it's just a matter of which standard (if any) to support.

Typical scenario achievable with XG synths (and thus also with onboard synths if we make 'em sort of XG compatible):

quite some knobs to twiddle with...

Synchronisation

One of the points the currently available sequencer solution lacks is synchronisation. Synchronisation can be used in a few places:

  1. The normal master clock is the system timer (typically 10 ms).
  2. To get higher resolution, any timer within the system can be used. Most soundcards have a timer onboard that is capable of generating interrupts. This can be used as master clock for the sequencer.
  3. Audio playback can also act as a timesource for synchronisation. By using a counter of the number of samples played for syncing the master clock, it's a good starting point to get MIDI in sync with digital audio.
  4. MIDI clock can also be used as time source for the sequencer. Or one can decide to use MTC. These two protocols can be received and used to adjust the internal clock, or even simpler can be transmitted.
  5. Some cards have a special synchronisation port (SMPTE code, FSK or something similar). This port can also play its role in the synchronisation game.
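Two of these time sources can be sketched with simple arithmetic. With audio as the master, elapsed time is the number of sample frames played divided by the sample rate. With MIDI clock as the source, the tempo follows from the interval between incoming clock bytes (0xF8), since MIDI clock runs at 24 pulses per quarter note. The function names are hypothetical, and a real implementation would average many clock intervals to smooth out jitter.

```c
/* MIDI clock runs at 24 pulses per quarter note (MIDI specification). */
#define MIDI_CLOCKS_PER_QUARTER 24

/* Audio as time source: frames played / sample rate = elapsed time.
 * Returned in microseconds; the 64-bit intermediate avoids overflow
 * for long playback. */
unsigned long long samples_to_usec(unsigned long long frames,
                                   unsigned rate_hz)
{
    return frames * 1000000ULL / rate_hz;
}

/* MIDI clock as time source: given the measured interval between two
 * incoming clock bytes in microseconds, return the tempo in quarter notes
 * (beats) per minute. */
double midi_clock_bpm(double usec_per_clock)
{
    double usec_per_quarter = usec_per_clock * MIDI_CLOCKS_PER_QUARTER;
    return 60.0 * 1000000.0 / usec_per_quarter;
}
```

For example, 441 frames at 44100 Hz correspond to 10 ms, one tick of the system timer mentioned above. Clock bytes arriving every 500000/24 µs correspond to 120 BPM.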

Copyright (c) 1998 by Frank van de Pol, Netherlands Advanced Linux Sound Architecture