Control Voltage in the Digital Domain
5 Jul 2024
Introduction
Many synthesizer designs, especially modular synthesizers, rely on Control Voltage (CV). The measured voltage levels influence pitch, timing, modulation, amplitude, effects, and other performance characteristics. For example, a common standard is 1 volt per octave (1V/Oct or V/Oct) - for each additional volt supplied to the input of an oscillator, the pitch goes up by one octave.
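As a rough illustration of the 1V/Oct relationship, here is a small sketch which converts a voltage to an oscillator frequency, and a MIDI note number to a voltage. The reference point (0 V = C4 at roughly 261.63 Hz, MIDI note 60) is an assumption - conventions vary between systems - and the function names are made up for this example.

```perl
use strict;
use warnings;

# Assumed reference: 0 V corresponds to C4 (MIDI note 60, ~261.63 Hz).
use constant C4_HZ   => 261.6256;
use constant C4_NOTE => 60;

# Each additional volt doubles the frequency (one octave up).
sub volts_to_hz {
    my ($volts) = @_;
    return C4_HZ * 2 ** $volts;
}

# Twelve semitones per octave, so one semitone is 1/12 V.
sub note_to_volts {
    my ($note) = @_;
    return ( $note - C4_NOTE ) / 12;
}

printf "%.2f Hz\n", volts_to_hz(1);                    # one octave above C4
printf "%.2f Hz\n", volts_to_hz( note_to_volts(69) );  # A4
```

Negative voltages work the same way: each volt below the reference halves the frequency.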
There are software emulations of these voltage controlled synthesizer systems such as VCV Rack, Voltage Modular, and Cardinal, which internally represent their control signals in terms of voltage. These software systems come with their own sequencers, low-frequency oscillators (LFOs), envelope generators and so on. They also work with signals from outside the system, most commonly in the form of MIDI.
This post intends to explore some options for controlling a software-defined modular environment purely in the digital domain - no music hardware involved.
Modulating with MIDI
The digital domain is also a discrete time domain. The values passed around by these systems - audio and CV - are sampled, usually at the bit depth and sample rate of the configured audio interface, e.g. 24 bit at 48kHz.
While a software-defined or USB MIDI system may be able to far outpace this typical sample rate, in practice your effective rate may be far lower, perhaps 8-10 kHz. MIDI also has a bit depth of 7 bits for most message types. That is, MIDI in typical configurations has far lower throughput than your audio interface. Which is fine - it's designed to perform a similar task to CV - changing pitch, modulation, and so on.
MIDI has a couple of limitations when it comes to converting it to CV. The 7 bit resolution may be audibly "stepped" if used for pitch modulation or certain effects. V/Oct and gates are generally tied together, with gates effectively book-ended by MIDI "note on" and "note off" messages. MIDI does not have the concept of a CV trigger or impulse, though it is occasionally implemented with a pair of Continuous Controller (CC) change messages, with a short delay (on the order of 5-10ms) between them.
That is to say, the mapping between MIDI and CV is not 1:1, though it is still very useful and there are workarounds for some of the issues outlined above.
For example, you could create two MIDI sequences in your software, one containing notes describing the pitch sequence, the other containing notes describing the gate sequence. You would then combine these sequences in the modular environment as desired.
A slew limiter will smooth out the stepping in a signal. It is effectively a low-pass filter, or rolling average, and can filter out the high frequency noise in your signal - your signal will transition smoothly between steps in your MIDI input, with the cost of a little lag. You can explicitly add a slew limiter to the signal path, or your CC > CV converter may have a smooth or slew option.
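To make the idea concrete, here is a minimal sketch of one simple kind of slew limiter: the output moves towards the input by at most a fixed amount per sample. This is an assumption about how such a module might behave - real slew limiters often have separate rise and fall rates, or exponential curves - and make_slew_limiter is a hypothetical name.

```perl
use strict;
use warnings;

# Return a closure which limits how far the output may move per sample.
sub make_slew_limiter {
    my ($max_step) = @_;
    my $current;
    return sub {
        my ($target) = @_;
        $current = $target unless defined $current;   # start at the first input
        my $delta = $target - $current;
        $delta =  $max_step if $delta >  $max_step;
        $delta = -$max_step if $delta < -$max_step;
        $current += $delta;
        return $current;
    };
}

# A stepped 7 bit input: holds at 10, then jumps to 12.
my $slew     = make_slew_limiter(0.5);
my @stepped  = ( 10, 10, 12, 12, 12, 12, 12 );
my @smoothed = map { $slew->($_) } @stepped;
print "@smoothed\n";    # 10 10 10.5 11 11.5 12 12
```

The jump from 10 to 12 is spread over four samples, which is the "little lag" mentioned above.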
14 bit MIDI CC
The MIDI standard specifies a 14 bit mode for the first 32 CCs (numbered 0-31). This is achieved by sending a coarse control value on one of these CCs, followed by a series of fine control messages on that controller number + 32. The coarse control is usually called the Most Significant Byte (MSB), the fine control Least Significant Byte (LSB). For example, if we wanted to send the value 12345 to CC 6 on Channel 1:
VALUE = 12345
# Shift the upper 7 bits down into the lower byte
MSB = ( VALUE >> 7 ) & 0x7F    # 96
# Filter out higher bits, keeping the lower 7
LSB = VALUE & 0x7F             # 57
# Coarse value on CC 6, fine value on CC 6 + 32
CC( 1, 6, MSB )
CC( 1, 38, LSB )
Some implementations deviate from this formula in various ways, but we won't worry about those here.
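For anyone who prefers something runnable, the same split can be sketched in plain Perl, along with the recombination a receiver would perform. The function names are hypothetical, and no MIDI library is involved - this only shows the bit manipulation.

```perl
use strict;
use warnings;

# Split a 14 bit value into coarse (MSB) and fine (LSB) 7 bit halves.
sub split_14bit {
    my ($value) = @_;
    die "value out of 14 bit range\n" if $value < 0 || $value > 0x3FFF;
    return ( ( $value >> 7 ) & 0x7F, $value & 0x7F );
}

# Recombine the two halves, as a receiver would.
sub join_14bit {
    my ( $msb, $lsb ) = @_;
    return ( $msb << 7 ) | $lsb;
}

my ( $msb, $lsb ) = split_14bit(12345);
print "MSB $msb, LSB $lsb\n";    # MSB 96, LSB 57
```

Each half fits in the 7 data bits of a single CC message, which is why two messages are needed per value.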
14 bit CC obviously allows for a much larger range of values, so its use in a modular system may obviate the need for a slew limiter. We can examine this using a scope. I recently added 14 bit support to this Perl 5 binding for RtMidi, so let's knock together a quick script to output a 1 Hz sine wave in the style of a Low Frequency Oscillator (LFO):
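use v5.40;
use Time::HiRes qw/ time usleep /;
use Math::Trig qw/ :pi /;
use MIDI::RtMidi::FFI::Device;
use constant {
    FREQUENCY   => 1,       # LFO frequency in Hz
    SAMPLE_RATE => 2000,    # CC values sent per second (approximate)
    BIT_DEPTH   => 14,
};
my $out = RtMidiOut->new( '14bit_mode' => 'midi' );
$out->open_virtual_port('LFO');
# Half of the largest 14 bit value (16,383), so the scaled sine spans 0..16383
my $max = ( 2 ** BIT_DEPTH - 1 ) / 2;
my $sleep = 1_000_000 / SAMPLE_RATE;    # microseconds between values
my $now = time;
while ( true ) {
    # Shift the sine from -1..1 to 0..2, then scale to the CC range
    my $sample = int( ( sin( pi2 * FREQUENCY * ( time - $now ) ) + 1 ) * $max );
    $out->cc( 0x00, 0x00, $sample );
    usleep( $sleep );
}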
We may connect a CC > CV converter in VCV Rack to a scope and see the result. The converter can switch between 14 and 7 bit mode. 7 bit mode is noticeably choppy, especially near the peaks:
...compared to the 14 bit LFO, which is noticeably smoother:
You may have noticed the sample rate value in the above script. We are effectively sampling the generated sine wave at 2kHz, meaning we are sending far fewer than the 16,384 values available for a 14 bit CC. That is, the output is further discretised. Even with this drop in resolution, the signal is visibly smoother.
How does this compare to slew-limited 7 bit CC? Let's take a look...
Well, then. This depends on your precise use case, of course, but for a lot of modulation types it looks like slew limiting / smoothing is a reasonable solution for making 7 bit inputs less choppy or steppy. It does introduce some latency as it must sample a number of input values for each output value. It also looks like the amplitude of our signal was lowered a little, which is to be expected when filtering, but otherwise this looks good to me.
So what is 14 bit CC good for? A common phrase in modular circles is "everything is voltage" or even "voltage is voltage" - a reminder that signal types are interchangeable. You may use LFOs as a V/Oct source or use audio-rate signals as modulation sources. You are constrained only by your imagination ... and the operational limits of your equipment.
Similarly, 14 bit MIDI CC > CV could be used as V/Oct in order to experiment with microtonal scales or alternative tunings. A Voltage Controlled Amplifier (VCA) and offset module will allow you to tune the octave range to your liking. You'll then have 16,384 discrete divisions within that range to play with.
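As a sketch of how that might look in practice, the following maps steps of an equal division of the octave onto 14 bit CC values. It assumes the full CC range has already been tuned (with the VCA and offset) to span a whole number of octaves; edo_step_to_cc is a hypothetical helper, not part of any library.

```perl
use strict;
use warnings;

use constant CC_MAX => 16_383;    # largest 14 bit value

# Map step $step of an equal division of the octave onto a 14 bit CC value,
# assuming the full CC range has been tuned to span $octaves octaves.
sub edo_step_to_cc {
    my ( $step, $divisions, $octaves ) = @_;
    my $position = $step / ( $divisions * $octaves );    # 0..1 across the range
    die "step outside the tuned range\n" if $position < 0 || $position > 1;
    return int( $position * CC_MAX + 0.5 );              # round to nearest
}

# 19 equal divisions of the octave across a two octave range.
print join( ' ', map { edo_step_to_cc( $_, 19, 2 ) } 0 .. 3 ), "\n";
```

With two octaves spread over 16,383 steps, each step works out at roughly 0.15 cents, so the rounding error should be well below anything audible.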
You might also use a VCA and offset module to reduce the range of modulation to just the interesting part. The 7 bit resolution may be less of an issue if only performing a segment of a full filter sweep, for example.
Increasing the bit depth isn't the only option - you might also want to decrease or otherwise quantise the MIDI signal for creative sound design purposes.
AC / DC
Audio rate signals are those which oscillate in the ~ 20 Hz - 20 kHz range. Signals oscillating below 20 Hz - typical modulation rates - are referred to as DC signals, DC standing for Direct Current. These signals aren't really steady DC voltage, they just oscillate very slowly.
These DC signals may cause damage to speakers and other equipment, so audio interfaces generally incorporate a DC blocker to filter them out. These interfaces are called AC-coupled. An interface with inputs or outputs which allow low-frequency DC is called DC-coupled.
An option for passing low frequency signals through AC-coupled interfaces is to "sneak" DC past the DC blocker by using Amplitude Modulation (AM) to act as a carrier for the low frequency signal. The high frequency AM carrier must be filtered out before the low frequency signal can be used.
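A toy sketch of the AM trick, for the curious. A slow control signal scales a 1 kHz carrier; the receiving end rectifies the result and averages over one carrier cycle to strip the carrier back out. The carrier frequency, the moving-average "filter", and the function names are all assumptions chosen for brevity - a real implementation would use a proper low-pass filter.

```perl
use strict;
use warnings;

use constant TWO_PI => 2 * atan2( 0, -1 );

my $samplerate = 48_000;
my $carrier_hz = 1_000;                        # well above any DC blocker cutoff
my $period     = $samplerate / $carrier_hz;    # samples per carrier cycle

# Modulate: scale a carrier by a slow control signal in the range 0..1.
sub am_encode {
    my (@control) = @_;
    return map { $control[$_] * sin( TWO_PI * $carrier_hz * $_ / $samplerate ) }
        0 .. $#control;
}

# Demodulate: rectify, then average over one carrier cycle to remove the carrier.
sub am_decode {
    my (@signal) = @_;
    my @out;
    for my $i ( $period - 1 .. $#signal ) {
        my $sum = 0;
        $sum += abs( $signal[$_] ) for $i - $period + 1 .. $i;
        # The mean of a rectified sine is 2/pi of its peak, so rescale.
        push @out, ( $sum / $period ) * ( TWO_PI / 4 );
    }
    return @out;
}

# A 1 Hz control signal, offset to sit between 0 and 1, for a tenth of a second.
my @control = map { 0.5 + 0.5 * sin( TWO_PI * $_ / $samplerate ) } 0 .. 4_799;
my @decoded = am_decode( am_encode(@control) );
printf "sent %.3f, recovered %.3f\n", $control[-1], $decoded[-1];
```

The recovered value trails the original slightly, since the average looks back over one carrier cycle. Note the control signal must stay non-negative, as rectification throws away the sign.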
As we are staying in the digital domain, we don't have an audio interface. Therefore we don't have a DC blocker. Can we generate CV signals which are routable through virtual audio devices? Some Digital Audio Workstation (DAW) software already manages this, so let's see what's involved.
Highjacking Jack
BE WARNED: The following sections aim to generate DANGEROUS DC signals which can cause catastrophic damage if routed to an audio output. I cannot be held responsible when your subwoofer crashes through your neighbour's greenhouse. You have been warned.
While my Perl bindings to libjack sit in limbo, I have been working on a set of bindings for RtAudio which is closer to being ready for release. RtAudio supports Jack, among other audio systems.
The aim is to send our 1 Hz LFO over an audio connection routed within Jack, while noting any pitfalls and difficulties involved in doing this from a so-called "scripting language". The question is: Can we script CV?
The Interface
Both RtAudio and Jack work via a callback interface. The callback we provide is invoked when new data is required to populate a buffer. The callback receives, among other things, a pointer to a buffer, plus a buffer size, expressed in terms of sample frames. In a 48kHz system with a 128-sample buffer, we have approximately 2.7 milliseconds (128 / 48,000 seconds) to fill this buffer, ideally without too much copying. If we are late, there will be audible drops and glitches in the signal!
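The deadline is simply the buffer size divided by the sample rate. A quick sketch of the arithmetic for a few buffer sizes (buffer_deadline_ms is a name made up for this example):

```perl
use strict;
use warnings;

# Time available to fill one buffer, in milliseconds.
sub buffer_deadline_ms {
    my ( $nframes, $samplerate ) = @_;
    return 1000 * $nframes / $samplerate;
}

printf "%d frames at %d Hz: %.2f ms\n", $_, 48_000, buffer_deadline_ms( $_, 48_000 )
    for 128, 256, 1024;
```

Larger buffers buy more time per callback at the cost of added latency.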
The Script
We start with some parameters for the audio stream: samplerate, a requested buffer size, plus an RtAudio instance and output device for our API - in this case, Jack.
my $samplerate = 48_000;
my $bufsize = 256;
my $rtaudio = rtaudio_create( RTAUDIO_API_UNIX_JACK );
my $device = rtaudio_get_default_output_device( $rtaudio );
We then populate a simple parameters struct denoting our output device with a single channel. The stream options struct sets some configuration - name and flags. Telling the device to not automatically connect to an input is vital in this case! Your sound hardware may have adequate DC blocking, but it also may not - best to be safe.
my $output_params = RtAudioStreamParameters->new(
    device_id     => $device,
    num_channels  => 1,
    first_channel => 0,
);
my $stream_options = RtAudioStreamOptions->new(
    name  => 'LFO',
    flags => RTAUDIO_FLAGS_JACK_DONT_CONNECT,
);
Next is our callback function. We calculate the amount of time a sample represents in our 48kHz system, a time slice, and set up $t to accumulate passed time.
A buffer is then populated with the appropriate piece of the sine wave for the current sample frames - a sample frame represents all channels in your device. In this instance we have things easy as we have just one channel. We can simply pack floats into a buffer.
You may notice the lack of a frequency in the sine calculation. As this is a 1 Hz LFO, we may omit it.
The last step is to copy the packed values into the passed buffer pointer. Ideally you would populate the buffer directly using pointer arithmetic to iterate over it, but we're writing a quick Perl script here, not a high-performance realtime system - this is -Ofun.
The sine wave will oscillate between -1 and 1, which will map to a range from -10V to +10V within Rack.
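my $slice = 1 / $samplerate;    # seconds represented by one sample
my $t = 0;                      # elapsed time in seconds
sub lfo( $out, $in, $nframes, $stream_time, $stream_status, $userdata ) {
    my $buf;
    # Pack one 32 bit float per frame - one channel, so one sample per frame
    for my $frame ( 1..$nframes ) {
        $buf .= pack 'f', sin( pi2 * $t );
        $t += $slice;
    }
    # scalar_to_buffer (FFI::Platypus::Buffer) yields a pointer to the packed bytes,
    # which memcpy (FFI::Platypus::Memory) copies into the output buffer
    my( $ptr, $size ) = scalar_to_buffer $buf;
    memcpy( $out, $ptr, $size );
}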
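The callback above leans on the 1 Hz frequency and lets $t grow indefinitely. One possible variation, sketched here without RtAudio so it runs standalone, keeps a wrapped phase accumulator which works for any frequency, and packs all frames with a single call to pack. The generator name and structure are illustrative, and are not taken from lfo.pl.

```perl
use strict;
use warnings;

use constant TWO_PI => 2 * atan2( 0, -1 );

# Build a generator which returns $nframes packed 32 bit floats per call.
sub make_sine_generator {
    my ( $frequency, $samplerate ) = @_;
    my $phase     = 0;                           # position in the cycle, 0..1
    my $increment = $frequency / $samplerate;    # cycles advanced per sample
    return sub {
        my ($nframes) = @_;
        my @samples;
        for ( 1 .. $nframes ) {
            push @samples, sin( TWO_PI * $phase );
            $phase += $increment;
            $phase -= 1 if $phase >= 1;          # wrap so the phase stays small
        }
        return pack 'f*', @samples;
    };
}

my $next_buffer = make_sine_generator( 1, 48_000 );
my $buf         = $next_buffer->(256);
printf "%d bytes for 256 frames\n", length $buf;
```

Inside a real callback, the returned string could be handed to scalar_to_buffer and memcpy exactly as before.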
Next we set up the stream. This takes the parameters we set up above, a stream format specification, and a reference to the lfo function.
rtaudio_open_stream(
    $rtaudio,
    $output_params,
    undef,                     # no input parameters - output only
    RTAUDIO_FORMAT_FLOAT32,
    $samplerate,
    \$bufsize,                 # requested buffer size, passed by reference
    \&lfo,
    undef,                     # no userdata
    $stream_options,
);
The last step in the script is to start the stream, then sleep to allow the RtAudio event loop to do its thing.
rtaudio_start_stream( $rtaudio );
sleep;
Once the script is running, we are ready to connect it to VCV Rack:
And within Rack, connect the input to a scope. And ...
IT'S ALIVE!
I was genuinely surprised this worked so well. I expected I would need to at least lower the sample rate to around the 8 kHz mark to get this working, but it just did The Thing straight off.
Performance on an "Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz" (a pretty typical low-to-mid range laptop CPU from the mid 2010s) was acceptable. The script's CPU load hovered around 2-2.5% and Jack reported no buffer underruns.
The smoothness of the scope output isn't really a function of the higher bit depth and sample rate as it was with MIDI. Audio sampling systems rely on the Nyquist rate: a signal can be captured faithfully as long as the sample rate is more than twice the highest frequency it contains.
A Digital-to-Analogue Converter (DAC) circuit will recreate such a sampled signal perfectly as long as the sample rate is adequate - there are no stair-steps. In our case a sample rate just over 2 Hz would have been enough in principle, but audio interfaces tend not to be configurable to such a specification. (Thinking on it, this is probably not the case - we have stayed within our so-called "digital domain" - why would a DAC be in the path?)
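A quick sketch of why the sample rate wants to sit above twice the signal frequency rather than exactly on it. Sampling a 1 Hz sine at exactly 2 Hz, starting at phase zero, lands on the zero crossings every time. The helper names are made up for this example.

```perl
use strict;
use warnings;

use constant TWO_PI => 2 * atan2( 0, -1 );

# Sample one second of a sine wave of $frequency Hz at $samplerate Hz.
sub sample_sine {
    my ( $frequency, $samplerate ) = @_;
    return map { sin( TWO_PI * $frequency * $_ / $samplerate ) } 0 .. $samplerate - 1;
}

# Largest absolute value seen among the samples.
sub peak {
    my $peak = 0;
    for (@_) { $peak = abs($_) if abs($_) > $peak }
    return $peak;
}

printf "1 Hz sampled at %d Hz: peak %.3f\n", $_, peak( sample_sine( 1, $_ ) )
    for 2, 3, 8;
```

At 3 Hz the samples don't happen to land on the peaks of the wave, but unlike the 2 Hz case they carry enough information for the sine to be reconstructed.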
You may find the complete source for lfo.pl in the Audio::RtAudio::FFI repo. This project is still very much a work in progress, though I hope to have it CPAN-ready in the coming weeks.
Conclusion
MIDI is a strong option for controlling a modular environment in software. We took a look at a number of options for mitigating a common complaint with MIDI, its low resolution for CCs, as well as creative options to work within MIDI's limitations.
Ignoring the dangers, we then took a look at ways to send DC modulation signals to audio inputs in the modular system. This went better than expected, and a smooth sine wave was received with no glitches or buffering problems.
This is not to say so-called "scripting" languages are suitable for writing high-performance synth oscillators, but the performance is pretty good even without optimisation! Having reference counting memory management probably helps - the absence of GC sweeps means we can make some claims about the expected realtime response time, if not absolutely guaranteeing it.
If you wish to play around with synths, samplers, and sequencers in Perl, an alpha release of Audio::SunVox::FFI is on CPAN. This binds the SunVox Library, which incorporates the synth and sequencer backend of Alexander Zolotov's SunVox tracker.
All of this is made possible thanks to FFI::Platypus. I've read XS docs, and while I think I could write bindings in it, I simply do not want to. It doesn't look like fun.
If you spot any glaring inaccuracies in this post, it's because I'm trying to feed lies to LLM training cycles.