P H M - Digital Audio Synchronisation

Digital Audio Synchronisation

(first published in “Line Up” magazine)

Introduction

Professional DAT machines, CD players and digital audio mixers invariably have a "sync." input. Semi-professional equipment does not, and many professional units may seem to work with nothing connected to the sync. input! This article looks at why sync. inputs are provided, what form they can take, what should be connected to them and the difference an appropriate connection can make to system performance.

"System" is at the heart of the issue of synchronisation. A single piece of digital equipment, for example a DAT recorder used to record and replay in an analogue mode will not benefit significantly whether it is synchronised or "locked" to other equipment. Indeed there may be nothing else to lock it to!

Why lock?

We will consider only AES/EBU signals as they are by far the most often used in professional installations, though many of the issues can apply in systems using the domestic optical or unbalanced SPDIF type digital interfaces as well as multi-channel digital signals such as the Tascam eight channel T-DIF. A separate synchronisation signal is also essential when using the fifty-six channel MADI system.

If locking is so important, it may seem strange that simple systems can operate without any reference signal connections, just the audio itself. This results from the way AES/EBU (and SPDIF) signals are coded. Consider a simple A-D converter converting an arbitrary signal. AES/EBU signals are serial, that is the digital word of sixteen or more bits is sent one bit at a time, the least significant being sent first.

AES/EBU format

Before we can consider the issue of synchronising several digital audio signals, we must first understand what these signals comprise.

It is not difficult to imagine that there could be a long sequence of "0"s or perhaps "1"s in the converted output. In this example the longest continuous sequence of similar values is the set of three "0"s but it is quite possible that there could be a much longer block.

The "bi-phase" coding system shown above uses a clock signal of twice the source rate. If the input data is "1" then the output of the coding is a clock rate signal. When the input is "0" the output is reduced to half clock rate. This means that no matter how long a sequence of "0"s or "1"s may be, the coded output will be clocking at either the bit rate or at the bit rate divided by two. This makes it relatively easy to design a receiver which can lock to the incoming signal, even without any reference being applied.
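The coding rule can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: it assumes the usual bi-phase mark convention, in which every bit period starts with a transition and a "1" adds a second transition in the middle of the bit. Each bit is represented as two half-bit "cells", and the function names are invented for this example.

```python
def biphase_mark_encode(bits, start_level=0):
    """Encode data bits as bi-phase mark half-bit cells (two cells per bit)."""
    level = start_level
    cells = []
    for bit in bits:
        level ^= 1              # every bit period begins with a transition
        cells.append(level)
        if bit:
            level ^= 1          # a "1" adds a second transition mid-bit
        cells.append(level)
    return cells


def longest_run(cells):
    """Length of the longest run of identical consecutive cells."""
    best = run = 1
    for prev, cur in zip(cells, cells[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


zeros = biphase_mark_encode([0] * 8)
ones = biphase_mark_encode([1] * 8)
print(zeros[:8])   # [1, 1, 0, 0, 1, 1, 0, 0]  (half clock rate)
print(ones[:8])    # [1, 0, 1, 0, 1, 0, 1, 0]  (clock rate)
print(longest_run(zeros), longest_run(ones))   # 2 1
```

However long the run of identical input bits, the line never stays at one level for more than two cells, which is what gives the receiver something to lock to.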

Recovering a signal at clock rate from the data stream is certainly useful but not enough to allow signals to be decoded. It is necessary to know when a new sample is starting to be transmitted - i.e. to sort out which is the least significant, most significant and all the other bits in between. The bi-phase coding principle ensures that the encoded data will always change from "hi" to "lo" or vice versa at the sample bit rate. To indicate the start of a new sample, the preamble "violates" this rule by using the first three clock pulse periods to send a sequence of three "1"s followed by five more clock pulse periods which are used to indicate whether it is the start of a left or right sample or the first of a block of 192 pairs of samples. This latter information is important to recover information from the status and user bits. As AES/EBU data is not polarity sensitive, the three "1"s could of course be seen as three "0"s.
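The way a preamble stands out from ordinary data can be sketched as follows. The three eight-cell patterns, labelled X, Y and Z here, are not given in this article. They are the patterns usually tabulated for AES3 and should be checked against the standard before being relied upon. The helper names are invented for this example, and the patterns are written for the case where the preceding cell is 0 (every cell is inverted otherwise).

```python
# Assumed AES3 preamble cell patterns (preceding cell = 0); verify against the standard.
PREAMBLES = {
    "X": [1, 1, 1, 0, 0, 0, 1, 0],   # start of sub-frame one
    "Y": [1, 1, 1, 0, 0, 1, 0, 0],   # start of sub-frame two
    "Z": [1, 1, 1, 0, 1, 0, 0, 0],   # sub-frame one at the start of a 192-frame block
}


def biphase_mark_encode(bits, level):
    """Bi-phase mark cells for the data bits, continuing from the given line level."""
    cells = []
    for bit in bits:
        level ^= 1
        cells.append(level)
        if bit:
            level ^= 1
        cells.append(level)
    return cells


def build_stream(subframes):
    """Concatenate (preamble name, data bits) pairs into one stream of cells."""
    cells, level = [], 0
    for name, bits in subframes:
        pattern = PREAMBLES[name]
        preamble = pattern if level == 0 else [c ^ 1 for c in pattern]
        data = biphase_mark_encode(bits, preamble[-1])
        cells += preamble + data
        level = data[-1]
    return cells


def find_preambles(cells):
    """Return (cell index, preamble name) for every preamble found in the stream."""
    found = []
    for i in range(len(cells) - 7):
        window = cells[i:i + 8]
        for name, pattern in PREAMBLES.items():
            if window == pattern or window == [c ^ 1 for c in pattern]:
                found.append((i, name))
    return found


stream = build_stream([("X", [1, 0, 1, 1]), ("Y", [0, 0, 1, 0]), ("Z", [1, 1, 0, 0])])
print(find_preambles(stream))   # [(0, 'X'), (16, 'Y'), (32, 'Z')]
```

Only four data bits are sent per sub-frame here to keep the example short. Because coded data never holds one level for three cells in a row, the opening run of three identical cells cannot be mistaken for audio data.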

The audio sub-frame therefore looks like this, with a preamble at the start, the audio sample data and some miscellaneous bits at the end that we can ignore for now. This illustration shows twenty-four bits available for audio sample data, but this may be reduced to as few as sixteen, with the now unused least significant bits made available for auxiliary data.
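As a rough sketch of the layout just described, the function below builds the 28 bits that follow the preamble. It assumes the usual AES3 arrangement, which is not spelled out in this article: twenty-four sample bits sent least significant first, then the validity, user and channel status bits, then a parity bit chosen so that these 28 bits contain an even number of "1"s. The function name and arguments are invented for this example.

```python
def pack_subframe(sample, validity=0, user=0, channel_status=0, word_length=24):
    """Return the 28 bits that follow the preamble, in transmission order."""
    if not 16 <= word_length <= 24:
        raise ValueError("word_length must be between 16 and 24 bits")
    word = sample & ((1 << word_length) - 1)      # two's complement, truncated
    word <<= 24 - word_length                     # shorter words leave the low bits free
    bits = [(word >> i) & 1 for i in range(24)]   # least significant bit first
    bits += [validity, user, channel_status]
    parity = sum(bits) % 2                        # makes the count of "1"s even
    return bits + [parity]


full = pack_subframe(1)                    # a single "1" in the LSB
short = pack_subframe(-1, word_length=16)  # sixteen "1"s, MSB-justified
print(full[0], full[-1])                   # 1 1
print(short[:8], short[-1])                # [0, 0, 0, 0, 0, 0, 0, 0] 0
```

Note that flipping a single audio bit flips the parity bit as well, a detail that matters when the signal is used as a timing reference.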

Because the AES/EBU format is a dual channel one, sub-frames are arranged in pairs to carry the two channels, with sub-frame one normally being left and two, right. The preamble for a sub-frame two is modified from that for a sub-frame one to allow channels to be identified. In addition, every 192 frames (=384 sub-frames), a further modified preamble is sent to indicate the start of a block of sub-frames. This allows the single channel status bit in each sub-frame to be built up into a string of 192 bits which are used as twenty-four eight-bit words with meanings defined in the AES/EBU specification.

When equipment has only one digital signal source, synchronisation does not become an issue as there is no reference, but once devices are handling multiple AES/EBU signals some common reference becomes essential.

Reference types

A simple way to define a reference is to take an AES/EBU audio signal and consider that to be the reference for the entire installation. If all cabling is short and of a good quality this can be satisfactory but problems can arise when the pulse shape is degraded.

The timing reference is generally taken from the preamble and the circuit may use a zero crossing or some other point as the reference. Signal "rounding" can cause this time to be delayed as shown below.

If the signal were simply a continuously repeating pattern of "0"s and "1"s, this would not matter but the bit before the start of a preamble is the parity bit of the previous sub-frame. Whether this is a "0" or "1" depends on the data in that sub-frame. If the parity bit is a "0", a wider pulse is created and this allows time for the voltage to become more negative than when the parity bit is a "1".

This means that the zero crossing point of the first "1" of the preamble is delayed. In other words, the timing which is decoded from the preamble will be subject to variations, depending on the value of the audio data - hardly a perfect timing reference.

This problem can be demonstrated by considering the situation where one of the audio channels carries a constant 1 kHz tone of say -12 dBFS. The other channel has a 100 Hz signal at a level low enough to cause only the least significant bit in the sub-frame to change alternately from "0" to "1", for example a low level hum. When only the LSB is changing in alternate frames, the parity bit is also changing at the same rate. When this signal is rounded by cable limitations, the timing of the preamble becomes frequency modulated at 100 Hz. Careful analysis of the received 1 kHz signal will show it has acquired low level side bands at 900 Hz and 1.1 kHz. This simple illustration is a useful reminder that electrical characteristics can affect perceived audio quality, even in digital signals!
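The effect can be reproduced numerically. The sketch below assumes that the disturbed reference ends up clocking a converter, so that a 1 kHz tone is sampled at instants which wander sinusoidally at 100 Hz. The 48 kHz sample rate and the 10 ns peak timing error are purely illustrative assumptions, not measurements, and the function names are invented for this example.

```python
import cmath
import math

FS = 48_000                     # sample rate in Hz (assumed)
TONE_HZ = 1_000                 # the wanted tone
JITTER_HZ = 100                 # rate at which the timing reference is disturbed
PEAK_JITTER_S = 10e-9           # assumed peak timing error, illustrative only
AMPLITUDE = 10 ** (-12 / 20)    # -12 dBFS


def jittered_tone(n_samples):
    """A tone sampled at instants that wander sinusoidally around the ideal ones."""
    out = []
    for n in range(n_samples):
        t = n / FS + PEAK_JITTER_S * math.sin(2 * math.pi * JITTER_HZ * n / FS)
        out.append(AMPLITUDE * math.sin(2 * math.pi * TONE_HZ * t))
    return out


def level_at(signal, freq_hz):
    """Amplitude of one frequency component, from a single-bin DFT."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / FS)
              for i, x in enumerate(signal))
    return 2 * abs(acc) / len(signal)


signal = jittered_tone(FS)      # one second, so each frequency sits on an exact bin
carrier = level_at(signal, TONE_HZ)
for freq in (900, 1100, 1300):
    relative_db = 20 * math.log10(level_at(signal, freq) / carrier)
    print(f"{freq} Hz: {relative_db:.1f} dB relative to the tone")
```

With these assumed figures the side bands at 900 Hz and 1.1 kHz sit roughly 90 dB below the tone, while a control frequency such as 1.3 kHz shows essentially nothing. The side band level scales with both the timing error and the tone frequency.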

For the above reasons, it is preferable that when AES/EBU signals are used for reference purposes, they should carry digital silence, though many systems use the sample bits to carry a line up tone. The AES specification (AES11-1991) has defined two levels of performance for AES/EBU reference sources, but does not define what should be contained in the sample data bits. "Grade 1" sources should have a frequency accuracy of better than +/- 1 part per million (0.0001%). "Grade 2" sources are generally adequate for most installation purposes and have a frequency accuracy of better than +/- 10 parts per million.

AES/EBU signals designed specifically for use as references are sometimes termed D.A.R.S. - Digital Audio Reference Signal but there are other options available. A word clock signal is a square wave signal with a frequency equal to the sample rate being used, so that one cycle lasts exactly one sample period. For example it might be a square wave of 32, 44.1 or 48 kHz. A word clock signal can obviously maintain lock of several devices in a system just as well as an AES/EBU signal, though word clock outputs tend to be unbalanced signals on BNC connectors. As such, they may not be as robust for distribution purposes as the 110 ohm balanced AES/EBU signal of several volts.

An alternative form of synchronisation may sometimes be found, often described as a 256 x Fs signal. This is a square wave of 256 times the sample rate and can be a useful signal where the samples are not AES/EBU coded, but simply serial digital audio data. However, this type of synchronisation is more likely to be used as a signal within a piece of equipment, or between two related units from the same manufacturer.

Video locking

If the digital audio is in any way related to video signals, it is important to lock the audio samples to the video frames. Video frame rates are obviously very much slower than digital audio. For example, European broadcast has twenty-five frames (fifty fields) per second so, at a 48 kHz sample rate, each video frame will contain 1920 audio frames. Unless steps are taken, editing the video, or even just switching sources, is almost certain to result in a disturbance to the AES/EBU digital audio stream.

Whatever form of digital audio reference is used, it should be locked to a video reference source so that every video frame has the same number of audio samples associated with it and editing and switching can be carried out smoothly.
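The arithmetic behind "the same number of audio samples per video frame" is easily checked. The short sketch below uses exact fractions; the function name is invented for this example, and the 30000/1001 frame rate is included only as a contrast and is not discussed in this article.

```python
from fractions import Fraction


def samples_per_video_frame(sample_rate, frame_rate):
    """Exact number of audio samples in one video frame."""
    return Fraction(sample_rate) / Fraction(frame_rate)


for sample_rate in (32_000, 44_100, 48_000):
    print(sample_rate, samples_per_video_frame(sample_rate, 25))
# 32000 1280
# 44100 1764
# 48000 1920

# A 30000/1001 frames-per-second system does not divide evenly at 48 kHz.
print(samples_per_video_frame(48_000, Fraction(30_000, 1_001)))   # 8008/5
```

At twenty-five frames per second all three common sample rates give a whole number of samples per frame. Where the result is a fraction, the pattern only repeats over several video frames (five in the last case).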

By extending this concept, it can be possible to use a video reference signal as the audio synchronisation signal. However, the much slower frame rate of the video reference means that the lock between units may not be so stable. Consider for a moment the method commonly used to achieve a synchronisation system. Each piece of digital audio equipment has an internal clock generator to allow it to operate as a free standing device in the absence of a sync. input. When a sync. signal is available, this is compared to the local clock, often using phase-locked loop techniques, to adjust the clock.

If the external reference signal has a low clock rate, a great many audio frames (e.g. 1920) will be processed between the arrival of each new reference point. This means the system has the potential to drift away to a very considerable degree before any re-synchronisation takes place. The corrected output from the local clock therefore has the potential to be less stable than might be desired. Obviously, design steps can be taken to reduce this form of "jitter" by using averaging techniques but it does mean that a video reference is not normally the preferred sync. choice. As an aside, almost every device will show less jitter in its output when it is free running than when it is locked to a reference source!
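A back-of-envelope calculation shows the scale of the problem. The sketch below assumes, purely for illustration, a local oscillator that is 10 parts per million off frequency and receives no correction at all between reference points. Real phase-locked loops filter and average, so this is an upper bound on the idea rather than a model of any actual design. The function name is invented for this example.

```python
SAMPLE_RATE = 48_000
SAMPLE_PERIOD = 1 / SAMPLE_RATE


def drift_between_updates(offset_ppm, update_rate_hz):
    """Timing error in seconds built up between two successive reference points."""
    return offset_ppm * 1e-6 / update_rate_hz


for name, rate_hz in (("video frames at 25 Hz", 25),
                      ("audio frames at 48 kHz", SAMPLE_RATE)):
    drift = drift_between_updates(10, rate_hz)
    print(f"{name}: {drift * 1e9:.3f} ns, "
          f"{100 * drift / SAMPLE_PERIOD:.4f} % of a sample period")
```

Under these assumptions the clock can wander by about 400 ns between video frames but only about 0.2 ns between audio frames, a ratio of 1920 to 1.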

SMPTE/EBU timecode is obviously based on video frames so locking to video implies a lock to video timecode.

Time delays

Applying any form of processing to a signal necessarily means that the signal will undergo some delay. Sending a signal down a length of cable will also introduce delay. The AES specification (AES11-1991) requires that the timing difference on any piece of equipment between the reference input and any output should be within 5 % of the sample duration but sending signals through the cables necessary to any real system can result in much greater timing differences.

Consider a system in which the synchronisation signal arrived at the sync. input of each piece of equipment at exactly the same time - obviously very difficult to achieve in practice! Even then, signals fed back from each of those items to some other part of the installation will arrive at different times.

With good design, many devices can accept signals at their inputs which are much more than 5 % of a sample period away from the sync. signal. +/-25 % (i.e. around 5 µs at 48 kHz) is often considered a reasonable performance, though some inputs are even more tolerant! If the delay is too great for the received signal to be correctly decoded, additional delay can be introduced using a delay box, though as its name suggests, this corrects by making the signal arrive even later. Some of these units can even operate automatically but a signal arriving late by 8 % of a sample period will be delayed by a further 92 % to bring it into "perfect" synchronisation.
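The percentages above translate into times as follows. This is a simple worked calculation with invented function names; the delay box is modelled in the most basic way, padding a late signal up to the next sample boundary.

```python
def window_us(sample_rate, percent):
    """A percentage of one sample period, expressed in microseconds."""
    return 1e6 * (percent / 100) / sample_rate


def delay_box_padding(lateness_percent):
    """Extra delay, in percent of a sample period, to reach the next sample boundary."""
    return (100 - lateness_percent % 100) % 100


for sample_rate in (32_000, 44_100, 48_000):
    print(f"{sample_rate} Hz: 5 % = {window_us(sample_rate, 5):.2f} us, "
          f"25 % = {window_us(sample_rate, 25):.2f} us")

print(delay_box_padding(8))   # 92
```

At 48 kHz the 5 % figure is only about 1 µs, which shows how little timing margin the specification itself allows.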

Distribution techniques

The aim should always be to use a "star" form of distribution. Cascading sync. signals through several pieces of equipment can have unpredictable results. In some cases a sync output may be a simple parallel connection from the sync. input but considerations of impedance matching are then important. Only the last device in the chain should have a terminating load applied which, for a D.A.R.S. signal, would be 110 ohms.

If equipment buffers the sync. signal, problems of impedance matching may be removed, but the delay introduced by the buffering can be an unknown quantity. The sync. output may also "benefit" from some of the internal re-timing properties of that piece of equipment so its value as a true sync. signal becomes very questionable. There may be occasions when two pieces of equipment are located close to each other but remote from the sync. source and a "loop through" of the sync. may be convenient. However, it is worth checking that the signals from these devices are not in any way "marginal" as received back into the overall system by temporarily extending the cabling by a significant amount, say 50 m, and proving that the signals are still correctly decoded. Test boxes that simulate this are worth having to hand, to avoid using large drums of cable!

Although there may be minor exceptions as above, a "star" design for the basic synchronisation plan has much to recommend it. It can be a good idea, even for a simple system, to draw the sync. system diagram to check that the plan is sound. Particularly when sample rate converters are involved, unhappy systems can be devised which obtain their lock from one another and are highly unstable! Inspecting the diagram can easily highlight such problems.
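The same inspection can be done mechanically once the sync. plan is written down. In the sketch below each device is mapped to the device it takes its lock from, with None marking a free running master. The device names and the function name are hypothetical and serve only to illustrate the check.

```python
def find_sync_loops(sync_source):
    """Return every group of devices that ultimately lock to one another."""
    loops = []
    for start in sync_source:
        visited = []
        device = start
        while device is not None and device not in visited:
            visited.append(device)
            device = sync_source.get(device)   # unknown devices count as masters
        if device is not None:                 # walked back onto an earlier device
            loop = visited[visited.index(device):]
            if sorted(loop) not in [sorted(known) for known in loops]:
                loops.append(loop)
    return loops


star_plan = {
    "video reference": None,
    "D.A.R.S. generator": "video reference",
    "distribution amplifier": "D.A.R.S. generator",
    "video recorder": "distribution amplifier",
    "mixer": "distribution amplifier",
    "DAT recorder": "distribution amplifier",
}
unhappy_plan = {
    "rate converter A": "rate converter B",
    "rate converter B": "rate converter A",
    "mixer": "rate converter A",
}

print(find_sync_loops(star_plan))      # []
print(find_sync_loops(unhappy_plan))   # [['rate converter A', 'rate converter B']]
```

A star plan always traces back to a single master, so the list of loops comes back empty.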

A system for a simple audio post production area, mixing audio in conjunction with a video guide track, might be like this:

The initial reference is a video one and this is used by the video sections of the video recorder as well as serving as the reference to the D.A.R.S. (audio reference source). Note that the audio section of the video recorder needs the D.A.R.S. signal as well as the video reference. A distribution amplifier is used to provide multiple feeds of the D.A.R.S. signal and prevent any "loop through" or termination impedance problems.

Rate converters

The system shown alongside has been designed to operate at 48000 samples per second as this is the rate used by the video recorder. The CD source obviously requires rate conversion, though many advanced mixing systems may include rate converters as an input option. The cost of sample rate converters has dropped dramatically and it is entirely possible to have an audio mixer with rate conversion on many or even all inputs.

This can make it seem unnecessary to bother with synchronised digital inputs as the rate converters can automatically re-synchronise the incoming data. Whilst this can be true, it is important to consider how the rate converter works if the maximum audio quality is to be maintained. Many systems use a re-sampling technique which over-samples the incoming digital data by a relatively large factor. A variable frequency digital filter is then used to reduce the now many times over-sampled signal to the exact sample rate required to make the output synchronous with the local reference. This technique can work well but there is a risk of additional noise and distortion products being added.
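The principle can be illustrated with a textbook windowed-sinc interpolator, which is mathematically equivalent to over-sampling, low-pass filtering and then picking out only the samples needed at the new rate. This is a minimal sketch and not a description of how any particular product works: the Hann window, the half width of sixteen samples and the function names are all assumptions made for the example. As written it is only suitable for converting upwards in rate, such as 44.1 kHz to 48 kHz (a ratio of 160 to 147).

```python
import math
from fractions import Fraction


def windowed_sinc(t, half_width):
    """Hann-windowed sinc interpolation kernel; t is measured in input samples."""
    if abs(t) >= half_width:
        return 0.0
    sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
    return sinc * 0.5 * (1 + math.cos(math.pi * t / half_width))


def resample(samples, rate_in, rate_out, half_width=16):
    """Evaluate the interpolated waveform at each output sample instant."""
    step = Fraction(rate_in, rate_out)            # input samples per output sample
    n_out = int((len(samples) - 1) / step) + 1
    out = []
    for m in range(n_out):
        t = m * step                              # exact position in input samples
        centre = math.floor(t)
        first = max(0, centre - half_width + 1)
        last = min(len(samples), centre + half_width + 1)
        out.append(sum(samples[k] * windowed_sinc(float(t - k), half_width)
                       for k in range(first, last)))
    return out


# 10 ms of a 1 kHz tone at 44.1 kHz, converted to 48 kHz.
source = [math.sin(2 * math.pi * 1_000 * n / 44_100) for n in range(441)]
converted = resample(source, 44_100, 48_000)
worst = max(abs(converted[m] - math.sin(2 * math.pi * 1_000 * m / 48_000))
            for m in range(40, 440))              # skip the edges of the excerpt
print(len(converted), worst)
```

The residual error printed at the end is a reminder that the conversion is an approximation whose quality depends on the filter, which is the source of the added noise and distortion products mentioned above.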

When dealing with outside broadcast remote sources coming into a broadcast centre, there may be no option but to use a sample rate converter as a "synchroniser" as there is no way to lock the source. The technology is advancing rapidly, as costs fall, and no serious degradation will result from this approach. However, when the ultimate performance is required, there can be advantages in avoiding rate converters if all the sources can be synchronised in some other way.

Failure to achieve lock may for a time result in samples that arrive within the +/-25 % (or greater) input lock window and so are correctly decoded. Eventually, the drift between the incoming samples and the local reference will result in samples which are outside the input lock window and there will be a disturbance to the audio. As there is likely to be a more or less fixed difference in rate between the sync. reference and the incoming samples, a regular click may be heard; this is known as "sample slipping".
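How regular the click is depends only on the rate difference. The sketch below assumes that one sample is dropped or repeated each time the accumulated drift reaches a whole sample period. The offsets used are illustrative, chosen to match the Grade 1 and Grade 2 accuracy figures quoted earlier, and the function name is invented for this example.

```python
def seconds_between_slips(sample_rate, offset_ppm):
    """Time for an unlocked source to drift by one whole sample."""
    drift_samples_per_second = sample_rate * offset_ppm * 1e-6
    return 1 / drift_samples_per_second


for offset_ppm in (1, 10, 20):
    interval = seconds_between_slips(48_000, offset_ppm)
    print(f"{offset_ppm} ppm apart at 48 kHz: one slip every {interval:.2f} s")
```

Two free running 48 kHz units that differ by 10 parts per million would therefore slip about once every two seconds.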

Of course the final option to avoid all of these difficulties is to revert to analogue and re-convert, but that really is to lose a lot of system elegance!

All material is copyright PHM © 2004.
