Tuesday, February 25, 2025

Simulating Polar Modulation

Note - this page is work-in-progress so expect changes & additions.

We now have some mathematical and practical insight into polar modulation. We also have some hints regarding its bandwidth requirements: under certain circumstances both the amplitude and frequency outputs have a high harmonic content. But a practical circuit implementation will impose some bandwidth constraints. For instance, if we use the si5351 for frequency generation, its output frequency can only be updated at about 12ksps - imposing an upper bandwidth on the frequency output of about 6kHz. We therefore want to understand the consequences of these bandwidth restrictions.

My maths skills are insufficient for me to carry out a theoretical analysis, and I'm not even sure whether it's possible. For instance, I was surprised to find that even the analysis of a nontrivial FM modulation quickly becomes intractable - see here

I have therefore sought answers by simulating the algorithm. I've written a C program that internally generates test signals, simulates polar modulation and writes the individual samples into an array. It then displays the Fourier-transform results of this time sequence of values.

The simulation parameters I've chosen are:

  • The simulation should test bandwidths of the amplitude and frequency outputs up to 192ksps.
  • The carrier frequency should be at least 5x this frequency, or about 1MHz.
  • The simulation sampling should be at least 2x this frequency, but make it 4x for safety: ie. 4MHz.
With 256K samples my PC performs a Fourier transform and all other simulation calculations in well under a second. This quick evaluation time nicely allows the spectrum results to be refreshed and viewed whilst interactively changing input parameters. The frequency-width of each of the Fourier transform's output array entries = sampling frequency / number of samples = 4MHz / 262144 ≈ 15.3Hz, which is more than enough resolution.

The goal with the simulation code is to remove all possible sources of uncertainty or error so that the results reflect issues with the core modulation technique only. The Hilbert Transform is nontrivial and potentially generates slight amplitude and phase errors in its output. For this reason the Hilbert transform has been removed, and replaced by oscillators that directly generate quadrature phase signals. Also, the atan2() and power calculation approximations in Guido's original code are replaced by high accuracy calculations.

The core simulation pseudocode is:

SAMPLES = 1<<18
Fs = 4e6
dt = 1.0/Fs
fc = 1e6                           // carrier frequency
float output[SAMPLES]
float outputAmplitude = 0
float freqDelta = 0
float prev_phase = 0
float carrierPhase = 0

float f1 = 1600    // set the modulation signal values
float a1 = 1.0
float f2 = 300
float a2 = 1.0

int AmpSubsampling   = Fs/12000    // for 12kHz amplitude updates
int PhaseSubsampling = Fs/12000    // for 12kHz phase updates

for t = 0..SAMPLES-1
    // calculate the modulating signal, directly in quadrature form
    t1 = 2*PI*f1*dt*t
    t2 = 2*PI*f2*dt*t
    I = a1*cos(t1) + a2*cos(t2)
    Q = a1*sin(t1) + a2*sin(t2)

    if (t mod AmpSubsampling == 0)
        outputAmplitude = sqrt(I*I + Q*Q)

    if (t mod PhaseSubsampling == 0)
        float phase = atan2(Q, I)
        float dp = phase - prev_phase
        prev_phase = phase

        if (dp < 0)   dp += 2*PI   // phase unwrap
        if (dp >= PI) dp -= 2*PI   // reverse phase unwrap
        freqDelta = dp * Fs/PhaseSubsampling / (2.0*PI)

    carrierPhase += 2.0*PI*(fc + freqDelta)*dt
    output[t] = outputAmplitude * cos(carrierPhase)

ApplyWindowingFunction(output)
FFTresult[] = FFT_Transform(output)
power[] = convertResultToPowerIndB(FFTresult)
DisplayAsChart(power)
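
The windowing function isn't specified above; here is a minimal C sketch assuming a Hann window (my choice of window, not necessarily the one actually used):

#include <math.h>

#define SAMPLES (1 << 18)

// Hann window: tapers both ends of the capture to reduce spectral
// leakage in the FFT. Any standard window would serve here.
void ApplyWindowingFunction(float output[])
{
    const float PI = 3.14159265f;
    for (int t = 0; t < SAMPLES; t++)
        output[t] *= 0.5f * (1.0f - cosf(2.0f * PI * t / (SAMPLES - 1)));
}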

For the Fourier Transform calculation I use this library - it's open-source, quite fast, the code is easily read, and I was already familiar with it, having used it on other projects. Other libraries might be faster, but raw speed is not critical here.

To simulate various conditions, such as amplitude and phase errors that might be introduced by the Hilbert transform, this core simulation code is supplemented with additional calculations.

When I first wrote this code, I spent some time experimenting with it to give myself confidence that it was both doing what was expected and that I was able to correctly interpret the results. Some experiments were:

  • Temporarily replaced the modulated carrier with a pure sinusoid and verified that the result appeared in the predicted FFT "bin".
  • Temporarily generated two carriers close together and verified that two peaks were present in the FFT result and at the predicted location.
  • Temporarily AM modulated the carrier with a low-frequency sinusoid and confirmed the FFT result was a carrier with two side lobes.
  • Reduced and increased the amplitude of the carrier by a factor of 10 and confirmed that the power output changed by 20dB.

Results with 192kHz Sample Rate

Finally, with the sample rate set to 192kHz, I checked the spectrum with two equal amplitude sinusoids, one at 3.5 kHz and the other ranging in frequency from just below 3.5kHz down to just 40 Hz. In all cases the spectrum is virtually perfect:

The horizontal lines are 10dB apart, and the total span is about 12kHz. This is good evidence that the algorithm is working as expected.

Results with 12kHz Sample Rate - slight amplitude delay

The spectrum results are sensitive to the relative delay between the frequency and amplitude outputs. The best results are when the amplitude signal is delayed by approx. 190 master clock ticks (4MHz rate). This corresponds to a delay of a little over half the 12kHz sample clock.

With the sample rate set to 12kHz, and one input set to 3.5kHz, and the other just 130Hz lower, some spurious signals are present at 45dB down:

When the difference frequency increases to 500Hz, many more spurious signals appear:



At a difference frequency of 1000Hz, the spurious signals have continued to grow but the nearby peaks remain about 40dB down:


With a difference frequency of 1500Hz the spurious signal peaks are 30dB down:



When the difference frequency reaches 2000Hz, the spurious signal peaks are 25dB down:



With a difference frequency of 2500Hz, the spurious signal peaks are <25dB down:



Finally, with a difference frequency of 3000Hz, the spurious signal peaks are 20dB down:


A similar pattern emerges with different sample rates - with a higher sample rate and the same difference frequency, the spurious products are lower in power (and vice versa).

Results with 12kHz Frequency Sample Rate, Increased Amplitude Sample Rate

...tbd...

Sensitivity to Amplitude Variation

In the following charts, the frequencies are kept constant at the standard IMD test tones of 700Hz and 1900Hz, and the 700Hz amplitude is varied:

Equal amplitude:


700Hz at 0.9 amplitude:


700Hz at 0.8 amplitude:



700Hz at 0.5 amplitude:


700Hz at 0.1 amplitude:




To Follow (maybe):

  • Source of errors arising from lower sampling rate
  • Alternative phase unwrapping code?
  • Hardware configuration and results of tests on the hardware
  • Indications of the algorithm's sensitivity to 
    • Hilbert transform phase & amplitude errors
    • Delays between the amplitude & frequency outputs


Tuesday, February 18, 2025

Guido's SSB Polar Modulation - Analysis and Application


This blog post is "work in progress" and may be updated from time to time. 

The uSDX is a microprocessor-based Software Defined Transceiver developed by Guido PE1NNZ that has a very interesting technique to generate the SSB transmission. The SSB signal is created by rapidly updating both the output frequency and the output power of a class-E driven Power Amplifier. This design results in a power-efficient and very simple low cost circuit:


That's it - that's the entire transmit circuitry to generate a 5W SSB signal! That's crazy, and so much simpler than typical SSB modulation techniques. I'll call this type of modulation "polar modulation" for reasons that will become apparent later.

Guido's approach is such an attractive idea that the QMX is also adopting polar modulation.

This blog post describes my journey learning about polar modulation, how it works, what its limitations are, and how those limitations can be minimised in a practical implementation. The journey I took led me down many blind alleys. I'll try to spare you from these, and to chart a more direct and logical path.

My first step was to explore how it was even possible for polar modulation to modulate typical audio inputs. Understanding it for a single input tone is easy - the output frequency and output power are just kept constant. For instance, if the transceiver is tuned to 7MHz upper sideband and a modulating tone of 1kHz is applied, the radio sets the si5351 CLK2 output frequency to 7.001MHz, and the PA amplitude to a value proportional to the input signal's amplitude. If the frequency or amplitude of the single-tone modulating input changes, the output frequency and amplitude are updated accordingly.

When two audio signals, say at 1kHz and 1.9kHz, are applied then the output to the antenna should comprise two frequencies - one at 7.001MHz and one at 7.0019MHz. How is this possible when the transmitter is only capable of generating one frequency at a time? Well, it turns out that it is possible - by varying both the output frequency and output amplitude in exactly the right way. This can be shown mathematically in my blog post here.

I've gone on to confirm in both simulation software and laboratory-style tests that indeed a two frequency output signal can be accurately generated using polar modulation. However there are limitations to this approach which I will summarise here, and will explain later how I arrived at these conclusions.

Summary of Findings

The polar modulation algorithm is nonlinear, and becomes increasingly nonlinear as the amplitudes of two input tones approach each other.  When this happens, as with any nonlinear system, many harmonics are generated and these harmonics are at frequencies that are multiples of the difference-frequency of the input tones. For example, if the input signals are at 1kHz and 1.9kHz, the difference frequency is 900Hz, and there are harmonics at 1800Hz, 2700Hz, 3600Hz, etc.

The harmonics are critical to the correct generation of the RF output. By "magic" the harmonics generated by the frequency variations exactly cancel the harmonics generated by the amplitude variations, resulting in a clean RF signal. Any reduction in the harmonic signals that are caused by bandwidth limitations in either the frequency or amplitude paths will result in spurious signals in the output.

A rough rule of thumb for the case of equal-amplitude input signals is that the bandwidths of the frequency and amplitude paths need to be approximately 5x the maximum expected frequency difference in order to keep the output's spurious signals more than 30dB below either test tone. For instance, the si5351 can only be updated at about 12ksps maximum (with the I2C bus overclocked beyond spec), and so has a maximum bandwidth of 6kHz. Such a system will show spurious signals worse than 30dB down whenever equal-amplitude dual-tone signals are applied that have a frequency difference greater than 6kHz/5 = 1200Hz. However, as will be seen later, improvements are achievable if the amplitude path has a higher bandwidth than the frequency path.

To illustrate the consequences of limiting the bandwidth of both paths, here is the simulation result of a 6kHz bandwidth limitation on 300Hz and 1600Hz tones (1300Hz frequency difference):


The same test on a laboratory hardware test set-up is:


Note the very strong similarity between the simulated results and the real results (allowing for slightly different horizontal and vertical scales). This is positive evidence of the accuracy of the simulation and of my understanding of what is happening.

The Modulation Algorithm

The modulation algorithm at a high level is:

Or in pseudo-code:

modulate(audio):

    (I, Q) = HilbertTransform(audio)

    amplitude = sqrt(I*I + Q*Q)
    phase = atan2(Q, I)

    deltaPhase = phase - lastPhase    // differentiate
    lastPhase = phase

    if (deltaPhase < 0)  deltaPhase = deltaPhase + 2*PI    // phase unwrapping
    if (deltaPhase > PI) deltaPhase = deltaPhase - 2*PI    // phase unwrapping

    frequency = deltaPhase * Const + CarrierFrequency

    return (frequency, amplitude)

You might rightly wonder what is happening here - this is a huge jump from the description so far and from the mathematical analysis referred to above.

To understand how this algorithm works it helps to first describe:

  1. The phasor representation of a sinusoid. 
  2. The Hilbert transform, as it applies in this narrow application.
  3. Phase unwrapping.

Phasor Representation

The key idea is that, since a sinusoid has both an amplitude and a phase, it can be represented as a vector (arrow) or phasor in a 2D space. The phasor's length is its amplitude, and its frequency is how quickly it rotates around its "tail" in this 2D space. So the tip of a phasor representing a single tone traces a circle in this 2D space. The convention is that the phasor rotates anticlockwise. The horizontal component of this phasor is the level of the signal we see. This is illustrated here.

At any point in time the phasor has a length (the distance from its tail to its tip) and an angle, which is the phasor's angle to the horizontal.



If a signal comprises multiple sinusoids, all the phasors representing these sinusoids are added together tip-to-tail, with each phasor continuing to rotate around its own tail. The result is a new result phasor with its tail at the origin and a tip that traces out a complex pattern in 2D space.


The resulting path could also be traced out by a hypothetical signal generator whose output amplitude and frequency is continually and precisely varied to match. The signal generator's output amplitude is the length of result phasor, and its frequency is the rate of change of the result phasor's angle. This is exactly the calculation being performed in the modulation algorithm!
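
In symbols (my notation, not from the original): writing the result phasor's horizontal and vertical components as I and Q, the hypothetical signal generator must produce

$$A(t) = \sqrt{I^2 + Q^2}, \qquad \phi(t) = \operatorname{atan2}(Q, I), \qquad f_{\text{inst}}(t) = \frac{1}{2\pi}\frac{d\phi}{dt}$$

which is precisely the sqrt/atan2/phase-difference sequence in the pseudocode above.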

The Hilbert Transform

The discussion above describes the use of phasors, but the input signal isn't in phasor form. That's where the Hilbert Transform (or Hilbert Filter) comes in - it turns the input into phasor form.

The Hilbert Transform takes an input sinusoid and through "magic" (which I don't understand) outputs two sinusoids I and Q, with Q delayed by exactly 90° compared to I. I and Q are the two dimensions of the 2D phasor space.

The Hilbert Transform is also linear, so if two sinusoids are combined and fed into the Hilbert Transform, the result is exactly the same as feeding each sinusoid into a separate Hilbert Transform and adding the results, ie 𝓗(a + b) = 𝓗(a) + 𝓗(b). This means we can feed our input signal, which comprises multiple frequencies, into the Hilbert Transform and the resulting I and Q signals correctly represent the result phasor described earlier.
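
For the curious, one common realisation is an odd-length FIR filter whose ideal coefficients are 2/(πn) for odd n and zero otherwise, with the I path simply delayed to match the filter's group delay. A C sketch follows - the tap count and window are my arbitrary choices, not the uSDX's actual filter:

#include <math.h>
#include <string.h>

#define HT_TAPS  31                     // filter length (odd)
#define HT_DELAY ((HT_TAPS - 1) / 2)    // group delay in samples

static float ht_coeff[HT_TAPS];
static float ht_state[HT_TAPS];         // delay line of recent samples

// Build windowed ideal Hilbert coefficients: h[n] = 2/(pi*n) for odd n.
void hilbert_init(void)
{
    const float PI = 3.14159265f;
    for (int i = 0; i < HT_TAPS; i++) {
        int n = i - HT_DELAY;
        float ideal = (n & 1) ? 2.0f / (PI * n) : 0.0f;
        float hamming = 0.54f - 0.46f * cosf(2.0f * PI * i / (HT_TAPS - 1));
        ht_coeff[i] = ideal * hamming;
    }
}

// Push one audio sample; I is the input delayed to match the filter's
// group delay, Q is the 90-degree shifted version.
void hilbert_step(float in, float *I, float *Q)
{
    memmove(&ht_state[1], &ht_state[0], (HT_TAPS - 1) * sizeof(float));
    ht_state[0] = in;

    float q = 0.0f;
    for (int i = 0; i < HT_TAPS; i++)
        q += ht_coeff[i] * ht_state[i];

    *I = ht_state[HT_DELAY];            // matched-delay in-phase path
    *Q = q;
}

Feeding cos(ωn) in yields (after the 15-sample delay) I = cos and Q = sin - the quadrature pair the modulator needs.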

Phase Unwrapping

We're almost there! The last issue to deal with is that the phase measurement only gives a raw value between -π and +π. But the actual phase angle of a sinusoid continues to grow without bound. This becomes an issue when we calculate the frequency by measuring the phase's rate of change.


If the raw atan2() phase value is used, the frequency calculation is incorrect each time the phase increases beyond π and jumps back to -π. That's why the following code is needed to "unwrap" the phase and to give the correct phase difference:

if (deltaPhase < 0)  deltaPhase = deltaPhase + 2*PI    // phase unwrapping

It is also possible for the phase to be decreasing and for the phase to jump from -π to +π. The code to "unwrap" this case is:

if (deltaPhase > PI) deltaPhase = deltaPhase - 2*PI     // phase unwrapping

"When can the phase be decreasing - that's a negative frequency!" you might ask. Here's an example. Phasor 1 is rotating slowly (say at 100Hz), and Phasor 2 is rotating quickly (say at 1kHz). In the time that Phasor 2 rotates from position 1 to position 2 Phase 1 has hardly moved. The result phasor has moved from pointing to 1 to pointing at 2 - a reduction in phase:



This latter case of phase unwrapping is absent from the uSDX implementation, but appears to be unimportant since it makes little difference to the final signal.

Studying the appropriate phasor diagrams helps illustrate why, as the mathematical analysis shows, the phase changes so dramatically when the two tones have similar amplitudes:


In this diagram, the origin is towards the top, and a trace of the tip of the result phasor is shown over a number of time steps. The low frequency's phasor has its tail at the origin, and its tip is currently in the lower left quadrant. The higher frequency's phasor has its tail at the LF phasor's tip, and since it is spinning much faster, it causes the dominant movement; its tip is the tip of the result phasor. It has a similar amplitude to the LF phasor, so the result phasor passes very close to, but slightly below, the origin.

For the majority of the result phasor's path, its angle to the origin changes only relatively slowly from one time step to the next. However, the steps from orange to purple, and from purple to red, are very dramatic negative phase changes. This is the dramatic phase change predicted by the mathematical analysis.

Note that there's nothing special in this description about the LF phasor's location. Another "snapshot" could be taken at a different time where the LF phasor is pointing - say - in the upper right quadrant; exactly the same description of the phase behaviour would apply. In fact different snapshots in time can be thought of as just rotating this image around the origin.

If you now imagine that the HF phasor's amplitude is very slightly larger, then the result phasor would pass very close to, but slightly above, the origin. In this case the dramatic phase changes are positive. A small amplitude change has caused the sign of the phase "blip" to change.

As a final thought experiment, imagine the two amplitudes are exactly the same. In this case the result phasor passes directly through the origin. At one time step it will be on one side of the origin, and on the next time step it will be on the opposite side - a 180° or π radians change. The frequency, which is the change in phase over time, will be π/dt, where dt = the time step duration. If the size of the time step is reduced, then the frequency at the point of crossing increases. This becomes important later - altering the size of the time step alters the result of the frequency calculation - because the calculation depends on a time-derivative.
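
Putting numbers on this (my arithmetic, using the 4MHz simulation step rate from the post above as an example):

$$\Delta\phi = \pi \;\Rightarrow\; \omega = \frac{\pi}{dt} \;\Rightarrow\; f = \frac{1}{2\,dt} = \frac{F_s}{2}$$

so at a 4MHz step rate the instantaneous frequency excursion at the crossing reaches 2MHz.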

Conclusion

This post has hopefully given some intuition behind polar modulation. In a subsequent post I'll investigate the bandwidth requirements via simulation and a hardware implementation.

Sunday, February 16, 2025

Mathematical Analysis of a two-tone SSB Signal

Suppose we have an SSB Transmitter tuned to transmit at carrier frequency $f_c$, upper sideband. With an input audio modulating signal of $f_a$, the transmitter generates an output signal of $f_c+f_a$. For example if the tuned frequency is 7.0MHz and the audio input is 1kHz, a signal at 7.001MHz is generated.

Similarly, with two input audio modulating signals, $f_a$ with amplitude $a$ and $f_b$ with amplitude $b$, the transmitter generates a signal comprising two frequencies, one at $f_c+f_a$ and another at $f_c+f_b$.

The following analysis shows that this 2-frequency output signal can be generated by amplitude-modulating a single frequency-modulated oscillator.

The following trigonometric identities are used:

(I): $\cos(x+y) = \cos x \cos y - \sin x \sin y$ -- link

(II): $x\cos\theta - y\sin\theta = c\cos(\theta+\phi)$ where $c = \operatorname{sgn}(x)\sqrt{x^2+y^2}$, $\phi = \arctan(y/x)$ -- link

(III): $\sin^2 x + \cos^2 x = 1$ -- link

We start with the output signal:

$$s = a\cos 2\pi(f_c+f_a)t + b\cos 2\pi(f_c+f_b)t$$

Define $\omega_c = 2\pi f_c$, $\omega_a = 2\pi(f_c+f_a)$ and $\omega_b = 2\pi(f_c+f_b)$:

$$s = a\cos\omega_a t + b\cos\omega_b t$$

$$s = a\cos\omega_a t + b\cos(\omega_a t + (\omega_b-\omega_a)t)$$

Applying (I): 

$$s = a\cos\omega_a t + b\cos\omega_a t\cos(\omega_b-\omega_a)t - b\sin\omega_a t\sin(\omega_b-\omega_a)t$$

Rearranging:

$$s = \left(a + b\cos(\omega_b-\omega_a)t\right)\cos\omega_a t - b\sin\omega_a t\sin(\omega_b-\omega_a)t$$

Applying (II), with $x = a + b\cos(\omega_b-\omega_a)t$, $y = b\sin(\omega_b-\omega_a)t$:

$$s = \sqrt{a^2 + 2ab\cos(\omega_b-\omega_a)t + b^2\cos^2(\omega_b-\omega_a)t + b^2\sin^2(\omega_b-\omega_a)t}\;\cos\!\left(\omega_a t + \tan^{-1}\frac{b\sin(\omega_b-\omega_a)t}{a + b\cos(\omega_b-\omega_a)t}\right)$$

Applying (III) to simplify the terms under the square root:

$$s = \sqrt{a^2 + 2ab\cos(\omega_b-\omega_a)t + b^2}\;\cos\!\left(\omega_a t + \tan^{-1}\frac{b\sin(\omega_b-\omega_a)t}{a + b\cos(\omega_b-\omega_a)t}\right)$$

This complex equation can be better understood by rewriting it as:

$$s = A\cos(\omega_a t + f(t))$$

where:

  • $A = \sqrt{a^2 + 2ab\cos(\omega_b-\omega_a)t + b^2}$ sets the amplitude of the generated signal, and varies slowly compared to the carrier frequency because $\omega_b-\omega_a \ll \omega_c$, and
  • $f(t) = \tan^{-1}\dfrac{b\sin(\omega_b-\omega_a)t}{a + b\cos(\omega_b-\omega_a)t}$ varies slowly compared to the $\omega_a t$ term, again because $\omega_b-\omega_a \ll \omega_c$.

In other words, combining two input signals results in an output signal with a phase varying around $\omega_a t$ at the difference frequency $\omega_b-\omega_a$, and with amplitude $A$ also varying at the difference frequency.

Confidence that the formula is correctly derived can be gained by testing what happens if either of the modulating signals is removed (by setting its amplitude term - either $a$ or $b$ - to zero). The formula should "collapse" to a single simple waveform. Indeed this is exactly what happens.
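
For example, setting $b = 0$ gives:

$$A = \sqrt{a^2} = a, \qquad f(t) = \tan^{-1}\frac{0}{a} = 0 \;\Rightarrow\; s = a\cos\omega_a t$$

- a single unmodulated tone at $\omega_a$, as expected.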

The analysis so far draws on "Frequency Analysis, Modulation and Noise", Stanford Goldman, 1948 - p160. I believe what follows is novel.

Phase/Frequency Analysis

The full phase term is: $\left(\omega_a t + \tan^{-1}\dfrac{b\sin(\omega_b-\omega_a)t}{a + b\cos(\omega_b-\omega_a)t}\right)$.

The frequency at any instant in time is the derivative of this term with respect to time:

$$f = \frac{d}{dt}\left(\omega_a t + \tan^{-1}\frac{b\sin(\omega_b-\omega_a)t}{a + b\cos(\omega_b-\omega_a)t}\right)$$

$$f = \omega_a + \frac{d}{dt}\left(\tan^{-1}\frac{b\sin(\omega_b-\omega_a)t}{a + b\cos(\omega_b-\omega_a)t}\right)$$

Finding the derivative of this second term is beyond my mathematics skills, but thankfully the Derivative Calculator website will do it for us (click the "Go!" button). Here I have defined $m = \omega_b - \omega_a$.

The result it gives is:

$$\frac{bm\,(a\cos(mt) + b)}{b\,(2a\cos(mt) + b) + a^2}$$
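
For anyone wanting to check it by hand, the same result follows from the quotient rule, with $u = b\sin(mt)$ and $v = a + b\cos(mt)$:

$$\frac{d}{dt}\tan^{-1}\frac{u}{v} = \frac{u'v - uv'}{u^2 + v^2}$$

$$u'v - uv' = bm\cos(mt)\,(a + b\cos(mt)) + b^2 m\sin^2(mt) = bm\,(a\cos(mt) + b)$$

$$u^2 + v^2 = a^2 + 2ab\cos(mt) + b^2 = b\,(2a\cos(mt) + b) + a^2$$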

This formula isn't very informative, apart from showing that the output frequency varies with the difference frequency $m = \omega_b - \omega_a$. However, very helpfully, Derivative Calculator also interactively plots the input and derivative values for the input values $a$, $b$ and $m$. For example:


Experimenting with different values starts to give some slight intuition about how the amplitude values influence the frequency output. For instance, as the two amplitudes approach each other, the frequency variation grows dramatically:

Finally, when the two amplitudes are equal, there's a discontinuity & the "blip" disappears from the graph view - the negative "blip" is still there and corresponds to the negative edge of the input signal, but it's infinitely narrow and doesn't display:

This tendency for the frequency "blip" to grow without bound as one amplitude approaches the other is an extremely important phenomenon to understand in later analysis. It will also become apparent that this phenomenon changes slightly when we move from the continuous domain to the discrete/sampled domain.

Note that the blue waveform might initially look strange when the amplitude is increased further:

The explanation for the discontinuities in the blue input waveform is that it is a phase measurement and has been displayed to wrap every 2π. The red derivative is displayed correctly and does not show any "blips" at these transitions.

Amplitude Analysis

The amplitude term is $A = \sqrt{a^2 + 2ab\cos(\omega_b-\omega_a)t + b^2}$.

Its behaviour can be interactively plotted here.

The most interesting plot is again when a=b:


This is a rectified sine wave, as can be demonstrated by setting $a=b=1$. The amplitude term simplifies to:

$$A = \sqrt{1 + 2\cos(\omega_b-\omega_a)t + 1}$$

$$A = \sqrt{2}\,\sqrt{1 + \cos(\omega_b-\omega_a)t}$$

But $\operatorname{sgn}\!\left(\cos\frac{\theta}{2}\right)\sqrt{\frac{1+\cos\theta}{2}} = \cos\frac{\theta}{2}$ (see link), giving $A = 2\left|\cos\frac{(\omega_b-\omega_a)t}{2}\right|$.

So, ignoring the sign term, the amplitude is the shape of a cosine wave.

A rectified sine wave has high harmonic content (see here or here), which will become relevant in later analysis.
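
To make that harmonic content explicit, the Fourier series of a full-wave rectified cosine is the standard result

$$|\cos x| = \frac{2}{\pi} + \frac{4}{\pi}\sum_{k=1}^{\infty}\frac{(-1)^{k+1}}{4k^2 - 1}\cos 2kx$$

so with $x = (\omega_b-\omega_a)t/2$ the amplitude contains components at every multiple of the difference frequency, decaying only as $1/k^2$.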

Conclusions

This mathematical analysis has derived some equations for both amplitude and frequency shift that appear to correspond to expectations.

The interactive waveform plots give some intuition on the signal behaviour. In particular, when the two tones approach the same amplitude, the frequency shift approaches a discontinuity and values can become very large, and the amplitude waveform approaches a shape that has high harmonic content.

Thursday, February 13, 2025

Measuring the Si5351 Output Frequency Settling Time

I'm using the Si5351 programmable clock generator in an upcoming project, portions of which are similar to the usdx. For my application, characteristics of the chip that are important, but not documented in the datasheet, are:

  • When does the output frequency change in relation to the I2C command that has been issued to update the frequency? Answer: on the second to last low-to-high I2C clock transition of the command that writes the lowest byte of the PLL register.
  • How long does the output frequency take to reach the target frequency (its settling time)? Answer: < 1us! Yes, it settles extremely quickly.

This note describes how I measured these characteristics.

For my tests, the Si5351's output frequency is being updated at up to 12k times per second. The I2C bus is overclocked at 800kHz to achieve this update rate. The usdx project implements a frequency-change mechanism where just 5 on-chip 8-bit register values are updated in a single 7-byte I2C transaction. The update is made to alter the PLL frequency. For the purposes of my tests I use the same mechanism, but only need to update 4 registers for the range of frequency changes I'm conducting. Documenting the exact frequency update mechanism is outside the scope of this note, but it can be ascertained from the usdx code and reference to AN619.
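
For illustration only, here is a C sketch of that style of burst update using the Pi Pico SDK (a Pico 2 generates the I2C signals in the tests below). The device address 0x60 and the PLLA parameter block starting at register 26 come from AN619; exactly which and how many of those registers must be rewritten depends on the size of the frequency step:

#include <stdint.h>
#include "hardware/i2c.h"

#define SI5351_ADDR  0x60   // 7-bit I2C address of the Si5351
#define SI5351_PLLA  26     // start of PLLA feedback-divider block (AN619)

// One frequency update: a single I2C transaction writing 'n' consecutive
// PLL parameter registers starting at 'reg'. On the wire this is the
// device address byte, the register byte, then n data bytes - 7 bytes
// in total for a 5-register update.
void si5351_write_regs(uint8_t reg, const uint8_t *vals, size_t n)
{
    uint8_t buf[9];
    buf[0] = reg;                       // Si5351 auto-increments from here
    for (size_t i = 0; i < n; i++)
        buf[i + 1] = vals[i];
    i2c_write_blocking(i2c0, SI5351_ADDR, buf, n + 1, false);
}

At 800kHz a 7-byte transaction occupies roughly 63 bit-times plus overhead, or about 80us - consistent with the ~12k updates per second quoted above.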

Note that my tests also showed that the Si5351 output only changes when the lowest byte of the PLL register is written.

Test Method

I used a mixer to compare the Si5351's output target frequency against a known reference frequency. The filtered mixer's output is the difference between the input frequencies. So when the two frequencies are the same, or very similar, the output is either constant or changes only slowly. I initially used a filter RC time constant of 1us.
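
The underlying identity (my note) - the RC filter removes the sum term, leaving the difference:

$$\cos\omega_1 t\,\cos\omega_2 t = \tfrac{1}{2}\cos(\omega_1 - \omega_2)t + \tfrac{1}{2}\cos(\omega_1 + \omega_2)t$$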

The mixer was built from a 3253 analogue switch. The mixer is very much like a quarter of a Tayloe Mixer.

The fixed frequency could be provided by an external source, but in my case I used a second Si5351 output.  I have relied on the fact that the si5351 can be configured to output 2 frequencies completely independently - each with their own PLL and MultiSynth divider chain (a third output frequency can also be configured, but must share one of the PLLs):

  • the target frequency on CLK1 whose frequency will be adjusted by ±4kHz (or more).
  • the reference frequency on CLK2 whose frequency is locked and stable throughout a test.
A Pi Pico 2 was used for generating the I2C signals, and output a "high" on a digital pin just before the start of an I2C transaction, and "low" immediately after. An oscilloscope monitored the I2C clock, the digital pin, and the mixer output.



To conduct the measurement, the target frequency would first be set several kHz above or below the reference frequency and allowed to stabilise, and then the target frequency would be set to match the reference frequency.

A typical trace (with the target frequency starting 5kHz above the reference frequency, with C = 1nF & RC = 1us) is:

(where yellow = I2C clock, cyan = digital output, blue = mixer output).

The 5kHz difference frequency is clearly visible on the mixer output prior to the frequency change. The flat output after the frequency change shows that the target and reference frequencies are now equal.

Zooming in, we see that the mixer output becomes flat (ie target and reference frequencies now equal) within a few microseconds after the second to last low-to-high transition of the I2C clock. The 1us RC time constant of the filter slows the transition:

With C = 100pF, RC = 0.1us there is reduced filtering, but the exact time that the frequency changed is much clearer:

The frequency transition occurs within 1us.

The tests were rerun using an external Function Generator as the reference frequency. The same results were obtained, verifying that there was no unintentional interaction between the two Si5351 outputs.

Sunday, January 24, 2016

Inside the armv1 - the ALU control logic

This is one of a number of posts on my work on reverse engineering the armv1 processor.  The first in the series, and an index of the other articles can be found here.

My first post in this series described in some detail a one-bit slice of the ALU, and identified quite a number of control signals that feed all 32 bit slices which determine how the ALU operates. Now that I've reverse-engineered the overall instruction decoding and sequencing mechanism we now have enough context to make a start on reverse-engineering the circuitry that generates the ALU's control signals. This turns out to be more complex than I first expected and a full understanding of what's happening will probably have to wait until other parts of the processor have also been reverse-engineered.

Let's make a start by setting the context. The ALU circuit from my earlier post is reproduced below; we'll need to refer to it as we proceed.



The location of this circuitry, and its control circuitry, on the silicon is as follows:


Let's now zoom in on the ALU control area:


The ALU Control Logic comprises two distinct areas - (1) a bunch of "ad-hoc" gates and drivers sitting just above the ALU itself, and (2) just above them, a Programmable Logic Array which is referred to as PLA-1 in an earlier blog. In the same area is the logic which implements the Program Status Register which is not discussed here.

The overall circuit is as follows:


The circuit diagram is laid out roughly as it is in the silicon - a bunch of control signals come in from above and feed PLA-1; the buffered PLA-1 outputs, and a couple of outputs from the instruction-decoder, then feed downwards to provide the control signals to all 32 bit slices that form the ALU. Not surprisingly, these control signals tally exactly with the signals in the ALU circuit at the beginning of this blog.

The Carry Logic on the left (highlighted in blue) feeds the Carry bit from the PSR Logic (node 8083) to bit 0 of the ALU. The Carry bit is delayed by a single clock cycle using a dynamic latch (I've used a D latch symbol for ease of drawing) and conditioned by two outputs of the PLA as per the following truth table:

This shows that the PLA-1 outputs can either pass the Carry bit straight to bit 0 of the ALU, or force the Carry line to be either 0 or 1.

The Dynamic Register Control circuitry on the RHS (highlighted in orange) shows that the two dynamic latches in the ALU (one on each operand path) are controlled by outputs of PLA-2 (the instruction decoder), with latching occurring on the trailing edge of the Phase 1 clock. The Operand 1 latch is also operated one clock cycle after the PLA-2 signals (8062, 8061, 8060) are all simultaneously low. We'll try to make sense of the purpose of this part of the circuit later.

PLA-1 Content - Input Decoding

After all this context setting it's now time to look at what's in PLA-1. The PLA layout is very similar to the instruction decoder PLA-2 described earlier:


The 8 input selection signals enter the PLA from the lower right, and these signals, together with their inverted versions, feed 16x vertical columns. Which transistors are fitted (or missing) determines which one of the 31 rows within the PLA is selected. There are further details about the operation of this circuit in the earlier blog post.

Let's start with the row selection logic on the RHS of the PLA, a raw dump of which is below (with the top row of the table corresponding to the top row in the PLA):


The content displayed in this format doesn't give us much insight into what is happening. However, after a little study the following pattern emerges:

It becomes clearer that the 3x inputs from PLA-2 (8062, 8061, 8060) form a "command" that narrows down which row will be selected. (Note that I've shown these 3x signals in the order that they enter PLA-1; similarly the earlier description shows these signals in the order that they exit PLA-2 - but they are in a different order! This is the way the chip is wired! So you need to remember to swap the bit ordering when you compare the data between the two PLAs!)

Let's start by looking closer at the first command: "111" from PLA-2, which narrows the selection to rows 15-30. If we refer to the PLA-2 logic table & find which rows generate a "111" (remembering to reverse the bits, which happens to not matter in this case), we find it appears on three rows: 1, 3, and 4 (highlighted in light blue).


All three rows are associated with the last (or only) cycle of a DP (Data Processing) instruction. So, returning to the PLA-1 table above, the signals on the Instruction Bus b24 to b21 are from a DP instruction. The DP instruction format is:


And, as expected, we see that b24 to b21 correspond to the "Operation Code" - AND, OR, etc. So we can conclude that rows 15 to 30 of PLA-1 correspond to each ALU operation.

By following a similar process and taking each of the remaining commands in turn we can gain some understanding of what rows 0..14 are for. However this analysis reveals lots of specific details - there is a lot of complexity here. The annotated PLA-1 row-selection table is as follows:


What's clear from this table is that the ALU is being used for a lot more than the DP instructions that are visible to the programmer - almost half of this table is for other purposes! In the main the use of Instruction Register bits 24..21 makes sense given the context. The one exception is for command "000" (rows 0, 1 above); the Instruction Register bits are only valid & meaningful on a couple of the 9 rows where this command is used. I suspect that this command is sometimes used as a "no-op".

We can also perhaps gain some insight into what the signal "From Trap Ctrl (8158)" is. It only influences the row decoding for rows 6, 7, which are associated with LDR and LDM instruction execution. We know from the instruction descriptions that there are special rules in cases where a data fetch abort occurs:

  • for the LDR instruction: "The data fetch may abort, and in this case, the base and destination modifications are prevented", 
  • for the LDM instruction: "If an abort occurs...the final cycle is altered to restore the modified base register") 

Perhaps this logic is to implement this functionality?

PLA-1 Content - Output Values

Now that we've made an initial analysis of the input decoding, it's time to see what sense we can make of the output values. The table below is laid out to reflect the physical layout and therefore shows the output values on the LHS. There are also some notes on the left.


There is an awful lot of information contained in this table!

Let's start with the DP rows - 15 to 30:

  • The good news is that on rearranging the order of the 6 right-most output columns, these values exactly correspond to the table that appeared in my first post on the ALU! Whew!
  • If we now look at the output columns for 8087, 8088, we know from the earlier circuit diagram that these values determine what signal is fed to the Carry In on bit 0 of the ALU. And we only need to look at those rows where the output signal "/Enable C chain" (8116) is 0 (since the Carry signal is ignored otherwise). The comments on the left capture the results. The Carry In signal is set to exactly the values we would expect for the associated op-code.


  • The first output, "To INST SKIP logic (8065)" is interesting in that it is only zero for the 4x comparison op-codes. About these op-codes the documentation states: "They are used only to perform tests and to set the condition codes on the result, and therefore should always have the S bit set" (we can see from the DP instruction format above that the S bit is defined as "Set Condition Code", 0 = Do not alter condition codes, 1 = Set condition codes). I suspect that this output from the PLA is used by the INST SKIP logic to detect if this is an invalid instruction.
  • The output "To PSR Logic (8064)" appears to control when the PSR is updated. Since 8064 is high for all the DP op-codes, and low for all other ALU operations, it probably controls when the Carry, Overflow, Negative, and Zero flag can be updated (subject of course to the S bit, as described earlier). 
  • The output "To PSR Logic (8059)"  is only high when an arithmetic operation (as opposed to a logical operation) is taking place. It therefore selects where the Carry flag is updated from - either the ALU output, or the result of the shift operation, and whether the Overflow flag is updated or not. See p2-32 of the VTI arm data book (1990) for a more complete description of the rules regarding how the PSR bits are updated.

Now that we've completed analysing the DP instructions, let's move on to the remainder of the table - rows 0 to 14.

To analyse rows 0 to 14 of the PLA-1 output table I assumed that the ALU would be carrying out similar operations to the DP instructions we've already been looking at. I therefore expected to see the same output settings on these rows as the rows we've already analysed. The results of this comparison work are shown in the Notes on the left of the table above. A summary is:

  • Matches: There are 8 rows with exact matches, which select a variety of ALU op-codes: MOV, ADD, SUB, RSB (reverse subtract).
  • Variants: A further 5 rows match, apart from the Carry In selection. These are described further below.
  • Unknowns: Two rows have identical values, but don't match any DP instruction.
Let's look further at the variants. Although they appear on 5 rows, there are just two versions:
  • ADD (but Cin = 1). The normal ADD has a Cin = 0, so this variant has the result = Op1 + Op2 + 1 (where Op1 and Op2 are the two operands presented to the ALU).
  • RSB (but Cin = 0).  The normal RSB has a Cin = 1, so this variant has the result = Op2 - Op1 - 1. 
We can make educated guesses as to what is happening on each of these rows. For instance, on row 2, we are executing a Branch or Branch and Link instruction, and are almost certainly calculating the destination address by adding the PC to the PC-relative offset which appears within the instruction  (we've seen in an earlier article how there is logic in the Read Bus B that extracts the 24 bit offset).

Conclusion

It turns out that the ALU control logic is more complex than I expected. What is clear is that the chip designers really maximised the use of the ALU! 

Whilst I've completely reverse-engineered the control circuitry, I've only made a start on unraveling what is actually going on. A full understanding requires some more analysis of the circuitry itself, and on getting a wider view on what is happening elsewhere on the chip. My list of "to-do" items includes:
  1. What exactly is the "unknown" ALU command we discovered above?
  2. What values appear at the inputs to the ALU (Operand 1 and Operand 2) for each of the rows above (this requires other parts of the processor to be reverse-engineered first). This will also give us a better understanding of why the variants of the ADD, RSB opcodes are introduced.
  3. What exactly is being latched into the ALU's dynamic registers, and what is the sequencing associated with this? And why does the circuit above have the specific logic to introduce a one cycle delay for when DP instructions are executed?


Monday, January 18, 2016

Inside the armv1 - decoding barrel-shifter commands

This is one of a number of posts on my work on reverse engineering the armv1 processor.  The first in the series, and an index of the other articles can be found here.

Today I'm going to solve a puzzle I have been pondering for some time - how the processor implements instructions that reference 4 registers.

If we look at the data processing (DP) instruction format in more detail, we see that there are the following instruction types:

If we set Bit 25 to zero (Operand 2 in a register), and Bit 4 to one (Shift amount in a register) we get the following layout (made from copying/pasting portions of the image above):


This layout plainly shows that this single instruction references 4 registers simultaneously - Rd (the destination register), Rn (one of the ALU operands), Rm (the second ALU operand), and Rs (which gives the amount by which Rm is first shifted).

However various architecture descriptions of the arm's Data Processing instructions only refer to there being two input operands and one output register. This matches the descriptions of two data read buses (referred to as "Bus A" and "Bus B" or "read bus A" and "read bus B"), and also matches there being just 3 sets of register select logic (which was explored in detail in an earlier post). So how does the processor execute an instruction which references 4 registers?

A clue to solving this mystery was in my last post, where I analysed how the instruction decoder works. By referencing the first table in that post we see that the instruction decoder treats as a special case all Data Processing instructions where the shift amount is in a register. These instructions execute in 2 cycles, one more cycle than all other Data Processing instructions. It would be a safe bet that the first cycle extracts the shift amount from the Rs register and holds the value somewhere, and that the second cycle actually carries out the ALU operation.

Let's now move to the processor itself to verify that our guess is correct. The area of the chip that we're interested in is highlighted in red below. Ken Shirriff has already given an overview of how the barrel shifter works here:



If we zoom in a little further, we see that there are two distinct sections in this area:



The lower area generates the column drive signals to the barrel shifter itself. The shift-amount and shift-type are set via signals originating in the upper "Barrel Shifter Decode Selection" logic. We won't look at the driver logic in any further detail in this post. Instead we turn to the upper section, and start by zooming in some more:


The layout of this upper section is spectacular in that all the inputs and outputs to the logic are very readily apparent, and are marked on the diagram:

  • The I-Bus inputs b11..b5 correspond to the Shift Amount (b11..b7) and Shift Type (b6..b5) that we saw in the instruction layout we looked at above. 
  • The outputs on the RHS - Shift Amount (5 bits) and Shift Type (2 bits) are the signals that feed the Barrel Shifter Driver Logic that we saw earlier. 
  • The 3x outputs from the PLA (nodes 8287, 8288, 8286) that enter the area from above correspond to columns 2, 3, 4 of the PLA output table I included in my last blog.
  • Two signals derived from the lowest two address outputs lines enter the area from above and to the right.
  • 4x signals associated with Carry processing enter/exit from the lower edge.
The main logic areas are also apparent:



We've found the 8-bit wide dynamic latch that stores the register-sourced shift-amount from the Read Bus!

The other key logic is the group of seven 6-way multiplexers, whose outputs feed via the seven drivers to the rest of the barrel-shifter logic.

The 3 to 8 decoder is driven from the 3x PLA outputs (nodes 8287, 8288, 8286) and 6 of its outputs select which of the 6-way multiplexer inputs is chosen. A further output controls the Dynamic latch, and the final 3 to 8 decoder output is not implemented.

The remaining logic that is not highlighted in the diagram is complex and convoluted; its task is to ensure that the correct shift results occur, even when a shift amount greater than 32 is selected. This includes ensuring that the Carry bit is set appropriately and that the sign bit is extended in the correct manner. The rules are described on page 2-34 of the VTI arm databook (1990). I won't dwell further on this part of the circuit.

The dynamic latch circuitry and associated latch processing is straightforward:




The data on the Read Bus is latched during phase 1 of the clock only when output 7 of the 3-to-8 decoder has been selected (i.e. all 3x inputs from the PLA are high). The latched data (or zero) is then available on one of the 6 inputs to the multiplexer. Zero is selected depending on the complex logic referred to earlier.

The output driver circuitry for each of the 7 signals is just two inverters in series.

The multiplexer circuitry, and decoding circuitry, is identical in form to the read bus input multiplexer I described in my earlier blog on register selection, and won't be repeated here.

Just the following "glue logic" circuits generate additional inputs to the multiplexer:


Note how the Shift Type in I-Reg b6 and I-Reg b5 are potentially adjusted, dependent on complex logic, in a similar manner to how the shift-amount described earlier might be adjusted.

The table below summarises the multiplexer's operation, and lists what the 7x outputs to the Barrel Shift Driver Logic are for each of the 8 combinations on the 3 signals from the PLA.


If we compare these PLA values, and the barrel-shifter outputs, with the values in the PLA output table from my previous article, it starts to make sense.

Let's take row 1 of the PLA table as the first example. This row decodes a Data Processing instruction where Operand 2 is a register and the shift amount is immediate (i.e. the amount is in the instruction). The PLA values fed into the table above are "001" (row 1). This selects that the Shift Amount and Shift Type sent to the Barrel Shifter Driver Logic is b11..b5 of the DP instruction, exactly as we would expect.

Now let's examine an instruction that has the shift amount in a register - the situation I began this blog with. We see from the PLA table that this instruction type takes two cycles to execute (rows 2 and 3). The PLA values fed into the table above on the first cycle are "111" (row 7), which is a command to latch the content of the register which is present on the read bus. The PLA values on the second cycle are "000" (row 0), which feed the (possibly modified) values of the latch (for the shift amount) and the instruction's Shift Type to the Barrel Shifter Driver Logic.

We can now deal with the remaining PLA input types:

  • "010" - this appears in a number of lines in the PLA output table where the result of the ALU is not used, and ensures the Carry bit, etc. are not inadvertently affected. It can be regarded as a no-op.
  • "011" - appears only on row 27 of the PLA output table, and corresponds to the first instruction cycle of a Branch or Branch and Link instruction. In these cases the immediate value in the instruction is a word address, and the Barrel Shifter is instructed to shift this value left by two to convert it to a correct memory address.
  • "100" - appears only on row 4 of the PLA output table, and corresponds to a DP instruction where operand 2 is immediate. In this case the value to be rotated is in the lowest 8 bits of the instruction, and is to be rotated right by twice the amount in b8..11 of the instruction (see the instruction format at the beginning of the blog). The "glue" logic chooses a Shift Type of "11" (Rotate Right) when the shift amount is non-zero or "00" (LSL) for when a 0 shift is selected.
  • "101" - appears on row 7 and row 12 of the PLA output table. These both correspond to the last cycle of a LDR (Load Register from memory) instruction. Here, the Shift Amount is x8 the value appearing on Address line a0, a1. These address lines are only non-zero if a byte access has occurred, so this means that the lowest 8 bytes of data that has just been read from memory is rotated into it's correct position before being output from the barrel-shifter.
I ignored the reserved/undocumented instruction in the analysis above. However, even though our reverse-engineering of the chip is far from complete, from the information available we already know quite a lot about variant 1 of the reserved instruction:

  • Cycle 0 (row 15): selects "111" to save a register value in the barrel-shifter latch.
  • Cycle 1 (row 16): selects "000", which shifts the number now on the read bus by the amount in the latch
  • Cycle 2 (row 17): selects "001", performing a further shift by the amount and type specified in the instruction.
  • Cycle 3 (row 18): selects "101", which corresponds the byte rotate operation associated with a LDR instruction.
From this set of steps we can make a guarded guess that this instruction is loading data from memory, and that, since it takes one more cycle than a standard LDR instruction, the address is calculated using the content of one register as a shift amount, in addition to the typical LDR address calculation. Our being able to accurately predict what the reserved instructions do will be a good test of the accuracy of the reverse-engineering!

Conclusion

We've now reverse-engineered the main logic associated with controlling the barrel-shifter. There remains some logic still to reverse-engineer for us to fully understand all the edge cases, but fortunately this logic is very isolated and does not detract from our wider understanding of how the barrel-shifter works and is controlled. We have also found that the barrel-shifter is put to extensive use for a variety of tasks, not just for the Data Processing (DP) instructions. We also have garnered some "teaser" information about one of the reserved instructions.