time-nuts@lists.febo.com

Discussion of precise time and frequency measurement


Re: pulling some crystals

SC
Stewart Cobb
Mon, Dec 18, 2023 1:20 PM

Anyone considering DDS implementations in an FPGA should look at using the
CORDIC algorithm instead of sin/cos lookup tables. For short DAC output
words, a table is usually better and faster, but for long output words, the
table approach becomes unwieldy and the CORDIC starts to win.

If raw speed is the goal, it's possible to build DDS counters and CORDIC
stages using serial arithmetic which will run at nearly the toggle speed of
the FPGA. Unfortunately, the number of CORDIC stages required by this trick
expands as roughly the square of the number of phase bits used from the
accumulator. Even though one CORDIC stage generally fits into one CLB, this
still becomes a lot of logic. And the control logic for all those serial
accumulators is tricky.
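For readers who want to experiment, here is a minimal floating-point sketch of the rotation-mode CORDIC. It is illustrative only: an FPGA implementation would use fixed-point arithmetic, where the multiplies by 2**-i below become pure shifts; the stage count is an arbitrary choice of this sketch.

```python
import math

def cordic_sincos(phase, n_stages=16):
    """Compute (cos, sin) of `phase` (radians, |phase| <= pi/2) with the
    rotation-mode CORDIC: each stage rotates by +/- atan(2**-i), which in
    hardware needs only shifts and adds, converging on the target angle."""
    angles = [math.atan(2.0 ** -i) for i in range(n_stages)]
    gain = 1.0
    for a in angles:                         # combined gain of all micro-rotations
        gain *= math.cos(a)
    x, y, z = 1.0, 0.0, phase                # start on the +x axis; z = residual angle
    for i in range(n_stages):
        d = 1.0 if z >= 0 else -1.0          # rotate toward the residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x * gain, y * gain                # undo the accumulated CORDIC gain

c, s = cordic_sincos(0.7)
```

With 16 stages the residual angle is below atan(2**-15), so the result is good to roughly 15 bits.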

Just another tool for your toolbox.

Cheers!
--Stu

BK
Bob kb8tq
Mon, Dec 18, 2023 3:14 PM

Hi

One of the most basic issues is that a DDS chip is cheaper than an FPGA + DAC. Unless you already have a giant project in the FPGA that’s going to be an issue. If you do have a giant project in the FPGA, keeping the noise from that out of your DDS …. yikes !!!

Bob

On Dec 18, 2023, at 8:20 AM, Stewart Cobb via time-nuts <time-nuts@lists.febo.com> wrote:

Anyone considering DDS implementations in an FPGA should look at using the
CORDIC algorithm instead of sin/cos lookup tables. For short DAC output
words, a table is usually better and faster, but for long output words, the
table approach becomes unwieldy and the CORDIC starts to win.

If raw speed is the goal, it's possible to build DDS counters and CORDIC
stages using serial arithmetic which will run at nearly the toggle speed of
the FPGA. Unfortunately, the number of CORDIC stages required by this trick
expands as roughly the square of the number of phase bits used from the
accumulator. Even though one CORDIC stage generally fits into one CLB, this
still becomes a lot of logic. And the control logic for all those serial
accumulators is tricky.

Just another tool for your toolbox.

Cheers!
--Stu


time-nuts mailing list -- time-nuts@lists.febo.com
To unsubscribe send an email to time-nuts-leave@lists.febo.com

MH
Matt Huszagh
Mon, Dec 18, 2023 3:14 PM

Stewart Cobb via time-nuts <time-nuts@lists.febo.com> writes:

Anyone considering DDS implementations in an FPGA should look at using the
CORDIC algorithm instead of sin/cos lookup tables. For short DAC output
words, a table is usually better and faster, but for long output words, the
table approach becomes unwieldy and the CORDIC starts to win.

If raw speed is the goal, it's possible to build DDS counters and CORDIC
stages using serial arithmetic which will run at nearly the toggle speed of
the FPGA. Unfortunately, the number of CORDIC stages required by this trick
expands as roughly the square of the number of phase bits used from the
accumulator. Even though one CORDIC stage generally fits into one CLB, this
still becomes a lot of logic. And the control logic for all those serial
accumulators is tricky.

Don't forget that you can also use a lookup table implementation with
interpolation between table entries. Often, a small table and simple
interpolation methods will get you very good accuracy. In my own
implementations, I've found a simple linear interpolation between
elements to work quite well. This consumes minimal resources, including
memory. Often, the lookup table won't even occupy block RAM because it's
small enough that putting it there would be a waste of BRAM; it winds up
in distributed memory instead. You can add higher-order terms if
you need them, but usually they're not needed. If you're simultaneously
generating a sine and cosine value (very common), another interpolation
method would be a first-order Taylor series. This is basically free to
implement because the derivative of a sine is a cosine and the
derivative of a cosine is minus sine. However, I generally stick with
linear interpolation between elements, which I find has some nicer
properties.
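As an illustration of this approach, here is a small Python sketch of a sine table with linear interpolation between entries. The 256-entry table and 32-bit accumulator are arbitrary choices of this sketch; an FPGA version would do the interpolation in fixed point with one embedded multiplier.

```python
import math

TABLE_BITS = 8                      # 256 entries: small enough for distributed memory
TABLE = [math.sin(2 * math.pi * i / (1 << TABLE_BITS))
         for i in range(1 << TABLE_BITS)]

def dds_sample(phase_acc, acc_bits=32):
    """One DDS output sample: split the phase accumulator into a table
    index (top bits) and a fractional part, then linearly interpolate
    between adjacent table entries."""
    frac_bits = acc_bits - TABLE_BITS
    idx = phase_acc >> frac_bits
    frac = (phase_acc & ((1 << frac_bits) - 1)) / (1 << frac_bits)
    a = TABLE[idx]
    b = TABLE[(idx + 1) & ((1 << TABLE_BITS) - 1)]   # wrap at end of table
    return a + frac * (b - a)
```

Even this tiny table gives a worst-case interpolation error on the order of (2*pi/256)**2 / 8, i.e. well beyond 12-bit accuracy.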

This probably goes without saying, but also don't forget that you don't
need to store one full sine/cosine period! The only non-redundant part
if you're computing just sine or just cosine is one quarter of the
period. Or, if you're computing both simultaneously, just take 1/8 of a
period. So, for the fixed-point implementation when computing both, drop
the 3 most significant bits for lookup in the table, then use those 3
MSBs to adjust the lookup value for the full period. Fixed point, in
addition to being cheaper to implement on an FPGA, has some nice
advantages over floating point for representing phase (and frequency)
values, including that it wraps naturally and has a constant resolution
over 0 to 2pi.
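To make the symmetry argument concrete, here is a Python sketch of the quarter-wave case for a single sine output (the 1/8-period variant for simultaneous sin/cos works the same way with one more MSB and an output swap per octant). Sampling the table at bin centres so the mirrored index stays in range is a detail of this sketch, not something from the post.

```python
import math

QBITS = 8
Q = 1 << QBITS
# Quarter-wave table sampled at bin centres, so the mirrored index
# Q-1-idx is always in range.
QTABLE = [math.sin(math.pi / 2 * (i + 0.5) / Q) for i in range(Q)]

def sin_quarter(phase):
    """Sine over a full period from a quarter-period table: the 2 MSBs of
    the (QBITS+2)-bit phase word select the quadrant, the remaining bits
    index into the table."""
    quadrant = phase >> QBITS            # top 2 bits
    idx = phase & (Q - 1)
    if quadrant & 1:                     # 2nd/4th quadrant: mirror the index
        idx = Q - 1 - idx
    s = QTABLE[idx]
    return -s if quadrant & 2 else s     # 3rd/4th quadrant: negate
```

The reconstruction is exact: every full-period sample is recovered from the quarter-period table with no additional error.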

Finally, the lookup table plus interpolation approach can easily run at
full FPGA clock rates. I implemented a DDS recently that runs at the
system clock rate of 312.5 MHz on a Xilinx MPSoC. I think it could run
quite a bit faster too, but I haven't tried. Also, when it matters, this
approach has a much lower latency than the CORDIC. I'm not convinced
CORDICs have much use in DDS implementations when you have embedded
multipliers available, given how cheap (in terms of resources) and
performant LUT + interpolation techniques are.

Matt

G
ghf@hoffmann-hochfrequenz.de
Mon, Dec 18, 2023 5:23 PM

On 2023-12-18 16:14, Matt Huszagh via time-nuts wrote:

Don't forget that you can also use a lookup table implementation with
interpolation between table entries. Often, a small table and simple

.......

This probably goes without saying, but also don't forget that you don't
need to store one full sine/cosine period! The only non-redundant part
if you're computing just sine or just cosine is one quarter of the
period. Or, if you're computing both simultaneously, just take 1/8 of a
period. So, for the fixed-point implementation when computing both,
drop
the 3 most significant bits for lookup in the table, then use those 3

...

I'd sign all of this.

I published a sine/cos table some 12 years ago in pure VHDL on
<https://opencores.org/projects/sincos> that does all the mirroring
and pipelining automagically and delivers sine AND cos at the same time
without doubling the resources. IIRC it ran at 200 MHz without tuning on a
Spartan 6 eval board. Completely portable, no Xilinx IP required (but
synthesis-friendly). The delays of a CORDIC are highly unwelcome in an
all-digital PLL or Costas loop, and for mixed A/D systems, the DAC and its
reference voltage are usually the limit.

With hardwired carry chains, DSP48 blocks, and block RAMs, long word
lengths have lost their horror.

In 1985, I built this 200 MSPS signal averager. It took 8 pipelines in
parallel, each 2 dozen stages deep. We have come a long way since then.
<https://www.flickr.com/photos/137684711@N07/52758369518/in/dateposted-public/>
The board replaces everything blurred in the background.

Attila, if you see this, that was at Fraunhofer, 300 meters downhill
from where you work now.  :-)

regards, Gerhard

GE
glen english LIST
Mon, Dec 18, 2023 6:39 PM

I've generally used DDS CORDIC implementations in microcontrollers and
a 1/8-period table in the FPGA... it very much depends on user
requirements, I think.

Good to have a DDS discussion. Not quite my problem here but always very
enjoyable to hear what other people do.

Matt, 312 MHz should be a walk in the park for MPSoC silicon if the
sequential element doesn't go more than 3 or 4 deep. Matt, we should
have an off-list discussion on Efinix. DSPs run at 1000 MHz without
breaking a sweat (but...).

On 19/12/2023 2:14 am, Matt Huszagh via time-nuts wrote:

Stewart Cobb via time-nuts <time-nuts@lists.febo.com> writes:

Anyone considering DDS implementations in an FPGA should look at using the
CORDIC algorithm instead of sin/c

D
dschuecker
Fri, Dec 29, 2023 11:28 AM

Hi,

I generate sin/cos without a table by rotating complex numbers:

(a+i*b)*(u+i*v)

a, b, u, v may be 16-bit integers; for most applications this proved
sufficient for me. a is the cos component, b is the sine component.
(a+i*b) is rotated by the angle of (u+i*v). For integer calculation,
|u+i*v| = sqrt(u*u+v*v) is normalized to, say, 2^16, followed by a
right shift of 16 bits. It is necessary to watch |a+i*b| = sqrt(a*a+b*b),
because the vector length shrinks or expands due to calculation noise.
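A floating-point Python sketch of this recurrence, with a cheap first-order renormalization to hold the vector length near 1 (the Newton-step gain 1.5 - 0.5*|z|^2 is one common way to do this; Detlef's integer normalization differs in detail):

```python
import math

def oscillator(freq_frac, n):
    """Recursive oscillator: the state (a + i*b) is rotated each sample by
    the fixed phasor (u + i*v); a is the cos component, b the sin
    component. A per-sample gain correction keeps |a + i*b| from drifting."""
    theta = 2 * math.pi * freq_frac
    u, v = math.cos(theta), math.sin(theta)     # rotation phasor
    a, b = 1.0, 0.0                             # start at angle 0
    out = []
    for _ in range(n):
        out.append((a, b))
        a, b = a * u - b * v, a * v + b * u     # (a+i*b)*(u+i*v)
        m = a * a + b * b
        g = 1.5 - 0.5 * m                       # Newton step toward 1/sqrt(m)
        a *= g
        b *= g
    return out
```

Without the gain correction, rounding noise makes the magnitude grow or shrink exponentially over long runs; with it, the magnitude stays pinned near 1 while the phase is untouched (the correction is a pure scalar).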

Cheers Detlef



JL
Jim Lux
Sat, Dec 30, 2023 7:09 PM

Generically, this is known as the CORDIC method.
There are some techniques for managing the truncation error and magnitude that Detlef mentions, but I can’t remember them off the top of my head. I’m sure they’re in the literature. The trick is in making them simple and deterministic in time.

On Fri, 29 Dec 2023 12:28:20 +0100, dschuecker via time-nuts <time-nuts@lists.febo.com> wrote:

Hi,

I generate sin/cos without a table by rotating complex numbers:

(a+i*b)*(u+i*v)

a, b, u, v may be 16-bit integers; for most applications this proved
sufficient for me. a is the cos component, b is the sine component.
(a+i*b) is rotated by the angle of (u+i*v). For integer calculation,
|u+i*v| = sqrt(u*u+v*v) is normalized to, say, 2^16, followed by a
right shift of 16 bits. It is necessary to watch |a+i*b| = sqrt(a*a+b*b),
because the vector length shrinks or expands due to calculation noise.

Cheers Detlef




AT
Andy Talbot
Sat, Dec 30, 2023 8:53 PM

https://dspguru.com/dsp/faqs/cordic/

Andy
www.g4jnt.com

On Sat, 30 Dec 2023 at 20:44, Jim Lux via time-nuts <
time-nuts@lists.febo.com> wrote:

Generically, this is known as the Cordic method.
There are some techniques for managing the truncation error and magnitude
that Detlef mentions, but I can’t remember them off the top of my head. I’m
sure they’re in the literature. The trick is in making them simple and
deterministic in time.

On Fri, 29 Dec 2023 12:28:20 +0100, dschuecker via time-nuts <
time-nuts@lists.febo.com> wrote:

Hi,

I generate sin/cos without a table by rotating complex numbers:

(a+i*b)*(u+i*v)

a, b, u, v may be 16-bit integers; for most applications this proved
sufficient for me. a is the cos component, b is the sine component.
(a+i*b) is rotated by the angle of (u+i*v). For integer calculation,
|u+i*v| = sqrt(u*u+v*v) is normalized to, say, 2^16, followed by a
right shift of 16 bits. It is necessary to watch |a+i*b| = sqrt(a*a+b*b),
because the vector length shrinks or expands due to calculation noise.

Cheers Detlef



G
ghf@hoffmann-hochfrequenz.de
Sun, Dec 31, 2023 3:33 AM

Yes, but as I wrote in the previous article, the delays of the CORDIC
are a PITA if you want a fast tunable digital oscillator for a snappy
Costas loop or such. CORDIC shines when you have virtually no hardware
but shifters and adders. Those times are gone. Nowadays you get up to
2.6 GMACs in a Xilinx/AMD Zynq, and that is a low-end chip family.
One or two ARM Cortex-A9 processors are thrown in as well.

I forgot to mention that "James A. Crawford: Frequency Synthesizer
Design Handbook" (Artech House) describes the Sunderland technique,
which divides the lookup table into two smaller ROMs. That gives a
chip-area advantage of 12 up to 50 times and
costs just another tiny adder.
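A rough Python illustration of the Sunderland idea, using an arbitrary 4+4+4-bit split of the quarter-wave phase (the original paper's word lengths differ): the identity sin(a+b+c) ~ sin(a+b) + cos(a)*sin(c) lets one large ROM be replaced by a coarse ROM and a much smaller correction ROM, combined with a single adder.

```python
import math

A_BITS, B_BITS, C_BITS = 4, 4, 4        # 12-bit quarter-wave phase, split A|B|C
PH_BITS = A_BITS + B_BITS + C_BITS

# Coarse ROM: indexed by the top 8 bits (A,B), stores sin of the coarse angle.
COARSE = [math.sin(math.pi / 2 * i / 2 ** (A_BITS + B_BITS))
          for i in range(2 ** (A_BITS + B_BITS))]
# Fine ROM: indexed by (A,C), stores the small correction cos(angle_A)*sin(angle_C).
FINE = [[math.cos(math.pi / 2 * a / 2 ** A_BITS) *
         math.sin(math.pi / 2 * c / 2 ** PH_BITS)
         for c in range(2 ** C_BITS)]
        for a in range(2 ** A_BITS)]

def sunderland_sin(phase):
    """First-quadrant sine from two small ROMs plus one adder, using
    sin(a+b+c) ~ sin(a+b) + cos(a)*sin(c)."""
    a = phase >> (B_BITS + C_BITS)
    ab = phase >> C_BITS
    c = phase & (2 ** C_BITS - 1)
    return COARSE[ab] + FINE[a][c]
```

With this split the worst-case approximation error stays below about 2^-10 of full scale, from 256 + 256 stored words instead of 4096.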

I just dug out the original:
<https://ieeexplore.ee.org/document/1052173> behind the IEEE paywall of shame,
DOI <10.1109/JSSC.1984.1052173>, or
<https://bothonce.com/10.1109/jssc.1984.1052173> for the bad boys. :-)

He writes about a 45 Kbit ROM that is reduced to two ROMs with
4 address bits each, 2816+1024 stored bits in toto. Quite modest.
OMG. That reminds me of HP's dynamic n-MOS process that we used as
near-kids at the university.

The limit is now the DAC, if you really want to go back to the
analog domain at all.

Cheers & a happy new year,

Gerhard
