Anyone considering DDS implementations in an FPGA should look at using the
CORDIC algorithm instead of sin/cos lookup tables. For short DAC output
words, a table is usually better and faster, but for long output words, the
table approach becomes unwieldy and the CORDIC starts to win.
If raw speed is the goal, it's possible to build DDS counters and CORDIC
stages using serial arithmetic which will run at nearly the toggle speed of
the FPGA. Unfortunately, the number of CORDIC stages required by this trick
expands as roughly the square of the number of phase bits used from the
accumulator. Even though one CORDIC stage generally fits into one CLB, this
still becomes a lot of logic. And the control logic for all those serial
accumulators is tricky.
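For readers who haven't met the algorithm, here is a minimal floating-point Python sketch of the basic bit-parallel CORDIC rotation. This is an illustration only, not the serial-arithmetic FPGA form Stu describes; the function name and iteration count are arbitrary choices.

```python
import math

def cordic_sincos(phase, n_iters=16):
    """Compute (cos(phase), sin(phase)) for |phase| <= pi/2 via CORDIC rotations."""
    # Precomputed micro-rotation angles atan(2^-i) and the accumulated gain.
    angles = [math.atan(2.0 ** -i) for i in range(n_iters)]
    gain = 1.0
    for i in range(n_iters):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = 1.0 / gain, 0.0, phase      # start on the x axis, pre-scaled by 1/gain
    for i in range(n_iters):
        d = 1.0 if z >= 0.0 else -1.0     # rotate toward the residual angle z
        x, y, z = (x - d * y * 2.0 ** -i,
                   y + d * x * 2.0 ** -i,
                   z - d * angles[i])
    return x, y
```

Each extra iteration buys roughly one more bit of accuracy, which is why long output words need many stages.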
Just another tool for your toolbox.
Cheers!
--Stu
Hi
One of the most basic issues is that a DDS chip is cheaper than an FPGA + DAC. Unless you already have a giant project in the FPGA, that's going to be an issue. And if you do have a giant project in the FPGA, keeping the noise from that out of your DDS .... yikes!!!
Bob
On Dec 18, 2023, at 8:20 AM, Stewart Cobb via time-nuts time-nuts@lists.febo.com wrote:
time-nuts mailing list -- time-nuts@lists.febo.com
To unsubscribe send an email to time-nuts-leave@lists.febo.com
Stewart Cobb via time-nuts time-nuts@lists.febo.com writes:
Don't forget that you can also use a lookup table implementation with
interpolation between table entries. Often, a small table and simple
interpolation methods will get you very good accuracy. In my own
implementations, I've found a simple linear interpolation between
elements to work quite well. This consumes minimal resources, including
memory. Often, the lookup table won't even occupy block RAM: it's
small enough that putting it there would be a waste of BRAM, so it
winds up in distributed memory instead. You can add higher-order terms if
you need them, but usually they're not needed. If you're simultaneously
generating a sine and cosine value (very common), another interpolation
method would be a first-order Taylor series. This is basically free to
implement because the derivative of a sine is a cosine and the
derivative of a cosine is minus sine. However, I generally stick with
linear interpolation between elements, which I find has some nicer
properties.
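A minimal Python sketch of the table-plus-linear-interpolation idea (the 256-entry table size and the names here are illustrative choices, not Matt's actual implementation):

```python
import math

TABLE_BITS = 8                       # 256 entries: small enough for distributed RAM
SIZE = 1 << TABLE_BITS
TABLE = [math.sin(2.0 * math.pi * i / SIZE) for i in range(SIZE)]

def sin_lut_lerp(phase_frac):
    """Sine of a phase given as a fraction of a full cycle in [0, 1),
    using the lookup table plus linear interpolation between entries."""
    pos = (phase_frac % 1.0) * SIZE
    i = int(pos)
    frac = pos - i                   # interpolation fraction in [0, 1)
    a = TABLE[i % SIZE]
    b = TABLE[(i + 1) % SIZE]        # next entry, wrapping at the end of the table
    return a + frac * (b - a)
```

Even this small table with linear interpolation keeps the worst-case amplitude error below about 1e-4, which illustrates why a modest LUT plus one multiply often suffices.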
This probably goes without saying, but also don't forget that you don't
need to store one full sine/cosine period! The only non-redundant part
if you're computing just sine or just cosine is one quarter of the
period. Or, if you're computing both simultaneously, just take 1/8 of a
period. So, for the fixed-point implementation when computing both, drop
the 3 most significant bits for lookup in the table, then use those 3
MSBs to adjust the lookup value for the full period. Fixed point, in
addition to being cheaper to implement on an FPGA, has some nice
advantages over floating point for representing phase (and frequency)
values, including that it wraps naturally and has a constant resolution
over 0 to 2pi.
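As a sketch of the symmetry trick in the simpler sine-only case (a quarter period stored, so the top 2 bits of the fixed-point phase word select the quadrant; all word lengths here are arbitrary illustrative choices, and the remaining phase LSBs are simply truncated rather than interpolated):

```python
import math

PHASE_BITS = 16                      # fixed-point phase: [0, 2*pi) maps to [0, 2^16)
QT_BITS = 6                          # 64-entry table covering only the first quarter
QTABLE = [math.sin(0.5 * math.pi * i / (1 << QT_BITS)) for i in range(1 << QT_BITS)]

def sin_quarter(phase):
    """Sine from a quarter-wave table: the 2 phase MSBs select the quadrant,
    the next QT_BITS bits index the table."""
    phase &= (1 << PHASE_BITS) - 1               # phase wraps naturally
    quadrant = phase >> (PHASE_BITS - 2)
    idx = (phase >> (PHASE_BITS - 2 - QT_BITS)) & ((1 << QT_BITS) - 1)
    if quadrant & 1:                             # 2nd and 4th quadrants: mirror index
        idx = ((1 << QT_BITS) - 1) - idx
    val = QTABLE[idx]
    return -val if quadrant & 2 else val         # 3rd and 4th quadrants: negate
```

The one's-complement index mirror is the usual hardware shortcut; it introduces a small fixed phase offset that disappears into the truncation error.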
Finally, the lookup table plus interpolation approach can easily run at
full FPGA clock rates. I implemented a DDS recently that runs at the
system clock rate of 312.5 MHz on a Xilinx MPSoC. I think it could run
quite a bit faster too, but I haven't tried. Also, when it matters, this
approach has a much lower latency than the CORDIC. I'm not convinced
CORDICs have much use in DDS implementations when you have embedded
multipliers available, given how cheap (in terms of resources) and
performant LUT + interpolation techniques are.
Matt
Am 2023-12-18 16:14, schrieb Matt Huszagh via time-nuts:
I'd sign all of this.
I published a sine/cos table some 12 years ago in pure VHDL at
< https://opencores.org/projects/sincos >. It does all the mirroring
and pipelining automagically and delivers sine AND cosine at the same time
without doubling the resources. IIRC it ran at 200 MHz without tuning on a
Spartan 6 eval board. Completely portable, no Xilinx IP required (but
synthesis-friendly). The delays of a CORDIC are highly unwelcome in an
all-digital PLL or Costas loop, and for mixed A/D systems, the DAC and its
reference voltage are usually the limit.
With hardwired carry chains, DSP48 blocks, and block RAMs, long word
lengths have lost their horror.
In 1985, I built this 200 MSPS signal averager. It took 8 pipelines in
parallel, each 2 dozen stages deep. We have come a long way since then.
< https://www.flickr.com/photos/137684711@N07/52758369518/in/dateposted-public/ >
The board replaces everything blurred in the background.
Attila, if you see this, that was at Fraunhofer, 300 meters downhill
from where you work now. :-)
regards, Gerhard
I've generally used DDS CORDIC implementations in microcontrollers and
1/8-table storage in the FPGA... it very much depends on user
requirements, I think.
Good to have a DDS discussion. Not quite my problem here but always very
enjoyable to hear what other people do.
Matt, 312 MHz should be a walk in the park for MPSoC silicon if the
sequential element doesn't go more than 3 or 4 levels deep..... Matt, we
should have an off-list discussion on Efinix. DSPs run at 1000 MHz without
breaking a sweat (but...).
On 19/12/2023 2:14 am, Matt Huszagh via time-nuts wrote:
Stewart Cobb via time-nuts time-nuts@lists.febo.com writes:
Hi,
I generate sin/cos without a table by rotating complex numbers:
(a+ib)*(u+iv)
a, b, u, v may be 16-bit integers; for most applications this proved
sufficient for me. a is the cosine component, b is the sine component.
(a+ib) is rotated by the angle of (u+iv). For integer calculation,
|u+iv| = sqrt(u*u+v*v) is normalized to, say, 2^16, and the product is
followed by a right shift of 16 bits. It is necessary to watch
|a+ib| = sqrt(a*a+b*b), because the vector length shrinks or expands
due to calculation noise.
Cheers Detlef
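Detlef's recurrence in a floating-point Python sketch, including the renormalization step he mentions (the function name and parameters are illustrative, and floating point stands in for his 16-bit integer arithmetic):

```python
import math

def rotation_oscillator(freq_frac, n):
    """Generate n (cos, sin) samples by repeatedly multiplying (a+ib) by the
    fixed rotation step (u+iv). freq_frac is the frequency in cycles/sample."""
    u = math.cos(2.0 * math.pi * freq_frac)   # rotation step u + i*v
    v = math.sin(2.0 * math.pi * freq_frac)
    a, b = 1.0, 0.0                           # start at angle 0
    out = []
    for _ in range(n):
        out.append((a, b))
        a, b = a * u - b * v, a * v + b * u   # (a+ib) * (u+iv)
        mag = math.sqrt(a * a + b * b)        # without renormalization, rounding
        a, b = a / mag, b / mag               # noise makes the vector length drift
    return out
```

In fixed-point hardware the full square-root normalization is usually replaced by a cheaper correction, but the principle is the same: keep |a+ib| pinned to full scale.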
Generically, this is known as the CORDIC method.
There are some techniques for managing the truncation error and magnitude that Detlef mentions, but I can’t remember them off the top of my head. I’m sure they’re in the literature. The trick is in making them simple and deterministic in time.
On Fri, 29 Dec 2023 12:28:20 +0100, dschuecker via time-nuts time-nuts@lists.febo.com wrote:
https://dspguru.com/dsp/faqs/cordic/
Andy
www.g4jnt.com
On Sat, 30 Dec 2023 at 20:44, Jim Lux via time-nuts <
time-nuts@lists.febo.com> wrote:
Yes, but as I wrote in the previous article, the delays of the CORDIC
are a PITA if you want a fast tunable digital oscillator for a snappy
Costas loop or such. Cordic shines when you have virtually no hardware
but shifters and adders. Those times are gone. Nowadays you get up to
2.6 GMACs in a Xilinx/AMD Zynq, and that is a low-end chip family.
One or two ARM Cortex-A9 processors are thrown in as well.
I forgot to mention that in "James A. Crawford: Frequency Synthesizer
Design Handbook" (Artech House), the Sunderland technique is described.
It divides the lookup table into two smaller ROMs, which gives a
chip-area advantage of 12 up to 50 times and costs just another tiny
adder.
I just dug out the original:
< https://ieeexplore.ieee.org/document/1052173 > behind the IEEE
paywall of shame, DOI < 10.1109/JSSC.1984.1052173 >, or
< https://bothonce.com/10.1109/jssc.1984.1052173 > for the bad boys. :-)
He writes about a 45 Kbit ROM that is reduced to two ROMs with
4 address bits each, 2816+1024 stored bits in toto. Quite modest.
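To make the trick concrete, here is a rough Python model of the Sunderland decomposition. The 12-bit quarter-wave phase split 4/4/4 and the ROM shapes are my guesses at a typical configuration, not the exact ROMs from the paper:

```python
import math

A_BITS, B_BITS, C_BITS = 4, 4, 4      # quarter-wave phase split coarse to fine
P_BITS = A_BITS + B_BITS + C_BITS     # 12-bit phase covering the first quadrant

def angle(p):
    """Map a 12-bit phase value onto [0, pi/2)."""
    return 0.5 * math.pi * p / (1 << P_BITS)

# Coarse ROM: sin(A + B), addressed by the top 8 phase bits.
COARSE = [math.sin(angle(ab << C_BITS)) for ab in range(1 << (A_BITS + B_BITS))]
# Fine ROM: cos(A) * sin(C), addressed by the top 4 and bottom 4 phase bits.
FINE = [[math.cos(angle(a << (B_BITS + C_BITS))) * math.sin(angle(c))
         for c in range(1 << C_BITS)] for a in range(1 << A_BITS)]

def sunderland_sin(phase):
    """sin(A+B+C) ~ sin(A+B) + cos(A)*sin(C): two small ROM reads, one add."""
    a = phase >> (B_BITS + C_BITS)
    ab = phase >> C_BITS
    c = phase & ((1 << C_BITS) - 1)
    return COARSE[ab] + FINE[a][c]
```

The approximation drops the tiny cos(C) and cos(A+B)-vs-cos(A) corrections, which for this bit split keeps the error comfortably below one part in a thousand.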
OMG. That reminds me of HP's dynamic n-MOS process that we used as
near-kids at the university.
The limit is now the DAC, if you really want to go back to the
analog domain at all.
Cheers & a happy new year,
Gerhard