HM

Hal Murray

Fri, Feb 3, 2012 3:16 AM

It's possible to use Bresenham with two integers 10,000,000 and 32,768 but I
found no way to perform all the 24-bit calculations on an 8-bit PIC quick
enough. Removing the GCD often helps but in this case the accumulator
remains 3-bytes wide.

To generate 32 kHz you have to toggle a pin and calculate if the next toggle
must be 38 or 39 instructions in the future; all the math must occur within
37 instructions. That's why I came up with the binary leap year kind of
algorithm; it's as close to math-less as you can get.

You missed the simple way. Table lookup. :)

The table is only 256 slots long.

That's toggling between 305 and 306 cycles. If your CPU uses N clocks per
instruction, multiply the table size by N.

In hindsight, I'm embarrassed that I didn't see this much sooner.

10,000,000 is 10^7 or 2^7 * 2^5.

32,768 is 2^15. So we need a factor of 2^8 to get back to where we started.

My early introduction to the advantages of table lookup was using Fortran on
an IBM 7094. How do you calculate factorials quickly? The table is only
30-50 slots. Anything bigger generates a floating point overflow.

Here is my hack python code that I had to write to see what's going on:

#!/usr/bin/python2

Given 10 MHz, target is 32 KHz

What sequence of DDS steps is required?

How long is the sequence before it repeats.

import sys

Target = 32768
Input = 10000000

table = {}

X = 0
K = 0
oldI = 0

for I in range(0,Input):
X += Target
if X >= Input:
X -= Input
print "%5d %7d %5d %3d" % (K, I, X, I-oldI)
if table.has_key(X):
print "Found in table:", X
sys.exit(0)
table[X] = K
K += 1
oldI = I

--
These are my opinions, not necessarily my employer's. I hate spam.

> It's possible to use Bresenham with two integers 10,000,000 and 32,768 but I > found no way to perform all the 24-bit calculations on an 8-bit PIC quick > enough. Removing the GCD often helps but in this case the accumulator > remains 3-bytes wide. > To generate 32 kHz you have to toggle a pin and calculate if the next toggle > must be 38 or 39 instructions in the future; all the math must occur within > 37 instructions. That's why I came up with the binary leap year kind of > algorithm; it's as close to math-less as you can get. You missed the simple way. Table lookup. :) The table is only 256 slots long. That's toggling between 305 and 306 cycles. If your CPU uses N clocks per instruction, multiply the table size by N. In hindsight, I'm embarrassed that I didn't see this much sooner. 10,000,000 is 10^7 or 2^7 * 2^5. 32,768 is 2^15. So we need a factor of 2^8 to get back to where we started. --------- My early introduction to the advantages of table lookup was using Fortran on an IBM 7094. How do you calculate factorials quickly? The table is only 30-50 slots. Anything bigger generates a floating point overflow. --------- Here is my hack python code that I had to write to see what's going on: #!/usr/bin/python2 # Given 10 MHz, target is 32 KHz # What sequence of DDS steps is required? # How long is the sequence before it repeats. import sys Target = 32768 Input = 10000000 table = {} X = 0 K = 0 oldI = 0 for I in range(0,Input): X += Target if X >= Input: X -= Input print "%5d %7d %5d %3d" % (K, I, X, I-oldI) if table.has_key(X): print "Found in table:", X sys.exit(0) table[X] = K K += 1 oldI = I -- These are my opinions, not necessarily my employer's. I hate spam.

OE

Orin Eman

Fri, Feb 3, 2012 6:15 AM

On Thu, Feb 2, 2012 at 7:16 PM, Hal Murray hmurray@megapathdsl.net wrote:

It's possible to use Bresenham with two integers 10,000,000 and 32,768

but I

found no way to perform all the 24-bit calculations on an 8-bit PIC quick
enough. Removing the GCD often helps but in this case the accumulator
remains 3-bytes wide.

To generate 32 kHz you have to toggle a pin and calculate if the next

toggle

must be 38 or 39 instructions in the future; all the math must occur

within

37 instructions. That's why I came up with the binary leap year kind of
algorithm; it's as close to math-less as you can get.

You missed the simple way. Table lookup. :)

The table is only 256 slots long.

That's toggling between 305 and 306 cycles. If your CPU uses N clocks per
instruction, multiply the table size by N.

Well, I thought table lookup too, but I figured a 2048 x 1 table. Easily
done with a rotating bit and 256 byte table.

Assuming clocking a PIC at 10MHz, you have 2,500,000 instructions per
second. Since there was talk about time to the next toggle, we have
2,500,000/65536 instructions between toggles, ie 38.1470... instructions.
The fraction turns out to be 301/2048, so you have to distribute 301 extra
instructions over every 2048 half-periods of the 32768Hz waveform.

Here's what I would do in a mix of C and asm:

unsigned char bitmask = 0x80;
unsigned char index = 0xFF;
unsigned char table[256] = { // Calculate using a spreadsheet or similar };
bit OutputBit;

asm {
loop:
BCF STATUS,C
RLF bitmask,F
BTFSS STATUS,C
GOTO IndexOK
RLF bitmask,F ; restore low bit from carry
INCF index,W ; on to the next byte in the table
GOTO DoLookup
IndexOK:
NOP ; equalize time in if/else cases
NOP
MOVF index,W
DoLookup:
CALL TableLookup ; Not defined here, returns value in W

ANDWF bitmask,W
BTFSS STATUS,Z
GOTO  ExtraCycle        ; 1 cycle if skipped, 2 if executed

ExtraCycle:
}
// Extra delay to get to 38/39 instructions (about 20 instructions if I
counted right)

OutputBit ^= 1;             ; Toggle output
goto loop;

This version rotates the mask each time through and increments the index
every 8 times through. You could increment the index each time through and
rotate the mask when the index rolls over. That makes calculating the
table harder though.

No doubt I got the sense of the skips wrong or miscounted instructions
somewhere!

Orin.

On Thu, Feb 2, 2012 at 7:16 PM, Hal Murray <hmurray@megapathdsl.net> wrote: > > > It's possible to use Bresenham with two integers 10,000,000 and 32,768 > but I > > found no way to perform all the 24-bit calculations on an 8-bit PIC quick > > enough. Removing the GCD often helps but in this case the accumulator > > remains 3-bytes wide. > > > To generate 32 kHz you have to toggle a pin and calculate if the next > toggle > > must be 38 or 39 instructions in the future; all the math must occur > within > > 37 instructions. That's why I came up with the binary leap year kind of > > algorithm; it's as close to math-less as you can get. > > You missed the simple way. Table lookup. :) > > The table is only 256 slots long. > > That's toggling between 305 and 306 cycles. If your CPU uses N clocks per > instruction, multiply the table size by N. > Well, I thought table lookup too, but I figured a 2048 x 1 table. Easily done with a rotating bit and 256 byte table. Assuming clocking a PIC at 10MHz, you have 2,500,000 instructions per second. Since there was talk about time to the next toggle, we have 2,500,000/65536 instructions between toggles, ie 38.1470... instructions. The fraction turns out to be 301/2048, so you have to distribute 301 extra instructions over every 2048 half-periods of the 32768Hz waveform. Here's what I would do in a mix of C and asm: unsigned char bitmask = 0x80; unsigned char index = 0xFF; unsigned char table[256] = { // Calculate using a spreadsheet or similar }; bit OutputBit; asm { loop: BCF STATUS,C RLF bitmask,F BTFSS STATUS,C GOTO IndexOK RLF bitmask,F ; restore low bit from carry INCF index,W ; on to the next byte in the table GOTO DoLookup IndexOK: NOP ; equalize time in if/else cases NOP MOVF index,W DoLookup: CALL TableLookup ; Not defined here, returns value in W ANDWF bitmask,W BTFSS STATUS,Z GOTO ExtraCycle ; 1 cycle if skipped, 2 if executed ExtraCycle: } // Extra delay to get to 38/39 instructions (about 20 instructions if I counted right) OutputBit ^= 1; ; Toggle output goto loop; This version rotates the mask each time through and increments the index every 8 times through. You could increment the index each time through and rotate the mask when the index rolls over. That makes calculating the table harder though. No doubt I got the sense of the skips wrong or miscounted instructions somewhere! Orin.

DF

Dennis Ferguson

Fri, Feb 3, 2012 9:08 AM

On 3 Feb, 2012, at 14:15 , Orin Eman wrote:

On Thu, Feb 2, 2012 at 7:16 PM, Hal Murray hmurray@megapathdsl.net wrote:

It's possible to use Bresenham with two integers 10,000,000 and 32,768

but I

found no way to perform all the 24-bit calculations on an 8-bit PIC quick
enough. Removing the GCD often helps but in this case the accumulator
remains 3-bytes wide.

To generate 32 kHz you have to toggle a pin and calculate if the next

toggle

must be 38 or 39 instructions in the future; all the math must occur

within

37 instructions. That's why I came up with the binary leap year kind of
algorithm; it's as close to math-less as you can get.

You missed the simple way. Table lookup. :)

The table is only 256 slots long.

That's toggling between 305 and 306 cycles. If your CPU uses N clocks per
instruction, multiply the table size by N.

Well, I thought table lookup too, but I figured a 2048 x 1 table. Easily
done with a rotating bit and 256 byte table.

Assuming clocking a PIC at 10MHz, you have 2,500,000 instructions per
second. Since there was talk about time to the next toggle, we have
2,500,000/65536 instructions between toggles, ie 38.1470... instructions.
The fraction turns out to be 301/2048, so you have to distribute 301 extra
instructions over every 2048 half-periods of the 32768Hz waveform.

I only barely know the instruction set on those processors, but it seems
like it should be way easier than that. You know it is going to be 38 or
39 instructions, so that only question is when it should be 39. The value
of 2500000/65536 is 38.1470… in decimal, but in hex it is exactly 26.25a;
that is the 0x26 is 38 decimal while the fractional part is only 10 bits
long. This means you should be able to compute when the extra cycle is
required by keeping a 16 bit accumulator to which the fractional part
0x25a0 is added at every change and executing the extra instruction when
there is a carry out of that. The seems straight forward. If lo' and hi'
are the two halves of the accumulator then the working part of this becomes
something like (excusing my PIC assembler, which I mostly forget):

movl    0xa0,w      // low byte of increment into w
add     w,lo        // add w to lo, may set carry
movl    0x25,w      // high byte of increment into w
btfsc   3,0         // skip next if carry clear
add     one,w       // increment w by one; I'm not sure how to do that
add     w,hi        // add w to hi, may set carry
// if carry set here need extra instruction.  Maybe this does it?
btfss   3,0         // skip if carry set
goto    blorp       // carry clear, don't execute next instruction
nop                 // the extra instruction

blorp:
// enough instructions more to make 38/39

Maybe someone who knows what they're doing can interpret that?

Dennis Ferguson

On 3 Feb, 2012, at 14:15 , Orin Eman wrote: > On Thu, Feb 2, 2012 at 7:16 PM, Hal Murray <hmurray@megapathdsl.net> wrote: > >> >>> It's possible to use Bresenham with two integers 10,000,000 and 32,768 >> but I >>> found no way to perform all the 24-bit calculations on an 8-bit PIC quick >>> enough. Removing the GCD often helps but in this case the accumulator >>> remains 3-bytes wide. >> >>> To generate 32 kHz you have to toggle a pin and calculate if the next >> toggle >>> must be 38 or 39 instructions in the future; all the math must occur >> within >>> 37 instructions. That's why I came up with the binary leap year kind of >>> algorithm; it's as close to math-less as you can get. >> >> You missed the simple way. Table lookup. :) >> >> The table is only 256 slots long. >> >> That's toggling between 305 and 306 cycles. If your CPU uses N clocks per >> instruction, multiply the table size by N. >> > > > > Well, I thought table lookup too, but I figured a 2048 x 1 table. Easily > done with a rotating bit and 256 byte table. > > > Assuming clocking a PIC at 10MHz, you have 2,500,000 instructions per > second. Since there was talk about time to the next toggle, we have > 2,500,000/65536 instructions between toggles, ie 38.1470... instructions. > The fraction turns out to be 301/2048, so you have to distribute 301 extra > instructions over every 2048 half-periods of the 32768Hz waveform. I only barely know the instruction set on those processors, but it seems like it should be way easier than that. You know it is going to be 38 or 39 instructions, so that only question is when it should be 39. The value of 2500000/65536 is 38.1470… in decimal, but in hex it is exactly 26.25a; that is the 0x26 is 38 decimal while the fractional part is only 10 bits long. This means you should be able to compute when the extra cycle is required by keeping a 16 bit accumulator to which the fractional part 0x25a0 is added at every change and executing the extra instruction when there is a carry out of that. The seems straight forward. If `lo' and `hi' are the two halves of the accumulator then the working part of this becomes something like (excusing my PIC assembler, which I mostly forget): movl 0xa0,w // low byte of increment into w add w,lo // add w to lo, may set carry movl 0x25,w // high byte of increment into w btfsc 3,0 // skip next if carry clear add one,w // increment w by one; I'm not sure how to do that add w,hi // add w to hi, may set carry // if carry set here need extra instruction. Maybe this does it? btfss 3,0 // skip if carry set goto blorp // carry clear, don't execute next instruction nop // the extra instruction blorp: // enough instructions more to make 38/39 Maybe someone who knows what they're doing can interpret that? Dennis Ferguson

OE

Orin Eman

Fri, Feb 3, 2012 11:16 PM

On Fri, Feb 3, 2012 at 1:08 AM, Dennis Ferguson <dennis.c.ferguson@gmail.com

wrote:

On 3 Feb, 2012, at 14:15 , Orin Eman wrote:

On Thu, Feb 2, 2012 at 7:16 PM, Hal Murray hmurray@megapathdsl.net

wrote:

It's possible to use Bresenham with two integers 10,000,000 and 32,768

but I

found no way to perform all the 24-bit calculations on an 8-bit PIC

quick

enough. Removing the GCD often helps but in this case the accumulator
remains 3-bytes wide.

To generate 32 kHz you have to toggle a pin and calculate if the next

toggle

must be 38 or 39 instructions in the future; all the math must occur

within

37 instructions. That's why I came up with the binary leap year kind of
algorithm; it's as close to math-less as you can get.

You missed the simple way. Table lookup. :)

The table is only 256 slots long.

That's toggling between 305 and 306 cycles. If your CPU uses N clocks

per

instruction, multiply the table size by N.

Well, I thought table lookup too, but I figured a 2048 x 1 table.

Easily

done with a rotating bit and 256 byte table.

Assuming clocking a PIC at 10MHz, you have 2,500,000 instructions per
second. Since there was talk about time to the next toggle, we have
2,500,000/65536 instructions between toggles, ie 38.1470... instructions.
The fraction turns out to be 301/2048, so you have to distribute 301

extra

instructions over every 2048 half-periods of the 32768Hz waveform.

I only barely know the instruction set on those processors, but it seems
like it should be way easier than that. You know it is going to be 38 or
39 instructions, so that only question is when it should be 39. The value
of 2500000/65536 is 38.1470… in decimal, but in hex it is exactly 26.25a;
that is the 0x26 is 38 decimal while the fractional part is only 10 bits
long. This means you should be able to compute when the extra cycle is
required by keeping a 16 bit accumulator to which the fractional part
0x25a0 is added at every change and executing the extra instruction when
there is a carry out of that. The seems straight forward. If lo' and hi'
are the two halves of the accumulator then the working part of this becomes
something like (excusing my PIC assembler, which I mostly forget):

movl    0xa0,w      // low byte of increment into w
add     w,lo        // add w to lo, may set carry
movl    0x25,w      // high byte of increment into w
btfsc   3,0         // skip next if carry clear
add     one,w       // increment w by one; I'm not sure how to do that
add     w,hi        // add w to hi, may set carry
// if carry set here need extra instruction.  Maybe this does it?
btfss   3,0         // skip if carry set
goto    blorp       // carry clear, don't execute next instruction
nop                 // the extra instruction

blorp:
// enough instructions more to make 38/39

Maybe someone who knows what they're doing can interpret that?

Here you go:

 movlw     0xa0
 addwf      lo,f
 movlw     0x25
 btfsc      status,C
 addlw     1            ; or movlw 0x26
 addwf     hi,f
 btfss      status,C
 goto       skip
 nop
 nop

skip:

Takes 9 or 10 instruction cycles. You needed an extra nop as the goto
takes two cycles, one cycle if skipped. I'd just reverse the sense as
follows:

btfsc      status,C
goto       skip

skip:

If carry is clear, skipping the goto takes one instruction cycle. If carry
is set, executing the goto takes two instruction cycles. Yes, the goto is
to the next inline instruction, but as far as I know, the PIC doesn't
optimise that. The entire sequence in this case would take 8 or 9
instruction cycles.

I'd forgotten the trick of using binary fixed point arithmetic with the
fractional part strategically placed at a byte boundary! I've used a
similar trick in the past to convert a binary fraction into decimal digits

keep multiplying by 10 (a couple of shifts plus an add), pick up the
non-fractional bits as the next decimal digit, then truncate to the
remaining fractional bits.

Orin.

On Fri, Feb 3, 2012 at 1:08 AM, Dennis Ferguson <dennis.c.ferguson@gmail.com > wrote: > > On 3 Feb, 2012, at 14:15 , Orin Eman wrote: > > > On Thu, Feb 2, 2012 at 7:16 PM, Hal Murray <hmurray@megapathdsl.net> > wrote: > > > >> > >>> It's possible to use Bresenham with two integers 10,000,000 and 32,768 > >> but I > >>> found no way to perform all the 24-bit calculations on an 8-bit PIC > quick > >>> enough. Removing the GCD often helps but in this case the accumulator > >>> remains 3-bytes wide. > >> > >>> To generate 32 kHz you have to toggle a pin and calculate if the next > >> toggle > >>> must be 38 or 39 instructions in the future; all the math must occur > >> within > >>> 37 instructions. That's why I came up with the binary leap year kind of > >>> algorithm; it's as close to math-less as you can get. > >> > >> You missed the simple way. Table lookup. :) > >> > >> The table is only 256 slots long. > >> > >> That's toggling between 305 and 306 cycles. If your CPU uses N clocks > per > >> instruction, multiply the table size by N. > >> > > > > > > > > Well, I thought table lookup too, but I figured a 2048 x 1 table. > Easily > > done with a rotating bit and 256 byte table. > > > > > > Assuming clocking a PIC at 10MHz, you have 2,500,000 instructions per > > second. Since there was talk about time to the next toggle, we have > > 2,500,000/65536 instructions between toggles, ie 38.1470... instructions. > > The fraction turns out to be 301/2048, so you have to distribute 301 > extra > > instructions over every 2048 half-periods of the 32768Hz waveform. > > I only barely know the instruction set on those processors, but it seems > like it should be way easier than that. You know it is going to be 38 or > 39 instructions, so that only question is when it should be 39. The value > of 2500000/65536 is 38.1470… in decimal, but in hex it is exactly 26.25a; > that is the 0x26 is 38 decimal while the fractional part is only 10 bits > long. This means you should be able to compute when the extra cycle is > required by keeping a 16 bit accumulator to which the fractional part > 0x25a0 is added at every change and executing the extra instruction when > there is a carry out of that. The seems straight forward. If `lo' and `hi' > are the two halves of the accumulator then the working part of this becomes > something like (excusing my PIC assembler, which I mostly forget): > > movl 0xa0,w // low byte of increment into w > add w,lo // add w to lo, may set carry > movl 0x25,w // high byte of increment into w > btfsc 3,0 // skip next if carry clear > add one,w // increment w by one; I'm not sure how to do that > add w,hi // add w to hi, may set carry > // if carry set here need extra instruction. Maybe this does it? > btfss 3,0 // skip if carry set > goto blorp // carry clear, don't execute next instruction > nop // the extra instruction > blorp: > // enough instructions more to make 38/39 > > Maybe someone who knows what they're doing can interpret that? > Here you go: movlw 0xa0 addwf lo,f movlw 0x25 btfsc status,C addlw 1 ; or movlw 0x26 addwf hi,f btfss status,C goto skip nop nop skip: Takes 9 or 10 instruction cycles. You needed an extra nop as the goto takes two cycles, one cycle if skipped. I'd just reverse the sense as follows: btfsc status,C goto skip skip: If carry is clear, skipping the goto takes one instruction cycle. If carry is set, executing the goto takes two instruction cycles. Yes, the goto is to the next inline instruction, but as far as I know, the PIC doesn't optimise that. The entire sequence in this case would take 8 or 9 instruction cycles. I'd forgotten the trick of using binary fixed point arithmetic with the fractional part strategically placed at a byte boundary! I've used a similar trick in the past to convert a binary fraction into decimal digits - keep multiplying by 10 (a couple of shifts plus an add), pick up the non-fractional bits as the next decimal digit, then truncate to the remaining fractional bits. Orin.

time-nuts@lists.febo.com

Re: [time-nuts] 32768 Hz from 10 MHz

Given 10 MHz, target is 32 KHz

What sequence of DDS steps is required?

How long is the sequence before it repeats.