It's possible to use Bresenham with two integers 10,000,000 and 32,768 but I
found no way to perform all the 24-bit calculations on an 8-bit PIC quick
enough. Removing the GCD often helps but in this case the accumulator
remains 3-bytes wide.
To generate 32 kHz you have to toggle a pin and calculate if the next toggle
must be 38 or 39 instructions in the future; all the math must occur within
37 instructions. That's why I came up with the binary leap year kind of
algorithm; it's as close to math-less as you can get.
You missed the simple way. Table lookup. :)
The table is only 256 slots long.
That's toggling between 305 and 306 cycles. If your CPU uses N clocks per
instruction, multiply the table size by N.
In hindsight, I'm embarrassed that I didn't see this much sooner.
10,000,000 is 10^7 or 2^7 * 2^5.
32,768 is 2^15. So we need a factor of 2^8 to get back to where we started.
My early introduction to the advantages of table lookup was using Fortran on
an IBM 7094. How do you calculate factorials quickly? The table is only
30-50 slots. Anything bigger generates a floating point overflow.
Here is my hack python code that I had to write to see what's going on:
#!/usr/bin/python2
import sys
Target = 32768
Input = 10000000
table = {}
X = 0
K = 0
oldI = 0
for I in range(0,Input):
X += Target
if X >= Input:
X -= Input
print "%5d %7d %5d %3d" % (K, I, X, I-oldI)
if table.has_key(X):
print "Found in table:", X
sys.exit(0)
table[X] = K
K += 1
oldI = I
--
These are my opinions, not necessarily my employer's. I hate spam.
On Thu, Feb 2, 2012 at 7:16 PM, Hal Murray hmurray@megapathdsl.net wrote:
It's possible to use Bresenham with two integers 10,000,000 and 32,768
but I
found no way to perform all the 24-bit calculations on an 8-bit PIC quick
enough. Removing the GCD often helps but in this case the accumulator
remains 3-bytes wide.
To generate 32 kHz you have to toggle a pin and calculate if the next
toggle
must be 38 or 39 instructions in the future; all the math must occur
within
37 instructions. That's why I came up with the binary leap year kind of
algorithm; it's as close to math-less as you can get.
You missed the simple way. Table lookup. :)
The table is only 256 slots long.
That's toggling between 305 and 306 cycles. If your CPU uses N clocks per
instruction, multiply the table size by N.
Well, I thought table lookup too, but I figured a 2048 x 1 table. Easily
done with a rotating bit and 256 byte table.
Assuming clocking a PIC at 10MHz, you have 2,500,000 instructions per
second. Since there was talk about time to the next toggle, we have
2,500,000/65536 instructions between toggles, ie 38.1470... instructions.
The fraction turns out to be 301/2048, so you have to distribute 301 extra
instructions over every 2048 half-periods of the 32768Hz waveform.
Here's what I would do in a mix of C and asm:
unsigned char bitmask = 0x80;
unsigned char index = 0xFF;
unsigned char table[256] = { // Calculate using a spreadsheet or similar };
bit OutputBit;
asm {
loop:
BCF STATUS,C
RLF bitmask,F
BTFSS STATUS,C
GOTO IndexOK
RLF bitmask,F ; restore low bit from carry
INCF index,W ; on to the next byte in the table
GOTO DoLookup
IndexOK:
NOP ; equalize time in if/else cases
NOP
MOVF index,W
DoLookup:
CALL TableLookup ; Not defined here, returns value in W
ANDWF bitmask,W
BTFSS STATUS,Z
GOTO ExtraCycle ; 1 cycle if skipped, 2 if executed
ExtraCycle:
}
// Extra delay to get to 38/39 instructions (about 20 instructions if I
counted right)
OutputBit ^= 1; ; Toggle output
goto loop;
This version rotates the mask each time through and increments the index
every 8 times through. You could increment the index each time through and
rotate the mask when the index rolls over. That makes calculating the
table harder though.
No doubt I got the sense of the skips wrong or miscounted instructions
somewhere!
Orin.
On 3 Feb, 2012, at 14:15 , Orin Eman wrote:
On Thu, Feb 2, 2012 at 7:16 PM, Hal Murray hmurray@megapathdsl.net wrote:
It's possible to use Bresenham with two integers 10,000,000 and 32,768
but I
found no way to perform all the 24-bit calculations on an 8-bit PIC quick
enough. Removing the GCD often helps but in this case the accumulator
remains 3-bytes wide.
To generate 32 kHz you have to toggle a pin and calculate if the next
toggle
must be 38 or 39 instructions in the future; all the math must occur
within
37 instructions. That's why I came up with the binary leap year kind of
algorithm; it's as close to math-less as you can get.
You missed the simple way. Table lookup. :)
The table is only 256 slots long.
That's toggling between 305 and 306 cycles. If your CPU uses N clocks per
instruction, multiply the table size by N.
Well, I thought table lookup too, but I figured a 2048 x 1 table. Easily
done with a rotating bit and 256 byte table.
Assuming clocking a PIC at 10MHz, you have 2,500,000 instructions per
second. Since there was talk about time to the next toggle, we have
2,500,000/65536 instructions between toggles, ie 38.1470... instructions.
The fraction turns out to be 301/2048, so you have to distribute 301 extra
instructions over every 2048 half-periods of the 32768Hz waveform.
I only barely know the instruction set on those processors, but it seems
like it should be way easier than that. You know it is going to be 38 or
39 instructions, so that only question is when it should be 39. The value
of 2500000/65536 is 38.1470… in decimal, but in hex it is exactly 26.25a;
that is the 0x26 is 38 decimal while the fractional part is only 10 bits
long. This means you should be able to compute when the extra cycle is
required by keeping a 16 bit accumulator to which the fractional part
0x25a0 is added at every change and executing the extra instruction when
there is a carry out of that. The seems straight forward. If lo' and hi'
are the two halves of the accumulator then the working part of this becomes
something like (excusing my PIC assembler, which I mostly forget):
movl 0xa0,w // low byte of increment into w
add w,lo // add w to lo, may set carry
movl 0x25,w // high byte of increment into w
btfsc 3,0 // skip next if carry clear
add one,w // increment w by one; I'm not sure how to do that
add w,hi // add w to hi, may set carry
// if carry set here need extra instruction. Maybe this does it?
btfss 3,0 // skip if carry set
goto blorp // carry clear, don't execute next instruction
nop // the extra instruction
blorp:
// enough instructions more to make 38/39
Maybe someone who knows what they're doing can interpret that?
Dennis Ferguson
On Fri, Feb 3, 2012 at 1:08 AM, Dennis Ferguson <dennis.c.ferguson@gmail.com
wrote:
On 3 Feb, 2012, at 14:15 , Orin Eman wrote:
On Thu, Feb 2, 2012 at 7:16 PM, Hal Murray hmurray@megapathdsl.net
wrote:
It's possible to use Bresenham with two integers 10,000,000 and 32,768
but I
found no way to perform all the 24-bit calculations on an 8-bit PIC
quick
enough. Removing the GCD often helps but in this case the accumulator
remains 3-bytes wide.
To generate 32 kHz you have to toggle a pin and calculate if the next
toggle
must be 38 or 39 instructions in the future; all the math must occur
within
37 instructions. That's why I came up with the binary leap year kind of
algorithm; it's as close to math-less as you can get.
You missed the simple way. Table lookup. :)
The table is only 256 slots long.
That's toggling between 305 and 306 cycles. If your CPU uses N clocks
per
instruction, multiply the table size by N.
Well, I thought table lookup too, but I figured a 2048 x 1 table.
Easily
done with a rotating bit and 256 byte table.
Assuming clocking a PIC at 10MHz, you have 2,500,000 instructions per
second. Since there was talk about time to the next toggle, we have
2,500,000/65536 instructions between toggles, ie 38.1470... instructions.
The fraction turns out to be 301/2048, so you have to distribute 301
extra
instructions over every 2048 half-periods of the 32768Hz waveform.
I only barely know the instruction set on those processors, but it seems
like it should be way easier than that. You know it is going to be 38 or
39 instructions, so that only question is when it should be 39. The value
of 2500000/65536 is 38.1470… in decimal, but in hex it is exactly 26.25a;
that is the 0x26 is 38 decimal while the fractional part is only 10 bits
long. This means you should be able to compute when the extra cycle is
required by keeping a 16 bit accumulator to which the fractional part
0x25a0 is added at every change and executing the extra instruction when
there is a carry out of that. The seems straight forward. If lo' and hi'
are the two halves of the accumulator then the working part of this becomes
something like (excusing my PIC assembler, which I mostly forget):
movl 0xa0,w // low byte of increment into w
add w,lo // add w to lo, may set carry
movl 0x25,w // high byte of increment into w
btfsc 3,0 // skip next if carry clear
add one,w // increment w by one; I'm not sure how to do that
add w,hi // add w to hi, may set carry
// if carry set here need extra instruction. Maybe this does it?
btfss 3,0 // skip if carry set
goto blorp // carry clear, don't execute next instruction
nop // the extra instruction
blorp:
// enough instructions more to make 38/39
Maybe someone who knows what they're doing can interpret that?
Here you go:
movlw 0xa0
addwf lo,f
movlw 0x25
btfsc status,C
addlw 1 ; or movlw 0x26
addwf hi,f
btfss status,C
goto skip
nop
nop
skip:
Takes 9 or 10 instruction cycles. You needed an extra nop as the goto
takes two cycles, one cycle if skipped. I'd just reverse the sense as
follows:
btfsc status,C
goto skip
skip:
If carry is clear, skipping the goto takes one instruction cycle. If carry
is set, executing the goto takes two instruction cycles. Yes, the goto is
to the next inline instruction, but as far as I know, the PIC doesn't
optimise that. The entire sequence in this case would take 8 or 9
instruction cycles.
I'd forgotten the trick of using binary fixed point arithmetic with the
fractional part strategically placed at a byte boundary! I've used a
similar trick in the past to convert a binary fraction into decimal digits
Orin.