Empathy List Archives

SAIDJACK@aol.com

Thu, Oct 16, 2008 7:21 PM

Hi there,

Exactly! What about units that jump only once every, say, 12 months or so?

I can't find much fundamental research on this at all, and even Mr. Vig
says the phenomenon is not well understood. But there are (proprietary) ways
to probe for the susceptibility of a particular unit to do this.

We have done some extensive research trying to correlate jumps to external
phenomena (radiation, vibration, thermal effects, etc.). There is no
straightforward correlation.

It's similar to asynchronous switching inside a digital computer. You can
add levels of flip-flops to synchronize across two asynchronous time domains,
but all you are doing is decreasing the probability that a metastable failure
makes it through the flops. Statistically you can never guarantee that there
won't be a failure at all; even if the MTBF is 10 million years by using five
levels of synchronization etc., a failure could actually happen after 10 s of
operation.
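
To put a rough number on that last point, here is a minimal sketch. It assumes failures arrive as a constant-rate (Poisson) process, which is an assumption of this illustration and not something measured in this thread. Under that model the chance of at least one failure within time t is 1 - exp(-t/MTBF): very small for short t, but never zero.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def failure_probability(t_seconds, mtbf_seconds):
    """P(at least one failure within t) for a constant-rate (exponential) model."""
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-t_seconds / mtbf_seconds)


# MTBF figure taken from the text above: 10 million years.
mtbf = 10e6 * SECONDS_PER_YEAR

p_10s = failure_probability(10.0, mtbf)               # first 10 seconds
p_1yr = failure_probability(SECONDS_PER_YEAR, mtbf)   # first year

print(f"P(failure within 10 s)   = {p_10s:.3g}")
print(f"P(failure within 1 year) = {p_1yr:.3g}")
```

The 10-second figure comes out on the order of 3E-14: tiny, but not zero.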

The good news is: there are mitigating factors. Most jumps are in the E-011
or E-010 range, and most applications won't even be affected by such an
extremely small frequency change, and we have GPS to quickly correct the
aberration.

I would not be surprised if your standard computer crystal has jumps in the
range of E-07 that happen all the time, and no one notices.

As Mike said, most specs that require performance of E-010 or better are
somewhat bogus, and don't have real-world requirements that come close to the
paper spec. So we try to improve the state of the art as best as we can.

bye,
Said

In a message dated 10/16/2008 11:31:32 Pacific Daylight Time,
hmurray@megapathdsl.net writes:

> the problem is one of statistics. Not all crystals exhibit this, one
> may find only about 20% or so of a typical manufacturer's OCXO will
> exhibit these jumps, and with varying intensity.

There is another dimension. How often do the jumps happen?

Do the other 80% make jumps if you wait longer?


Tom Van Baak

Thu, Oct 16, 2008 7:36 PM

> I can't find much fundamental research on this at all, and even Mr. Vig
> says the phenomenon is not well understood. But there are (proprietary) ways
> to probe for the susceptibility of a particular unit to do this.

John told me once that an easy way to deal with many classes
of jumps is simply to use a pair of OCXOs. This is an easy solution
for a time-nut (driven by interest and passion), perhaps much
less so for a manufacturer (driven by market and profit).

If we as a group find that unpredictable phase or frequency
jumps are limiting GPSDO performance then that would be a
fun group discussion on how best to implement an N-way XO
GPSDO. If done to the extreme it could also reduce phase
noise as a by-product.
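
As a rough sketch of why an N-way ensemble helps on both counts: assuming N oscillators with equal, uncorrelated noise that are simply averaged with equal weights (an idealization, not a description of any particular GPSDO), noise power drops by a factor of N and a jump in any single unit is diluted to 1/N of its size in the ensemble mean. The function names are made up for this illustration.

```python
import math


def noise_improvement_db(n):
    """Ideal noise-power improvement from averaging n uncorrelated, equal sources."""
    return 10.0 * math.log10(n)


def jump_seen_by_ensemble(jump, n):
    """Size of one unit's frequency jump as it appears in an equal-weight mean of n."""
    return jump / n


for n in (2, 4, 10):
    print(f"N={n:2d}: noise improves by {noise_improvement_db(n):.1f} dB, "
          f"a 1E-10 jump appears as {jump_seen_by_ensemble(1e-10, n):.1e}")
```

Real oscillators sharing a supply, oven, or chassis will be partly correlated, so the actual gain would be smaller than this ideal figure.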

But let's not jump (!) to conclusions that Mike's problem is
necessarily all quartz related. Given how little testing any of
us have done on this particular GPSDO, speculation should
take a back seat to measurement.

/tvb


John Miles

Thu, Oct 16, 2008 9:27 PM

Couple of (somewhat naive) questions here:

> It's similar to asynchronous switching inside a digital computer. You can
> add levels of flip-flops to synchronize across two asynchronous time domains,
> but all you are doing is decreasing the probability that a metastable failure
> makes it through the flops. Statistically you can never guarantee that there
> won't be a failure at all; even if the MTBF is 10 million years by using five
> levels of synchronization etc., a failure could actually happen after 10 s of
> operation.

Well, no, proper domain synchronization doesn't just give you an incremental
advantage. The use of flip-flops between clock domains is done to trade
latency for guaranteed stability. The idea is to isolate the effects of
metastability to a single clock edge that won't be used to clock anything
else. Unless a metastable event somehow lasts more than one clock period
(or half-period) it won't constitute a failure... and that never happens in
practice, in the absence of a hard failure. Correct? Or am I missing
something? (e.g., are we talking cosmic-ray hits, which are much more
likely to affect RAM elements than clock synchronizers?)

> The good news is: there are mitigating factors. Most jumps are in the E-011
> or E-010 range, and most applications won't even be affected by such an
> extremely small frequency change, and we have GPS to quickly correct the
> aberration.

Is it a good idea to tie a crystal to GPS with such a wide loop bandwidth?
GPS-locked loops are usually on the order of t=1 minute, right? Or do you
use 'speedup' tricks to temporarily widen the loop bandwidth when you see a
fast transition? That sounds reasonable as long as there aren't any GPS
propagation aberrations or timing-receiver artifacts that can't be
distinguished from a crystal jump.
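
For a sense of scale on the loop-response question: a small sketch, assuming the jump is a clean step in fractional frequency that stays uncorrected for the whole interval (a simplification, since a real loop starts pulling back before then). The accumulated time error is then just step size times elapsed time.

```python
def time_error(frac_freq_step, seconds):
    """Time error (s) accumulated by an uncorrected fractional frequency step."""
    return frac_freq_step * seconds


# Step sizes from the quoted text; the intervals are illustrative choices.
for step in (1e-11, 1e-10):
    for label, seconds in (("1 minute", 60.0), ("1 hour", 3600.0)):
        err_ns = time_error(step, seconds) * 1e9
        print(f"step {step:.0e} uncorrected for {label}: {err_ns:.2f} ns")
```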

-- john, KE5FX


John Miles

Thu, Oct 16, 2008 9:36 PM

> > I can't find much fundamental research on this at all, and even Mr. Vig
> > says the phenomenon is not well understood. But there are (proprietary)
> > ways to probe for the susceptibility of a particular unit to do this.
>
> John told me once that an easy way to deal with many classes
> of jumps is simply to use a pair of OCXOs. This is an easy solution
> for a time-nut (driven by interest and passion), perhaps much
> less so for a manufacturer (driven by market and profit).

Hmm, it's worth considering how good the secondary oscillator really needs
to be. If the jumps tend to occur once per minute in a cheap oscillator and
once per hour in a good one, and if they're truly uncorrelated, then it
might be reasonable to detect jumps in either oscillator by differentiating
the beat note between the cheap and expensive rocks. A third cheap
oscillator would be used to 'vote' on which one was responsible for the
observed jump.

You would only need oscillators of equal quality if you wanted to improve
ADEV and/or PN performance across the board. Detecting fast excursions
should in principle be much easier.
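
A minimal sketch of that voting idea, on synthetic data. It assumes the three pairwise frequency differences (A-B, A-C, B-C) are already available as sample lists, that jumps are clean steps, and that measurement noise is well below the threshold; the function names and threshold are made up for this illustration.

```python
def step_indices(series, threshold):
    """Indices i where the series changes by more than threshold from sample i-1."""
    return {i for i in range(1, len(series))
            if abs(series[i] - series[i - 1]) > threshold}


def attribute_jumps(d_ab, d_ac, d_bc, threshold):
    """Blame the oscillator common to the two pairs that stepped at the same sample."""
    ab = step_indices(d_ab, threshold)
    ac = step_indices(d_ac, threshold)
    bc = step_indices(d_bc, threshold)
    culprit_for = {
        (True, True, False): "A",   # A-B and A-C moved, B-C did not
        (True, False, True): "B",   # A-B and B-C moved, A-C did not
        (False, True, True): "C",   # A-C and B-C moved, A-B did not
    }
    events = []
    for i in sorted(ab | ac | bc):
        culprit = culprit_for.get((i in ab, i in ac, i in bc))
        if culprit is not None:     # ambiguous patterns are ignored
            events.append((i, culprit))
    return events


# Synthetic fractional-frequency data: B steps by 2E-10 at sample 5.
y_a = [0.0] * 10
y_b = [0.0] * 5 + [2e-10] * 5
y_c = [1e-9] * 10                   # constant offset, no jump

d_ab = [a - b for a, b in zip(y_a, y_b)]
d_ac = [a - c for a, c in zip(y_a, y_c)]
d_bc = [b - c for b, c in zip(y_b, y_c)]

events = attribute_jumps(d_ab, d_ac, d_bc, threshold=1e-10)
print(events)
```

With real data the differences would need filtering first, and the threshold would have to sit above the noise of the worst oscillator in each pair.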

-- john, KE5FX


Lux, James P

Thu, Oct 16, 2008 10:18 PM

> Well, no, proper domain synchronization doesn't just give you
> an incremental advantage. The use of flip-flops between
> clock domains is done to trade latency for guaranteed
> stability. The idea is to isolate the effects of
> metastability to a single clock edge that won't be used to
> clock anything else. Unless a metastable event somehow lasts
> more than one clock period (or half-period) it won't
> constitute a failure... and that never happens in practice,
> in the absence of a hard failure. Correct? Or am I missing
> something? (e.g., are we talking cosmic-ray hits, which are
> much more likely to affect RAM elements than clock synchronizers?)

A flip-flop used as a synchronizer *is* a RAM element subject to upset, albeit one that can be made quite robust with internal redundancy.

Even without TMR or other similar schemes, the probability of upset IS pretty low. However, as Black or Scholes said (I can't remember which), "One should not confuse very low probability with impossible". If it absolutely, positively can't take any hit, then some more work is involved.

James Lux, P.E.
Task Manager, SOMD Software Defined Radios
Flight Communications Systems Section
Jet Propulsion Laboratory
4800 Oak Grove Drive, Mail Stop 161-213
Pasadena, CA, 91109
+1(818)354-2075 phone
+1(818)393-6875 fax


Hal Murray

Thu, Oct 16, 2008 10:40 PM

> We have done some extensive research trying to correlate jumps to
> external phenomena (radiation, vibration, thermal effects, etc.).
> There is no straightforward correlation.

> It's similar to asynchronous switching inside a digital computer.
> You can add levels of flip-flops to synchronize across two
> asynchronous time domains, ...

I think that's misleading in two ways.

First, we understand metastability. We can measure it and predict it. If it
mattered, we could test each individual part.

Second, the failure mode is exponential in a parameter we can control. So
given a particular set of parts to pick from, it's reasonable to make a
design with a probability of error small enough so that other things are much
more important.
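
A minimal sketch of that exponential dependence, using the commonly quoted synchronizer model MTBF = exp(t_r / tau) / (T_w * f_clk * f_data), where t_r is the settling time allowed (presumably the parameter "we can control"). All device numbers below are hypothetical placeholders, not data for any real part, and "each added stage buys one more clock period minus setup time" is a simplifying assumption.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def synchronizer_mtbf(t_resolve, tau, t_window, f_clk, f_data):
    """Mean time between synchronizer failures, in seconds."""
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)


# Hypothetical flip-flop and system parameters (illustration only).
TAU = 100e-12        # metastability resolution time constant, s
T_WINDOW = 100e-12   # metastability capture window, s
F_CLK = 100e6        # sampling clock, Hz
F_DATA = 10e6        # asynchronous data transition rate, Hz
T_SETUP = 1e-9       # setup time of the following stage, s
PERIOD = 1.0 / F_CLK

mtbf_by_stage = {}
for stages in (1, 2, 3):
    t_resolve = stages * (PERIOD - T_SETUP)   # settling time available
    mtbf_by_stage[stages] = synchronizer_mtbf(
        t_resolve, TAU, T_WINDOW, F_CLK, F_DATA)
    years = mtbf_by_stage[stages] / SECONDS_PER_YEAR
    print(f"{stages} settling period(s): MTBF ~ {years:.1e} years")
```

Each extra tau of settling time multiplies the MTBF by e, which is why a modest amount of added latency can push the error probability below everything else in the system.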

> If it absolutely, positively can't take any hit, then some more work is involved

I would say that the first step is to put a number on "absolutely,
positively".

If you aren't willing to take some risk, you won't get off the drawing board.

--
These are my opinions, not necessarily my employer's. I hate spam.


Mike Monett

Thu, Oct 16, 2008 10:50 PM

"Lux, James P" <james.p.lux@jpl.nasa.gov> wrote:

[...]

> Even without TMR or other similar schemes, the probability of
> upset IS pretty low. However, as Black or Scholes said (I can't
> remember which), "One should not confuse very low probability with
> impossible". If it absolutely, positively can't take any hit, then
> some more work is involved.

> James Lux, P.E.

How do you do that? Any web links to study?

As far as I know, it is impossible to absolutely guarantee against
metastability. Do you wait a week for the metastability to settle?

If zero probability of failure is so important, you would also have
to include the probability of a solder joint opening, or a chip
failing due to metal migration or latent ESD damage. That is never
zero.

Of course, after the system is perfect, someone will take it and put
it on a destroyer running Windows:)

Best Regards,

Mike Monett

Lux, James P

Thu, Oct 16, 2008 11:20 PM

> > If it absolutely, positively can't take any hit, then some more work
> > is involved
>
> I would say that the first step is to put a number on
> "absolutely, positively".

There are lots of systems where you can't put a real number on it, for one reason or another. Either there are too many unknowns, there are political forces at work (viz. "put your management hat on"), or the system is so complex that any computed failure probability will approach 1.0 over any reasonable time scale.

> If you aren't willing to take some risk, you won't get off
> the drawing board.

That's certainly true, but a lot of times, you can't (or won't) quantify the risk.

So then, you fall back on fuzzy things like "really good" or "really really good" or "whatever we can do for X dollars and Y years with the best people we can hire"

Jim


Said Jackson

Fri, Oct 17, 2008 12:35 AM

Hi Tom,
Would love to see the EFC chart on the Mini-T when a jump occurs. That
would give a definite result...
Said

From iPhone

On Oct 16, 2008, at 12:36, "Tom Van Baak" <tvb@LeapSecond.com> wrote:

>> I can't find much fundamental research on this at all, and even Mr. Vig
>> says the phenomenon is not well understood. But there are (proprietary)
>> ways to probe for the susceptibility of a particular unit to do this.
>
> John told me once that an easy way to deal with many classes
> of jumps is simply to use a pair of OCXOs. This is an easy solution
> for a time-nut (driven by interest and passion), perhaps much
> less so for a manufacturer (driven by market and profit).
>
> If we as a group find that unpredictable phase or frequency
> jumps are limiting GPSDO performance then that would be a
> fun group discussion on how best to implement an N-way XO
> GPSDO. If done to the extreme it could also reduce phase
> noise as a by-product.
>
> But let's not jump (!) to conclusions that Mike's problem is
> necessarily all quartz related. Given how little testing any of
> us have done on this particular GPSDO, speculation should
> take a back seat to measurement.
>
> /tvb


time-nuts@lists.febo.com

Re: [time-nuts] Frequency Stability of Trimble Mini-T