Empathy List Archives


Marcus D. Leech

Tue, Apr 25, 2023 4:34 PM

On 25/04/2023 12:30, James Schatzman wrote:
> I don't know if this is new information but we have observed some
> additional behaviors:
>
> 1) The radio reports dropped UDP packets. Why is it dropping them?

How are you determining this? For streaming, the Linux in the Zynq is
entirely out of the picture.

> 2) At least part of the time, the radio's fan kicks into high speed
> seemingly at the same moment that the under-run occurs. I have no
> information about how it controls the fan so this is mysterious.

That is an interesting tidbit, but I honestly don't know what it means.

> Jim
>
> On Tue, Apr 25, 2023 at 12:20:45PM -0400, Marcus D. Leech wrote:
>> On 25/04/2023 10:34, Jim Schatzman wrote:
>>> We have been working with N310 and N321 radios. It seems very difficult to get long term continuous operation without under-runs even at the seemingly very low data rate of 12.5 Msps.
>>>
>>> Currently we are trying various firmware versions, changing buffer sizes, etc., but so far nothing has gotten the radios to work consistently for several hours without under-runs. One run might go for 3 hours without failure. The next attempt under-runs after 10 minutes. It is very erratic. Our next attempt will be to implement an input FIFO.
>>>
>>> The configurations have direct connections between a fast host (Ubuntu with unnecessary services including Network Manager disabled or removed), and the radio with 10 Gbit and frame size of 9000. Without using jumbo frames the behavior was far worse.
>>>
>>> Any other ideas??
>>>
>>> Thanks-
>>> Jim
>>>
>> Something you could try in terms of isolating root cause is to use the
>> "benchmark_rate" example application, and configure it using the
>> --duration option for a very long run and use --tx_rate to cause it
>> to only do a TX test.

USRP-users mailing list -- usrp-users@lists.ettus.com
To unsubscribe send an email to usrp-users-leave@lists.ettus.com



Rob Kossler

Tue, Apr 25, 2023 7:35 PM

On Tue, Apr 25, 2023 at 12:34 PM Marcus D. Leech
<patchvonbraun@gmail.com> wrote:
>
> On 25/04/2023 12:30, James Schatzman wrote:
> > I don't know if this is new information but we have observed some
> > additional behaviors:
> >
> > 1) The radio reports dropped UDP packets. Why is it dropping them?
> How are you determining this? For streaming, the Linux in the Zynq is
> entirely out of the picture.

My understanding is that dropped packets in UHD mean that the
receiving entity received consecutive packets that did not have
consecutive indices. So, if the FPGA detected such a condition, it
would send an appropriate error message back to UHD which would
produce the "D" or "S". For Underruns, it would be the Radio block
that sends the error message back to UHD.
Rob
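
As a rough illustration of the sequence check described above, the following Python sketch counts how often consecutive packets arrive with non-consecutive indices. It is not UHD or FPGA code: the function name `count_sequence_gaps` and the 16-bit wrap-around default are assumptions made only for this example.

```python
def count_sequence_gaps(indices, modulus=2**16):
    """Count places where consecutive packets have non-consecutive indices.

    `modulus` is the assumed wrap-around point of the index counter.
    """
    gaps = 0
    previous = None
    for index in indices:
        # A packet is "in sequence" if its index is the previous one plus 1,
        # taking counter wrap-around into account.
        if previous is not None and index != (previous + 1) % modulus:
            gaps += 1
        previous = index
    return gaps


in_order = [0, 1, 2, 3, 4]
one_missing = [0, 1, 3, 4]        # packet 2 never arrived
wrapped = [65534, 65535, 0, 1]    # counter wraps, nothing lost

print(count_sequence_gaps(in_order))
print(count_sequence_gaps(one_missing))
print(count_sequence_gaps(wrapped))
```

A receiver that sees a non-zero gap count would be the one to report the sequence error, which is a different measurement from the interface-level drop counters an operating system keeps.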



Marcus D. Leech

Tue, Apr 25, 2023 7:38 PM

On 25/04/2023 15:35, Rob Kossler wrote:
> On Tue, Apr 25, 2023 at 12:34 PM Marcus D. Leech
> <patchvonbraun@gmail.com> wrote:
>
> My understanding is that dropped packets in UHD mean that the
> receiving entity received consecutive packets that did not have
> consecutive indices. So, if the FPGA detected such a condition, it
> would send an appropriate error message back to UHD which would
> produce the "D" or "S". For Underruns, it would be the Radio block
> that sends the error message back to UHD.
> Rob

Understood. The way the answer was phrased made me think of "Dropped
Packets" in things like "ifconfig", rather than just the usual UHD
reporting.



Jim Schatzman

Tue, Apr 25, 2023 10:50 PM

Re. "the Linux in the Zynq is out of the picture." That is interesting. Yes, my comment was based on the radio's Linux OS reporting dropped UDP packets for the 10 Gbit interface. It sounds like you are saying that has nothing to do with UDP packets on the interface. Confusing...

If there really are no dropped packets then an underrun seems to imply that the radio thinks it is not getting data from the host in the required timely manner.

If the radio is dropping packets then it seems that could be the explanation for the underrun condition, although we have no idea why the radio would be dropping UDP packets.

At the time this underrun occurs, we find no evidence of an issue on the host side.
a) There is nothing in the host Linux system log indicating that anything loggable occurred at or around the time of the underrun.
b) The modified host application specifically monitors for the condition that it does not have data available at the time data is to be sent to the radio. This condition never occurs.
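
To put rough numbers on "the required timely manner", here is a small Python sketch of the timing budget for the configuration described earlier in the thread (12.5 Msps, 9000-byte frames, 10 Gbit link). It assumes 4 bytes per complex sample on the wire and ignores packet header overhead; both are simplifications, and the function names are made up for the example.

```python
def frame_duration_us(frame_bytes, sample_rate_sps, bytes_per_sample=4):
    """Microseconds of signal carried by one frame (headers ignored)."""
    samples_per_frame = frame_bytes // bytes_per_sample
    return samples_per_frame / sample_rate_sps * 1e6


def link_utilization(sample_rate_sps, link_bps, bytes_per_sample=4):
    """Fraction of the link's raw bit rate used by the sample stream."""
    return sample_rate_sps * bytes_per_sample * 8 / link_bps


jumbo_us = frame_duration_us(9000, 12.5e6)      # about 180 microseconds
standard_us = frame_duration_us(1500, 12.5e6)   # about 30 microseconds
utilization = link_utilization(12.5e6, 10e9)    # about 4 percent

print(f"jumbo frame:    {jumbo_us:.0f} us of signal")
print(f"standard frame: {standard_us:.0f} us of signal")
print(f"link use:       {utilization:.1%}")
```

Under these assumptions the link itself is only lightly loaded, so a shortage of bandwidth is an unlikely cause. Each frame carries well under a millisecond of signal, which suggests that a brief host-side stall longer than whatever is buffered in the radio could be enough to produce an underrun.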




David Raeman

Wed, Apr 26, 2023 12:08 AM

I've also struggled against stubborn TX underrun issues and have had some success using dedicated CPU cores to make large improvements. My configuration is quite different than yours, but perhaps this'll be a helpful lead.

I have a custom multithreaded application that manages four E320 radios from a single server using the UHD multi_usrp API. I stream continuously and simultaneously from all four radios in both directions at 10 Msps (each direction on a different radio channel). The host server is a Dell R340 with 12 cores, 64GB RAM, and a quad-port PCIe network card that has direct 1GbE links to the SFP+ port on each of the E320 radios. It runs Ubuntu Server 20.04 (no desktop environment). My application threads' loops that handle the streamers are pretty lean, and any file I/O is done to a ramdisk partition.

Initially I tried elevating my TX thread to RTPRIO and increased the number of UHD transport frames, but still couldn't reach an hour of runtime without an underrun. Although I had plenty of unused compute capacity, switching my TX thread to use an isolated core ended up making a big difference. I speculate that occasionally something was just holding my ready-state thread from getting on the CPU in time.

I'm using both the "isolcpus" and "nohz_full" kernel boot parameters. None of the default Ubuntu kernels ship with support for nohz_full (not even the lowlatency kernel), so I need to rebuild the kernel with CONFIG_NO_HZ_FULL enabled. The isolcpus parameter prevents the kernel from using the core for its normal userspace process scheduler. The nohz_full parameter further offloads most kernel work from being done on the core (principally system tick handling, thus the name, but it also keeps most other kernel load off the core).

In my application I assign my TX thread to this dedicated core using a call to "uhd::set_thread_affinity({core_number})". After this change, I was able to run all four radios continuously for a little over 9 hours until an underrun occurred. Using system logs, I was able to determine that underrun occurred precisely when an overnight Ubuntu cron job launched to update packages, which caused systemd to restart some things. I've since disabled that cron job and intend to do another long test soon, but I haven't had a chance to circle back to it recently.
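
For readers who want to experiment with the pinning idea outside of UHD, the sketch below uses Python's standard `os.sched_setaffinity` on Linux. It is not the `uhd::set_thread_affinity` call mentioned above, only an analogous operating-system call. The helper name `pin_to_core` is hypothetical, and the example simply picks the highest-numbered core the process is currently allowed to use; in a setup like the one described, the core number would instead be the isolated core.

```python
import os


def pin_to_core(core):
    """Restrict the calling thread to a single CPU core (Linux only).

    Returns the affinity set reported by the kernel after the change.
    """
    # pid 0 means "the calling thread" for the sched_* family of calls.
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)


# Choose a core this process is allowed to run on (illustrative choice only).
allowed = sorted(os.sched_getaffinity(0))
chosen = allowed[-1]

result = pin_to_core(chosen)
print(f"pinned to core {chosen}; kernel reports affinity {sorted(result)}")
```

Pinning alone does not keep other tasks off the chosen core. That is the role of the isolcpus and nohz_full boot parameters discussed above.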

I found this to be a useful primer: https://www.suse.com/c/cpu-isolation-introduction-part-1/

Hope this in some way helps,
David



usrp-users@lists.ettus.com

Re: configuring X410 USRP to work with higher sampling frequency/band width