Empathy List Archives

Peter Langer

Wed, Aug 26, 2020 3:09 PM

Hi,

On two different computers with identical hardware specs, we encountered
stability issues with the N310 receiver. The hardware specs for the computers
are listed below, though we strongly suspect that the issue lies in the
software (kernel/drivers) on the N310.

Everything runs perfectly well until, at some point, the device hits a
kernel panic and reboots. From our understanding of the Linux kernel, the
panic must occur inside some interrupt, because it is not written to any
logfile (we made logs persistent on the device to verify this). It sometimes
appears over SSH if we monitor with tail -f /var/log/messages (yes, we see it
there, but it is not written to disk). This only happens if the N310 runs
for a longer period of time, in our case between 1 and 3 days. We have
encountered the issue several times now and have verified it with the
standard FPGA image.
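
Since the panic text only shows up in a live SSH session, one way to keep it is to mirror that session into a file on the host. The following is a minimal sketch, not a tested recipe for the N310: the helper names (mirror_lines, main), the root@ login in the usage note, and the local file name are assumptions made for illustration.

```python
import subprocess
import time


def mirror_lines(source, sink, clock=time.time):
    """Copy each line from source to sink, prefixed with a host-side timestamp.

    Flushing after every line keeps the last messages on the host even if
    the device reboots in the middle of the stream.
    """
    count = 0
    for line in source:
        sink.write(f"{clock():.3f} {line.rstrip()}\n")
        sink.flush()
        count += 1
    return count


def main(host, logfile):
    # Follows the device log over SSH until the connection drops.
    cmd = ["ssh", host, "tail", "-f", "/var/log/messages"]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc, \
            open(logfile, "a") as sink:
        mirror_lines(proc.stdout, sink)
```

A call such as main("root@ni-n3xx-31AFFD1", "n310_messages.log") would then leave a host-side copy of whatever reached the SSH session before the reboot.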

Our problem seems to be related (though with another device) to the issue
that was mentioned in January in this thread:
http://ettus.80997.x6.nabble.com/USRP-users-Kernel-Panic-with-v3-15-0-0-on-E320-td14098.html

Aside from the snippet below there are no other messages printed to the
messages log file.

--Snippet from /var/log/messages--
Aug 25 07:44:31 ni-n3xx-31AFFD1 kern.alert kernel: [82689.450921] Unable to handle kernel paging request at virtual address fffffffe
Aug 25 07:44:31 ni-n3xx-31AFFD1 kern.alert kernel: [82689.458127] pgd = d3d33249
Aug 25 07:44:31 ni-n3xx-31AFFD1 kern.alert kernel: [82689.460785] [fffffffe] *pgd=2fffd861, *pte=00000000, *ppte=00000000
Aug 25 07:44:31 ni-n3xx-31AFFD1 kern.emerg kernel: [82689.467121] Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM
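
The bracketed number in each kernel line is the uptime in seconds, so the snippet also shows how long the device had been running when it faulted. Below is a small sketch that pulls the uptime and the faulting address out of such lines. The regular expressions are written against the lines quoted above and are an assumption about the log format, not a general syslog parser.

```python
import re

LOG_LINE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) (?P<level>kern\.\w+) kernel: "
    r"\[ *(?P<uptime>\d+\.\d+)\] (?P<msg>.*)$"
)
FAULT_ADDR = re.compile(r"virtual address ([0-9a-f]{8})")


def parse_oops(lines):
    """Return (uptime in seconds of the first kernel line, faulting address)."""
    uptime = None
    fault = None
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue
        if uptime is None:
            uptime = float(m.group("uptime"))
        addr = FAULT_ADDR.search(m.group("msg"))
        if addr:
            fault = int(addr.group(1), 16)
    return uptime, fault


sample = [
    "Aug 25 07:44:31 ni-n3xx-31AFFD1 kern.alert kernel: [82689.450921] "
    "Unable to handle kernel paging request at virtual address fffffffe",
    "Aug 25 07:44:31 ni-n3xx-31AFFD1 kern.emerg kernel: [82689.467121] "
    "Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM",
]
uptime, fault = parse_oops(sample)
# prints: uptime at fault: 22.97 h, address: 0xfffffffe
print(f"uptime at fault: {uptime / 3600:.2f} h, address: {fault:#010x}")
```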

--Specs--
Processor: AMD Ryzen Threadripper 3970X 32-Core Processor
RAM: 256 GB RAM
Network device: Intel X710-DA2 10GbE with SFP+ direct attach cables
Direct attach cables: Cisco H10GB-10GB-CU3M

The N310 device has the latest stable firmware/fpga image for UHD 3.15-LTS
as of today:
Mender: n3xx/meta-ettus-v3.15.0.0/n3xx_common_mender_default-v3.15.0.0.zip
FPGA: n3xx/fpga-9ba275de0b/n3xx_n310_fpga_default-g9ba275de.zip

We use the XG FPGA image present in the zipfile.

Our test flowgraph with the standard FPGA image features only two radio
blocks that stream at a master_clock_rate of 122.88 MHz. Each is connected
to a DDC that decimates by a factor of 2 (factors of 3 and 4 lead to the
same issue), and both then connect to a null sink in GNU Radio.
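
To put a rough number on the load this flowgraph generates, the payload rate can be estimated from the master clock rate and the decimation. This is a back-of-the-envelope sketch only: it assumes one stream per radio block, that the radio samples at the master clock rate, and 4 bytes per sample on the wire (16-bit I plus 16-bit Q); none of that is stated in the message, and packet overhead is ignored.

```python
def stream_rate_gbps(master_clock_rate_hz, decimation, channels, bytes_per_sample=4):
    """Payload rate in Gbit/s for `channels` streams after decimation.

    bytes_per_sample=4 assumes 16-bit I plus 16-bit Q per sample.
    """
    sample_rate = master_clock_rate_hz / decimation
    return sample_rate * bytes_per_sample * 8 * channels / 1e9


for decim in (2, 3, 4):
    rate = stream_rate_gbps(122.88e6, decim, channels=2)
    print(f"decimation {decim}: {rate:.2f} Gbit/s")
```

Under these assumptions, decimation by 2 gives about 3.93 Gbit/s across both streams, which stays well within a single 10GbE link.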

We would appreciate anyone looking into reproducing that or any ideas how
to resolve the issue.

Kind regards,
Peter

Peter Langer

Fri, Aug 28, 2020 11:30 AM

Hi,

I've been dealing with this issue for some time now, but I finally noticed
that the U-Boot image seems to tell the N310 that it has 4 GB of RAM.
The output of free -m shows that it does not have 4 GB of RAM; it has 1 GB.

There were some people on the Xilinx forums who had a similar problem with
kernel panics because their U-Boot device tree configuration specified this:

memory@0 {
    device_type = "memory";
    reg = <0x0 0x40000000>;
};

When I run hexdump -C /proc/device-tree/memory/reg, I get: 00 00 00 00 04 00 00
00

As I don't really have experience with configuring U-Boot, does anyone know
an easy way to change that?

Kind regards,
Peter

Peter Langer

Fri, Aug 28, 2020 1:19 PM

On Fri, Aug 28, 2020 at 1:30 PM, Peter Langer <peter.langer41@googlemail.com> wrote:

> There were some people on the Xilinx forums that had a similar problem
> with kernel panics because their U-Boot device tree configuration
> specified this:
>
> memory@0 {
>     device_type = "memory";
>     reg = <0x0 0x40000000>;
> };

Sorry, that was a false lead: I had assumed the size in reg was counted in
4-byte words.
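
For reference, the size cell of a memory node's reg property is a byte count, so 0x40000000 corresponds to 1 GiB, which agrees with free -m. The sketch below decodes a raw reg property. It assumes one 32-bit address cell and one 32-bit size cell (typical for a 32-bit ARM system, but not confirmed for the N310 here), and the function name decode_memory_reg is made up for illustration.

```python
import struct


def decode_memory_reg(raw, address_cells=1, size_cells=1):
    """Decode a device tree `reg` property into (base, size_in_bytes) pairs.

    Cells are 32-bit big-endian; multi-cell values are concatenated.
    """
    cells = struct.unpack(f">{len(raw) // 4}I", raw)
    stride = address_cells + size_cells
    regions = []
    for i in range(0, len(cells), stride):
        base = 0
        for cell in cells[i:i + address_cells]:
            base = (base << 32) | cell
        size = 0
        for cell in cells[i + address_cells:i + stride]:
            size = (size << 32) | cell
        regions.append((base, size))
    return regions


# Bytes corresponding to reg = <0x0 0x40000000>
raw = bytes.fromhex("00000000" "40000000")
(base, size), = decode_memory_reg(raw)
# prints: base=0x0 size=1024 MiB
print(f"base={base:#x} size={size // 2**20} MiB")
```

On the device, the raw bytes could be read with open("/proc/device-tree/memory/reg", "rb").read() instead of the hard-coded example.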

I'm currently running a test with avahi-daemon disabled (1 day 21 hours of
runtime so far).
Fingers crossed.

Peter

usrp-users@lists.ettus.com

Sporadic N310 kernel panics when under load