[USRP-users] [EXTERNAL] Re: Recording the full X3x0 bandwidth
markjan at xs4all.nl
Tue Mar 12 04:10:02 EDT 2019
On Mon, Mar 11, 2019 at 08:40:49PM +0000, Minutolo, Lorenzo (389I) wrote:
> Hi All,
> We're using the USRP x3x0 for cosmology and many other applications: we
> use them to read out our cryogenic detectors.
> Do you need the full spectral bandwidth of the device? Things get much
> easier if you decimate the signal before storing it to disk. The system
> we built uses a GPU to decimate the USRP data streams before saving them to disk.
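To make "decimate before storing" concrete, here is a minimal pure-Python sketch that assumes a plain boxcar (block-average) filter. It is not the GPU pipeline from the linked paper, and a real decimator would use a proper low-pass FIR or polyphase design to limit aliasing. It only shows the data-rate effect: every `factor` input samples become one output sample, so the rate to disk (and the usable bandwidth) shrinks by that factor. The function name `boxcar_decimate` is hypothetical.

```python
def boxcar_decimate(samples, factor):
    """Average each run of `factor` samples and keep one output per run.

    A crude anti-alias filter (moving average) followed by downsampling;
    shown only to illustrate the rate reduction, not as a production filter.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    usable = len(samples) - len(samples) % factor  # drop an incomplete tail
    for start in range(0, usable, factor):
        block = samples[start:start + factor]
        out.append(sum(block) / factor)
    return out


if __name__ == "__main__":
    iq = [complex(i, -i) for i in range(8)]
    print(boxcar_decimate(iq, 4))  # [(1.5-1.5j), (5.5-5.5j)]
```

Eight complex samples in, two out: with `factor=4` the stored stream is a quarter of the original size.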
While this looks like a workaround, decimating a signal to reduce its data rate
discards information (at least half of it), so you would need more X300/X310s
and have them phase-coherently record their inputs to disk.
If some throttling is introduced over time when sustaining continuous disk IO
bandwidth (via thermal events on PCIe controllers/bridges or otherwise), but the
network-to-GPU bandwidth path is fine, then perhaps try to move data from DRAM
to storage via the GPU's PCIe controller.
The separate GPU is likely connected to PCIe lanes attached directly to the CPU,
not indirectly via DMI to the chipset, which can also present PCIe interconnects
originating from the same PCIe root complex.
In the past there have been video cards that host a 16x16x16 PCIe bridge (e.g. IDT
or Broadcom) to accommodate two 16-lane PCIe GPUs, each with their own
There have also been video cards with embedded direct-path storage functionality
to maximize GPU-to-NAND-flash IO rates, although I haven't seen any that use
large arrays of SATA or other spinning-media storage.
Another option might be to put a storage controller directly in a CPU-attached PCIe
slot and insert your own PCIe bridge that does nothing but monitor the available
bandwidth over time, using a riser card that features an IDT/Broadcom PCIe bridge.
National Instruments, the current parent company of Ettus, also has PXI modules dedicated to
RF recording. Perhaps this is also something to explore.
> Check it out (https://arxiv.org/abs/1812.02200); it should be open source soon.
> [1812.02200] A flexible GPU-accelerated radio-frequency readout for superconducting detectors<https://arxiv.org/abs/1812.02200>
> We have developed a flexible radio-frequency readout system suitable for a variety of superconducting detectors commonly used
> in millimeter and submillimeter astrophysics, including Kinetic Inductance detectors (KIDs), Thermal KID bolometers (TKIDs),
> and Quantum Capacitance Detectors (QCDs). Our system avoids custom FPGA-based readouts and instead uses commercially available
> software radio hardware for ADC/DAC and a GPU to handle real time signal processing. Because this system is written in common
> C++/CUDA, the range of different algorithms that can be quickly implemented make it suitable for the readout of many
> other cryogenic detectors and for the testing of different and possibly more effective data acquisition schemes.
> From: USRP-users <usrp-users-bounces at lists.ettus.com> on behalf of Joe Martin via USRP-users <usrp-users at lists.ettus.com>
> Sent: Saturday, March 9, 2019 10:23:28 AM
> To: Mark-Jan Bastian
> Cc: usrp-users at lists.ettus.com
> Subject: [EXTERNAL] Re: [USRP-users] Recording the full X3x0 bandwidth
> Thank you Mark-Jan for the additional information. I'll study it and compare with my system. Much appreciated.
> Best regards,
> > On Mar 9, 2019, at 11:19 AM, Mark-Jan Bastian <markjan at xs4all.nl> wrote:
> > Hi Joe,
> > With sudo lspci -vvv you will see the capabilities, including the low-level
> > PCIe bus speed and lane count negotiation of the devices. The sudo is needed
> > to get the low-level LnkCap and LnkCtl bits:
> > For a 16-lane video card on a laptop here, likely soldered right onto the motherboard:
> > The PCIe capabilities: 8 GT/s, max 16 lanes:
> > LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
> > ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
> > What the link actually negotiated: 8 GT/s, 16 lanes:
> > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > DevCap2: Completion Timeout: Range AB, TimeoutDis+, LTR+, OBFF Via message
> > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled
> > Below is a variant of the LnkCtl record (LnkCtl2/LnkSta2), providing even more information, including the equalisation of the
> > SERDES link used by this PCIe device (equalisation is the analog RF signal processing that overcomes
> > losses while routing the signal over the motherboard and the connectors):
> > LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
> > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> > Compliance De-emphasis: -6dB
> > LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
> > EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
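A small sketch for checking whether a link trained below its capability, and roughly what payload bandwidth the negotiated link offers. It assumes the `LnkCap:`/`LnkSta:` line layout shown above (field formats can differ between pciutils versions) and the standard line encodings (8b/10b up to 5 GT/s, 128b/130b from 8 GT/s on). TLP/DLLP protocol overhead is ignored, so real throughput is lower. The helper names are hypothetical.

```python
import re

# Raw per-lane rate (GT/s) -> line-code efficiency.
# Gen1/Gen2 use 8b/10b; Gen3 and later use 128b/130b.
ENCODING = {2.5: 8 / 10, 5.0: 8 / 10, 8.0: 128 / 130, 16.0: 128 / 130, 32.0: 128 / 130}

# Matches e.g. "LnkSta: Speed 8GT/s, Width x16" on a single line.
LINK_RE = re.compile(r"(LnkCap|LnkSta):.*?Speed ([\d.]+)GT/s.*?Width x(\d+)")


def parse_links(lspci_text):
    """Return {'LnkCap': (GT/s, lanes), 'LnkSta': (GT/s, lanes)} from lspci -vvv text."""
    links = {}
    for m in LINK_RE.finditer(lspci_text):
        links[m.group(1)] = (float(m.group(2)), int(m.group(3)))
    return links


def usable_gbytes_per_s(speed_gts, width):
    """Approximate payload bandwidth in GB/s after line encoding, before protocol overhead."""
    return speed_gts * ENCODING[speed_gts] * width / 8


sample = """
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <1us, L1 <4us
LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
"""

if __name__ == "__main__":
    links = parse_links(sample)
    cap, sta = links["LnkCap"], links["LnkSta"]
    if sta != cap:
        print("link trained below its capability:", sta, "vs", cap)
    print(f"{usable_gbytes_per_s(*sta):.2f} GB/s")  # 15.75 GB/s
```

Run against the output of each device between the NIC and the disks, a `LnkSta` that is slower or narrower than `LnkCap` would point at the kind of bottleneck discussed in this thread.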
> > The above could be regarded as a 'depth first' approach for those who would like to stay purely software-oriented.
> > I like to treat a PC in such a scenario as an embedded system: first get the power supplies, clocks and other
> > hardware right, design the clock domains and data rates, then gradually move up the software/control/kernel/driver
> > stack to look for anomalies that could trigger such intermittent problems.
> > Mark-Jan
> > On Sat, Mar 09, 2019 at 10:40:39AM -0700, Joe Martin wrote:
> >> Hi Mark,
> >> I am intrigued by your response and have obtained a tree view for my system as you suggested to Paul. I'm unfamiliar with the tree view and don't understand how to check the number of PCIe lanes that are available to the disk controller and disks, or how to check how many PCIe bridges are in between on my motherboard configuration.
> >> I have a screenshot of the tree view showing my 10G Ethernet connection (but it is 220KB in size so I didn't attach it here), but I am not familiar with how to determine what you asked about from the tree, or what to do about the configuration in any case. Is the configuration fixed and unchangeable?
> >> If so, then perhaps your alternative suggestion of booting from a USB stick into a ramdisk is a viable route? Unfortunately I'm not familiar with the details of how to do that, so perhaps a couple of brief comments about implementing that process would help me understand whether that's the only viable alternative to pursue given the present hardware configuration?
> >> Joe
> >>> On Mar 9, 2019, at 5:14 AM, Mark-Jan Bastian via USRP-users <usrp-users at lists.ettus.com> wrote:
> >>> Hi Paul,
> >>> I can record from the X310 to an NVMe x4 PCIe disk at 800 MB/s
> >>> for a few minutes. There is still a risk of 'O's appearing.
> >>> First thing to check is the number of PCIe lanes available to the disk
> >>> controller and disks, and how many and which PCIe bridges are in between
> >>> on your motherboard configuration. Try to avoid other traffic over these
> >>> PCIe bridges. Use lspci -vt for a tree view.
> >>> Then one can do benchmarking from DRAM to disk. Perhaps you will not need
> >>> a filesystem for your very simple storage purpose.
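One way to do the DRAM-to-disk benchmarking suggested above, sketched in Python under a few assumptions: a single preallocated random buffer (random rather than zeros, so a compressing filesystem such as ZFS with compression enabled cannot shortcut the writes), large sequential writes through an unbuffered file, and an `fsync` at the end so the page cache does not inflate the result. Besides the average rate it reports the slowest single chunk, since a recorder fails on stalls rather than on averages. `benchmark_write` is a hypothetical helper, not part of UHD or any tool mentioned in the thread.

```python
import os
import time


def benchmark_write(path, total_bytes, chunk_bytes=16 * 2**20):
    """Write `total_bytes` to `path` from one DRAM buffer and time every chunk.

    Returns (average MB/s including the final fsync, slowest chunk in seconds).
    """
    # Random data so a compressing filesystem cannot shortcut the writes.
    buf = memoryview(os.urandom(chunk_bytes))
    chunk_times = []
    written = 0
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:  # raw, unbuffered file object
        while written < total_bytes:
            view = buf[: min(chunk_bytes, total_bytes - written)]
            t0 = time.perf_counter()
            while view:  # raw writes may be partial, so loop until done
                n = f.write(view)
                written += n
                view = view[n:]
            chunk_times.append(time.perf_counter() - t0)
        os.fsync(f.fileno())  # include the time to reach stable storage
    elapsed = time.perf_counter() - start
    return written / elapsed / 1e6, max(chunk_times, default=0.0)
```

For example, `benchmark_write("/mnt/array/bench.bin", 128 * 2**30)` (path hypothetical) would write 128 GiB, well beyond the 64 GB of RAM in the server described below, so the per-chunk times reflect the disks rather than the page cache.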
> >>> You can ultimately just boot from some other media (USB stick or CD-ROM
> >>> loaded into a ramdisk) just to make sure there is absolutely no need to
> >>> read-access any other data on said disks, via cached pages or otherwise.
> >>> Hiccups from system management mode or other unexpected driver interrupt sources
> >>> should be minimized. Other networking code and chatter might need to be reduced,
> >>> as might SMM-related thermal management events in the BIOS.
> >>> First tune everything for maximum performance, then optimize for very consistent
> >>> write performance.
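Why consistency matters more than peak rate can be estimated with two lines of arithmetic: a stall of `stall_s` seconds needs at least rate × stall bytes of buffering, and the backlog only drains if the sustained write rate exceeds the input rate. A minimal sketch assuming sc16 samples (4 bytes each); the stall duration and disk rate used below are hypothetical numbers, not measurements.

```python
def buffer_bytes_needed(sample_rate, channels, stall_s, bytes_per_sample=4):
    """Minimum DRAM needed to absorb a write stall of `stall_s` seconds.

    bytes_per_sample=4 assumes sc16 (16-bit I + 16-bit Q).
    """
    return int(sample_rate * channels * bytes_per_sample * stall_s)


def drain_seconds(backlog_bytes, disk_bytes_per_s, input_bytes_per_s):
    """Time for the writer to empty a backlog; infinite if the disk cannot keep up."""
    surplus = disk_bytes_per_s - input_bytes_per_s
    return float("inf") if surplus <= 0 else backlog_bytes / surplus


if __name__ == "__main__":
    # 2 channels at 100 MS/s riding out a hypothetical 0.5 s disk stall.
    need = buffer_bytes_needed(100e6, 2, 0.5)
    print(f"{need / 2**20:.0f} MiB")  # 381 MiB
    # With a hypothetical 1 GB/s sustained disk rate against 800 MB/s input:
    print(drain_seconds(need, 1e9, 800e6))  # 2.0
```

If stalls arrive more often than the drain time, the buffer ratchets upward and eventually overflows even though the average disk rate looks sufficient.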
> >>> Mark-Jan
> >>> On Sat, Mar 09, 2019 at 12:32:05PM +0100, Paul Boven via USRP-users wrote:
> >>>> Hi,
> >>>> I'm trying to record the full X310 bandwidth, for a few hours, without any
> >>>> missed samples. Which of course is a bit of a challenge - does anyone here
> >>>> already achieve this?
> >>>> We're using a TwinRX, so initially I wanted to record 2x 100MS/s (from both
> >>>> channels), which amounts to 800MB/s, 6.4Gb/s. At first I tried uhd_rx_cfile,
> >>>> but have been unable to get it to a good state without showing an 'O' every
> >>>> few seconds at these speeds.
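The quoted figures follow from 4 bytes per complex sample (16-bit I plus 16-bit Q, the "interleaved shorts" format mentioned further down). A minimal arithmetic sketch, where `stream_rate` is a hypothetical helper; the 200 MS/s line shows what two channels at the X310's 200 MS/s maximum would require, which would match the "2x 6.4Gb/s" target mentioned below.

```python
def stream_rate(sample_rate, channels, bytes_per_sample=4):
    """Return (MB/s, Gb/s) for a stream; 4 bytes per sample = sc16 (int16 I + int16 Q)."""
    bytes_per_s = sample_rate * channels * bytes_per_sample
    return bytes_per_s / 1e6, bytes_per_s * 8 / 1e9


if __name__ == "__main__":
    print(stream_rate(100e6, 2))  # (800.0, 6.4)
    print(stream_rate(200e6, 2))  # (1600.0, 12.8)
    print(stream_rate(50e6, 1))   # (200.0, 1.6)
```

The last line corresponds to the single-channel 50 MS/s (200 MB/s) fallback described below.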
> >>>> As a recorder I have a SuperMicro 847 chassis with 36 disks (Seagate
> >>>> IronWolf 8TB ST8000VN0022, 7200rpm). In this particular server, the disks are
> >>>> connected through an 'expander' backplane, from a single HBA (LSI 3008). CPU
> >>>> is dual Xeon 4110, 2.1 GHz, 64 GB of RAM.
> >>>> At first I tried a 6 disk pool (raidz1), and eventually ended up creating a
> >>>> huge 36 disk ZFS stripe, which in theory should have no trouble with the
> >>>> throughput, but it still kept dropping packets.
> >>>> Note that recording to /dev/shm/file works perfectly without dropping
> >>>> packets, until the point that the memory is full.
> >>>> Given that ZFS has quite a bit of (good) overhead to safeguard your data, I
> >>>> then switched to creating a mdadm raid-0 with 18 of the disks (Why not 36? I
> >>>> was really running out of time!)
> >>>> At that point I also found 'specrec' from gr-analyze, which seems more
> >>>> suitable. But, even after enlarging its circular buffer to the largest
> >>>> supported values, it would only average a write speed of about 300MB/s.
> >>>> In the end I had to settle for recording at only 50MS/s (200MB/s) from only
> >>>> a single channel, a far cry from the 2x 6.4Gb/s I'm ultimately looking to
> >>>> record. Although I did get more than an hour of perfect data out of it, over
> >>>> time the circular buffer did get fuller in bursts, and within 2 hours it
> >>>> exited due to exhausting the buffers. Restarting the application made it
> >>>> work as good as new again, with the same gradual decline in performance.
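A gradual fill-up like this is easier to diagnose when the buffer reports its own occupancy. Below is a minimal single-threaded byte ring buffer (not specrec's implementation) with hypothetical `high_water` and `overflows` counters. Logging `high_water` periodically would show whether it rises steadily, which would suggest the writer's sustained rate is below the input rate.

```python
class RingBuffer:
    """Fixed-size byte ring buffer that tracks its fill level and overflows."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.capacity = capacity
        self.head = 0        # next write position
        self.tail = 0        # next read position
        self.used = 0        # bytes currently stored
        self.high_water = 0  # largest fill level seen so far
        self.overflows = 0   # pushes rejected for lack of space

    def push(self, data):
        """Append data; reject it entirely (and count an overflow) if it does not fit."""
        n = len(data)
        if n > self.capacity - self.used:
            self.overflows += 1
            return False
        first = min(n, self.capacity - self.head)  # part that fits before wrapping
        self.buf[self.head:self.head + first] = data[:first]
        self.buf[:n - first] = data[first:]        # wrapped remainder, may be empty
        self.head = (self.head + n) % self.capacity
        self.used += n
        self.high_water = max(self.high_water, self.used)
        return True

    def pop(self, n):
        """Remove and return up to n bytes in FIFO order."""
        n = min(n, self.used)
        first = min(n, self.capacity - self.tail)
        out = bytes(self.buf[self.tail:self.tail + first]) + bytes(self.buf[:n - first])
        self.tail = (self.tail + n) % self.capacity
        self.used -= n
        return out
```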
> >>>> Specrec, even when tweaking its settings, doesn't really take advantage of
> >>>> the large amount of memory in the server. As a next step, I'm thinking of
> >>>> adapting specrec to use much larger buffers, so that writes are at least in
> >>>> the range of MB to tens of MB. From earlier experiences, it is also
> >>>> important to flush your data to disk often, so the interruptions due to this
> >>>> are more frequent, but short enough to not cause receive buffers to
> >>>> overflow.
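The plan above (large writes, frequent short flushes) could look roughly like the following Python sketch: a writer thread collects incoming chunks into multi-megabyte blocks and calls `os.fsync` every few hundred megabytes, so each flush stays short instead of the page cache accumulating one long flush. The 16 MiB block and 256 MiB flush interval are illustrative assumptions, not tuned values, and `disk_writer`/`record` are hypothetical names; in a real recorder the chunks would come from the UHD receive loop.

```python
import os
import queue
import threading


def disk_writer(path, q, block_bytes=16 * 2**20, fsync_every=256 * 2**20):
    """Consume byte chunks from `q` and write them in large blocks.

    A `None` item ends the stream. Data is fsync'ed every `fsync_every` bytes
    so each flush is short rather than one long flush of a huge dirty cache.
    """
    pending = bytearray()
    since_sync = 0
    with open(path, "wb") as f:
        while True:
            chunk = q.get()
            if chunk is not None:
                pending += chunk
            # Write once a full block has accumulated, or at end of stream.
            if len(pending) >= block_bytes or (chunk is None and pending):
                f.write(pending)
                since_sync += len(pending)
                pending.clear()
            # Push written data to the disks in short, frequent steps.
            if since_sync >= fsync_every or (chunk is None and since_sync):
                f.flush()
                os.fsync(f.fileno())
                since_sync = 0
            if chunk is None:
                return


def record(path, chunks, **kwargs):
    """Feed an iterable of byte chunks (e.g. received sample buffers) to a writer thread."""
    q = queue.Queue(maxsize=64)  # bounded so a lagging disk becomes visible
    t = threading.Thread(target=disk_writer, args=(path, q), kwargs=kwargs)
    t.start()
    for c in chunks:
        q.put(c)  # blocks when the queue is full
    q.put(None)
    t.join()
```

Because the queue is bounded, `q.put` blocks when the disk falls behind; in a live receiver that back-pressure is what would eventually surface as overflows, so the queue depth is worth monitoring alongside the write times.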
> >>>> In terms of network tuning, all recording was done with MTU 9000, with wmem
> >>>> and rmem at the recommended values. All recordings were done as interleaved
> >>>> shorts.
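For reading such recordings back, a minimal sketch that converts interleaved shorts to complex samples. It assumes little-endian int16 pairs in I, Q order (the native byte order on the x86 Xeon recorder described above) and scales full-scale int16 to roughly [-1, 1); `sc16_to_complex` is a hypothetical helper. For multi-hour files, `numpy.fromfile` with `dtype=numpy.int16` would be far faster than `struct`.

```python
import struct


def sc16_to_complex(raw, scale=1 / 32768):
    """Convert interleaved little-endian int16 I/Q bytes to complex samples.

    Assumes the data is laid out as I0, Q0, I1, Q1, ...; a trailing odd
    byte or unpaired I value is ignored.
    """
    count = len(raw) // 2  # number of complete int16 values
    shorts = struct.unpack(f"<{count}h", raw[: count * 2])
    return [complex(i * scale, q * scale) for i, q in zip(shorts[0::2], shorts[1::2])]
```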
> >>>> Does anyone have hints or experiences to share?
> >>>> Regards, Paul Boven.
> >>>> _______________________________________________
> >>>> USRP-users mailing list
> >>>> USRP-users at lists.ettus.com
> >>>> http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com