[USRP-users] libusb uses only one thread?

Андрій Хома anik123wb at gmail.com
Wed Oct 25 04:21:38 EDT 2017


Let me try again, in other words and more structured:
I have six B205mini devices: five connected via regular USB 3.0 ports and
one via a "PCIe to USB" hub.
Take, for example, this GNU Radio flowgraph:
usrp source -> null sink
Despite the fact that libusb works in a single thread, I can get 45 MS/s
from each device.
Now take a flowgraph like this instead:
usrp source -> file sink (FIFO) -> some processing
Then I can get at most 7 MS/s from each device.
"some processing" is a chain of handlers; some of them use GPUs, and some
use named FIFOs to transfer data.
Measured separately, each link in the chain needs at most 10% of the time
budget available for real-time operation.
At 5 MS/s there is almost no observable CPU load.
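
Schematically, the two flowgraphs I am comparing look like this (a minimal
sketch with gr-uhd; the device address and FIFO path are placeholders):

    # Flowgraph 1: usrp source -> null sink (45 MS/s per device works).
    # Flowgraph 2: usrp source -> file sink on a named FIFO (tops out near 7 MS/s).
    from gnuradio import gr, blocks, uhd

    RATE = 45e6

    class rx_to_null(gr.top_block):
        def __init__(self, addr):
            gr.top_block.__init__(self)
            src = uhd.usrp_source(addr, uhd.stream_args(cpu_format="fc32"))
            src.set_samp_rate(RATE)
            self.connect(src, blocks.null_sink(gr.sizeof_gr_complex))

    class rx_to_fifo(gr.top_block):
        def __init__(self, addr, fifo_path):
            gr.top_block.__init__(self)
            src = uhd.usrp_source(addr, uhd.stream_args(cpu_format="fc32"))
            src.set_samp_rate(RATE)
            # the file sink blocks as soon as the FIFO reader falls behind
            self.connect(src, blocks.file_sink(gr.sizeof_gr_complex, fifo_path))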

2017-10-25 10:52 GMT+03:00 Андрій Хома <anik123wb at gmail.com>:

> Hello Marcus,
> sorry for the belated reply; I unsubscribed from the mailing list and so
> never received your answers. I accidentally found them via Google:
> http://ettus.80997.x6.nabble.com/USRP-users-Buffer-overflow-tips-td7475.html#a7541
> =)
>
>> 270 MS/s is really *a lot* of data. You'd need a very capable computer
>> even when just handling that amount of data internally, but with
>> USB-connected devices, you also get a lot of interrupt handling. That will
>> put additional load on your CPU. I'm therefore actually very amazed by the
>> fact that the processor is simply managing to deal with that! But: you must
>> make sure you're not only counting the time the program itself is running,
>> but also the time the CPU is stuck in kernel mode, handling the interrupts,
>> and the data transfers. Did you do that?
>>
> To be honest, I do not know how to find this kernel interrupt time. For
> example, in the profiler I see a lot of ioctl and vmxarea; do you think
> that is it? The ioctl calls are the communication with the device, and
> vmxarea shows up inside the ioctl. But what is vmxarea?
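>
> Meanwhile, the closest I have come to measuring it myself is a rough
> sketch like this, sampling the aggregate counters in /proc/stat (the
> 5-second window is arbitrary):
>
>     # Sample /proc/stat to see how much CPU time goes to kernel mode and
>     # to interrupt handling; a userspace profiler does not attribute this
>     # time to the application itself.
>     import time
>
>     def cpu_times():
>         with open("/proc/stat") as f:
>             v = [int(x) for x in f.readline().split()[1:8]]
>         # user+nice, system, irq+softirq, idle+iowait
>         return v[0] + v[1], v[2], v[5] + v[6], v[3] + v[4]
>
>     before = cpu_times()
>     time.sleep(5)
>     after = cpu_times()
>     delta = [b - a for a, b in zip(before, after)]
>     total = float(sum(delta)) or 1.0
>     for name, d in zip(("user", "system", "irq+softirq", "idle+iowait"), delta):
>         print("%-14s %5.1f%%" % (name, 100 * d / total))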
>
>> 205 MS/s (or, even more so, 270 MS/s) is *extremely much* for a storage
>> system. That is more than 16 Gb/s or, in other words, more than three full
>> SATA III connections running at maximum utilisation. You do have an
>> impressive set of SSDs there if you can sustain that write rate.
>>
> Fortunately, I have a completely different task. =) The original signal is
> processed in real time using GPU magic, and only the useful information is
> conveniently placed into file storage. In other words, everything that
> happens AFTER receiving the signal works perfectly.
>
>> How large are your buffers? I mean, with 4 KB buffers and 32 b/S = 4 B/S
>> (assuming you use 16-bit I + 16-bit Q), it follows that a single 4 KB
>> packet can hold 1024 samples, and at 41 MS/s that happens roughly every
>> 1024 S / (41 MS/s) ~= 25 µs. For things to take "several milliseconds",
>> your buffers would need to be megabytes in size.
>>
> If you mean "recv_frame_size", then I am experimenting with that value; at
> the moment I have settled on about 8000. If you mean the buffer that I pass
> to "recv", it is equal to sample_rate samples, i.e. one buffer holds
> exactly one second of data.
>
>> What does help is that the OS buffers named pipes, as well as the file
>> system, in RAM. If overflows happen roughly after your free RAM would have
>> been eaten up at 205 MS/s · 4 B/S = 820 MB/s, then your storage isn't
>> actually up to writing data as fast as the USRPs are producing it, and
>> buffering by the OS simply saves you for "as long as the bucket does not
>> overflow".
>>
> Well, I think I already answered this question above. The other side of
> the FIFO is read by a process that does not write the data to disk but
> processes it, and does so very quickly.
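>
> By the way, since the default pipe buffer on Linux is only 64 KB, I could
> also enlarge the kernel buffer of each FIFO so that short stalls in the
> reader do not immediately back-pressure the writer. A sketch, assuming
> Linux (the FIFO path is a placeholder, and requests above
> /proc/sys/fs/pipe-max-size need privileges):
>
>     # Enlarge the kernel buffer of a named FIFO via F_SETPIPE_SZ
>     # (Linux-specific fcntl command, value 1031).
>     import fcntl, os
>
>     F_SETPIPE_SZ = 1031  # not exposed by the fcntl module in older Pythons
>
>     fd = os.open("/tmp/usrp0.fifo", os.O_RDONLY | os.O_NONBLOCK)  # placeholder path
>     fcntl.fcntl(fd, F_SETPIPE_SZ, 16 * 1024 * 1024)  # ask for a 16 MB buffer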
>
> As for libusb itself: I do not read English well, but after reading a bit
> about libusb and digging into UHD, I saw that UHD uses the asynchronous
> API but handles all events in a single thread. The developers explain this
> themselves in a comment in the code:
> https://github.com/EttusResearch/uhd/blob/master/host/lib/transport/libusb1_base.cpp#L65
> Now it is clear why there is only one thread. Honestly, the developers have
> good reason to mention this somewhere in the documentation, or better yet,
> give approximate CPU requirements. At the very least because this
> single-thread behaviour is not obvious; that is why I had to spend so much
> time on it. It seems only logical that each device would get its own
> thread, so that if one device works well, the others will too, no matter
> how many there are, as long as the number of CPU cores allows. I have 40
> cores in total, so I was confused and completely misled.
>
>> Also, you can look into using num_recv_frames in your device arguments
>> to "tune" the use of the USB subsystem.
>>
> Yes, I am playing with the "num_recv_frames" and "recv_frame_size"
> parameters, but unfortunately I am choosing them by blind trial and error.
> I saw that increasing "recv_frame_size" helps a little, but if I specify
> the maximum value, the overflow occurs even earlier. The same goes for
> "num_recv_frames": it interacts directly with "recv_frame_size", but I
> could not find any logic in their combined behaviour.
>
> The good news: I bought an "STLab PCIe to USB 3.0 (U-720)" expansion
> board, connected one of the USRPs to it, and now I can use 6 devices at a
> sample rate of 45 MS/s! (Previously it was only 34 MS/s.) It seems the
> south bridge could not cope with such a data stream (?), and by moving
> part of the load to the north bridge I still got a good result. Well, that
> is only a theory.
>
>
> But the problem remains in some form: 45 MS/s on all six devices is
> achievable only if the data is not processed afterwards (/dev/null). If I
> start the handlers, I get at most 7 MS/s per device, and the CPU is almost
> idle. The main calculations (Fourier transforms) are performed on the GPU,
> and they run very quickly: each processing stage takes less than 10% of
> the time budget for real-time operation. The calculations are fast, but
> there is a lot of data; maybe RAM bandwidth is insufficient, or something
> like that?
> What do you think the problem could be? And how can I try to pin it down?
> Naturally, I am ready to provide any data needed.
>
> Once again I apologize for the belated reply,
> Andrei.
>
> 2017-10-23 10:39 GMT+03:00 Андрій Хома <anik123wb at gmail.com>:
>
>> As an addendum: my USB controller is an Intel Corporation C610/X99 series
>> chipset USB xHCI Host Controller (rev 05).
>>
>> 2017-10-23 0:09 GMT+03:00 Андрій Хома <anik123wb at gmail.com>:
>>
>>> Hello,
>>> I have:
>>> 6x USRP B205mini
>>> Motherboard: Z10PE-D16 WS (could the chipset matter here?)
>>> CPU: Intel Xeon E5-2430 v4
>>> Memory: DDR4-1866 (128 GB)
>>>
>>> As a result, I get overflows ("O") when using all 6 USRPs at once.
>>> I am not proficient at profiling, but I saw that only one thread is
>>> created for libusb; maybe this is the bottleneck.
>>>
>>> Explanation of attached picture #1:
>>> 1. Creating/initializing the devices.
>>> 2. I create two threads for each device; they alternate in the picture:
>>>    - one calls "recv" (starting from the very first thread). It is
>>>      clearly quite resource-intensive; most of the time is spent in
>>>      convert_sc12_item32, though it is clear the CPU core has little
>>>      headroom left.
>>>    - the second writes to the named FIFO (the more intermittent ones,
>>>      starting with the second thread). It is not resource-intensive.
>>> 3. As I understand it, libusb lives in this one thread, and there is
>>>    only one of it for all 6 devices (see picture #2).
>>>
>>> Also, I have played with a USRP X310 and can comfortably handle 400 MS/s
>>> (via dual 10G Ethernet), i.e. convert_sc12_item32 is fully capable of
>>> processing 400 MS/s on one 2.2 GHz core, so the bottleneck must be the
>>> aforementioned single libusb thread.
>>>
>>> Did I draw the right conclusions?
>>> If so, I need to know exactly how much CPU power is required, without
>>> reading coffee grounds.
>>> Are there any benchmarks or hardware requirements that would let me make
>>> full use of these devices?
>>> [image: Inline image 1]
>>>
>>> [image: Inline image 2]
>>>
>>
>>
>
-------------- next part --------------
Two PNG attachments (the pictures referenced above) were scrubbed; they are
archived at:
<http://lists.ettus.com/pipermail/usrp-users_lists.ettus.com/attachments/20171025/11f8d30a/attachment.png>
<http://lists.ettus.com/pipermail/usrp-users_lists.ettus.com/attachments/20171025/11f8d30a/attachment-0001.png>

