[USRP-users] In search of 200 MSA/sec (Windows 7)

hossein talaiee h.talaiee at gmail.com
Tue Mar 15 04:17:52 EDT 2016


There is a simple test that can show where the problem is:

1) Create a sample buffer of about 2000 bytes, filled with zeros.
2) Create a loop that just sends this buffer without actually refilling it.
This measures your transport capability, so you can see whether there are
any underflows there.

Run the code and see what happens.

Sample code looks like this:
// ... initialize UHD and the USRP, set the sample rate and frequency ...
std::vector<samp_type> signal(samps_per_buff); // zero-initialized buffer
uhd::tx_metadata_t md;
md.start_of_burst = false;
md.end_of_burst = false;
while (true) { // send the same buffer over and over, never refilling it
    tx_streamer->send(&signal.front(), samps_per_buff, md);
}
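
As a quick sketch on top of that (assuming the same tx_streamer object and
UHD's standard async metadata API), you could also count underflows
explicitly instead of just watching for "U" on the console:

// Sketch only: drain the streamer's async message queue and count
// underflow events. Call this periodically (e.g. from a second thread)
// while the send loop runs.
#include <uhd/stream.hpp>
#include <uhd/types/metadata.hpp>

static size_t count_underflows(uhd::tx_streamer::sptr tx_streamer)
{
    size_t underflows = 0;
    uhd::async_metadata_t async_md;
    while (tx_streamer->recv_async_msg(async_md, 0.0)) { // non-blocking poll
        if (async_md.event_code == uhd::async_metadata_t::EVENT_CODE_UNDERFLOW ||
            async_md.event_code == uhd::async_metadata_t::EVENT_CODE_UNDERFLOW_IN_PACKET)
            ++underflows;
    }
    return underflows;
}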








On Tue, Mar 15, 2016 at 12:50 AM, Neel Pandeya via USRP-users <
usrp-users at lists.ettus.com> wrote:

> Hello Tilla:
>
> You did just about everything that I might suggest, and your system
> certainly sounds powerful enough to handle 200 Msps. The Intel X710 is the
> latest 10 GbE card from Intel, and should perform well and be able to
> handle this data rate. There are only a few other things that I might
> mention. Have you installed the latest Intel X710 driver? Would you be able
> to upgrade Windows? Several customers have reported that they were able to
> achieve improved performance and throughput by upgrading from Windows 7 to
> Windows 8 or 10. I'm sure you're already using SSD disks, and I'm not sure
> if you're being limited by disk I/O, but perhaps you could try a RAID setup or
> a RAM disk? Have you turned off all power management, and disabled ACPI in
> the BIOS? And speaking of the BIOS, some customers have also reported that
> a BIOS upgrade improved throughput, although your system looks like it's
> new, so the BIOS firmware version should be very recent. I agree with
> Marcus that the FastSendDatagramThreshold registry setting might be
> helpful. I'm not sure if it behaves differently between Windows 7, 8, 10,
> and Server 2012. It's the only registry setting tweak that I'm aware of.
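>
> For reference, a sketch of what that tweak looks like as a .reg file (the
> 0x1f40 = 8000-byte threshold here is just the jumbo-frame size and is my
> assumption; the .reg file shipped in the UHD repository is authoritative):
>
> Windows Registry Editor Version 5.00
>
> ; assumed value -- see FastSendDatagramThreshold.reg in the UHD sources
> [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters]
> "FastSendDatagramThreshold"=dword:00001f40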
>
> Please let us know if you're able to make any further progress, and
> whether you see improved results with Windows Server 2012.
>
> --Neel
>
>
>
> On 12 March 2016 at 10:43, Marcus Müller <usrp-users at lists.ettus.com>
> wrote:
>
>> Hi Tilla,
>>
>> Agreeing, this needs further investigation.
>>
>>> I have done all the basic NIC tuning that is frequently discussed here:
>>> jumbo packets, disable interrupt coalescing, increase buffer sizes...
>>>
>>> I have done a huge amount of other tuning: disable NUMA, PCIe performance
>>> mode, process affinitized to the same CPU the NIC is directly connected to,
>>> hyperthreading disabled, hand-optimized compilation, and a boatload of
>>> other stuff.
>>
>> In my (Linux) experience, disabling the OS's automatic tuning is most of
>> the time not beneficial to performance; in fact, I remember a case where
>> setting CPU affinity made it hard for the kernel to schedule kernel drivers
>> and userland optimally, so that a significant increase in performance was
>> observed after we stopped setting affinity. Of course, trying to set
>> affinity is still a very good idea; after all, you have knowledge of the
>> application that your OS lacks. I'd definitely leave hyperthreading on;
>> it's very rarely the reason for slowdown in programs that seem to be
>> IO-bound. You should definitely **enable** interrupt coalescing; there is
>> no way a system is going to keep up if every single packet of a 200 MS/s
>> transmission causes a hardware interrupt (200 MS/s at 4 bytes per sample is
>> 800 MB/s; with 8000 B per packet, that'd be 100,000 interrupts per second...).
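>>
>> (If you do want to experiment with pinning on Windows, a minimal sketch of
>> the usual Win32 calls is below; the core index is just an example, not a
>> recommendation from me.)
>>
>> // Sketch only: pin the calling (send) thread to one core and raise its
>> // priority using the plain Win32 API.
>> #include <windows.h>
>>
>> static void pin_current_thread_to_core(unsigned core_index)
>> {
>>     const DWORD_PTR mask = DWORD_PTR(1) << core_index;
>>     SetThreadAffinityMask(GetCurrentThread(), mask);
>>     SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL);
>> }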
>>
>> The one setting that I don't find in your list is the
>> FastSendDatagramThreshold setting [1]; did you do that?
>>
>> What I take away from your execution time percentages is that most of the
>> time is spent in the symbol that handles the managed send buffers (see the
>> boost::function1<> line, 38%); looking more closely into that would be
>> interesting; can you do that?
>> Also, I agree that there's not that much one can do about the converter;
>> don't be fooled by the fact that it also does byte conversions; it moves
>> the data between the application buffer and the network card buffer, and
>> you usually hit a memory bandwidth wall there once you optimize the
>> numerical operation enough.
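>>
>> (Aside, and only as a sketch: which converter runs is determined by the
>> stream args you create the streamer with; whether a narrower over-the-wire
>> format actually helps at this rate on the X310 is something to verify, not
>> a claim.)
>>
>> // Sketch: pick the host-side and over-the-wire formats explicitly.
>> // "sc16"/"sc16" keeps the converter down to a byte swap; "sc8" on the
>> // wire would halve the network load at the cost of dynamic range.
>> // Assumes an existing multi_usrp object named "usrp".
>> #include <uhd/usrp/multi_usrp.hpp>
>>
>> uhd::stream_args_t stream_args("sc16", "sc16"); // cpu_format, otw_format
>> uhd::tx_streamer::sptr tx_streamer = usrp->get_tx_stream(stream_args);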
>>
>>
>> Best regards,
>> Marcus
>>
>> [1]
>> http://files.ettus.com/manual/page_transport.html#transport_udp_windows
>> https://raw.githubusercontent.com/EttusResearch/uhd/master/host/utils/FastSendDatagramThreshold.reg
>>
>>
>> On 09.03.2016 17:00, tilla--- via USRP-users wrote:
>>
>>
>> Over the past 2 weeks, I have been working towards achieving 200 MSA/sec
>> on Windows 7 64-bit, UHD 3.9.2, X310 with a WBX-120 daughterboard.
>>
>> I have a pretty good processor (3.5 GHz Xeon E5-2637 v2), plenty of
>> memory, and an X710-DA2 NIC in an x8 Gen 3 slot with verified Gen 3 trained
>> speed; a pretty beefy box in general.
>>
>> I have gathered a bunch of data and was looking for some further thoughts.
>>
>> I have done all the basic NIC tuning that is frequently discussed here:
>> jumbo packets, disable interrupt coalescing, increase buffer sizes...
>>
>> I have done a huge amount of other tuning: disable NUMA, PCIe performance
>> mode, process affinitized to the same CPU the NIC is directly connected to,
>> hyperthreading disabled, hand-optimized compilation, and a boatload of
>> other stuff.
>>
>> This is a simple prototype of a transmit-only application: 1 thread, 1
>> transmit buffer, just sending the same buffer as fast as possible.  The
>> transmit loop is as simple as possible (bottom of page 2 in the attachment).
>>
>> Attached is some performance data.
>>
>> At 50 MSA/sec: very good performance, occasional underflows, max CPU on a
>> core ~35% (Figure 1 screenshots).
>>
>> At 100 MSA/sec: decent performance, more underflows but still "reasonable",
>> max CPU on a core ~70% (Figure 2 screenshots).
>>
>> Handwave observation based upon the above numbers: scaling is linear; when
>> the sampling rate doubles, max CPU utilization doubles, exactly as expected...
>>
>> Soooooo, now when going to 200 MSA/sec: constant underflows, and not much
>> transmission at all.
>>
>> Extrapolating from the 100 MSA/sec numbers, I would need ~140% of a CPU
>> :(  (cue Charlie Brown music)
>>
>> If it is the byteswapping that is the true bottleneck, I am not sure there
>> is really anything I can do, as it is already SSE.  Unless I do something
>> like AVX or AVX2...
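>>
>> (For what it's worth, the AVX2 version of a 16-bit byteswap is basically
>> the same shuffle as the SSE one, just twice as wide.  The snippet below is
>> only a standalone sketch I'm including for illustration, not UHD's
>> converter code.)
>>
>> // Sketch: swap the two bytes of every 16-bit word (host <-> network order)
>> // using AVX2.
>> #include <immintrin.h>
>> #include <cstdint>
>> #include <cstddef>
>>
>> static void byteswap16_avx2(uint16_t *data, size_t count)
>> {
>>     // Per-128-bit-lane shuffle: bytes (0,1)->(1,0), (2,3)->(3,2), ...
>>     const __m256i mask = _mm256_setr_epi8(
>>         1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14,
>>         1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);
>>     size_t i = 0;
>>     for (; i + 16 <= count; i += 16) { // 16 uint16_t per 256-bit register
>>         __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
>>         v = _mm256_shuffle_epi8(v, mask);
>>         _mm256_storeu_si256(reinterpret_cast<__m256i*>(data + i), v);
>>     }
>>     for (; i < count; ++i) // scalar tail
>>         data[i] = uint16_t((data[i] << 8) | (data[i] >> 8));
>> }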
>>
>> On the recent thread titled "Throughput of PCIe Interface" we had some
>> performance discussions.  I don't have the email available right now, but
>> someone claimed to have 200 MSA/sec working.  I guess I am curious as to the setup.
>>
>> I would not think that byteswapping performance would vary much across
>> operating systems...
>>
>> So I guess I am looking for any suggestions or thoughts anyone might have
>> related to getting to 200 MSA/sec, short of changing operating systems (for
>> now).
>>
>> I am planning to evaluate Windows Server 2012 Standard in the near future
>> if I cannot get this working in some form, but I would like to exhaust all
>> options before investing that much time.
>>
>> Sorry for the long-winded email.
>>
>> Thanks,
>>
>>
>>
>

