[USRP-users] In search of 200 MSA/sec (Windows 7)

Marcus Müller marcus.mueller at ettus.com
Sat Mar 12 13:43:54 EST 2016


Hi Tilla,

Agreed, this needs further investigation.
> I have done all the basic NIC tuning that is frequently discussed
> here: jumbo packets, disable interrupt coalescing, increase buffer
> sizes...
> I have done a huge amount of other tuning: disabled NUMA, PCIe
> performance mode, process affinitized to the same CPU the NIC is
> directly connected to, hyperthreading disabled, hand-optimized
> compilation, and a boatload of other stuff.
In my (Linux) experience, disabling the OS's automatic mechanisms is
usually not beneficial to performance; in fact, I remember a case where
setting CPU affinity made it hard for the kernel to schedule kernel
drivers and userland optimally, and a significant increase in
performance was observed after we stopped setting affinity. Of course,
trying to set affinity is still a very good idea – after all, you have
knowledge of the application that your OS lacks. I'd definitely leave
hyperthreading on; it's very rarely the reason for slowdown in programs
that appear to be IO-bound. You should definitely **enable** interrupt
coalescing; there is no way a system can keep up if every single packet
of a 200 MS/s transmission causes a hardware interrupt (with 8000 B per
packet, that'd be 100,000 interrupts per second...).
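(For the record, the arithmetic behind that number; a minimal sketch
assuming sc16 on the wire, i.e. 4 bytes per complex sample, and the
~8000 B payload of a jumbo frame:)

#include <cstdio>

int main()
{
    // Assumptions: sc16 wire format (4 bytes per complex sample) and
    // roughly 8000 bytes of sample payload per UDP datagram (jumbo frames).
    const double sample_rate      = 200e6;   // samples per second
    const double bytes_per_sample = 4.0;     // sc16: 2 x int16
    const double bytes_per_packet = 8000.0;

    const double byte_rate   = sample_rate * bytes_per_sample; // 800 MB/s
    const double packet_rate = byte_rate / bytes_per_packet;   // 100,000 packets/s

    std::printf("%.0f MB/s, %.0f packets/s\n", byte_rate / 1e6, packet_rate);
    return 0;
}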

The one setting that I don't see in your list is the
FastSendDatagramThreshold registry setting [1]; did you set that?

What I take away from your execution time percentages is that most of
the time is spent in the symbol that actually works on the managed send
buffers (see the boost::function1<> line, 38%); looking more closely
into that would be interesting; can you do that?
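A quick way to look into that from the application side is to time each
send() call: long times there mean the loop is mostly waiting for a free
managed send buffer rather than doing conversion work. A minimal sketch,
where the streamer, buffer and metadata are placeholders for whatever
your loop already uses:

#include <uhd/stream.hpp>
#include <uhd/types/metadata.hpp>
#include <chrono>
#include <complex>
#include <cstddef>
#include <cstdio>

// Hedged sketch: measure how long each send() call blocks.
void timed_send(uhd::tx_streamer::sptr tx,
                const std::complex<short>* buff, size_t nsamps,
                const uhd::tx_metadata_t& md)
{
    const auto t0   = std::chrono::steady_clock::now();
    const size_t n  = tx->send(buff, nsamps, md);
    const auto t1   = std::chrono::steady_clock::now();
    const double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
    std::printf("sent %zu samples, send() took %.1f us\n", n, us);
}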
Also, agreed, there's not that much one can do about the converter;
don't be fooled by the fact that it also does byte conversions; it moves
the data out of the network card buffer into the recv() buffer, and
there you usually hit a memory bandwidth wall once the numerical
operation itself is optimized enough.
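One knob that is on the application side, in case you aren't using it
already: requesting sc16 as the host-side CPU format (in addition to
sc16 on the wire) means the converter only has to byteswap/copy instead
of also scaling between int16 and float. A minimal sketch, with a
placeholder device address and rate:

#include <uhd/usrp/multi_usrp.hpp>

int main()
{
    // Hedged sketch: sc16 CPU format + sc16 wire format, so the converter's
    // job reduces to a byteswap/copy.  Address and rate are placeholders.
    uhd::usrp::multi_usrp::sptr usrp =
        uhd::usrp::multi_usrp::make(uhd::device_addr_t("addr=192.168.10.2"));
    usrp->set_tx_rate(200e6);

    uhd::stream_args_t stream_args("sc16", "sc16"); // cpu format, wire format
    uhd::tx_streamer::sptr tx = usrp->get_tx_stream(stream_args);
    return 0;
}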


Best regards,
Marcus

[1]
http://files.ettus.com/manual/page_transport.html#transport_udp_windows
;
https://raw.githubusercontent.com/EttusResearch/uhd/master/host/utils/FastSendDatagramThreshold.reg


On 09.03.2016 17:00, tilla--- via USRP-users wrote:
>
> Over the past 2 weeks, I have been working towards achieving 200
> MSA/sec on win7 64 bit, UHD 3.9.2, X310 w/WBX 120 daughtercard.
>  
> I have a pretty good processor, 3.5 GHz Xeon E5-2637 v2, plenty of
> memory, 710-DA2 NIC in an x8 Gen 3 slot with verified Gen 3 trained
> speed, a pretty beefy box in general.
>  
> I have gathered a bunch of data and was looking for some further thoughts.
>  
> I have done all the basic NIC tuning that is frequently discussed
> here: jumbo packets, disable interrupt coalescing, increase buffer
> sizes...
>  
> I have done a huge amount of other tuning: disabled NUMA, PCIe
> performance mode, process affinitized to the same CPU the NIC is
> directly connected to, hyperthreading disabled, hand-optimized
> compilation, and a boatload of other stuff.
>  
> This is a simple prototype of a transmit-only application: 1 thread, 1
> transmit buffer, just send the same buffer as fast as possible.  The
> transmit loop is as simple as possible (bottom of page 2 in the attachment).
>  
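[The attachment is not shown in the archive; purely for illustration, a
minimal single-buffer transmit loop of that shape, with placeholder
names, could look like the sketch below.]

#include <uhd/usrp/multi_usrp.hpp>
#include <complex>
#include <vector>

// Illustrative sketch only -- the actual loop is in the scrubbed attachment.
// One thread, one preallocated buffer, sent repeatedly as fast as possible.
void tx_forever(uhd::tx_streamer::sptr tx)
{
    const size_t spb = tx->get_max_num_samps();   // samples per send() call
    std::vector<std::complex<short>> buff(spb);   // sc16 host buffer

    uhd::tx_metadata_t md;
    md.start_of_burst = true;
    md.end_of_burst   = false;

    while (true) {
        tx->send(&buff.front(), buff.size(), md); // blocks until a send buffer is free
        md.start_of_burst = false;
    }
}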
> Attached is some performance data.
>  
> 50 MSA/sec, very good performance, occasional underflow, max CPU on a
> core ~35% (Figure 1 screenshots).
>  
> 100 MSA/sec, decent performance, more underflows but still
> "reasonable", max CPU on a core ~70% (Figure 2 screenshots).
>  
> Handwave observation based upon the above numbers: performance is
> linear; when the sampling rate doubles, max CPU utilization doubles,
> exactly as one would expect...
>  
> Soooooo, now when going to 200 MSA/sec, constant underflows, not much
> transmission at all.
>  
> Extrapolating based upon the 100 MSA/sec numbers, I would need ~140% of a
> CPU :(  (cue Charlie Brown music)
>  
> If it is the byteswapping that is the true bottleneck, I am not
> sure there is really anything I can do, as it is already SSE.  Unless
> I do something like AVX or AVX2...
>  
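[For illustration only, not the UHD converter itself: a 16-bit byteswap
written with AVX2 intrinsics, which is roughly what "something like
AVX2" would look like for sc16 samples. Whether it would actually beat
the existing SSE converter is doubtful, since this operation is usually
memory-bandwidth bound.]

#include <immintrin.h>
#include <cstddef>
#include <cstdint>

// Hedged sketch: swap the two bytes of every 16-bit sample, 32 bytes at a time.
// _mm256_shuffle_epi8 shuffles within each 128-bit lane, and the swap pattern
// repeats every 16 bytes, so a lane-local shuffle is sufficient here.
void byteswap16_avx2(uint16_t* data, size_t n)
{
    const __m256i mask = _mm256_setr_epi8(
        1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14,
        1, 0, 3, 2, 5, 4, 7, 6, 9, 8, 11, 10, 13, 12, 15, 14);

    size_t i = 0;
    for (; i + 16 <= n; i += 16) {   // 16 uint16_t values == 32 bytes
        __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
        v = _mm256_shuffle_epi8(v, mask);
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(data + i), v);
    }
    for (; i < n; ++i)               // scalar tail
        data[i] = static_cast<uint16_t>((data[i] << 8) | (data[i] >> 8));
}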
> In the recent thread titled "Throughput of PCIe Interface" we had some
> performance discussions.  I don't have the email available right now,
> but someone claimed to have 200 MSA/sec working.  I guess I am curious
> as to the setup.
>  
> I would not think that byteswapping performance would vary much across
> operating systems...
>  
> So I guess I am looking for any suggestions or thoughts anyone might
> have related to getting to 200 MSA/sec, short of changing operating
> systems (for now).
>  
> I am planning to evaluate Windows Server 2012 Standard in the near
> future if I cannot get this working in some form, but would like to
> exhaust all options before investing that much time.
>  
> Sorry for the long winded email.
>  
> Thanks,
>  
>
>
