usrp-users@lists.ettus.com

Discussion and technical support related to USRP, UHD, RFNoC


rx_samples_to_file issue

Peter Witkowski
Fri, Oct 3, 2014 1:26 PM

To say that the issue is just because the disk subsystem can't keep up is a
bit of a cop-out.

I had issues writing to disk when the incoming stream was 400MB/s and my
RAID0 system was benchmarked at being much higher than that.

The issue that I've been seeing appears to stem from the fact that you
cannot concurrently read from and write out the data stream as it's coming
in.  In effect, you have a main loop that reads from the device and then
immediately tries to write that buffer to file.  If you do not complete
these operations in a timely fashion, overflows occur.

One way to solve (or at least band-aid) the issue is to set your
dirty_background_ratio to 0.  I was able to get writing to disk working
somewhat with this setting, as it is more predictable to write directly to
disk than to let the write cache fill up and then have a large amount of
data to push to disk at once.  That said, my RAID0 array is capable of
such speeds, and even then I was getting a few (but much reduced) overflows.
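
Before changing the setting mentioned above, it can help to see what the
kernel is currently using. The following is only a minimal sketch, assuming
a Linux system that exposes its vm sysctls under /proc/sys/vm; the helper
name read_vm_setting is made up for illustration.

```python
from pathlib import Path


def read_vm_setting(name, root="/proc/sys/vm"):
    """Return the integer value of a Linux vm sysctl, e.g. dirty_background_ratio."""
    return int(Path(root, name).read_text().strip())


# Print the writeback thresholds if they are available on this machine.
for setting in ("dirty_background_ratio", "dirty_ratio"):
    try:
        print(setting, "=", read_vm_setting(setting))
    except (OSError, ValueError) as exc:
        print(setting, "unavailable:", exc)
```

Changing the value itself needs root, typically via
sysctl -w vm.dirty_background_ratio=0, and affects every process on the
machine, not just the capture.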

The one surefire way I know of getting this working (even on a slow disk
system) is to buffer the data.  The disk-writing process can then consume
the buffer while the device reader concurrently adds to it.
The easiest way to test buffering (that I've found) is to simply set up a
GNU Radio Companion flowgraph with a stream-to-vector block between the USRP
and file sink blocks.  This is exactly what I am doing currently, since even
with a very powerful system I could not get data saved to disk quickly
enough, given the aforementioned issues with the provided UHD software.
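
The buffering idea can be sketched in a few lines. This is not how
rx_samples_to_file is written and it does not call UHD at all: read_chunk
is a hypothetical stand-in for whatever returns the next block of samples,
and max_buffered is an assumed tuning knob. The reader never waits on the
disk; a second thread drains a bounded queue into the file.

```python
import queue
import threading


def capture_to_file(read_chunk, path, num_chunks, max_buffered=1024):
    """Read num_chunks blocks via read_chunk() while a second thread writes them.

    Returns the number of blocks dropped because the buffer was full.
    """
    buf = queue.Queue(maxsize=max_buffered)
    dropped = 0

    def writer():
        # Consumer: drain the buffer to disk until the sentinel arrives.
        with open(path, "wb") as out:
            while True:
                chunk = buf.get()
                if chunk is None:
                    return
                out.write(chunk)

    thread = threading.Thread(target=writer)
    thread.start()
    for _ in range(num_chunks):
        chunk = read_chunk()  # stand-in for the device receive call
        try:
            buf.put_nowait(chunk)  # never block the reader on the disk
        except queue.Full:
            dropped += 1  # the buffer was too small for this stall
    buf.put(None)  # sentinel: tell the writer to finish
    thread.join()
    return dropped
```

For example, capture_to_file(lambda: bytes(4096), "capture.dat", 1000)
writes 1000 zero-filled blocks. A nonzero return value means the buffer
was too small for the stalls that occurred.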

On Thu, Oct 2, 2014 at 11:48 PM, gsmandvoip via USRP-users <
usrp-users@lists.ettus.com> wrote:

Thanks Marcus for your replies. Yes O gone away.

On Thu, Oct 2, 2014 at 5:50 PM, Marcus D. Leech mleech@ripnet.com wrote:

With rx_samples_to_file without _4rx.rbf, I initially tried on my i3 with
4GB RAM. It gave me some OOOO, but fewer than earlier. What I do not
understand is that most of my RAM capacity and processor were sitting idle
while it showed OOOO. Why this strange behaviour?

The default format for uhd_rx_cfile is complex-float, thus doubling the
amount of data written compared to rx_samples_to_file.

You can't just use CPU usage as an indicator of loading--if you're
writing to disk, the disk subsystem may be much slower than you think, so
the
"rate limiting step" is writes to the disk, not computational elements.

Try using /dev/null as the file that you write to.  If the 'O' go away,
even at higher sampling rates, then it's your disk subsystem.

using uhd_rx_cfile getting similar result, but strangely, why it is
low, at 4M sampling rate it was higher???

On Thu, Oct 2, 2014 at 9:27 AM, Marcus D. Leech mleech@ripnet.com
wrote:

On 10/01/2014 11:46 PM, gsmandvoip wrote:

Yes, I am running a single channel, but when trying to achieve my desired
sampling rate without _4rx.rbf, it says the requested sampling rate is not
valid and adjusts to some 3.9M or so.
Sorry for the misleading info I gave earlier: I have an i3 with 32 bit and
an i7 with 64 bit, but I am getting the same result on both machines.

Here is my command to capture signal:

./rx_samples_to_file --args="fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX"
--freq "$FC" --rate="$SR" $FILE --nsamps "$NSAMPLES"

and here is its output:

Creating the usrp device with: fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX...
-- Loading firmware image: /usr/share/uhd/images/usrp1_fw.ihx... done
-- Opening a USRP1 device...
-- Loading FPGA image: /usr/share/uhd/images/usrp1_fpga_4rx.rbf... done
-- Using FPGA clock rate of 52.000000MHz...
Error: LookupError: IndexError: multi_usrp::get_tx_subdev_spec(0)
failed to make default spec - ValueError: The subdevice specification "A:0"
is too long.

The user specified 1 channels, but there are only 0 tx dsps on mboard 0.

Don't use the _4rx image if you don't need it.

The USRP1 only does strict-integer resampling, and with a master clock
(NON STANDARD FOR USRP1) of 52.000MHz, 4Msps is not a sample rate
that it can produce.  Try 5.2Msps or 4.3333Msps.

At 5.2Msps, it's recording at roughly 20.8Mbytes/second, so your system
needs to be able to sustain that for at least as long as the capture lasts.
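
The arithmetic behind those numbers can be checked quickly. This is a
sketch under two assumptions that are not stated in the thread: the
suggested rates correspond to integer decimations of 10 and 12 (52/10 and
52/12), and each complex sample is stored as 4 bytes (16-bit I plus 16-bit
Q), which is what makes 5.2Msps come out at 20.8Mbytes/second.

```python
MASTER_CLOCK_HZ = 52e6  # FPGA clock rate reported in the log above
BYTES_PER_SAMPLE = 4    # assumption: 16-bit I + 16-bit Q per complex sample


def sample_rate(decimation):
    """Sample rate produced by an integer decimation of the master clock."""
    return MASTER_CLOCK_HZ / decimation


def disk_rate_bytes_per_s(rate_sps):
    """Sustained write rate needed to record at rate_sps."""
    return rate_sps * BYTES_PER_SAMPLE


for decim in (10, 12):
    rate = sample_rate(decim)
    print(f"decimation {decim}: {rate / 1e6:.4f} Msps, "
          f"{disk_rate_bytes_per_s(rate) / 1e6:.1f} MB/s")
```

With complex-float output, as mentioned above for uhd_rx_cfile, the bytes
per sample and therefore the required write rate double.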

--
Marcus Leech
Principal Investigator
Shirleys Bay Radio Astronomy Consortium http://www.sbrac.org

--
Peter Witkowski
pwitkowski@gmail.com

mleech@ripnet.com
Fri, Oct 3, 2014 1:34 PM

One has to keep firmly in mind that programs like rx_samples_to_file are
*examples* that show how to use the underlying UHD API. They are not
necessarily optimized for all situations, and indeed, one could
restructure rx_samples_to_file to decouple UHD I/O from filesystem I/O,
using a large buffer between them.

The fact is that the dynamic performance of high-speed, real-time flows is
something that almost invariably needs tweaking for any particular
situation. There's no way for an example application to meet all those
requirements.

But the fact also remains that for *some* systems, rx_samples_to_file
(and uhd_rx_cfile on the GNU Radio side) are able to stream high-speed
data just fine as-is.
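
How large such a buffer has to be is simple arithmetic. The sketch below
uses the 400MB/s stream rate mentioned earlier in the thread; the 2 GB
buffer, the 200 MB backlog and the 500 MB/s disk rate are made-up figures
for illustration only.

```python
def stall_seconds_absorbed(buffer_bytes, stream_bytes_per_s):
    """How long the disk may stall completely before the buffer overflows."""
    return buffer_bytes / stream_bytes_per_s


def catch_up_seconds(backlog_bytes, stream_bytes_per_s, disk_bytes_per_s):
    """Time to drain a backlog while the stream keeps arriving."""
    if disk_bytes_per_s <= stream_bytes_per_s:
        raise ValueError("disk is not faster than the stream; it never catches up")
    return backlog_bytes / (disk_bytes_per_s - stream_bytes_per_s)


# Hypothetical 2 GB buffer in front of a 400 MB/s stream.
print(stall_seconds_absorbed(2e9, 400e6), "s of stall absorbed")
# Hypothetical 200 MB backlog, drained by a disk sustaining 500 MB/s.
print(catch_up_seconds(200e6, 400e6, 500e6), "s to catch up")
```

The second function shows why headroom matters: a disk that is only
slightly faster than the stream takes a long time to recover from even a
short stall.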

Marcus Müller
Fri, Oct 3, 2014 2:55 PM

I have to agree with Marcus on this. Also, keep in mind that storage is
really what an operating system should take care of in any "general
purpose" scenario, i.e. as long as I just write to a file, I'd
expect that the thing in charge of storage (my kernel / the filesystems
/ block device drivers) does the best it can to keep up. If I find
myself in a situation where my specific storage needs dictate a huge
write buffer, changing the application might be one way, but as I'm
responsible for my own storage subsystem, I could just as well increase
the cache buffer sizes and let the operating system handle storage
operation. If your RAID is really performing as well as it is
benchmarked to, then this should not be one of your problems. All
rx_samples_to_file really does is sequentially write out data at a
constant rate, which is the most basic write benchmark I can think of.

If your storage subsystem (filesystem + storage abstraction + raid
driver + interface driver + hard drive interface + hard drives +
hardware caches) can't keep up, it's failing to perform as specified,
simple as that. In this case, saying that the application needs to be
smarter when dealing with storage seems like a bit of a cop-out to me ;)

I'd like to point out that most benchmarks use heavily averaged numbers
for write speeds etc. UHD, on the other hand, kind of demands soft
real-time performance of a write subsystem, which is a lot harder to
fulfill. This comes up rather frequently, but I have to stress it: you
need a fast guaranteed write rate, not only a fast average one, and as
soon as your operating system has to postpone writing data[1], it has to
have enough performance to catch up whilst still meeting continued demand.
This is general purpose hardware running a general purpose OS with dozens
of processes, and you can't just say "every single component is up to my
task, thus my system suffices", because everything potentially blocks
everything!
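
The difference between an average and a guaranteed rate can be made
visible by timing each write separately. This is only a rough sketch: the
function name, chunk size and chunk count are arbitrary choices, and it
measures how long each write() call takes to hand data to the operating
system, which is not the same as the data reaching the disk.

```python
import time


def time_chunk_writes(path, chunk_size=1 << 20, num_chunks=64):
    """Write num_chunks blocks to path and report mean and worst write latency."""
    chunk = bytes(chunk_size)
    latencies = []
    with open(path, "wb", buffering=0) as out:
        for _ in range(num_chunks):
            start = time.perf_counter()
            out.write(chunk)
            latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    return {
        "mean_s": total / num_chunks,
        "worst_s": max(latencies),
        "avg_bytes_per_s": (chunk_size * num_chunks / total
                            if total > 0 else float("inf")),
    }
```

A benchmark would typically report something like avg_bytes_per_s. For a
real-time capture, worst_s is the number that decides whether an overflow
happens.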

Greetings,
Marcus

[1] e.g. because the filesystem needs to calculate checksums or update
tables, another process gets scheduled, a device blocks your PCIe bus,
your platters randomly need a bit longer to seek, you reach the physical
end of an LVM volume and have to move across a disk, an interrupt does
what an interrupt does, some process is getting notified about a changing
file descriptor, DBUS is happening in the kernel, token ring has run out
of tokens, thermal throttling, bitflips on SATA leading to
retransmission, some page getting fetched from swap...

Peter Witkowski
Fri, Oct 3, 2014 3:44 PM

So I'm confused.

You state that if I can't use rx_samples_to_file, my system is failing to
perform as specified to write data out, then you give an example of several
things that can happen to create a stochastic write speed (which I totally
understand and agree with).  Given that writes can be stochastic, why is
there not a software buffer implemented in the UHD sample code to account
for such issues?  I understand that it's meant to be an example, but I've
also seen it referenced as being used effectively as a debugger or test for
people having issues (i.e. recommendation to use the UHD programs in place
of GNURadio to resolve issues).

Also, in terms of benchmarking, I'm quoting minimum values, not averages.
I agree with you that average values are pointless, and in reality the disk
subsystem needs to perform when called upon.  My minimum values for a 4-disk
RAID0 with a dedicated controller are well above the data rate that I am
pushing.

Is there an example system that can handle sustained data capture from the
USRP at (or near) the limits of the 10GigE or PCIe interfaces (maybe the
requirement is enterprise-class PCIe SSDs)?  I'm running a two-socket Xeon
system (two hex-core processors) with 64GB of RAM.  How much more hardware
should I throw at the problem to be able to sample/write at 100 MS/s (half of
what is quoted on the website for bandwidth for the 10GigE kit) using the
provided code?
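
For reference, the rates in this thread follow from sample rate times bytes per sample. A minimal sketch of that arithmetic, assuming 4 bytes per complex sample (16-bit I plus 16-bit Q, which is consistent with the "5.2Msps is roughly 20.8Mbytes/second" figure quoted further down); the function names are made up for illustration:

```python
def bytes_per_second(sample_rate_sps, bytes_per_sample=4):
    """Sustained write rate needed to store a stream without dropping samples."""
    return sample_rate_sps * bytes_per_sample


def seconds_of_buffer(buffer_bytes, sample_rate_sps, bytes_per_sample=4):
    """How long a RAM buffer of the given size could absorb a fully stalled disk."""
    return buffer_bytes / bytes_per_second(sample_rate_sps, bytes_per_sample)


if __name__ == "__main__":
    for rate in (5.2e6, 100e6, 200e6):
        print(f"{rate / 1e6:6.1f} MS/s -> {bytes_per_second(rate) / 1e6:6.1f} MB/s")
    # Slack bought by each GiB of RAM buffer at 100 MS/s
    print(f"{seconds_of_buffer(2**30, 100e6):.2f} s per GiB at 100 MS/s")
```

Under that assumption 100 MS/s works out to the 400MB/s mentioned at the start of the thread, and each GiB of buffer covers a few seconds of stalled disk at that rate.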

I think the issue here is that the code itself simply can't get through
its main loop fast enough.  There's a difference between data bandwidth
and CPU throughput.  The sequential nature of the code means that if any
weird stuff happens (your example was a good set of kernel-related hilarity
that can lead to stochastic timing) you will have overflows since you
cannot read fast enough.  This is why a 90% solution for my application was
to just set the dirty_background_ratio to 0 and also why redirection to
/dev/null makes overflows go away.  With either method I didn't have to
wait for a large write cache to flush before moving on to the next read
from the USRP.  Note that there can also be things that happen on the read
side as well.  Does this mean that I can only run the code on an RTOS?

As a final note, my understanding is that GNURadio and the USRP were
developed for domain experts in DSP to use.  These users may or may not
have prior experience in software.  As a result, I'd recommend perhaps
adding a buffered example or having the USRP GNURadio block allow for
buffering.  Otherwise, I just don't see how you can advertise 200 MS/s
(maybe even a simple "buffer" block in GNURadio would do the trick?).  I
understand that this is the theoretical limit of the bus, but if there doesn't
exist a driver or other software to make use of this, the practical limit
becomes much, much smaller.
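
To make the buffering idea concrete, here is a minimal sketch of a reader decoupled from a writer by an in-memory queue. It is not UHD code: `read_block` is a hypothetical stand-in for a blocking device read, `capture` and `max_queued_blocks` are names invented for this illustration, and a real application would use the UHD streaming API in its place.

```python
import queue
import threading


def capture(read_block, sink, num_blocks, max_queued_blocks=1024):
    """Read num_blocks from read_block() and write them to sink via a queue.

    read_block is a stand-in for a blocking device read (hypothetical, not
    a UHD call); sink is any object with a write(bytes) method.
    Returns the number of blocks dropped because the queue was full.
    """
    fifo = queue.Queue(maxsize=max_queued_blocks)
    dropped = 0

    def writer():
        # Consumer: drains the queue at whatever pace the disk allows.
        while True:
            block = fifo.get()
            if block is None:  # sentinel: the reader is done
                return
            sink.write(block)

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        for _ in range(num_blocks):
            block = read_block()
            try:
                # Never stall the reader on the disk; count a drop instead.
                fifo.put_nowait(block)
            except queue.Full:
                dropped += 1
    finally:
        fifo.put(None)  # waits for room if needed, then stops the writer
        thread.join()
    return dropped


if __name__ == "__main__":
    import io

    out = io.BytesIO()
    lost = capture(lambda: bytes(4096), out, num_blocks=100)
    print("dropped blocks:", lost, "bytes written:", len(out.getvalue()))
```

The point of the design is that a slow write only delays the writer thread; the reader keeps pulling blocks until the queue itself fills, so the queue depth sets how long a disk stall can be tolerated.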

On Fri, Oct 3, 2014 at 10:55 AM, Marcus Müller usrp-users@lists.ettus.com
wrote:

I have to agree with Marcus on this. Also, keep in mind that storage is
really what an operating system should take care of in any "general
purpose" scenario, ie. that as long as I just write to a file, I'd expect
that the thing in charge of storage (my kernel / the filesystems / block
device drivers) does the best it can to keep up. If I find myself in a
situation where my specific storage needs dictate a huge write buffer,
changing the application might be one way, but as I'm responsible for my
won storage subsystem, I could just as well increase the cache buffer
sizes, and let the operating system handle storage operation. If your RAID
is really performing as well as it is benchmarked to, then this should not
be one of your problems. All rx_samples_to_file does is really sequentially
writing out data at a constant rate, which is the most basic write
benchmark I can think of.

If your storage subsystem (filesystem + storage abstraction + raid driver
+ interface driver + hard drive interface + hard drives + hardware caches)
can't keep up, it's failing to perform as specified, simple as that. In
this case, saying that the application needs to be smarter when dealing
with storage seems like a bit of a cop-out to me ;)

I'd like to point out that most benchmarks use heavily averaged numbers
for write speeds etc. UHD on the other hand kind of demands soft real-time
performance of a write subsystem, which is a lot harder to fulfill. This
comes up rather frequently, but I have to stress it: you need a fast
guaranteed write rate, not only an average one, and as soon as your
operating system has to postpone writing data[1], it has to have enough
performance to catch up whilst still meeting continued demand. This is
general purpose hardware running general purpose OS with dozens of
processes, and you can't just say "every single component is up to my task,
thus my system suffices", because everything potentially blocks everything!

Greetings,
Marcus

[1] e.g. because the filesystems needs to calculate checksums, update
tables, another process gets scheduled, a device blocks your PCIe bus, your
platters randomly need a bit longer to seek, you reach the physical end of
an LVM volume and have to move across a disk, an interrupt does what an
interrupt does, some process is getting noticed on a changing file
descriptor, DBUS is happening in the kernel, token ring has run out of
tokens, thermal throttling, bitflips on SATA leading to retransmission,
some page getting fetched from swap...

On 03.10.2014 15:34, Marcus D. Leech via USRP-users wrote:

One has to keep firmly in mind that programs like rx_samples_to_file are
examples that show how to use the underlying UHD API. They are not
necessarily optimized for all situations, and indeed, one could
restructure rx_samples_to_file to decouple UHD I/O from filesystem I/O,
using a large buffer between them.

The fact is that dynamic performance of high-speed, real-time, flows is
something that almost-invariably needs tweaking for any particular
situation. There's no way for an example application to meet all those
requirements.

But the fact also remains that for some systems, rx_samples_to_file
(and uhd_rx_cfile on the Gnu Radio side) are able to stream high-speed
data just fine as-is.

On 2014-10-03 09:26, Peter Witkowski via USRP-users wrote:

To say that the issue is just because the disk subsystem can't keep up is a bit of cop-out.

I had issues writing to disk when the incoming stream was 400MB/s and my RAID0 system was benchmarked at being much higher than that.

The issue that I've been seeing stems from the fact that it appears that you cannot concurrently read/write from the data stream as its coming in. In effect you have a main loop that reads from the device and then immediately tries to write that buffer to file. If you do not complete these operations in a timely fashion overflows occur.

One way to solve (or at least band aid the issue) is to set your dirty_background_ratio to 0. I was able to get writing to disk working somewhat with this setting as it is more predictable to directly write to disk instead of having your write cache fill up and then having a large amount of data to push to disk. That said, my RAID0 array is capable of such speeds and even then I was getting a few (but much reduced) overflows.

The one surefire way I know of getting this working (even on a slow disk system) is to buffer the data. The buffer can then be consumed by the disk writing process while being concurrently added onto by the device reader. The easiest way to test buffering (that I've found) is to simply set up a GNURadio Companion program with a stream-to-vector block between the USRP and file sink blocks. This is exactly what I am doing currently since even with a very powerful system, I could not get data saved to disk quickly enough given the aforementioned issues with the provided UHD software.

On Thu, Oct 2, 2014 at 11:48 PM, gsmandvoip via USRP-users usrp-users@lists.ettus.com wrote:

Thanks Marcus for your replies. Yes O gone away.

On Thu, Oct 2, 2014 at 5:50 PM, Marcus D. Leech mleech@ripnet.com wrote:

with rx_samples_to_file without _4rx.rbf, Initially I tried on my i3, 4GB ram, it gave me
some OOOO but was lesser than earlier, but I do not understand, my most of the ram capacity and processor was sitting idle while it shows OOOO, why is this strange behaviour

The default format for uhd_rx_cfile is complex-float, thus doubling the amount of data written compared to rx_samples_to_file.

You can't just use CPU usage as an indicator of loading--if you're writing to disk, the disk subsystem may be much slower than you think, so the
"rate limiting step" is writes to the disk, not computational elements.

Try using /dev/null as the file that you write to. If the 'O' go away, even at higher sampling rates, then it's your disk subsystem.

using uhd_rx_cfile getting similar result, but strangely, why it is low, at 4M sampling rate it was higher???

On Thu, Oct 2, 2014 at 9:27 AM, Marcus D. Leech mleech@ripnet.com wrote:

On 10/01/2014 11:46 PM, gsmandvoip wrote:

Yes I am running single channel, but when trying to achieve my desired sampling rate without _4rx.rbf, it says, requested sampling rate is not valid, adjusting to some 3.9M or so. sorry for misleading info I gave earlier, I have i3, with 32 bit and i7 with 64 bit, but getting same result on both machines

Here is my command to capture signal:

./rx_samples_to_file --args="fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX" --freq "$FC" --rate="$SR" $FILE --nsamps "$NSAMPLES"

and here is its output:

Creating the usrp device with: fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX...
-- Loading firmware image: /usr/share/uhd/images/usrp1_fw.ihx... done
-- Opening a USRP1 device...
-- Loading FPGA image: /usr/share/uhd/images/usrp1_fpga_4rx.rbf... done
-- Using FPGA clock rate of 52.000000MHz...
ERROR: LOOKUPERROR: INDEXERROR: MULTI_USRP::GET_TX_SUBDEV_SPEC(0) FAILED TO MAKE DEFAULT SPEC - VALUEERROR: THE SUBDEVICE SPECIFICATION "A:0" IS TOO LONG.
The user specified 1 channels, but there are only 0 tx dsps on mboard 0.

Don't use the _4rx image if you don't need it.

The USRP1 only does strict-integer resampling, and with a master clock (NON STANDARD FOR USRP1) of 52.000MHz, 4Msps is not a sample rate
that it can produce. Try 5.2Msps or 4.3333Msps.

At 5.2Msps, it's recording at roughly 20.8Mbytes/second, so your system needs to be able to sustain that for at least as long as the capture lasts.



USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com

--
Peter Witkowski
pwitkowski@gmail.com

MM
Marcus Müller
Sat, Oct 4, 2014 1:14 PM

Hi Peter,

I didn't mean to confuse you! Actually, my job is doing the opposite (i.e.
providing useful information), so let me briefly follow up on
this:
On 03.10.2014 17:44, Peter Witkowski via USRP-users wrote:

So I'm confused.

You state that if I can't use rx_samples_to_file, my system is failing to
perform as specified to write data out, then you give an example of several
things that can happen to create a stochastic write speed (which I totally
understand and agree with).  Given that writes can be stochastic, why is
there not a software buffer implemented in the UHD sample code to account
for such issues?

Well, because that's, in my opinion, an operating system's job. Being a
code example, rx_samples_to_file just mustn't contain the complexity
introduced when you try to implement buffering functionality smarter
than what your OS can do. And I do think it's nearly impossible to be
smarter than the Linux kernel when optimizing writes -- but you'll
have to tell your kernel what you want, as a user. The kernel, as it is
configured by any modern distribution by default, won't use enormous
write buffers, because that's not what the user usually wants: it
increases the risk of data loss in case of system failure, and
you usually don't want to spend all of your RAM on filesystem buffers.
In your 64GB RAM case, though, default buffer sizes should suffice, I
guess, so I'm a bit out of ideas here.
It is definitely not very hard to increase these buffers' sizes[1], so I
encourage you to try it and see if that solves your problem. Now, I must
admit that up to here I was always assuming you hadn't already played
around with these values; if this is not the case, please accept my
apologies!

I understand that it's meant to be an example, but I've
also seen it referenced as being used effectively as a debugger or test for
people having issues (i.e. recommendation to use the UHD programs in place
of GNURadio to resolve issues).

...and it has done many users, and thus Ettus, a great service by supplying
basic functionality! The fact that it works in almost any situation with
this very minimalistic approach (repeated recv->write) proves that UHD
is in fact a rather nice driver interface, IMHO. The fact that GNU Radio
sometimes solves issues that rx_samples_to_file can't indicates that
exactly the buffering approach is helpful. But in that case, buffering is not
increased by increasing kernel buffer sizes, but by introducing GNU
Radio buffers between blocks. The USRP source (Martin, scold me if I say
something stupid) is not really much smarter than rx_samples_to_file: It
recv()s a packet of samples, and returns these samples from the work
function, and then GNU Radio takes care of shuffling and buffering that
data. Basically, GNU Radio behaves much like an operating system from
the source block's point of view.

Also, in terms of benchmarking, I'm quoting minimum values, not averages.
I agree with you that average values are pointless, and in reality the disk
subsystem needs to perform when called up.  My minimum values for a 4 disk
RAID0 with a dedicated controller are well within the data rate that I am
pushing.

Well, I'll kind of disagree with you: If the minimum write rate of your
system were higher than the rate rx_samples_to_file produces, then you
wouldn't see the problem. The point here, I believe, is that the storage
system consists not only of the hardware side of your RAID, but also
of your complete operating environment. Something slows down how fast
data is written to the RAID.
I think we both would expect the following to happen:

repeatedly:

    rx_samples_to_file:
        uhd::rx_streamer::recv
            (blocks until a packet of samples has arrived. Returns instantly
            if one has already arrived before the call)
        write(file_handle, recv_buff)
            (returns instantly, because writing should hit a buffer that the
            operating system transparently pushes out to a disk. If the buffer
            is full, then block until there is enough space in the buffer --
            unless your filesystem is mounted with some sync option...)

Now, if your RAID is definitely fast enough, the write buffer should
never get full. My hypothesis here is that either your buffer size is
just too small, and a block of samples doesn't fit and has to be written
out instantly (which is unlikely), or something else occupies your
system. That might be just the fact that 400MB/s (are we talking about
an X3x0?) inevitably places a heavy load on things like PCIe buses and
CPUs, and that introduces a bottleneck in your storage chain which isn't
there if you "just" benchmark without the USRP. Also, the rather
smallish sizes of network packets mean that journaling file systems
introduce a very bad overhead -- I don't know if you benchmarked with
files on a journaling file system and a (network packet size - header)
block size...
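
One way to check this hypothesis is to time every individual write instead of looking at average throughput. The following is only a rough sketch under stated assumptions: the 1 MiB block size, the file name `probe.dat` and the helper names are arbitrary choices for illustration, and the probe file should be placed on the volume under test (the temporary directory used here may well be RAM-backed).

```python
import os
import tempfile
import time


def write_latencies(path, block_size, num_blocks):
    """Write num_blocks blocks to path and return each write's duration in seconds."""
    block = bytes(block_size)
    durations = []
    # buffering=0 so every write() is one system call, as in a plain recv/write loop
    with open(path, "wb", buffering=0) as f:
        for _ in range(num_blocks):
            start = time.perf_counter()
            f.write(block)
            durations.append(time.perf_counter() - start)
    return durations


def summarize(durations, block_size, required_bytes_per_s):
    """Compare average and worst-case write time with the per-block time budget."""
    budget = block_size / required_bytes_per_s
    return {
        "budget_s": budget,
        "mean_s": sum(durations) / len(durations),
        "max_s": max(durations),
        "late_writes": sum(1 for d in durations if d > budget),
    }


if __name__ == "__main__":
    block_size = 1 << 20  # 1 MiB per write, an arbitrary choice
    with tempfile.TemporaryDirectory() as tmp:
        timings = write_latencies(os.path.join(tmp, "probe.dat"), block_size, 32)
    print(summarize(timings, block_size, required_bytes_per_s=400e6))
```

A healthy mean with a non-zero `late_writes` count would be the "fast on average, but not guaranteed" situation described above: every late write is time the single-threaded loop cannot spend in recv().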

Is there an example system that can handle sustained data capture from the
USRP at (or near) the limits of the 10GigE or PCIe interfaces (maybe the
requirement is enterprise-class PCIe SSDs)?  I'm running a two-socket Xeon
system (two hex-core processors) with 64GB of RAM.  How much more hardware
should I throw at the problem to be able to sample/write at 100 MS/s (half of
what is quoted on the website for bandwidth for the 10GigE kit) using the
provided code?

Definitely a nice system! I must admit that I don't have access to a
comparable setup, and thus I can't really offer you any first-hand
experience. Maybe others can.

I think the issue here is that the code itself simply can't get through
its main loop fast enough.  There's a difference between data bandwidth
and CPU throughput.  The sequential nature of the code means that if any
weird stuff happens (your example was a good set of kernel related hilarity
that can lead to stochastic timing) you will have overflows since you
cannot read fast enough.  This is why a 90% solution for my application was
to just set the dirty_background_ratio to 0 and also why redirection to
/dev/null makes overflows go away.

This is interesting, as dirty_background_ratio is the percentage at
which the kernel should start writing out dirty pages in the background.
Now, I'm the one who's confused, because I would have expected this to
negatively impact performance. On the other hand, 0 (at least in my
head) does not make very much sense; maybe it's semantically identical
to 100%? Are you swapping (64GB would tell me you should have no swap, or
extremely low swappiness)?
On the other hand, it might really be that storage is not the bottleneck
here, and in fact maybe the CPU gets saturated. Now, you said that
writing to /dev/null solves your problem. Do your RAID or filesystem
consume a lot of CPU cycles? This is an interesting mystery...
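
When comparing notes on these settings it helps to look at what the kernel is currently using. A small sketch, assuming a Linux system that exposes the writeback knobs under /proc/sys/vm; the function name and the `vm_dir` parameter are invented for this illustration, and actually changing a value needs root (for example through sysctl):

```python
import os


def read_writeback_settings(vm_dir="/proc/sys/vm"):
    """Return the kernel's dirty-page writeback knobs as a dict of ints.

    Knobs that are missing (older kernel, non-Linux system) are left out.
    """
    names = (
        "dirty_background_ratio",
        "dirty_background_bytes",
        "dirty_ratio",
        "dirty_bytes",
        "dirty_expire_centisecs",
        "dirty_writeback_centisecs",
    )
    settings = {}
    for name in names:
        try:
            with open(os.path.join(vm_dir, name)) as f:
                settings[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue
    return settings


if __name__ == "__main__":
    for name, value in read_writeback_settings().items():
        print(f"{name} = {value}")
```

Posting this output alongside an overflow report makes it clear which of the ratio or byte-based limits is actually in effect on the machine in question.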

With either method I didn't have to
wait for a large write cache to flush before moving on to the next read
from the USRP.  Note that there can also be things that happen on the read
side as well.  Does this mean that I can only run the code on an RTOS?

No :) UHD has its own incoming buffer handlers, but as you already
said, in this high performance scenario, you might be totally right, and
our single-threaded approach just doesn't cut it. Maybe dropping in some
asynchronous storage IO would help -- but I hate seeing that blowing up
in example users' faces, so I guess the fact that it doesn't work with a
system as potent as yours with the sample rates as high as you demand
might actually be a shortcoming of the examples that isn't going to be
fixed.

As a final note, my understanding is that GNURadio and the USRP were
developed for domain experts in DSP to use.

These are SDR frameworks and devices, respectively. The idea is to offer
people the opportunity to build awesome DSP systems using
universally usable SDR blocks (GNU Radio) and universal software radio
peripherals, so well, they certainly address DSP people, but they
shouldn't be hard to use.

These users may or may not
have prior experience in software.  As a result, I'd recommend perhaps
adding a buffered example or have the USRP GNURadio block allow for
buffering.

That is something we might consider. On the other hand, when someone
goes as far as you do, maybe having an example that does the buffering
in a separate thread (or even process) isn't worth that much -- in the
end, one will want to write one's own high performance application, and
that will include handling such data rates.
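For readers who want to see what "buffering in a separate thread" could look like, here is a minimal sketch in Python rather than UHD's C++. It is not how rx_samples_to_file works and it uses no UHD API: read_chunk and write_chunk are hypothetical stand-ins for the receive call and the file write, and the queue depth is an arbitrary assumption.

```python
import queue
import threading


def record(read_chunk, write_chunk, max_chunks=1024):
    """Copy chunks from read_chunk() to write_chunk() through a bounded queue.

    read_chunk() is a hypothetical stand-in for the device read; it returns
    bytes, or None when the capture is finished. write_chunk(data) is a
    hypothetical stand-in for the file write.
    Returns the number of chunks dropped because the queue was full.
    """
    buf = queue.Queue(maxsize=max_chunks)
    done = object()  # sentinel telling the writer thread to stop

    def writer():
        # Runs concurrently with the reader loop below, so a slow write
        # only grows the queue instead of delaying the next read.
        while True:
            item = buf.get()
            if item is done:
                return
            write_chunk(item)

    thread = threading.Thread(target=writer)
    thread.start()

    dropped = 0
    while True:
        chunk = read_chunk()
        if chunk is None:
            break
        try:
            # Never block the reader: if the writer has fallen too far
            # behind, count the loss instead of stalling the read loop.
            buf.put_nowait(chunk)
        except queue.Full:
            dropped += 1

    buf.put(done)  # blocks until the writer has made room
    thread.join()
    return dropped
```

The reader never waits on storage; it either queues a chunk or counts it as dropped, which is roughly the role an overflow plays in the real tool. Whether a given queue depth is enough depends on how long the writer can stall.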

Otherwise, I just don't see how you can advertise 200 MS/s
(maybe even a simple "buffer" block in GNURadio would do the trick?).

Well, the devices support these rates, and our driver is able to
withstand these rates and sustain them without hitting CPU barriers due
to having too much overhead. That's awesome (ok, I might be biased, but
I think it's awesome). I don't feel ashamed because on your specific
setup, we can't find a way to make any of our generic examples deliver
the full rate of rx streams to storage -- we sell RF hardware, and not
storage infrastructure, and the point of the examples is demonstrating
the usage of UHD, and not holding a lecture on high performance storage
handling. I wish, though, that we could solve your problem.

Now, GNU Radio/gr-uhd does in fact come with an application called
uhd_rx_cfile, which is more or less a clone of rx_samples_to_file using
gr-uhd and GNU Radio instead of raw UHD. Does that work out for you?

I understand that this is the theoretical limit of the bus, but if there doesn't
exist a driver or other software to make use of this, the practical limit
becomes much, much smaller.

Well, UHD seems to be able to sustain these rates, if you write to
/dev/null, right? So the practical limit for UHD is definitely not being
hit.
I have another --maybe even practical-- suggestion to make: Roll your
own buffer!

# create the FIFO (assuming tmpfs is in ram)
mkfifo /tmp/mybuffer
# start in background; you could play around with block sizes using the
# bs= option of dd
dd if=/tmp/mybuffer of=/mount/raid_volume/data.dat &
rx_samples_to_file --file /tmp/mybuffer [all the other options]

By the way: Thanks for bringing this up! We know that recording samples
is a core concern of many users.

Greetings,
Marcus

[1] https://www.kernel.org/doc/Documentation/sysctl/vm.txt

On Fri, Oct 3, 2014 at 10:55 AM, Marcus Müller usrp-users@lists.ettus.com
wrote:

I have to agree with Marcus on this. Also, keep in mind that storage is
really what an operating system should take care of in any "general
purpose" scenario, ie. that as long as I just write to a file, I'd expect
that the thing in charge of storage (my kernel / the filesystems / block
device drivers) does the best it can to keep up. If I find myself in a
situation where my specific storage needs dictate a huge write buffer,
changing the application might be one way, but as I'm responsible for my
own storage subsystem, I could just as well increase the cache buffer
sizes, and let the operating system handle storage operation. If your RAID
is really performing as well as it is benchmarked to, then this should not
be one of your problems. All rx_samples_to_file does is really sequentially
writing out data at a constant rate, which is the most basic write
benchmark I can think of.

If your storage subsystem (filesystem + storage abstraction + raid driver
+ interface driver + hard drive interface + hard drives + hardware caches)
can't keep up, it's failing to perform as specified, simple as that. In
this case, saying that the application needs to be smarter when dealing
with storage seems like a bit of a cop-out to me ;)

I'd like to point out that most benchmarks use heavily averaged numbers
for write speeds etc. UHD on the other hand kind of demands soft real-time
performance of a write subsystem, which is a lot harder to fulfill. This
comes up rather frequently, but I have to stress it: you need a fast
guaranteed write rate, not only an average one, and as soon as your
operating system has to postpone writing data[1], it has to have enough
performance to catch up whilst still meeting continued demand. This is
general purpose hardware running general purpose OS with dozens of
processes, and you can't just say "every single component is up to my task,
thus my system suffices", because everything potentially blocks everything!
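A back-of-the-envelope sketch of the "catch up" argument above, in Python. The 400 MB/s input rate is the figure quoted in this thread; the 0.5 s stall and the 500 MB/s write rate are made-up numbers for illustration only.

```python
def backlog_bytes(in_rate, stall_s):
    """Bytes that pile up while storage accepts nothing for stall_s seconds."""
    return in_rate * stall_s


def catch_up_seconds(in_rate, write_rate, stall_s):
    """Time needed to drain the backlog while samples keep arriving.

    Only the surplus (write_rate - in_rate) is available for catching up,
    so a storage system that merely matches the input rate never recovers.
    """
    if write_rate <= in_rate:
        raise ValueError("write rate must exceed input rate to catch up")
    return backlog_bytes(in_rate, stall_s) / (write_rate - in_rate)


# Illustrative numbers: 400 MB/s stream, hypothetical 0.5 s stall,
# hypothetical 500 MB/s sustained write rate after the stall.
needed_buffer = backlog_bytes(400e6, 0.5)
recovery_time = catch_up_seconds(400e6, 500e6, 0.5)
```

With these assumed numbers, half a second of stall needs 200 MB of buffer somewhere, and the backlog takes two seconds to drain. An averaged benchmark figure says nothing about either quantity.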

Greetings,
Marcus

[1] e.g. because the filesystem needs to calculate checksums, update
tables, another process gets scheduled, a device blocks your PCIe bus, your
platters randomly need a bit longer to seek, you reach the physical end of
an LVM volume and have to move across a disk, an interrupt does what an
interrupt does, some process is getting notified about a changing file
descriptor, DBUS is happening in the kernel, token ring has run out of
tokens, thermal throttling, bitflips on SATA leading to retransmission,
some page getting fetched from swap...

On 03.10.2014 15:34, Marcus D. Leech via USRP-users wrote:

One has to keep firmly in mind that programs like rx_samples_to_file are
examples that show how to use the underlying UHD API. They are not
necessarily optimized for all situations, and indeed, one could
restructure rx_samples_to_file to decouple UHD I/O from filesystem I/O,
using a large buffer between them.

The fact is that dynamic performance of high-speed, real-time, flows is
something that almost-invariably needs tweaking for any particular
situation. There's no way for an example application to meet all those
requirements.

But the fact also remains that for some systems, rx_samples_to_file
(and uhd_rx_cfile on the Gnu Radio side) are able to stream high-speed
data just fine as-is.

On 2014-10-03 09:26, Peter Witkowski via USRP-users wrote:

To say that the issue is just because the disk subsystem can't keep up is a bit of cop-out.

I had issues writing to disk when the incoming stream was 400MB/s and my RAID0 system was benchmarked at being much higher than that.

The issue that I've been seeing stems from the fact that it appears that you cannot concurrently read/write from the data stream as its coming in. In effect you have a main loop that reads from the device and then immediately tries to write that buffer to file. If you do not complete these operations in a timely fashion overflows occur.

One way to solve (or at least band aid the issue) is to set your dirty_background_ratio to 0. I was able to get writing to disk working somewhat with this setting as it is more predictable to directly write to disk instead of having your write cache fill up and then having a large amount of data to push to disk. That said, my RAID0 array is capable of such speeds and even then I was getting a few (but much reduced) overflows.

The one surefire way I know of getting this working (even on a slow disk system) is to buffer the data. The buffer can then be consumed by the disk writing process while being concurrently added onto by the device reader. The easiest way to test buffering (that I've found) is to simply set up a GNURadio Companion program with a stream-to-vector block between the USRP and file sink blocks. This is exactly what I am doing currently since even with a very powerful system, I could not get data saved to disk quickly enough given the aforementioned issues with the provided UHD software.

On Thu, Oct 2, 2014 at 11:48 PM, gsmandvoip via USRP-users usrp-users@lists.ettus.com wrote:

Thanks Marcus for your replies. Yes O gone away.

On Thu, Oct 2, 2014 at 5:50 PM, Marcus D. Leech mleech@ripnet.com wrote:

with rx_samples_to_file without _4rx.rbf, Initially I tried on my i3, 4GB ram, it gave me some OOOO but was lesser than earlier, but I do not understand, my most of the ram capacity and processor was sitting idle while it shows OOOO, why is this strange behaviour

The default format for uhd_rx_cfile is complex-float, thus doubling the amount of data written compared to rx_samples_to_file.

You can't just use CPU usage as an indicator of loading--if you're writing to disk, the disk subsystem may be much slower than you think, so the
"rate limiting step" is writes to the disk, not computational elements.

Try using /dev/null as the file that you write to. If the 'O' go away, even at higher sampling rates, then it's your disk subsystem.

using uhd_rx_cfile getting similar result, but strangely, why it is low, at 4M sampling rate it was higher???

On Thu, Oct 2, 2014 at 9:27 AM, Marcus D. Leech mleech@ripnet.com wrote:

On 10/01/2014 11:46 PM, gsmandvoip wrote:

Yes I am running single channel, but when trying to achieve my desired sampling rate without _4rx.rbf, it says, requested sampling rate is not valid, adjusting to some 3.9M or so. sorry for misleading info I gave earlier, I have i3, with 32 bit and i7 with 64 bit, but getting same result on both machines

Here is my command to capture signal:

./rx_samples_to_file --args="fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX" --freq "$FC" --rate="$SR" $FILE --nsamps "$NSAMPLES"

and here is its output:

Creating the usrp device with: fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX...
-- Loading firmware image: /usr/share/uhd/images/usrp1_fw.ihx... done
-- Opening a USRP1 device...
-- Loading FPGA image: /usr/share/uhd/images/usrp1_fpga_4rx.rbf... done
-- Using FPGA clock rate of 52.000000MHz...
ERROR: LOOKUPERROR: INDEXERROR: MULTI_USRP::GET_TX_SUBDEV_SPEC(0) FAILED TO MAKE DEFAULT SPEC - VALUEERROR: THE SUBDEVICE SPECIFICATION "A:0" IS TOO LONG.
The user specified 1 channels, but there are only 0 tx dsps on mboard 0.

Don't use the _4rx image if you don't need it.

The USRP1 only does strict-integer resampling, and with a master clock (NON STANDARD FOR USRP1) of 52.000MHz, 4Msps is not a sample rate
that it can produce. Try 5.2Msps or 4.3333Msps.

At 5.2Msps, it's recording at roughly 20.8Mbytes/second, so your system needs to be able to sustain that for at least as long as the capture lasts.
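The arithmetic behind that figure can be sketched in a few lines of Python. The sample sizes are assumptions: 4 bytes per complex sample (16-bit I plus 16-bit Q) reproduces the 20.8 MB/s above, and 8 bytes (two 32-bit floats) corresponds to the complex-float format mentioned for uhd_rx_cfile.

```python
def bytes_per_second(sample_rate, bytes_per_sample=4):
    """Sustained write rate needed to record a complex sample stream.

    bytes_per_sample is an assumption about the on-disk format:
    4 for 16-bit I + 16-bit Q, 8 for complex float (2 x 32-bit).
    """
    return sample_rate * bytes_per_sample


short_rate = bytes_per_second(5.2e6)       # 16-bit I/Q at 5.2 Msps
float_rate = bytes_per_second(5.2e6, 8)    # complex float at 5.2 Msps
fast_rate = bytes_per_second(100e6)        # 16-bit I/Q at 100 Msps
```

Under these assumptions 5.2 Msps needs 20.8 MB/s as 16-bit I/Q and 41.6 MB/s as complex float, while 100 Msps as 16-bit I/Q comes to the 400 MB/s discussed elsewhere in this thread.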


USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com

RK
Robert Kossler
Mon, Oct 6, 2014 3:11 PM

Hi Marcus,
The example you provided in your most recent post regarding setting up a
RAM FIFO seems to me to be an excellent idea (although I haven't tried it
yet).  I would just like to comment that these kinds of examples would be
extremely helpful if they were added to the UHD documentation in some way.
Personally, I am a relative novice using Linux so finding my own way to
these kinds of solutions can be time consuming.  In order to solve my own
issues with overflows using rx_samples_to_file, I set up a RAM file system
and simply constrained the memory depth of my captures to the size of the
RAM file system.  This solution can also be useful, and it would have been
helpful to have some tips of this nature in the UHD documentation.

I would also like to comment that it would be helpful if the
rx_samples_to_file utility worked for multiple channels.  I understand the
"it's only an example" mindset, but on the other hand, there is no "actual"
application level software provided with the Ettus boards so the examples
take on a greater level of importance.  It would only minorly add to the
complexity of this example to have it work for multiple channels.  And,
given the number of users that seem to be interested in capturing Rx data,
it would be a welcome change - especially for non-programmers.

Rob Kossler

On Sat, Oct 4, 2014 at 9:14 AM, Marcus Müller usrp-users@lists.ettus.com
wrote:

Hi Peter,

didn't mean to confuse you! Actually, my job is doing the opposite (ie.
providing useful information), and thus let me just shortly follow up on
this:
On 03.10.2014 17:44, Peter Witkowski via USRP-users wrote:

So I'm confused.

You state that if I can't use rx_samples_to_file, my system is failing to
perform as specified to write data out, then you give an example of several
things that can happen to create a stochastic write speed (which I totally
understand and agree with).  Given that writes can be stochastic, why is
there not a software buffer implemented in the UHD sample code to account
for such issues?

Well, because that's, in my opinion, an operating system's job. Being a
code example, rx_samples_to_file just mustn't contain the complexity
introduced when you try to implement buffering functionality smarter than
what your OS can do. And, I do think it's nearly impossible to be smarter
than the Linux kernel when optimizing writes -- but you'll have to tell
your kernel what you want, as a user. The kernel, as it is configured by
any modern distribution by default, won't do enormous write buffers,
because that's not what the user usually wants, increasing the risk of data
loss in case of system failure, and because you usually don't want to spend
all of your RAM on filesystem buffers. In your 64GB RAM case, though,
default buffer sizes should suffice, I guess, so I'm a bit out of clues
here.
It is definitely not very hard to increase these buffers' sizes[1], so I
encourage you to try it and see if that solves your problem. Now, I must
admit that up to here I was always assuming you hadn't already played
around with these values, if this is not the case, please accept my
apologies!
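
Before changing any of these values, it can help to see what the kernel is currently set to. The following is only a minimal sketch, assuming a Linux system that exposes the writeback tunables from vm.txt under /proc/sys/vm; the function name read_vm_settings and the selection of keys are illustrative choices, not part of UHD or any existing tool.

```python
from pathlib import Path

# Writeback tunables described in the kernel's vm.txt (names only; the
# selection is an assumption about which ones matter for this discussion).
VM_KEYS = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_background_bytes",
    "dirty_bytes",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_vm_settings(base="/proc/sys/vm"):
    """Return {name: int} for every tunable that exists and is readable."""
    settings = {}
    for key in VM_KEYS:
        path = Path(base) / key
        try:
            settings[key] = int(path.read_text().strip())
        except (OSError, ValueError):
            # Not Linux, or this tunable is missing on this kernel: skip it.
            continue
    return settings


if __name__ == "__main__":
    for name, value in read_vm_settings().items():
        print(f"vm.{name} = {value}")
```

Comparing this output before and after a sysctl change is a cheap way to confirm that the setting you meant to change is the one that actually changed.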

I understand that it's meant to be an example, but I've
also seen it referenced as being used effectively as a debugger or test for
people having issues (i.e. recommendation to use the UHD programs in place
of GNURadio to resolve issues).

...and it has done a great job of supplying basic functionality for many
users, and thus for Ettus! The fact that it works in almost any situation
with this very minimalistic approach (repeated recv->write) shows that UHD
is in fact a rather nice driver interface, IMHO. The fact that GNU Radio
sometimes solves issues that rx_samples_to_file can't indicates that
exactly this buffering approach is helpful. But in that case, buffering is
not increased by increasing kernel buffer sizes, but by introducing GNU Radio
buffers between blocks. The USRP source (Martin, scold me if I say
something stupid) is not really much smarter than rx_samples_to_file: it
recv()s a packet of samples and returns these samples from the work
function, and then GNU Radio takes care of shuffling and buffering that
data. Basically, GNU Radio behaves much like an operating system from the
source block's point of view.

Also, in terms of benchmarking, I'm quoting minimum values, not averages.
I agree with you that average values are pointless, and in reality the disk
subsystem needs to perform when called up.  My minimum values for a 4 disk
RAID0 with a dedicated controller are well within the data rate that I am
pushing.

Well, I'll kind of disagree with you: if the minimum write rate of your
system were higher than the rate rx_samples_to_file produces, then you
wouldn't see the problem. The point here, I believe, is that the storage
system consists not only of the hardware side of your RAID, but also of
your complete operating environment. Something slows down how fast data is
written to the RAID.
I think we both would expect the following to happen:

repeatedly:

  rx_samples_to_file:
    uhd::rx_streamer::recv
      (blocks until a packet of samples has arrived. Instantly returns if
      it has before the call)
    write(file_handle, recv_buff)
      (instantly returns, because writing should hit a buffer that the
      operating system transparently pushes out to a disk. If buffer is
      full, then block until enough space in buffer -- unless your
      filesystem is mounted with some sync option...)

Now, if your RAID is definitely fast enough, the write buffer should never
get full. My hypothesis here is that either your buffer size is just too
small, so that a block of samples doesn't fit and has to be written out
instantly (which is unlikely), or something else occupies your system. That
might just be the fact that 400MB/s (are we talking about an X3x0?)
inevitably places a heavy load on things like PCIe buses and CPUs, and
that introduces a bottleneck in your storage chain which isn't there if you
"just" benchmark without the USRP. Also, the rather smallish sizes of
network packets mean that journaling file systems introduce a very bad
overhead -- I don't know if you benchmarked with files on a journaling file
system and a (network packet size - header) block size...

Is there an example system that can handle sustained data capture from the
USRP at (or near) the limits of the 10GigE or PCIe interfaces (maybe the
requirement is enterprise class PCIe SSDs)?  I'm running a two socket Xeon
system (two hex core processors) with 64GB of RAM.  How much more hardware
should I throw at the problem to be able to sample/write at 100MS/s (half of
what is quoted on the website for bandwidth for the 10GigE kit) using the
provided code?

Definitely a nice system! I must admit that I don't have access to a
comparable setup, and thus I can't really offer you any first-hand
experience. Maybe others can.

I think the issue here is that the code itself simply can't get through
its main loop fast enough.  There's a difference between data bandwidth
and CPU throughput.  The sequential nature of the code means that if any
weird stuff happens (your example was a good set of kernel related hilarity
that can lead to stochastic timing) you will have overflows since you
cannot read fast enough.  This is why a 90% solution for my application was
to just set the dirty_background_ratio to 0 and also why redirection to
/dev/null makes overflows go away.

This is interesting, as dirty_background_ratio is the percentage at which
the kernel should start writing out dirty pages in the background. Now, I'm
the one who's confused, because I would have expected this to negatively
impact performance. On the other hand, 0 (at least in my head) does not
make very much sense; maybe it's semantically identical to 100%? Are you
swapping (64GB would tell me you shouldn't have swap, or should have
extremely low swappiness)?
On the other hand, it might really be that storage is not the bottleneck
here, and in fact maybe the CPU gets saturated. Now, you said that writing
to /dev/null solves your problem. Do your RAID or filesystem consume a lot
of CPU cycles? This is an interesting mystery...

With either method I didn't have to
wait for a large write cache to flush before moving on to the next read
from the USRP.  Note that there can also be things that happen on the read
side as well.  Does this mean that I can only run the code on an RTOS?

No :) UHD has its own incoming buffer handlers, but as you already said,
in this high performance scenario, you might be totally right, and our
single-threaded approach just doesn't cut it. Maybe dropping in some
asynchronous storage IO would help -- but I hate seeing that blowing up in
example users' faces, so I guess the fact that it doesn't work with a
system as potent as yours with the sample rates as high as you demand might
actually be a shortcoming of the examples that isn't going to be fixed.

As a final note, my understanding is that GNURadio and the USRP were
developed for domain experts in DSP to use.

These are SDR frameworks and devices, respectively. The idea is to offer
people the opportunity to build awesome DSP systems using universally
usable SDR blocks (GNU Radio) and universal software radio peripherals, so,
well, they certainly address DSP people, but they shouldn't be hard to use.

These users may or may not
have prior experience in software.  As a result, I'd recommend perhaps
adding a buffered example or have the USRP GNURadio block allow for
buffering.

That is something we might consider. On the other hand, when someone goes
as far as you do, maybe having an example that does the buffering in a
separate thread (or even process) isn't worth that much -- in the end, one
will want to write one's own high performance application, and that will
include handling such data rates.

Otherwise, I just don't see how you can advertise 200 MS/s
(maybe even a simple "buffer" block in GNURadio would do the trick?).

Well, the devices support these rates, and our driver is able to
withstand these rates and sustain them without hitting CPU barriers due to
having too much overhead. That's awesome (ok, I might be biased, but I
think it's awesome). I don't feel ashamed because on your specific setup,
we can't find a way to make any of our generic examples deliver the full
rate of rx streams to storage -- we sell RF hardware, and not storage
infrastructure, and the point of the examples is demonstrating the usage of
UHD, and not holding a lecture on high performance storage handling. I
wish, though, that we could solve your problem.

Now, GNU Radio/gr-uhd does in fact come with an application called
uhd_rx_cfile, which is more or less a clone of rx_samples_to_file using
gr-uhd and GNU Radio instead of raw UHD. Does that work out for you?

I understand that this is the theoretical limit of the bus, but if there
doesn't exist a driver or other software to make use of this, the practical
limit becomes much, much smaller.

Well, UHD seems to be able to sustain these rates, if you write to
/dev/null, right? So the practical limit for UHD is definitely not being
hit.
I have another --maybe even practical-- suggestion to make: Roll your own
buffer!

mkfifo /tmp/mybuffer  # assuming tmpfs is in RAM
# start dd in the background; you could play around with block sizes
# using the bs= option of dd
dd if=/tmp/mybuffer of=/mount/raid_volume/data.dat &
rx_samples_to_file --file /tmp/mybuffer [all the other options]

By the way: Thanks for bringing this up! We know that recording samples is
a core concern of many users.

Greetings,
Marcus

[1] https://www.kernel.org/doc/Documentation/sysctl/vm.txt

On Fri, Oct 3, 2014 at 10:55 AM, Marcus Müller usrp-users@lists.ettus.com
wrote:

I have to agree with Marcus on this. Also, keep in mind that storage is
really what an operating system should take care of in any "general
purpose" scenario, i.e. as long as I just write to a file, I'd expect
that the thing in charge of storage (my kernel / the filesystems / block
device drivers) does the best it can to keep up. If I find myself in a
situation where my specific storage needs dictate a huge write buffer,
changing the application might be one way, but as I'm responsible for my
own storage subsystem, I could just as well increase the cache buffer
sizes, and let the operating system handle storage operation. If your RAID
is really performing as well as it is benchmarked to, then this should not
be one of your problems. All rx_samples_to_file really does is sequentially
write out data at a constant rate, which is the most basic write
benchmark I can think of.

If your storage subsystem (filesystem + storage abstraction + raid driver
+ interface driver + hard drive interface + hard drives + hardware caches)
can't keep up, it's failing to perform as specified, simple as that. In
this case, saying that the application needs to be smarter when dealing
with storage seems like a bit of a cop-out to me ;)

I'd like to point out that most benchmarks use heavily averaged numbers
for write speeds etc. UHD on the other hand kind of demands soft real-time
performance of a write subsystem, which is a lot harder to fulfill. This
comes up rather frequently, but I have to stress it: you need a fast
guaranteed write rate, not only an average one, and as soon as your
operating system has to postpone writing data[1], it has to have enough
performance to catch up whilst still meeting continued demand. This is
general purpose hardware running general purpose OS with dozens of
processes, and you can't just say "every single component is up to my task,
thus my system suffices", because everything potentially blocks everything!
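
To make the difference between an average and a guaranteed write rate concrete, one can time every single write instead of the whole run. What follows is only a rough sketch under stated assumptions: it writes fixed-size blocks to a file and reports the mean rate next to the rate implied by the slowest single write. The function names, block size and block count are made up for illustration, and a short run like this will not capture everything a long capture hits.

```python
import os
import tempfile
import time


def measure_write_latency(path, chunk_bytes=1 << 20, n_chunks=64):
    """Write n_chunks blocks to path and time each write() call in seconds."""
    block = b"\0" * chunk_bytes
    latencies = []
    # buffering=0 disables Python's own buffer, so each call is one write().
    with open(path, "wb", buffering=0) as f:
        for _ in range(n_chunks):
            start = time.perf_counter()
            f.write(block)
            latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies, chunk_bytes):
    """Compare the average rate with the rate implied by the slowest write."""
    mean = sum(latencies) / len(latencies)
    worst = max(latencies)
    return {
        "mean_MBps": chunk_bytes / mean / 1e6,
        "worst_MBps": chunk_bytes / worst / 1e6,
        "worst_over_mean": worst / mean,
    }


if __name__ == "__main__":
    chunk = 1 << 20
    with tempfile.TemporaryDirectory() as d:
        probe = os.path.join(d, "probe.dat")
        stats = summarize(measure_write_latency(probe, chunk), chunk)
    for key, value in stats.items():
        print(f"{key}: {value:.1f}")
```

If worst_MBps falls below the rate the USRP delivers while mean_MBps sits comfortably above it, the averaged benchmark number and the overflows are both telling the truth.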

Greetings,
Marcus

[1] e.g. because the filesystem needs to calculate checksums or update
tables, another process gets scheduled, a device blocks your PCIe bus, your
platters randomly need a bit longer to seek, you reach the physical end of
an LVM volume and have to move across a disk, an interrupt does what an
interrupt does, some process is getting notified about a changing file
descriptor, DBUS is happening in the kernel, token ring has run out of
tokens, thermal throttling, bitflips on SATA leading to retransmission,
some page getting fetched from swap...

On 03.10.2014 15:34, Marcus D. Leech via USRP-users wrote:

One has to keep firmly in mind that programs like rx_samples_to_file are
examples that show how to use the underlying UHD API. They are not
necessarily optimized for all situations, and indeed, one could restructure
rx_samples_to_file to decouple UHD I/O from filesystem I/O, using a large
buffer between them.

The fact is that dynamic performance of high-speed, real-time flows is
something that almost invariably needs tweaking for any particular
situation. There's no way for an example application to meet all those
requirements.

But the fact also remains that for some systems, rx_samples_to_file (and
uhd_rx_cfile on the GNU Radio side) are able to stream high-speed data just
fine as-is.
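
The decoupling described above can be sketched roughly as follows. This is not how rx_samples_to_file is written and it does not call UHD at all: recv and write are hypothetical placeholders for the device read and the file write, and buffered_capture and max_blocks are names invented for this illustration. The reader thread never waits for the disk; it either queues a block or counts it as dropped.

```python
import queue
import threading


def buffered_capture(recv, write, n_blocks, max_blocks=1024):
    """Run recv() and write() in separate threads joined by a bounded queue.

    recv and write stand in for the device read and the file write.
    Returns the number of blocks dropped because the queue was full.
    """
    fifo = queue.Queue(maxsize=max_blocks)
    dropped = 0

    def writer():
        while True:
            block = fifo.get()
            if block is None:  # sentinel: the reader is finished
                return
            write(block)

    thread = threading.Thread(target=writer)
    thread.start()
    for _ in range(n_blocks):
        block = recv()
        try:
            fifo.put_nowait(block)  # never stall the reader on a slow disk
        except queue.Full:
            dropped += 1  # the analogue of an overflow
    fifo.put(None)  # waits until there is room for the sentinel
    thread.join()
    return dropped


if __name__ == "__main__":
    import io

    sink = io.BytesIO()
    lost = buffered_capture(lambda: b"\0" * 4096, sink.write, n_blocks=1000)
    print(f"wrote {len(sink.getvalue())} bytes, dropped {lost} blocks")
```

The queue depth sets how long a write stall can last before data is lost, so it should be sized from the sample rate and the longest stall you expect rather than picked arbitrarily.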

On 2014-10-03 09:26, Peter Witkowski via USRP-users wrote:

To say that the issue is just because the disk subsystem can't keep up is a bit of a cop-out.

I had issues writing to disk when the incoming stream was 400MB/s and my RAID0 system was benchmarked at being much higher than that.

The issue that I've been seeing stems from the fact that it appears that you cannot concurrently read/write from the data stream as it's coming in. In effect you have a main loop that reads from the device and then immediately tries to write that buffer to file. If you do not complete these operations in a timely fashion, overflows occur.

One way to solve (or at least band-aid) the issue is to set your dirty_background_ratio to 0. I was able to get writing to disk working somewhat with this setting, as it is more predictable to write directly to disk instead of having your write cache fill up and then having a large amount of data to push to disk. That said, my RAID0 array is capable of such speeds and even then I was getting a few (but much reduced) overflows.

The one surefire way I know of getting this working (even on a slow disk system) is to buffer the data. The buffer can then be consumed by the disk writing process while being concurrently added onto by the device reader. The easiest way to test buffering (that I've found) is to simply set up a GNURadio Companion program with a stream-to-vector block between the USRP and file sink blocks. This is exactly what I am doing currently since even with a very powerful system, I could not get data saved to disk quickly enough given the aforementioned issues with the provided UHD software.

On Thu, Oct 2, 2014 at 11:48 PM, gsmandvoip via USRP-users usrp-users@lists.ettus.com wrote:

Thanks, Marcus, for your replies. Yes, the 'O's have gone away.

On Thu, Oct 2, 2014 at 5:50 PM, Marcus D. Leech mleech@ripnet.com wrote:

With rx_samples_to_file without _4rx.rbf, I initially tried on my i3 with 4GB RAM. It gave me some OOOO, but fewer than earlier. What I do not understand is that most of the RAM capacity and processor were sitting idle while it showed OOOO. Why this strange behaviour?

The default format for uhd_rx_cfile is complex-float, thus doubling the amount of data written compared to rx_samples_to_file.

You can't just use CPU usage as an indicator of loading--if you're writing to disk, the disk subsystem may be much slower than you think, so the
"rate limiting step" is writes to the disk, not computational elements.

Try using /dev/null as the file that you write to. If the 'O' go away, even at higher sampling rates, then it's your disk subsystem.

Using uhd_rx_cfile, I am getting a similar result. But strangely, why is it low? At the 4M sampling rate it was higher.

On Thu, Oct 2, 2014 at 9:27 AM, Marcus D. Leech mleech@ripnet.com wrote:

On 10/01/2014 11:46 PM, gsmandvoip wrote:

Yes, I am running a single channel, but when trying to achieve my desired sampling rate without _4rx.rbf, it says the requested sampling rate is not valid and adjusts it to some 3.9M or so. Sorry for the misleading info I gave earlier: I have an i3 with 32 bit and an i7 with 64 bit, but I am getting the same result on both machines.

Here is my command to capture signal:

./rx_samples_to_file --args="fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX" --freq "$FC" --rate="$SR" $FILE --nsamps "$NSAMPLES"

and here is its output:

Creating the usrp device with: fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX...
-- Loading firmware image: /usr/share/uhd/images/usrp1_fw.ihx... done
-- Opening a USRP1 device...
-- Loading FPGA image: /usr/share/uhd/images/usrp1_fpga_4rx.rbf... done
-- Using FPGA clock rate of 52.000000MHz...
ERROR: LOOKUPERROR: INDEXERROR: MULTI_USRP::GET_TX_SUBDEV_SPEC(0) FAILED TO MAKE DEFAULT SPEC - VALUEERROR: THE SUBDEVICE SPECIFICATION "A:0" IS TOO LONG.
The user specified 1 channels, but there are only 0 tx dsps on mboard 0.

Don't use the _4rx image if you don't need it.

The USRP1 only does strict-integer resampling, and with a master clock (NON STANDARD FOR USRP1) of 52.000MHz, 4Msps is not a sample rate
that it can produce. Try 5.2Msps or 4.3333Msps.

At 5.2Msps, it's recording at roughly 20.8Mbytes/second, so your system needs to be able to sustain that for at least as long as the capture lasts.
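
The arithmetic behind that figure is just sample rate times bytes per complex sample. As a small sketch, assuming 16-bit I and 16-bit Q per sample for the short format and 32-bit floats for complex-float (the labels sc16 and fc32 follow UHD's naming; the helper name disk_rate_mbytes is made up here):

```python
# Bytes per complex sample for two common on-disk formats.
BYTES_PER_SAMPLE = {
    "sc16": 4,  # 16-bit I + 16-bit Q
    "fc32": 8,  # 32-bit float I + 32-bit float Q
}


def disk_rate_mbytes(sample_rate_sps, fmt="sc16"):
    """Sustained write rate in MB/s (1 MB = 1e6 bytes) for one channel."""
    return sample_rate_sps * BYTES_PER_SAMPLE[fmt] / 1e6


if __name__ == "__main__":
    for rate, fmt in [(5.2e6, "sc16"), (5.2e6, "fc32"), (100e6, "sc16")]:
        print(f"{rate / 1e6:g} Msps {fmt}: {disk_rate_mbytes(rate, fmt):g} MB/s")
```

The same calculation gives the 400MB/s mentioned earlier in the thread for 100 Msps of 16-bit samples, and shows why complex-float output doubles the load on the disk.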


USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com

sorry for misleading info I gave earlier, I have i3, with 32 bit and i7 with 64 bit, but getting same result on both machines > > Here is my command to capture signal: > > ./rx_samples_to_file --args="fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX" --freq "$FC" --rate="$SR" $FILE --nsamps "$NSAMPLES" > > and here is its output: > > Creating the usrp device with: fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX... > -- Loading firmware image: /usr/share/uhd/images/usrp1_fw.ihx... done > -- Opening a USRP1 device... > -- Loading FPGA image: /usr/share/uhd/images/usrp1_ > fpga_4rx.rbf... done > -- Using FPGA clock rate of 52.000000MHz... > ERROR: LOOKUPERROR: INDEXERROR: MULTI_USRP::GET_TX_SUBDEV_SPEC(0) FAILED TO MAKE DEFAULT SPEC - VALUEERROR: THE SUBDEVICE SPECIFICATION "A:0" IS TOO LONG. > The user specified 1 channels, but there are only 0 tx dsps on mboard 0. > > Don't use the _4rx image if you don't need it. > > The USRP1 only does strict-integer resampling, and with a master clock (NON STANDARD FOR USRP1) of 52.000MHz, 4Msps is not a sample rate > that it can produce. Try 5.2Msps or 4.3333Msps. > > At 5.2Msps, it's recording at roughly 20.8Mbytes/second, so your system needs to be able to sustain that for at least as long as the capture lasts. > > > > _______________________________________________ > USRP-users mailing listUSRP-users@lists.ettus.comhttp://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com > > > > _______________________________________________ > USRP-users mailing listUSRP-users@lists.ettus.comhttp://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com > > > > _______________________________________________ > USRP-users mailing listUSRP-users@lists.ettus.comhttp://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com > > > > _______________________________________________ > USRP-users mailing list > USRP-users@lists.ettus.com > http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com > >
Peter Witkowski
Mon, Oct 6, 2014 7:33 PM

Marcus,

Great discussion so far.

My point is that earlier in the thread it was suggested that if
rx_samples_to_file fails, your disk subsystem is to blame and you
should invest in more hardware.  I argued instead that the application
can benefit from multi-threading and internal buffering; without them,
overflows are a fact of life at high data rates.  While the OS can do more
buffering, you still have to read from the USRP periodically (at a rather
high rate), otherwise its buffers will overflow.  This is why it is critical
to minimize the time between two successive reads, via threading and a
buffer that can concurrently be added onto (from the device) and
drained (by writing to disk).
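
A minimal sketch of the decoupled reader/writer described above, assuming a
hypothetical read_block() callable in place of the real device read (this is
not UHD code, and the names capture, read_block and max_buffered are made up
for illustration). The reader thread only ever enqueues, so a slow write
delays the writer thread and not the next device read:

```python
import io
import queue
import threading


def capture(read_block, sink, num_blocks, max_buffered=1024):
    """Read num_blocks blocks and write them to sink via a bounded queue.

    read_block is a hypothetical stand-in for the device read.
    Returns the number of blocks dropped because the queue was full.
    """
    buf = queue.Queue(maxsize=max_buffered)
    dropped = 0

    def writer():
        # Drain the queue until the None sentinel arrives.
        while True:
            block = buf.get()
            if block is None:
                return
            sink.write(block)

    thread = threading.Thread(target=writer)
    thread.start()
    for _ in range(num_blocks):
        block = read_block()
        try:
            # Never wait on the disk side; a full queue counts as a drop,
            # which plays the role of an overflow in this sketch.
            buf.put_nowait(block)
        except queue.Full:
            dropped += 1
    buf.put(None)  # tell the writer that no more data is coming
    thread.join()
    return dropped
```

The queue depth decides how long a write stall can be absorbed: a stall only
costs data once max_buffered blocks are waiting.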

I think we can agree that there's a delta between what the I/O can sustain
(and what is displayed on a synthetic benchmark) versus what is actually
happening on the machine.  There's potentially a good deal of CPU cycles
that occur between two reads (a vast percentage of these are during disk
I/O, especially if the device is "busy") that seem to be causing the
overflow.  Stated differently, I can have a bunch of blocking reads occur
and sustain the needed rate.  Similarly (and this is what I see via my
benchmarks), I can have a series of blocking writes occur and sustain the
needed data rate.  However, the combination of the two (together with the
fact that the kernel likes to preempt things) might be unsustainable in a
single-threaded application.  That is, by the time I get to the next read,
enough time has passed to overwhelm the USRP read buffers.

My point with the dirty background ratio is that when set to zero, I have a
predictable amount of time that I block for each write.  My understanding
is that once triggered, background flushes will attempt to write at 100%
device speed.  During this time, additional device access (such as those
being called by the application) will be blocked (sequential writes are
therefore performance degraded).  When set to 0, I see in iotop that my
writes are a consistent value which is right about where resource monitor
shows the incoming 10GigE bandwidth to be.  With dirty_background_ratio set
any higher, "Total Disk Write" and "Actual Disk Write" differ wildly.
"Actual Disk Write" will spike at times, and these spikes correlate with
additional overflow errors.  When the device is more stable (i.e. "Actual
Disk Write" is consistent) I see a huge reduction in overflows.  Stated
differently, I'm OK with a flush, as long as it's a quick one.  With larger
flushes (as happens with a dirty_background_ratio greater than 0), I can
almost guarantee an overflow (I can't get around the main loop fast enough
if the device is busy with a long write).  Note that for my buffered
application, I do run a default background ratio.
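
Before changing any of these settings it helps to record the current values.
A small sketch, assuming a Linux system that exposes the writeback knobs
under /proc/sys/vm (the function name and the choice of knobs are
illustrative, not part of any UHD tool):

```python
from pathlib import Path

# Writeback-related knobs documented in the kernel's sysctl/vm.txt.
DIRTY_KNOBS = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_dirty_settings(base="/proc/sys/vm"):
    """Return the knobs found under base as a dict of name -> int."""
    settings = {}
    for name in DIRTY_KNOBS:
        path = Path(base) / name
        if path.is_file():
            settings[name] = int(path.read_text().strip())
    return settings
```

Calling read_dirty_settings() with no argument reads the live values.
Changing them needs root, for example with
sysctl -w vm.dirty_background_ratio=0.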

The only thing that I got to work was buffering, and the easiest way I know
to accomplish this is to place a stream-to-vector block (in GNURadio)
between the USRP source and the file sink.  The vectorization effectively
works as a buffer, and was the only way I was able to get 100 MS/s working.

Other things I tried:

  1. Using CPU affinity and shielding.
  2. Raising the process priority to 99 with SCHED_FIFO.

Also, I can confirm that rx_samples_to_file and uhd_rx_cfile (as well as a
GNU Radio application that does effectively the same thing, but controls
CPU affinity) all behave exactly the same way in terms of the number of
overflows I have encountered.

If there are any other kernel tweaks that I can try, please let me know.
However, in practice, my CPU cannot get through the main loop of
rx_samples_to_file quickly enough.  As discussed above, buffering was
the only way I was able to consistently ensure that my data was saved
without an overflow.

If you would like, I can provide whatever system benchmarks you need.
Perhaps I am not using the correct tool when I quote my minimum write
speeds to disk.

On Sat, Oct 4, 2014 at 9:14 AM, Marcus Müller usrp-users@lists.ettus.com
wrote:

Hi Peter,

didn't mean to confuse you! Actually, my job is doing the opposite (ie.
providing useful information), and thus let me just shortly follow up on
this:
On 03.10.2014 17:44, Peter Witkowski via USRP-users wrote:

So I'm confused.

You state that if I can't use rx_samples_to_file, my system is failing to
perform as specified to write data out, then you give an example of several
things that can happen to create a stochastic write speed (which I totally
understand and agree with).  Given that writes can be stochastic, why is
there not a software buffer implemented in the UHD sample code to account
for such issues?

Well, because that's, in my opinion, an operating system's job. Being a
code example, rx_samples_to_file just mustn't contain the complexity
introduced when you try to implement buffering functionality smarter than
what your OS can do. And, I do think it's nearly impossible to be smarter
than the Linux kernel when optimizing writes -- but you'll have to tell
your kernel what you want, as a user. The kernel, as it is configured by
any modern distribution by default, won't do enormous write buffers,
because that's not what the user usually wants, increasing the risk of data
loss in case of system failure, and because you usually don't want to spend
all of your RAM on filesystem buffers. In your 64GB RAM case, though,
default buffer sizes should suffice, I guess, so I'm a bit out of clues
here.
It is definitely not very hard to increase these buffers' sizes[1], so I
encourage you to try it and see if that solves your problem. Now, I must
admit that up to here I was always assuming you hadn't already played
around with these values; if this is not the case, please accept my
apologies!

I understand that it's meant to be an example, but I've
also seen it referenced as being used effectively as a debugger or test for
people having issues (i.e. recommendation to use the UHD programs in place
of GNURadio to resolve issues).

...and it has done a great job of supplying basic functionality for many
users, and thus for Ettus! The fact that it works in almost any situation
with this very minimalistic approach (repeated recv->write) proves that UHD
is in fact a rather nice driver interface, IMHO. The fact that GNU Radio
sometimes solves issues that rx_samples_to_file can't indicates exactly
that the buffering approach is helpful. But in that case, buffering is not
increased by increasing kernel buffer sizes, but by introducing GNU Radio
buffers between blocks. The USRP source (Martin, scold me if I say
something stupid) is not really much smarter than rx_samples_to_file: It
recv()s a packet of samples, and returns these samples from the work
function, and then GNU Radio takes care of shuffling and buffering that
data. Basically, GNU Radio behaves much like an operating system from the
source block's point of view.

Also, in terms of benchmarking, I'm quoting minimum values, not averages.
I agree with you that average values are pointless, and in reality the disk
subsystem needs to perform when called up.  My minimum values for a 4 disk
RAID0 with a dedicated controller are well within the data rate that I am
pushing.

Well, I'll kind of disagree with you: If the minimum write rate of your
system were higher than the rate rx_samples_to_file demands, then you
wouldn't see the problem. The point here, I believe, is that the storage
system does not only consist of the hardware side of your RAID, but also of
your complete operating environment. Something slows down how fast data is
written to the RAID.
I think we both would expect the following to happen:

repeatedly:

    rx_samples_to_file:
        uhd::rx_streamer::recv
            (blocks until a packet of samples has arrived. Instantly
            returns if it has before the call)
        write(file_handle, recv_buff)
            (instantly returns, because writing should hit a buffer that the
            operating system transparently pushes out to a disk. If buffer is
            full, then block until enough space in buffer -- unless your
            filesystem is mounted with some sync option...)
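
Whether write() really "instantly returns" can be measured rather than
assumed. A minimal sketch that runs the same sequential recv/write pattern
and reports the mean and the worst-case time spent in the write call; recv
and write are hypothetical callables standing in for the device read and the
file write, and timed_loop is a made-up name:

```python
import time


def timed_loop(recv, write, iterations):
    """Run recv -> write sequentially.

    Returns (mean, worst) time in seconds spent inside write().
    """
    durations = []
    for _ in range(iterations):
        block = recv()
        start = time.perf_counter()
        write(block)
        durations.append(time.perf_counter() - start)
    return sum(durations) / len(durations), max(durations)
```

A worst-case figure far above the mean points at occasional long stalls in
the write path, which an averaged benchmark would hide.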

Now, if your RAID is definitely fast enough, the write buffer should never
get full. My hypothesis here is that either your buffer size is just too
small, and a block of samples doesn't fit and has to be written out
instantly (which is unlikely), or something else occupies your system. That
might be just the fact that 400MB/s (are we talking about an X3x0?)
inevitably places a heavy load on things like PCIe buses and CPUs, and
that introduces a bottleneck in your storage chain which isn't there if you
"just" benchmark without the USRP. Also, the rather smallish sizes of
network packets mean that journaling file systems introduce a very bad
overhead -- I don't know if you benchmarked with files on a journaling file
system and a (network packet size - header) block size...

Is there an example system that can handle sustained data capture from the
USRP at (or near) the limits of the 10GigE or PCIe interfaces (maybe the
requirement is enterprise class PCIe SSDs)?  I'm running a two socket Xeon
system (two hex core processors) with 64GB of RAM.  How much more hardware
should I throw at the problem to be able to sample/write at 100 MS/s (half
of what is quoted on the website for bandwidth for the 10GigE kit) using the
provided code?

Definitely a nice system! I must admit that I don't have access to a
comparable setup, and thus I can't really offer you any first-hand
experience. Maybe others can.

I think the issue here is that the code simply can't get through
its main loop fast enough.  There's a difference between data bandwidth
and CPU throughput.  The sequential nature of the code means that if any
weird stuff happens (your example was a good set of kernel related hilarity
that can lead to stochastic timing) you will have overflows since you
cannot read fast enough.  This is why a 90% solution for my application was
to just set the dirty_background_ratio to 0 and also why redirection to
/dev/null makes overflows go away.

This is interesting, as dirty_background_ratio is the percentage at which
the kernel should start writing out dirty pages in the background. Now, I'm
the one who's confused, because I would have expected this to negatively
impact performance. On the other hand, 0 (at least in my head) does not
make very much sense, maybe it's semantically identical to 100%? Are you
swapping (64GB would tell me you shouldn't have swap or extremely low
swappiness)?
On the other hand, it might really be that storage is not the bottleneck
here, and in fact maybe the CPU gets saturated. Now, you said that writing
to /dev/null solves your problem. Do your RAID or filesystem consume a lot
of CPU cycles? This is an interesting mystery...

With either method I didn't have to
wait for a large write cache to flush before moving on to the next read
from the USRP.  Note that there can also be things that happen on the read
side as well.  Does this mean that I can only run the code on an RTOS?

No :) UHD has its own incoming buffer handlers, but as you already said,
in this high performance scenario, you might be totally right, and our
single-threaded approach just doesn't cut it. Maybe dropping in some
asynchronous storage IO would help -- but I hate seeing that blowing up in
example users' faces, so I guess the fact that it doesn't work with a
system as potent as yours with the sample rates as high as you demand might
actually be a shortcoming of the examples that isn't going to be fixed.

As a final note, my understanding is that GNURadio and the USRP were
developed for domain experts in DSP to use.

These are SDR frameworks and devices, respectively. The idea is to offer
people the opportunity to build awesome DSP systems using universally
usable SDR blocks (GNU Radio) and universal software radio peripherals, so
well, they certainly address DSP people, but they shouldn't be hard to use.

These users may or may not
have prior experience in software.  As a result, I'd recommend perhaps
adding a buffered example or have the USRP GNURadio block allow for
buffering.

That is something we might consider. On the other hand, when someone goes
as far as you do, maybe having an example that does the buffering in a
separate thread (or even process) isn't worth that much -- in the end, one
will want to write one's own high performance application, and that will
include handling such data rates.

Otherwise, I just don't see how you can advertise 200 MS/s
(maybe even a simple "buffer" block in GNURadio would do the trick?).

Well, the devices support these rates, and our driver is able to
withstand these rates and sustain them without hitting CPU barriers due to
having too much overhead. That's awesome (ok, I might be biased, but I
think it's awesome). I don't feel ashamed because on your specific setup,
we can't find a way to make any of our generic examples deliver the full
rate of rx streams to storage -- we sell RF hardware, and not storage
infrastructure, and the point of the examples is demonstrating the usage of
UHD, and not holding a lecture on high performance storage handling. I
wish, though, that we could solve your problem.

Now, GNU Radio/gr-uhd does in fact come with an application called
uhd_rx_cfile, which is more or less a clone of rx_samples_to_file using
gr-uhd and GNU Radio instead of raw UHD. Does that work out for you?

I understand that this is the theoretical limit of the bus, but if no
driver or other software exists to make use of it, the practical limit
becomes much, much smaller.

Well, UHD seems to be able to sustain these rates, if you write to
/dev/null, right? So the practical limit for UHD is definitely not being
hit.
I have another --maybe even practical-- suggestion to make: Roll your own
buffer!

    mkfifo /tmp/mybuffer   # assuming tmpfs is in RAM
    # start dd in the background; you could play around with block sizes
    # using the bs= option of dd
    dd if=/tmp/mybuffer of=/mount/raid_volume/data.dat &
    rx_samples_to_file --file /tmp/mybuffer [all the other options]

By the way: Thanks for bringing this up! We know that recording samples is
a core concern of many users.

Greetings,
Marcus

[1] https://www.kernel.org/doc/Documentation/sysctl/vm.txt

On Fri, Oct 3, 2014 at 10:55 AM, Marcus Müller usrp-users@lists.ettus.com
wrote:

I have to agree with Marcus on this. Also, keep in mind that storage is
really what an operating system should take care of in any "general
purpose" scenario, ie. that as long as I just write to a file, I'd expect
that the thing in charge of storage (my kernel / the filesystems / block
device drivers) does the best it can to keep up. If I find myself in a
situation where my specific storage needs dictate a huge write buffer,
changing the application might be one way, but as I'm responsible for my
own storage subsystem, I could just as well increase the cache buffer
sizes, and let the operating system handle storage operation. If your RAID
is really performing as well as it is benchmarked to, then this should not
be one of your problems. All rx_samples_to_file really does is sequentially
write out data at a constant rate, which is the most basic write
benchmark I can think of.

If your storage subsystem (filesystem + storage abstraction + raid driver
+ interface driver + hard drive interface + hard drives + hardware caches)
can't keep up, it's failing to perform as specified, simple as that. In
this case, saying that the application needs to be smarter when dealing
with storage seems like a bit of a cop-out to me ;)

I'd like to point out that most benchmarks use heavily averaged numbers
for write speeds etc. UHD on the other hand kind of demands soft real-time
performance of a write subsystem, which is a lot harder to fulfill. This
comes up rather frequently, but I have to stress it: you need a fast
guaranteed write rate, not only an average one, and as soon as your
operating system has to postpone writing data[1], it has to have enough
performance to catch up whilst still meeting continued demand. This is
general purpose hardware running general purpose OS with dozens of
processes, and you can't just say "every single component is up to my task,
thus my system suffices", because everything potentially blocks everything!

Greetings,
Marcus

[1] e.g. because the filesystem needs to calculate checksums, update
tables, another process gets scheduled, a device blocks your PCIe bus, your
platters randomly need a bit longer to seek, you reach the physical end of
an LVM volume and have to move across a disk, an interrupt does what an
interrupt does, some process is getting noticed on a changing file
descriptor, DBUS is happening in the kernel, token ring has run out of
tokens, thermal throttling, bitflips on SATA leading to retransmission,
some page getting fetched from swap...

On 03.10.2014 15:34, Marcus D. Leech via USRP-users wrote:

One has to keep firmly in mind that programs like rx_samples_to_file are
examples that show how to use the underlying UHD API. They are not
necessarily optimized for all situations, and indeed, one could restructure
rx_samples_to_file to decouple UHD I/O from filesystem I/O, using a large
buffer between them.

The fact is that dynamic performance of high-speed, real-time flows is
something that almost invariably needs tweaking for any particular
situation. There's no way for an example application to meet all those
requirements.

But the fact also remains that for some systems, rx_samples_to_file
(and uhd_rx_cfile on the GNU Radio side) are able to stream high-speed data
just fine as-is.

On 2014-10-03 09:26, Peter Witkowski via USRP-users wrote:

To say that the issue is just because the disk subsystem can't keep up is a bit of a cop-out.

I had issues writing to disk when the incoming stream was 400MB/s and my RAID0 system was benchmarked at being much higher than that.

The issue that I've been seeing stems from the fact that it appears that you cannot concurrently read/write from the data stream as it's coming in. In effect you have a main loop that reads from the device and then immediately tries to write that buffer to file. If you do not complete these operations in a timely fashion, overflows occur.

One way to solve (or at least band-aid) the issue is to set your dirty_background_ratio to 0. I was able to get writing to disk working somewhat with this setting as it is more predictable to directly write to disk instead of having your write cache fill up and then having a large amount of data to push to disk. That said, my RAID0 array is capable of such speeds and even then I was getting a few (but much reduced) overflows.

The one surefire way I know of getting this working (even on a slow disk system) is to buffer the data. The buffer can then be consumed by the disk writing process while being concurrently added onto by the device reader. The easiest way to test buffering (that I've found) is to simply set up a GNURadio Companion program with a stream-to-vector block between the USRP and file sink blocks. This is exactly what I am doing currently since even with a very powerful system, I could not get data saved to disk quickly enough given the aforementioned issues with the provided UHD software.

On Thu, Oct 2, 2014 at 11:48 PM, gsmandvoip via USRP-users usrp-users@lists.ettus.com wrote:

Thanks Marcus for your replies. Yes O gone away.

On Thu, Oct 2, 2014 at 5:50 PM, Marcus D. Leech mleech@ripnet.com wrote:

with rx_samples_to_file without _4rx.rbf, Initially I tried on my i3, 4GB ram, it gave me
some OOOO but was lesser than earlier, but I do not understand, my most of the ram capacity and processor was sitting idle while it shows OOOO, why is this strange behaviour

The default format for uhd_rx_cfile is complex-float, thus doubling the amount of data written compared to rx_samples_to_file.

You can't just use CPU usage as an indicator of loading--if you're writing to disk, the disk subsystem may be much slower than you think, so the
"rate limiting step" is writes to the disk, not computational elements.

Try using /dev/null as the file that you write to. If the 'O' go away, even at higher sampling rates, then it's your disk subsystem.

using uhd_rx_cfile getting similar result, but strangely, why it is low, at 4M sampling rate it was higher???

On Thu, Oct 2, 2014 at 9:27 AM, Marcus D. Leech mleech@ripnet.com wrote:

On 10/01/2014 11:46 PM, gsmandvoip wrote:

Yes I am running single channel, but when trying to achieve my desired sampling rate without _4rx.rbf, it says, requested sampling rate is not valid, adjusting to some 3.9M or so. sorry for misleading info I gave earlier, I have i3, with 32 bit and i7 with 64 bit, but getting same result on both machines

Here is my command to capture signal:

./rx_samples_to_file --args="fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX" --freq "$FC" --rate="$SR" $FILE --nsamps "$NSAMPLES"

and here is its output:

Creating the usrp device with: fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX...
-- Loading firmware image: /usr/share/uhd/images/usrp1_fw.ihx... done
-- Opening a USRP1 device...
-- Loading FPGA image: /usr/share/uhd/images/usrp1_fpga_4rx.rbf... done
-- Using FPGA clock rate of 52.000000MHz...
ERROR: LOOKUPERROR: INDEXERROR: MULTI_USRP::GET_TX_SUBDEV_SPEC(0) FAILED TO MAKE DEFAULT SPEC - VALUEERROR: THE SUBDEVICE SPECIFICATION "A:0" IS TOO LONG.
The user specified 1 channels, but there are only 0 tx dsps on mboard 0.

Don't use the _4rx image if you don't need it.

The USRP1 only does strict-integer resampling, and with a master clock (NON STANDARD FOR USRP1) of 52.000MHz, 4Msps is not a sample rate
that it can produce. Try 5.2Msps or 4.3333Msps.

At 5.2Msps, it's recording at roughly 20.8Mbytes/second, so your system needs to be able to sustain that for at least as long as the capture lasts.
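
The rates quoted in this thread follow from sample rate times bytes per
sample. A small sketch, assuming 4 bytes per complex sample on disk (16-bit
I plus 16-bit Q, which is consistent with 5.2 Msps giving 20.8 Mbytes/second)
and 8 bytes per sample for complex-float; the function name is made up for
illustration:

```python
def stream_rate_bytes_per_s(sample_rate_sps, bytes_per_sample=4):
    """Sustained write rate needed for a given sample rate.

    bytes_per_sample=4 assumes 16-bit I + 16-bit Q; use 8 for complex-float.
    """
    return sample_rate_sps * bytes_per_sample


# 5.2 Msps at 4 bytes per sample
usrp1_rate = stream_rate_bytes_per_s(5_200_000)
# the same capture stored as complex-float needs twice as much
usrp1_float_rate = stream_rate_bytes_per_s(5_200_000, bytes_per_sample=8)
# 100 MS/s at 4 bytes per sample, the 400MB/s case discussed in this thread
high_rate = stream_rate_bytes_per_s(100_000_000)
```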


USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com

--
Peter Witkowski
pwitkowski@gmail.com

> I think we both would expect the following to happen: > > repeatedly: > > rx_samples_to_file: > uhd::rx_streamer::recv > (blocks until a packet of samples has arrived. Instantly returns if it > has before the call) > write(file_handle, recv_buff) > (instantly returns, because writing should hit a buffer that the > operating system transparently pushes out to a disk. If buffer is full, > then block until enough space in buffer -- unless your filesystem is > mounted with some sync option...) > > Now, if your RAID is definitely fast enough, the write buffer should never > get full. My hypothesis here is that either, your buffer size is just to > small, and a block of samples doesn't fit and has to be written out > instantly (which is unlikely), or something else occupies your system. That > might be just the fact that 400MB/s (are we talking about an X3x0?) > inevitably places a heavy load on things like PCIe busses and CPUs, and > that introduces a bottleneck in your storage chain which isn't there if you > "just" benchmark without the USRP. Also, the rather smallish sizes of > network packets dictate that journalling file systems introduce a very bad > overhead -- I don't know if you benchmarked with files on a journaling file > system and a (network packet size - header) block size... > > Is there an example system that can handle sustained data capture from the > USRP at (or near the limits) of 10GigE or the PCIe interfaces (maybe the > requirement is enterprise class PCIe SSDs)? I'm running a two socket Xenon > system (two hex core processors) with 64GB of RAM. How much more hardware > should I throw at the problem to be able to sample/write at 100MS (half of > what is quoted on the website for bandwidth for the 10GigE kit) using the > provided code? > > Definitely a nice system! I must admit that I don't have access to a > comparable setup, and thus I can't really offer you any first-hand > experience. Maybe others can. 
> > I think the issue here is that the code itself can't simply get through > it's main loop fast enough. There's a difference between data bandwidth > and CPU throughput. The sequential nature of the code means that if any > weird stuff happens (your example was a good set of kernel related hilarity > that can lead to stochastic timing) you will have overflows since you > cannot read fast enough. This is why a 90% solution for my application was > to just set the dirty_background_ratio to 0 and also why redirection to > /dev/null makes overflows go away. > > This is interesting, as dirty_background_ratio is the percentage at which > the kernel should start writing out dirty pages in the background. Now, I'm > the one who's confused, because I would have expected this to negatively > impact performance. On the other hand, 0 (at least in my head) does not > make very much sense, maybe it's semantically identical to 100%? Are you > swapping (64GB would tell me you shouldn't have swap or extremly low > swappiness)? > On the other hand, it might really be that storage is not the bottleneck > here, and in fact maybe the CPU gets saturated. Now, you said that writing > to /dev/null solves your problem. Do your RAID or filesystem consume a lot > of CPU cycles? This is an interesting mystery... > > With either method I didn't have to > wait for a large write cache to flush before moving on to the next read > from the USRP. Note that there can also be things that happen on the read > side as well. Does this mean that I can only run the code on an RTOS? > > No :) UHD has it's own incoming buffer handlers, but as you already said, > in this high performance scenario, you might be totally right, and our > single-threaded approach just doesn't cut it. 
Maybe dropping in some > asynchronous storage IO would help -- but I hate seeing that blowing up in > example users' faces, so I guess the fact that it doesn't work with a > system as potent as yours with the sample rates as high as you demand might > actually be a shortcoming of the examples that isn't going to be fixed. > > As a final note, my understanding is that GNURadio and the USRP were > developed for domain experts in DSP to use. > > These are SDR frameworks and devices, respectively. The idea is to offer > people with the opportunity to build awesome DSP systems using universally > usable SDR blocks (GNU Radio) and universal software radio peripherals, so > well, they certainly address DSP people, but they shouldn't be hard to use. > > These users may or may not > have prior experience in software. As a result, I'd recommend perhaps > adding a buffered example or have the USRP GNURadio block allow for > buffering. > > That is something we might consider. On the other hand, when someone goes > as far as you do, maybe having an example that does the buffering in a > separate thread (or even process) isn't worth that much -- in the end, one > will want to write one's own high performance application, and that will > include handling such data rates. > > Otherwise, I just don't see how you can advertise 200 MS/s > (maybe even a simple "buffer" block in GNURadio would do the trick?). > > Well, the devices support these rates, and our driver is able to > withstand these rates and sustain them without hitting CPU barriers due to > having too much overhead. That's awesome (ok, I might be biased, but *I* > think it's awesome). 
I don't feel ashamed because on your specific setup, > we can't find a way to make any of our generic examples deliver the full > rate of rx streams to storage -- we sell RF hardware, and not storage > infrastructure, and the point of the examples is demonstrating the usage of > UHD, and not holding a lecture on high performance storage handling. I > wish, though, that we could solve your problem. > > Now, GNU Radio/gr-uhd does in fact come with an application called > uhd_rx_cfile, which is more or less a clone of rx_samples_to_file using > gr-uhd and GNU Radio instead of raw UHD. Does that work out for you? > > I > understand that this is theoretical limit of the bus, but if there doesn't > exist a driver or other software to make use of this, the practical limit > becomes much, much smaller. > > Well, UHD seems to be able to sustain these rates, if you write to > /dev/null, right? So the practical limit for UHD is definitely not being > hit. > I have another --maybe even practical-- suggestion to make: Roll your own > buffer! > > mkfifo /tmp/mybuffer #assuming tmpfs is in ram > dd if=/tmp/mybuffer of=/mount/raid_volume/data.dat & #start in background; > you could play around with block sizes using the bs= option of dd > rx_samples_to_file --file /tmp/mybuffer [all the other options] > > By the way: Thanks for bringing this up! We know that recording samples is > a core concern of many users. > > Greetings, > Marcus > > [1] https://www.kernel.org/doc/Documentation/sysctl/vm.txt > > On Fri, Oct 3, 2014 at 10:55 AM, Marcus Müller <usrp-users@lists.ettus.com> <usrp-users@lists.ettus.com> > wrote: > > > I have to agree with Marcus on this. Also, keep in mind that storage is > really what an operating system should take care of in any "general > purpose" scenario, ie. that as long as I just write to a file, I'd expect > that the thing in charge of storage (my kernel / the filesystems / block > device drivers) does the best it can to keep up. 
If I find myself in a > situation where my specific storage needs dictate a huge write buffer, > changing the application might be one way, but as I'm responsible for my > won storage subsystem, I could just as well increase the cache buffer > sizes, and let the operating system handle storage operation. If your RAID > is really performing as well as it is benchmarked to, then this should not > be one of your problems. All rx_samples_to_file does is really sequentially > writing out data at a constant rate, which is the most basic write > benchmark I can think of. > > If your storage subsystem (filesystem + storage abstraction + raid driver > + interface driver + hard drive interface + hard drives + hardware caches) > can't keep up, it's failing to perform as specified, simple as that. In > this case, saying that the application needs to be smarter when dealing > with storage seems like a bit of a cop-out to me ;) > > I'd like to point out that most benchmarks use heavily averaged numbers > for write speeds etc. UHD on the other hand kind of demands soft real-time > performance of a write subsystem, which is a lot harder to fulfill. This > comes up rather frequently, but I have to stress it: you need a fast > guaranteed write rate, not only an average one, and as soon as your > operating system has to postpone writing data[1], it has to have enough > performance to catch up whilst still meeting continued demand. This is > general purpose hardware running general purpose OS with dozens of > processes, and you can't just say "every single component is up to my task, > thus my system suffices", because everything potentially blocks everything! > > Greetings, > Marcus > > [1] e.g. 
because the filesystems needs to calculate checksums, update > tables, another process gets scheduled, a device blocks your PCIe bus, your > platters randomly need a bit longer to seek, you reach the physical end of > an LVM volume and have to move across a disk, an interrupt does what an > interrupt does, some process is getting noticed on a changing file > descriptor, DBUS is happening in the kernel, token ring has run out of > tokens, thermal throttling, bitflips on SATA leading to retransmission, > some page getting fetched from swap... > > > On 03.10.2014 15:34, Marcus D. Leech via USRP-users wrote: > > > > One has to keep firmly in mind that programs like rx_samples_to_file are > *examples* that show how to use > > the underlying UHD API. They are not necessarily optimized for all > situations, and indeed, one could > > restructure rx_samples_to_file to decouple UHD I/O from filesystem I/O, > using a large buffer between them. > > The fact is that dynamic performance of high-speed, real-time, flows is > something that almost-invariably needs > > tweaking for any particular situation. There's no way for an example > application to meet all those requirements. > > But the fact also remains that for *some* systems, rx_samples_to_file > (and uhd_rx_cfile on the Gnu Radio side) > > are able to stream high-speed data just fine as-is. > > On 2014-10-03 09:26, Peter Witkowski via USRP-users wrote: > > > To say that the issue is just because the disk subsystem can't keep up is a bit of cop-out. > > I had issues writing to disk when the incoming stream was 400MB/s and my RAID0 system was benchmarked at being much higher than that. > > The issue that I've been seeing stems from the fact that it appears that you cannot concurrently read/write from the data stream as its coming in. In effect you have a main loop that reads from the device and then immediately tries to write that buffer to file. If you do not complete these operations in a timely fashion overflows occur. 
> > One way to solve (or at least band aid the issue) is to set your dirty_background_ratio to 0. I was able to get writing to disk working somewhat with this setting as it is more predictable to directly write to disk instead of having your write cache fill up and then having a large amount of data to push to disk. That said, my RAID0 array is capable of such speeds and even then I was getting a few (but much reduced) overflows. > > The one surefire way I know of getting this working (even on a slow disk system) is to buffer the data. The buffer can then be consumed by the disk writing process while being concurrently added onto by the device reader. The easiest way to test buffering (that I've found) is to simply set up a GNURadio Companion program with a stream-to-vector block between the USRP and file sink blocks. This is exactly what I am doing currently since even with a very powerful system, I could not get data saved to disk quickly enough given the aforementioned issues with the provided UHD software. > > On Thu, Oct 2, 2014 at 11:48 PM, gsmandvoip via USRP-users <usrp-users@lists.ettus.com> <usrp-users@lists.ettus.com> <usrp-users@lists.ettus.com> <usrp-users@lists.ettus.com> wrote: > > Thanks Marcus for your replies. Yes O gone away. > > On Thu, Oct 2, 2014 at 5:50 PM, Marcus D. Leech <mleech@ripnet.com> <mleech@ripnet.com> <mleech@ripnet.com> <mleech@ripnet.com> wrote: > > with rx_samples_to_file without _4rx.rbf, Initially I tried on my i3, 4GB ram, it gave me > some OOOO but was lesser than earlier, but I do not understand, my most of the ram capacity and processor was sitting idle while it shows OOOO, why is this strange behaviour The default format for uhd_rx_cfile is complex-float, thus doubling the amount of data written compared to rx_samples_to_file. 
> > You can't just use CPU usage as an indicator of loading--if you're writing to disk, the disk subsystem may be much slower than you think, so the > "rate limiting step" is writes to the disk, not computational elements. > > Try using /dev/null as the file that you write to. If the 'O' go away, even at higher sampling rates, then it's your disk subsystem. > > using uhd_rx_cfile getting similar result, but strangely, why it is low, at 4M sampling rate it was higher??? > > On Thu, Oct 2, 2014 at 9:27 AM, Marcus D. Leech <mleech@ripnet.com> <mleech@ripnet.com> <mleech@ripnet.com> <mleech@ripnet.com> wrote: > > On 10/01/2014 11:46 PM, gsmandvoip wrote: > > Yes I am running single channel, but when trying to achieve my desired sampling rate without _4rx.rbf, it says, requested sampling rate is not valid, adjusting to some 3.9M or so. sorry for misleading info I gave earlier, I have i3, with 32 bit and i7 with 64 bit, but getting same result on both machines > > Here is my command to capture signal: > > ./rx_samples_to_file --args="fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX" --freq "$FC" --rate="$SR" $FILE --nsamps "$NSAMPLES" > > and here is its output: > > Creating the usrp device with: fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX... > -- Loading firmware image: /usr/share/uhd/images/usrp1_fw.ihx... done > -- Opening a USRP1 device... > -- Loading FPGA image: /usr/share/uhd/images/usrp1_ > fpga_4rx.rbf... done > -- Using FPGA clock rate of 52.000000MHz... > ERROR: LOOKUPERROR: INDEXERROR: MULTI_USRP::GET_TX_SUBDEV_SPEC(0) FAILED TO MAKE DEFAULT SPEC - VALUEERROR: THE SUBDEVICE SPECIFICATION "A:0" IS TOO LONG. > The user specified 1 channels, but there are only 0 tx dsps on mboard 0. > > Don't use the _4rx image if you don't need it. > > The USRP1 only does strict-integer resampling, and with a master clock (NON STANDARD FOR USRP1) of 52.000MHz, 4Msps is not a sample rate > that it can produce. Try 5.2Msps or 4.3333Msps. 
> > At 5.2Msps, it's recording at roughly 20.8Mbytes/second, so your system needs to be able to sustain that for at least as long as the capture lasts. > > > > _______________________________________________ > USRP-users mailing listUSRP-users@lists.ettus.comhttp://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com > > > > _______________________________________________ > USRP-users mailing listUSRP-users@lists.ettus.comhttp://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com > > > > _______________________________________________ > USRP-users mailing listUSRP-users@lists.ettus.comhttp://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com > > > > _______________________________________________ > USRP-users mailing list > USRP-users@lists.ettus.com > http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com > > -- Peter Witkowski pwitkowski@gmail.com
PW
Peter Witkowski
Mon, Oct 6, 2014 7:48 PM

Forgot to add:

I have all the recommended kernel tweaks for 10GigE running and found no
difference in number of overflows vs. the network buffer size once you pass
the point where UHD no longer throws a warning for your network buffer size
being too small (if I recall correctly I think this is at 32MB or so?).  I
can provide a copy of my sysctl.conf file if necessary.
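
For a sense of scale on that buffer size, here is some rough arithmetic
(a sketch only).  It assumes 4 bytes per sample (16-bit I and Q), which is
what makes 100 MS/s come out at the roughly 400MB/s quoted in this thread;
the 32MB figure is just my recollection from above, and the function name
is made up for illustration.

```python
def stall_budget_ms(buffer_bytes: int, sample_rate_hz: float,
                    bytes_per_sample: int = 4) -> float:
    """Milliseconds of incoming samples that fit in a buffer of the given size.

    bytes_per_sample=4 is an assumption (16-bit I + 16-bit Q).
    """
    bytes_per_second = sample_rate_hz * bytes_per_sample
    return 1000.0 * buffer_bytes / bytes_per_second


if __name__ == "__main__":
    # 32 MiB of buffer at 100 MS/s: well under a tenth of a second of data.
    print(f"{stall_budget_ms(32 * 1024**2, 100e6):.1f} ms")
```

In other words, at this rate any pause in the read loop longer than a few
tens of milliseconds eats the whole buffer, no matter how fast the disk is
on average.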

I really hope that there's a kernel setting that I'm not using properly,
otherwise I just don't see how the single-threaded approach can work for
high data rates (in both the provided UHD programs and in user-built
GNURadio programs).  As I have mentioned, it is possible to get GNURadio to
do some multi-threading and concurrent producer/consumer behavior, but this
requires a re-purposing of the stream-to-vector block as it currently
stands (or "rolling your own").
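
To make the producer/consumer idea concrete, here is a minimal sketch in
Python.  It is not UHD or GNU Radio code: recv and write are hypothetical
stand-ins for the device read and the disk write, and the names capture
and max_buffered are made up for illustration.  A real implementation at
these rates would more likely be C++ around the UHD streamer.

```python
import queue
import threading


def capture(recv, write, num_chunks, max_buffered=1024):
    """Read chunks with recv() while a second thread drains them to write().

    recv and write are placeholders supplied by the caller.
    Returns the number of chunks dropped because the buffer was full.
    """
    buf = queue.Queue(maxsize=max_buffered)
    dropped = 0

    def writer():
        # Consumer: blocks on the queue, so a slow write() never stalls recv().
        while True:
            chunk = buf.get()
            if chunk is None:  # sentinel: the producer is done
                return
            write(chunk)

    t = threading.Thread(target=writer)
    t.start()
    for _ in range(num_chunks):
        chunk = recv()
        try:
            buf.put_nowait(chunk)  # never block the read loop
        except queue.Full:
            dropped += 1  # software analogue of an overflow
    buf.put(None)
    t.join()
    return dropped


if __name__ == "__main__":
    sink = []
    lost = capture(lambda: bytes(4096), sink.append, num_chunks=1000)
    print(f"wrote {len(sink)} chunks, dropped {lost}")
```

The point of the design is that the read loop only ever does a recv and a
non-blocking enqueue, so a long flush on the disk side costs buffer space
rather than time between two reads.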

On Mon, Oct 6, 2014 at 3:33 PM, Peter Witkowski pwitkowski@gmail.com
wrote:

Marcus,

Great discussion so far.

My point is that it was suggested earlier that if rx_samples_to_file
fails, then your disk subsystem is to blame and you should invest in
more hardware.  My argument was that the application in
general can benefit from multi-threading and internal buffering, otherwise
overflows are a fact of life for high data rates.  While the OS can do more
buffering, you are required to do periodic (at a rather high rate) reads of
the USRP, otherwise its buffers will overflow.  This is why it is critical
to minimize the time between two successive reads via threading and having
a buffer that can concurrently be added onto (from the device) and
processed (by writing to disk).

I think we can agree that there's a delta between what the I/O can sustain
(and what is displayed on a synthetic benchmark) versus what is actually
happening on the machine.  There are potentially a good many CPU cycles
spent between two reads (the vast majority of them during disk I/O,
especially if the device is "busy"), and these seem to be causing the
overflow.  Stated differently, I can have a bunch of blocking reads occur
and sustain the needed rate.  Similarly (and this is what I see via my
benchmarks), I can have a series of blocking writes occur and sustain the
needed data rate.  However, the combination thereof (as well as the fact
that the kernel likes to preempt things) might be unsustainable in a
single-threaded application.  That is, by the time I get to the next read, enough
time has passed to overwhelm the USRP read buffers.

My point with the dirty background ratio is that when set to zero, I have
a predictable amount of time that I block for each write.  My understanding
is that once triggered, background flushes will attempt to write at 100%
device speed.  During this time, additional device accesses (such as those
issued by the application) will be blocked, so sequential write
performance is degraded.  When it is set to 0, I see in iotop that my
writes hold a consistent value, right about where the resource monitor
shows the incoming 10GigE bandwidth to be.  With the
dirty_background_ratio set any higher, "Total Disk Write" and "Actual
Disk Write" differ wildly.  "Actual Disk Write" will spike at times, and
these spikes correlate with additional overflow errors.  When the device is
more stable (i.e. "Actual Disk Write" is consistent) I see a huge reduction
in overflows.  Stated differently, I'm OK with a flush, as long as it's a
quick one.  In the case of larger flushes (as happens with a
dirty_background_ratio greater than 0), I can almost guarantee an overflow
(I can't get around the mainloop fast enough if the device is busy with a
long write).  Note that for my buffered application, I do run a default
background ratio.
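
As a rough illustration of why a non-zero ratio means large flushes, the
arithmetic can be sketched like this.  The 10% figure is only an assumed
example value, not a number from my system, and the kernel applies the
ratio to available (free plus reclaimable) memory rather than to total
RAM, so using 64GB gives an upper bound.  The function name is made up.

```python
def background_flush_backlog(mem_bytes, ratio_percent, write_rate_bytes_per_s):
    """Return (dirty_bytes, seconds_of_capture) accumulated before the kernel
    starts background writeback, for a given dirty_background_ratio.

    mem_bytes stands in for the memory the kernel applies the ratio to.
    """
    threshold = mem_bytes * ratio_percent / 100.0
    return threshold, threshold / write_rate_bytes_per_s


if __name__ == "__main__":
    # 64 GB of memory, an assumed ratio of 10%, capture at 400 MB/s.
    dirty, seconds = background_flush_backlog(64e9, 10, 400e6)
    print(f"{dirty / 1e9:.1f} GB dirty, about {seconds:.0f} s of capture")
    # A ratio of 0 means writeback starts right away, with no backlog.
    print(background_flush_backlog(64e9, 0, 400e6))
```

Under those assumptions, several gigabytes can pile up before the first
background flush begins, which fits the spiky "Actual Disk Write" I
described above.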

The only thing that I got to work was buffering, and the easiest way I
know to accomplish this is to place a stream-to-vector block (in GNURadio)
between the USRP source and the file sink.  The vectorization effectively
works as a buffer, and was the only way I was able to get 100 MS/s working.

Other things I tried:

  1. Using CPU affinity and shielding.
  2. Raising the process priority to 99 under SCHED_FIFO.

Also, I can confirm that rx_samples_to_file and uhd_rx_cfile (as well as
a GNU Radio application that does effectively the same thing, but controls
CPU affinity) all behave the exact same way in terms of the number of
overflows I have encountered.

If there are any other kernel tweaks that I can try, please let me know.
However, in practice, my CPU is not fast enough to get through the mainloop
of rx_samples_to_file quickly enough.  As discussed above, buffering was
the only way I was able to consistently ensure that my data was saved and I
didn't encounter an overflow.

If it would help, I can provide whatever benchmarks of the system you
would like to see.  Perhaps I am not using the correct tool when I quote my
minimum write speeds to disk.

On Sat, Oct 4, 2014 at 9:14 AM, Marcus Müller usrp-users@lists.ettus.com
wrote:

Hi Peter,

didn't mean to confuse you! Actually, my job is doing the opposite (i.e.
providing useful information), and thus let me just briefly follow up on
this:
On 03.10.2014 17:44, Peter Witkowski via USRP-users wrote:

So I'm confused.

You state that if I can't use rx_samples_to_file, my system is failing to
perform as specified to write data out, then you give an example of several
things that can happen to create a stochastic write speed (which I totally
understand and agree with).  Given that writes can be stochastic, why is
there not a software buffer implemented in the UHD sample code to account
for such issues?

Well, because that's, in my opinion, an operating system's job. Being a
code example, rx_samples_to_file just mustn't contain the complexity
introduced when you try to implement buffering functionality smarter than
what your OS can do. And, I do think it's nearly impossible to be smarter
than the Linux kernel when optimizing writes -- but you'll have to tell
your kernel what you want, as a user. The kernel, as it is configured by
any modern distribution by default, won't do enormous write buffers,
because that's not what the user usually wants, increasing the risk of data
loss in case of system failure, and because you usually don't want to spend
all of your RAM on filesystem buffers. In your 64GB RAM case, though,
default buffer sizes should suffice, I guess, so I'm a bit at a loss
here.
It is definitely not very hard to increase these buffers' sizes[1], so I
encourage you to try it and see if that solves your problem. Now, I must
admit that up to here I was always assuming you hadn't already played
around with these values; if this is not the case, please accept my
apologies!

I understand that it's meant to be an example, but I've
also seen it recommended as a de facto debugging tool or test for
people having issues (i.e. the recommendation to use the UHD programs in place
of GNURadio to resolve issues).

...and it has done a great job of supplying basic functionality for many
users, and thus for Ettus! The fact that it works in almost any situation with
this very minimalistic approach (repeated recv->write) proves that UHD is
in fact a rather nice driver interface, IMHO. The fact that GNU Radio
sometimes solves issues that rx_samples_to_file can't indicates that exactly
this buffering approach is helpful. But in that case, buffering is not
increased by increasing kernel buffer sizes, but by introducing GNU Radio
buffers between blocks. The USRP source (Martin, scold me if I say
something stupid) is not really much smarter than rx_samples_to_file: It
recv()s a packet of samples, and returns these samples from the work
function, and then GNU Radio takes care of shuffling and buffering that
data. Basically, GNU Radio behaves much like an operating system from the
source block's point of view.

Also, in terms of benchmarking, I'm quoting minimum values, not averages.
I agree with you that average values are pointless, and in reality the disk
subsystem needs to perform when called upon.  My minimum values for a 4 disk
RAID0 with a dedicated controller are well within the data rate that I am
pushing.

Well, I'll kind of disagree with you: If the minimum write rate of your
system were higher than the rate rx_samples_to_file produces, then you
wouldn't see the problem. The point here, I believe, is that the storage
system consists not only of the hardware side of your RAID, but also of
your complete operating environment. Something slows down how fast data is
written to the RAID.
I think we both would expect the following to happen:

repeatedly:

  rx_samples_to_file:
    uhd::rx_streamer::recv
      (blocks until a packet of samples has arrived. Instantly returns if
      it has before the call)
    write(file_handle, recv_buff)
      (instantly returns, because writing should hit a buffer that the
      operating system transparently pushes out to a disk. If buffer is full,
      then block until enough space in buffer -- unless your filesystem is
      mounted with some sync option...)

Now, if your RAID is definitely fast enough, the write buffer should
never get full. My hypothesis here is that either your buffer size is just
too small, and a block of samples doesn't fit and has to be written out
instantly (which is unlikely), or something else occupies your system. That
might be just the fact that 400MB/s (are we talking about an X3x0?)
inevitably places a heavy load on things like PCIe buses and CPUs, and
that introduces a bottleneck in your storage chain which isn't there if you
"just" benchmark without the USRP. Also, the rather smallish sizes of
network packets mean that journaling file systems introduce very bad
overhead -- I don't know if you benchmarked with files on a journaling file
system and a (network packet size - header) block size...

Is there an example system that can handle sustained data capture from the
USRP at (or near) the limits of 10GigE or the PCIe interfaces (maybe the
requirement is enterprise class PCIe SSDs)?  I'm running a two socket Xeon
system (two hex-core processors) with 64GB of RAM.  How much more hardware
should I throw at the problem to be able to sample/write at 100 MS/s (half of
what is quoted on the website for bandwidth for the 10GigE kit) using the
provided code?

Definitely a nice system! I must admit that I don't have access to a
comparable setup, and thus I can't really offer you any first-hand
experience. Maybe others can.

I think the issue here is that the code itself simply can't get through
its main loop fast enough.  There's a difference between data bandwidth
and CPU throughput.  The sequential nature of the code means that if any
weird stuff happens (your example was a good set of kernel-related hilarity
that can lead to stochastic timing) you will have overflows since you
cannot read fast enough.  This is why a 90% solution for my application was
to just set the dirty_background_ratio to 0 and also why redirection to
/dev/null makes overflows go away.

This is interesting, as dirty_background_ratio is the percentage at
which the kernel should start writing out dirty pages in the background.
Now, I'm the one who's confused, because I would have expected this to
negatively impact performance. On the other hand, 0 (at least in my head)
does not make very much sense; maybe it's semantically identical to 100%?
Are you swapping (64GB would tell me you shouldn't have swap or extremely
low swappiness)?
On the other hand, it might really be that storage is not the bottleneck
here, and in fact maybe the CPU gets saturated. Now, you said that writing
to /dev/null solves your problem. Do your RAID or filesystem consume a lot
of CPU cycles? This is an interesting mystery...

With either method I didn't have to
wait for a large write cache to flush before moving on to the next read
from the USRP.  Note that there can also be things that happen on the read
side as well.  Does this mean that I can only run the code on an RTOS?

No :) UHD has its own incoming buffer handlers, but as you already
said, in this high performance scenario, you might be totally right, and
our single-threaded approach just doesn't cut it. Maybe dropping in some
asynchronous storage IO would help -- but I hate seeing that blowing up in
example users' faces, so I guess the fact that it doesn't work with a
system as potent as yours with the sample rates as high as you demand might
actually be a shortcoming of the examples that isn't going to be fixed.

As a final note, my understanding is that GNURadio and the USRP were
developed for domain experts in DSP to use.

These are SDR frameworks and devices, respectively. The idea is to offer
people the opportunity to build awesome DSP systems using universally
usable SDR blocks (GNU Radio) and universal software radio peripherals, so
well, they certainly address DSP people, but they shouldn't be hard to use.

These users may or may not
have prior experience in software.  As a result, I'd recommend perhaps
adding a buffered example or have the USRP GNURadio block allow for
buffering.

That is something we might consider. On the other hand, when someone
goes as far as you do, maybe having an example that does the buffering in a
separate thread (or even process) isn't worth that much -- in the end, one
will want to write one's own high performance application, and that will
include handling such data rates.

Otherwise, I just don't see how you can advertise 200 MS/s
(maybe even a simple "buffer" block in GNURadio would do the trick?).

Well, the devices support these rates, and our driver is able to
withstand these rates and sustain them without hitting CPU barriers due to
having too much overhead. That's awesome (ok, I might be biased, but I
think it's awesome). I don't feel ashamed because on your specific setup,
we can't find a way to make any of our generic examples deliver the full
rate of rx streams to storage -- we sell RF hardware, and not storage
infrastructure, and the point of the examples is demonstrating the usage of
UHD, and not holding a lecture on high performance storage handling. I
wish, though, that we could solve your problem.

Now, GNU Radio/gr-uhd does in fact come with an application called
uhd_rx_cfile, which is more or less a clone of rx_samples_to_file using
gr-uhd and GNU Radio instead of raw UHD. Does that work out for you?

I
understand that this is the theoretical limit of the bus, but if there doesn't
exist a driver or other software to make use of this, the practical limit
becomes much, much smaller.

Well, UHD seems to be able to sustain these rates, if you write to
/dev/null, right? So the practical limit for UHD is definitely not being
hit.
I have another --maybe even practical-- suggestion to make: Roll your own
buffer!

mkfifo /tmp/mybuffer  # assuming tmpfs is in ram
# start dd in the background; you could play around with block sizes
# using the bs= option of dd
dd if=/tmp/mybuffer of=/mount/raid_volume/data.dat &
rx_samples_to_file --file /tmp/mybuffer [all the other options]

By the way: Thanks for bringing this up! We know that recording samples
is a core concern of many users.

Greetings,
Marcus

[1] https://www.kernel.org/doc/Documentation/sysctl/vm.txt

On Fri, Oct 3, 2014 at 10:55 AM, Marcus Müller usrp-users@lists.ettus.com
wrote:

I have to agree with Marcus on this. Also, keep in mind that storage is
really what an operating system should take care of in any "general
purpose" scenario, i.e. as long as I just write to a file, I'd expect
that the thing in charge of storage (my kernel / the filesystems / block
device drivers) does the best it can to keep up. If I find myself in a
situation where my specific storage needs dictate a huge write buffer,
changing the application might be one way, but as I'm responsible for my
own storage subsystem, I could just as well increase the cache buffer
sizes, and let the operating system handle storage operation. If your RAID
is really performing as well as it is benchmarked to, then this should not
be one of your problems. All rx_samples_to_file really does is sequentially
write out data at a constant rate, which is the most basic write
benchmark I can think of.

If your storage subsystem (filesystem + storage abstraction + raid driver
+ interface driver + hard drive interface + hard drives + hardware caches)
can't keep up, it's failing to perform as specified, simple as that. In
this case, saying that the application needs to be smarter when dealing
with storage seems like a bit of a cop-out to me ;)

I'd like to point out that most benchmarks use heavily averaged numbers
for write speeds etc. UHD on the other hand kind of demands soft real-time
performance of a write subsystem, which is a lot harder to fulfill. This
comes up rather frequently, but I have to stress it: you need a fast
guaranteed write rate, not only an average one, and as soon as your
operating system has to postpone writing data[1], it has to have enough
performance to catch up whilst still meeting continued demand. This is
general purpose hardware running general purpose OS with dozens of
processes, and you can't just say "every single component is up to my task,
thus my system suffices", because everything potentially blocks everything!
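The gap between an averaged benchmark figure and the worst single write can be measured directly. A minimal sketch, assuming plain unbuffered writes from Python are a fair stand-in for the write call in the capture loop; `time_writes` and `summarize` are hypothetical helpers, and the block size and count are arbitrary:

```python
import os
import tempfile
import time


def time_writes(path, block_size=1 << 20, blocks=16):
    """Write `blocks` blocks of `block_size` bytes; return per-call latencies in seconds."""
    payload = bytes(block_size)
    latencies = []
    # buffering=0 hands every write straight to the OS, as a C write() would.
    with open(path, "wb", buffering=0) as f:
        for _ in range(blocks):
            start = time.perf_counter()
            f.write(payload)
            latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Report the average and the single slowest write."""
    return {
        "mean_s": sum(latencies) / len(latencies),
        "worst_s": max(latencies),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(summarize(time_writes(os.path.join(d, "probe.dat"))))
```

For a capture, the number that matters is `worst_s` compared with how long the receive side can go unserviced, not `mean_s`. A realistic test would point `path` at the RAID volume and write for as long as the planned capture.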

Greetings,
Marcus

[1] e.g. because the filesystem needs to calculate checksums, update
tables, another process gets scheduled, a device blocks your PCIe bus, your
platters randomly need a bit longer to seek, you reach the physical end of
an LVM volume and have to move across a disk, an interrupt does what an
interrupt does, some process is getting notified about a changing file
descriptor, DBUS is happening in the kernel, token ring has run out of
tokens, thermal throttling, bitflips on SATA leading to retransmission,
some page getting fetched from swap...

On 03.10.2014 15:34, Marcus D. Leech via USRP-users wrote:

One has to keep firmly in mind that programs like rx_samples_to_file are
examples that show how to use the underlying UHD API. They are not
necessarily optimized for all situations, and indeed, one could
restructure rx_samples_to_file to decouple UHD I/O from filesystem I/O,
using a large buffer between them.

The fact is that dynamic performance of high-speed, real-time flows is
something that almost invariably needs tweaking for any particular
situation. There's no way for an example application to meet all those
requirements.

But the fact also remains that for some systems, rx_samples_to_file
(and uhd_rx_cfile on the GNU Radio side) are able to stream high-speed
data just fine as-is.
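The decoupling described above, with a receive loop and a file writer joined by a large buffer, can be sketched in a few lines. This is only an illustration in Python, not how rx_samples_to_file is written: `recv_fn` is a hypothetical stand-in for the UHD receive call, and `capture` and `depth` are made-up names.

```python
import queue
import threading


def capture(recv_fn, sink, depth=1024):
    """Receive in this thread, write to `sink` in a second thread.

    recv_fn: hypothetical callable returning one buffer of bytes,
             or None once the capture is finished.
    sink:    any object with a write(bytes) method.
    Returns the number of buffers dropped because the queue was full.
    """
    buffers = queue.Queue(maxsize=depth)
    dropped = 0

    def writer():
        while True:
            buf = buffers.get()
            if buf is None:          # sentinel: the receiver has finished
                return
            sink.write(buf)          # slow writes only stall this thread

    t = threading.Thread(target=writer)
    t.start()
    while True:
        buf = recv_fn()
        if buf is None:
            break
        try:
            buffers.put_nowait(buf)  # never block the receive loop
        except queue.Full:
            dropped += 1             # in-process analogue of an overflow
    buffers.put(None)                # may wait for room; receiving is over
    t.join()
    return dropped
```

The queue depth sets how long a write stall can last before data is lost: with `depth` buffers of N bytes each, the writer can fall behind by up to depth * N bytes.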

On 2014-10-03 09:26, Peter Witkowski via USRP-users wrote:

To say that the issue is just because the disk subsystem can't keep up is a bit of a cop-out.

I had issues writing to disk when the incoming stream was 400MB/s and my RAID0 system was benchmarked at being much higher than that.

The issue that I've been seeing stems from the fact that it appears that you cannot concurrently read/write from the data stream as it's coming in. In effect you have a main loop that reads from the device and then immediately tries to write that buffer to file. If you do not complete these operations in a timely fashion, overflows occur.

One way to solve (or at least band-aid) the issue is to set your dirty_background_ratio to 0. I was able to get writing to disk working somewhat with this setting as it is more predictable to directly write to disk instead of having your write cache fill up and then having a large amount of data to push to disk. That said, my RAID0 array is capable of such speeds and even then I was getting a few (but much reduced) overflows.

The one surefire way I know of getting this working (even on a slow disk system) is to buffer the data. The buffer can then be consumed by the disk writing process while being concurrently added onto by the device reader. The easiest way to test buffering (that I've found) is to simply set up a GNURadio Companion program with a stream-to-vector block between the USRP and file sink blocks. This is exactly what I am doing currently since even with a very powerful system, I could not get data saved to disk quickly enough given the aforementioned issues with the provided UHD software.

On Thu, Oct 2, 2014 at 11:48 PM, gsmandvoip via USRP-users usrp-users@lists.ettus.com wrote:

Thanks Marcus for your replies. Yes, the O's have gone away.

On Thu, Oct 2, 2014 at 5:50 PM, Marcus D. Leech mleech@ripnet.com wrote:

with rx_samples_to_file without _4rx.rbf, Initially I tried on my i3, 4GB ram, it gave me some OOOO but fewer than earlier, but I do not understand: most of my RAM capacity and processor was sitting idle while it shows OOOO. Why this strange behaviour?

The default format for uhd_rx_cfile is complex-float, thus doubling the amount of data written compared to rx_samples_to_file.

You can't just use CPU usage as an indicator of loading--if you're writing to disk, the disk subsystem may be much slower than you think, so the
"rate limiting step" is writes to the disk, not computational elements.

Try using /dev/null as the file that you write to. If the 'O' go away, even at higher sampling rates, then it's your disk subsystem.

using uhd_rx_cfile getting similar result, but strangely, why it is low, at 4M sampling rate it was higher???

On Thu, Oct 2, 2014 at 9:27 AM, Marcus D. Leech mleech@ripnet.com wrote:

On 10/01/2014 11:46 PM, gsmandvoip wrote:

Yes, I am running a single channel, but when trying to achieve my desired sampling rate without _4rx.rbf, it says the requested sampling rate is not valid and adjusts to some 3.9M or so. Sorry for the misleading info I gave earlier: I have an i3 with 32 bit and an i7 with 64 bit, but I'm getting the same result on both machines.

Here is my command to capture signal:

./rx_samples_to_file --args="fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX" --freq "$FC" --rate="$SR" $FILE --nsamps "$NSAMPLES"

and here is its output:

Creating the usrp device with: fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX...
-- Loading firmware image: /usr/share/uhd/images/usrp1_fw.ihx... done
-- Opening a USRP1 device...
-- Loading FPGA image: /usr/share/uhd/images/usrp1_fpga_4rx.rbf... done
-- Using FPGA clock rate of 52.000000MHz...
ERROR: LOOKUPERROR: INDEXERROR: MULTI_USRP::GET_TX_SUBDEV_SPEC(0) FAILED TO MAKE DEFAULT SPEC - VALUEERROR: THE SUBDEVICE SPECIFICATION "A:0" IS TOO LONG.
The user specified 1 channels, but there are only 0 tx dsps on mboard 0.

Don't use the _4rx image if you don't need it.

The USRP1 only does strict-integer resampling, and with a master clock (NON STANDARD FOR USRP1) of 52.000MHz, 4Msps is not a sample rate
that it can produce. Try 5.2Msps or 4.3333Msps.

At 5.2Msps, it's recording at roughly 20.8Mbytes/second, so your system needs to be able to sustain that for at least as long as the capture lasts.
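The 20.8 Mbytes/second figure is just the sample rate times the size of one complex sample. A small sketch of that arithmetic, assuming 4 bytes per sample for 16-bit I/Q (consistent with the figure above) and 8 bytes for complex-float, which is the doubling mentioned earlier for uhd_rx_cfile; the table and function name are illustrative:

```python
# Bytes occupied by one complex sample on disk.
BYTES_PER_SAMPLE = {
    "sc16": 4,  # two 16-bit integers (I and Q)
    "fc32": 8,  # two 32-bit floats (I and Q)
}


def bytes_per_second(sample_rate_hz, fmt="sc16"):
    """Sustained write rate needed to record at `sample_rate_hz`."""
    return sample_rate_hz * BYTES_PER_SAMPLE[fmt]


if __name__ == "__main__":
    for rate in (5_200_000, 100_000_000):
        for fmt in ("sc16", "fc32"):
            mb = bytes_per_second(rate, fmt) / 1e6
            print(f"{rate / 1e6:g} Msps as {fmt}: {mb:g} MB/s")
```

At 100 Msps with 16-bit samples this gives the 400 MB/s discussed elsewhere in this thread.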


USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com

--
Peter Witkowski
pwitkowski@gmail.com


Forgot to add: I have all the recommended kernel tweaks for 10GigE running and found no difference in number of overflows vs. the network buffer size once you pass the point where UHD no longer throws a warning for your network buffer size being too small (if I recall correctly I think this is at 32MB or so?). I can provide a copy of my sysctl.conf file if necessary.

I really hope that there's a kernel setting that I'm not using properly, otherwise I just don't see how the single-threaded approach can work for high data rates (in both the provided UHD programs and in user-built GNURadio programs). As I have mentioned, it is possible to get GNURadio to do some multi-threading and concurrent producer/consumer behavior, but this requires a re-purposing of the stream-to-vector block as it currently stands (or "rolling your own").

On Mon, Oct 6, 2014 at 3:33 PM, Peter Witkowski <pwitkowski@gmail.com> wrote:

> Marcus,
>
> Great discussion so far.
>
> I think my point is that earlier I saw that it was alluded to that if
> rx_samples_to_file fails, then your disk subsystem is to blame and you
> should invest in more hardware. My point was that the application in
> general can benefit from multi-threading and internal buffering, otherwise
> overflows are a fact of life for high data rates. While the OS can do more
> buffering, you are required to do periodic (at a rather high rate) reads of
> the USRP, otherwise its buffers will overflow. This is why it is critical
> to minimize the time between two successive reads via threading and having
> a buffer that can concurrently be added onto (from the device) and
> processed (by writing to disk).
>
> I think we can agree that there's a delta between what the I/O can sustain
> (and what is displayed on a synthetic benchmark) versus what is actually
> happening on the machine.
>> >> >> >> _______________________________________________ >> USRP-users mailing listUSRP-users@lists.ettus.comhttp://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com >> >> >> >> _______________________________________________ >> USRP-users mailing listUSRP-users@lists.ettus.comhttp://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com >> >> >> >> _______________________________________________ >> USRP-users mailing listUSRP-users@lists.ettus.comhttp://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com >> >> >> >> _______________________________________________ >> USRP-users mailing list >> USRP-users@lists.ettus.com >> http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com >> >> > > > -- > Peter Witkowski > pwitkowski@gmail.com > -- Peter Witkowski pwitkowski@gmail.com
AW
Andy Walls
Mon, Oct 6, 2014 8:31 PM

On Mon, 2014-10-06 at 15:33 -0400, Peter Witkowski via USRP-users wrote:

Marcus,

Great discussion so far.

I think my point is that earlier I saw that it was alluded to that if
rx_samples_to_file fails, then your disk subsystem is to blame and you
should invest in more hardware.  My point was that the application in
general can benefit from multi-threading and internal buffering,
otherwise overflows are a fact of life for high data rates.  While the
OS can do more buffering, you are required to do periodic (at a rather
high rate) reads of the USRP, otherwise its buffers will overflow.
This is why it is critical to minimize the time between two successive
reads via threading and having a buffer that can concurrently be added
onto (from the device) and processed (by writing to disk).
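
[Editor's sketch] The reader/writer split described above can be illustrated roughly as follows. This is a minimal sketch, not code from UHD or GNU Radio: `read_chunk` and `write_chunk` are hypothetical callables standing in for the device recv() call and the file write, and the queue size is an arbitrary assumption.

```python
import queue
import threading


def capture(read_chunk, write_chunk, max_chunks_buffered=1024):
    """Decouple device reads from disk writes with a bounded in-memory queue.

    read_chunk() returns a bytes object, or None when the capture is done.
    write_chunk(data) stores one chunk and is allowed to block for a while.
    Both are hypothetical stand-ins for recv() and file.write().
    Returns the number of chunks dropped because the queue was full.
    """
    buf = queue.Queue(maxsize=max_chunks_buffered)
    dropped = 0

    def writer():
        # Consumer: drains the queue at whatever pace the disk allows.
        while True:
            data = buf.get()
            if data is None:  # sentinel: the reader has finished
                return
            write_chunk(data)

    thread = threading.Thread(target=writer)
    thread.start()

    # Producer: never waits on the disk, so the time between two
    # successive reads stays short even when a write stalls.
    while True:
        data = read_chunk()
        if data is None:
            break
        try:
            buf.put_nowait(data)
        except queue.Full:
            dropped += 1  # the buffer itself ran out of room

    buf.put(None)  # tell the writer to stop once the queue is drained
    thread.join()
    return dropped
```

The bounded queue only absorbs stalls; if the disk is slower than the stream on average, the queue still fills up and chunks are dropped.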

I think we can agree that there's a delta between what the I/O can
sustain (and what is displayed on a synthetic benchmark) versus what
is actually happening on the machine.  There's potentially a good deal
of CPU cycles that occur between two reads (a vast percentage of these
are during disk I/O, especially if the device is "busy") that seem to
be causing the overflow.  Stated differently, I can have a bunch of
blocking reads occur and sustain the needed rate.  Similarly (and this
is what I see via my benchmarks), I can have a series of blocking
writes occur and sustain the needed data rate.  However, the
combination thereof (as well as the fact that the kernel likes to
preempt things) might be unsustainable in a single threaded
application.  That is, by the time I get to the next read, enough time
has passed to overwhelm the USRP read buffers.
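
[Editor's sketch] The effect described above, where the average write time is fine but a single long write still causes an overflow, can be shown with a toy model. Everything here is an assumption made for illustration: the chunk duration, the device-side buffer depth and the write times are invented numbers, not measurements of any USRP.

```python
def count_lost_chunks(write_times_us, chunk_time_us, device_buffer_chunks):
    """Toy model of a single-threaded recv -> write loop.

    write_times_us: how long each write() call blocks, in microseconds.
    chunk_time_us: how many microseconds of signal one chunk holds.
    device_buffer_chunks: chunks the receive side can queue before it
        starts dropping samples (hypothetical figure).
    Returns the number of chunks lost to overflow (may be fractional).
    """
    backlog = 0.0  # chunks waiting to be read
    lost = 0.0
    for write_time in write_times_us:
        # recv(): take one chunk; if none is waiting, we simply block.
        backlog = max(backlog - 1.0, 0.0)
        # write(): the device keeps producing while we are blocked.
        backlog += write_time / chunk_time_us
        if backlog > device_buffer_chunks:
            lost += backlog - device_buffer_chunks
            backlog = float(device_buffer_chunks)
    return lost


# Same loop, two write patterns. Both have a mean write time below the
# 1000 us chunk time, but the second contains one long stall.
steady = [500] * 100
bursty = [40000] + [100] * 99
print(count_lost_chunks(steady, 1000, 8))
print(count_lost_chunks(bursty, 1000, 8))
```

In this model only the worst-case write time relative to the buffer depth decides whether samples are lost, which is why averaged benchmark figures say little.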

My point with the dirty background ratio is that when set to zero, I
have a predictable amount of time that I block for each write.  My
understanding is that once triggered, background flushes will attempt
to write at 100% device speed.  During this time, additional device
access (such as those being called by the application) will be blocked
(sequential writes are therefore performance degraded).  When set to
0, I see in iotop that my writes are a consistent value which is right
about where resource monitor shows the incoming 10GigE bandwidth to
be.  Setting the dirty_background_ratio any higher and the "Total Disk
Write" and "Actual Disk Write" differ wildly.  "Actual Disk Write"
will spike at times, and these spikes correlate with additional
overflow errors.  When the device is more stable (i.e. "Actual Disk
Write" is consistent) I see a huge reduction in overflows.  Stated
differently, I'm OK with a flush, as long as it's a quick one.  In the
cases of larger flushes (as is the case with a dirty_background_ratio
greater than 0), I can almost guarantee an overflow (I can't get
around the mainloop fast enough if the device is busy with a long
write).  Note that for my buffered application, I do run a default
background ratio.
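
[Editor's sketch] On Linux the current writeback thresholds can be read from /proc/sys/vm before experimenting with them. A minimal helper, assuming the standard procfs layout; the `base` parameter exists only so the function can be pointed at a different directory for testing. Changing the value needs root, for example with `sysctl -w vm.dirty_background_ratio=0`.

```python
from pathlib import Path


def read_vm_setting(name, base="/proc/sys/vm"):
    """Return the integer value of a vm sysctl such as dirty_background_ratio."""
    return int(Path(base, name).read_text().strip())


if __name__ == "__main__":
    for setting in ("dirty_background_ratio", "dirty_ratio"):
        try:
            print(setting, "=", read_vm_setting(setting))
        except OSError:
            print(setting, "is not available on this system")
```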

The only thing that I got to work was buffering, and the easiest way I
know to accomplish this is to place a stream-to-vector block (in
GNURadio) between the USRP source and the file sink.  The
vectorization effectively works as a buffer, and was the only way I
was able to get 100 MS/s working.

Other things I tried:

  1. Using CPU affinity and shielding.

  2. Increasing the process priority to 99 with SCHED_FIFO.

Also, I can confirm that rx_samples_to_file and uhd_rx_cfile (as well
as a GNU Radio application that does effectively the same thing, but
controls CPU affinity) all behave the exact same way in terms of the
number of overflows I have encountered.

If there are any other kernel tweaks that I can try, please let me
know.  However, in practice, my CPU is not fast enough to get through
the mainloop of rx_samples_to_file quickly enough.  As discussed
above, buffering was the only way I was able to consistently ensure
that my data was saved and I didn't encounter an overflow.

In conjunction with process priority 99 SCHED_FIFO, have you tried
forcing (almost) all IRQ handlers to be threaded IRQ handlers with the
"threadirqs" (IIRC) kernel command line option?  Then use 'ps -eLo
pid,cls,rtprio,pri,nice,comm | grep irq' to identify the interrupt
handlers and adjust their priorities up and down, with chrt, according
to which hardware you need responding promptly vs. hardware that can wait.
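
[Editor's sketch] Setting a real-time priority can also be done programmatically. The helper below is a rough Python equivalent of what `chrt -f -p <priority> <pid>` does; the function name is made up for this example, and the call normally fails without root or CAP_SYS_NICE.

```python
import os


def set_fifo_priority(pid, priority):
    """Try to put `pid` (0 means the calling process) under SCHED_FIFO.

    Returns True on success, and False if the platform has no SCHED_FIFO
    or the kernel refuses the request (no such process, missing
    privileges, invalid priority).
    """
    if not hasattr(os, "SCHED_FIFO"):
        return False
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        return False
    return True
```

Passing the ID of an IRQ thread found with the ps command above would raise or lower that handler relative to the capture process.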

Also try disabling hyperthreading in your BIOS.  A while ago I found
that the Linux scheduler was not HT aware, and one could get RT priority
inversions.  Maybe things have changed with the scheduler since then,
but it's easy enough to test.

Regards,
Andy

If you would like, I can provide whatever benchmarks of the system
that you would like.  Perhaps I am not using the correct tool when I
quote my minimum write speeds to disk.

On Sat, Oct 4, 2014 at 9:14 AM, Marcus Müller
usrp-users@lists.ettus.com wrote:
Hi Peter,

     didn't mean to confuse you! Actually, my job is doing the
     opposite (ie. providing useful information), and thus let me
     just shortly follow up on this:
     On 03.10.2014 17:44, Peter Witkowski via USRP-users wrote:
     

So I'm confused.

You state that if I can't use rx_samples_to_file, my system is failing to
perform as specified to write data out, then you give an example of several
things that can happen to create a stochastic write speed (which I totally
understand and agree with).  Given that writes can be stochastic, why is
there not a software buffer implemented in the UHD sample code to account
for such issues?

     Well, because that's, in my opinion, an operating system's
     job. Being a code example, rx_samples_to_file just *mustn't*
     contain the complexity introduced when you try to implement
     buffering functionality smarter than what your OS can do. And,
     I do think it's nearly impossible to be smarter than the linux
     kernel when optimizing writes -- *but* you'll have to tell
     your kernel what you want, as a user. The kernel, as it is
     configured by any modern distribution by default, won't do
     enormous write buffers, because that's not what the user
     usually wants, increasing the risk of data loss in case of
     system failure, and because you usually don't want to spend
     all of your RAM on filesystem buffers. In your 64GB RAM case,
     though, default buffer sizes should suffice, I guess, so I'm a
     bit out of clues here.
     It is definitely not very hard to increase these buffers'
     sizes[1], so I encourage you to try it and see if that solves
     your problem. Now, I must admit that up to here I was always
     assuming you hadn't already played around with these values,
     if this is not the case, please accept my apologies! 
     

I understand that it's meant to be an example, but I've
also seen it referenced as being used effectively as a debugger or test for
people having issues (i.e. recommendation to use the UHD programs in place
of GNURadio to resolve issues).

     ...and it's done many users and thus Ettus a great job of
     supplying basic functionality! The fact that it works in
     almost any situation with this very minimalistic approach
     (repeated recv->write) proves that UHD is in fact a rather
     nice driver interface, IMHO. The fact that GNU Radio sometimes
     solves issues that rx_samples_to_file can't indicates exactly
     the buffering approach to be helpful. But in that case,
     buffering is not increased by increasing kernel buffer sizes,
     but by introducing GNU Radio buffers between blocks. The USRP
     source (Martin, scold me if I say something stupid) is not
     really much smarter than rx_samples_to_file: It recv()s a
     packet of samples, and returns these samples from the work
     function, and then GNU Radio takes care of shuffling and
     buffering that data. Basically, GNU Radio behaves much like an
     operating system from the source block's point of view.

Also, in terms of benchmarking, I'm quoting minimum values, not averages.
I agree with you that average values are pointless, and in reality the disk
subsystem needs to perform when called up.  My minimum values for a 4 disk
RAID0 with a dedicated controller are well within the data rate that I am
pushing.

     Well, I'll kind of disagree with you: If your minimum write
     rate of your system was bigger than the rate
     rx_samples_to_file causes, then you wouldn't see the problem.
     The point, I believe, here is that the storage system does not
     only consist of the hardware side of your RAID, but also of
     your complete operating environment. Something slows down how
     fast data is written to the RAID. 
     I think we both would expect the following to happen:
     
     repeatedly:
     
     rx_samples_to_file:
     uhd::rx_streamer::recv
         (blocks until a packet of samples has arrived. Instantly
     returns if it has before the call)
     write(file_handle, recv_buff)
         (instantly returns, because writing should hit a buffer
     that the operating system transparently pushes out to a disk.
     If buffer is full, then block until enough space in buffer --
     unless your filesystem is mounted with some sync option...)
     
     Now, if your RAID is definitely fast enough, the write buffer
     should never get full. My hypothesis here is that either your
     buffer size is just too small, and a block of samples doesn't
     fit and has to be written out instantly (which is unlikely),
     or something else occupies your system. That might be just the
     fact that 400MB/s (are we talking about an X3x0?) inevitably
     places a heavy load on things like PCIe busses and CPUs, and
     that introduces a bottleneck in your storage chain which isn't
     there if you "just" benchmark without the USRP. Also, the
     rather smallish sizes of network packets dictate that
     journalling file systems introduce a very bad overhead -- I
     don't know if you benchmarked with files on a journaling file
     system and a (network packet size - header) block size...

Is there an example system that can handle sustained data capture from the
USRP at (or near the limits) of 10GigE or the PCIe interfaces (maybe the
requirement is enterprise class PCIe SSDs)?  I'm running a two socket Xeon
system (two hex core processors) with 64GB of RAM.  How much more hardware
should I throw at the problem to be able to sample/write at 100MS (half of
what is quoted on the website for bandwidth for the 10GigE kit) using the
provided code?

     Definitely a nice system! I must admit that I don't have
     access to a comparable setup, and thus I can't really offer
     you any first-hand experience. Maybe others can.

I think the issue here is that the code itself can't simply get through
its main loop fast enough.  There's a difference between data bandwidth
and CPU throughput.  The sequential nature of the code means that if any
weird stuff happens (your example was a good set of kernel related hilarity
that can lead to stochastic timing) you will have overflows since you
cannot read fast enough.  This is why a 90% solution for my application was
to just set the dirty_background_ratio to 0 and also why redirection to
/dev/null makes overflows go away.

     This is interesting, as dirty_background_ratio is the
     percentage at which the kernel should start writing out dirty
     pages in the background. Now, I'm the one who's confused,
     because I would have expected this to negatively impact
     performance. On the other hand, 0 (at least in my head) does
     not make very much sense, maybe it's semantically identical to
     100%? Are you swapping (64GB would tell me you shouldn't have
     swap or extremely low swappiness)?
     On the other hand, it might really be that storage is not the
     bottleneck here, and in fact maybe the CPU gets saturated.
     Now, you said that writing to /dev/null solves your problem.
     Do your RAID or filesystem consume a lot of CPU cycles? This
     is an interesting mystery...

With either method I didn't have to
wait for a large write cache to flush before moving on to the next read
from the USRP.  Note that there can also be things that happen on the read
side as well.  Does this mean that I can only run the code on an RTOS?

     No :) UHD has its own incoming buffer handlers, but as you
     already said, in this high performance scenario, you might be
     totally right, and our single-threaded approach just doesn't
     cut it. Maybe dropping in some asynchronous storage IO would
     help -- but I hate seeing that blowing up in example users'
     faces, so I guess the fact that it doesn't work with a system
     as potent as yours with the sample rates as high as you demand
     might actually be a shortcoming of the examples that isn't
     going to be fixed. 

As a final note, my understanding is that GNURadio and the USRP were
developed for domain experts in DSP to use.

     These are SDR frameworks and devices, respectively. The idea
     is to offer people the opportunity to build awesome DSP
     systems using universally usable SDR blocks (GNU Radio) and
     universal software radio peripherals, so well, they certainly
     address DSP people, but they shouldn't be hard to use.

These users may or may not
have prior experience in software.  As a result, I'd recommend perhaps
adding a buffered example or have the USRP GNURadio block allow for
buffering.

     That is something we might consider. On the other hand, when
     someone goes as far as you do, maybe having an example that
     does the buffering in a separate thread (or even process)
     isn't worth that much -- in the end, one will want to write
     one's own high performance application, and that will include
     handling such data rates. 

Otherwise, I just don't see how you can advertise 200 MS/s
(maybe even a simple "buffer" block in GNURadio would do the trick?).

     Well, the devices support these rates, and our driver is able
     to withstand these rates and sustain them without hitting CPU
     barriers due to having too much overhead. That's awesome (ok,
     I might be biased, but *I* think it's awesome). I don't feel
     ashamed because on your specific setup, we can't find a way to
     make any of our generic examples deliver the full rate of rx
     streams to storage -- we sell RF hardware, and not storage
     infrastructure, and the point of the examples is demonstrating
     the usage of UHD, and not holding a lecture on high
     performance storage handling. I wish, though, that we could
     solve your problem.
     
     Now, GNU Radio/gr-uhd does in fact come with an application
     called uhd_rx_cfile, which is more or less a clone of
     rx_samples_to_file using gr-uhd and GNU Radio instead of raw
     UHD. Does that work out for you?
     

I
understand that this is the theoretical limit of the bus, but if there doesn't
exist a driver or other software to make use of this, the practical limit
becomes much, much smaller.

     Well, UHD seems to be able to sustain these rates, if you
     write to /dev/null, right? So the practical limit for UHD is
     definitely not being hit. 
     I have another --maybe even practical-- suggestion to make:
     Roll your own buffer!
     
     mkfifo /tmp/mybuffer #assuming tmpfs is in ram
     dd if=/tmp/mybuffer of=/mount/raid_volume/data.dat & #start in
     background; you could play around with block sizes using the
     bs= option of dd
     rx_samples_to_file --file /tmp/mybuffer [all the other
     options]
     
     By the way: Thanks for bringing this up! We know that
     recording samples is a core concern of many users.
     
     Greetings,
     Marcus
     
     [1] https://www.kernel.org/doc/Documentation/sysctl/vm.txt
     

On Fri, Oct 3, 2014 at 10:55 AM, Marcus Müller usrp-users@lists.ettus.com
wrote:

I have to agree with Marcus on this. Also, keep in mind that storage is
really what an operating system should take care of in any "general
purpose" scenario, ie. that as long as I just write to a file, I'd expect
that the thing in charge of storage (my kernel / the filesystems / block
device drivers) does the best it can to keep up. If I find myself in a
situation where my specific storage needs dictate a huge write buffer,
changing the application might be one way, but as I'm responsible for my
own storage subsystem, I could just as well increase the cache buffer
sizes, and let the operating system handle storage operation. If your RAID
is really performing as well as it is benchmarked to, then this should not
be one of your problems. All rx_samples_to_file does is really sequentially
writing out data at a constant rate, which is the most basic write
benchmark I can think of.

If your storage subsystem (filesystem + storage abstraction + raid driver
+ interface driver + hard drive interface + hard drives + hardware caches)
can't keep up, it's failing to perform as specified, simple as that. In
this case, saying that the application needs to be smarter when dealing
with storage seems like a bit of a cop-out to me ;)

I'd like to point out that most benchmarks use heavily averaged numbers
for write speeds etc. UHD on the other hand kind of demands soft real-time
performance of a write subsystem, which is a lot harder to fulfill. This
comes up rather frequently, but I have to stress it: you need a fast
guaranteed write rate, not only an average one, and as soon as your
operating system has to postpone writing data[1], it has to have enough
performance to catch up whilst still meeting continued demand. This is
general purpose hardware running general purpose OS with dozens of
processes, and you can't just say "every single component is up to my task,
thus my system suffices", because everything potentially blocks everything!

Greetings,
Marcus

[1] e.g. because the filesystem needs to calculate checksums, update
tables, another process gets scheduled, a device blocks your PCIe bus, your
platters randomly need a bit longer to seek, you reach the physical end of
an LVM volume and have to move across a disk, an interrupt does what an
interrupt does, some process is getting noticed on a changing file
descriptor, DBUS is happening in the kernel, token ring has run out of
tokens, thermal throttling, bitflips on SATA leading to retransmission,
some page getting fetched from swap...

On 03.10.2014 15:34, Marcus D. Leech via USRP-users wrote:

One has to keep firmly in mind that programs like rx_samples_to_file are
examples that show how to use

the underlying UHD API. They are not necessarily optimized for all
situations, and indeed, one could

restructure rx_samples_to_file to decouple UHD I/O from filesystem I/O,
using a large buffer between them.

The fact is that dynamic performance of high-speed, real-time, flows is
something that almost-invariably needs

tweaking for any particular situation. There's no way for an example
application to meet all those requirements.

But the fact also remains that for some systems, rx_samples_to_file
(and uhd_rx_cfile on the Gnu Radio side)

are able to stream high-speed data just fine as-is.

On 2014-10-03 09:26, Peter Witkowski via USRP-users wrote:

To say that the issue is just because the disk subsystem can't keep up is a bit of a cop-out.

I had issues writing to disk when the incoming stream was 400MB/s and my RAID0 system was benchmarked at being much higher than that.

The issue that I've been seeing stems from the fact that it appears that you cannot concurrently read/write from the data stream as it's coming in. In effect you have a main loop that reads from the device and then immediately tries to write that buffer to file. If you do not complete these operations in a timely fashion, overflows occur.

One way to solve (or at least band aid the issue) is to set your dirty_background_ratio to 0. I was able to get writing to disk working somewhat with this setting as it is more predictable to directly write to disk instead of having your write cache fill up and then having a large amount of data to push to disk. That said, my RAID0 array is capable of such speeds and even then I was getting a few (but much reduced) overflows.

The one surefire way I know of getting this working (even on a slow disk system) is to buffer the data. The buffer can then be consumed by the disk writing process while being concurrently added onto by the device reader. The easiest way to test buffering (that I've found) is to simply set up a GNURadio Companion program with a stream-to-vector block between the USRP and file sink blocks. This is exactly what I am doing currently since even with a very powerful system, I could not get data saved to disk quickly enough given the aforementioned issues with the provided UHD software.

On Thu, Oct 2, 2014 at 11:48 PM, gsmandvoip via USRP-users usrp-users@lists.ettus.com wrote:

Thanks Marcus for your replies. Yes O gone away.

On Thu, Oct 2, 2014 at 5:50 PM, Marcus D. Leech mleech@ripnet.com wrote:

with rx_samples_to_file without _4rx.rbf, Initially I tried on my i3, 4GB ram, it gave me
some OOOO but was lesser than earlier, but I do not understand, my most of the ram capacity and processor was sitting idle while it shows OOOO, why is this strange behaviour

The default format for uhd_rx_cfile is complex-float, thus doubling the amount of data written compared to rx_samples_to_file.

You can't just use CPU usage as an indicator of loading--if you're writing to disk, the disk subsystem may be much slower than you think, so the
"rate limiting step" is writes to the disk, not computational elements.

Try using /dev/null as the file that you write to. If the 'O' go away, even at higher sampling rates, then it's your disk subsystem.

using uhd_rx_cfile getting similar result, but strangely, why it is low, at 4M sampling rate it was higher???

On Thu, Oct 2, 2014 at 9:27 AM, Marcus D. Leech mleech@ripnet.com wrote:

On 10/01/2014 11:46 PM, gsmandvoip wrote:

Yes I am running single channel, but when trying to achieve my desired sampling rate without _4rx.rbf, it says, requested sampling rate is not valid, adjusting to some 3.9M or so. sorry for misleading info I gave earlier, I have i3, with 32 bit and i7 with 64 bit, but getting same result on both machines

Here is my command to capture signal:

./rx_samples_to_file --args="fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX" --freq "$FC" --rate="$SR" $FILE --nsamps "$NSAMPLES"

and here is its output:

Creating the usrp device with: fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX...
-- Loading firmware image: /usr/share/uhd/images/usrp1_fw.ihx... done
-- Opening a USRP1 device...
-- Loading FPGA image: /usr/share/uhd/images/usrp1_fpga_4rx.rbf... done
-- Using FPGA clock rate of 52.000000MHz...
ERROR: LOOKUPERROR: INDEXERROR: MULTI_USRP::GET_TX_SUBDEV_SPEC(0) FAILED TO MAKE DEFAULT SPEC - VALUEERROR: THE SUBDEVICE SPECIFICATION "A:0" IS TOO LONG.
The user specified 1 channels, but there are only 0 tx dsps on mboard 0.

Don't use the _4rx image if you don't need it.

The USRP1 only does strict-integer resampling, and with a master clock (NON STANDARD FOR USRP1) of 52.000MHz, 4Msps is not a sample rate
that it can produce. Try 5.2Msps or 4.3333Msps.

At 5.2Msps, it's recording at roughly 20.8Mbytes/second, so your system needs to be able to sustain that for at least as long as the capture lasts.
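
[Editor's sketch] The data-rate figures quoted in this thread follow from sample rate times bytes per sample. The small calculation below assumes 4 bytes per complex sample for 16-bit integer I/Q and 8 bytes for complex-float; the constant names are labels chosen for this example.

```python
SC16_BYTES = 4  # one complex sample as two 16-bit integers (assumed)
FC32_BYTES = 8  # one complex sample as two 32-bit floats (assumed)


def bytes_per_second(sample_rate, bytes_per_sample):
    """Sustained write rate needed to store a stream without dropping samples."""
    return sample_rate * bytes_per_sample


for rate in (5_200_000, 100_000_000):
    for label, size in (("sc16", SC16_BYTES), ("fc32", FC32_BYTES)):
        mb_per_s = bytes_per_second(rate, size) / 1e6
        print(f"{rate / 1e6:g} Msps as {label}: {mb_per_s:g} MB/s")
```

With these assumptions, 5.2 Msps of 16-bit samples gives the 20.8 Mbytes/second mentioned above, and complex-float output doubles it.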



--
Peter Witkowski
pwitkowski@gmail.com


USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com

Also, the > rather smallish sizes of network packets dictate that > journalling file systems introduce a very bad overhead -- I > don't know if you benchmarked with files on a journaling file > system and a (network packet size - header) block size... > > Is there an example system that can handle sustained data capture from the > > USRP at (or near the limits) of 10GigE or the PCIe interfaces (maybe the > > requirement is enterprise class PCIe SSDs)? I'm running a two socket Xenon > > system (two hex core processors) with 64GB of RAM. How much more hardware > > should I throw at the problem to be able to sample/write at 100MS (half of > > what is quoted on the website for bandwidth for the 10GigE kit) using the > > provided code? > Definitely a nice system! I must admit that I don't have > access to a comparable setup, and thus I can't really offer > you any first-hand experience. Maybe others can. > > I think the issue here is that the code itself can't simply get through > > it's main loop fast enough. There's a difference between data bandwidth > > and CPU throughput. The sequential nature of the code means that if any > > weird stuff happens (your example was a good set of kernel related hilarity > > that can lead to stochastic timing) you will have overflows since you > > cannot read fast enough. This is why a 90% solution for my application was > > to just set the dirty_background_ratio to 0 and also why redirection to > > /dev/null makes overflows go away. > This is interesting, as dirty_background_ratio is the > percentage at which the kernel should start writing out dirty > pages in the background. Now, I'm the one who's confused, > because I would have expected this to negatively impact > performance. On the other hand, 0 (at least in my head) does > not make very much sense, maybe it's semantically identical to > 100%? Are you swapping (64GB would tell me you shouldn't have > swap or extremly low swappiness)? 
> On the other hand, it might really be that storage is not the > bottleneck here, and in fact maybe the CPU gets saturated. > Now, you said that writing to /dev/null solves your problem. > Do your RAID or filesystem consume a lot of CPU cycles? This > is an interesting mystery... > > With either method I didn't have to > > wait for a large write cache to flush before moving on to the next read > > from the USRP. Note that there can also be things that happen on the read > > side as well. Does this mean that I can only run the code on an RTOS? > No :) UHD has it's own incoming buffer handlers, but as you > already said, in this high performance scenario, you might be > totally right, and our single-threaded approach just doesn't > cut it. Maybe dropping in some asynchronous storage IO would > help -- but I hate seeing that blowing up in example users' > faces, so I guess the fact that it doesn't work with a system > as potent as yours with the sample rates as high as you demand > might actually be a shortcoming of the examples that isn't > going to be fixed. > > As a final note, my understanding is that GNURadio and the USRP were > > developed for domain experts in DSP to use. > These are SDR frameworks and devices, respectively. The idea > is to offer people with the opportunity to build awesome DSP > systems using universally usable SDR blocks (GNU Radio) and > universal software radio peripherals, so well, they certainly > address DSP people, but they shouldn't be hard to use. > > These users may or may not > > have prior experience in software. As a result, I'd recommend perhaps > > adding a buffered example or have the USRP GNURadio block allow for > > buffering. > That is something we might consider. 
On the other hand, when > someone goes as far as you do, maybe having an example that > does the buffering in a separate thread (or even process) > isn't worth that much -- in the end, one will want to write > one's own high performance application, and that will include > handling such data rates. > > Otherwise, I just don't see how you can advertise 200 MS/s > > (maybe even a simple "buffer" block in GNURadio would do the trick?). > Well, the devices support these rates, and our driver is able > to withstand these rates and sustain them without hitting CPU > barriers due to having too much overhead. That's awesome (ok, > I might be biased, but *I* think it's awesome). I don't feel > ashamed because on your specific setup, we can't find a way to > make any of our generic examples deliver the full rate of rx > streams to storage -- we sell RF hardware, and not storage > infrastructure, and the point of the examples is demonstrating > the usage of UHD, and not holding a lecture on high > performance storage handling. I wish, though, that we could > solve your problem. > > Now, GNU Radio/gr-uhd does in fact come with an application > called uhd_rx_cfile, which is more or less a clone of > rx_samples_to_file using gr-uhd and GNU Radio instead of raw > UHD. Does that work out for you? > > > I > > understand that this is theoretical limit of the bus, but if there doesn't > > exist a driver or other software to make use of this, the practical limit > > becomes much, much smaller. > Well, UHD seems to be able to sustain these rates, if you > write to /dev/null, right? So the practical limit for UHD is > definitely not being hit. > I have another --maybe even practical-- suggestion to make: > Roll your own buffer! 
>
>     mkfifo /tmp/mybuffer   # assuming tmpfs is in RAM
>     dd if=/tmp/mybuffer of=/mount/raid_volume/data.dat &   # start in background; you could play around with block sizes using the bs= option of dd
>     rx_samples_to_file --file /tmp/mybuffer [all the other options]
>
> By the way: Thanks for bringing this up! We know that
> recording samples is a core concern of many users.
>
> Greetings,
> Marcus
>
> [1] https://www.kernel.org/doc/Documentation/sysctl/vm.txt
>
> > On Fri, Oct 3, 2014 at 10:55 AM, Marcus Müller <usrp-users@lists.ettus.com>
> > wrote:
> >
> > > I have to agree with Marcus on this. Also, keep in mind that storage is
> > > really what an operating system should take care of in any "general
> > > purpose" scenario, i.e. that as long as I just write to a file, I'd expect
> > > that the thing in charge of storage (my kernel / the filesystems / block
> > > device drivers) does the best it can to keep up. If I find myself in a
> > > situation where my specific storage needs dictate a huge write buffer,
> > > changing the application might be one way, but as I'm responsible for my
> > > own storage subsystem, I could just as well increase the cache buffer
> > > sizes, and let the operating system handle storage operation. If your RAID
> > > is really performing as well as it is benchmarked to, then this should not
> > > be one of your problems. All rx_samples_to_file does is really sequentially
> > > writing out data at a constant rate, which is the most basic write
> > > benchmark I can think of.
> > >
> > > If your storage subsystem (filesystem + storage abstraction + raid driver
> > > + interface driver + hard drive interface + hard drives + hardware caches)
> > > can't keep up, it's failing to perform as specified, simple as that.
In > > > this case, saying that the application needs to be smarter when dealing > > > with storage seems like a bit of a cop-out to me ;) > > > > > > I'd like to point out that most benchmarks use heavily averaged numbers > > > for write speeds etc. UHD on the other hand kind of demands soft real-time > > > performance of a write subsystem, which is a lot harder to fulfill. This > > > comes up rather frequently, but I have to stress it: you need a fast > > > guaranteed write rate, not only an average one, and as soon as your > > > operating system has to postpone writing data[1], it has to have enough > > > performance to catch up whilst still meeting continued demand. This is > > > general purpose hardware running general purpose OS with dozens of > > > processes, and you can't just say "every single component is up to my task, > > > thus my system suffices", because everything potentially blocks everything! > > > > > > Greetings, > > > Marcus > > > > > > [1] e.g. because the filesystems needs to calculate checksums, update > > > tables, another process gets scheduled, a device blocks your PCIe bus, your > > > platters randomly need a bit longer to seek, you reach the physical end of > > > an LVM volume and have to move across a disk, an interrupt does what an > > > interrupt does, some process is getting noticed on a changing file > > > descriptor, DBUS is happening in the kernel, token ring has run out of > > > tokens, thermal throttling, bitflips on SATA leading to retransmission, > > > some page getting fetched from swap... > > > > > > > > > On 03.10.2014 15:34, Marcus D. Leech via USRP-users wrote: > > > > > > > > > > > > One has to keep firmly in mind that programs like rx_samples_to_file are > > > *examples* that show how to use > > > > > > the underlying UHD API. 
They are not necessarily optimized for all > > > situations, and indeed, one could > > > > > > restructure rx_samples_to_file to decouple UHD I/O from filesystem I/O, > > > using a large buffer between them. > > > > > > The fact is that dynamic performance of high-speed, real-time, flows is > > > something that almost-invariably needs > > > > > > tweaking for any particular situation. There's no way for an example > > > application to meet all those requirements. > > > > > > But the fact also remains that for *some* systems, rx_samples_to_file > > > (and uhd_rx_cfile on the Gnu Radio side) > > > > > > are able to stream high-speed data just fine as-is. > > > > > > On 2014-10-03 09:26, Peter Witkowski via USRP-users wrote: > > > > > > > > > To say that the issue is just because the disk subsystem can't keep up is a bit of cop-out. > > > > > > I had issues writing to disk when the incoming stream was 400MB/s and my RAID0 system was benchmarked at being much higher than that. > > > > > > The issue that I've been seeing stems from the fact that it appears that you cannot concurrently read/write from the data stream as its coming in. In effect you have a main loop that reads from the device and then immediately tries to write that buffer to file. If you do not complete these operations in a timely fashion overflows occur. > > > > > > One way to solve (or at least band aid the issue) is to set your dirty_background_ratio to 0. I was able to get writing to disk working somewhat with this setting as it is more predictable to directly write to disk instead of having your write cache fill up and then having a large amount of data to push to disk. That said, my RAID0 array is capable of such speeds and even then I was getting a few (but much reduced) overflows. > > > > > > The one surefire way I know of getting this working (even on a slow disk system) is to buffer the data. 
The buffer can then be consumed by the disk writing process while being concurrently added onto by the device reader. The easiest way to test buffering (that I've found) is to simply set up a GNURadio Companion program with a stream-to-vector block between the USRP and file sink blocks. This is exactly what I am doing currently since even with a very powerful system, I could not get data saved to disk quickly enough given the aforementioned issues with the provided UHD software. > > > > > > On Thu, Oct 2, 2014 at 11:48 PM, gsmandvoip via USRP-users <usrp-users@lists.ettus.com> <usrp-users@lists.ettus.com> wrote: > > > > > > Thanks Marcus for your replies. Yes O gone away. > > > > > > On Thu, Oct 2, 2014 at 5:50 PM, Marcus D. Leech <mleech@ripnet.com> <mleech@ripnet.com> wrote: > > > > > > with rx_samples_to_file without _4rx.rbf, Initially I tried on my i3, 4GB ram, it gave me > > > some OOOO but was lesser than earlier, but I do not understand, my most of the ram capacity and processor was sitting idle while it shows OOOO, why is this strange behaviour The default format for uhd_rx_cfile is complex-float, thus doubling the amount of data written compared to rx_samples_to_file. > > > > > > You can't just use CPU usage as an indicator of loading--if you're writing to disk, the disk subsystem may be much slower than you think, so the > > > "rate limiting step" is writes to the disk, not computational elements. > > > > > > Try using /dev/null as the file that you write to. If the 'O' go away, even at higher sampling rates, then it's your disk subsystem. > > > > > > using uhd_rx_cfile getting similar result, but strangely, why it is low, at 4M sampling rate it was higher??? > > > > > > On Thu, Oct 2, 2014 at 9:27 AM, Marcus D. 
Leech <mleech@ripnet.com> <mleech@ripnet.com> wrote: > > > > > > On 10/01/2014 11:46 PM, gsmandvoip wrote: > > > > > > Yes I am running single channel, but when trying to achieve my desired sampling rate without _4rx.rbf, it says, requested sampling rate is not valid, adjusting to some 3.9M or so. sorry for misleading info I gave earlier, I have i3, with 32 bit and i7 with 64 bit, but getting same result on both machines > > > > > > Here is my command to capture signal: > > > > > > ./rx_samples_to_file --args="fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX" --freq "$FC" --rate="$SR" $FILE --nsamps "$NSAMPLES" > > > > > > and here is its output: > > > > > > Creating the usrp device with: fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX... > > > -- Loading firmware image: /usr/share/uhd/images/usrp1_fw.ihx... done > > > -- Opening a USRP1 device... > > > -- Loading FPGA image: /usr/share/uhd/images/usrp1_ > > > fpga_4rx.rbf... done > > > -- Using FPGA clock rate of 52.000000MHz... > > > ERROR: LOOKUPERROR: INDEXERROR: MULTI_USRP::GET_TX_SUBDEV_SPEC(0) FAILED TO MAKE DEFAULT SPEC - VALUEERROR: THE SUBDEVICE SPECIFICATION "A:0" IS TOO LONG. > > > The user specified 1 channels, but there are only 0 tx dsps on mboard 0. > > > > > > Don't use the _4rx image if you don't need it. > > > > > > The USRP1 only does strict-integer resampling, and with a master clock (NON STANDARD FOR USRP1) of 52.000MHz, 4Msps is not a sample rate > > > that it can produce. Try 5.2Msps or 4.3333Msps. > > > > > > At 5.2Msps, it's recording at roughly 20.8Mbytes/second, so your system needs to be able to sustain that for at least as long as the capture lasts. 
MM
Marcus Müller
Mon, Oct 6, 2014 9:36 PM

Hi Peter,

nice discussion indeed :)!
Phew, it really seems you got the most out of your system; if you want a
quick list of things in my head you'd want to check:

  • NIC interrupt coalescing
  • PCIe switching load: possibly we're just maxing out the PCIe
    controller; sadly, I must admit I have no idea how likely that is, and
    how to check.

So, to my understanding, Linux kernels have for some time now effectively
been multi-threaded, in the sense that on a multicore system IO
handling, especially flushing buffers to disk, should occur concurrently
and transparently to your UHD application, and that's the point I was
after: Why do home-brew IO buffering with multiple threads, if your OS
just does the same, but better?

The strange thing here, and it indicates that even userland
multithreading might not be able to fix this, is that the GNU Radio
example doesn't affect the number of overruns. It should, because
modern GNU Radio (with the default thread-per-block scheduler) is
inherently multithreaded. Stream-to-vector doesn't actually do
anything; it just "redefines" the stream item size, so it shouldn't
help. Maybe the magic here lies in the fact that it's an additional
block, adding another layer of buffers.

So, I'll give Robert's mail a few thoughts and head over to bed,

Greetings,
Marcus

On 06.10.2014 21:48, Peter Witkowski via USRP-users wrote:

Forgot to add:

I have all the recommended kernel tweaks for 10GigE running and found no
difference in number of overflows vs. the network buffer size once you pass
the point where UHD no longer throws a warning for your network buffer size
being too small (if I recall correctly I think this is at 32MB or so?).  I
can provide a copy of my sysctl.conf file if necessary.

I really hope that there's a kernel setting that I'm not using properly,
otherwise I just don't see how the single-threaded approach can work for
high data rates (in both the provided UHD programs and in user-built
GNURadio programs).  As I have mentioned, it is possible to get GNURadio to
do some multi-threading and concurrent producer/consumer behavior, but this
requires a re-purposing of the stream-to-vector block as it currently
stands (or "rolling your own").

On Mon, Oct 6, 2014 at 3:33 PM, Peter Witkowski pwitkowski@gmail.com
wrote:

Marcus,

Great discussion so far.

I think my point is that earlier it was suggested that if
rx_samples_to_file fails, then your disk subsystem is to blame and you
should invest in more hardware.  My point was that the application in
general can benefit from multi-threading and internal buffering, otherwise
overflows are a fact of life for high data rates.  While the OS can do more
buffering, you are required to do periodic (at a rather high rate) reads of
the USRP, otherwise its buffers will overflow.  This is why it is critical
to minimize the time between two successive reads via threading and having
a buffer that can concurrently be added onto (from the device) and
processed (by writing to disk).
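
As a rough illustration of the buffering described above, here is a minimal Python sketch of a reader that hands chunks to a separate writer thread through a bounded queue. It is not UHD code: `recv_chunk` is a hypothetical stand-in for a blocking device read, and `capture`, `max_queued` and the drop counter are names made up for this example. The reader never waits on the disk; if the queue is full it counts a dropped chunk instead, which plays the role of an overflow.

```python
import io
import queue
import threading


def capture(recv_chunk, sink, n_chunks, max_queued=1024):
    """Read n_chunks via recv_chunk() while a second thread writes them to sink.

    recv_chunk is a hypothetical stand-in for a blocking device read (not a
    UHD call). Returns (bytes_written, chunks_dropped).
    """
    fifo = queue.Queue(maxsize=max_queued)
    written = 0

    def writer():
        nonlocal written
        while True:
            chunk = fifo.get()
            if chunk is None:          # sentinel: the reader is finished
                return
            written += sink.write(chunk)

    t = threading.Thread(target=writer)
    t.start()
    dropped = 0
    for _ in range(n_chunks):
        chunk = recv_chunk()
        try:
            fifo.put_nowait(chunk)     # never block the reader on the disk
        except queue.Full:
            dropped += 1               # software-side equivalent of an overflow
    fifo.put(None)                     # a blocking put is fine once reading is done
    t.join()
    return written, dropped


if __name__ == "__main__":
    sink = io.BytesIO()
    print(capture(lambda: bytes(4096), sink, n_chunks=100))
```

The queue depth is the knob that matters here: it has to hold as much data as arrives during the longest write stall you expect.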

I think we can agree that there's a delta between what the I/O can sustain
(and what is displayed on a synthetic benchmark) versus what is actually
happening on the machine.  There's potentially a good deal of CPU cycles
that occur between two reads (a vast percentage of these are during disk
I/O, especially if the device is "busy") that seem to be causing the
overflow.  Stated differently, I can have a bunch of blocking reads occur
and sustain the needed rate.  Similarly (and this is what I see via my
benchmarks), I can have a series of blocking writes occur and sustain the
needed data rate.  However, the combination thereof (as well as the fact
that the kernel likes to preempt things) might be unsustainable in a single
threaded application.  That is, by the time I get to the next read, enough
time has passed to overwhelm the USRP read buffers.
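
To put a number on "enough time has passed", a back-of-the-envelope sketch: the longest pause between two reads that a receive buffer can absorb is its size divided by the incoming byte rate. The function name is made up for this example, and the 32 MB figure is only the approximate network buffer size mentioned elsewhere in this thread, used here as an assumption for whichever buffer sits in front of the read call.

```python
def stall_budget_ms(buffer_bytes, rate_bytes_per_s):
    """Longest pause between reads (in ms) that a buffer of this size absorbs."""
    return 1000.0 * buffer_bytes / rate_bytes_per_s


# 32 MiB of receive buffering at 400 MB/s (both figures from this thread,
# the buffer size only loosely)
print(round(stall_budget_ms(32 * 1024 * 1024, 400e6), 1))
```

Under these assumptions the budget is on the order of 80 ms, so a single long flush that blocks the loop for longer than that would be enough to lose samples.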

My point with the dirty background ratio is that when set to zero, I have
a predictable amount of time that I block for each write.  My understanding
is that once triggered, background flushes will attempt to write at 100%
device speed.  During this time, additional device access (such as those
being called by the application) will be blocked (sequential writes are
therefore performance degraded).  When set to 0, I see in iotop that my
writes are a consistent value which is right about where resource monitor
shows the incoming 10GigE bandwidth to be.  Setting the
dirty_background_ratio any higher and the "Total Disk Write" and "Actual
Disk Write" differ wildly.  "Actual Disk Write" will spike at times, and
these spikes correlate with additional overflow errors.  When the device is
more stable (i.e. "Actual Disk Write" is consistent) I see a huge reduction
in overflows.  Stated differently, I'm OK with a flush, as long as it's a
quick one.  In the cases of larger flushes (as is the case with a
dirty_background_ratio greater than 0), I can almost guarantee an overflow
(I can't get around the mainloop fast enough if the device is busy with a
long write).  Note that for my buffered application, I do run a default
background ratio.
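
For anyone wanting to compare settings, the following is a small sketch that reads the writeback knobs discussed here from /proc/sys/vm. The four names are the standard Linux vm sysctls; the helper name and the `vm_dir` parameter are made up for this example (the parameter only exists so the function can be pointed at a test directory).

```python
from pathlib import Path

DIRTY_KNOBS = (
    "dirty_background_ratio",
    "dirty_ratio",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_dirty_settings(vm_dir="/proc/sys/vm"):
    """Return the current writeback settings as a dict of name -> int.

    Knobs that are missing or unreadable are skipped.
    """
    settings = {}
    for name in DIRTY_KNOBS:
        try:
            settings[name] = int((Path(vm_dir) / name).read_text().strip())
        except (OSError, ValueError):
            continue
    return settings


if __name__ == "__main__":
    for name, value in read_dirty_settings().items():
        print(f"vm.{name} = {value}")
```

Changing a value needs root, for example with `sysctl -w vm.dirty_background_ratio=0`; the sketch only reads.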

The only thing that I got to work was buffering, and the easiest way I
know to accomplish this is to place a stream-to-vector block (in GNURadio)
between the USRP source and the file sink.  The vectorization effectively
works as a buffer, and was the only way I was able to get 100 MS/s working.

Other things I tried:

  1. Using CPU affinity and shielding.
  2. Increasing the process priority to 99 under SCHED_FIFO.
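
For readers who want to repeat these two experiments (which, as noted, did not help here), a minimal Linux-only sketch using Python's `os` module follows. The function name is made up for this example; SCHED_FIFO normally needs root or CAP_SYS_NICE, so the function reports failure rather than raising.

```python
import os


def pin_and_prioritise(cpus, rt_priority=99):
    """Pin the calling process to `cpus` and request SCHED_FIFO.

    Returns True if both requests succeeded, False otherwise (for example
    when run without the required privileges or on a non-Linux system).
    """
    try:
        os.sched_setaffinity(0, cpus)
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(rt_priority))
    except (AttributeError, OSError):
        return False
    return True


if __name__ == "__main__":
    print(pin_and_prioritise({0}))
```

From a shell, `taskset` and `chrt` achieve the same for an already built program such as rx_samples_to_file.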

Also, I can confirm that rx_samples_to_file and uhd_rx_cfile (as well as
a GNU Radio application that does effectively the same thing, but controls
CPU affinity) all behave the exact same way in terms of the number of
overflows I have encountered.

If there are any other kernel tweaks that I can try, please let me know.
However, in practice, my CPU is not fast enough to get through the mainloop
of rx_samples_to_file quickly enough.  As discussed above, buffering was
the only way I was able to consistently ensure that my data was saved and I
didn't encounter an overflow.

If you would like, I can provide whatever benchmarks of the system that
you would like.  Perhaps I am not using the correct tool when I quote my
minimum write speeds to disk.

On Sat, Oct 4, 2014 at 9:14 AM, Marcus Müller usrp-users@lists.ettus.com
wrote:

Hi Peter,

didn't mean to confuse you! Actually, my job is doing the opposite (i.e.
providing useful information), so let me just briefly follow up on
this:
On 03.10.2014 17:44, Peter Witkowski via USRP-users wrote:

So I'm confused.

You state that if I can't use rx_samples_to_file, my system is failing to
perform as specified to write data out, then you give an example of several
things that can happen to create a stochastic write speed (which I totally
understand and agree with).  Given that writes can be stochastic, why is
there not a software buffer implemented in the UHD sample code to account
for such issues?

Well, because that's, in my opinion, an operating system's job. Being a
code example, rx_samples_to_file just mustn't contain the complexity
introduced when you try to implement buffering functionality smarter than
what your OS can do. And I do think it's nearly impossible to be smarter
than the Linux kernel when optimizing writes -- but you'll have to tell
your kernel what you want, as a user. The kernel, as it is configured by
any modern distribution by default, won't do enormous write buffers,
because that's not what the user usually wants, increasing the risk of data
loss in case of system failure, and because you usually don't want to spend
all of your RAM on filesystem buffers. In your 64GB RAM case, though,
default buffer sizes should suffice, I guess, so I'm a bit out of clues
here.
It is definitely not very hard to increase these buffers' sizes[1], so I
encourage you to try it and see if that solves your problem. Now, I must
admit that up to here I was always assuming you hadn't already played
around with these values, if this is not the case, please accept my
apologies!

I understand that it's meant to be an example, but I've
also seen it referenced as being used effectively as a debugger or test for
people having issues (i.e. recommendation to use the UHD programs in place
of GNURadio to resolve issues).

...and it has done many users, and thus Ettus, a great service by supplying
basic functionality! The fact that it works in almost any situation with
this very minimalistic approach (repeated recv->write) proves that UHD is
in fact a rather nice driver interface, IMHO. The fact that GNU Radio
sometimes solves issues that rx_samples_to_file can't indicates exactly the
buffering approach to be helpful. But in that case, buffering is not
increased by increasing kernel buffer sizes, but by introducing GNU Radio
buffers between blocks. The USRP source (Martin, scold me if I say
something stupid) is not really much smarter than rx_samples_to_file: It
recv()s a packet of samples, and returns these samples from the work
function, and then GNU Radio takes care of shuffling and buffering that
data. Basically, GNU Radio behaves much like an operating system from the
source block's point of view.

Also, in terms of benchmarking, I'm quoting minimum values, not averages.
I agree with you that average values are pointless, and in reality the disk
subsystem needs to perform when called up.  My minimum values for a 4 disk
RAID0 with a dedicated controller are well within the data rate that I am
pushing.

Well, I'll kind of disagree with you: If the minimum write rate of your
system were higher than the rate rx_samples_to_file demands, then you
wouldn't see the problem. The point, I believe, is that the storage
system consists not only of the hardware side of your RAID, but also of
your complete operating environment. Something slows down how fast data is
written to the RAID.
I think we both would expect the following to happen:

rx_samples_to_file, repeatedly:

    uhd::rx_streamer::recv
        (blocks until a packet of samples has arrived. Instantly returns
        if it has before the call)
    write(file_handle, recv_buff)
        (instantly returns, because writing should hit a buffer that the
        operating system transparently pushes out to a disk. If buffer is
        full, then block until enough space in buffer -- unless your
        filesystem is mounted with some sync option...)
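
The loop above can be turned into a toy model to show why the worst-case write time matters more than the average. This is a sketch under simplifying assumptions, not a measurement: samples arrive at a constant rate, each recv takes one chunk, and everything that arrives while write() blocks has to fit into a receive buffer of fixed size. All names and numbers are made up for this example.

```python
def count_overflows(write_times_s, chunk_s, device_buffer_s):
    """Simulate the sequential recv -> write loop.

    write_times_s   : how long each write() call blocks, in seconds
    chunk_s         : seconds of signal carried by one recv() chunk
    device_buffer_s : seconds of signal the receive path can hold unread

    Returns how many iterations lost samples.
    """
    backlog = 0.0      # seconds of signal waiting, unread, in the receive path
    overflows = 0
    for w in write_times_s:
        backlog += w                   # samples keep arriving while write() blocks
        if backlog > device_buffer_s:
            overflows += 1
            backlog = device_buffer_s  # the excess is lost
        backlog = max(0.0, backlog - chunk_s)   # the next recv() drains one chunk
    return overflows


# Same total write time, very different outcome:
steady = [0.001] * 100
bursty = [0.0005] * 99 + [0.0505]
print(count_overflows(steady, chunk_s=0.002, device_buffer_s=0.02))
print(count_overflows(bursty, chunk_s=0.002, device_buffer_s=0.02))
```

Both write patterns spend the same total time in write(), but only the one with a single long stall loses samples, which matches the argument that a benchmark average says little here.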

Now, if your RAID is definitely fast enough, the write buffer should
never get full. My hypothesis here is that either your buffer size is just
too small, and a block of samples doesn't fit and has to be written out
instantly (which is unlikely), or something else occupies your system. That
might be just the fact that 400MB/s (are we talking about an X3x0?)
inevitably places a heavy load on things like PCIe buses and CPUs, and
that introduces a bottleneck in your storage chain which isn't there if you
"just" benchmark without the USRP. Also, the rather smallish sizes of
network packets mean that journaling file systems introduce a very bad
overhead -- I don't know if you benchmarked with files on a journaling file
system and a (network packet size - header) block size...

Is there an example system that can handle sustained data capture from the
USRP at (or near the limits) of 10GigE or the PCIe interfaces (maybe the
requirement is enterprise class PCIe SSDs)?  I'm running a two socket Xeon
system (two hex core processors) with 64GB of RAM.  How much more hardware
should I throw at the problem to be able to sample/write at 100 MS/s (half of
what is quoted on the website for bandwidth for the 10GigE kit) using the
provided code?
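
For reference, the byte rates behind these sample rates can be worked out as follows. This sketch assumes one channel and the two sample formats that come up in this thread: 16-bit complex integers (4 bytes per sample, which matches the 400MB/s and 20.8 Mbytes/second figures quoted) and complex float (8 bytes per sample, the uhd_rx_cfile default mentioned earlier). The table and function names are made up for this example.

```python
BYTES_PER_SAMPLE = {
    "sc16": 4,   # 16-bit I + 16-bit Q
    "fc32": 8,   # 32-bit float I + 32-bit float Q
}


def byte_rate(sample_rate_sps, fmt="sc16"):
    """Sustained bytes per second needed for one channel at this sample rate."""
    return sample_rate_sps * BYTES_PER_SAMPLE[fmt]


print(byte_rate(100e6) / 1e6)          # MB/s for 100 MS/s, 16-bit complex
print(byte_rate(100e6, "fc32") / 1e6)  # MB/s for the same rate as complex float
```

Writing complex floats doubles the load on the storage path, so it is worth checking which format a given tool writes before comparing overflow counts.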

Definitely a nice system! I must admit that I don't have access to a
comparable setup, and thus I can't really offer you any first-hand
experience. Maybe others can.

I think the issue here is that the code itself simply can't get through
its main loop fast enough.  There's a difference between data bandwidth
and CPU throughput.  The sequential nature of the code means that if any
weird stuff happens (your example was a good set of kernel related hilarity
that can lead to stochastic timing) you will have overflows since you
cannot read fast enough.  This is why a 90% solution for my application was
to just set the dirty_background_ratio to 0 and also why redirection to
/dev/null makes overflows go away.

This is interesting, as dirty_background_ratio is the percentage at
which the kernel should start writing out dirty pages in the background.
Now, I'm the one who's confused, because I would have expected this to
negatively impact performance. On the other hand, 0 (at least in my head)
does not make very much sense, maybe it's semantically identical to 100%?
Are you swapping (64GB would tell me you should have no swap, or an extremely
low swappiness)?
On the other hand, it might really be that storage is not the bottleneck
here, and in fact maybe the CPU gets saturated. Now, you said that writing
to /dev/null solves your problem. Do your RAID or filesystem consume a lot
of CPU cycles? This is an interesting mystery...
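
For readers who want to see where their own system stands before changing anything, the writeback thresholds discussed here can be read without root. The sketch below is an illustration and not something posted in the thread: it assumes a Linux system that exposes /proc/sys/vm, and the helper name show_writeback_settings is made up for this example. The setting names are the ones described in the kernel's Documentation/sysctl/vm.txt.

```shell
# Print the kernel's current writeback thresholds (read-only, Linux only).
# show_writeback_settings is a hypothetical helper name for this sketch.
show_writeback_settings() {
    for name in dirty_background_ratio dirty_ratio \
                dirty_expire_centisecs dirty_writeback_centisecs; do
        file="/proc/sys/vm/$name"
        if [ -r "$file" ]; then
            printf 'vm.%s = %s\n' "$name" "$(cat "$file")"
        else
            # Not a Linux system, or /proc is not mounted.
            printf 'vm.%s = (not available)\n' "$name"
        fi
    done
}

show_writeback_settings

# The change Peter describes would be applied like this (needs root and
# does not survive a reboot):
#   sysctl -w vm.dirty_background_ratio=0
```

Comparing the printed values before and after a capture run is a cheap way to confirm which setting was actually in effect when overflows were counted.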

With either method I didn't have to
wait for a large write cache to flush before moving on to the next read
from the USRP.  Note that there can also be things that happen on the read
side as well.  Does this mean that I can only run the code on an RTOS?

No :) UHD has its own incoming buffer handlers, but as you already
said, in this high performance scenario, you might be totally right, and
our single-threaded approach just doesn't cut it. Maybe dropping in some
asynchronous storage IO would help -- but I hate seeing that blowing up in
example users' faces, so I guess the fact that it doesn't work with a
system as potent as yours with the sample rates as high as you demand might
actually be a shortcoming of the examples that isn't going to be fixed.

As a final note, my understanding is that GNURadio and the USRP were
developed for domain experts in DSP to use.

These are SDR frameworks and devices, respectively. The idea is to offer
people the opportunity to build awesome DSP systems using universally
usable SDR blocks (GNU Radio) and universal software radio peripherals, so
well, they certainly address DSP people, but they shouldn't be hard to use.

These users may or may not
have prior experience in software.  As a result, I'd recommend perhaps
adding a buffered example or have the USRP GNURadio block allow for
buffering.

That is something we might consider. On the other hand, when someone
goes as far as you do, maybe having an example that does the buffering in a
separate thread (or even process) isn't worth that much -- in the end, one
will want to write one's own high performance application, and that will
include handling such data rates.

Otherwise, I just don't see how you can advertise 200 MS/s
(maybe even a simple "buffer" block in GNURadio would do the trick?).

Well, the devices support these rates, and our driver is able to
withstand these rates and sustain them without hitting CPU barriers due to
having too much overhead. That's awesome (ok, I might be biased, but I
think it's awesome). I don't feel ashamed because on your specific setup,
we can't find a way to make any of our generic examples deliver the full
rate of rx streams to storage -- we sell RF hardware, and not storage
infrastructure, and the point of the examples is demonstrating the usage of
UHD, and not holding a lecture on high performance storage handling. I
wish, though, that we could solve your problem.

Now, GNU Radio/gr-uhd does in fact come with an application called
uhd_rx_cfile, which is more or less a clone of rx_samples_to_file using
gr-uhd and GNU Radio instead of raw UHD. Does that work out for you?

I
understand that this is the theoretical limit of the bus, but if there doesn't
exist a driver or other software to make use of this, the practical limit
becomes much, much smaller.

Well, UHD seems to be able to sustain these rates, if you write to
/dev/null, right? So the practical limit for UHD is definitely not being
hit.
I have another --maybe even practical-- suggestion to make: Roll your own
buffer!

mkfifo /tmp/mybuffer  # assuming tmpfs is in RAM
# start dd in the background; you could play around with block sizes
# using the bs= option of dd
dd if=/tmp/mybuffer of=/mount/raid_volume/data.dat &
rx_samples_to_file --file /tmp/mybuffer [all the other options]
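
The FIFO pattern above can be tried without a USRP attached. The following is only a sketch under stated assumptions: /dev/zero stands in for rx_samples_to_file as the producer, the files live in a temporary directory instead of /tmp/mybuffer and the RAID volume, and the function name fifo_demo is invented for the example.

```shell
# Self-contained stand-in for the FIFO idea: a background dd drains a
# named pipe to a file while a producer writes into the pipe.
# fifo_demo is a hypothetical name; /dev/zero replaces the USRP stream.
fifo_demo() {
    workdir=$(mktemp -d)
    fifo="$workdir/mybuffer"
    out="$workdir/data.dat"
    mkfifo "$fifo"

    # Consumer: copy everything arriving on the FIFO to the output file.
    dd if="$fifo" of="$out" bs=1048576 2>/dev/null &
    consumer=$!

    # Producer: push 8 MiB through the FIFO (rx_samples_to_file would
    # take this role in a real capture).
    head -c 8388608 /dev/zero > "$fifo"

    # Wait until the consumer has seen end-of-file and flushed its output.
    wait "$consumer"

    # Report how many bytes reached the file, then clean up.
    wc -c < "$out" | tr -d ' '
    rm -rf "$workdir"
}

fifo_demo
```

One thing worth keeping in mind when experimenting: on Linux the data in a FIFO sits in the kernel's pipe buffer (64 KiB by default according to pipe(7)), not in the filesystem that holds the FIFO node, so how much slack this arrangement gives depends mostly on how the reading process buffers.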

By the way: Thanks for bringing this up! We know that recording samples
is a core concern of many users.

Greetings,
Marcus

[1] https://www.kernel.org/doc/Documentation/sysctl/vm.txt

On Fri, Oct 3, 2014 at 10:55 AM, Marcus Müller usrp-users@lists.ettus.com
wrote:

I have to agree with Marcus on this. Also, keep in mind that storage is
really what an operating system should take care of in any "general
purpose" scenario, i.e. that as long as I just write to a file, I'd expect
that the thing in charge of storage (my kernel / the filesystems / block
device drivers) does the best it can to keep up. If I find myself in a
situation where my specific storage needs dictate a huge write buffer,
changing the application might be one way, but as I'm responsible for my
own storage subsystem, I could just as well increase the cache buffer
sizes, and let the operating system handle storage operation. If your RAID
is really performing as well as it is benchmarked to, then this should not
be one of your problems. All rx_samples_to_file does is really sequentially
writing out data at a constant rate, which is the most basic write
benchmark I can think of.

If your storage subsystem (filesystem + storage abstraction + raid driver
+ interface driver + hard drive interface + hard drives + hardware caches)
can't keep up, it's failing to perform as specified, simple as that. In
this case, saying that the application needs to be smarter when dealing
with storage seems like a bit of a cop-out to me ;)

I'd like to point out that most benchmarks use heavily averaged numbers
for write speeds etc. UHD on the other hand kind of demands soft real-time
performance of a write subsystem, which is a lot harder to fulfill. This
comes up rather frequently, but I have to stress it: you need a fast
guaranteed write rate, not only an average one, and as soon as your
operating system has to postpone writing data[1], it has to have enough
performance to catch up whilst still meeting continued demand. This is
general purpose hardware running general purpose OS with dozens of
processes, and you can't just say "every single component is up to my task,
thus my system suffices", because everything potentially blocks everything!

Greetings,
Marcus

[1] e.g. because the filesystem needs to calculate checksums, update
tables, another process gets scheduled, a device blocks your PCIe bus, your
platters randomly need a bit longer to seek, you reach the physical end of
an LVM volume and have to move across a disk, an interrupt does what an
interrupt does, some process is getting notified on a changing file
descriptor, DBUS is happening in the kernel, token ring has run out of
tokens, thermal throttling, bitflips on SATA leading to retransmission,
some page getting fetched from swap...

On 03.10.2014 15:34, Marcus D. Leech via USRP-users wrote:

One has to keep firmly in mind that programs like rx_samples_to_file are
examples that show how to use the underlying UHD API. They are not
necessarily optimized for all situations, and indeed, one could restructure
rx_samples_to_file to decouple UHD I/O from filesystem I/O, using a large
buffer between them.

The fact is that dynamic performance of high-speed, real-time flows is
something that almost invariably needs tweaking for any particular
situation. There's no way for an example application to meet all those
requirements.

But the fact also remains that for some systems, rx_samples_to_file (and
uhd_rx_cfile on the GNU Radio side) are able to stream high-speed data just
fine as-is.

On 2014-10-03 09:26, Peter Witkowski via USRP-users wrote:

To say that the issue is just because the disk subsystem can't keep up is a bit of a cop-out.

I had issues writing to disk when the incoming stream was 400MB/s and my RAID0 system was benchmarked at being much higher than that.

The issue that I've been seeing stems from the fact that it appears that you cannot concurrently read/write from the data stream as it's coming in. In effect you have a main loop that reads from the device and then immediately tries to write that buffer to file. If you do not complete these operations in a timely fashion, overflows occur.

One way to solve (or at least band-aid) the issue is to set your dirty_background_ratio to 0. I was able to get writing to disk working somewhat with this setting as it is more predictable to directly write to disk instead of having your write cache fill up and then having a large amount of data to push to disk. That said, my RAID0 array is capable of such speeds and even then I was getting a few (but much reduced) overflows.

The one surefire way I know of getting this working (even on a slow disk system) is to buffer the data. The buffer can then be consumed by the disk writing process while being concurrently added onto by the device reader. The easiest way to test buffering (that I've found) is to simply set up a GNURadio Companion program with a stream-to-vector block between the USRP and file sink blocks. This is exactly what I am doing currently since even with a very powerful system, I could not get data saved to disk quickly enough given the aforementioned issues with the provided UHD software.

On Thu, Oct 2, 2014 at 11:48 PM, gsmandvoip via USRP-users usrp-users@lists.ettus.com wrote:

Thanks Marcus for your replies. Yes O gone away.

On Thu, Oct 2, 2014 at 5:50 PM, Marcus D. Leech mleech@ripnet.com wrote:

with rx_samples_to_file without _4rx.rbf, Initially I tried on my i3, 4GB ram, it gave me some OOOO but was lesser than earlier, but I do not understand, my most of the ram capacity and processor was sitting idle while it shows OOOO, why is this strange behaviour

The default format for uhd_rx_cfile is complex-float, thus doubling the amount of data written compared to rx_samples_to_file.

You can't just use CPU usage as an indicator of loading--if you're writing to disk, the disk subsystem may be much slower than you think, so the
"rate limiting step" is writes to the disk, not computational elements.

Try using /dev/null as the file that you write to. If the 'O' go away, even at higher sampling rates, then it's your disk subsystem.

using uhd_rx_cfile getting similar result, but strangely, why it is low, at 4M sampling rate it was higher???

On Thu, Oct 2, 2014 at 9:27 AM, Marcus D. Leech mleech@ripnet.com wrote:

On 10/01/2014 11:46 PM, gsmandvoip wrote:

Yes I am running single channel, but when trying to achieve my desired sampling rate without _4rx.rbf, it says, requested sampling rate is not valid, adjusting to some 3.9M or so. sorry for misleading info I gave earlier, I have i3, with 32 bit and i7 with 64 bit, but getting same result on both machines

Here is my command to capture signal:

./rx_samples_to_file --args="fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX" --freq "$FC" --rate="$SR" $FILE --nsamps "$NSAMPLES"

and here is its output:

Creating the usrp device with: fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX...
-- Loading firmware image: /usr/share/uhd/images/usrp1_fw.ihx... done
-- Opening a USRP1 device...
-- Loading FPGA image: /usr/share/uhd/images/usrp1_fpga_4rx.rbf... done
-- Using FPGA clock rate of 52.000000MHz...
Error: LookupError: IndexError: multi_usrp::get_tx_subdev_spec(0) failed to make default spec - ValueError: The subdevice specification "A:0" is too long.
The user specified 1 channels, but there are only 0 tx dsps on mboard 0.

Don't use the _4rx image if you don't need it.

The USRP1 only does strict-integer resampling, and with a master clock (NON STANDARD FOR USRP1) of 52.000MHz, 4Msps is not a sample rate
that it can produce. Try 5.2Msps or 4.3333Msps.

At 5.2Msps, it's recording at roughly 20.8Mbytes/second, so your system needs to be able to sustain that for at least as long as the capture lasts.
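
The numbers in this reply can be reproduced with a little arithmetic. This is a sketch, not part of the original mail: it assumes the sample rate is the master clock divided by an integer decimation (10 and 12 give the two rates suggested above; which decimations the hardware accepts is not stated in the thread) and 4 bytes per complex sample, which is what the 20.8 Mbytes/second figure implies. The function names rate_msps and bytes_per_second are made up for the example.

```shell
# rate_msps CLOCK_MHZ DECIM: sample rate in Msps for an integer decimation.
# (hypothetical helper; assumes rate = master clock / decimation)
rate_msps() {
    awk -v clk="$1" -v d="$2" 'BEGIN { printf "%.4f\n", clk / d }'
}

# bytes_per_second RATE_SPS BYTES_PER_SAMPLE: sustained write rate needed.
# (hypothetical helper; 4 bytes per sample assumes 16-bit I plus 16-bit Q)
bytes_per_second() {
    echo $(( $1 * $2 ))
}

rate_msps 52 10                 # prints 5.2000
rate_msps 52 12                 # prints 4.3333
bytes_per_second 5200000 4      # prints 20800000  (20.8 MB/s)
bytes_per_second 100000000 4    # prints 400000000 (400 MB/s at 100 MS/s)
```

The last line matches the 400MB/s figure quoted later in the thread for a 100 MS/s capture. With complex-float output (8 bytes per sample), as mentioned for uhd_rx_cfile, the required rate doubles.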


USRP-users mailing list
USRP-users@lists.ettus.com
http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com



--
Peter Witkowski
pwitkowski@gmail.com

Hi Peter,

nice discussion indeed :)!

Phew, it really seems you got the most out of your system; if you want a quick list of things in my head you'd want to check:

- NIC interrupt coalescing
- PCIe switching load: possibly we're just maxing out the PCIe controller; sadly, I must admit I have no idea how likely that is, and how to check.

So, to my understanding, Linux kernels have for some time now basically done multi-threading in the sense that if you have a multicore system, IO handling, especially flushing buffers to disk, should occur concurrently and transparently to your UHD application, and that's the point I was after: Why do home-brew IO buffering with multiple threads, if your OS just does the same, but better?

The strange thing here, and it indicates that even userland multithreading might not be able to fix this, is that the GNU Radio example doesn't affect the number of overruns -- it should, because modern GNU Radio (with the default thread-per-block scheduler) is inherently multithreaded. Stream-to-vector doesn't actually do anything, so it *shouldn't* help; it just "redefines" the stream item size. Maybe the magic here lies in the fact that it's an additional block, adding another layer of buffers.

So, I'll give Robert's mail a few thoughts and head over to bed,

Greetings,
Marcus

On 06.10.2014 21:48, Peter Witkowski via USRP-users wrote:
> Forgot to add:
>
> I have all the recommended kernel tweaks for 10GigE running and found no
> difference in number of overflows vs. the network buffer size once you pass
> the point where UHD no longer throws a warning for your network buffer size
> being too small (if I recall correctly I think this is at 32MB or so?). I
> can provide a copy of my sysctl.conf file if necessary.
>
> I really hope that there's a kernel setting that I'm not using properly,
> otherwise I just don't see how the single-threaded approach can work for
> high data rates (in both the provided UHD programs and in user-built
> GNURadio programs). As I have mentioned, it is possible to get GNURadio to
> do some multi-threading and concurrent producer/consumer behavior, but this
> requires a re-purposing of the stream-to-vector block as it currently
> stands (or "rolling your own").
>
> On Mon, Oct 6, 2014 at 3:33 PM, Peter Witkowski <pwitkowski@gmail.com>
> wrote:
>
>> Marcus,
>>
>> Great discussion so far.
>>
>> I think my point is that earlier I saw that it was alluded to that if
>> rx_samples_to_file fails, then your disk subsystem is to blame and you
>> should invest in more hardware. My point was that the application in
>> general can benefit from multi-threading and internal buffering, otherwise
>> overflows are a fact of life for high data rates. While the OS can do more
>> buffering, you are required to do periodic (at a rather high rate) reads of
>> the USRP, otherwise its buffers will overflow. This is why it is critical
>> to minimize the time between two successive reads via threading and having
>> a buffer that can concurrently be added onto (from the device) and
>> processed (by writing to disk).
>>
>> I think we can agree that there's a delta between what the I/O can sustain
>> (and what is displayed on a synthetic benchmark) versus what is actually
>> happening on the machine. There's potentially a good deal of CPU cycles
>> that occur between two reads (a vast percentage of these are during disk
>> I/O, especially if the device is "busy") that seem to be causing the
>> overflow. Stated differently, I can have a bunch of blocking reads occur
>> and sustain the needed rate. Similarly (and this is what I see via my
>> benchmarks), I can have a series of blocking writes occur and sustain the
>> needed data rate. However, the combination thereof (as well as the fact
>> that the kernel likes to preempt things) might be unsustainable in a single
>> threaded application. That is, by the time I get to the next read, enough
>> time has passed to overwhelm the USRP read buffers.
>>
>> My point with the dirty background ratio is that when set to zero, I have
>> a predictable amount of time that I block for each write. My understanding
>> is that once triggered, background flushes will attempt to write at 100%
>> device speed. During this time, additional device access (such as those
>> being called by the application) will be blocked (sequential writes are
>> therefore performance degraded). When set to 0, I see in iotop that my
>> writes are a consistent value which is right about where resource monitor
>> shows the incoming 10GigE bandwidth to be. Setting the
>> dirty_background_ratio any higher and the "Total Disk Write" and "Actual
>> Disk Write" differ wildly. "Actual Disk Write" will spike at times, and
>> these spikes correlate with additional overflow errors. When the device is
>> more stable (i.e. "Actual Disk Write" is consistent) I see a huge reduction
>> in overflows. Stated differently, I'm OK with a flush, as long as it's a
>> quick one. In the cases of larger flushes (as is the case with a
>> dirty_background_ratio greater than 0), I can almost guarantee an overflow
>> (I can't get around the mainloop fast enough if the device is busy with a
>> long write). Note that for my buffered application, I do run a default
>> background ratio.
>>
>> The only thing that I got to work was buffering, and the easiest way I
>> know to accomplish this is to place a stream-to-vector block (in GNURadio)
>> between the USRP source and the file sink. The vectorization effectively
>> works as a buffer, and was the only way I was able to get 100 MS/s working.
>>
>> Other things I tried:
>> 1. Using CPU affinity and shielding.
>> 2. Increasing process priority to 99 and SCHED_FIFO.
>>
>> Also, I can confirm that rx_samples_to_file and uhd_rx_cfile (as well as
>> GNU Radio application that does effectively the same thing, but controls
>> CPU affinity) all behave the exact same way in terms of the number of
>> overflows I have encountered.
>>
>> If there are any other kernel tweaks that I can try, please let me know.
>> However, in practice, my CPU is not fast enough to get through the mainloop
>> of rx_samples_to_file quickly enough. As discussed above, buffering was
>> the only way I was able to consistently ensure that my data was saved and I
>> didn't encounter an overflow.
>>
>> If you would like, I can provide whatever benchmarks of the system that
>> you would like. Perhaps I am not using the correct tool when I quote my
>> minimum write speeds to disk.
>>
>> On Sat, Oct 4, 2014 at 9:14 AM, Marcus Müller <usrp-users@lists.ettus.com>
>> wrote:
>>
>>> Hi Peter,
>>>
>>> didn't mean to confuse you! Actually, my job is doing the opposite (i.e.
>>> providing useful information), and thus let me just shortly follow up on
>>> this:
>>> On 03.10.2014 17:44, Peter Witkowski via USRP-users wrote:
>>>
>>> So I'm confused.
>>>
>>> You state that if I can't use rx_samples_to_file, my system is failing to
>>> perform as specified to write data out, then you give an example of several
>>> things that can happen to create a stochastic write speed (which I totally
>>> understand and agree with). Given that writes can be stochastic, why is
>>> there not a software buffer implemented in the UHD sample code to account
>>> for such issues?
>>>
>>> Well, because that's, in my opinion, an operating system's job. Being a
>>> code example, rx_samples_to_file just *mustn't* contain the complexity
>>> introduced when you try to implement buffering functionality smarter than
>>> what your OS can do. And, I do think it's nearly impossible to be smarter
>>> than the Linux kernel when optimizing writes -- *but* you'll have to tell
>>> your kernel what you want, as a user. The kernel, as it is configured by
>>> any modern distribution by default, won't do enormous write buffers,
>>> because that's not what the user usually wants, increasing the risk of data
>>> loss in case of system failure, and because you usually don't want to spend
>>> all of your RAM on filesystem buffers. In your 64GB RAM case, though,
>>> default buffer sizes should suffice, I guess, so I'm a bit out of clues
>>> here.
>>> It is definitely not very hard to increase these buffers' sizes[1], so I
>>> encourage you to try it and see if that solves your problem. Now, I must
>>> admit that up to here I was always assuming you hadn't already played
>>> around with these values, if this is not the case, please accept my
>>> apologies!
>>>
>>> I understand that it's meant to be an example, but I've
>>> also seen it referenced as being used effectively as a debugger or test for
>>> people having issues (i.e. recommendation to use the UHD programs in place
>>> of GNURadio to resolve issues).
>>>
>>> ...and it's done many users and thus Ettus a great job of supplying
>>> basic functionality! The fact that it works in almost any situation with
>>> this very minimalistic approach (repeated recv->write) proves that UHD is
>>> in fact a rather nice driver interface, IMHO. The fact that GNU Radio
>>> sometimes solves issues that rx_samples_to_file can't indicates exactly the
>>> buffering approach to be helpful. But in that case, buffering is not
>>> increased by increasing kernel buffer sizes, but by introducing GNU Radio
>>> buffers between blocks. The USRP source (Martin, scold me if I say
>>> something stupid) is not really much smarter than rx_samples_to_file: It
>>> recv()s a packet of samples, and returns these samples from the work
>>> function, and then GNU Radio takes care of shuffling and buffering that
>>> data. Basically, GNU Radio behaves much like an operating system from the
>>> source block's point of view.
>>>
>>> Also, in terms of benchmarking, I'm quoting minimum values, not averages.
>>> I agree with you that average values are pointless, and in reality the disk
>>> subsystem needs to perform when called up. My minimum values for a 4 disk
>>> RAID0 with a dedicated controller are well within the data rate that I am
>>> pushing.
>>>
>>> Well, I'll kind of disagree with you: If your minimum write rate of your
>>> system was bigger than the rate rx_samples_to_file causes, then you
>>> wouldn't see the problem. The point, I believe, here is that the storage
>>> system does not only consist of the hardware side of your RAID, but also on
>>> your complete operating environment. Something slows down how fast data is
>>> written to the RAID.
>>> I think we both would expect the following to happen:
>>>
>>> repeatedly:
>>>
>>> rx_samples_to_file:
>>>     uhd::rx_streamer::recv
>>>         (blocks until a packet of samples has arrived. Instantly returns if
>>>         it has before the call)
>>>     write(file_handle, recv_buff)
>>>         (instantly returns, because writing should hit a buffer that the
>>>         operating system transparently pushes out to a disk. If buffer is full,
>>>         then block until enough space in buffer -- unless your filesystem is
>>>         mounted with some sync option...)
>>>
>>> Now, if your RAID is definitely fast enough, the write buffer should
>>> never get full. My hypothesis here is that either, your buffer size is just
>>> too small, and a block of samples doesn't fit and has to be written out
>>> instantly (which is unlikely), or something else occupies your system. That
>>> might be just the fact that 400MB/s (are we talking about an X3x0?)
>>> inevitably places a heavy load on things like PCIe busses and CPUs, and
>>> that introduces a bottleneck in your storage chain which isn't there if you
>>> "just" benchmark without the USRP. Also, the rather smallish sizes of
>>> network packets dictate that journaling file systems introduce a very bad
>>> overhead -- I don't know if you benchmarked with files on a journaling file
>>> system and a (network packet size - header) block size...
>>>
>>> Is there an example system that can handle sustained data capture from the
>>> USRP at (or near the limits) of 10GigE or the PCIe interfaces (maybe the
>>> requirement is enterprise class PCIe SSDs)? I'm running a two socket Xeon
>>> system (two hex core processors) with 64GB of RAM. How much more hardware
>>> should I throw at the problem to be able to sample/write at 100MS (half of
>>> what is quoted on the website for bandwidth for the 10GigE kit) using the
>>> provided code?
>>>
>>> Definitely a nice system! I must admit that I don't have access to a
>>> comparable setup, and thus I can't really offer you any first-hand
>>> experience. Maybe others can.
>>>
>>> I think the issue here is that the code itself can't simply get through
>>> its main loop fast enough. There's a difference between data bandwidth
>>> and CPU throughput. The sequential nature of the code means that if any
>>> weird stuff happens (your example was a good set of kernel related hilarity
>>> that can lead to stochastic timing) you will have overflows since you
>>> cannot read fast enough. This is why a 90% solution for my application was
>>> to just set the dirty_background_ratio to 0 and also why redirection to
>>> /dev/null makes overflows go away.
>>>
>>> This is interesting, as dirty_background_ratio is the percentage at
>>> which the kernel should start writing out dirty pages in the background.
>>> Now, I'm the one who's confused, because I would have expected this to >>> negatively impact performance. On the other hand, 0 (at least in my head) >>> does not make very much sense, maybe it's semantically identical to 100%? >>> Are you swapping (64GB would tell me you shouldn't have swap or extremly >>> low swappiness)? >>> On the other hand, it might really be that storage is not the bottleneck >>> here, and in fact maybe the CPU gets saturated. Now, you said that writing >>> to /dev/null solves your problem. Do your RAID or filesystem consume a lot >>> of CPU cycles? This is an interesting mystery... >>> >>> With either method I didn't have to >>> wait for a large write cache to flush before moving on to the next read >>> from the USRP. Note that there can also be things that happen on the read >>> side as well. Does this mean that I can only run the code on an RTOS? >>> >>> No :) UHD has it's own incoming buffer handlers, but as you already >>> said, in this high performance scenario, you might be totally right, and >>> our single-threaded approach just doesn't cut it. Maybe dropping in some >>> asynchronous storage IO would help -- but I hate seeing that blowing up in >>> example users' faces, so I guess the fact that it doesn't work with a >>> system as potent as yours with the sample rates as high as you demand might >>> actually be a shortcoming of the examples that isn't going to be fixed. >>> >>> As a final note, my understanding is that GNURadio and the USRP were >>> developed for domain experts in DSP to use. >>> >>> These are SDR frameworks and devices, respectively. The idea is to offer >>> people with the opportunity to build awesome DSP systems using universally >>> usable SDR blocks (GNU Radio) and universal software radio peripherals, so >>> well, they certainly address DSP people, but they shouldn't be hard to use. >>> >>> These users may or may not >>> have prior experience in software. 
As a result, I'd recommend perhaps >>> adding a buffered example or have the USRP GNURadio block allow for >>> buffering. >>> >>> That is something we might consider. On the other hand, when someone >>> goes as far as you do, maybe having an example that does the buffering in a >>> separate thread (or even process) isn't worth that much -- in the end, one >>> will want to write one's own high performance application, and that will >>> include handling such data rates. >>> >>> Otherwise, I just don't see how you can advertise 200 MS/s >>> (maybe even a simple "buffer" block in GNURadio would do the trick?). >>> >>> Well, the devices support these rates, and our driver is able to >>> withstand these rates and sustain them without hitting CPU barriers due to >>> having too much overhead. That's awesome (ok, I might be biased, but *I* >>> think it's awesome). I don't feel ashamed because on your specific setup, >>> we can't find a way to make any of our generic examples deliver the full >>> rate of rx streams to storage -- we sell RF hardware, and not storage >>> infrastructure, and the point of the examples is demonstrating the usage of >>> UHD, and not holding a lecture on high performance storage handling. I >>> wish, though, that we could solve your problem. >>> >>> Now, GNU Radio/gr-uhd does in fact come with an application called >>> uhd_rx_cfile, which is more or less a clone of rx_samples_to_file using >>> gr-uhd and GNU Radio instead of raw UHD. Does that work out for you? >>> >>> I >>> understand that this is theoretical limit of the bus, but if there doesn't >>> exist a driver or other software to make use of this, the practical limit >>> becomes much, much smaller. >>> >>> Well, UHD seems to be able to sustain these rates, if you write to >>> /dev/null, right? So the practical limit for UHD is definitely not being >>> hit. >>> I have another --maybe even practical-- suggestion to make: Roll your own >>> buffer! 
>>> >>> mkfifo /tmp/mybuffer #assuming tmpfs is in ram >>> dd if=/tmp/mybuffer of=/mount/raid_volume/data.dat & #start in >>> background; you could play around with block sizes using the bs= option of >>> dd >>> rx_samples_to_file --file /tmp/mybuffer [all the other options] >>> >>> By the way: Thanks for bringing this up! We know that recording samples >>> is a core concern of many users. >>> >>> Greetings, >>> Marcus >>> >>> [1] https://www.kernel.org/doc/Documentation/sysctl/vm.txt >>> >>> On Fri, Oct 3, 2014 at 10:55 AM, Marcus Müller <usrp-users@lists.ettus.com> <usrp-users@lists.ettus.com> >>> wrote: >>> >>> >>> I have to agree with Marcus on this. Also, keep in mind that storage is >>> really what an operating system should take care of in any "general >>> purpose" scenario, ie. that as long as I just write to a file, I'd expect >>> that the thing in charge of storage (my kernel / the filesystems / block >>> device drivers) does the best it can to keep up. If I find myself in a >>> situation where my specific storage needs dictate a huge write buffer, >>> changing the application might be one way, but as I'm responsible for my >>> won storage subsystem, I could just as well increase the cache buffer >>> sizes, and let the operating system handle storage operation. If your RAID >>> is really performing as well as it is benchmarked to, then this should not >>> be one of your problems. All rx_samples_to_file does is really sequentially >>> writing out data at a constant rate, which is the most basic write >>> benchmark I can think of. >>> >>> If your storage subsystem (filesystem + storage abstraction + raid driver >>> + interface driver + hard drive interface + hard drives + hardware caches) >>> can't keep up, it's failing to perform as specified, simple as that. 
In >>> this case, saying that the application needs to be smarter when dealing >>> with storage seems like a bit of a cop-out to me ;) >>> >>> I'd like to point out that most benchmarks use heavily averaged numbers >>> for write speeds etc. UHD on the other hand kind of demands soft real-time >>> performance of a write subsystem, which is a lot harder to fulfill. This >>> comes up rather frequently, but I have to stress it: you need a fast >>> guaranteed write rate, not only an average one, and as soon as your >>> operating system has to postpone writing data[1], it has to have enough >>> performance to catch up whilst still meeting continued demand. This is >>> general purpose hardware running general purpose OS with dozens of >>> processes, and you can't just say "every single component is up to my task, >>> thus my system suffices", because everything potentially blocks everything! >>> >>> Greetings, >>> Marcus >>> >>> [1] e.g. because the filesystems needs to calculate checksums, update >>> tables, another process gets scheduled, a device blocks your PCIe bus, your >>> platters randomly need a bit longer to seek, you reach the physical end of >>> an LVM volume and have to move across a disk, an interrupt does what an >>> interrupt does, some process is getting noticed on a changing file >>> descriptor, DBUS is happening in the kernel, token ring has run out of >>> tokens, thermal throttling, bitflips on SATA leading to retransmission, >>> some page getting fetched from swap... >>> >>> >>> On 03.10.2014 15:34, Marcus D. Leech via USRP-users wrote: >>> >>> >>> >>> One has to keep firmly in mind that programs like rx_samples_to_file are >>> *examples* that show how to use >>> >>> the underlying UHD API. They are not necessarily optimized for all >>> situations, and indeed, one could >>> >>> restructure rx_samples_to_file to decouple UHD I/O from filesystem I/O, >>> using a large buffer between them. 
>>>
>>> The fact is that dynamic performance of high-speed, real-time flows is
>>> something that almost invariably needs tweaking for any particular
>>> situation. There's no way for an example application to meet all those
>>> requirements.
>>>
>>> But the fact also remains that for *some* systems, rx_samples_to_file
>>> (and uhd_rx_cfile on the GNU Radio side) are able to stream high-speed
>>> data just fine as-is.
>>>
>>> On 2014-10-03 09:26, Peter Witkowski via USRP-users wrote:
>>>
>>> To say that the issue is just because the disk subsystem can't keep up is a bit of a cop-out.
>>>
>>> I had issues writing to disk when the incoming stream was 400MB/s and my RAID0 system was benchmarked at being much higher than that.
>>>
>>> The issue that I've been seeing stems from the fact that it appears that you cannot concurrently read/write from the data stream as it's coming in. In effect you have a main loop that reads from the device and then immediately tries to write that buffer to file. If you do not complete these operations in a timely fashion, overflows occur.
>>>
>>> One way to solve (or at least band-aid) the issue is to set your dirty_background_ratio to 0. I was able to get writing to disk working somewhat with this setting, as it is more predictable to write directly to disk instead of having your write cache fill up and then having a large amount of data to push to disk. That said, my RAID0 array is capable of such speeds and even then I was getting a few (but much reduced) overflows.
>>>
>>> The one surefire way I know of getting this working (even on a slow disk system) is to buffer the data. The buffer can then be consumed by the disk writing process while being concurrently added onto by the device reader. The easiest way to test buffering (that I've found) is to simply set up a GNURadio Companion program with a stream-to-vector block between the USRP and file sink blocks.
>>> This is exactly what I am doing currently since, even with a very powerful system, I could not get data saved to disk quickly enough given the aforementioned issues with the provided UHD software.
>>>
>>> On Thu, Oct 2, 2014 at 11:48 PM, gsmandvoip via USRP-users <usrp-users@lists.ettus.com> wrote:
>>>
>>> Thanks Marcus for your replies. Yes, the 'O's have gone away.
>>>
>>> On Thu, Oct 2, 2014 at 5:50 PM, Marcus D. Leech <mleech@ripnet.com> wrote:
>>>
>>> With rx_samples_to_file without _4rx.rbf: initially I tried on my i3 with 4GB RAM. It gave me some OOOO, but fewer than earlier. But I do not understand: most of my RAM capacity and processor were sitting idle while it showed OOOO. Why this strange behaviour?
>>>
>>> The default format for uhd_rx_cfile is complex-float, thus doubling the amount of data written compared to rx_samples_to_file.
>>>
>>> You can't just use CPU usage as an indicator of loading--if you're writing to disk, the disk subsystem may be much slower than you think, so the "rate limiting step" is writes to the disk, not computational elements.
>>>
>>> Try using /dev/null as the file that you write to. If the 'O's go away, even at higher sampling rates, then it's your disk subsystem.
>>>
>>> Using uhd_rx_cfile I am getting a similar result, but strangely, why is it low? At 4M sampling rate it was higher???
>>>
>>> On Thu, Oct 2, 2014 at 9:27 AM, Marcus D. Leech <mleech@ripnet.com> wrote:
>>>
>>> On 10/01/2014 11:46 PM, gsmandvoip wrote:
>>>
>>> Yes, I am running a single channel, but when trying to achieve my desired sampling rate without _4rx.rbf, it says the requested sampling rate is not valid, adjusting to some 3.9M or so.
>>> Sorry for the misleading info I gave earlier: I have an i3 with 32 bit and an i7 with 64 bit, but I am getting the same result on both machines.
>>>
>>> Here is my command to capture signal:
>>>
>>> ./rx_samples_to_file --args="fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX" --freq "$FC" --rate="$SR" $FILE --nsamps "$NSAMPLES"
>>>
>>> and here is its output:
>>>
>>> Creating the usrp device with: fpga=usrp1_fpga_4rx.rbf, subdev=DBSRX...
>>> -- Loading firmware image: /usr/share/uhd/images/usrp1_fw.ihx... done
>>> -- Opening a USRP1 device...
>>> -- Loading FPGA image: /usr/share/uhd/images/usrp1_fpga_4rx.rbf... done
>>> -- Using FPGA clock rate of 52.000000MHz...
>>> ERROR: LOOKUPERROR: INDEXERROR: MULTI_USRP::GET_TX_SUBDEV_SPEC(0) FAILED TO MAKE DEFAULT SPEC - VALUEERROR: THE SUBDEVICE SPECIFICATION "A:0" IS TOO LONG.
>>> The user specified 1 channels, but there are only 0 tx dsps on mboard 0.
>>>
>>> Don't use the _4rx image if you don't need it.
>>>
>>> The USRP1 only does strict-integer resampling, and with a master clock (NON STANDARD FOR USRP1) of 52.000MHz, 4Msps is not a sample rate that it can produce. Try 5.2Msps or 4.3333Msps.
>>>
>>> At 5.2Msps, it's recording at roughly 20.8Mbytes/second, so your system needs to be able to sustain that for at least as long as the capture lasts.
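
The rate arithmetic in the last two paragraphs can be checked with a few lines. This is a sketch under stated assumptions: the 4 bytes per sample figure (16-bit I plus 16-bit Q) is assumed here because it reproduces the quoted 20.8 Mbytes/second, and the divisors 10 and 12 are simply the integers that give the two suggested rates from a 52 MHz clock. It makes no claim about which divisors the USRP1 actually accepts.

```python
MASTER_CLOCK_HZ = 52e6  # FPGA clock rate reported in the log above
BYTES_PER_SAMPLE = 4    # assumption: 16-bit I + 16-bit Q per complex sample


def rate_for_decimation(decim):
    """Sample rate obtained by dividing the master clock by an integer."""
    return MASTER_CLOCK_HZ / decim


def bytes_per_second(sample_rate_hz, bytes_per_sample=BYTES_PER_SAMPLE):
    """Sustained write rate the storage has to keep up with."""
    return sample_rate_hz * bytes_per_sample


if __name__ == "__main__":
    for decim in (10, 12):  # divisors matching the two suggested rates
        rate = rate_for_decimation(decim)
        print(f"decim={decim}: {rate / 1e6:.4f} Msps, "
              f"{bytes_per_second(rate) / 1e6:.1f} MB/s")
    # Complex-float output (8 bytes per sample) doubles the write rate.
    print(f"{bytes_per_second(5.2e6, bytes_per_sample=8) / 1e6:.1f} MB/s")
```

Passing bytes_per_sample=8 corresponds to the complex-float case mentioned for uhd_rx_cfile, which is why the same sample rate can overflow with one tool and not the other.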
>>>
>>> _______________________________________________
>>> USRP-users mailing list
>>> USRP-users@lists.ettus.com
>>> http://lists.ettus.com/mailman/listinfo/usrp-users_lists.ettus.com
>>
>> --
>> Peter Witkowski
>> pwitkowski@gmail.com