Linux FTDI USB-to-Serial overflow errors FTDI_RS_OE on 1Mbaud

Question

I am trying to use a genuine FTDI USB-to-serial adapter in Linux at 1Mbaud (instead of the usual 115200). This mostly works, but sometimes I seem to lose data. When checking the counters with the TIOCGICOUNT ioctl call, I can see that I get overrun errors (the overrun counter, not buf_overrun).

Looking at the source code of the FTDI Linux driver (https://github.com/torvalds/linux/blob/master/drivers/usb/serial/ftdi_sio.c), the overrun counter is incremented when the chip sends a USB packet with the FTDI_RS_OE error flag set.

(For reference, when I use the exact same userspace application with a totally different serial device (the imx6 mxc), I do not get these errors. It really seems to be specific to the FTDI driver.)

I can find very little about this, and strangely enough, the Windows driver does not seem to suffer from this problem. If anybody has gotten these FTDI chips working at high speeds in Linux, feel free to help me out!

Kind regards, Arnout

Does Raspbian count as Linux? As far as I can tell, I never ran into the issue you describe. I used an FT2232H with pylibftdi, in I2C mode for port A (so slow speed) and fast opto mode for port B at around 25 Mbaud. — Christian B.
Raspbian counts :) Interesting feedback. It may be relevant that in my use case I have near-constant traffic on the RX side of the port (not sure what your use case is). — Arnout

Arnout Arnout · Accepted Answer · 2019-09-21T06:41:45

I think I figured it out. The bottom line is that this was a genuine RX overrun, caused by my code not reading the UART buffers fast enough. When I make sure to read the serial port fast enough, I can sustain the constant 1 Mbaud data rate. Below I explain why I was so badly misled, and why the unexpected FTDI_RS_OE flag is sent.

A few notes on the protocol I was using. I was sending a "request" packet on the serial line and expecting a large reply, and doing this in a loop. My bug was that I expected the remote device to reply very quickly and stopped processing if it did not. This timeout was too short, so the actual reply still came in after I had stopped reading. Some buffer then overflowed, causing the RX overruns.

This was not obvious at first, though. A few subtleties:

The RX overrun counter was only incremented on the NEXT UART read syscall (e.g. minutes later, if my code had gone idle), NOT immediately when the actual overflow happened, which is very confusing.
I had assumed that, just like with the imx6 driver, the Linux kernel would always service the USB device whenever incoming data was available, and that the data would go into a 640kB buffer (defined in https://elixir.bootlin.com/linux/v4.9.192/source/drivers/tty/tty_buffer.c). In the imx6 driver it is easy to see what happens when that buffer overflows.

But that turns out not to be the case. Instead, my best guess at what happens here (I haven't profiled or debugged the kernel to verify this) is that serial throttling kicks in. When this 640kB buffer is getting full, Linux issues a "throttle" callback to the FTDI driver. The driver simply uses the generic usb_serial_generic_throttle, which sets a flag, and incoming URB data is then discarded in usb_serial_generic_read_bulk_callback (https://elixir.bootlin.com/linux/latest/ident/usb_serial_generic_read_bulk_callback). This would explain why no overruns are detected when the incident actually occurs, but one suddenly shows up when I restart a read operation (e.g. after 1 minute of inactivity). The FTDI chip's internal buffer must be overflowing because of this mechanism, causing the FTDI_RS_OE flag to be set, and that flag is only parsed once throttling is disabled again.
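To put a rough number on how quickly this can happen, here is a back-of-the-envelope calculation. It assumes 8N1 framing (10 bits on the wire per byte, which is my assumption and not something I measured) and takes the 640kB figure above at face value; `seconds_to_fill` is just an illustrative helper.

```python
def seconds_to_fill(buffer_bytes, baud, bits_per_byte=10):
    """Time for a saturated RX line to fill a buffer nobody is draining.

    bits_per_byte=10 assumes 8N1: 1 start bit + 8 data bits + 1 stop bit.
    """
    bytes_per_second = baud / bits_per_byte
    return buffer_bytes / bytes_per_second


# 640kB buffer, 1 Mbaud line vs. the usual 115200 baud
print(seconds_to_fill(640 * 1024, 1_000_000))
print(seconds_to_fill(640 * 1024, 115_200))
```

Under those assumptions a continuously busy 1 Mbaud link delivers about 100,000 bytes per second, so an unread 640kB buffer fills in roughly 6.5 seconds, versus close to a minute at 115200. That would fit with this problem being much easier to hit at the higher rate.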

So, in conclusion: the main issue was on my side, but the FTDI driver does not report its overrun counters correctly (they only show up late, or never, depending on the use case), most likely because of the Linux throttling feature.
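For completeness, here is a minimal sketch of the kind of fix that applies on the application side: keep draining the port in a background thread, and apply the protocol timeout to a user-space buffer instead of to the read itself. This is not my actual code. The class name `SerialDrain` and its methods are made up for illustration, and it assumes the fd is in blocking mode (VMIN >= 1), so that an empty read only happens when the port is closed.

```python
import os
import threading


class SerialDrain:
    """Continuously read from fd so the kernel-side buffers never fill up."""

    def __init__(self, fd, chunk_size=4096):
        self._fd = fd
        self._chunk_size = chunk_size
        self._buf = bytearray()
        self._closed = False
        self._cond = threading.Condition()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            try:
                data = os.read(self._fd, self._chunk_size)
            except OSError:
                data = b""
            with self._cond:
                if not data:  # EOF or error: stop draining
                    self._closed = True
                    self._cond.notify_all()
                    return
                self._buf += data
                self._cond.notify_all()

    def take(self, n, timeout):
        """Return exactly n bytes, or None if they did not arrive in time.

        A late reply is still drained from the kernel and stays in the
        user-space buffer, so a short protocol timeout cannot cause an
        overrun further down.
        """
        with self._cond:
            self._cond.wait_for(
                lambda: len(self._buf) >= n or self._closed, timeout)
            if len(self._buf) < n:
                return None
            out = bytes(self._buf[:n])
            del self._buf[:n]
            return out
```

With this structure, a timeout in `take()` only means "the reply is late"; the reader thread keeps emptying the port regardless. The trade-off is that the user-space buffer is unbounded here, so a real application would probably want to cap it or discard stale replies.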
