I've got a Cortex-M4 based MCU linked to a FPGA via a 16-bit parallel memory bus interface. In essence the FPGA behaves like an external memory mapped to the memory space of the MCU: the MCU presents an address followed by either a data word (write) or reading the word presented by the FPGA (read).
I want to protect both read and write against transmission errors both during addressing and data write/read. However, I don't expect many bit errors since the distance between both parts is short.
I can easily implement checking and generating of either parity, hamming codes or CRC inside the FPGA. However, doing the same (checking and generating) in the uC seems comparatively harder since I don't want to cripple the throughput. Without error detection, reading and writing of 16-bit words takes around 4-6 processor cycles and is thus rather fast. Consequently I don't want to spend hundred of cycles on protective measures.
In the end I am looking for a moderately efficient error detection method for 16-bit data that is implemented in a uC in as few cycles as possible.