All answers are writing single bytes only - what if you want to fill a byte array with words? Or floats? I find use for that now and then. So after having written similar code to 'memset' in a non-generic way a few times and arriving at this page to find good code for single bytes, I went about writing the method below.
I think PInvoke and C++/CLI each have their drawbacks. And why not have the runtime 'PInvoke' for you into mscorxxx? Array.Copy and Buffer.BlockCopy are native code certainly. BlockCopy isn't even 'safe' - you can copy a long halfway over another, or over a DateTime as long as they're in arrays.
At least I wouldn't go file new C++ project for things like this - it's a waste of time almost certainly.
So here's basically an extended version of the solutions presented by Lucero and TowerOfBricks that can be used to memset longs, ints, etc as well as single bytes.
public static class MemsetExtensions
{
static void MemsetPrivate(this byte[] buffer, byte[] value, int offset, int length) {
var shift = 0;
for (; shift < 32; shift++)
if (value.Length == 1 << shift)
break;
if (shift == 32 || value.Length != 1 << shift)
throw new ArgumentException(
"The source array must have a length that is a power of two and be shorter than 4GB.", "value");
int remainder;
int count = Math.DivRem(length, value.Length, out remainder);
var si = 0;
var di = offset;
int cx;
if (count < 1)
cx = remainder;
else
cx = value.Length;
Buffer.BlockCopy(value, si, buffer, di, cx);
if (cx == remainder)
return;
var cachetrash = Math.Max(12, shift); // 1 << 12 == 4096
si = di;
di += cx;
var dx = offset + length;
// doubling up to 1 << cachetrash bytes i.e. 2^12 or value.Length whichever is larger
for (var al = shift; al <= cachetrash && di + (cx = 1 << al) < dx; al++) {
Buffer.BlockCopy(buffer, si, buffer, di, cx);
di += cx;
}
// cx bytes as long as it fits
for (; di + cx <= dx; di += cx)
Buffer.BlockCopy(buffer, si, buffer, di, cx);
// tail part if less than cx bytes
if (di < dx)
Buffer.BlockCopy(buffer, si, buffer, di, dx - di);
}
}
Having this you can simply add short methods to take the value type you need to memset with and call the private method, e.g. just find replace ulong in this method:
public static void Memset(this byte[] buffer, ulong value, int offset, int count) {
var sourceArray = BitConverter.GetBytes(value);
MemsetPrivate(buffer, sourceArray, offset, sizeof(ulong) * count);
}
Or go silly and do it with any type of struct (although the MemsetPrivate above only works for structs that marshal to a size that is a power of two):
public static void Memset<T>(this byte[] buffer, T value, int offset, int count) where T : struct {
var size = Marshal.SizeOf<T>();
var ptr = Marshal.AllocHGlobal(size);
var sourceArray = new byte[size];
try {
Marshal.StructureToPtr<T>(value, ptr, false);
Marshal.Copy(ptr, sourceArray, 0, size);
} finally {
Marshal.FreeHGlobal(ptr);
}
MemsetPrivate(buffer, sourceArray, offset, count * size);
}
I changed the initblk mentioned before to take ulongs to compare performance with my code and that silently fails - the code runs but the resulting buffer contains the least significant byte of the ulong only.
Nevertheless I compared the performance writing as big a buffer with for, initblk and my memset method. The times are in ms total over 100 repetitions writing 8 byte ulongs whatever how many times fit the buffer length. The for version is manually loop-unrolled for the 8 bytes of a single ulong.
Buffer Len #repeat For millisec Initblk millisec Memset millisec
0x00000008 100 For 0,0032 Initblk 0,0107 Memset 0,0052
0x00000010 100 For 0,0037 Initblk 0,0102 Memset 0,0039
0x00000020 100 For 0,0032 Initblk 0,0106 Memset 0,0050
0x00000040 100 For 0,0053 Initblk 0,0121 Memset 0,0106
0x00000080 100 For 0,0097 Initblk 0,0121 Memset 0,0091
0x00000100 100 For 0,0179 Initblk 0,0122 Memset 0,0102
0x00000200 100 For 0,0384 Initblk 0,0123 Memset 0,0126
0x00000400 100 For 0,0789 Initblk 0,0130 Memset 0,0189
0x00000800 100 For 0,1357 Initblk 0,0153 Memset 0,0170
0x00001000 100 For 0,2811 Initblk 0,0167 Memset 0,0221
0x00002000 100 For 0,5519 Initblk 0,0278 Memset 0,0274
0x00004000 100 For 1,1100 Initblk 0,0329 Memset 0,0383
0x00008000 100 For 2,2332 Initblk 0,0827 Memset 0,0864
0x00010000 100 For 4,4407 Initblk 0,1551 Memset 0,1602
0x00020000 100 For 9,1331 Initblk 0,2768 Memset 0,3044
0x00040000 100 For 18,2497 Initblk 0,5500 Memset 0,5901
0x00080000 100 For 35,8650 Initblk 1,1236 Memset 1,5762
0x00100000 100 For 71,6806 Initblk 2,2836 Memset 3,2323
0x00200000 100 For 77,8086 Initblk 2,1991 Memset 3,0144
0x00400000 100 For 131,2923 Initblk 4,7837 Memset 6,8505
0x00800000 100 For 263,2917 Initblk 16,1354 Memset 33,3719
I excluded the first call every time, since both initblk and memset take a hit of I believe it was about .22ms for the first call. Slightly surprising my code is faster for filling short buffers than initblk, seeing it got half a page full of setup code.
If anybody feels like optimizing this, go ahead really. It's possible.