4
votes

I write quite a bit of code in 64-bit x86_64 assembly language, and I am about to begin another large function library to provide all conventional bitwise, shift, logical, arithmetic, math operators and functions for s0128, s0256, s0512, s1024 signed integer types and f0128, f0256, f0512, f1024 floating-point types.

I have AMD FX-8150 (bulldozer) CPUs in both my computers (ubuntu64 and win7-64). After reviewing the operations my code needs to perform, I find a great number of recent bit manipulation instructions will be extremely helpful.

However, when I read various documents, including the official AMD documents on their website, I find endless contradictions about whether certain instructions and instruction sets are supported by bulldozer CPUs (FX-8150) and/or piledriver (FX-8350). The confusion is especially common with regard to the various recent bit manipulation instructions and instruction sets, and the FMA3 and FMA4 instruction sets.

I know some of the AMD documents are wrong, because I've been programming with FMA3 and FMA4 instructions on my FX-8150 and they work just fine, while the AMD document comparing bulldozer and piledriver contradicts this.

Given that ALL sources of documentation I can find appear to be wrong to some degree about this issue, does anyone out there know which instructions and/or instruction sets work on piledriver (FX-8350) but not bulldozer (FX-8150)?

Since my problem is the validity of documentation out there, please don't just point me at some document unless you know for sure it is correct. The best answers would come from programmers who have tested these instructions and instruction sets on their bulldozer [and piledriver] CPUs.

1
"because I've been programming with FMA3 and FMA4 instructions on my FX-8150 and they work just fine" - I doubt it. Are you sure you are using FMA3 on Bulldozer? Bulldozer does not have FMA3. - Mysticial
About FMA3. Well, I remember programming with them, but when I realized FMA4 were available, I switched over. The FMA4 instructions are a lot more efficient for my purposes, because I didn't have to write over any operand. Plus, they're about 1000x easier to understand when programming. I'll go find a place where an FMA3 should work and see what happens. - honestann
@Mysticial: Oh, by the way, tell me what you see on page 2 of the following AMD document: developer.amd.com/wordpress/media/2012/10/… - honestann
First discovery. Well, I guess my memory is indeed defective. When I tried to execute an FMA3 instruction, it generated a SIGILL (illegal instruction), while FMA4 instructions work fine (and indeed, I have dozens of FMA4 instructions in my code). Of course if you look at the AMD document I gave a link to above, it claims the bulldozer CAN execute FMA3 instructions (wrong), but CANNOT execute FMA4 instructions (wrong). Now onto bit-oriented instructions. - honestann
Next discovery. The newer bit instructions don't work on my bulldozer (FX-8150). After laborious checks of the various CPUID bits, they seem mostly accurate (but a royal pain to understand). One strange one is an FMA bit that reads 0 on my bulldozer FX-8150, even though it executes FMA4 (but not FMA3) instructions. But I did find another FMA bit in the second set (with the 0x80000000 prefix) that's set to 1. Overall, CPUID does seem fairly reliable, while the documentation for bulldozer out there in the world is massively inconsistent and largely wrong. - honestann

1 Answer

2
votes

As you have already figured, the official AMD release document (page 2) is indeed misleading. Specifically, the first line in its table of supported instructions is wrong.

Bulldozer supports FMA4, but not FMA3.

For completeness, the Piledriver instructions not present in Bulldozer are BMI, TBM, F16C (previously named CVT16) and FMA3 (2).

These should provide confirmation about FMA3 not being present in Bulldozer. But in addition, you can trust the GCC Manual. Architectures are named bdver1 and bdver2 for Bulldozer and Piledriver respectively.
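As a rough way to see what GCC believes each target supports, the sketch below reports which of GCC's predefined feature macros (`__FMA__`, `__FMA4__`, `__BMI__`, `__TBM__`, `__F16C__`) are defined for the current compile. The function names and the mask values are made up for this example; only the macros come from GCC. Assuming the GCC manual is right, building with `-march=bdver1` should report FMA4 only, and `-march=bdver2` should report all five.

```c
#include <stdio.h>

/* Mask values are arbitrary labels chosen for this example. */
enum {
    HAS_FMA3 = 1 << 0,
    HAS_FMA4 = 1 << 1,
    HAS_BMI  = 1 << 2,
    HAS_TBM  = 1 << 3,
    HAS_F16C = 1 << 4
};

/* Returns a mask of the extensions the compiler was told to target.
 * This reflects the -march/-m flags only, not the CPU it runs on. */
unsigned compiled_features(void)
{
    unsigned mask = 0;
#ifdef __FMA__          /* GCC's name for FMA3 */
    mask |= HAS_FMA3;
#endif
#ifdef __FMA4__
    mask |= HAS_FMA4;
#endif
#ifdef __BMI__
    mask |= HAS_BMI;
#endif
#ifdef __TBM__
    mask |= HAS_TBM;
#endif
#ifdef __F16C__
    mask |= HAS_F16C;
#endif
    return mask;
}

/* Prints one line per extension, e.g. after compiling with -march=bdver2. */
void print_compiled_features(void)
{
    unsigned m = compiled_features();
    printf("FMA3: %s\n", (m & HAS_FMA3) ? "yes" : "no");
    printf("FMA4: %s\n", (m & HAS_FMA4) ? "yes" : "no");
    printf("BMI:  %s\n", (m & HAS_BMI)  ? "yes" : "no");
    printf("TBM:  %s\n", (m & HAS_TBM)  ? "yes" : "no");
    printf("F16C: %s\n", (m & HAS_F16C) ? "yes" : "no");
}
```

Call `print_compiled_features()` from `main` and build the file once per `-march` value to compare the two targets. This only shows what the compiler will emit, so it complements a runtime check rather than replacing one.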

Even more, you can trust the cpuid return value. For convenience I am reproducing screenshots here for Zambezi and Vishera (the desktop parts):

Zambezi and Vishera cores (screenshots from Aida64) Source: CPUID Dump List

Note that cpuid uses simply fma to designate FMA3 support, while FMA4 is reported by its own separate fma4 flag in the extended (0x80000001) leaf. GCC follows the same semantics (-mfma and -mfma4). From the Wikipedia link you can deduce that this is because the FMA4 variant was actually implemented before FMA3 (so the previously defined fma4 identifier couldn't simply be dropped or it would break existing applications).
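For anyone who wants to check their own machine rather than trust a document, here is a minimal sketch of reading the relevant CPUID bits from C with the `<cpuid.h>` helpers shipped with GCC and Clang. The struct and function names are invented for this example. The bit positions are the documented ones: FMA3 is leaf 1 ECX bit 12, F16C is leaf 1 ECX bit 29, BMI1 is leaf 7 (subleaf 0) EBX bit 3, FMA4 is leaf 0x80000001 ECX bit 16, and TBM is leaf 0x80000001 ECX bit 21.

```c
#include <cpuid.h>
#include <stdio.h>

/* Hypothetical container for the flags discussed in this thread. */
struct isa_flags {
    int fma3, fma4, f16c, bmi1, tbm;
};

/* Pure decoding step: takes raw register values so it can be checked
 * against known dumps without running on the CPU in question. */
struct isa_flags decode_isa_flags(unsigned leaf1_ecx,
                                  unsigned leaf7_ebx,
                                  unsigned ext1_ecx)
{
    struct isa_flags f;
    f.fma3 = (leaf1_ecx >> 12) & 1;  /* CPUID.1:ECX[12]          */
    f.f16c = (leaf1_ecx >> 29) & 1;  /* CPUID.1:ECX[29]          */
    f.bmi1 = (leaf7_ebx >> 3)  & 1;  /* CPUID.7.0:EBX[3]         */
    f.fma4 = (ext1_ecx  >> 16) & 1;  /* CPUID.80000001h:ECX[16]  */
    f.tbm  = (ext1_ecx  >> 21) & 1;  /* CPUID.80000001h:ECX[21]  */
    return f;
}

/* Queries the running CPU. A leaf that is not available is treated
 * as all zeros, i.e. "feature absent". */
struct isa_flags query_isa_flags(void)
{
    unsigned a, b, c, d;
    unsigned leaf1_ecx = 0, leaf7_ebx = 0, ext1_ecx = 0;

    if (__get_cpuid(1, &a, &b, &c, &d))
        leaf1_ecx = c;
    if (__get_cpuid_count(7, 0, &a, &b, &c, &d))
        leaf7_ebx = b;
    if (__get_cpuid(0x80000001u, &a, &b, &c, &d))
        ext1_ecx = c;

    return decode_isa_flags(leaf1_ecx, leaf7_ebx, ext1_ecx);
}

void print_isa_flags(struct isa_flags f)
{
    printf("FMA3=%d FMA4=%d F16C=%d BMI1=%d TBM=%d\n",
           f.fma3, f.fma4, f.f16c, f.bmi1, f.tbm);
}
```

Calling `print_isa_flags(query_isa_flags())` from `main` should, going by the comments above, show FMA4=1 and everything else 0 on an FX-8150. One caveat: these bits say what the CPU implements. Actually using FMA or F16C also requires the OS to have enabled AVX state saving (OSXSAVE plus an XGETBV check), which this sketch does not test.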