TL;DR: you probably put `fpuctl: .word 0` in the read-only `.text` section along with your code. Store to some scratch space on the stack, or to the BSS if you really want to use static storage.
You're right, the only form of `fnstcw` is memory-destination. The more commonly-used `fnstsw %ax` (the x87 status word) has an AX-destination form, but `fnstcw` doesn't.
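As a rough illustration of those two operand forms from C, here is a minimal sketch using GNU C inline asm. It assumes GCC or clang targeting x86, and the function names are made up for this example. The `"=m"` constraint gives `fnstcw` a memory operand (typically a stack slot the compiler picks), while `"=a"` lets `fnstsw` write AX directly.

```c
#include <stdint.h>

/* Hypothetical helper: read the x87 control word.
 * fnstcw only has a memory-destination form, hence the "=m" constraint. */
static inline uint16_t read_x87_control_word(void)
{
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m"(cw));
    return cw;
}

/* Hypothetical helper: read the x87 status word.
 * fnstsw has an AX-destination form; "=a" with a 16-bit variable selects %ax. */
static inline uint16_t read_x87_status_word(void)
{
    uint16_t sw;
    __asm__ volatile ("fnstsw %0" : "=a"(sw));
    return sw;
}

/* Rounding-control field of the control word: bits 11:10 (0 = round to nearest). */
static inline unsigned x87_rounding_bits(void)
{
    return (read_x87_control_word() >> 10) & 3u;
}
```

Since these are `static inline`, the compiler can inline them at the call site and choose the scratch slot itself, so there is no call overhead.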
(Of course, x87 is obsolete except when you actually need 80-bit precision; modern x86 has at least SSE2 so you can and should do scalar FP math in XMM registers. The SSE FPU's control and status bits are all in the MXCSR, separate from the x87 control and status.)
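To make the "separate control registers" point concrete, here is a small sketch. It assumes GCC or clang on x86 with SSE enabled (the default for x86-64), and the function names are made up for this example. It changes the rounding mode in MXCSR through the `<xmmintrin.h>` intrinsics and checks that the x87 control word did not move.

```c
#include <stdint.h>
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr, _MM_ROUND_* */

/* Hypothetical helper: read the x87 control word via a memory operand. */
static uint16_t x87_cw(void)
{
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m"(cw));
    return cw;
}

/* Returns 1 if switching MXCSR to round-toward-zero took effect in MXCSR
 * and left the x87 control word unchanged. Restores MXCSR before returning. */
int mxcsr_is_separate_from_x87(void)
{
    unsigned saved  = _mm_getcsr();
    uint16_t before = x87_cw();

    _mm_setcsr((saved & ~(unsigned)_MM_ROUND_MASK) | _MM_ROUND_TOWARD_ZERO);
    unsigned during = _mm_getcsr();
    uint16_t after  = x87_cw();

    _mm_setcsr(saved);   /* put the caller's SSE rounding mode back */
    return before == after
        && (during & _MM_ROUND_MASK) == _MM_ROUND_TOWARD_ZERO;
}
```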
Also note that if you're calling this from C and going to modify the x87 control word, you need to tell the compiler about it so it doesn't assume that the rounding mode is still round-to-nearest, or that the precision is still 80-bit (64-bit mantissa). This can matter for compile-time constant-propagation and other optimizations. For example, gcc has `-frounding-math` and `-fsignaling-nans`. It may also support C99 `#pragma STDC FENV_ACCESS ON`, but I'm not sure gcc or clang are fully standards compliant for that. See also https://gcc.gnu.org/wiki/FloatingPointMath. (You should use standard C functions from `fenv.h` to modify the FP environment, like `fegetenv` / `fesetenv`.)
If you're going to use this from hand-written asm, make it a macro that takes a 2nd arg as a scratch memory location, instead of a function. This is too tiny to make sense as a non-inline function; it's 2 useful instructions (`fnstcw` and a reload); the rest is overhead.
BTW, AX is a 16-bit register. AL is the low 8 bits of EAX.
Also, `movzwl fpuctl, %eax` would do a zero-extending word load so you wouldn't need to xor-zero EAX first.
You haven't provided an MCVE, but probably you put `fpuctl: .word 0` in the read-only `.text` section along with your code, so you get the same error as if you'd used `mov %eax, getFPUControlState`.
Instead, put it in the BSS (zero-init static storage, where the zeros aren't stored in your executable, just a total size).

    .bss
    fpuctl: .zero 2     # 2 bytes of zeros in the BSS

    .text               # switch back to the .text section
    .globl getFPUControlState    # GAS does accept .global as a synonym for .globl
    .type getFPUControlState, function
    getFPUControlState:
        # leave out the EBP stack-frame stuff, you're not using it for anything
        fnstcw  fpuctl
        movzwl  fpuctl, %eax     # zero-extending word load into EAX
        ret
    .size getFPUControlState, . - getFPUControlState

As an alternative, use `.lcomm` to reserve 2 bytes in the BSS and label it with `fpuctl` as a non-exported label:

    .lcomm fpuctl, 2    # puts stuff in the BSS regardless of current section

But really, you don't need static storage at all for this, so you can save yourself a potential page fault by just using the stack:

    .globl getFPUControlState
    .type getFPUControlState, function
    getFPUControlState:
        sub     $4, %esp         # reserve space on the stack
        fnstcw  (%esp)
        movzwl  (%esp), %eax
        add     $4, %esp         # restore ESP to point at the return address
        ret

Or, if you'd rather optimize for code-size at the cost of a store-forwarding stall from a dword load after a word store:

        push    $0
        fnstcw  (%esp)
        pop     %eax             # upper 2 bytes are zero from the PUSHed data
        ret