
I'm making a function which returns the FPU Control Word register (16bit).
According to documentation, I have to use fstcw and a memory place.

My place in memory is:

fpuctl: .word 0

And my function is:

.global getFPUControlState
    .type getFPUControlState, function
    getFPUControlState:
    pushl %ebp
    movl %esp, %ebp 
    xorl %eax, %eax #clear eax (ax too)

    fnstcw fpuctl #store in fpuCTL
    mov fpuctl, %ax #put it in 8bit %ax

    leave
    ret

The console says:

Memory protection violation.

How to use fnstcw properly?

1 Answer


TL;DR: you probably put fpuctl: .word 0 in the read-only .text section along with your code. Store to some scratch space on the stack, or to the BSS if you really want to use static storage.


You're right, the only form of fnstcw is memory-destination. The more commonly-used fnstsw %ax (the x87 status word) has an AX-destination form, but fnstcw doesn't.

(Of course, x87 is obsolete except when you actually need 80-bit precision; modern x86 has at least SSE2 so you can and should do scalar FP math in XMM registers. The SSE FPU's control and status bits are all in the MXCSR, separate from the x87 control and status.)
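For anyone doing this from C rather than hand-written asm, here is a minimal sketch of reading both sets of control bits. It assumes GCC or clang targeting x86 with SSE enabled. The helper names (read_x87_control_word, read_mxcsr, x87_rounding_control, mxcsr_rounding_control) are made up for illustration; _mm_getcsr() is the real intrinsic from <xmmintrin.h>. The "=m" constraint reflects the fact that fnstcw can only store to memory.

```c
#include <stdint.h>
#include <xmmintrin.h>   /* _mm_getcsr() */

/* Hypothetical helper: read the x87 control word.
   fnstcw only has a memory-destination form, so "=m" lets the
   compiler pick a stack slot for cw and reload it afterwards. */
static inline uint16_t read_x87_control_word(void)
{
    uint16_t cw;
    __asm__ volatile ("fnstcw %0" : "=m" (cw));
    return cw;
}

/* Hypothetical helper: read MXCSR, the SSE control/status register. */
static inline uint32_t read_mxcsr(void)
{
    return _mm_getcsr();
}

/* Rounding-control field: bits 11:10 of the x87 control word. */
static inline unsigned x87_rounding_control(uint16_t cw)
{
    return (cw >> 10) & 3u;
}

/* Rounding-control field: bits 14:13 of MXCSR. */
static inline unsigned mxcsr_rounding_control(uint32_t csr)
{
    return (csr >> 13) & 3u;
}
```

In both registers a rounding-control value of 0 means round-to-nearest, which is what a process normally starts with. The two fields are independent: changing one with ldmxcsr or fldcw does not change the other.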

Also note that if you're calling this from C and going to modify the x87 control word, you need to tell the compiler about it so it doesn't assume that the rounding mode is still round-to-nearest, or that the precision is still 80-bit (64-bit mantissa). This can matter for compile-time constant-propagation and other optimizations. For example, gcc has -frounding-math and -fsignaling-nans. It may also support C99 #pragma STDC FENV_ACCESS ON, but I'm not sure gcc or clang are fully standards compliant for that. See also https://gcc.gnu.org/wiki/FloatingPointMath. (You should use standard C functions from fenv.h to modify the FP environment, like fegetenv / fesetenv.)

If you're going to use this from hand-written asm, make it a macro that takes a 2nd arg as a scratch memory location, instead of a function. This is too tiny to make sense as a non-inline function; it's 2 useful instructions (fnstcw and a reload); the rest is overhead.


BTW, AX is a 16-bit register. AL is the low 8 bits of EAX.

Also, movzwl fpuctl, %eax would do a zero-extending word load so you wouldn't need to xor-zero eax first.


You haven't provided an MCVE, but you probably put fpuctl: .word 0 in the read-only .text section along with your code, so you get the same error as if you'd used mov %eax, getFPUControlState.

Instead, put it in the BSS (zero-init static storage, where the zeros aren't stored in your executable, just a total size).

.bss
fpuctl: .zero 2         # 2 bytes of zeros in the BSS

.text                   # switch back to the .text section
.globl getFPUControlState         # GAS does accept .global as a synonym for .globl
.type getFPUControlState, function
getFPUControlState:
    # leave out the EBP stack-frame stuff; you're not using it for anything
    fnstcw   fpuctl
    movzwl   fpuctl, %eax       # zero-extending word load into EAX

    ret
.size getFPUControlState, . - getFPUControlState

As an alternative, use .lcomm to reserve 2 bytes in the BSS and label it with fpuctl as a non-exported label:

.lcomm  fpuctl, 2     # puts stuff in the BSS regardless of current section

But really, you don't need static storage at all for this, so you can save yourself a potential page fault by just using the stack:

.globl getFPUControlState
.type getFPUControlState, function
getFPUControlState:

    sub      $4, %esp          # reserve space on the stack
    fnstcw   (%esp)
    movzwl   (%esp), %eax
    add      $4, %esp          # restore ESP to point at the return address

    ret

Or, if you'd rather optimize for code-size at the cost of a store-forwarding stall from a dword load after a word store:

    push     $0
    fnstcw   (%esp)
    pop      %eax         # upper 2 bytes are zero from the PUSHed data
    ret