Every CPU architecture I have encountered has symmetric registers, i.e. the value you read is the value you wrote.
Is there a case for asymmetric registers in an ISA with register-limited 16-bit instructions?
e.g.
- Registers 0-6 are "local" to the function invocation. The value written in this function call is the value which will be read. Each level of function call has its own register hardware, so local registers are implicitly saved.
- Registers 7-9 are "global", perhaps "thread local" on an SMP CPU.
- Values written to "call" registers 10-13 do not affect what is read from them in this function call context.
- Values read from "call" registers 10-13 are the values written in the calling function, i.e. a function's register arguments are immutable.
- Values written to "return" registers 14-15 do not affect what is read from them in this function call context.
- Values read from "return" registers 14-15 are the values written in the function which most recently returned to the current function.
Each function level's registers have their own hardware, spilling to the stack only when the call depth exceeds what the hardware provides.
                    (local)     (global)    ( call )    (ret)
    global regset               07 .. 09
    .
    .
    .
    .                                       |     |     ^  ^
    .                                       v     v     |  |
    regsetN-1       00 .. 06                10 .. 13    14 15
                    |^    |^                |     |     ^  ^
                    v|    v|                v     v     |  |
    fnN-1           RW    RW                RW    RW    RW RW
                                            |     |     ^  ^
                                            v     v     |  |
    regsetN         00 .. 06                10 .. 13    14 15
                    |^    |^                |     |     ^  ^
                    v|    v|                v     v     |  |
    fnN             RW    RW                RW    RW    RW RW
                                            |     |     ^  ^
                                            v     v     |  |
    regsetN+1       00 .. 06                10 .. 13    14 15
                    |^    |^                |     |     ^  ^
                    v|    v|                v     v     |  |
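
To make the intended semantics concrete, here is a minimal Python sketch of the register rules listed above. It is only a toy model: the class name `AsymmetricRegs` and its methods are hypothetical, the register sets live in an unbounded list instead of a fixed amount of hardware, and spilling to the stack is not modelled.

```python
class AsymmetricRegs:
    """Toy model of the proposed register file (all names are hypothetical)."""

    def __init__(self):
        self.globals = [0] * 3                           # r7..r9, shared by all levels
        self.sets = [self._new_set(), self._new_set()]   # regset N and regset N+1
        self.depth = 0                                   # index of the current regset

    @staticmethod
    def _new_set():
        return {"local": [0] * 7, "call": [0] * 4, "ret": [0] * 2}

    def read(self, r):
        cur, nxt = self.sets[self.depth], self.sets[self.depth + 1]
        if r <= 6:
            return cur["local"][r]        # locals: read back what this level wrote
        if r <= 9:
            return self.globals[r - 7]    # globals
        if r <= 13:
            return cur["call"][r - 10]    # arguments written by the caller
        return nxt["ret"][r - 14]         # results written by the last callee

    def write(self, r, value):
        cur, nxt = self.sets[self.depth], self.sets[self.depth + 1]
        if r <= 6:
            cur["local"][r] = value
        elif r <= 9:
            self.globals[r - 7] = value
        elif r <= 13:
            nxt["call"][r - 10] = value   # becomes an argument of the next callee
        else:
            cur["ret"][r - 14] = value    # becomes visible to the caller after return

    def call(self):
        self.depth += 1
        if self.depth + 1 == len(self.sets):
            self.sets.append(self._new_set())  # real hardware would spill here

    def ret(self):
        self.depth -= 1


regs = AsymmetricRegs()
regs.write(0, 5)       # caller's local r0
regs.write(10, 111)    # outgoing argument in r10
regs.call()
arg = regs.read(10)    # callee sees the caller's r10
regs.write(14, arg * 2)
regs.ret()
result = regs.read(14) # caller sees the callee's r14
```

In this model a write to r10-r13 or r14-r15 never changes what the same function reads back from those numbers, which is the asymmetry being asked about.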
Would a scheme like this reduce the register pressure within each function call by two or more registers?
I don't expect that this is a new idea, but I am interested in whether it has been done, and if not, why not? If it isn't a mad idea, or already done, I may implement this on an FPGA CPU.
Is it just too complex to be worth the register savings?
Are LLVM difficulties a major reason that this is not done?
P.S. I am aware that super-scalar processors are already much more complex than this, with register-renaming schemes, etc. I'm just musing about microcontroller-class architectures.
Update: It looks like the SPARC architecture did this. Why was it not thought useful by later ISA designers?
When a procedure is called, the register window shifts by sixteen registers, hiding the old input registers and old local registers and making the old output registers the new input registers.
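
For comparison, the window shift described in that quote can be sketched in Python as well. This is a simplified model under stated assumptions: the class name `WindowedRegs` and the window count `NWINDOWS = 8` are illustrative choices, the register numbering (outs r8-r15, locals r16-r23, ins r24-r31) follows the usual SPARC convention, and window overflow/underflow traps and the hard-wired zero in global register 0 are ignored.

```python
NWINDOWS = 8       # assumed number of windows; real implementations vary
WINDOW_STEP = 16   # registers the window moves by on each call


class WindowedRegs:
    """Simplified SPARC-style register windows (hypothetical model)."""

    def __init__(self):
        self.globals = [0] * 8                        # r0..r7
        self.windowed = [0] * (NWINDOWS * WINDOW_STEP)
        self.cwp = 0                                  # current window pointer

    def _index(self, r):
        # r8..r15 = outs, r16..r23 = locals, r24..r31 = ins.
        # The ins of window w share storage with the outs of window w + 1.
        return (self.cwp * WINDOW_STEP + (r - 8)) % len(self.windowed)

    def read(self, r):
        return self.globals[r] if r < 8 else self.windowed[self._index(r)]

    def write(self, r, value):
        if r < 8:
            self.globals[r] = value
        else:
            self.windowed[self._index(r)] = value

    def save(self):
        # Procedure entry: the caller's outs become the callee's ins.
        self.cwp = (self.cwp - 1) % NWINDOWS

    def restore(self):
        # Procedure exit: back to the caller's window.
        self.cwp = (self.cwp + 1) % NWINDOWS


w = WindowedRegs()
w.write(16, 5)         # caller's local l0
w.write(8, 42)         # caller's o0 carries the argument
w.save()
seen = w.read(24)      # callee reads it as i0
w.write(24, seen + 1)  # callee returns a value through i0
w.restore()
returned = w.read(8)   # caller finds it in o0
```

Unlike the scheme proposed above, the overlapping registers here are still symmetric: the callee can overwrite its incoming arguments, and the same physical registers carry both arguments and return values.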