RISC-V allows mixing 16-bit, 32-bit, 48-bit, 64-bit instructions, and beyond!
RV32I defines a 32-bit computer architecture, where registers are 32 bits wide. Its instructions are also all 32 bits wide. For example, it has lw to load a 32-bit word into a register, and add to add two registers and write the result to a third.
RV64I defines a 64-bit computer architecture, where registers are 64 bits wide (hence RV64), but its instructions are still 32 bits wide. The RV32 instructions still work, and there are some additional instructions to accommodate both 32-bit and 64-bit operations. For example, lw still loads a 32-bit word (though it now sign-extends it to fill the 64-bit register), so a new instruction, ld, is used to load a 64-bit word. add still adds two registers and writes a third, but this same add now performs 64-bit addition instead of 32-bit addition, since the registers are 64 bits wide in RV64. A new instruction, addw, does 32-bit addition (and sign-extends the 32-bit result to 64 bits), in case that was all you wanted.
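To make the lw/ld and add/addw distinction concrete, here is a small Python model of how RV64 treats those four instructions. The function names simply mirror the mnemonics and are hypothetical helpers, not an emulator: registers are modelled as unsigned 64-bit integers, memory as little-endian bytes, and alignment and exceptions are ignored.

```python
# Hypothetical helpers modelling RV64 register semantics.
# Registers are represented as unsigned 64-bit integers (0 .. 2**64 - 1).
MASK64 = (1 << 64) - 1


def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value` to a Python int."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def to_reg(value: int) -> int:
    """Wrap a Python int into a 64-bit register value (two's complement)."""
    return value & MASK64


def lw(mem: bytes, addr: int) -> int:
    """RV64 lw: load a little-endian 32-bit word and sign-extend it to 64 bits."""
    word = int.from_bytes(mem[addr:addr + 4], "little")
    return to_reg(sext(word, 32))


def ld(mem: bytes, addr: int) -> int:
    """RV64 ld: load a full little-endian 64-bit doubleword."""
    return int.from_bytes(mem[addr:addr + 8], "little")


def add(rs1: int, rs2: int) -> int:
    """RV64 add: full 64-bit addition, wrapping modulo 2**64."""
    return to_reg(rs1 + rs2)


def addw(rs1: int, rs2: int) -> int:
    """RV64 addw: keep the low 32 bits of the sum, then sign-extend to 64 bits."""
    return to_reg(sext(rs1 + rs2, 32))


mem = bytes([0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00])
print(hex(lw(mem, 0)))           # 0xffffffffffffffff (sign-extended)
print(hex(ld(mem, 0)))           # 0xffffffff
print(hex(add(0x7FFFFFFF, 1)))   # 0x80000000
print(hex(addw(0x7FFFFFFF, 1)))  # 0xffffffff80000000
```

The same bit pattern in memory gives different register values for lw and ld, and the same inputs give different results for add and addw once the 32-bit result overflows into bit 31.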
RVC is an extension that can be added to either RV32I or RV64I. When present, it allows for 16-bit instructions, and its design is such that each 16-bit instruction expands 1:1 into a single 32-bit-wide instruction. Because of this, there are no changes to the register architecture (of either the RV32 or RV64 base that RVC was added to), and in some sense there's nothing new the compressed instructions can do that isn't already in the 32-bit-wide instruction set. We should think of RVC more as a space-saving technique than as a source of new capabilities.
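As an illustration of the 1:1 expansion, here is a sketch that expands one compressed instruction, C.ADDI, into the equivalent 32-bit ADDI. The function name is hypothetical, and it only handles this one instruction; it does not special-case C.NOP or the HINT encodings that share the layout (they also expand to an ADDI).

```python
def expand_c_addi(parcel: int) -> int:
    """Expand a 16-bit C.ADDI into the equivalent 32-bit ADDI rd, rd, imm.

    C.ADDI layout (quadrant 01, funct3 000):
      bit 12     -> imm[5]
      bits 11:7  -> rd (also the source register)
      bits 6:2   -> imm[4:0]
    """
    if (parcel & 0b11) != 0b01 or ((parcel >> 13) & 0b111) != 0b000:
        raise ValueError("not a C.ADDI encoding")
    rd = (parcel >> 7) & 0x1F
    imm6 = (((parcel >> 12) & 1) << 5) | ((parcel >> 2) & 0x1F)
    imm = imm6 - 64 if imm6 & 0x20 else imm6  # sign-extend the 6-bit immediate
    # ADDI is I-type: imm[11:0] | rs1 | funct3=000 | rd | opcode=0010011
    return ((imm & 0xFFF) << 20) | (rd << 15) | (0b000 << 12) | (rd << 7) | 0b0010011


print(hex(expand_c_addi(0x0505)))  # c.addi a0, 1  -> 0x150513   (addi a0, a0, 1)
print(hex(expand_c_addi(0x157D)))  # c.addi a0, -1 -> 0xfff50513 (addi a0, a0, -1)
```

The compressed form just packs a smaller immediate and reuses rd as the source register; nothing about the register file or the operation changes.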
The base architecture (that is, without RVC) already encodes branch and jump offsets in multiples of 2 bytes, i.e. on 16-bit boundaries. The PC, return addresses, and all branching instructions can represent any even byte address, so when RVC is added to something, the other instructions don't change. This 16-bit granularity also accommodates 48-bit and 64-bit instructions, though no extensions have been defined for those sizes as yet.
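The 16-bit granularity is visible in the encoding itself: a conditional branch (B-type) stores imm[12:1] and never stores imm[0], so every encodable offset is even. A minimal decoder, with a hypothetical function name, looks like this:

```python
def branch_offset(insn: int) -> int:
    """Decode the signed byte offset of a 32-bit B-type (conditional branch) instruction."""
    imm = (
        (((insn >> 31) & 0x1) << 12)    # imm[12]
        | (((insn >> 7) & 0x1) << 11)   # imm[11]
        | (((insn >> 25) & 0x3F) << 5)  # imm[10:5]
        | (((insn >> 8) & 0xF) << 1)    # imm[4:1]; imm[0] is not stored and is always 0
    )
    return imm - (1 << 13) if imm & (1 << 12) else imm  # sign-extend the 13-bit offset


print(branch_offset(0x00000463))  # beq x0, x0, +8 -> 8
print(branch_offset(0xFE000FE3))  # beq x0, x0, -2 -> -2
```

Note that without RVC, a taken branch whose target is not 4-byte aligned raises an instruction-address-misaligned exception; the 2-byte offsets only become usable targets once RVC is present, but the encoding is the same either way.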
However, the instruction set reserves enough opcode space to make it possible to tell 16-bit, 32-bit, 48-bit, and 64-bit instructions apart by looking only at the lowest bits of an instruction's first 16-bit parcel. (In the patterns below, bits are written most-significant first, so the rightmost digit is bit 0.) Instructions whose lowest two bits are 11 are 32-bit instructions, except that one pattern is reserved: their lowest five bits cannot be 11111. The compressed instructions use 00, 01, and 10 in those same two bits. 48-bit instructions have 011111 as their lowest six bits, and 64-bit instructions have 0111111 as their lowest seven bits.
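Those rules are enough for a fetch unit (or a disassembler) to find instruction boundaries without decoding anything else. A sketch of the length check, assuming the first 16-bit parcel (the one at the lowest address, since RISC-V is little-endian) has already been read; the function name is hypothetical:

```python
def instruction_length(parcel: int) -> int:
    """Return the length in bytes of a RISC-V instruction, given its first 16-bit parcel.

    Only the low bits are examined (bit 0 is the least significant bit):
      bits[1:0] != 11                    -> 16-bit (compressed)
      bits[1:0] == 11, bits[4:2] != 111  -> 32-bit
      bits[5:0] == 011111                -> 48-bit
      bits[6:0] == 0111111               -> 64-bit
    """
    if (parcel & 0b11) != 0b11:
        return 2
    if (parcel & 0b11100) != 0b11100:
        return 4
    if (parcel & 0b111111) == 0b011111:
        return 6
    if (parcel & 0b1111111) == 0b0111111:
        return 8
    raise ValueError("longer or reserved instruction-length encoding")


print(instruction_length(0x0001))  # c.nop                   -> 2
print(instruction_length(0x0013))  # low parcel of addi/nop  -> 4
print(instruction_length(0x001F))  # 48-bit length prefix    -> 6
print(instruction_length(0x003F))  # 64-bit length prefix    -> 8
```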
The base architecture also uses pc-relative addressing for direct branches and jumps, so you can build an executable image with a code section of up to about 2GB (and when loaded, it could be located anywhere in the 64-bit address space).
It seems both RV32I and RV64I use a 32-bit instruction size, and the difference relates to the size of sign extension.
Between RV32 and RV64, the registers expand from 32 bits to 64 bits, so yes, when sign extension happens on RV64, it extends out to 64 bits.
I think a larger instruction size allows you to encode larger immediate values inside the instruction, which should be better than a smaller instruction size, since it is very easy to run out of space.
The RISC-V instruction set was designed after years of research with MIPS (an earlier RISC design). Unlike x86, whose variable-length instructions have absorbed 40+ years of evolution, MIPS did not leave enough opcode space for that kind of growth. A fixed-size instruction set is a trade-off between code space and capabilities: the larger the instruction size, the more can be encoded, at the expense of code density. Code density has a large effect on performance (smaller code makes better use of instruction caches and memory bandwidth), so it cannot be ignored. So RISC-V allows for variable-sized instructions, and if you like, you can create 256-bit instructions in your own implementation!
For RISC-V RV64I, if it only uses a 32-bit instruction length with a 64-bit register file and 64-bit memory addresses, how can it make full use of the hardware resources? (e.g. jumping directly to a distant memory address.)
The code for an executable program image can be up to about 2GB in size and still use pc-relative branching throughout. It would use what we refer to as far branches, where the branch sequence is composed of two instructions (auipc and jalr): auipc adds a sign-extended 32-bit offset to the PC, and jalr adds a further 12-bit offset, so the pair reaches roughly ±2GB from the current instruction. To be clear, 2GB is a very large code segment. Most of the value of a 64-bit architecture is being able to work with over 4GB of data, not with that much code. To reach code beyond the range of pc-relative sequences, you would use pointers (e.g. stored in tables), since pointers can be a full 64 bits wide. This technique is already used for DLLs (even though they generally won't come close to that size when their code is added together), since they are usually loaded independently: pc-relative branches work inside a single code section, but not between code sections loaded at unrelated addresses.
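Here is a sketch of the arithmetic behind an auipc + jalr far branch: splitting a PC-relative offset into the two immediates, then recomputing the target the way the hardware would. The function names are hypothetical; the +0x800 adjustment is needed because jalr sign-extends its 12-bit immediate, so the upper part must be rounded up whenever the lower part comes out negative. The register choice in the comment (t1, as in the tail pseudo-instruction) is just for illustration.

```python
MASK64 = (1 << 64) - 1


def sext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` bits of `value`."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def split_far_offset(offset: int) -> tuple[int, int]:
    """Split a PC-relative byte offset into (auipc imm20, jalr imm12)."""
    if not (-(1 << 31) - 0x800 <= offset < (1 << 31) - 0x800):
        raise ValueError("offset is out of reach of auipc+jalr")
    hi = (offset + 0x800) >> 12   # signed upper part, rounded for a negative lo12
    lo12 = offset - (hi << 12)    # always in [-2048, 2047]
    return hi & 0xFFFFF, lo12


def far_target(pc: int, hi20: int, lo12: int) -> int:
    """Model 'auipc t1, hi20' followed by 'jalr x0, lo12(t1)' on RV64."""
    t1 = (pc + sext(hi20 << 12, 32)) & MASK64  # auipc: PC + sign-extended (hi20 << 12)
    return (t1 + lo12) & MASK64 & ~1           # jalr: add lo12, clear bit 0


hi20, lo12 = split_far_offset(0x12345FFE)
print(hex(hi20), lo12)                             # 0x12346 -2
print(hex(far_target(0x40000000, hi20, lo12)))     # 0x52345ffe
```

The range check shows where the "about 2GB" figure comes from: the reachable offsets are a signed 32-bit window (shifted by 2KB), so anything farther away needs a full 64-bit pointer, e.g. loaded from a table.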
And in general, does the name RV64I indicate that the instruction length is 64 bits?
Since, whatever architecture we have (e.g. 16-bit, 32-bit, 64-bit), we tend to run out of space for data before we run out of space for code, the dominant feature of a 64-bit architecture is its support for a 64-bit address space, which allows large amounts of memory for data. Supporting a large address space also comes with the ability to use 64-bit addresses and, of course, to manipulate 64-bit values. So what's important about RV64 is the 64-bit registers and the ability to use 64-bit values to address memory. (The instruction size is an orthogonal issue.)