
I am currently trying to implement matrix multiplication in ARM assembler. I have already read some tutorials about matrix calculations in assembler, and they all use NEON instructions on a 3x3 or 4x4 float matrix. That is quite different from what I want to do. I have two square int matrices of the same size (the number of rows equals the number of columns), and my assembler function gets the size of the matrices as a parameter, so I can't write a matMul function just for 3x3 or 4x4 matrices like the ones in the tutorials.

So my question is: what is the best and easiest way to do that? Is it even possible to use NEON instructions with a 100x100 or even bigger matrix? Also, the single- and double-precision registers are unnecessary, because I only have int numbers.

Another problem is that I'm almost completely new to ARM assembler, so I don't fully understand the NEON instructions.
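For reference, the matrix size does not have to be fixed for SIMD; it only changes the loop bounds. A minimal sketch, assuming row-major int32_t matrices of size n x n and a result buffer allocated by the caller: reordering the loops as i-k-j lets the innermost loop update four consecutive results of one row with the same scalar, C[i][j..j+3] += A[i][k] * B[k][j..j+3]. On ARM, one such step corresponds to the NEON intrinsics vld1q_s32, vmlaq_n_s32 and vst1q_s32 from arm_neon.h, which operate on 32-bit integer lanes rather than floats. The portable C below only models the four lanes with an ordinary loop so it runs anywhere; the function name matmul_strip4 is hypothetical, and a scalar tail covers sizes that are not a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* c = a * b for n x n row-major int32 matrices; c is allocated by the caller.
 * Loop order i-k-j: the inner loop works on 4 consecutive columns of row i,
 * modeling one 4-lane int32 multiply-accumulate per step. */
void matmul_strip4(const int32_t *a, const int32_t *b, int32_t *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int32_t *crow = &c[i * n];
        for (size_t j = 0; j < n; j++)
            crow[j] = 0;                          /* clear row i of the result */

        for (size_t k = 0; k < n; k++) {
            const int32_t aik = a[i * n + k];     /* scalar shared by all lanes */
            const int32_t *brow = &b[k * n];
            size_t j = 0;
            for (; j + 4 <= n; j += 4) {
                /* one "vector" step: 4 lanes updated with the same scalar */
                for (size_t lane = 0; lane < 4; lane++)
                    crow[j + lane] += aik * brow[j + lane];
            }
            for (; j < n; j++)                    /* scalar tail when n % 4 != 0 */
                crow[j] += aik * brow[j];
        }
    }
}
```

Because the i-k-j order reads B and C row by row, the inner loop walks memory contiguously, which is what makes it map cleanly onto 4-lane loads and stores.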

You should take a look at some of the tutorials linked in this blog post: cv4mar.blogspot.com/2011/06/arm-neon-basic-tutorials.html. They should help with the NEON instructions. I would take some time to understand NEON more thoroughly; that should help you find your answer. – Timothy Randall
You'd also multiply large matrices by recursively working on smaller (e.g. 4x4) blocks. I don't know a good link to an explanation of an implementation off-hand, but studying some of the general algorithms would be a start. – Notlikethat
OK, that already sounds too complicated for a start, so I will begin with the naive implementation using three loops. Is there a way to allocate memory in assembler, or should I allocate it in the C code and pass the assembler function three parameters: two pointers to the matrices to multiply and one pointer to the result matrix? – user3262883
You can use NEON, but SIMD instructions are generally optimized for 3x3 or 4x4 matrices, i.e. the 'multiple data' part handles 3 or 4 values at a time (nice for graphics, etc.). To use SIMD with larger matrices, you need to subdivide them into smaller 4x4 operations/sub-matrices. – artless noise
Thanks for your comments, but I can't really picture how dividing the matrix into smaller ones would work. Does anyone have a description of, or some information about, that method? – user3262883
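One way to picture the subdivision: split each n x n matrix into 4x4 tiles, so that each tile of the result is C_IJ = sum over K of A_IK * B_KJ, where every term is an ordinary 4x4 matrix product, i.e. the fixed-size kernel the tutorials show. Only the outer loops over tile indices are new, and tiles on the right and bottom edges are smaller when n is not a multiple of 4. The sketch below is plain C under those assumptions (row-major int32_t data, result allocated by the caller and passed as a third pointer, hypothetical name matmul_blocked); in a tuned version the three innermost loops would be replaced by a 4x4 NEON kernel for full tiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TILE 4  /* tile edge; 4 x int32 fills one 128-bit NEON Q register */

static size_t min_size(size_t x, size_t y) { return x < y ? x : y; }

/* c = a * b for n x n row-major int32 matrices, computed tile by tile.
 * Each (ii, kk, jj) step multiplies a TILE x TILE block of a by a
 * TILE x TILE block of b and accumulates into a block of c.
 * Edge tiles are smaller when n is not a multiple of TILE. */
void matmul_blocked(const int32_t *a, const int32_t *b, int32_t *c, size_t n)
{
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0;

    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t i_end = min_size(ii + TILE, n);
                size_t k_end = min_size(kk + TILE, n);
                size_t j_end = min_size(jj + TILE, n);
                /* small block product: C[ii.., jj..] += A[ii.., kk..] * B[kk.., jj..] */
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++)
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += a[i * n + k] * b[k * n + j];
            }
}
```

Each block product only touches 4 rows of A, B and C at a time, which is why the same fixed-size kernel can be reused for any n.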

1 Answer


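; 16-bit x86 (8086, DOS) naive N x N matrix multiply:
; M_RESULT (words) = M_A (bytes) * M_B (bytes), all row-major.
; Assumes N is an EQU constant, LINE, P and X are byte variables,
; TEMP is a word variable, COLUMN and COUNTER are counters, all
; declared in the data segment, and DS already points to it.
MAIN PROC NEAR

  MOV LINE,N           ; rows left to process
  MOV P,0              ; current column of M_B
  MOV DI,OFFSET M_RESULT
  MOV SI,OFFSET M_A
  MOV BX,OFFSET M_B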

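BACK1:                 ; outer loop: one row of the result
  MOV CH,N
  SUB CH,LINE          ; CH = current row = N - LINE
  MOV COLUMN,N

BACK2:                 ; middle loop: one column of the result

  MOV AL,CH
  MOV X,N
  MUL X                ; AX = row * N
  ADD SI,AX            ; SI -> first element of the row in M_A
  MOV AH,0
  MOV AL,P
  ADD BX,AX            ; BX -> first element of column P in M_B
  MOV COUNTER,N
  MOV TEMP,0           ; dot-product accumulator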

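BACK3:                 ; inner loop: TEMP += M_A[row][k] * M_B[k][P]
  MOV AL,[SI]
  MUL BYTE PTR [BX]    ; AX = AL * byte at [BX] (operand size must be explicit)
  ADD TEMP,AX
  INC SI               ; next column of M_A
  ADD BX,N             ; next row of M_B
  DEC COUNTER
  JNZ BACK3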


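  MOV AX,TEMP
  MOV [DI],AX          ; store one 16-bit result element
  ADD DI,2
  MOV SI,OFFSET M_A    ; reset the source pointers
  MOV BX,OFFSET M_B
  INC P                ; next column

  DEC COLUMN
  JNZ BACK2

  MOV P,0              ; back to column 0 for the next row

  DEC LINE
  JNZ BACK1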

  MOV AX,4C00H         ; end of
  INT 21H              ; processing (return to DOS)

MAIN ENDP              ; end of procedure
CODESG ENDS            ; end of segment
END START              ; end of program
; By: Mojtaba Alizadeh
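The code above is 16-bit x86 (8086, DOS) rather than ARM, but it implements the same naive three-loop multiplication. The C sketch below mirrors what it computes and can serve as a reference when checking an ARM version: byte elements, 16-bit sums that wrap on overflow like ADD TEMP,AX, row-major storage, and a result buffer allocated by the caller and passed as a third pointer. The name matmul_u8 and its parameter order are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the 8086 loops: BACK1 = row, BACK2 = column, BACK3 = dot product.
 * a and b are n x n row-major byte matrices; result receives n*n 16-bit sums. */
void matmul_u8(const uint8_t *a, const uint8_t *b, uint16_t *result, size_t n)
{
    assert(n <= 255);                           /* the assembly keeps N in byte registers */
    for (size_t row = 0; row < n; row++) {      /* LINE counts rows down */
        for (size_t col = 0; col < n; col++) {  /* COLUMN counts columns, P = col */
            uint16_t temp = 0;                  /* TEMP */
            for (size_t k = 0; k < n; k++)      /* COUNTER; SI += 1, BX += N */
                temp = (uint16_t)(temp + a[row * n + k] * b[k * n + col]);
            *result++ = temp;                   /* MOV [DI],AX ; ADD DI,2 */
        }
    }
}
```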