
I am working on a matrix-summation design. The compiler takes 4+ hours and generates 1+ million lines of code, every one of which is an "assign ..." statement. I don't know whether this is an inefficiency in the compiler or whether my coding style is bad. If someone could suggest alternatives, that would be great!

Here is a description of the code. The input is ANDed element by element with a random matrix, and each row is summed using .reduce, so the result should be a Vec of 140 6-bit values. Concatenating them gives me an 840-bit output.

(rndvec is supposed to be a 140x840x6-bit random matrix. Since I don't know how to generate random values, I started with a fixed 840x6-bit vector to represent one row and feed it the input over and over again.)

The following is my code:

import Chisel._
import scala.collection.mutable.HashMap
import util.Random

class LBio(n: Int) extends Bundle {
  var myinput  = UInt(INPUT, 840)
  var myoutput = UInt(OUTPUT, 840)
}

class Lbi(q: Int, n: Int, m: Int) extends Module {
  def mask(orig: Vec[UInt], maska: UInt, mi: Int) = {
    val result = Vec.fill(840){ UInt(width = 6) }
    for (i <- 0 until 840) {
      result(i) := orig(i) & Fill(6, maska(i))  // AND each random element with input bit i, replicated to 6 bits
    }
    result
  }

  val io = new LBio(840)

  val rndvec = Vec.fill(840){ UInt("h13", 6) }  // random vector; for now it's just a replication of 0x13
  val resultvec = Vec.fill(140){ UInt(width = 6) }

  for (i <- 0 until 140) {
    resultvec(i) := mask(rndvec, io.myinput, m).reduce(_+_)  // sum the row's 6-bit elements with reduce
  }

  io.myoutput := resultvec.toBits
}

The terminal report:

started inference
finished inference (4)
start width checking
finished width checking
started flattenning
finished flattening (941783)
resolving nodes to the components
finished resolving
started transforms
finished transforms
checking for combinational loops
NO COMBINATIONAL LOOP FOUND
COMPILING class TutorialExamples.Lbi 0 CHILDREN (0,0)
[success] Total time: 33453 s, completed Oct 16, 2013 10:32:10 PM

1 Answer


There's nothing obviously wrong with your Chisel code, but I should point out that if rndvec is 140x840x6 bits, that's 705,600 bits (~689 Kibit, or about 86 KiB) of state! And each reduce operation works on 5,040 bits (~5 Kibit) of it.
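To make the size arithmetic concrete, here is a small plain-Scala sketch (no Chisel required). The object and value names are illustrative and do not come from the question; the dimensions are the ones stated there.

```scala
// Plain Scala, no Chisel dependency. Names are illustrative only.
object RndvecSize {
  val rows     = 140 // number of 6-bit results
  val cols     = 840 // number of input bits
  val elemBits = 6   // width of each random element

  val rowBits: Int   = cols * elemBits // bits consumed by one reduce
  val totalBits: Int = rows * rowBits  // bits in the full random matrix

  def main(args: Array[String]): Unit = {
    println(s"one row:     $rowBits bits (${rowBits / 8} bytes)")
    println(s"full matrix: $totalBits bits (${totalBits / 8} bytes)")
  }
}
```

Since the design is purely combinational, every one of those bits ends up as wiring and logic in the generated Verilog, which is why the totals matter for compile time.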

Chisel uses "assign" statements because your code is entirely combinational and Chisel produces a very structural form of Verilog.

I suspect the part that is killing the compile time (aside from the huge amount of state) is that you are generating and manipulating 140 Vecs with the mask() function.

I tried my hand at rewriting your code and got it down from 941,783 nodes to 202,723 (takes about 10-15 minutes to compile, but generates 11MB of Verilog code). I'm pretty sure this does what your code was doing:

class Hello(q: Int, dim_n:Int) extends Module
{
    val io = new LBio(dim_n)

    val rndvec = Vec.fill(dim_n){UInt("h13",6)}
    val resultvec = Vec.fill(dim_n/6){UInt(width=6)}

    // lift this work outside of the for loop
    val padded_input = Vec.fill(dim_n){UInt(width=6)}
    for (i <- 0 until dim_n)
    {
       padded_input(i) := Fill(6,io.myinput(i)) // replicate bit i of the input across 6 bits
    }  

    for (i <- 0 until dim_n/6)
    {
       val result = Bits(width=dim_n*6)
       result := rndvec.toBits & padded_input.toBits

       var sum = UInt(0) //advanced Chisel - be careful with the use of var!
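       for (j <- 0 until dim_n*6 by 6) // visit every 6-bit element of the flattened row
       {
          sum = sum + result(j+5,j) // bit extraction is inclusive, so (j+5,j) is 6 bits
       }  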
       resultvec(i) := sum
    }  

    io.myoutput := resultvec.toBits
}  
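One way to gain confidence that a rewrite like this still computes the same values is to compare simulation results against a small software model. The following is a minimal plain-Scala sketch of such a model, not Chisel code. All names in it are hypothetical, and it assumes the hardware sum wraps around at 6 bits because every operand is 6 bits wide. It also shows one possible way to produce a reproducible random matrix with scala.util.Random.

```scala
import scala.util.Random

// Plain-Scala software model (no Chisel). All names are illustrative.
object LbiModel {
  val ElemBits = 6
  val ElemMask = (1 << ElemBits) - 1 // 0x3f

  // One output element: add up the row entries whose input bit is set.
  // Assumption: the sum wraps to 6 bits, like a 6-bit hardware adder.
  def rowSum(row: Seq[Int], input: Seq[Boolean]): Int = {
    require(row.length == input.length, "row and input must have the same length")
    row.zip(input).foldLeft(0) { case (acc, (elem, bit)) =>
      if (bit) (acc + (elem & ElemMask)) & ElemMask else acc
    }
  }

  // One 6-bit sum per matrix row.
  def outputs(matrix: Seq[Seq[Int]], input: Seq[Boolean]): Seq[Int] =
    matrix.map(row => rowSum(row, input))

  // Seeded random matrix of 6-bit values, so that runs are reproducible.
  def randomMatrix(rows: Int, cols: Int, seed: Long): Seq[Seq[Int]] = {
    val rng = new Random(seed)
    Seq.fill(rows, cols)(rng.nextInt(1 << ElemBits))
  }

  def main(args: Array[String]): Unit = {
    val fixed   = Seq.fill(140, 840)(0x13) // the placeholder matrix from the question
    val allOnes = Seq.fill(840)(true)
    println(outputs(fixed, allOnes).take(3))
  }
}
```

The values from randomMatrix could then be used as the constants when building rndvec, so that the hardware and the model share the same matrix.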

What I did was avoid doing the same work over and over again, like padding out myinput inside the mask() function on every iteration of the for loop. I also kept everything in Bits() instead of Vecs. Sadly, it means I lose the awesome .reduce() function.
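Working on a flattened bit vector does not necessarily rule out reduce: the 6-bit chunks can first be collected into an ordinary Scala sequence. The sketch below illustrates the idea in plain Scala, using a BigInt as a stand-in for the flattened hardware bits. The names are hypothetical, and whether the same pattern helps compile time in Chisel is untested here.

```scala
// Plain-Scala sketch (no Chisel). BigInt stands in for the flattened bits.
object ChunkSum {
  val ElemBits = 6
  val ElemMask = (1 << ElemBits) - 1 // 0x3f

  // Extract the 6-bit chunk occupying bits lo+5 down to lo.
  def chunk(flat: BigInt, lo: Int): Int =
    ((flat >> lo) & BigInt(ElemMask)).toInt

  // Loop form, mirroring the `var sum` accumulation.
  def sumLoop(flat: BigInt, totalBits: Int): Int = {
    var sum = 0
    for (lo <- 0 until totalBits by ElemBits) {
      sum = (sum + chunk(flat, lo)) & ElemMask // wrap to 6 bits
    }
    sum
  }

  // Same computation: slice into a sequence first, then reduce.
  def sumReduce(flat: BigInt, totalBits: Int): Int =
    (0 until totalBits by ElemBits)
      .map(lo => chunk(flat, lo))
      .reduce((a, b) => (a + b) & ElemMask)
}
```

Note that each chunk spans exactly six bits (lo through lo+5) and the loop walks the whole flattened width, not just the element count.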

I think maybe the answer is "be cognizant of how much state you're creating" and "Vecs are awesome, but use carefully".

Do you have a Verilog version that's short and concise? It'd be interesting to see if there are areas where Chisel is losing out efficiency-wise.