I'm researching distributed file system (DFS) architectures and designs. Quite a few of the DFSs I've come across share the following architecture:

  • A namenode or metadata server used to manage the location of data blocks / chunks as well as the hierarchy of the filesystem.
  • A data node or data server used to store chunks or blocks of data belonging to one or more logical files.
  • A client that talks to a namenode to find appropriate data nodes to read/write from/to.
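
To make that flow concrete, here is a minimal sketch of the read path. All class and method names (`NameNode`, `DataNode`, `locate`, and so on) are hypothetical and are not the API of any particular DFS:

```python
BLOCK_SIZE = 4  # tiny block size so the example is easy to follow

class DataNode:
    """Stores raw blocks; knows nothing about files or the namespace."""
    def __init__(self):
        self.blocks = {}  # block_id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]

class NameNode:
    """Metadata only: file path -> ordered list of (block_id, [data node names])."""
    def __init__(self):
        self.files = {}

    def add_block(self, path, block_id, locations):
        self.files.setdefault(path, []).append((block_id, locations))

    def locate(self, path, offset):
        # Map a byte offset to the block that contains it and its locations.
        return self.files[path][offset // BLOCK_SIZE]

# Wiring it together: the client asks the namenode, then talks to a data node.
datanodes = {"dn1": DataNode(), "dn2": DataNode()}
namenode = NameNode()

datanodes["dn1"].write_block("blk_0", b"abcd")
namenode.add_block("/logs/app.log", "blk_0", ["dn1"])

block_id, locations = namenode.locate("/logs/app.log", offset=0)
print(datanodes[locations[0]].read_block(block_id))  # b'abcd'
```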

Many of these systems expose two primary tuning parameters: a block size and a replication factor.

My question is:

Are a replication factor and forward error correction such as Reed-Solomon erasure coding compatible here? Does it make sense to use both techniques to ensure high availability of data, or is it enough to use one or the other? What are the trade-offs?

1 Answer

Whether you can mix and match plain old replication and erasure codes depends on what the distributed file system in question offers in its feature set, but the two are usually mutually exclusive.

Replication is simple in the sense that the file/object is replicated as a whole to 'n' (the replication factor) data nodes. Writes go to all nodes. Reads can be served from any one of the nodes individually, since each hosts the whole file, so you can distribute different reads among multiple nodes. There is no intermediate math involved, and the work is mostly I/O bound. Also, for a given file size, disk usage is higher (since there are 'n' copies).
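
As a rough sketch of what that means (hypothetical names, not any real DFS API): a write fans out to all 'n' nodes, a read is served by any single one of them, and the stored bytes grow by a factor of 'n'.

```python
import random

def replicate_write(block, datanodes, replication_factor):
    targets = datanodes[:replication_factor]
    for node in targets:                 # write goes to all n nodes
        node.append(block)
    return targets

def replicated_read(targets):
    # Any one replica holds the whole block, so pick one at random.
    return random.choice(targets)[-1]

datanodes = [[] for _ in range(5)]       # five data nodes, modelled as lists of blocks
targets = replicate_write(b"block-0", datanodes, replication_factor=3)

print(replicated_read(targets))          # b'block-0'
print(sum(len(node) for node in datanodes))  # 3 copies stored -> 3x disk usage
```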

Erasure codes are complex in the sense that parts of the file/object are encoded and spread among the 'n' data nodes during writes. Reads need to fetch data from more than one node, decode it, and reconstruct the original data. So math is involved, and the work can become CPU bound. Compared to replication, disk usage is lower, but so is the ability to tolerate faults.
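
To illustrate the shape of that work, here is a toy single-parity erasure code (RAID-5 style) rather than full Reed-Solomon, which real systems use with several parity shards. The encode-on-write, fetch-and-decode-on-read pattern and the lower storage overhead are the same; the function names are made up for this sketch.

```python
def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(block, k):
    """Split a block into k equal data shards plus one XOR parity shard."""
    shard_len = len(block) // k
    shards = [block[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = shards[0]
    for s in shards[1:]:
        parity = xor_bytes(parity, s)
    return shards + [parity]          # k + 1 shards, one per data node

def decode(shards, missing_index, k):
    """Rebuild one lost shard from the k surviving ones (the CPU-bound part)."""
    survivors = [s for i, s in enumerate(shards) if i != missing_index]
    rebuilt = survivors[0]
    for s in survivors[1:]:
        rebuilt = xor_bytes(rebuilt, s)
    data = shards[:k]
    if missing_index < k:
        data[missing_index] = rebuilt
    return b"".join(data)

block = b"abcdefgh"                          # 8 bytes -> k = 4 data shards of 2 bytes
shards = encode(block, k=4)
print(len(b"".join(shards)) / len(block))    # 1.25x disk usage vs 3x for 3-way replication
shards[1] = None                             # lose one data node
print(decode(shards, missing_index=1, k=4))  # b'abcdefgh' reconstructed
```

This toy code survives only one lost shard; a Reed-Solomon code with 'm' parity shards survives any 'm' losses, which is how the disk-usage versus fault-tolerance trade-off is tuned in practice.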