2 votes

Is Spark using MapReduce internally? (its own map-reduce)

The first time I heard somebody tell me "Spark uses map-reduce", I was so confused; I had always learned that Spark was the great adversary of Hadoop MapReduce.

After checking on Google, I only found one website with a very short explanation of this: https://dzone.com/articles/how-does-spark-use-mapreduce

But the rest of the Internet is about Spark vs. MapReduce.

Then somebody explained to me that when Spark creates an RDD, the data is split into different partitions, and that even if you run, for example in Spark SQL, a query that should not be a map-reduce, like:

select student 
from Table_students 
where name = "Enrique"

then internally Spark is still doing a map-reduce to retrieve the data (from the different partitions).
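For example (a minimal sketch I put together; the table and data are invented for illustration), you can print the physical plan Spark builds for such a query with explain():

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("inspect-plan")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Invented sample data, registered under the table name from the query
Seq(("Enrique", 101), ("Maria", 102))
  .toDF("name", "student")
  .createOrReplaceTempView("Table_students")

val df = spark.sql("select student from Table_students where name = 'Enrique'")

// Prints the physical plan: for this query it is just scan + filter + project,
// with no shuffle (no "reduce"-style stage)
df.explain()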

Is that true?

And if I'm using Spark MLlib for machine learning: I always heard that machine learning is not compatible with map-reduce because it needs many iterations and map-reduce uses batch processing.

In Spark MLlib, is Spark internally using map-reduce too?
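For context, here is the kind of iterative MLlib job I mean (a minimal sketch with invented toy data); each optimizer iteration runs over the same cached training set:

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("mllib-iterations")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Invented toy training set, cached so every iteration reuses it in memory
val training = Seq(
  (1.0, Vectors.dense(0.0, 1.1)),
  (0.0, Vectors.dense(2.0, 1.0)),
  (1.0, Vectors.dense(0.5, 2.2))
).toDF("label", "features").cache()

// The optimizer runs up to 10 iterations, each one a new set of stages
// over the same in-memory data (no per-iteration disk round trip)
val model = new LogisticRegression().setMaxIter(10).fit(training)
println(model.coefficients)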

howie: I think you are confusing Hadoop's MapReduce with the MapReduce algorithm. Spark MLlib still uses map-reduce. So the question may change to "Why is Spark faster or better for machine learning?" (stackoverflow.com/questions/32572529/…)
Enrique Benito Casado: Hi @howie, well... "Why is Spark better for machine learning" is not my question. The question I really have is: "Is it true that Spark uses map-reduce internally?" Like you say, "Spark MLlib still uses map-reduce"; well, how? Where can I find more info about that? Thanks for answering.
howie: Let me clarify your question. Do you know the difference between Spark's MapReduce and Hadoop's MapReduce? If not, this video has a good answer: youtube.com/watch?v=iaw5kG9q6xw. Or is your question how Spark MLlib uses map-reduce for a machine learning program?
howie: Sorry, I would like to make a correction to my answer. Actually Spark uses a DAG (Directed Acyclic Graph), not traditional MapReduce. You can think of it as an alternative to MapReduce: while MR has just two steps (map and reduce), a DAG can have multiple levels that can form a tree structure. So you can write a map-reduce-like program in Spark, but internally Spark runs on a DAG.

1 Answer

5 votes

Spark features an advanced Directed Acyclic Graph (DAG) engine supporting cyclic data flow. Each Spark job creates a DAG of task stages to be performed on the cluster. Compared to MapReduce, which creates a DAG with two predefined stages (Map and Reduce), the DAGs created by Spark can contain any number of stages. The DAG is a strict generalization of the MapReduce model. This allows some jobs to complete faster than they would in MapReduce: simple jobs complete after just one stage, and more complex jobs complete in a single run of many stages, rather than having to be split into multiple jobs.

So you can write map-reduce-style programs in Spark, but internally Spark actually executes them as a DAG.
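To see this concretely, here is a minimal sketch (invented data, not from the original answer) of a map-reduce-style word count in Spark; toDebugString prints the lineage that Spark schedules as a DAG of stages:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("wordcount-dag")
  .master("local[*]")
  .getOrCreate()
val sc = spark.sparkContext

val counts = sc.parallelize(Seq("spark builds a dag", "a dag of stages"))
  .flatMap(_.split(" "))   // the "map" side
  .map(word => (word, 1))
  .reduceByKey(_ + _)      // the "reduce" side; the shuffle here marks a stage boundary

// The lineage below is what Spark schedules as a DAG of two stages
println(counts.toDebugString)
counts.collect().foreach(println)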
