1 vote

I have installed Hadoop 2.6.0 on my machine and started all the services.

Compared with my old version, this version does not start the JobTracker and TaskTracker daemons; instead it starts the NodeManager and ResourceManager.

Questions:

  1. I believe this version of Hadoop uses YARN for running jobs. Can't I run a MapReduce job anymore?
  2. Should I write a job that's tailored to fit the YARN ResourceManager and ApplicationMaster?
  3. Is there a sample Python job that I can submit?

1 Answer

1 vote
  1. I believe this version of Hadoop uses YARN for running jobs. Can't I run a MapReduce job anymore?

It's still fine to run MapReduce jobs. YARN is a rearchitecture of the cluster computing internals of a Hadoop cluster, but that rearchitecture maintained public API compatibility with classic Hadoop 1.x MapReduce. The Apache Hadoop documentation on Apache Hadoop NextGen MapReduce (YARN) discusses the rearchitecture in more detail. There is a relevant quote at the end of the document:

MRv2 maintains API compatibility with previous stable release (hadoop-1.x). This means that all Map-Reduce jobs should still run unchanged on top of MRv2 with just a recompile.


  2. Should I write a job that's tailored to fit the YARN ResourceManager and ApplicationMaster?

If you're already accustomed to writing MapReduce jobs or higher-level abstractions like Pig scripts and Hive queries, then you don't need to change anything you're doing as the end user. API compatibility as per above means that all of those things continue to work fine. You are welcome to write custom distributed applications that specifically target the YARN framework, but this is more advanced usage that isn't required if you just want to stick to Hadoop 1.x-style data processing jobs. The Apache Hadoop documentation contains a page on Writing YARN Applications if you're interested in exploring this.


  3. Is there a sample Python job that I can submit?

I recommend taking a look at the Apache Hadoop documentation on Hadoop Streaming. Hadoop Streaming allows you to write MapReduce jobs simply by reading from stdin and writing to stdout. This is a very general paradigm, so you can code in pretty much any language you want, including Python.
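As a concrete illustration, here is a minimal word-count sketch in the Hadoop Streaming style. The file name (`wordcount.py`) and the map/reduce command-line flag are my own choices, not anything mandated by Hadoop; the only contract Streaming imposes is that the mapper and reducer read lines from stdin and write tab-separated key/value lines to stdout, with the reducer receiving its input sorted by key.

```python
#!/usr/bin/env python
# Hypothetical word-count example in the Hadoop Streaming style.
# The mapper emits "word<TAB>1" for every token; the reducer sums the
# counts for consecutive identical keys (Streaming sorts by key between
# the map and reduce phases).
import sys

def mapper(lines):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield "%s\t%d" % (word, 1)

def reducer(lines):
    """Sum counts for consecutive identical keys; input must be key-sorted."""
    current_word, current_count = None, 0
    for line in lines:
        word, count = line.rsplit("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                yield "%s\t%d" % (current_word, current_count)
            current_word, current_count = word, int(count)
    if current_word is not None:
        yield "%s\t%d" % (current_word, current_count)

if __name__ == "__main__":
    # A command-line flag selects the role, so one file can serve as both
    # the -mapper and the -reducer script.
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    step = mapper if role == "map" else reducer
    for out in step(line.rstrip("\n") for line in sys.stdin):
        print(out)
```

You can simulate the whole pipeline locally before touching a cluster: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`. To submit it as a real job you would point `hadoop jar` at the hadoop-streaming jar shipped with your distribution (its exact path varies) and pass this script via the `-mapper` and `-reducer` options.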

In general, it sounds like you would benefit from exploring the Apache Hadoop documentation site. There is a lot of helpful information there.