
I am trying to find the right answers to these two questions, both dealing with loading data into Hadoop, which I found on the web while studying materials and case studies for Hadoop development.

The first question is:

You have user profile records in your OLTP database that you want to join with web logs you have already ingested into the Hadoop file system. How will you obtain these user records?

A. HDFS command

B. Pig LOAD command

C. Sqoop import

D. Hive LOAD DATA command

E. Ingest with Flume agents

F. Ingest with Hadoop Streaming

I found different resources indicating different answers as the correct one (B or C).

The second question is:

You want to understand more about how users browse your public website, such as which pages they visit prior to placing an order. You have a farm of 200 web servers hosting your website. How will you gather this data for your analysis?

A. Ingest the server web logs into HDFS using Flume.

B. Write a MapReduce job, with the web servers acting as mappers and the Hadoop cluster nodes as reducers.

C. Import all users’ clicks from your OLTP databases into Hadoop, using Sqoop.

D. Channel these clickstreams into Hadoop using Hadoop Streaming.

E. Sample the weblogs from the web servers, copying them into Hadoop using curl.

Here the suggested answers were either A or B, depending on the source...

EDIT:

This is an explanation for answer B that I found, which made me think twice about the answer:

Hadoop MapReduce for Parsing Weblogs

Here are the steps for parsing a log file using Hadoop MapReduce:

Load log files into the HDFS location using this Hadoop command: hadoop fs -put

The opencsv 2.3 library (opencsv-2.3.jar) is used for parsing the log records.

Below is the Mapper program for parsing the log file from the HDFS location.

// Imports needed by the enclosing job class (opencsv 2.3 lives in au.com.bytecode.opencsv):
import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import au.com.bytecode.opencsv.CSVParser;

public static class ParseMapper
      extends Mapper<Object, Text, NullWritable, Text> {

   private final Text word = new Text();

   @Override
   public void map(Object key, Text value, Context context)
         throws IOException, InterruptedException {

      // Split the space-delimited, quote-escaped log line into its fields.
      CSVParser parser = new CSVParser(' ', '"');
      String[] fields = parser.parseLine(value.toString());

      // Re-join the fields into a single comma-separated record.
      StringBuilder rec = new StringBuilder();
      for (int i = 0; i < fields.length; i++) {
         rec.append(fields[i]);
         if (i != fields.length - 1) {
            rec.append(",");
         }
      }

      word.set(rec.toString());
      context.write(NullWritable.get(), word);
   }
}

/EDIT
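For context, a minimal driver along these lines would run the quoted mapper as a map-only job. The class name and the input/output paths are my assumptions and are not part of the quoted explanation:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class that would contain the ParseMapper shown above.
public class WeblogParseJob {

   public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      Job job = Job.getInstance(conf, "weblog parse");
      job.setJarByClass(WeblogParseJob.class);

      job.setMapperClass(ParseMapper.class);
      job.setNumReduceTasks(0);   // map-only job: mapper output goes straight to HDFS

      job.setOutputKeyClass(NullWritable.class);
      job.setOutputValueClass(Text.class);

      FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. the directory loaded with hadoop fs -put
      FileOutputFormat.setOutputPath(job, new Path(args[1]));  // e.g. an output directory for the parsed records

      System.exit(job.waitForCompletion(true) ? 0 : 1);
   }
}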

I will be thrilled to get definitive answers to these questions. G'day, D.

I too have addressed some of the fake answers given by some of the sources. Don't depend on the answers they provide. unmeshasreeveni.blogspot.in/2014/09/… – USB

2 Answers

2 votes

Question 1:

Answer: C

Explanation: You need to join the user profile records with the web logs. The web logs are already in HDFS, so in order to join them you need to bring the user profiles into HDFS as well. The user profiles reside in an OLTP database, and importing a database table into HDFS is exactly what Sqoop is for.
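As a rough illustration, a minimal Sqoop import along these lines would bring the user profile table into HDFS; the JDBC URL, credentials, table name, and target directory below are hypothetical, not taken from the question:

# Hypothetical connection details, for illustration only
sqoop import \
  --connect jdbc:mysql://dbhost/crm \
  --username dbuser -P \
  --table user_profiles \
  --target-dir /data/user_profiles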


Question 2:

Answer: A

Explanation: You need to gather the web logs from the farm of 200 web servers hosting your website. Streaming log data from many servers into HDFS is exactly what Flume is designed for.
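For illustration only, each web server could run a Flume agent configured roughly like the properties sketch below; the component names, log path, and HDFS path are assumptions, not part of the question:

# Sketch of a per-web-server Flume agent (names and paths are hypothetical)
agent.sources  = weblog-source
agent.channels = mem-channel
agent.sinks    = hdfs-sink

# Tail the web server access log (an exec source is the simplest illustration)
agent.sources.weblog-source.type = exec
agent.sources.weblog-source.command = tail -F /var/log/httpd/access_log
agent.sources.weblog-source.channels = mem-channel

agent.channels.mem-channel.type = memory
agent.channels.mem-channel.capacity = 10000

# Write events into HDFS, partitioned by day
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/weblogs/%Y-%m-%d
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true
agent.sinks.hdfs-sink.channel = mem-channel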

1 vote

1 is C. Sqoop is the way to go for ingesting data from a database into HDFS; it is built and optimized for this purpose. 2 is A. Similarly, collecting high-volume web logs in real time is a common Flume use case.