
I have an HBase table with a unique rowkey, one column family, and one column. I also have a TSV file with 300+ columns, in which the rowkey is the combination of two columns. I need to compare the rowkeys from the table and the file; when they match, I need to append the table's column value as the last column of the corresponding row in the TSV file. I have written the following code, but it always executes the else branch.

package mapReduce;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class Tsv_read{

    private static Configuration conf = null;

    static {
        conf = HBaseConfiguration.create();
    }

    @SuppressWarnings("resource")
    public static void main(String[] arg) throws Exception {

        BufferedReader TSVFile = 
                new BufferedReader(new FileReader("Path/to/file/.tsv"));

        String dataRow = TSVFile.readLine();
        List<String> list = new ArrayList<String>();


        while (dataRow != null){
            list.clear();
            String[] dataArray = dataRow.split("\t");




            for (String item:dataArray) { 

                HTable table = new HTable(conf, "Table name"); //Hbase table name
                Scan s = new Scan();
                ResultScanner ss = table.getScanner(s);
                for(Result r:ss){
                    for(KeyValue kv : r.raw()){
                        //System.out.println("Rowkey :" +dataArray[12]+"-"+dataArray[13]);
                        //System.out.print(new String(kv.getRow()) + " ");
                        if((dataArray[12]+"-"+dataArray[13]).equals(new String(kv.getRow()))){  //Comparing the rowkeys from file and table  (doesn't work)
                            System.out.println("File Rowkey :"+dataArray[12]+"-"+dataArray[13]);
                            System.out.println("Table Row key"+new String(kv.getRow()));
                            //dataArray[392]=new String(kv.getValue());
                            FileWriter fstream = new FileWriter("/path/to/the/file/*.tsv",true);
                            BufferedWriter fbw = new BufferedWriter(fstream);
                            fbw.write(new String(kv.getValue())); //inserting the value to the tsv file
                            fbw.newLine();
                            fbw.close();
                            System.out.println("Column value written succesfully");
                        }
                        else //always executes this part
                        {
                            System.out.println("RowKey not found :" +new String(kv.getRow()));
                        }
                        /*System.out.print(new String(kv.getFamily()) + ":");
                       System.out.print(new String(kv.getQualifier()) + " ");
                       System.out.print(kv.getTimestamp() + " ");*/
                        //System.out.println(new String(kv.getValue()));


                list.add(item);
                    }
                }
            } 
            Iterator<String> it = list.iterator();
            while (it.hasNext()) {
                String txt = it.next();
                System.out.print(txt+" ");
            } 
            System.out.println(); // Print the data line.
            dataRow = TSVFile.readLine(); 
        }

        TSVFile.close();

        System.out.println();

    } //main()
} 

Sample Record :

dataArray[12]+"-"+dataArray[13] = 3049620139673452544-5172983457411783096

In the HBase table, the rowkey has values in the same format.

I can't share the whole record because it has 300+ columns.

TSV file size: around 10 GB

HBase table: around 10,254,950 rows

Any help is appreciated. Thanks in advance.

Would you mind sharing a sample record? – Rajesh
Share the output for: System.out.println("Rowkey1 :" + dataArray[12] + "-" + dataArray[13]); System.out.print("Rowkey2 :" + new String(kv.getRow()) + " "); – Rajesh
Did you debug it? If yes, what did you find? It is hard to debug your if condition without seeing the actual data. What is dataArray[12]+"-"+dataArray[13], and what does new String(kv.getRow()) return for a sample case? – aviad
I have updated my question. I can't share the whole record as it has 300+ columns. – Shash

1 Answer


Instead of writing the comparison like this:

    if((dataArray[12]+"-"+dataArray[13]).equals(new String(kv.getRow()))){ //Comparing the rowkeys from file and table (doesn't work)

try this:

    if((dataArray[12]+"-"+dataArray[13]).equals(Bytes.toString(kv.getRow()))){

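You are not decoding the row key correctly: new String(byte[]) uses the platform's default charset, whereas Bytes.toString (from org.apache.hadoop.hbase.util.Bytes) always decodes the bytes as UTF-8.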
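If the comparison still fails after that change, it may help to look at the two keys character by character. The following is only a sketch in plain Java with no HBase dependency: the class KeyDebug and its helpers decode and describe are hypothetical names, the byte array stands in for whatever kv.getRow() returns, and the trailing carriage return is an invented example of an invisible difference, not something known about the asker's data.

```java
import java.nio.charset.StandardCharsets;

public class KeyDebug {

    // Decode a row key explicitly as UTF-8 instead of relying on the
    // platform default charset.
    static String decode(byte[] rowKey) {
        return new String(rowKey, StandardCharsets.UTF_8);
    }

    // Make control and non-ASCII characters visible, e.g. a carriage
    // return becomes <U+000D>.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x20 || c > 0x7e) {
                sb.append(String.format("<U+%04X>", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String fileKey = "3049620139673452544-5172983457411783096";
        // Hypothetical table key that carries a stray carriage return.
        byte[] tableKey = (fileKey + "\r").getBytes(StandardCharsets.UTF_8);

        String decoded = decode(tableKey);
        System.out.println(fileKey.equals(decoded)); // false
        System.out.println(describe(decoded));       // key followed by <U+000D>
    }
}
```

Printing describe(...) for both the file key and the table key makes stray whitespace or line-ending characters stand out, which a plain println would hide.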

Also try the updated code below. It uses a Get instead of a Scan, so it fetches a single row by key rather than reading the whole table, and it takes less time to run.

    // Also needs: import org.apache.hadoop.hbase.client.Get;
    //             import org.apache.hadoop.hbase.util.Bytes;

    // Open the table and the output writer once, not once per column.
    HTable table = new HTable(conf, "Table name"); // HBase table name
    BufferedWriter fbw = new BufferedWriter(
            new FileWriter("/path/to/the/file/*.tsv", true));

    while (dataRow != null) {
        String[] dataArray = dataRow.split("\t");
        String key = dataArray[12] + "-" + dataArray[13];

        // One point lookup per TSV line instead of one per column.
        Get get = new Get(Bytes.toBytes(key));
        Result r = table.get(get);
        if (r != null && r.size() > 0) {
            for (KeyValue kv : r.raw()) {
                System.out.println("File Rowkey :" + key);
                System.out.println("Table Row key"
                        + Bytes.toString(kv.getRow()));
                // write the column value to the TSV file
                fbw.write(Bytes.toString(kv.getValue()));
                fbw.newLine();
                System.out.println("Column value written successfully");
            }
        } else {
            System.out.println("RowKey not found :" + key);
        }
        dataRow = TSVFile.readLine(); // advance to the next line
    }

    fbw.close();
    table.close();
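
The snippet above writes each looked-up value on a line of its own, while the question asks for the value to become the last column of the matching row. One possible way to do that is to stream the input to a new file and append the value while copying each line. This is a rough sketch under stated assumptions, not a tested HBase program: TsvAppender, appendColumn and the lookup function are hypothetical names, and lookup stands in for the HBase Get (it returns null when the row key is absent).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class TsvAppender {

    // Copies every TSV line from in to out. When lookup returns a value for
    // the composite key (column keyA + "-" + column keyB), that value is
    // appended as a new last column. Returns the number of extended lines.
    // The caller owns the streams and is responsible for closing them.
    static int appendColumn(Reader in, Writer out, int keyA, int keyB,
                            Function<String, String> lookup) {
        try {
            BufferedReader reader = new BufferedReader(in);
            BufferedWriter writer = new BufferedWriter(out);
            int matched = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                // limit -1 keeps trailing empty columns
                String[] cols = line.split("\t", -1);
                writer.write(line);
                if (cols.length > Math.max(keyA, keyB)) {
                    String value = lookup.apply(cols[keyA] + "-" + cols[keyB]);
                    if (value != null) {
                        writer.write("\t");
                        writer.write(value);
                        matched++;
                    }
                }
                writer.newLine();
            }
            writer.flush();
            return matched;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // In-memory stand-in for the HBase table, for illustration only.
        Map<String, String> table = new HashMap<>();
        table.put("k1-k2", "v");

        StringWriter out = new StringWriter();
        int n = appendColumn(new StringReader("k1\tk2\tx\nq1\tq2\ty\n"),
                out, 0, 1, table::get);
        System.out.println(n + " line(s) extended");
        System.out.print(out);
    }
}
```

In the real program the key columns would be 12 and 13, the reader and writer would wrap the input file and a separate output file, and lookup would wrap the table.get(new Get(Bytes.toBytes(key))) call from the snippet above. Writing to a new file avoids modifying the 10 GB input while it is still being read.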