1
votes

I want to print the programming languages in file1 that appear in file2, along with the corresponding line number in file2 and the complete line of file2.

file1 is like this:

Ruby
Visual Basic
Objective-C
C
R
C++
Basic

file2 is like this:

5. ab cde fg Java hij kl
2. ab PHP dddf llf 
4. cde fg z o Objective-C oode
8. a12b cde JavaScript kdk
6. ab99r cde Visual Basic llso dkd
1. lkd dsk Ruby kksdk
3. Python dsdls
9. CSS dkdsk
4. Jdjdj C Jjd Kkd
12. Iiii Jjd R Hhd
5. Jjjff C++ jdjejd
7. Jfjfjdoo Uueye Basic Jje Tasdk

I'd like to get this output:

6|Ruby|1. lkd dsk Ruby kksdk
5|Visual Basic|6. ab99r cde Visual Basic llso dkd
3|Objective-C|4. cde fg z o Objective-C oode
9|C|4. Jdjdj C Jjd Kkd
10|R|12. Iiii Jjd R Hhd
11|C++|5. Jjjff C++ jdjejd
12|Basic|7. Jfjfjdoo Uueye Basic Jje Tasdk

where 6, 5 and 3 are the line numbers in file2 where "Ruby", "Visual Basic" and "Objective-C" appear.

So far I've tried the code below, but it only works if each line of file2 is an exact match for a line in file1.

awk 'NR == FNR{a[$0];next} ($0 in a)' file1 file2

In this case the programming languages in file2 have text before and after them, and I'm stuck on how to get the output I want.

Thanks in advance for any help.

2
As test cases you should include C and C++ in file1 to verify that the programming languages are being treated as full strings, not partial strings or regexps, and you should include lines in file2 that contain multiple programming languages to make sure the script doesn't just find one. Also add just Basic to file1 and show what the expected output should be if file1 contains both Basic and Visual Basic. – Ed Morton
@EdMorton Thanks so much for your suggestion to add C, C++ and Basic. I also included R in both files. Ravinder's code prints correctly for all of them except Basic, since it prints the lines where Visual Basic appears as well as the lines where Basic itself appears. For Basic it should print only the lines where Basic appears on its own, not Visual Basic. – Ger Cas
You're welcome. You should add the cases mentioned to the example in your question, though, not just to the files on your desktop. – Ed Morton
@EdMorton I've edited file1, file2 and the output. Thanks for your suggestions. – Ger Cas

2 Answers

3
votes

Could you please try the following (changed to use index() as per @Ed Morton's suggestion).

awk -v OFS='|' '
FNR==NR{
  a[$0]
  next
}
{
  for(i in a){
     if(index(" "$0" "," "i" ")){
         print FNR,i,$0
     }
  }
}
'  Input_file1  Input_file2 | sort -t'|' -nr

Output will be as follows.

6|Ruby|1. lkd dsk Ruby kksdk
5|Visual Basic|6. ab99r cde Visual Basic llso dkd
3|Objective-C|4. cde fg z o Objective-C oode

Explanation: the code above, with comments added.

awk -v OFS='|' '                           ##Starting the awk program and setting the output field separator to |.
FNR==NR{                                   ##This condition is TRUE only while the first Input_file is being read.
  a[$0]                                    ##Creating an entry in array a whose index is the current line ($0); no value is assigned.
  next                                     ##Skipping the remaining statements for lines of the first Input_file.
}
{                                          ##Starting the block that runs for each line of the second Input_file.
  for(i in a){                             ##Looping over every index (language) stored in array a.
     if(index(" "$0" "," "i" ")){          ##Checking whether the language i appears in the current line as a whole space-delimited string.
         print FNR,i,$0                    ##If so, printing FNR"|"i"|"$0 as per OP need.
     }
  }
}
'  file1  file2 | sort -t'|' -nr           ##Passing the file names and piping the output to sort, which orders it by the leading line number in descending order.
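
To see why both the line and the language are padded with spaces before calling index(), here is a small sketch. The match_langs helper and its hard-coded language list are made up for this demo and are not part of the answer's script. Because index() does a literal substring search, "C++" needs no regex escaping, and the padding keeps "C" from matching inside "C++" and "R" from matching inside "Ruby", even when the language is the last word on the line.

```shell
# Hypothetical helper: print the languages found in one line of text.
match_langs() {
  # $1 = one line of text; the language list is hard-coded for the demo
  printf '%s\n' "$1" | awk '
    BEGIN { n = split("C,C++,R,Ruby", langs, ",") }
    {
      for (k = 1; k <= n; k++)
        if (index(" " $0 " ", " " langs[k] " "))   # literal, whole-word match
          printf "%s%s", (found++ ? "," : ""), langs[k]
      print ""
    }'
}

match_langs "5. Jjjff C++ jdjejd"   # prints C++ only, not C
match_langs "1. lkd dsk Ruby"       # prints Ruby only, not R
```

Note that the padding alone does not separate Basic from Visual Basic, since " Basic " is still a substring of " Visual Basic ".
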
1
votes

Using GNU awk for sorted_in, search for the longest language names (e.g. Visual Basic) first and remove each one from the current line as it's found, so that shorter languages contained in them (e.g. Basic) can't be matched inside them:

$ cat tst.awk
BEGIN { OFS="|" }
NR==FNR {
    lengths[$0] = length($0)
    next
}
{
    line = " " $0 " "
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (lang in lengths) {
        if ( s = index(line," "lang" ") ) {
            print FNR, lang, $0
            line = substr(line,1,s) substr(line,s+1+lengths[lang])
        }
    }
}

$ awk -f tst.awk file1 file2
3|Objective-C|4. cde fg z o Objective-C oode
5|Visual Basic|6. ab99r cde Visual Basic llso dkd
6|Ruby|1. lkd dsk Ruby kksdk

$ cat file1
Ruby
Visual Basic
Objective-C
C
C++
Basic
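
If GNU awk isn't available, the same longest-first idea can probably be approximated in any POSIX awk by sorting the languages by length yourself instead of relying on PROCINFO["sorted_in"]. The following is only a sketch under that assumption: the insertion sort is a swapped-in replacement for sorted_in, and the temporary files and their contents are a cut-down sample made up for the demo.

```shell
# Sketch only: temporary demo files with a reduced sample of the data.
tmp=$(mktemp -d)
printf '%s\n' 'Basic' 'Visual Basic' 'C' 'C++' > "$tmp/file1"
printf '%s\n' '6. ab99r cde Visual Basic llso dkd' \
              '7. Jfjfjdoo Uueye Basic Jje Tasdk' \
              '5. Jjjff C++ jdjejd' > "$tmp/file2"

result=$(awk '
BEGIN { OFS = "|" }
NR == FNR {                      # first file: collect the languages
    langs[++n] = $0
    next
}
FNR == 1 {                       # first line of second file: order longest first
    for (i = 2; i <= n; i++) {   # plain insertion sort by length, descending
        cur = langs[i]
        for (j = i - 1; j >= 1 && length(langs[j]) < length(cur); j--)
            langs[j + 1] = langs[j]
        langs[j + 1] = cur
    }
}
{
    line = " " $0 " "
    for (k = 1; k <= n; k++) {
        lang = langs[k]
        if ( s = index(line, " " lang " ") ) {
            print FNR, lang, $0
            # cut the matched language out so shorter names inside it cannot match
            line = substr(line, 1, s) substr(line, s + 1 + length(lang))
        }
    }
}' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$result"
rm -rf "$tmp"
```

With this sample the Visual Basic line is reported once, for Visual Basic only, and the Basic and C++ lines are reported for Basic and C++ respectively, without an extra hit for C.
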