
This is the bash command I'm running:

while read line
do
./ngram -order 1 -lm path1/$line -ppl path2/$line -debug 4 > path3/$line
done < input_list_of_files

So, I have two folders, one in path1 and the other in path2. Both contain files with the same base names but different extensions: path1 has many files with the extension ".txt" (e.g. file1.txt), and path2 has many files with the extension ".title" (e.g. file1.title).

That is, path1 has a folder folder1 containing file1.txt, file2.txt, file3.txt, and so on. Similarly, path2 has a folder folder2 containing file1.title, file2.title, file3.title, and so on.

The list of files (input_list_of_files) contains:

file1.txt
file2.txt
file3.txt

and so on...

I want to pass file1.txt to the "-lm" option and file1.title to the "-ppl" option. This works fine when I run it on one file at a time.

That is, whenever file1.txt is given to "-lm", file1.title should be given to "-ppl" in the same command.

I want to run this as a batch over all the files in the folder, pairing each file with the one that has the same name but a different extension. How do I do it? Please help!

Here is the single-file command I used:

./ngram -order 1 -lm Path1/Army_recruitment.txt -ppl Path2/Army_recruitment.title -debug 4 > Path3/Army_recruitment.txt

The output file looks like this:

 military troop deployment number need
p( military | <s> )     = [1gram] 0.00426373 [ -2.37021 ]
p( troop | military ...)    = [1gram] 0.00476793 [ -2.32167 ]
p( deployment | troop ...)  = [1gram] 0.00045413 [ -3.34282 ]
p( number | deployment ...)     = [1gram] 0.0015224 [ -2.81747 ]
p( need | number ...)   = [1gram] 0.000778574 [ -3.1087 ]
p( </s> | need ...)     = [OOV] 0 [ -inf ]
1 sentences, 5 words, 0 OOVs
1 zeroprobs, logprob= -13.9609 ppl= 619.689 ppl1= 3091.84 
5 words, rank1= 0 rank5= 0 rank10= 0
6 words+sents, rank1wSent= 0 rank5wSent= 0 rank10wSent= 0 qloss=    0.998037 absloss= 0.998036

file Army_recruitment_title.txt: 1 sentences, 5 words, 0 OOVs
1 zeroprobs, logprob= -13.9609 ppl= 619.689 ppl1= 3091.84
5 words, rank1= 0 rank5= 0 rank10= 0
6 words+sents, rank1wSent= 0 rank5wSent= 0 rank10wSent= 0 qloss=   0.998037 absloss= 0.998036 

This output is produced by the ./ngram executable, which comes from a package.
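Because the whole approach relies on every entry in the list having a partner in both folders, it can help to check that before starting a long batch. A minimal bash sketch, assuming input_list_of_files holds names like file1.txt and that path1/ and path2/ are relative to the current directory, as in the loop above; check_pairs is a hypothetical helper name, not part of the ngram package.

```bash
#!/usr/bin/env bash
# check_pairs (hypothetical helper): report every entry in the list whose
# path1/NAME.txt or path2/NAME.title partner is missing.
# Assumes the path1/ and path2/ layout used by the loop in the question.
check_pairs() {
    local list=${1:-input_list_of_files}
    local line name missing=0
    while IFS= read -r line; do
        [[ -z $line ]] && continue              # ignore blank lines
        name=${line%.txt}                       # file1.txt -> file1
        if [[ ! -f path1/$name.txt ]]; then
            echo "missing: path1/$name.txt" >&2
            missing=$((missing + 1))
        fi
        if [[ ! -f path2/$name.title ]]; then
            echo "missing: path2/$name.title" >&2
            missing=$((missing + 1))
        fi
    done < "$list"
    echo "$missing missing file(s)"
    (( missing == 0 ))                          # status 0 only if every pair exists
}
```

Usage: `check_pairs && echo "all pairs present"`. The individual missing paths go to stderr, so the summary line on stdout stays easy to parse.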

Could you clarify the question by providing sample contents of input_list_of_files and the directory listing of the path1 and path2 directories? – 200_success
I have edited it. Please have a look. – Ana_Sam
Sorry, please check now; I've edited it to be clearer. – Ana_Sam
This question still has no expected output. – merlin2011
+1 for improving your question. Good luck to all. – shellter

2 Answers

# As suggested by @CharlesDuffy: use read -r to ensure that text is taken literally
while read -r line ; do
    name="${line%.txt}"     # Strip off .txt extension
    ./ngram -order 1 -lm "path1/$name.txt" -ppl "path2/$name.title" -debug 4 > "path3/$name"
done < input_list_of_files
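If input_list_of_files simply lists every .txt file in path1, the list can be replaced by a glob. A minimal bash sketch, assuming the path1/path2/path3 layout used in the answers; run_batch and the NGRAM override variable are hypothetical names introduced here. Files without a matching .title are skipped with a warning instead of being passed to ngram.

```bash
#!/usr/bin/env bash
# run_batch (hypothetical helper): process every path1/*.txt file without
# needing input_list_of_files. NGRAM is a hypothetical override variable
# that defaults to ./ngram.
run_batch() {
    local ngram=${NGRAM:-./ngram}
    local lm name ppl
    shopt -s nullglob                           # empty path1/ means zero iterations (stays on afterwards)
    mkdir -p path3                              # make sure the output folder exists
    for lm in path1/*.txt; do
        name=$(basename "$lm" .txt)             # path1/file1.txt -> file1
        ppl="path2/$name.title"                 # its partner in path2
        if [[ ! -f $ppl ]]; then
            echo "skipping $name: $ppl not found" >&2
            continue
        fi
        "$ngram" -order 1 -lm "$lm" -ppl "$ppl" -debug 4 > "path3/$name.txt"
    done
}
```

Usage: run `run_batch` from the directory that contains path1/ and path2/. Without nullglob, an empty path1/ would make the loop run once with the literal string path1/*.txt.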

You can use the basename command to strip a suffix (here .txt) as well as the directory part of a path. So:

while IFS= read -r line
do
    path2file=$(basename "$line" .txt).title   # file1.txt -> file1.title
    ./ngram -order 1 -lm "path1/$line" -ppl "path2/$path2file" -debug 4 > "path3/$line"
done < input_list_of_files

(This assumes you still want the output file to end in .txt.)
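If "simultaneously" in the question also means running several files in parallel, each ngram call can be started in the background. A hedged bash sketch, assuming separate ngram runs can safely execute concurrently and that input_list_of_files holds names like file1.txt; run_parallel, its job-count argument, and the NGRAM variable are hypothetical names.

```bash
#!/usr/bin/env bash
# run_parallel (hypothetical helper): same .txt/.title pairing as the
# answers above, but runs up to max_jobs ngram processes at once.
# NGRAM is a hypothetical override variable that defaults to ./ngram.
run_parallel() {
    local max_jobs=${1:-4}
    local ngram=${NGRAM:-./ngram}
    local line name count=0
    mkdir -p path3
    while IFS= read -r line; do
        [[ -z $line ]] && continue              # ignore blank lines
        name=${line%.txt}                       # file1.txt -> file1
        # Start this pair in the background; in a non-interactive shell its
        # stdin comes from /dev/null, so it cannot consume the list.
        "$ngram" -order 1 -lm "path1/$name.txt" -ppl "path2/$name.title" \
            -debug 4 > "path3/$name.txt" &
        # After every max_jobs launches, wait for that whole batch to finish.
        if (( ++count % max_jobs == 0 )); then
            wait
        fi
    done < input_list_of_files
    wait                                        # wait for the last, partial batch
}
```

Usage: `run_parallel 8`. Waiting batch by batch is simpler than tracking individual jobs with `wait -n` (bash 4.3+), at the cost of some idle time when one file in a batch takes much longer than the others.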