0
votes

My requirement is to read a CSV file with around million lines by grepping particular string and displaying the output line.

Example CSV File:

Rob,school,oxford,tennis
James,school,cambridge,cricket
lucy,college,harvard,football
lily,hotel,novotel,golf
..
..
..
1 million lines.

Requirement:

When i invoke the AHK script , it prompts a user input through InputBox and takes input as James school and it should output as cambridge. Similarly input as lily hotel and it should output as novotel.

I am able to achieve the desired output through below script , but the problem is when i search for a string which is for example at 1 millionth line it takes around 5-10 mins to give me the output.

Script i wrote:

#SingleInstance, force
#Include C:\Users\mpechett\Desktop\ahk\tf.ahk

InputBox, Name, Search for Name


StringSplit, word_array, Name, %A_Space%, .  ; Omits periods.

pattern = %word_array1%,%word_array2%


Outputline = % TF_Find("C:\Users\mpechett\example.csv", "","", pattern, 1, 1)


MsgBox,%Outputline%

Please help me in improving the performance of my script.

2
I would sort the contents in alphabetical order and then split this big file into several smaller files (a.csv, b.csv etc). - user3419297
That is one way of doing it. Instead of splitting files, isnt any way to improve the speed? - Manoj Kumar

2 Answers

0
votes

This is pseudocode for a indexed database type solution:

make_index() {
  global file := FileOpen( "database.csv", "r" )
  for each line in database {
    position  := file.pos
    line      := file.readline()
    values    := StrSplit(line)
    key       := make_a_unique_key(values)
    hash[key] := position
  }
  save hash to "database.index"
}

lookup(values) {
  global file, hash
  file.seek( hash[ make_a_unique_key( values )])
  return file.readline()
}

See: Manual entry for File object

0
votes

If You use a RAMDISK you can speedup your Search Results. I can not see what your tf.ahk Script is, but 5-10 mins is to long, 1 - the loop code in the tf.ahk File is not good 2 - or it does do everytime a search from out your HardDisk c: and not from out your ram memory.

You can Download the Freeware From Here Imdisk

  • The RAMDISK is a virtual Harddisk, that is placed into Your Ram Memory.

  • a RAMDISK is +-100x Faster then a Harddisk.

First Install IMDISK on your Windows System, then you can simple install/put/copy any Application or put/copy any csv file to the Ramdisk [example - z:\example.csv]

z:\example.csv

Rob,school,oxford,tennis
James,school,cambridge,cricket
lucy,college,harvard,football
lily,hotel,novotel,golf
..
..
..
1 million lines.

Note : With this AHK Keyboard Shortcut Macro Script, you can type - for Example : college,harvard and then it will do a search in NotePad and it will give the result Value football (This is only a example to test it out, for bigger files you will need to Chance the code a little bit and use a other Application(SpeadSheat program) that can Handle Bigger Files)

Search.ahk

; this Script works on Windows 10 system.
; You can Click key, F1 to EXIT

#SingleInstance, force

run notepad.exe z:\example.csv
WinWaitActive,example.csv, , 2



loop
{

InputBox,Clipboard,Search for Name
sleep 100
send ^{Home} ;goto Top of the Page
sleep 100
send ^f ;goto the Find box 
sleep 100
send ^v ;paste Clipboard Value
sleep 100
send {enter}
sleep 1500 ;You can change this sleep codeline - How bigger the search, how larger the sleep must be.
send {esc}
sleep 100

;If you want to Select the Whole Search Line - you can use this code.
;send {Home}
;sleep 100
;send +{End}

;If you want to Select the Rigth Site of the Line - you can use this code. 
send {Right}
sleep 100
send +{End}

sleep 100
send ^c ;copy the Search LineValue to Clipboard 
LineValue = %Clipboard%
sleep 100
word_array := StrSplit(LineValue, ",")
sleep 100
SearchValue := word_array[1]" "       ;word_array[2]" "word_array[3]
sleep 100
MsgBox "SearchValue",%SearchValue%
}

F1::ExitApp