1
votes

When I tried extracting a batch of data from a site using the Regular Expression Extractor in JMeter, I noticed something strange: JMeter takes a very long time (more than 50 minutes).

Reference Name: dataId

RegEx used: <strong><a href="(.+?)=(.+?)&(.+?)">

Template: $2$

Match No: -1 --> to get all the matches

This regular expression runs against a 250-line HTML source page, so it could potentially find more than 100 matches on that page (as I said, I'm extracting a batch of data).

I checked JMeter's CPU usage in Task Manager, and java.exe was at 25%.

My PC has a quad-core i5 processor, but java.exe is using only one of the cores, and the extraction takes a very long time (literally more than an hour).

How can I speed up this extraction? Where is the actual problem?

1 Answer

2
votes

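Your regex is too general. Try something like <strong><a href="([^"=&]+)=([^"=&]+)&([^"=&]+)"> instead. Java's regex implementation uses a backtracking algorithm, and it can be very slow on some inputs. Each (.+?) in your pattern can also match =, & and ", so whenever a match attempt fails the engine retries many different ways of splitting the text between the groups. A negated character class such as [^"=&]+ stops at the first delimiter, which leaves far fewer alternatives to try.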
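As a rough illustration of the tightened pattern, here is a minimal standalone sketch using java.util.regex. It is an assumption that this behaves like the engine your JMeter version uses internally, so treat it as a way to test the pattern outside JMeter rather than as what the extractor does. The class name, method name and sample HTML are made up for the example. The third group is written as [^"]+ instead of [^"=&]+ because the rest of a query string usually contains = itself; that is my variation, not part of the suggestion above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TightRegexDemo {
    // Groups 1 and 2 stop at the first ", = or &; group 3 stops at the closing quote.
    // (Assumption: group 3 may contain '=' and '&', e.g. "page=1&sort=asc".)
    private static final Pattern LINK = Pattern.compile(
            "<strong><a href=\"([^\"=&]+)=([^\"=&]+)&([^\"]+)\">");

    /** Returns group 2 of every match, like Template $2$ with Match No. -1. */
    public static List<String> extractIds(String html) {
        List<String> ids = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            ids.add(m.group(2));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, only for demonstration.
        String html = "<strong><a href=\"view.php?id=101&page=1\">A</a></strong>\n"
                    + "<strong><a href=\"view.php?id=202&page=2\">B</a></strong>";
        System.out.println(extractIds(html)); // prints [101, 202]
    }
}
```

If the pattern extracts what you expect here, the same expression (without the Java string escaping) can go into the Regular Expression field of the extractor.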

As for the processor load, that is normal. The regex implementation does not use multiple threads, so it loads a single core (which shows up as 25% on a quad-core machine). To use the full power of the processor, you have to implement multithreading yourself in some way, e.g. by processing 4 different HTML pages in parallel.
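One possible way to process several pages in parallel outside of JMeter is a fixed thread pool, sketched below. This is only an illustration under assumptions: the class and method names are invented, the pages are already loaded as strings, and the pattern is the tightened one with [^"]+ as the third group. Inside JMeter itself, the usual way to get parallelism is to run more threads in the Thread Group rather than to write code like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParallelExtractDemo {
    // A compiled Pattern is safe to share between threads; each task creates its own Matcher.
    private static final Pattern LINK = Pattern.compile(
            "<strong><a href=\"([^\"=&]+)=([^\"=&]+)&([^\"]+)\">");

    /** Returns group 2 of every match found in one page. */
    public static List<String> extractIds(String html) {
        List<String> ids = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            ids.add(m.group(2));
        }
        return ids;
    }

    /** Runs extractIds on every page using a pool of the given size; results keep page order. */
    public static List<List<String>> extractAll(List<String> pages, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String page : pages) {
                futures.add(pool.submit(() -> extractIds(page)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get()); // blocks until that page is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while extracting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("extraction task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With a pool size of 4, up to four pages are matched at the same time, one per core on a quad-core machine. A single page is still matched on one thread, so this helps only when there are several pages to process.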