1
votes

hi I am a Linux programmer

I have an order that monitor process cpus usage, so I use data on /proc/[pid]/stat № 14 and 15. That values are called utime and stime.

Example [/proc/[pid]/stat]

30182 (TTTTest) R 30124 30182 30124 34845 30182 4218880 142 0 0 0 5274 0 0 0 20 0 1 0 55611251 17408000 386 18446744073709551615 4194304 4260634 140733397159392 140733397158504 4203154 0 0 0 0 0 0 0 17 2 0 0 0 0 0 6360520 6361584 33239040 140733397167447 140733397167457 140733397167457 140733397168110 0

State after 5 sec

30182 (TTTTest) R 30124 30182 30124 34845 30182 4218880 142 0 0 0 5440 0 0 0 20 0 1 0 55611251 17408000 386 18446744073709551615 4194304 4260634 140733397159392 140733397158504 4203154 0 0 0 0 0 0 0 17 2 0 0 0 0 0 6360520 6361584 33239040 140733397167447 140733397167457 140733397167457 140733397168110 0

In test environment, this file refreshed 1 ~ 2 sec, so I assume this file often updated by system at least 1 sec.

So I use this calculation

process_cpu_usage = ((utime - old_utime) + (stime - old_stime))/ period

In case of above values

33.2 = ((5440 - 5274) + (0 - 0)) / 5

But, In commercial servers environment, process run with high load (cpu and file IO), /proc/[pid]/stat file update period increasing even 20~60 sec!!

So top/htop utility can't measure correct process usage value.

Why is this phenomenon occurring??

Our system is [CentOS Linux release 7.1.1503 (Core)]

1

1 Answers

2
votes

Most (if not all) files in the /proc filesystem are special files, their content at any given moment reflect the actual OS/kernel data at that very moment, they're not files with contents periodically updated. See the /proc filesystem doc.

In particular the /proc/[pid]/stat content changes whenever the respective process state changes (for example after every scheduling event) - for processes mostly sleeping the file will appear to be "updated" at slower rates while for active/running processes at higher rates on lightly loaded systems. Check, for example, the corresponding files for a shell process which doesn't do anything and for a browser process playing some video stream.

On heavily loaded systems with many processes in the ready state (like the one mentioned in this Q&A, for example) there can be scheduling delays making the file content "updates" appear less often despite the processes being ready/active. Such conditions seem to be more often encountered in commercial/enterprise environments (debatable, I agree).