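Problem Description
A VM deployed on Azure occasionally shows a CPU spike, but by the time anyone notices, usage is back to normal, so the cause of the high CPU cannot be determined.
On Windows, how can per-process CPU usage be recorded so that the top CPU-consuming processes at the time of the spike can be identified for later analysis?
Solution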
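Option 1: Performance Monitor
Performance Monitor, which ships with Windows, can record what is happening while CPU usage is high. (Drawback: Performance Monitor has to keep running for a long time, so it is a poor fit for spikes that follow no pattern and cannot be reproduced quickly.)
For the GUI steps, see the blog post: 【Azure微服务 Service Fabric 】在SF节点中开启Performance Monitor及设置抓取进程的方式 (enabling Performance Monitor on Service Fabric nodes and configuring process capture)
Alternatively, call Performance Monitor's command-line tool (Logman.exe) from CMD to create, start, and stop a data collector.
Step 1: Create the Performance Monitor counter collector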
Logman.exe create counter Perf-1min -f bin -max 500 -c "\LogicalDisk(*)\*" "\Memory\*" "\Network Interface(*)\*" "\Paging File(*)\*" "\PhysicalDisk(*)\*" "\Server\*" "\System\*" "\Process(*)\*" "\Processor(*)\*" "\Cache\*" "\GPU Adapter Memory(*)\*" "\GPU Engine(*)\*" "\GPU Local Adapter Memory(*)\*" "\GPU Non Local Adapter Memory(*)\*" "\GPU Process Memory(*)\*" -si 00:01:00 -cnf 24:00:00 -v mmddhhmm -o C:\PerfMonLogs\Perf-1min.blg
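Note: the collector name, the sample interval (-si), the period after which a new log file is started (-cnf), and the output file path (-o) can all be changed as needed.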
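If only CPU attribution matters, a narrower counter set sampled more often keeps the log small and has a better chance of catching short spikes. The sketch below is one possible variant, not the setup used in this article: the collector name HighCpu-15s, the 15-second interval, and the circular log format are assumptions chosen for illustration.

```powershell
# Make sure the output folder exists (same folder as the command above)
New-Item -ItemType Directory -Force -Path C:\PerfMonLogs | Out-Null

# Hypothetical leaner collector: per-process CPU, process IDs, and total CPU only.
# -f bincirc writes a circular binary log that overwrites the oldest samples
# once the file reaches the -max size (in MB).
logman.exe create counter HighCpu-15s -f bincirc -max 500 `
    -c "\Process(*)\% Processor Time" "\Process(*)\ID Process" "\Processor(_Total)\% Processor Time" `
    -si 00:00:15 -o C:\PerfMonLogs\HighCpu-15s.blg

# Show the collector's settings and current status
logman.exe query HighCpu-15s
```

Start and stop it the same way as Perf-1min, substituting the collector name.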
Step 2: Start the collector
Logman start Perf-1min
Step 3: Stop the collector
Logman stop Perf-1min
[Screenshot: output of the commands above in CMD]
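Once the collector has captured a spike, the .blg file can be opened in Performance Monitor, or read back from a script. The following is a minimal sketch, assuming Windows PowerShell 5.1 (Import-Counter is not included in PowerShell 7) and assuming the newest .blg file in C:\PerfMonLogs is the one of interest. It prints the five busiest processes for every sample in the log.

```powershell
# Assumption: the most recently written .blg file is the log to analyze
$blg = Get-ChildItem -Path C:\PerfMonLogs -Filter *.blg |
    Sort-Object LastWriteTime |
    Select-Object -Last 1

Import-Counter -Path $blg.FullName -ErrorAction SilentlyContinue | ForEach-Object {
    $timestamp = $_.Timestamp
    $_.CounterSamples |
        Where-Object {
            # Keep only per-process CPU samples, excluding the aggregate instances
            $_.Path -like '*\process(*)\% processor time' -and
            $_.InstanceName -notin '_total', 'idle'
        } |
        Sort-Object CookedValue -Descending |
        Select-Object -First 5 -Property @{N = 'Timestamp'; E = { $timestamp }},
            InstanceName,
            # 100 means one fully busy logical processor
            @{N = 'CpuRaw'; E = { [math]::Round($_.CookedValue, 1) }}
} | Format-Table -AutoSize
```

Importing a full day of all counters can take a while; filtering to a time window with the -StartTime and -EndTime parameters of Import-Counter narrows it down.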
Option 2: PowerShell Get-Counter
The Get-Counter cmdlet gets performance counter data directly from the performance monitoring instrumentation in the Windows family of operating systems. It can read counters from local and remote computers.
You can use the Get-Counter parameters to specify one or more computers, list the performance counter sets and the instances they contain, set the sample intervals, and specify the maximum number of samples. Without parameters, Get-Counter gets performance counter data for a set of system counters.
Many counter sets are protected by access control lists (ACL). To see all counter sets, open PowerShell with the Run as administrator option.
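As a quick illustration of those parameters, the sketch below lists the counters in the Process counter set and then takes a few samples of overall CPU usage. The interval and sample count are arbitrary values picked for the example.

```powershell
# List the counter paths available in the Process counter set
(Get-Counter -ListSet Process).Counter

# Take three samples of overall CPU usage, two seconds apart
Get-Counter -Counter '\Processor(_Total)\% Processor Time' -SampleInterval 2 -MaxSamples 3 |
    ForEach-Object { $_.CounterSamples } |
    Select-Object Timestamp, @{N = 'CPU'; E = { [math]::Round($_.CookedValue, 1) }}
```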
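The example used in this article is shown below. Save the script as getcpu.ps1 and run it; it prompts for the check interval (in seconds) and the CPU threshold (in percent), then keeps running until it is stopped.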
# Get total CPU usage as a number between 0 and 100
function all_cpu() {
    # One sample contains both the _total and idle instances
    $samples = (Get-Counter "\Process(*)\% Processor Time" -ErrorAction SilentlyContinue).CounterSamples
    $total = ($samples | where InstanceName -eq "_total").CookedValue
    $idle  = ($samples | where InstanceName -eq "idle").CookedValue
    # The counter reports 100 per logical processor, so divide by the processor count.
    # Return a number, not a formatted string, so the caller can compare it to the threshold.
    return [math]::Round(($total - $idle) / $env:NUMBER_OF_PROCESSORS, 2)
}

# Get the top five processes by CPU usage, as formatted text
function get_top_5() {
    Get-Counter "\Process(*)\% Processor Time" -ErrorAction SilentlyContinue |
        select -ExpandProperty CounterSamples |
        where { $_.Status -eq 0 -and $_.InstanceName -notin "_total", "idle" } |
        sort CookedValue -Descending |
        select -First 5 |
        foreach {
            $sample = $_
            # Instance names look like "chrome" or "chrome#2"; strip the "#n" suffix.
            # If several processes share a name, the first match is used, so the ID is a best guess.
            $procName = $sample.InstanceName -replace '#\d+$', ''
            $friendlyName = $sample.InstanceName
            $procId = $null
            try {
                $p = [System.Diagnostics.Process]::GetProcessesByName($procName)[0]
                if ($p) {
                    $procId = $p.Id
                    $proc = Get-CimInstance Win32_Process -Filter "ProcessId=$procId"
                    if ($proc.ExecutablePath) {
                        $description = [System.Diagnostics.FileVersionInfo]::GetVersionInfo($proc.ExecutablePath).FileDescription
                        if ($description) { $friendlyName = $description }
                    }
                }
            } catch { }
            [pscustomobject]@{
                TimeStamp = $sample.Timestamp
                Name      = $friendlyName
                ID        = $procId
                CPU       = ($sample.CookedValue / 100 / $env:NUMBER_OF_PROCESSORS).ToString("P")
            }
        } |
        ft -a | Out-String
}

# Main function
# Parameters: check interval in seconds, CPU threshold in percent
function main {
    param(
        [parameter(Mandatory=$true)] [int] $sleeptime,
        [parameter(Mandatory=$true)] [int] $cpu_set
    )
    $count = 0
    while ($true) {
        $count += 1
        echo "Check CPU times : $count"
        # Sample once and reuse the value for both the comparison and the log
        $cpu = all_cpu
        if ($cpu -gt $cpu_set) {
            echo "===== ===== Start Logging ====== =====" >> C:\checkcpu_all_cpu.log
            echo "CPU : $cpu%" >> C:\checkcpu_all_cpu.log
            get_top_5 >> C:\checkcpu_top_5.log
        }
        # Wait before the next check
        start-sleep -s $sleeptime
    }
}

# Run the main function (prompts for both parameters)
main
[Screenshot: output of the script]
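The step in the script that is easiest to misread is the division by the processor count. The "\Process(*)\% Processor Time" counter reports 100 for each fully busy logical processor, so on a machine with four logical processors the _total instance is around 400, and the idle instance has to be subtracted to get actual usage. A minimal worked example follows; the helper name ConvertTo-CpuPercent and the sample values are made up for illustration.

```powershell
# Hypothetical helper: converts raw "\Process(*)\% Processor Time" values
# into a 0-100 percentage of the whole machine.
function ConvertTo-CpuPercent {
    param(
        [double] $Total,          # CookedValue of the _total instance
        [double] $Idle,           # CookedValue of the idle instance
        [int]    $ProcessorCount  # number of logical processors
    )
    [math]::Round(($Total - $Idle) / $ProcessorCount, 2)
}

# Made-up sample: 4 logical processors, _total = 400, idle = 300
ConvertTo-CpuPercent -Total 400 -Idle 300 -ProcessorCount 4   # → 25
```

With a threshold of 20, this sample would trigger logging; with a threshold of 30, it would not.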
References
Get-Counter: https://docs.microsoft.com/zh-cn/powershell/module/microsoft.powershell.diagnostics/get-counter?view=powershell-7.2
[END]