nvprof --query-events

简介: nvprof --query-events

Available Events:

Name Description

Device 0 (GeForce GTX 970M):

 

Domain domain_a:

tex0_cache_sector_queries: Number of texture cache 0 requests. This increments by 1 for each 32-byte access.
tex1_cache_sector_queries: Number of texture cache 1 requests. This increments by 1 for each 32-byte access.
tex0_cache_sector_misses: Number of texture cache 0 misses. This increments by 1 for each 32-byte access.
tex1_cache_sector_misses: Number of texture cache 1 misses. This increments by 1 for each 32-byte access.
elapsed_cycles_sm: Elapsed clocks

Domain domain_b:

fb_subp0_read_sectors: Number of DRAM read requests to sub partition 0, increments by 1 for 32 byte access.
fb_subp1_read_sectors: Number of DRAM read requests to sub partition 1, increments by 1 for 32 byte access.
fb_subp0_write_sectors: Number of DRAM write requests to sub partition 0, increments by 1 for 32 byte access.
fb_subp1_write_sectors: Number of DRAM write requests to sub partition 1, increments by 1 for 32 byte access.

Domain domain_c:

gld_inst_8bit: Total number of 8-bit global load instructions that are executed by all the threads across all thread blocks.
gld_inst_16bit: Total number of 16-bit global load instructions that are executed by all the threads across all thread blocks.
gld_inst_32bit: Total number of 32-bit global load instructions that are executed by all the threads across all thread blocks.
gld_inst_64bit: Total number of 64-bit global load instructions that are executed by all the threads across all thread blocks.
gld_inst_128bit: Total number of 128-bit global load instructions that are executed by all the threads across all thread blocks.
gst_inst_8bit: Total number of 8-bit global store instructions that are executed by all the threads across all thread blocks.
gst_inst_16bit: Total number of 16-bit global store instructions that are executed by all the threads across all thread blocks.
gst_inst_32bit: Total number of 32-bit global store instructions that are executed by all the threads across all thread blocks.
gst_inst_64bit: Total number of 64-bit global store instructions that are executed by all the threads across all thread blocks.
gst_inst_128bit: Total number of 128-bit global store instructions that are executed by all the threads across all thread blocks.

Domain domain_d:

prof_trigger_00: User profiled generic trigger that can be inserted in any place of the code to collect the related information. Increments per warp.
prof_trigger_01: User profiled generic trigger that can be inserted in any place of the code to collect the related information. Increments per warp.
prof_trigger_02: User profiled generic trigger that can be inserted in any place of the code to collect the related information. Increments per warp.
prof_trigger_03: User profiled generic trigger that can be inserted in any place of the code to collect the related information. Increments per warp.
prof_trigger_04: User profiled generic trigger that can be inserted in any place of the code to collect the related information. Increments per warp.
prof_trigger_05: User profiled generic trigger that can be inserted in any place of the code to collect the related information. Increments per warp.
prof_trigger_06: User profiled generic trigger that can be inserted in any place of the code to collect the related information. Increments per warp.
prof_trigger_07: User profiled generic trigger that can be inserted in any place of the code to collect the related information. Increments per warp.
warps_launched: Number of warps launched.
inst_issued0: Number of cycles that did not issue any instruction, increments per warp.
inst_issued1: Number of cycles that issued single instruction, increments per warp.
inst_issued2: Number of cycles that issued dual instructions, increments per warp.
inst_executed: Number of instructions executed per warp.
thread_inst_executed: Number of instructions executed by the active threads. For each instruction it increments by number of threads, including predicated-off threads, that execute the instruction. It does not include replays.
not_predicated_off_thread_inst_executed: Number of instructions executed by active and not predicated off threads, does not include replays. For each instruction it increments by the number of not predicated off threads that execute this instruction.
local_store: Number of executed store instructions where state space is specified as local, increments per warp on a multiprocessor.
local_load: Number of executed load instructions where state space is specified as local, increments per warp on a multiprocessor.
shared_load: Number of executed load instructions where state space is specified as shared, increments per warp on a multiprocessor.
shared_store: Number of executed store instructions where state space is specified as shared, increments per warp on a multiprocessor.
shared_atom_cas: Number of ATOMS.CAS instructions executed per warp.
shared_atom: Number of ATOMS instructions executed per warp.
global_atom_cas: Number of ATOM.CAS instructions executed per warp.
atom_count: Number of ATOM instructions executed per warp.
global_load: Number of executed load instructions where state space is specified as global, increments per warp on a multiprocessor.
global_store: Number of executed store instructions where state space is specified as global, increments per warp on a multiprocessor.
gred_count: Number of reduction operations performed per warp.
divergent_branch: Number of divergent branches within a warp. This counter will be incremented by one if at least one thread in a warp diverges (that is, follows a different execution path) via a conditional branch.
branch: Number of branch instructions executed per warp on a multiprocessor.
active_cycles: Number of cycles a multiprocessor has at least one active warp.
active_warps: Accumulated number of active warps per cycle. For every cycle it increments by the number of active warps in the cycle which can be in the range 0 to 64.
active_ctas: Accumulated number of active blocks per cycle. For every cycle it increments by the number of active blocks in the cycle which can be in the range 0 to 32.
sm_cta_launched: Number of blocks launched
shared_ld_bank_conflict: Number of shared load bank conflict generated when the addresses for two or more shared memory load requests fall in the same memory bank.
shared_st_bank_conflict: Number of shared store bank conflict generated when the addresses for two or more shared memory store requests fall in the same memory bank.
shared_ld_transactions: Number of transactions for shared load accesses. Maximum transaction size in maxwell is 128 bytes, any warp accessing more that 128 bytes will cause multiple transactions for a shared load instruction. This also includes extra transactions caused by shared bank conflicts.
shared_st_transactions: Number of transactions for shared store accesses. Maximum transaction size in maxwell is 128 bytes, any warp accessing more that 128 bytes will cause multiple transactions for a shared store instruction. This also includes extra transactions caused by shared bank conflicts.

Domain domain_e:

l2_subp0_write_sector_misses: Number of write requests sent to DRAM from slice 0 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp0_read_sector_misses: Number of read requests sent to DRAM from slice 0 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp1_write_sector_misses: Number of write requests sent to DRAM from slice 1 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp1_read_sector_misses: Number of read requests sent to DRAM from slice 1 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp0_read_tex_sector_queries: Number of read requests from Texture cache to slice 0 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp0_write_tex_sector_queries: Number of write requests from Texture cache to slice 0 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp0_read_tex_hit_sectors: Number of read requests from Texture cache that hit in slice 0 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp0_write_tex_hit_sectors: Number of write requests from Texture cache that hit in slice 0 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp0_read_sysmem_sector_queries: Number of system memory read requests to slice 0 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp0_write_sysmem_sector_queries: Number of system memory write requests to slice 0 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp0_total_read_sector_queries: Total read requests to slice 0 of L2 cache. This includes requests from L1, Texture cache, system memory. This increments by 1 for each 32-byte access.
l2_subp0_total_write_sector_queries: Total write requests to slice 0 of L2 cache. This includes requests from L1, Texture cache, system memory. This increments by 1 for each 32-byte access.
l2_subp1_read_tex_sector_queries: Number of read requests from Texture cache to slice 1 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp1_write_tex_sector_queries: Number of write requests from Texture cache to slice 1 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp1_read_tex_hit_sectors: Number of read requests from Texture cache that hit in slice 1 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp1_write_tex_hit_sectors: Number of write requests from Texture cache that hit in slice 1 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp1_read_sysmem_sector_queries: Number of system memory read requests to slice 1 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp1_write_sysmem_sector_queries: Number of system memory write requests to slice 1 of L2 cache. This increments by 1 for each 32-byte access.
l2_subp1_total_read_sector_queries: Total read requests to slice 1 of L2 cache. This includes requests from L1, Texture cache, system memory. This increments by 1 for each 32-byte access.
l2_subp1_total_write_sector_queries: Total write requests to slice 1 of L2 cache. This includes requests from L1, Texture cache, system memory. This increments by 1 for each 32-byte access.


目录
相关文章
|
网络协议 Swift iOS开发
iOS CocoaPods
iOS CocoaPods
160 0
|
存储 Serverless 数据库
科普文:云计算服务类型IaaS, PaaS, SaaS, BaaS, Faas说明
本文介绍了云计算服务的几种主要类型,包括IaaS(基础设施即服务)、PaaS(平台即服务)、SaaS(软件即服务)、BaaS(后端即服务)和FaaS(函数即服务)。每种服务模式提供了不同的服务层次和功能,从基础设施的提供到应用的开发和运行,再到软件的交付使用,满足了企业和个人用户在不同场景下的需求。文章详细阐述了每种服务模式的特点、优势和缺点,并列举了相应的示例。云计算服务的发展始于21世纪初,随着互联网技术的普及,这些服务模式不断演进,为企业和个人带来了高效、灵活的解决方案。然而,使用这些服务时也需要注意服务的稳定性、数据安全性和成本等问题。
10226 5
|
12月前
|
敏捷开发 缓存 中间件
.NET技术的高效开发模式,涵盖面向对象编程、良好架构设计及高效代码编写与管理三大关键要素
本文深入探讨了.NET技术的高效开发模式,涵盖面向对象编程、良好架构设计及高效代码编写与管理三大关键要素,并通过企业级应用和Web应用开发的实践案例,展示了如何在实际项目中应用这些模式,旨在为开发者提供有益的参考和指导。
123 3
|
关系型数据库 MySQL 应用服务中间件
配置docker阿里云镜像地址
配置docker阿里云镜像地址
|
NoSQL Shell 网络安全
MongoDB连接指南:从基础到进阶
MongoDB连接指南:从基础到进阶
649 1
|
Java
Java一分钟之-方法定义与调用基础
【5月更文挑战第8天】本文介绍了Java编程中的方法定义和调用,包括基本结构、常见问题和避免策略。方法定义涉及返回类型、参数列表和方法体,易错点有返回类型不匹配、参数错误和忘记返回值。在方法调用时,要注意参数传递、静态与非静态方法的区分,以及重载方法的调用。避免错误的策略包括明确返回类型、参数校验、理解值传递、区分静态和非静态方法以及合理利用重载。通过学习和实践,可以提升编写清晰、可维护代码的能力。
454 0
|
JavaScript 知识图谱
【冷知识】获取网页所有的监听事件类型、方法。请认准getEventListeners
【冷知识】获取网页所有的监听事件类型、方法。请认准getEventListeners
|
安全 Linux
Linux 系统安全 - 近期发现的 polkit pkexec 本地提权漏洞(CVE-2021-4034)修复方案
Linux 系统安全 - 近期发现的 polkit pkexec 本地提权漏洞(CVE-2021-4034)修复方案
1683 2
|
SQL Java 数据库连接
Hibernate - 基础入门详解
Hibernate - 基础入门详解
347 1
|
Ubuntu Linux 网络安全
kali linux 配置 xrdp 远程桌面服务
今天误打误撞完成了 kali 下的 xrdp 配置
1159 0