Clang AST parsing for automated code generation


Original article: http://www.seethroughskin.com/blog/?p=2172

Syntax traversal is a powerful tool. With it you can automate repetitive tasks, search for semantic errors, generate wrappers, and so much more. A few months ago I hit a hump (read: a f***ing mountain) of an issue with some legacy code that has been on my plate for a while now.

Having killed a small forest’s worth of paper, I decided that manually tracing paths through code was an inefficient use of my time. Instead I went in search of an automatic method for generating an abstract syntax tree (AST) for C++ code. My idea was that I could use the AST to generate something like a directed graph to better visualize code flow.

There are a few flavors of readable syntax generation out there (and likely more):

I’ve been a fan of Clang for a while now, and it has a very robust and active community, making it a natural choice for my AST generation needs. Clang also has decent articles on getting started on both Windows and Linux. If you don’t have Clang installed, I suggest reading those getting-started articles first. You’ll need compiled versions of clang.exe and libclang.dll to follow along with the Python binding below.

[Caveat]

Clang at revision 183352 (2013-06-05) has a slight issue in that it won’t identify linkage specifications (e.g. extern "C" void foo()). To fix this issue, follow these steps from my Stack Overflow answer:

// Bit of a necro-answer, but if you go into \llvm\tools\clang\lib\Sema\SemaCodeComplete.cpp and add the following line:
 
case Decl::LinkageSpec:  return CXCursor_LinkageSpec;
 
// to the switch in:
CXCursorKind clang::getCursorKindForDecl(const Decl *D)
 
// it should resolve the issue of Clang's Python binding
// returning UNEXPOSED_DECL instead of the correct LINKAGE_SPEC.
// This change was made at revision 183352 (2013-06-05).
 
// Example from my version:
CXCursorKind clang::getCursorKindForDecl(const Decl *D) {
  if (!D)
    return CXCursor_UnexposedDecl;
 
  switch (D->getKind()) {
    case Decl::Enum:         return CXCursor_EnumDecl;
    case Decl::LinkageSpec:  return CXCursor_LinkageSpec;
    // ......
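
One way to check whether a given build needs this patch is to tally the cursor kinds that the Python binding reports for a file containing an extern "C" declaration. The following is only a sketch and not part of the original answer: `count_kinds` and `linkage_spec_exposed` are hypothetical helper names, and the code assumes nothing beyond each cursor offering `kind.name` and `get_children()`, as clang.cindex cursors do.

```python
from collections import Counter


def count_kinds(cursor):
    """Tally cursor kind names for `cursor` and all of its descendants.

    Hypothetical helper: it only assumes `cursor.kind.name` and
    `cursor.get_children()`, which clang.cindex cursors provide.
    """
    counts = Counter()
    stack = [cursor]
    while stack:
        node = stack.pop()
        counts[node.kind.name] += 1
        # get_children() yields the direct children of this cursor.
        stack.extend(node.get_children())
    return counts


def linkage_spec_exposed(counts):
    """Return True if at least one LINKAGE_SPEC cursor was reported."""
    return counts.get("LINKAGE_SPEC", 0) > 0
```

With a parsed translation unit `tu`, this would be called as `count_kinds(tu.cursor)`. If UNEXPOSED_DECL shows up where the extern "C" block should be and LINKAGE_SPEC never appears, the build probably lacks the fix.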

 

[Libclang]

Libclang is Clang’s C interface library. The Python bindings load it dynamically, which lets you inspect the AST from interpreted code. Eli Bendersky has a great post on using libclang that I referenced frequently while writing code. Clang’s documentation can be very lacking in some areas, and Eli’s post does a good job of explaining the steps to get libclang working with Python. If you follow his steps, the basic pipeline is:

  • Compile libclang
  • Add libclang to your PATH environment variable
    • On *Nix it’s LD_LIBRARY_PATH
    • On Windows it’s the standard PATH
    • Or do it in Python: os.environ['PATH'] += os.pathsep + '/path/to/libclang'
  • Copy the Clang/Python bindings from /llvm/tools/clang/bindings/python to your Python installation or however you’d prefer to install it.
  • Verify it works by opening a Python console and typing: import clang.cindex
  • Squee when it works
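
The path and verification steps above can be sketched in Python as follows. This is a minimal sketch under assumptions: `add_to_search_path` and `bindings_available` are hypothetical helper names, and '/path/to/libclang' remains a placeholder for wherever your build put the library.

```python
import os


def add_to_search_path(directory, env=None, var="PATH"):
    """Prepend `directory` to a path-style environment variable.

    Hypothetical helper. `env` defaults to os.environ; pass a plain dict
    to try it out without touching the real environment.
    """
    env = os.environ if env is None else env
    parts = [p for p in env.get(var, "").split(os.pathsep) if p]
    if directory not in parts:
        parts.insert(0, directory)
    env[var] = os.pathsep.join(parts)
    return env[var]


def bindings_available():
    """Return True if the clang.cindex module can be imported."""
    try:
        import clang.cindex  # noqa: F401
    except Exception:
        # Covers a missing module as well as a failed library load.
        return False
    return True
```

On Linux the dynamic loader typically reads LD_LIBRARY_PATH when the process starts, so it is safer to export it in the shell before launching Python than to change it from inside the script. Bindings that provide clang.cindex.Config also accept Config.set_library_path('/path/to/libclang') as an alternative to editing environment variables.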

[Example]

Once libclang is tied to Python it’s time to test your code. When I got to this step I had trouble finding any good examples. There are really only two, and they can be found in your Clang installation folder: llvm\tools\clang\bindings\python\examples\cindex. Others can be gleaned from blog posts and StackOverflow. Here is a simple example I adapted that looks specifically for the LINKAGE_SPEC cursor type. LINKAGE_SPEC refers to code like `extern "C"`.

#!/usr/bin/env python

import os
from optparse import OptionParser
from pprint import pprint

import clang.cindex
from clang.cindex import Index

# Append the current directory to PATH so libclang can be found there.
os.environ['PATH'] = os.environ.get('PATH', '') + os.pathsep + os.getcwd()


def get_info(node):
    """Collect the interesting attributes of a cursor into a dict."""
    return {'kind': node.kind,
            'usr': node.get_usr(),
            'spelling': node.spelling,
            'location': node.location,
            'extent.start': node.extent.start,
            'extent.end': node.extent.end,
            'is_definition': node.is_definition()}


def output_cursor_and_children(cursor, level=0):
    # LINKAGE_SPEC (http://clang.llvm.org/doxygen/classclang_1_1LinkageSpecDecl.html)
    # Represents code of the type:  extern "C" void foo()
    if cursor.kind == clang.cindex.CursorKind.LINKAGE_SPEC:
        pprint(('nodes', get_info(cursor)))

    # Recurse for children of this cursor
    for c in cursor.get_children():
        output_cursor_and_children(c, level + 1)


def main():
    parser = OptionParser("usage: %prog {filename} [clang-args*]")
    parser.disable_interspersed_args()
    (opts, args) = parser.parse_args()

    # parser.error() prints the usage string and exits.
    if len(args) == 0:
        parser.error('invalid number of arguments')

    index = Index.create()
    tu = index.parse(None, args)

    if not tu:
        parser.error('unable to load input')

    output_cursor_and_children(tu.cursor)


if __name__ == '__main__':
    main()

test.cpp:

#include "test.h"

int main(){
	Foo f;
	return 0;
}

test.h:

#ifndef TEST_H
#define TEST_H

class Foo
{
	int data_;
public:
	Foo(){}

	void bar(int data){data_ = data;}
};

extern "C" __declspec( dllexport ) void test1(){}

#endif

How to run:

python linkage_dump.py test.cpp


[Conclusion]

There are so many other ways to make use of ASTs and I wish I had more time to include some of them.  Suffice it to say I’ll probably end up posting about ASTs a few more times.  At least until I work through enough examples to meet my immediate needs.
