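Introduction
A graph is a mathematical structure for modeling pairwise relationships between objects. It consists of two main elements: nodes and relationships.
Nodes: A node can be thought of as a record in a traditional database. Each node represents an object or entity, such as a person or a place. Nodes are grouped by labels, such as "Customer" or "Product", which makes it easier to classify and query them by role.
Relationships: These are the connections between nodes, and they define how entities interact or relate. For example, a person can be linked to a company through an "EMPLOYED_BY" relationship, or to a place through a "LIVES_IN" relationship.
To store data in this kind of structure, a new family of databases was introduced: graph databases. A graph database treats the relationships between data as being just as important as the data itself. Graph databases are optimized to handle interconnected data and complex queries efficiently.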
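Neo4j is one such graph database. It uses a flexible graph structure that, beyond nodes and relationships, also includes properties, labels, and paths to represent and store data. Neo4j also supports vector search, which makes it a good fit for hybrid GraphRAG scenarios.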
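To make the model concrete, here is a minimal sketch of a property graph in plain Python. It is not how Neo4j stores data internally; it only illustrates nodes with labels and properties, and typed relationships between them. The entities (Alice, Acme, London) and the helper `neighbors` are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set                      # e.g. {"Person", "Customer"}
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    source: Node
    rel_type: str                    # e.g. "EMPLOYED_BY"
    target: Node
    properties: dict = field(default_factory=dict)


# Hypothetical sample data
alice = Node("alice", {"Person", "Customer"}, {"name": "Alice"})
acme = Node("acme", {"Company"}, {"name": "Acme"})
london = Node("london", {"Place"}, {"name": "London"})

relationships = [
    Relationship(alice, "EMPLOYED_BY", acme, {"since": 2020}),
    Relationship(alice, "LIVES_IN", london),
]


def neighbors(node, rel_type):
    """Return the nodes reachable from `node` over relationships of `rel_type`."""
    return [r.target for r in relationships
            if r.source is node and r.rel_type == rel_type]


print([n.properties["name"] for n in neighbors(alice, "LIVES_IN")])  # ['London']
```

A graph database answers this kind of "follow the relationship" question natively, without the join tables a relational schema would need.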
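1. Installing Neo4j
There are two ways to create a Neo4j database: create a local database instance, or use a cloud database instance such as Neo4j Aura or Neo4j Sandbox, both of which offer free instances.
I chose Neo4j Sandbox here. Go to https://sandbox.neo4j.com/, register an account, and create a new project. The following screen appears:
Click Add to Neo4j Desktop project, then download the Neo4j Desktop application.
Neo4j Desktop asks you to connect to either a remote or a local database. Go back to the database console opened a moment ago and find the remote address, username, and password.
You can also skip the local Desktop connection and work directly in the web console, but I recommend installing Desktop locally.
The username and password are on the connection details screen, under the second link shown below: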
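Before wiring the database into Django, it can help to confirm that the address and credentials actually work. The sketch below is one way to do that with the official `neo4j` Python driver. The helper names `read_neo4j_settings` and `check_connection` are hypothetical, and the default username `neo4j` is an assumption; use whatever the connection details screen shows.

```python
import os
from urllib.parse import urlparse

# URI schemes accepted by the official Neo4j Python driver
SUPPORTED_SCHEMES = {"bolt", "bolt+s", "bolt+ssc", "neo4j", "neo4j+s", "neo4j+ssc"}


def read_neo4j_settings(env=os.environ):
    """Collect connection settings from environment variables."""
    uri = env.get("NEO4J_URI", "")
    if urlparse(uri).scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"missing or unsupported scheme in NEO4J_URI: {uri!r}")
    return {
        "uri": uri,
        "user": env.get("NEO4J_USERNAME", "neo4j"),  # assumed default username
        "password": env.get("NEO4J_PASSWORD", ""),
    }


def check_connection(settings):
    """Open a driver and raise if the server cannot be reached."""
    # Imported here so that read_neo4j_settings works without the driver installed
    from neo4j import GraphDatabase

    auth = (settings["user"], settings["password"])
    with GraphDatabase.driver(settings["uri"], auth=auth) as driver:
        driver.verify_connectivity()
```

With the three environment variables set, `check_connection(read_neo4j_settings())` returns silently on success and raises an exception if the address or credentials are wrong.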
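2. Using the Neo4j graph database in Django
Go to the root directory of our project, testsite. The first step is to install the neo4j library along with LangChain's langchain_experimental package, which provides LLMGraphTransformer, a class that uses an LLM to extract nodes and relationships from text so that they can be written to the graph database: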
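pip install neo4j
pip install langchain_experimental
# The view below also imports from these packages; install them if they are missing
pip install langchain langchain_community langchain_openai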
Open the view file testsite/members/views.py and add a view:
import os

from django.http import HttpResponse
from langchain_community.graphs import Neo4jGraph
from langchain_openai import ChatOpenAI
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain.schema import Document

# Fill in the Neo4j connection details and the OpenAI API key
os.environ["NEO4J_URI"] = ""
os.environ["NEO4J_USERNAME"] = ""
os.environ["NEO4J_PASSWORD"] = ""
os.environ["OPENAI_API_KEY"] = ""


def saveGraphRag(request):
    # Neo4jGraph reads NEO4J_URI, NEO4J_USERNAME and NEO4J_PASSWORD from the environment
    graph = Neo4jGraph(refresh_schema=False)
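The view instantiates a Neo4jGraph. Fill in the environment variables with the address, username, and password of the Neo4j database, plus the OpenAI API key.
To keep things easy to follow, I use an OpenAI key directly. A local model served by ollama would also work; I will cover how local models interact with the graph database in a later post.
Next, add the data to import inside the view: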
    text = """
    Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you'd expect to be involved in anything strange or mysterious, because they just didn't hold with such nonsense.
    Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, beefy man with hardly any neck, although he did have a very large mustache. Mrs. Dursley was thin and blonde and had nearly twice the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbors. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere.
    The Dursleys had everything they wanted, but they also had a secret, and their greatest fear was that somebody would discover it. They didn't think they could bear it if anyone found out about the Potters. Mrs. Potter was Mrs. Dursley's sister, but they hadn't met for several years; in fact, Mrs. Dursley pretended she didn't have a sister, because her sister and her good-for-nothing husband were as unDursleyish as it was possible to be. The Dursleys shuddered to think what the neighbors would say if the Potters arrived in the street. The Dursleys knew that the Potters had a small son, too, but they had never even seen him. This boy was another good reason for keeping the Potters away; they didn't want Dudley mixing with a child like that.
    """
    documents = [Document(page_content=text)]
The text is a passage from the Harry Potter novels.
Create the LLMGraphTransformer instance:
    llm = ChatOpenAI(temperature=0, model_name="gpt-4o-mini")
    llm_transformer = LLMGraphTransformer(llm=llm)
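To save resources I chose gpt-4o-mini; for this extraction task its results are not much worse than those of a larger model.
The key step comes next: extract the graph data and store it in the graph database: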
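    # Extract graph data from the documents
    graph_documents = llm_transformer.convert_to_graph_documents(documents)
    # Store the result in Neo4j
    graph.add_graph_documents(
        graph_documents,
        baseEntityLabel=True,  # add the __Entity__ label to every node
        include_source=True,   # also store the source document and link it to the entities
    )
    # A Django view must return a response (HttpResponse is imported from django.http)
    return HttpResponse("Graph saved")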
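Before writing to the database, it can be useful to look at what the transformer extracted. The sketch below assumes each graph document exposes `nodes` (each with `id` and `type`) and `relationships` (each with `source`, `target`, and `type`), which is the shape of LangChain's GraphDocument objects. The helper name `summarize_graph_documents` is hypothetical.

```python
from collections import Counter


def summarize_graph_documents(graph_documents):
    """Count extracted nodes per type and list relationships as triples."""
    node_types = Counter()
    triples = []
    for doc in graph_documents:
        for node in doc.nodes:
            node_types[node.type] += 1
        for rel in doc.relationships:
            triples.append((rel.source.id, rel.type, rel.target.id))
    return node_types, triples
```

Calling `summarize_graph_documents(graph_documents)` inside the view and printing the result shows which entity types and relationships the LLM produced for the passage, which makes it easier to spot odd extractions before they reach Neo4j.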
The storage view is now complete. Open testsite/members/urls.py and add a route:
path('save_graph_rag/', views.saveGraphRag, name='graphrag'),
Then visit http://127.0.0.1:8000/polls/save_graph_rag/ to run the save logic. The terminal shows the output:
Seeing the creation statements returned by the DBMS server confirms that the graph has been saved.
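Open Neo4j Desktop and connect to the remote database:
Choose to open it in the browser:
The database information panel lists the node labels that were created:
Select the __Entity__ label, which is added to every node because baseEntityLabel=True was passed:
The graph we created is now visible, with every character in the passage and the relationships between them created and connected.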
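The same graph can also be inspected from Python instead of the browser. The sketch below assumes a Neo4jGraph instance like the one created in the view, and relies on its `query` method returning a list of dictionaries. The helper name `list_relationships` and the limit of 25 are illustrative choices.

```python
# Cypher: follow relationships between extracted entities only,
# which skips the source document nodes added by include_source=True
ENTITY_QUERY = """
MATCH (a:__Entity__)-[r]->(b:__Entity__)
RETURN a.id AS source, type(r) AS relationship, b.id AS target
LIMIT $limit
"""


def list_relationships(graph, limit=25):
    """Return extracted relationships as readable strings."""
    rows = graph.query(ENTITY_QUERY, params={"limit": limit})
    return [
        f"{row['source']} -[{row['relationship']}]-> {row['target']}"
        for row in rows
    ]
```

Each returned string has the form `source -[RELATIONSHIP]-> target`; the actual entity and relationship names depend on what the LLM extracted.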
LLMGraphTransformer captured the relevant entities and relationships from the data without us having to specify anything.
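3. Summary
Part one focused on how GraphRAG is generated.
Part two, this post, covered the concepts behind Neo4j, walked through a practical graph-based implementation, and tied it into a project.
The upcoming part three will add vector search, graph retrieval techniques, and ways of combining a graph database with GraphRAG, and will show why pairing an unstructured-data retriever with a graph-data retriever is so powerful.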