LangChain and RAG Best Practices


This is a quick-start essay for LangChain and RAG. It mainly follows the LangChain Chat with Your Data course taught by Harrison Chase and Andrew Ng.

You can check the entire code in the rag101 repository. This article is also posted on my blog; feel free to check it out.

Introduction

LangChain

LangChain is an open-source developer framework for building LLM applications.

Its components are as follows:

Prompt

  • Prompt Templates: used for generating model input.
  • Output Parsers: implementations for processing generated results.
  • Example Selectors: selecting appropriate input examples.

Models

  • LLMs
  • Chat Models
  • Text Embedding Models

Indexes

  • Document Loaders
  • Text Splitters
  • Vector Stores
  • Retrievers

Chains

  • Can be used as a building block for other chains.
  • Provides over 20 types of application-specific chains.

Agents

  • Supports 5 types of agents that help language models use external tools.
  • Agent Toolkits: over 10 implementations through which agents execute tasks with specific tools (a minimal sketch follows below).
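
For example, a classic tool-using agent can be wired up in a few lines. This is only a hedged sketch, assuming an OpenAI API key is configured; any LLM wrapper could be substituted, and the math question is just an illustration:

from langchain.llms import OpenAI
from langchain.agents import load_tools, initialize_agent, AgentType

llm = OpenAI(temperature=0)                       # any LLM wrapper works here
tools = load_tools(["llm-math"], llm=llm)         # a calculator tool backed by the LLM
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,  # classic ReAct-style agent
    verbose=True
)
agent.run("What is 25% of 300?")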

RAG process

The whole RAG process rests on two stages: vector store loading and retrieval-augmented generation.

Vector Store Loading

Load data from different sources, split it into chunks, and convert the chunks into vector embeddings.

Retrieval-Augmented Generation

  1. After the user enters a query, the system retrieves the most relevant document fragments (relevant splits) from the vector store.
  2. The retrieved fragments are combined into a prompt, which is passed along with the context to the large language model (LLM).
  3. Finally, the language model generates an answer based on the retrieved fragments and returns it to the user.

Loaders

You can use loaders to deal with different kinds and formats of data.

Some sources are public and some are proprietary. Some are structured and some are not.

Some useful libraries:

  • PDF: pypdf
  • YouTube audio: yt_dlp, pydub
  • Web page: beautifulsoup4

For more loaders, you can check the official docs.

You can check the entire code here.

PDF

Now, we can practice:

First, install the libraries:

pip install langchain-community 
pip install pypdf

You can check the demo below:

from langchain_community.document_loaders import PyPDFLoader

# Under the hood, LangChain calls the pypdf library to load the PDF file
loader = PyPDFLoader("ProbRandProc_Notes2004_JWBerkeley.pdf")
pages = loader.load()

print(type(pages))
# <class 'list'>
print(len(pages))
# Print the total num of pages

# Using the first page as an example
page = pages[0]
print(type(page))
# <class 'langchain_core.documents.base.Document'>

# What is inside the page:
# 1. page_content
# 2. metadata: the description of the page

print(page.page_content[0:500])
print(page.metadata)
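
YouTube Audio

The loader list above also mentions YouTube audio. The course handles it by downloading the audio with yt_dlp/pydub and transcribing it with OpenAI Whisper. A hedged sketch (it assumes ffmpeg is installed and an OpenAI API key is configured; the URL and save directory are placeholders):

from langchain_community.document_loaders.generic import GenericLoader
from langchain_community.document_loaders.parsers import OpenAIWhisperParser
from langchain_community.document_loaders.blob_loaders.youtube_audio import YoutubeAudioLoader

url = "https://www.youtube.com/watch?v=your_video_id"  # placeholder video URL
save_dir = "docs/youtube/"                             # where the downloaded audio is cached

# yt_dlp/pydub download and slice the audio, Whisper transcribes it into Documents
loader = GenericLoader(YoutubeAudioLoader([url], save_dir), OpenAIWhisperParser())
docs = loader.load()
print(docs[0].page_content[:500])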

Web Base Loader

Again, install the library first:

pip install beautifulsoup4

WebBaseLoader is based on the beautifulsoup4 library.

from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://zh.d2l.ai/")
pages = loader.load()
print(pages[0].page_content[:500])

# If the source returns JSON, you can post-process the content with the json module
# import json
# convert_to_json = json.loads(pages[0].page_content)

Splitters

Splitters break documents into smaller chunks while retaining the meaningful relationships between them.

Why split?

  • GPU and context limitations: a GPT-style model with billions of parameters cannot process arbitrarily long inputs in a single forward pass, so splitting is necessary.
  • More efficient computation.
  • Models often expect sequences of a fixed size.
  • Better generalization.

However, badly chosen split points may lose information, so splitting should take the semantics into account.

Type of splitters

  • CharacterTextSplitter
  • MarkdownHeaderTextSplitter
  • TokenTextSplitter
  • SentenceTransformersTokenTextSplitter
  • RecursiveCharacterTextSplitter: Recursively tries to split by different characters to find one that works.
  • Language: for C++, Python, Ruby, Markdown, etc.
  • NLTKTextSplitter: splits into sentences using NLTK (Natural Language Toolkit)
  • SpacyTextSplitter: splits into sentences using spaCy

For more, check the docs.

Example CharacterTextSplitter and RecursiveCharacterTextSplitter

You can check the entire code here.

from langchain.text_splitter import RecursiveCharacterTextSplitter, CharacterTextSplitter

example_text = """When writing documents, writers will use document structure to group content. \
This can convey to the reader, which idea's are related. For example, closely related ideas \
are in sentances. Similar ideas are in paragraphs. Paragraphs form a document. \n\n  \
Paragraphs are often delimited with a carriage return or two carriage returns. \
Carriage returns are the "backslash n" you see embedded in this string. \
Sentences have a period at the end, but also, have a space.\
and words are separated by space."""

c_splitter = CharacterTextSplitter(
    chunk_size=450, # the size of the chunk
    chunk_overlap=0, # the overlap of the chunk, which can be shared with the previous chunk
    separator = ' '
)
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=450,
    chunk_overlap=0, 
    separators=["\n\n", "\n", " ", ""] # priority of the separators
)

print(c_splitter.split_text(example_text))
# split at 450 characters
print(r_splitter.split_text(example_text))
# split at first \n\n
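
The splitter list above also includes MarkdownHeaderTextSplitter, which splits on header structure and keeps that structure in the chunk metadata. A minimal sketch with made-up markdown text and header mapping:

from langchain.text_splitter import MarkdownHeaderTextSplitter

markdown_text = "# Title\n\n## Chapter 1\n\nHi this is Jim\n\nHi this is Joe\n\n## Chapter 2\n\nHi this is Molly"
headers_to_split_on = [
    ("#", "Header 1"),   # split on top-level headers
    ("##", "Header 2"),  # and on second-level headers
]
md_splitter = MarkdownHeaderTextSplitter(headers_to_split_on=headers_to_split_on)
md_splits = md_splitter.split_text(markdown_text)
print(md_splits[0].page_content)
print(md_splits[0].metadata)  # e.g. {'Header 1': 'Title', 'Header 2': 'Chapter 1'}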

Vectorstores and Embeddings

Recall the RAG process described above. Splitting documents into chunks and embedding them brings several benefits:

  1. Improved query accuracy: searching over well-sized chunks gives more precise matches.
  2. Improved query efficiency: less computation is needed when searching for similar chunks.
  3. Improved coverage: the chunks together can cover every part of the document.
  4. Easier embedding: smaller chunks are easier to embed.

Embeddings


If two sentences have similar meanings, then they will be closer in the high-dimensional semantic space.

Vector Stores

Store every chunk in a vector store. When a user queries, the query is embedded, the most similar vectors (i.e., the indices of the corresponding chunks) are found, and those chunks are returned.

Practice

Embeddings

You can check the entire code here.

First, install chromadb, a lightweight vector database:

pip install chromadb

What we need is a good embedding model; you can select whichever you like. Refer to the docs.

Here I use ZhipuAIEmbeddings, so you should install the library:

pip install zhipuai

Here is the test code:

from langchain_community.embeddings import ZhipuAIEmbeddings

embed = ZhipuAIEmbeddings(
    model="embedding-3",
    api_key="Entry your own api key"
)

input_texts = ["This is a test query1.", "This is a test query2."]
print(embed.embed_documents(input_texts))
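
You can check the claim from the Embeddings section (similar meanings sit closer in the embedding space) by comparing a few embeddings with cosine similarity. A minimal sketch, reusing the embed object above; numpy is assumed to be installed and the sentences are made up:

import numpy as np

# made-up sentences: the first two are close in meaning, the third is not
s1, s2, s3 = "I like dogs.", "I like canines.", "The weather is ugly outside."
e1, e2, e3 = embed.embed_documents([s1, s2, s3])

def cosine(a, b):
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(e1, e2))  # expected to be the larger value (similar meaning)
print(cosine(e1, e3))  # expected to be smaller (different meaning)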

Vector Stores

You can check the entire code here.

pip install langchain-chroma

Then we can use the Chroma to store the embeddings.

from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import ZhipuAIEmbeddings

# load the web page
loader = WebBaseLoader("https://en.d2l.ai/")
docs = loader.load()

# split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)
splits = text_splitter.split_documents(docs)
# print(len(splits))

# set the embeddings models
embeddings = ZhipuAIEmbeddings(
    model="embedding-3",
    api_key="your own api key"
)

# set the persist directory
persist_directory = r'.'

# create the vector database
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory=persist_directory
)
# print(vectordb._collection.count())

# query the vector database
question = "Recurrent"
docs = vectordb.similarity_search(question, k=3)
# print(len(docs))
print(docs[0].page_content)

Then you can find the chroma.sqlite3 file in the specified directory.
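
Because the store is persisted to disk, you can reopen it later without re-embedding the documents. A minimal sketch, assuming the same directory and embedding model as above:

from langchain_chroma import Chroma
from langchain_community.embeddings import ZhipuAIEmbeddings

embeddings = ZhipuAIEmbeddings(model="embedding-3", api_key="your_api_key")

# reopen the persisted store instead of rebuilding it with from_documents
vectordb = Chroma(
    persist_directory=".",          # the directory used when the store was created
    embedding_function=embeddings,  # must match the embedding model used at build time
)
print(vectordb._collection.count())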

Retrieval

This part is the core part of the RAG.

In the last part we already used the similarity_search method. On top of that, there are several other retrieval approaches:

  • Basic semantic similarity
  • Maximum Marginal Relevance(MMR)
  • Metadata
  • LLM Aided Retrieval

Similarity Search

Similarity Search calculates the similarity between the query vector and all document vectors in the database to find the most relevant document.

The similarity measurement methods include cosine similarity and Euclidean distance, which can effectively measure the closeness of two vectors in a high-dimensional space.

However, relying solely on similarity search may result in insufficient diversity, as it only focuses on the match between the query and the content, ignoring the differences between different pieces of information. In some applications, especially when it is necessary to cover multiple different aspects of information, the extended method of Maximum Marginal Relevance (MMR) can better balance relevance and diversity.

Practice

The similarity-search practice is already covered in the previous section.
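
If you also want to see the scores themselves, Chroma exposes a scored variant of the search. A small sketch, reusing the vectordb built in the Vector Stores section; note that in Chroma the score is a distance, so lower means more similar:

docs_and_scores = vectordb.similarity_search_with_score("Recurrent", k=3)
for doc, score in docs_and_scores:
    # lower distance = closer match
    print(round(score, 4), doc.page_content[:60])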

Maximum Marginal Relevance (MMR)

Retrieving only the most relevant documents may overlook the diversity of information. For example, if only the most similar response is selected, the results may be very similar or even contain duplicate content. The core idea of MMR is to balance relevance and diversity, that is, to select the information most relevant to the query while ensuring that the information is diverse in content. By reducing the repetition of information between different pieces, MMR can provide a more comprehensive and diverse set of results.

The process of MMR is as follows:

  1. Query the Vector Store: First convert the query into vectors using the embedding model.
  2. Choose the fetch_k most similar responses: find the top fetch_k most similar vectors in the vector store.
  3. Within those responses choose the k most diverse. By calculating the similarity between each response, MMR will prefer results that are more different from each other, thus increasing the coverage of information. This process ensures that the returned results are not only "most similar", but also "complementary".

The key parameter is lambda, which weights relevance against diversity.

  • When lambda is close to 1, MMR will be more like the similarity search.
  • When lambda is close to 0, MMR emphasizes diversity over relevance.

Practice

We can adjust the code from the Vector Stores part to use the MMR method. The full code is in the retrieval/mmr.py file.

# query the vector database with MMR
question = "How the neural network works?"
# fetch the 8 most similar documents, then pick the 2 that best balance relevance and diversity
docs_mmr = vectordb.max_marginal_relevance_search(question, fetch_k=8, k=2)
print(docs_mmr[0].page_content[:100])
print(docs_mmr[1].page_content[:100])
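
The lambda weight discussed above is exposed by Chroma as the lambda_mult parameter (it defaults to 0.5). A small sketch continuing the code above:

# lambda_mult: 1.0 = pure relevance, 0.0 = maximum diversity
docs_mmr_diverse = vectordb.max_marginal_relevance_search(
    question, fetch_k=8, k=2, lambda_mult=0.25
)
print(docs_mmr_diverse[0].page_content[:100])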

Metadata

When our query comes with specific conditions, we can use metadata to filter the results.

For example, information such as page numbers, authors, and timestamps can be used as filtering conditions during retrieval, improving the accuracy of the query.

Practice

You can check the entire code here.

Add new documents from another website, and then filter the results from the specific website.

from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import ZhipuAIEmbeddings

# load the web page
loader = WebBaseLoader("https://en.d2l.ai/")
docs = loader.load()

# split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)
splits = text_splitter.split_documents(docs)
# print(len(splits))

# set the embeddings models
embeddings = ZhipuAIEmbeddings(
    model="embedding-3",
    api_key="your_api_key"
)

# set the persist directory
persist_directory = r'.'

# create the vector database
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory=persist_directory
)
# print(vectordb._collection.count())

# add new documents from another website
new_loader = WebBaseLoader("https://www.deeplearning.ai/")
new_docs = new_loader.load()

# split the text into chunks
new_splits = text_splitter.split_documents(new_docs)

# add to the existing vector database
vectordb.add_documents(new_splits)

# Get all documents
all_docs = vectordb.similarity_search("What is the difference between a neural network and a deep learning model?", k=20)

# Print the metadata of the documents
for i, doc in enumerate(all_docs):
    print(f"Document {i+1} metadata: {doc.metadata}")
# Document 1 metadata: {'language': 'en', 'source': 'https://en.d2l.ai/', 'title': 'Dive into Deep Learning — Dive into Deep Learning 1.0.3 documentation'}
# Document 2 metadata: {'language': 'en', 'source': 'https://en.d2l.ai/', 'title': 'Dive into Deep Learning — Dive into Deep Learning 1.0.3 documentation'}
# Document 3 metadata: {'language': 'en', 'source': 'https://en.d2l.ai/', 'title': 'Dive into Deep Learning — Dive into Deep Learning 1.0.3 documentation'}
# Document 4 metadata: {'description': 'DeepLearning.AI | Andrew Ng | Join over 7 million people learning how to use and build AI through our online courses. Earn certifications, level up your skills, and stay ahead of the industry.', 'language': 'en', 'source': 'https://www.deeplearning.ai/', 'title': 'DeepLearning.AI: Start or Advance Your Career in AI'}

question = "how the neural network works?"
# filter the documents from the specific website
docs_meta = vectordb.similarity_search(
    question, k=1, filter={"source": "https://www.deeplearning.ai/"}
)
print(docs_meta[0].page_content[:100])

LLM Aided Retrieval

It uses language models to automatically parse the query's semantics and extract filtering information.

SelfQueryRetriever

LangChain provides the SelfQueryRetriever module, which uses a language model to analyze the semantics of the question and extract:

  • The search term for the vector search
  • The filter conditions on the document metadata

For example, for the question "Besides Wikipedia, which health websites are there?", SelfQueryRetriever can infer that "Wikipedia" represents the filter condition, that is, to exclude the documents from Wikipedia.

Practice

from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo

llm = OpenAI(temperature=0)

metadata_field_info = [
    AttributeInfo(
        name="source", #  source is to tell the LLM the data is from which document
        description="The lecture the chunk is from, should be one of `docs/loaders.pdf`, `docs/text_splitters.pdf`, or `docs/vectorstores.pdf`",
        type="string",
    ),
    AttributeInfo(
        name="page", # page is to tell the LLM the data is from which page
        description="The page from the lecture",
        type="integer",
    ),
]

document_content_description = "the lectures of retrieval augmentation generation"
retriever = SelfQueryRetriever.from_llm(
    llm,
    vectordb,
    document_content_description,
    metadata_field_info,
    verbose=True
)

question = "What is the main topic of second lecture?"

Compression

When using vector retrieval to get relevant documents, directly returning the entire document fragment may lead to resource waste, as the actual relevant part is only a small part of the document. To improve this, LangChain provides a "compression" retrieval mechanism.

Its working principle is to first use standard vector retrieval to obtain candidate documents, and then use a language model to compress these documents based on the semantic meaning of the query sentence, only retaining the relevant part of the document.

For example, for the query "the nutritional value of mushrooms", the retrieval may return a long document about mushrooms. After compression, only the sentences related to "nutritional value" are extracted from the document.

Practice

from langchain.llms import OpenAI
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

def pretty_print_docs(docs):
    print(f"\n{'-' * 100}\n".join([f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]))

llm = OpenAI(temperature=0)
# initialize the compressor
compressor = LLMChainExtractor.from_llm(llm)
# initialize the compression retriever
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, # llm chain extractor
    base_retriever=vectordb.as_retriever() # vector database retriever
)
# compress the source documents
question = "What is the main topic of second lecture?"
compressed_docs = compression_retriever.get_relevant_documents(question)
pretty_print_docs(compressed_docs)
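
Compression also combines well with the MMR retrieval described earlier, so that the candidates handed to the compressor are themselves diverse. A small sketch reusing the objects above:

# the same compressor, but with MMR as the base search type
compression_retriever_mmr = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=vectordb.as_retriever(search_type="mmr")  # diversify the candidates first
)
compressed_docs = compression_retriever_mmr.get_relevant_documents(question)
pretty_print_docs(compressed_docs)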

Question Answering

  1. Multiple relevant documents have been retrieved from the vector store
  2. Potentially compress the relevant splits to fit into the LLM context. The system will generate the necessary background information (System Prompt) and keep the user's question (Human Question), and then integrate all the information into a complete context input.
  3. Send the information along with our question to an LLM to select and format an answer

RetrievalQA Chain

We need to use LangChain to combine the prompts into the desired format and pass them to the large language model to generate the desired reply. This approach is better than the traditional method of feeding the question directly into the large language model because:

  • Enhance the accuracy of the answer: By combining the retrieval results with the generation ability of the large language model, the relevance and accuracy of the answer are greatly improved.
  • Support real-time update of the knowledge base: The retrieval process depends on the data in the vector store, which can be updated in real time according to needs, ensuring that the answer reflects the latest knowledge.
  • Reduce the memory burden of the model: By using the information in the knowledge base as the input context, the dependence on the model's internal parameters for storing knowledge is reduced.

In addition to the default "stuff" strategy of the RetrievalQA Chain, there are other chain types, such as map_reduce, refine, and map_rerank.

Map_reduce

Map_reduce method divides the documents into multiple chunks, and then passes each chunk to the language model (LLM) to generate an independent answer. After that, all the generated answers will be merged into the final answer, and the merging process (reduce) may include summarizing, voting, etc.

This method is suitable for processing a large number of documents in parallel, with a quick response.

Refine

The refine method generates an initial answer from the first document chunk, and then processes each subsequent chunk one by one. Each chunk supplements or corrects the existing answer, and an optimized, refined answer is obtained after all chunks are processed.

This method is suitable when answer quality matters most.

Map_rerank

Map_rerank divides the documents into multiple chunks, and then generates an independent answer for each chunk. The scoring is based on the relevance and quality of the answer. Finally, the answer with the highest score will be selected as the final output.

This method is suitable when you want the single best-matching answer rather than a synthesis of all the information.
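
All of these strategies are selected through the chain_type argument of RetrievalQA.from_chain_type. A minimal sketch, assuming the chat model and vectordb that are built in the Practice section below:

from langchain.chains import RetrievalQA

qa_chain_mr = RetrievalQA.from_chain_type(
    chat,
    retriever=vectordb.as_retriever(),
    chain_type="map_reduce"  # or "refine" / "map_rerank"; the default is "stuff"
)
result = qa_chain_mr({"query": "What is this book about?"})
print(result["result"])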

Practice

You can check the entire code here.

First, install the library:

pip install pyjwt

You can use the following demo to check that the model works.

from langchain_community.chat_models import ChatZhipuAI
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

chat = ChatZhipuAI(
    model="glm-4-flash",
    temperature=0.5, # the temperature of the model
    api_key="your_api_key"
)

messages = [
    AIMessage(content="Hi."),  # AI generated message
    SystemMessage(content="Your role is a poet."),  # the role of the model
    HumanMessage(content="Write a short poem about AI in four lines."),  # the message from the user
]

# get the answer from the model
response = chat.invoke(messages)
print(response.content)

Then we can use the RetrievalQA chain to get the answer from the model.

from langchain_community.chat_models import ChatZhipuAI
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_community.embeddings import ZhipuAIEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate # used to build a custom prompt template

loader = WebBaseLoader("https://en.d2l.ai/")
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)
splits = text_splitter.split_documents(docs)

persist_directory = '.'

# initialize the embeddings
embeddings = ZhipuAIEmbeddings(
    model="embedding-3",
    api_key="your_api_key"
)

# initialize the vector database
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory=persist_directory
)

chat = ChatZhipuAI(
    model="glm-4-flash",
    temperature=0.5,
    api_key =  "your_api_key"
)

# Now you can ask the question about the web to the model
question = "What is this book about?"

# You can also create a prompt template
template = """
Please answer the question based on the following context.
If you don't know the answer, just say you don't know, don't try to make up an answer.
Answer in at most three sentences. Please answer as concisely as possible. Finally, always say "Thank you for asking!"
Context: {context}
Question: {question}
Helpful answer:
"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)

qa_chain = RetrievalQA.from_chain_type(
    chat,
    retriever=vectordb.as_retriever(),
    return_source_documents=True, # Return the source documents(optional)
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}  # add the prompt template to the chain
)
result = qa_chain({"query": question})
print(result["result"])
print(result["source_documents"][0]) # If you set return_source_documents to True, you can get the source documents

Conversational Retrieval Chain

The Conversational Retrieval Chain is a technical architecture that combines dialogue history with intelligent retrieval capabilities. The whole process is as follows:

  1. Chat History: The system will record the user's dialogue context as an important input for subsequent question processing.
  2. Question: The user's question is sent to the retrieval module.
  3. Retriever: The system retrieves the content related to the question from the vector database through the retriever.
  4. System & Human: The system integrates the user's question and the extracted relevant information into the Prompt, providing structured input to the language model.
  5. LLM: The language model generates the answer based on the context, and then returns the answer to the user.

Memory

ConversationBufferMemory is a memory module in the LangChain framework, which is used to manage the dialogue history. Its main function is to store the dialogue content between users and AI in the form of a buffer, and then return these records when needed, so that the model can generate responses in a consistent context.

The demo of it is as follows:

from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history", # This key can be referenced in other modules (such as chains or tools).
    return_messages=True # whether to return the history as a list of messages; otherwise it is returned as a single string
)
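
You can see what the buffer stores by saving a made-up exchange and reading it back:

# a quick illustration with a hypothetical exchange
memory.save_context({"input": "Hi"}, {"output": "Hello! How can I help you?"})
print(memory.load_memory_variables({}))
# roughly: {'chat_history': [HumanMessage(content='Hi'), AIMessage(content='Hello! How can I help you?')]}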

Besides the memory, we also need the corresponding retrieval chain. Then we can test the memory.

from langchain.chains import ConversationalRetrievalChain
retriever=vectordb.as_retriever()
qa = ConversationalRetrievalChain.from_llm(
    chat,
    retriever=retriever,
    memory=memory
)
question = "What is the main topic of this book?"
result = qa.invoke({"question": question})
print(result['answer'])

question = "What is my last question?"
result = qa.invoke({"question": question})
print(result['answer'])

Practice

You can check the entire code here.

The best practice is as follows:

from langchain_community.chat_models import ChatZhipuAI
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from langchain_community.embeddings import ZhipuAIEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

loader = WebBaseLoader("https://en.d2l.ai/")
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)
splits = text_splitter.split_documents(docs)

persist_directory = '.'

# initialize the embeddings
embeddings = ZhipuAIEmbeddings(
    model="embedding-3",
    api_key="your_api_key"
)

# initialize the vector database
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory=persist_directory
)

chat = ChatZhipuAI(
    model="glm-4-flash",
    temperature=0.5,
    api_key =  "your_api_key"
)

# initialize the memory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

# create the ConversationalRetrievalChain
retriever = vectordb.as_retriever()
qa = ConversationalRetrievalChain.from_llm(
    chat,
    retriever=retriever,
    memory=memory
)

# First question
question = "What is the main topic of this book?"
result = qa.invoke({"question": question})
print(result['answer'])

# Second question
question = "Can you tell me more about it?"
result = qa.invoke({"question": question})
print(result['answer'])