Introduction: DeepSearcher combines reasoning large language models (such as OpenAI's models and DeepSeek) with vector databases (such as Milvus and Zilliz Cloud) to efficiently search, evaluate, and reason over private data. It is especially well suited to enterprise knowledge management, intelligent question-answering systems, and information-retrieval scenarios.
DeepSearcher is an open-source project that combines reasoning large language models (LLMs) with vector databases to search, evaluate, and reason over private data, producing high-precision answers and comprehensive reports. It is mainly intended for scenarios such as enterprise knowledge management, intelligent question-answering systems, and information retrieval.
Project overview
DeepSearcher enables in-depth search and analysis of private data by integrating multiple reasoning LLMs (such as OpenAI's o1 and o3-mini, DeepSeek, and Grok 3) with vector databases (such as Milvus and Zilliz Cloud). Its core features include:
- Private data search: maximizes the use of internal data while keeping it secure; when necessary, online content can be incorporated to provide more accurate answers.
- Vector database management: supports vector databases such as Milvus and allows data to be partitioned to improve retrieval efficiency.
- Flexible embedding options: compatible with multiple embedding models, so users can choose the one best suited to their needs.
- Broad LLM support: supports large models such as DeepSeek and OpenAI's models for intelligent Q&A and content generation.
- Document loader: supports loading local files; a web page crawling feature is under development.
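The search-evaluate-reason loop behind these features can be sketched in plain Python. Every helper below is a hypothetical stand-in for DeepSearcher's internal components (an LLM decomposer, a vector search, an LLM judge, an LLM synthesizer), not the project's actual API:

```python
# Minimal, self-contained sketch of an agentic "search, evaluate, reason"
# loop over private data. All helpers are hypothetical stand-ins.

def decompose(question: str) -> list[str]:
    # An LLM would break the question into focused sub-queries.
    return [f"{question} (background)", f"{question} (details)"]

def vector_search(sub_query: str, corpus: dict[str, str]) -> list[str]:
    # A vector database would return the nearest chunks; here we fall
    # back to naive keyword overlap purely for illustration.
    terms = set(sub_query.lower().split())
    return [text for text in corpus.values()
            if terms & set(text.lower().split())]

def is_sufficient(evidence: list[str]) -> bool:
    # An LLM judge would decide whether more retrieval is needed.
    return len(evidence) > 0

def synthesize(question: str, evidence: list[str]) -> str:
    # An LLM would write the final report from the gathered evidence.
    return f"Answer to '{question}' based on {len(evidence)} chunk(s)."

def deep_search(question: str, corpus: dict[str, str]) -> str:
    evidence: list[str] = []
    for sub_query in decompose(question):
        evidence.extend(vector_search(sub_query, corpus))
        if is_sufficient(evidence):
            break
    return synthesize(question, evidence)

corpus = {"doc1": "Quarterly revenue details for the enterprise."}
print(deep_search("revenue report", corpus))
```

The real system replaces each stub with model calls and vector lookups, but the control flow (decompose, retrieve, evaluate sufficiency, synthesize) is the same shape.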
Quick start
Installation
You can install DeepSearcher using the following steps:
1. Clone the repository:
git clone https://github.com/zilliztech/deep-searcher.git
2. Create and activate a Python virtual environment:
cd deep-searcher
python3 -m venv .venv
source .venv/bin/activate
3. Install the dependencies:
pip install -e .
4. Set environment variables:
Add your OPENAI_API_KEY to your environment variables. If you switch to a different LLM in your configuration, make sure the corresponding API key is set as well.
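For example, on Linux or macOS the key can be exported in the shell before running DeepSearcher (the placeholder value must be replaced with your own key):

```shell
# Replace the placeholder with your real OpenAI API key.
export OPENAI_API_KEY="your-openai-api-key"
```

To make it persistent, add the line to your shell profile (e.g. `~/.bashrc` or `~/.zshrc`).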
Example
The following is a simple example of usage:
from deepsearcher.configuration import Configuration, init_config
from deepsearcher.online_query import query

# Initialize the configuration
config = Configuration()
config.set_provider_config("llm", "OpenAI", {"model": "gpt-4o-mini"})
init_config(config=config)

# Load local data
from deepsearcher.offline_loading import load_from_local_files
load_from_local_files(paths_or_directory="your_local_path")

# Query
result = query("Please write a report on XXX.")
Module support
DeepSearcher supports multiple modules, including:
- Embedding models: supports open-source embedding models as well as OpenAI (requires OPENAI_API_KEY), VoyageAI (requires VOYAGE_API_KEY), Amazon Bedrock (requires AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY), and more.
- LLM support: supports OpenAI, DeepSeek, Grok 3 (coming soon), the SiliconFlow inference service, the TogetherAI inference service, Google Gemini, the SambaNova cloud inference service, and more.
- Document loader: supports loading local files (such as PDF, TXT, and MD); web crawling is under development.
- Vector database support: Milvus is currently supported (Zilliz Cloud, the fully managed Milvus service, works the same way).
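Choosing an embedding model or vector database follows the same `set_provider_config` pattern as the LLM configuration shown earlier. The provider names and parameters below are assumptions extrapolated from that pattern and may differ between versions, so treat this as an illustrative sketch rather than the definitive API:

```python
from deepsearcher.configuration import Configuration, init_config

config = Configuration()
# Hypothetical provider names/parameters; check the project docs for
# the exact identifiers supported by your installed version.
config.set_provider_config("llm", "OpenAI", {"model": "gpt-4o-mini"})
config.set_provider_config("embedding", "OpenAIEmbedding", {"model": "text-embedding-3-small"})
config.set_provider_config("vector_db", "Milvus", {"uri": "./milvus.db", "token": ""})
init_config(config=config)
```

Each `set_provider_config` call names a module category, a provider, and a provider-specific options dictionary, which keeps swapping providers down to a one-line change.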
Future plans
DeepSearcher plans to enhance its web crawling capabilities, support more vector databases (such as FAISS), and add support for more large models; a RESTful API interface has already been completed. We welcome community contributions to build an even stronger DeepSearcher!
GitHub: https://github.com/zilliztech/deep-searcher
YouTube: