The idea is retrieval-augmented: first retrieve the most relevant information from a large library of documents, then have a language model generate clear, accurate answers grounded in that material. The result is answers that are both up to date and backed by real literature, rather than relying solely on what the model memorized during training.
In an age of information overload, hundreds of new papers land on arXiv every day.
Whether you focus on AI, math, physics, or computer systems, manually sifting through papers, writing summaries, and assembling reading lists can be extremely time-consuming.
The GitHub project arxiv-paper-curator provides an elegant solution:
Automatically scrape the latest arXiv papers → summarize with AI → generate a daily Markdown report → publish to GitHub.
It is essentially a “research assistant automation toolkit”.
What can the project do?
The core functionality of arxiv-paper-curator can be summed up in one sentence:
Every day, it automatically fetches new papers from arXiv in the areas you care about and uses large models to generate summaries, highlights, and a recommended reading list.
More specifically, it includes:
1. Automatically scrape the latest papers
- By topic (e.g., AI, CV, NLP, Math, Physics, etc.)
- Gets the title, authors, abstract, and PDF link for each paper
- Supports custom keywords, categories, and number of papers
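The repo's own fetching code lives in `src/`; as a sketch of what this step involves, arXiv exposes a public Atom API at `http://export.arxiv.org/api/query` that accepts category and keyword filters. The function name below is illustrative, not the project's actual API:

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def build_arxiv_query(categories, keywords=(), max_results=20):
    """Build an arXiv Atom API query URL filtering by category and keyword,
    sorted newest-first. Fetching the URL returns an Atom feed with the
    title, authors, abstract, and PDF link for each paper."""
    terms = [f"cat:{c}" for c in categories]
    terms += [f'all:"{k}"' for k in keywords]
    params = {
        "search_query": " OR ".join(terms),
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

url = build_arxiv_query(["cs.CL", "cs.CV"], keywords=["retrieval"], max_results=5)
print(url)
```

The returned Atom XML can then be parsed with any feed or XML parser to extract the fields listed above.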
2. Summarize papers automatically with LLMs
The project calls the large model you configure (such as GPT-4) to generate, for each paper:
- Refined Summary
- Main contributions
- Keywords / Tags
- Whether it is worth paying attention to
It’s like having the AI “read through” the paper and tell you the key points.
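The project's actual prompts live in its source; as a minimal sketch, the per-paper request sent to the LLM might be assembled like this (the function name and wording are assumptions for illustration):

```python
def build_summary_prompt(title: str, abstract: str) -> str:
    """Compose a single prompt asking the LLM for the four outputs
    described above: summary, contributions, tags, and a verdict."""
    return (
        "You are a research assistant. For the paper below, produce:\n"
        "1. A refined 2-3 sentence summary\n"
        "2. Main contributions (bullet points)\n"
        "3. Keywords / tags\n"
        "4. A one-line verdict: is it worth paying attention to?\n\n"
        f"Title: {title}\n"
        f"Abstract: {abstract}"
    )

prompt = build_summary_prompt(
    "Attention Is All You Need",
    "We propose the Transformer, a model architecture based solely on attention.",
)
print(prompt)
```

The resulting string is then sent to whichever chat-completion endpoint you configured; the structured reply is parsed back into the summary fields.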
3. Automatically generate Markdown daily/weekly reports
All summaries are organized into a clearly structured Markdown document, similar to:
## Today's Recommended Papers
- [Paper Title](PDF link)
  - Summary: …
  - Highlights: …
You can publish it directly as a “paper daily”.
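The rendering step is straightforward string assembly; a minimal sketch of turning summarized papers into that Markdown layout (field names are assumptions, not the project's actual schema):

```python
def render_daily_report(papers):
    """Render a list of paper dicts into a daily-report Markdown document."""
    lines = ["## Today's Recommended Papers", ""]
    for p in papers:
        lines.append(f"- [{p['title']}]({p['pdf_url']})")
        lines.append(f"  - Summary: {p['summary']}")
        lines.append(f"  - Highlights: {p['highlights']}")
    return "\n".join(lines)

report = render_daily_report([{
    "title": "Example Paper",
    "pdf_url": "https://arxiv.org/pdf/0000.00000",
    "summary": "One-sentence summary of the paper.",
    "highlights": "Key contribution worth noting.",
}])
print(report)
```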
4. Automated running via GitHub Actions
Every day (or on whatever schedule you set), GitHub Actions runs the following steps automatically:
- Fetch new papers
- Invoke AI summarization
- Generate reports
- Automatically submit to the repository
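The project's own workflow lives under `workflows/`; as an illustrative sketch only (file path, entry-point script, and secret name are all assumptions), a scheduled GitHub Actions workflow for this pipeline generally looks like:

```yaml
# .github/workflows/daily.yml -- illustrative sketch, not the repo's actual file
name: daily-paper-curation
on:
  schedule:
    - cron: "0 6 * * *"   # run daily at 06:00 UTC
  workflow_dispatch:        # allow manual runs from the Actions tab
jobs:
  curate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python src/main.py   # fetch -> summarize -> render (entry point is an assumption)
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      - run: |
          git config user.name "github-actions"
          git config user.email "actions@github.com"
          git add outputs/ && git commit -m "daily paper update" || true
          git push
```

The `|| true` keeps the job green on days when nothing new was fetched and there is nothing to commit.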
Introduction to the project structure
The repository is roughly composed of the following modules:
- src/: the core logic for paper scraping, summary generation, and Markdown output
- workflows/: GitHub Actions workflows that trigger the pipeline automatically every day
- config.yaml: customize topics, keywords, update frequency, and more
- outputs/ (or README auto-update): the generated paper lists and summaries
Why is it worth using?
Save a lot of time
With dozens of new papers every day, let the AI read and filter them automatically; you only read the selected few.
Suitable for content creators
If you want to run a "paper daily/weekly digest", it can generate the content fully automatically, saving you roughly 90% of the editing time.
Scalable
You can extend the script to push the result to:
- Notion
- Telegram
- RSS
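Pushing to a channel like Telegram, for example, only needs one extra HTTP call against the Bot API's `sendMessage` method. A sketch that prepares (but does not send) that request; the bot token and chat ID below are placeholders:

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_telegram_request(bot_token: str, chat_id: str, markdown_text: str) -> Request:
    """Prepare a Telegram Bot API sendMessage POST request carrying the
    daily report. Pass the result to urllib.request.urlopen() to send it."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    data = urlencode({
        "chat_id": chat_id,
        "text": markdown_text,
        "parse_mode": "Markdown",
    }).encode()
    return Request(url, data=data, method="POST")

req = build_telegram_request("123:ABC", "@my_channel", "## Today's papers ...")
print(req.full_url)
```

Notion and RSS targets follow the same pattern: take the rendered Markdown and hand it to one more output adapter at the end of the pipeline.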
Essentially a customizable automation pipeline
Along the way, it helps you learn:
- arXiv API
- LLM workflow design
- GitHub Actions automation
- Information-filtering system design
Very developer-friendly.
Epilogue
If you often track academic frontiers, like efficient learning, or want to build your own “AI Paper Daily System”, this project is well worth a try.
With a simple configuration, it automatically generates a clearly structured, concise list of papers every day, ready for reading or publishing.
GitHub: https://github.com/jamwithai/arxiv-paper-curator
YouTube: