ArchiveBox: an open source, self-hosted web archiving tool

What is ArchiveBox?

ArchiveBox is an open source, self-hosted web archiving solution. It helps individuals and organizations save web content for offline browsing and keeps that data accessible over the long term.

The goal is to let users proactively save the web content they care about, so that important information is not lost when links break, content changes, or a service goes offline. Content that can be archived includes bookmarks, social media content (such as Facebook photos and YouTube videos), research papers, legal evidence, and more.

Core functions and technical characteristics

Various input methods

You can feed the content you want to save into ArchiveBox from multiple sources, including:

  • Individual URLs
  • Browser bookmarks or history
  • RSS feeds
  • Pocket, Pinboard, and other bookmarking services
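
As a minimal sketch of the simplest input path, the Python snippet below cleans up a list of URLs and pipes them to the `archivebox add` command on standard input. The helper names `normalize_urls` and `add_to_archive` are hypothetical, and the snippet assumes the `archivebox` CLI is installed and that `data_dir` is an already initialized ArchiveBox folder.

```python
import shutil
import subprocess


def normalize_urls(lines):
    """Strip whitespace, skip blank lines and '#' comments, drop duplicates (order kept)."""
    seen = set()
    urls = []
    for line in lines:
        url = line.strip()
        if not url or url.startswith("#"):
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def add_to_archive(urls, data_dir="."):
    """Pipe URLs to `archivebox add` on stdin; return False if the CLI is not on PATH."""
    if shutil.which("archivebox") is None:
        return False
    subprocess.run(
        ["archivebox", "add"],
        input="\n".join(urls) + "\n",  # one URL per line
        text=True,
        cwd=data_dir,  # assumed to be an initialized ArchiveBox data folder
        check=True,
    )
    return True
```

A typical call would read a text file with one URL per line, pass its lines through `normalize_urls`, and hand the result to `add_to_archive`.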

Automatically grab and save content in multiple formats

ArchiveBox generates multiple archive formats for each page, such as:

  • Web pages: original HTML, SingleFile HTML, PNG screenshots, PDF, WARC, etc.
  • Social media content: plain text (TXT), comments, authors, pictures, etc.
  • Media content: MP3/MP4, subtitles, metadata, thumbnails, etc.
  • Code hosting services (GitHub/GitLab): cloned source code, README, etc.

Multiple access methods

  • Command line tool (CLI): full control and integration with automated scripts
  • Web application interface: intuitive operation and preview
  • Python library / REST API / webhooks: convenient for custom development and integration

Data storage

  • Data is saved to the ordinary file system in standard formats; no proprietary format is required
  • Archived content is stored in a local folder, which makes long-term use and migration straightforward
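
Because the archive is just files in a folder, it can be inspected with ordinary tools. The sketch below counts files and bytes per snapshot folder. It assumes that snapshots live in subfolders of an `archive/` directory inside the data folder; that layout, and the helper name `summarize_snapshots`, are assumptions that may not match every ArchiveBox version.

```python
from pathlib import Path


def summarize_snapshots(data_dir):
    """Return {snapshot_folder_name: (file_count, total_bytes)} for folders under archive/."""
    archive_root = Path(data_dir) / "archive"  # assumed location of snapshot folders
    summary = {}
    if not archive_root.is_dir():
        return summary
    for snapshot in sorted(p for p in archive_root.iterdir() if p.is_dir()):
        files = [f for f in snapshot.rglob("*") if f.is_file()]
        summary[snapshot.name] = (len(files), sum(f.stat().st_size for f in files))
    return summary
```

The same property is what makes migration simple: copying the data folder to another machine carries the archived files with it.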

Installation and deployment methods

ArchiveBox supports multiple installation methods. The recommended options, together with platform and resource notes, are:

  1. Docker / Docker Compose (recommended)
    Contains all dependencies for easy deployment and upgrades.
  2. Command line installation (for Linux / macOS / Debian, etc.)
    pip install archivebox
    archivebox install
    Alternatively, use the curl | bash one-click script.
  3. Supported platforms: Linux, macOS, and BSD run natively; Windows is supported through Docker or WSL2.
  4. Resource requirements: minimum 500 MB RAM, recommended ≥2 GB; file systems that support compression (such as ZFS or BTRFS) use storage more efficiently.

Working principle and design concept

  • ArchiveBox uses a variety of tools (such as wget and headless Chrome) to capture content.
  • The author sees the core advantage as “decentralization”: rather than relying on a single service (such as archive.org) for all web archives, users save content themselves and can share it in the future.
  • The project builds its backend on the Django framework with SQLite as the local database; the plug-in system is based on Pluggy, and the REST API uses django-ninja and Pydantic.
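
Since capturing is delegated to external programs, it can help to check which of them are available before archiving. The following is a rough sketch rather than how ArchiveBox itself detects dependencies; the helper `find_tools` is hypothetical, and the candidate executable names are assumptions that vary by platform.

```python
import shutil

# Candidate executable names per tool; assumed names, adjust for your platform.
CANDIDATES = {
    "wget": ["wget"],
    "chrome": ["chromium", "chromium-browser", "google-chrome", "chrome"],
}


def find_tools(candidates=CANDIDATES):
    """Map each tool to the first matching executable path on PATH, or None if absent."""
    found = {}
    for tool, names in candidates.items():
        found[tool] = next((path for path in map(shutil.which, names) if path), None)
    return found
```

A tool that maps to `None` was not found under any of the listed names.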

Quick start example

  1. Initialize the project directory
    mkdir my_archive && cd my_archive
    archivebox init --setup
  2. Add a URL to the archive
    archivebox add https://example.com
  3. Launch the local web server to preview the archive
    archivebox server
  4. Import history or bookmarks
    Imports from Pocket, Pinboard, browser bookmarks, RSS feeds, etc. are supported.
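
The first two steps can also be scripted. The sketch below wraps the same `archivebox init --setup` and `archivebox add` commands with Python's `subprocess` module. The function names are hypothetical, and the snippet assumes the `archivebox` CLI is installed; the setup step may prompt for input when run.

```python
import shutil
import subprocess
from pathlib import Path


def quickstart_commands(url):
    """Return the init and add commands from the quick start as argv lists."""
    return [
        ["archivebox", "init", "--setup"],
        ["archivebox", "add", url],
    ]


def run_quickstart(data_dir, url):
    """Create data_dir and run each command inside it; return False if the CLI is missing."""
    if shutil.which("archivebox") is None:
        return False
    folder = Path(data_dir)
    folder.mkdir(parents=True, exist_ok=True)
    for argv in quickstart_commands(url):
        subprocess.run(argv, cwd=folder, check=True)  # raises if a command fails
    return True
```

`archivebox server` is left out of the script on purpose, since it keeps running in the foreground until it is stopped.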

Community feedback and usage scenarios

In a Reddit discussion, developers described ArchiveBox as a complex but feature-rich Django project that can replace archive.org and capture content in more formats (screenshots, PDF, etc.).

Other users emphasize that it gives them more autonomy over preserving web content and adds a redundant backup.

Summary list

Feature                            Description
Type                               Open source, self-hosted web archiving tool
Supported inputs                   URLs, bookmarks, history, RSS, bookmarking services
Saved formats                      HTML, PDF, PNG, WARC, audio and video, text, code, etc.
Interfaces                         CLI / web interface / API
Recommended installation methods   Docker, or pip + install script
Supported platforms                Native on Linux/macOS/BSD; Windows via Docker or WSL2
Technology stack                   Python, Django, SQLite, Pluggy, django-ninja
Design philosophy                  Decentralized, user-controlled data, autonomous long-term archiving

GitHub: https://github.com/ArchiveBox/archivebox
