

ArchiveBox - A Self-Hosted Website Archiving Service
source link: https://azhuge233.com/archivebox-%e8%87%aa%e6%89%98%e7%ae%a1%e7%bd%91%e7%ab%99%e5%ad%98%e6%a1%a3%e6%9c%8d%e5%8a%a1/

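ArchiveBox is an open-source, self-hosted website archiving service. It automatically crawls each URL you give it and archives the page's HTML, media files, JS, PDF files, and so on, so the content can be viewed offline.
The following is quoted from the Background & Motivation section of the official site: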
The aim of ArchiveBox is to enable more of the internet to be archived by empowering people to self-host their own archives. The intent is for all the web content you care about to be viewable with common software in 50 – 100 years without needing to run ArchiveBox or other specialized software to replay it.
Vast treasure troves of knowledge are lost every day on the internet to link rot. As a society, we have an imperative to preserve some important parts of that treasure, just like we preserve our books, paintings, and music in physical libraries long after the originals go out of print or fade into obscurity.
Whether it’s to resist censorship by saving articles before they get taken down or edited, or just to save a collection of early 2010’s flash games you love to play, having the tools to archive internet content enables you to save the stuff you care most about before it disappears.
The balance between the permanence and ephemeral nature of content on the internet is part of what makes it beautiful. I don’t think everything should be preserved in an automated fashion–making all content permanent and never removable, but I do think people should be able to decide for themselves and effectively archive specific content that they care about.
Because modern websites are complicated and often rely on dynamic content, ArchiveBox archives the sites in several different formats beyond what public archiving services like Archive.org/Archive.is save. Using multiple methods and the market-dominant browser to execute JS ensures we can save even the most complex, finicky websites in at least a few high-quality, long-term data formats.
The rest of this post shows how to set up an ArchiveBox service on Debian 10 using the package manager.
- Debian 10
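Unless stated otherwise, the commands below are run as root; if you use another user, add sudo where needed.
Install the dependencies
The official installation instructions do not mention installing npm separately, but ArchiveBox needs npm when it runs.
The default version from Debian 10's package manager is enough, and installing it pulls in node as well.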
apt update
apt install npm
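Since the article notes that npm is needed at runtime even though the official instructions do not call it out, a quick sanity check after the install can help. The helper below is a minimal sketch; `missing_commands` is a hypothetical name introduced here, not part of ArchiveBox or Debian.

```shell
# Print the name of every given command that is not found on PATH,
# one per line. Returns 0 only if all of them are available.
# (missing_commands is an illustrative helper, not an ArchiveBox tool.)
missing_commands() {
    rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `missing_commands node npm || echo "install npm first"` prints whichever of the two is still missing.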
Install ArchiveBox
First add the ArchiveBox apt repository and its signing key, then refresh the package index:
echo "deb http://ppa.launchpad.net/archivebox/archivebox/ubuntu focal main" | tee /etc/apt/sources.list.d/archivebox.list
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C258F79DCC02E369
apt update
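Install the ArchiveBox package
The project offers two installation methods; installing with pip is recommended.
apt install archivebox
# or
python3 -m pip install --upgrade --ignore-installed archivebox
ArchiveBox installed through the package manager may fail to run. If that happens, simply run the pip install command.
The package-manager install fails because the Django version it provides is too old.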
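To see whether an outdated Django is what is breaking the install, you can compare the version python3 actually imports against a minimum. The sketch below assumes GNU `sort -V` (available on Debian); both function names are hypothetical, and the article does not state the minimum Django version, so take that value from ArchiveBox's own requirements.

```shell
# Return 0 if version $1 is greater than or equal to version $2.
# Uses GNU sort's version ordering (-V): if $2 sorts first (or ties), $1 >= $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print the Django version that python3 imports, or nothing if Django
# cannot be imported at all.
installed_django_version() {
    python3 -c 'import django; print(django.get_version())' 2>/dev/null
}
```

A check could then look like `version_at_least "$(installed_django_version)" 3.1 || echo "Django too old, reinstall with pip"`, where `3.1` is only a placeholder for the real minimum.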
Set up ArchiveBox
- Switch to a non-root user (every command in this section is run as that non-root user)
  - Run
    su - [Username]
- Create an empty directory for ArchiveBox
  - Run
    mkdir ~/archivebox
- Initialize ArchiveBox
  - Run
    cd ~/archivebox
    archivebox init --setup
  - During setup you are prompted to create an administrator account for the web interface; enter a password and an email address
  - Setup is complete
- Start the ArchiveBox WebUI
  - Once setup has finished, start the WebUI by running
    archivebox server 0.0.0.0:[port]
    Note that, because it runs as a non-root user, ArchiveBox cannot listen on privileged ports (those below 1024)
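The port restriction above can be checked before launching the server. This is a minimal sketch under a few assumptions: `is_unprivileged_port` and `start_webui` are hypothetical helper names, and the `ARCHIVE_DIR` variable and the default port 8000 are illustrative choices that do not come from the article. Only `archivebox server 0.0.0.0:[port]` is taken from the steps above.

```shell
# Return 0 if $1 is a port a non-root user can normally bind on Linux
# (1024-65535 with default kernel settings).
is_unprivileged_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge 1024 ] && [ "$1" -le 65535 ]
}

# Start the WebUI from the data directory, refusing ports that a
# non-root user cannot bind. ARCHIVE_DIR and the 8000 default are
# assumptions made for this example.
start_webui() {
    port="${1:-8000}"
    if ! is_unprivileged_port "$port"; then
        echo "port $port cannot be used without root; choose 1024-65535" >&2
        return 1
    fi
    cd "${ARCHIVE_DIR:-$HOME/archivebox}" && archivebox server "0.0.0.0:$port"
}
```

Run as the non-root user, `start_webui 8080` would serve the archive on port 8080, while `start_webui 80` stops with an error before ArchiveBox is launched.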