Local Node.js app to save everything you browse and serve it offline

22120

- An archivist browser controller that caches everything you browse, plus a library server with full-text search to serve your archive.

Running

Save your stuff

Start your browser with --remote-debugging-port=9222, then run:
npm i && npm run save

Serve your stuff

Start your browser with --remote-debugging-port=9222, then run:
npm run serve

Initial goal

A proof of concept of the ability to browse and transparently save everything, then switch off the internet and browse it later as if you were still online.

Inspired by people talking about enriching bookmarks and browser history with the ability to save all your browsing data and search it, even independent of you being online or the site being online.

How it works

Uses Chrome DevTools (through the browser's remote debugging port) to intercept all requests. Each response is cached under a key made of the request METHOD and URL, in an in-memory map that is saved to disk every 10 seconds.
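The caching scheme described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names cacheKey, ResponseCache and startAutosave are hypothetical, and the on-disk layout (a JSON array of key/response pairs) is an assumption.

```javascript
const fs = require('fs');

// Hypothetical helper: the cache key is the request method plus the URL.
function cacheKey(method, url) {
  return `${method.toUpperCase()} ${url}`;
}

// Hypothetical class: an in-memory map that can be written to and read from disk.
class ResponseCache {
  constructor(file) {
    this.file = file;      // for example 'cache.json'
    this.map = new Map();  // key -> saved response
  }

  put(method, url, response) {
    this.map.set(cacheKey(method, url), response);
  }

  get(method, url) {
    return this.map.get(cacheKey(method, url));
  }

  // Write the whole map to disk as an array of [key, response] pairs.
  save() {
    fs.writeFileSync(this.file, JSON.stringify([...this.map]));
  }

  // Rebuild a cache from a file previously written by save().
  static load(file) {
    const cache = new ResponseCache(file);
    if (fs.existsSync(file)) {
      cache.map = new Map(JSON.parse(fs.readFileSync(file, 'utf8')));
    }
    return cache;
  }

  // Save every `ms` milliseconds (10 seconds by default).
  startAutosave(ms = 10000) {
    const timer = setInterval(() => this.save(), ms);
    timer.unref(); // do not keep the process alive just for saving
    return timer;
  }
}
```

Because the key is only METHOD plus URL, a later response for the same request overwrites the earlier one, which keeps the map small but loses any history.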

So far

The library server hasn't been implemented.
Only saving and serving with the archivist works.
You can use it by opening your browser with --remote-debugging-port=9222 and then running npm run save. Everything you browse will be saved to cache.json.
You can then switch off your internet and run npm run serve (also with your browser on remote debugging) and browse everything you just saved as normal.
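To make the serve step more concrete, here is a rough sketch of how a saved response could be turned into a DevTools command. It assumes the Fetch domain of the Chrome DevTools Protocol (Fetch.requestPaused, Fetch.fulfillRequest, Fetch.failRequest); the project may well use a different domain. The function name replayCommand and the saved-response shape { status, headers, body } are hypothetical.

```javascript
// Decide how to answer a paused request from a map of saved responses.
// `event` has the shape of a Fetch.requestPaused event (assumed domain);
// `cache` is a Map from "METHOD URL" to { status, headers, body } (assumed shape).
function replayCommand(event, cache) {
  const { requestId, request } = event;
  const saved = cache.get(`${request.method} ${request.url}`);

  if (!saved) {
    // Nothing saved for this request: fail it, as if offline.
    return {
      method: 'Fetch.failRequest',
      params: { requestId, errorReason: 'InternetDisconnected' },
    };
  }

  return {
    method: 'Fetch.fulfillRequest',
    params: {
      requestId,
      responseCode: saved.status,
      // The protocol expects headers as a list of { name, value } entries.
      responseHeaders: Object.entries(saved.headers).map(
        ([name, value]) => ({ name, value: String(value) })
      ),
      // The protocol expects the body base64-encoded.
      body: Buffer.from(saved.body, 'utf8').toString('base64'),
    },
  };
}
```

The function only builds the command object; sending it over the debugging connection is left to whatever DevTools client is in use.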

Future

Implement library server so we can actually save the responses to disk in the "file tree structure" of the site you browse, then serve it, and also index and search it. This will involve also serving request/response metadata and converting between the request/response format and a file format.
The idea is that you can browse a site and end up with a static directory structure of assets that you can then serve on a local static server and browse it basically as normal.
Generally improve code and efficiency.
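The "file tree structure" idea above could look something like this. It is only a sketch under assumptions: urlToFilePath is a hypothetical name, directory-style URLs are mapped to index.html, and query strings are ignored, so two URLs that differ only in their query would collide.

```javascript
const path = require('path');

// Hypothetical mapping from a URL to a file path inside an archive root.
function urlToFilePath(root, rawUrl) {
  const url = new URL(rawUrl);

  // Keep the (still percent-encoded) path segments, dropping empty ones and
  // any "." or ".." so the result always stays under `root`.
  const segments = url.pathname
    .split('/')
    .filter((s) => s && s !== '.' && s !== '..');

  // A path ending in "/" (including the bare origin) is treated as a directory index.
  if (url.pathname.endsWith('/')) {
    segments.push('index.html');
  }

  // One directory per hostname, then the site's own path below it.
  return path.join(root, url.hostname, ...segments);
}

// Example usage:
console.log(urlToFilePath('archive', 'https://example.com/css/site.css?v=2'));
console.log(urlToFilePath('archive', 'https://example.com/'));
```

A real version would also need somewhere to keep the request/response metadata (status, headers), since a plain file only holds the body.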

The goal

To build a personal archive that you can search and use that does not depend on the continued existence of those sites, or on having internet, but that works just like you are browsing them.

Stuff that will probably be hard (and I haven't thought much about)

Streaming content (audio, video)
"Impure" request/response pairs (for example, if you call GET /endpoint once you get "A", and if you call it a second time you get "AA").
WebSockets (how to capture and replay that faithfully?)

There are probably "good enough" solutions to all these, and likely some or all of them already exist and have been thought up by other smart people.
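As one illustration of what a "good enough" approach to the impure case might look like, the sketch below records every response seen for a key, in order, and replays them in the same order. This is not something the project does today; SequenceCache and its methods are hypothetical names, and repeating the last response once the recording runs out is an arbitrary choice.

```javascript
// Hypothetical cache that keeps the full sequence of responses per key.
class SequenceCache {
  constructor() {
    this.recorded = new Map(); // key -> array of responses, in the order seen
    this.cursor = new Map();   // key -> index of the next response to replay
  }

  // Append a response to the sequence for this key.
  record(key, response) {
    if (!this.recorded.has(key)) {
      this.recorded.set(key, []);
    }
    this.recorded.get(key).push(response);
  }

  // Return the next recorded response, repeating the last one when exhausted.
  replay(key) {
    const list = this.recorded.get(key);
    if (!list) {
      return undefined;
    }
    const i = this.cursor.get(key) || 0;
    this.cursor.set(key, i + 1);
    return list[Math.min(i, list.length - 1)];
  }
}
```

This only reproduces responses faithfully if requests are replayed in the same order they were recorded, which real browsing does not guarantee.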

Higher level description

Basically this is like a "full spectrum record" of your browsing history, with all assets and their content saved. It's like going on holiday and taking a GoPro that saves everything you look at, except that the quality is such that when you replay it, it's actually the same as experiencing it the first time.
