Boredcaveman | Posts - JOYK Joy of Geek, Geek News, Link all geek

Database-less torrent website

10/1/2022

TL;DR

Since IPFS, a p2p network for storing and retrieving files, has a javascript implementation that runs on browsers, it can be used in torrent websites to retrieve a small database from peers instead of the server. A .sqlite file, 13 MB in size containing around 135,000 records, can be retrieved in 8 seconds from the IPFS network using a 100 Mbit connection. Since the database object is small enough to be cached by the browser, subsequent visits take zero. Live demo here!

Websites go down, torrents get lost

Local file sharing websites are struggling with law enforcement and their only defence is to frequently delocate the website by changing DNS name and using new IP as reverse proxy. Many of them have been closed by their owners because of threats of legal actions. One little downside of this is that the owners hardly ever dump a backup database to the wild, so those torrents are virtually lost, because they can't be reached anymore by the public.

Database as a torrent, served from peers

Years ago I found out about IPFS, a p2p network and protocol that aims to decentralize the Internet. It shares the same principles about peers and DHT network of BitTorrent to retrieve files based on hash, even though it's not intended to be a replacement. From the user perspective is much more easier to share and retrieve a file in this network. The only calls you need to use are add() and cat(). Know what? It runs on browser too!

Well, what could they do if someone hosts a torrent website without hosting any data or metadata about torrents? What if the torrent database is retrieved from a peer-to-peer network by the browser? In this case, who is violating what?

The only infos you need to store about a torrent are title, size and the magnet URI of course. Considering an average torrent title to be 50 bytes long, the size 14[^1] bytes and the magnet URI 40 bytes (without trackers informations), the size of a single record is about ~100 bytes. What about 100,000 records? Only 10 MB of storage for such amount of torrents. Even though it's relatively small compared to world wide torrent websites, like ThePirateBay hosting a 7 GB database with ~20 million records, it's enough for a local website and for communities around a general topic, small enough to be downloaded almost instantly from the IPFS network.

[^1]: size expressed in bytes, stored as a string

Let's see it in action!

Live demo here!

As I wanted to test the whole thing to see if it could actually work, I started looking for a dump and I found the database from TNT Village. What an irony: TNT Village, closed in 2019, was The italian file sharing website moderated by ScambioEtico, a project/political movement from the Italian Pirate Party. As they closed, The owner decided to dump the database in the current state before shutting down the whole thing. Unfortunately, no one used it to start a new website again and the community moved to other closed websites (private trackers). The torrents inside it are largely dead.

Shrinking the database, adding it on IPFS

To shrink the size of the dump, originally 26 MB containing 135,000 records, I dropped irrelevant columns like author and post referring to the (now closed) froum, and mildly-relevant columns like the author of the torrent and description. The result is a database of 13 MB, half size of the original. I then added an pinned it to my local IPFS node.

$ sqlite3
sqlite> .mode csv
sqlite> .open dump.csv torrent
sqlite> alter table drop *columns to save space*;
sqlite> vacuum;
sqlite> .save dump.sqlite
$ ipfs add dump.sqlite
$ ipfs pin add 'Qm...some1337cid...ftW'
$ ipfs daemon

IPFS on browser

Documentation

<script src="https://cdn.jsdelivr.net/npm/ipfs/dist/index.min.js"></script>
<script type="text/javascript">

async function loadAndCat(cid) {
    const node = await Ipfs.create()
    const stream = await node.cat(cid)

    let data = ''
    for await (const chunk of stream) {
        data += chunk.toString()
    }

    console.log(data)
}
loadIPFSAndCat('QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG/readme')

</script>

SQLite on browser

Documentation

<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/sql-wasm.min.js"></script>
<script type="text/javascript">

async function loadDBAndExec(query) {
    const SQLPromise = initSqlJs({
        locateFile: file => 'https://cdn.jsdelivr.net/npm/[email protected]/dist/sql-wasm.wasm'
    })

    const DBPromise = fetch('/path/to/database.sqlite') // This will be done on IPFS
        .then(result => result.arrayBuffer())

    const [SQL, buffer] = await Promise.all([SQLPromise, DBPromise])
    const db = new SQL.Database(new Uint8Array(buffer))
    db.exec(query)
}
loadDBAndExec('SELECT * FROM my_table LIMIT 10;')

</script>

Merge

let dbPromise = new Promise((resolve, reject) => {
    try {
        Promise.all([getDbFromIPFS(), loadDbModule()])
            .then(([buffer, SQL]) => resolve(new SQL.Database(buffer)))
    } catch(e) {
        reject(e)
    }
})

async function getDbFromIPFS() {
    const node = await Ipfs.create()
    const stream = await node.cat('Qm...some1337cid...ftW')

    let i = 0
    const data = new Uint8Array(12873728) // DB size in bytes
    for await (const chunk of stream) {
        data.set(chunk, i)
        i += chunk.length
    }

    return data
}

function loadDbModule() {
    return initSqlJs({
        locateFile: file => 'https://cdn.jsdelivr.net/npm/[email protected]/dist/sql-wasm.wasm'
    })
}

function search(searchString) {
    return dbPromise
        .then(database => database.exec(
            'SELECT * FROM torrent WHERE title LIKE \'' + searchString.trim() + '%\' LIMIT 10;'
        ))
}
search('something').then(result => console.log(result))

Results

dbsize: 13 MB
records: 135,000 defined as (title, size, category, magnet URI)
- magnet URI w/o trackers informations to save space
loading time (first time): 8 seconds on cold start (download time from IPFS)
loading time (on reload): 0 seconds (browser caching)

Final thoughts

Pros ordered by relevance

it resists censorship
- taking down one website is useless, you should take down the file from IPFS
in 8 seconds you have full access to every torrent ever uploaded
after first visit, the loading time is 0 till the database is updated (a new CID to download)

Cons ordered by relevance

not fully decentralized
- connection to IPFS peers depends on few WebRTC servers that act as gateways. Maybe WebTorrent has more RTC servers
text search is limited to starts with or contains
- Unfortunately sql.js doesn't have the text search module. There are maybe other implementations having it
small database
- since I consider 8 seconds the max time a user should wait for actually using a website of this kind, that means it can't host database heavier than 13 MB, so 130,000 records is the upper limit
non-tech users can be scared of the loading percentage (perceived as an ad or cryptominer)

Peace!

Boredcaveman | Posts

Database-less torrent website

TL;DR

Websites go down, torrents get lost

Database as a torrent, served from peers

Let's see it in action!

Shrinking the database, adding it on IPFS

IPFS on browser

SQLite on browser

Merge

Results

Final thoughts

Pros ordered by relevance

Cons ordered by relevance

Recommend

AtlasHackathon: weTalk (Inter college social media application)

I Was Betrayed By Computer Science Students

Creste vizualizari YouTube de la 0 la 100.000

Why Should Open Source Software Selectively Ignore Copyright?

Open 3D Engine: Amazon’s Old Clothes Or A Game Engine To Truly Get Excited About...

How Aaron Swartz protest broke my app

What to ask at job interviews when you are being interviewed

Obsidian: trouble in paradise

Virtual and Actual Events 2022

Deploy Strapi on Digital Ocean APP error

About Joyk