Orger: plaintext reflection of your digital self

TLDR: I'll write about Orger, a tool I'm using to convert my personal data into easily readable and searchable Org-mode views. I'll present some examples and use cases that will hopefully be helpful to you even if you're not sold on using my tool.

There is also a second part, where I explain how it can be used to read Reddit, create quick tasks from Telegram messages and help with spaced repetition.

If you're impatient, you can jump straight to a demo (section 3).

1 Intro

I consume lots of digital content (books, articles, Reddit, YouTube, etc.) and find most of it at least somewhat useful and insightful. I want to use that knowledge later, act on it and build on it. But there's an obstacle: the human brain.

It would be cool to always remember and instantly recall any information you've interacted with, along with its metadata and your thoughts on it. Until we get augmented, though, there are two options. The first is to just accept it and live with it. You might have guessed this is not the option I'm taking.

The second option is to compensate for your sloppy meaty memory by keeping the information you've read at hand, along with a quick way of searching over it.

That sounds simple enough, but as with many simple things, in practice you run into obstacles. Here are some I've personally had to overcome:

convenience of access, e.g.:
- to access highlights and notes on my Kobo e-reader, I need to physically reach for it and tap through the e-ink touch screen. Not much fun!
- if you want to search over annotations in your PDF collection… well, good luck; I'm just not aware of such a tool. Not to mention that many PDF viewers won't even let you search through the highlights within a single open PDF file.
- there is no easy way to access all of your Twitter favorites; people suggest hacks like an autoscroll extension.
searching data, e.g.:
```
Ctrl-F
```
data ownership and liberation, e.g.:
- what happens if the data disappears, or the service goes down (temporarily or permanently), or gets banned by your government?
- 99% of services don't support offline mode. That may be just a small inconvenience if you're on a train or something, but there is more to it: what if some sort of apocalypse happens and you lose all access to your data? That depends on your paranoia level, of course, and an apocalypse is bad enough already, but my take is that at least I'd have my data :)
- if you delete a book on Kobo, not only can you no longer access its annotations, they also seem to get wiped from the database.
- in 2018, Instapaper was unavailable in Europe for several months (!) after missing the GDPR deadline.

Thinking about that and tinkering helped me understand what I want: some sort of search engine over my personal data, with a uniform and always-available way of accessing it.

So let me present a system I've developed that solves all my problems™: Orger.

2 What Orger does

It's so trivial that it's almost stupid. Orger provides a simple Python API to render any data as an Org-mode file. It's easiest to explain with an example:

```python
from orger import StaticView
from orger.inorganic import node, link
from orger.common import dt_heading

import my.github_data

class Github(StaticView):
    def get_items(self):
        for event in my.github_data.get_events():
            yield node(dt_heading(event.dt, event.summary))

Github.main()
```

That ten-line program produces a file, Github.org:

```org
# AUTOGENERATED BY /code/orger/github.py

* [2016-10-30 Sun 10:29] opened PR Add __enter__ and __exit__ to Pool stub
* [2016-11-10 Thu 09:29] opened PR Update gradle to 2.14.1 and gradle plugin to 2.1.1
* [2016-11-16 Wed 20:20] commented on issue Linker error makes it impossible to use a stack-provided ghc
* [2016-12-30 Fri 11:57] commented on issue Fix performance in the rare case of hashCode evaluating to zero
* [2019-09-21 Sat 16:51] commented on issue Tags containing letters outside of a-zA-Z
....
```

Even with just the event summaries, it's already very useful to search over. What else you can do really depends on your imagination and needs! You can also add:

links
tags
timestamps
properties
child nodes
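To make that list concrete, here is roughly what each of those looks like in raw Org syntax. This is a minimal sketch with hypothetical helpers (org_timestamp, org_link, org_entry), not Orger's node/link/dt_heading API; it assumes the timestamps in the output above are Org inactive timestamps of the form [YYYY-MM-DD Day HH:MM].

```python
from datetime import datetime
from typing import Dict, List, Optional

WEEKDAYS = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']


def org_timestamp(dt: datetime) -> str:
    # Inactive Org timestamp, e.g. [2016-10-30 Sun 10:29]; the weekday is looked
    # up in a fixed list rather than via %a so the result doesn't depend on locale.
    return f'[{dt:%Y-%m-%d} {WEEKDAYS[dt.weekday()]} {dt:%H:%M}]'


def org_link(url: str, title: str) -> str:
    return f'[[{url}][{title}]]'


def org_entry(heading: str, level: int = 1, tags: Optional[List[str]] = None,
              properties: Optional[Dict[str, str]] = None, body: str = '',
              children: Optional[List[str]] = None) -> str:
    # Hypothetical helper for illustration only.
    line = '*' * level + ' ' + heading
    if tags:
        line += ' :' + ':'.join(tags) + ':'     # tags go at the end of the heading line
    out = [line]
    if properties:
        out.append(':PROPERTIES:')               # property drawer directly under the heading
        out.extend(f':{key}: {value}' for key, value in properties.items())
        out.append(':END:')
    if body:
        out.append(body)
    out.extend(children or [])                   # children: already-rendered deeper entries
    return '\n'.join(out)
```

For instance, passing tags=['github'] and properties={'ID': 'pr-1'} yields a heading ending in :github: followed by a :PROPERTIES: drawer, and a child rendered with level=2 gets two stars.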

See the demo section below for more.

So, as you can see, Orger itself is not a sophisticated tool, at least until you spend time trying to reimplement it yourself. As always, the devil is in the details (look at that cheeky my.github_data import), which I'll explain below.

3 Demo: displaying Pocket data via Orger

I've documented one of the modules, pocket_demo, so you can get a sense of what using Orger looks like.

```python
#!/usr/bin/env python3
"""
Demo Orger adapter for Pocket data. For documentation purposes, so please modify pocket.py if you want to contribute.
"""

"""
First we define some abstractions for Pocket entities (articles and highlights).

While it's not that necessary and for one script you can get away with using json directly,
 it does help to separate parsing and rendering, allows you to reuse parsing for other projects
 and generally makes everything clean.

Also see https://github.com/karlicoss/my package for some inspiration.
"""


from datetime import datetime
from pathlib import Path
from typing import NamedTuple, Sequence, Any

class Highlight(NamedTuple):
    """
    Abstract representation of Pocket highlight
    """
    json: Any

    @property
    def text(self) -> str:
        return self.json['quote']

    @property
    def created(self) -> datetime:
        return datetime.strptime(self.json['created_at'], '%Y-%m-%d %H:%M:%S')


class Article(NamedTuple):
    """
    Abstract representation of Pocket saved page
    """
    json: Any

    @property
    def url(self) -> str:
        return self.json['given_url']

    @property
    def title(self) -> str:
        return self.json['given_title']

    @property
    def pocket_link(self) -> str:
        return 'https://app.getpocket.com/read/' + self.json['item_id']

    @property
    def added(self) -> datetime:
        return datetime.fromtimestamp(int(self.json['time_added']))

    @property
    def highlights(self) -> Sequence[Highlight]:
        raw = self.json.get('annotations', [])
        return list(map(Highlight, raw))

    # TODO add tags?


def get_articles(json_path: Path) -> Sequence[Article]:
    """
    Parses Pocket export produced by https://github.com/karlicoss/pockexport
    """
    import json
    raw = json.loads(json_path.read_text())['list']
    return list(map(Article, raw.values()))

"""
Ok, now we can get to implementing the adapter.
"""
from orger import StaticView
"""
StaticView means it's meant to be read-only view onto data (as opposed to InteractiveView).
"""
from orger.inorganic import node, link
from orger.common import dt_heading


class PocketView(StaticView):
    def get_items(self):
        """
        get_items returns a sequence/iterator of nodes
        see orger.inorganic.OrgNode to find out about attributes you can use
        """
        export_file = self.cmdline_args.file # see setup_parser
        for a in get_articles(export_file):
            yield node(
                heading=dt_heading(
                    a.added,
                    link(title=a.title, url=a.url)
                ),
                body=link(title='Pocket link', url=a.pocket_link), # permalink is pretty convenient to jump straight into Pocket app
                children=[node( # highlights are displayed as org-mode child entries
                    heading=dt_heading(hl.created, hl.text)
                ) for hl in a.highlights]
            )


def setup_parser(p):
    """
    Optional hooks for extra arguments if you need them in your adapter
    """
    p.add_argument('--file', type=Path, help='JSON file from API export', required=True)


if __name__ == '__main__':
    """
    Usage example: ./pocket.py --file /backups/pocket/last-backup.json --to /data/orger/pocket.org
    """
    PocketView.main(setup_parser=setup_parser)

"""
Example pocket.org output:

# AUTOGENERATED BY /L/zzz_syncthing/coding/orger/pocket.py

* [2018-07-09 Mon 10:56] [[https://www.gwern.net/Complexity-vs-AI][Complexity no Bar to AI - Gwern.net]]
 [[https://app.getpocket.com/read/1949330650][Pocket link]]
* [2016-10-21 Fri 14:42] [[https://johncarlosbaez.wordpress.com/2016/09/09/struggles-with-the-continuum-part-2/][Struggles with the Continuum (Part 2) | Azimuth]]
 [[https://app.getpocket.com/read/1407671000][Pocket link]]
* [2016-05-31 Tue 18:25] [[http://www.scottaaronson.com/blog/?p=2464][Bell inequality violation finally done right]]
 [[https://app.getpocket.com/read/1042711293][Pocket link]]
* [2016-05-31 Tue 18:24] [[https://packetzoom.com/blog/how-to-test-your-app-in-different-network-conditions.html][How to test your app in different network conditions -]]
 [[https://app.getpocket.com/read/1188624587][Pocket link]]
* [2016-05-31 Tue 18:24] [[http://www.schibsted.pl/2016/02/hood-okhttps-cache/][What's under the hood of the OkHttp's cache?]]
 [[https://app.getpocket.com/read/1191143185][Pocket link]]
* [2016-03-15 Tue 17:27] [[http://joeduffyblog.com/2016/02/07/the-error-model/][Joe Duffy - The Error Model]]
 [[https://app.getpocket.com/read/1187239791][Pocket link]]
** [2019-09-25 Wed 18:20] A bug is a kind of error the programmer didn’t expect. Inputs weren’t validated correctly, logic was written wrong, or any host of problems have arisen.
** [2019-09-25 Wed 18:19] First, throwing an exception is usually ridiculously expensive. This is almost always due to the gathering of a stack trace.
** [2019-09-25 Wed 18:20] In other words, an exception, as with error codes, is just a different kind of return value!
"""
```


As you can see, it's easy to search through your highlights and jump straight into the Pocket app to the article you were reading.

4 More examples

I'm using more than ten different Orger modules, most of which I've moved into the repository. Here I'll describe some of the views I'm generating.

To give you a heads-up: if you read the code, you'll see a bunch of imports like 'from my.hypothesis import ...'. I find it easier to move all data parsing into a separate my package, which deals with parsing and converting the input data (typically some JSON). That makes everything less messy, separates data from rendering and lets me reuse the abstract models in other tools. It also lets me access my data from any Python code, which makes it much easier to use and interact with.

Some of these are still private, so if you're interested in something that isn't in the GitHub repo, don't be shy: open an issue so I can prioritize it.

Hopefully the code is readable enough to give you some inspiration. If you find something confusing, or you've written your own module and want to contribute, please feel free to open an issue or PR!

instapaper

Instapaper doesn't have search over annotations, so I implemented my own!

hypothesis

Hypothesis does have search, but it's still way quicker for me to invoke search in Emacs (it takes literally less than a second) than to do it in a web browser.

kobo

Generates views for all highlights and comments along with book titles from my Kobo database export.

pinboard

Searches over my Pinboard bookmarks.

pdfs

Crawls my filesystem for PDF files and collects all highlights and comments in a single view.

twitter

It has two modes:

- the first mode generates a view of everything I've ever tweeted, so I can search over it.
- the second mode generates a view of my older tweets from previous years that were posted on the same day. I find it fascinating to read through it and observe how I've changed over the years.
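The filtering behind the second mode can be sketched in a few lines. This assumes tweets are available as hypothetical (datetime, text) pairs, e.g. from a parsing layer like the my package; it isn't the actual twitter module.

```python
from datetime import date, datetime
from typing import Iterable, List, Tuple


def on_this_day(items: Iterable[Tuple[datetime, str]], today: date) -> List[Tuple[datetime, str]]:
    # Keep items from earlier years that share today's month and day, newest first.
    matches = [(dt, text) for dt, text in items
               if dt.year < today.year and (dt.month, dt.day) == (today.month, today.day)]
    return sorted(matches, key=lambda item: item[0], reverse=True)
```

One quirk of matching on month and day: tweets from 29 February only show up in leap years.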

rtm2org

I stopped using Remember The Milk a while ago, but there are still some tasks and notes I left behind, which I'm gradually moving to Org-mode or cancelling.

telegram2org

Lets me create todo tasks from Telegram messages in a couple of taps (on Android, you can't use the share function on them).

I write about it in the second part.

reddit

Displays and lets me search my saved Reddit posts and comments.

I write about it in the second part.

5 It does sound very simple. Does that really deserve a post?

Well, yes, it does seem simple… until you try to do it.

emitting Org-mode

Org-mode is plaintext and generating simple outlines is trivial, but with more sophisticated inputs there is some nasty business of escaping and sanitizing to deal with. I couldn't find any Python libraries capable of emitting Org-mode; the only project I knew of was PyOrgMode, and its author had abandoned it.

When you're generating 10+ views from different data sources, you really want each one to take as little effort and boilerplate as possible.

That's how the inorganic library was born.
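To give a flavour of the escaping involved, here is a minimal sketch of three rules an Org generator has to get right: headings must stay on one line, body lines must not be mistaken for headings, and brackets in link descriptions must not close the link early. The helper names are hypothetical and this is not inorganic's actual code, although indenting body text by one space is consistent with the Pocket example output above.

```python
import re


def sanitize_heading(text: str) -> str:
    # A heading must fit on a single line: collapse runs of whitespace
    # (including newlines) into single spaces.
    return re.sub(r'\s+', ' ', text).strip()


def sanitize_body(text: str) -> str:
    # A body line that starts with '* ' at column 0 would be parsed as a new
    # heading, so indent every non-empty line by one space.
    return '\n'.join(' ' + line if line else line for line in text.splitlines())


def make_link(url: str, title: str) -> str:
    # ']' in the description could close the link early; swapping brackets
    # for parentheses is lossy but keeps the link intact.
    safe_title = title.replace('[', '(').replace(']', ')')
    return f'[[{url}][{safe_title}]]'


def render_node(heading: str, body: str = '', level: int = 1) -> str:
    lines = ['*' * level + ' ' + sanitize_heading(heading)]
    if body:
        lines.append(sanitize_body(body))
    return '\n'.join(lines)
```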
accessing data sources and exposing them through Python interfaces

This is probably where most of the effort went: all sorts of stupid APIs, tedious parsing, you name it.

I'll write about it separately sometime; for now you can see some of the code I've prettified and shared in my GitHub 'export' and my packages. I tried to make sure they're easy for other people to use and not specific to my use cases.
keeping track of already processed items for Interactive views

Because there is no feedback from the Org-mode files back to the data sources, you need to keep track of the items already added to a file; otherwise you'll end up with duplicates.

It's not rocket science, of course, but it is quite tedious. There is some additional logic that checks for lock files, makes sure writes are atomic, etc. You really don't want to implement it more than once, so I figured it was worth extracting this 'pattern' into a separate Python module.
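A minimal sketch of that pattern, assuming each item has a stable ID and the IDs already written are kept in a JSON state file next to the Org file. The function names and the state-file layout are hypothetical, not the actual module's. Writes go through a temporary file plus os.replace, which is atomic on POSIX when both paths are on the same filesystem.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple


def atomic_write(path: Path, text: str) -> None:
    # Write to a temporary file in the same directory, then atomically swap it in,
    # so a crash never leaves a half-written file behind.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix='.' + path.name + '.', suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding='utf-8') as f:
            f.write(text)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def append_new_items(org_file: Path, state_file: Path,
                     items: Iterable[Tuple[str, str]]) -> List[str]:
    """Append (stable_id, org_text) items that weren't added before; return their ids."""
    seen = set(json.loads(state_file.read_text(encoding='utf-8'))) if state_file.exists() else set()
    new: List[Tuple[str, str]] = []
    for key, text in items:
        if key not in seen:              # also skips duplicates within this batch
            seen.add(key)
            new.append((key, text))
    if not new:
        return []
    existing = org_file.read_text(encoding='utf-8') if org_file.exists() else ''
    if existing and not existing.endswith('\n'):
        existing += '\n'
    addition = ''.join(t if t.endswith('\n') else t + '\n' for _, t in new)
    atomic_write(org_file, existing + addition)          # edits made before this run are kept
    atomic_write(state_file, json.dumps(sorted(seen)))   # record ids only after the org write
    return [key for key, _ in new]
```

If the process dies between the two writes, the next run appends the same items again; writing the state first would lose items instead, so which failure mode you prefer is a design choice.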

6 What makes Orger good?

it solves!

I won't go into Org-mode propaganda at length (there are people out there who do it better than me), but for me it's good because it strikes a decent balance between ease of use and ease of augmenting.
- it's easy to do unstructured (e.g. grep) or structured (e.g. tag search in Emacs) search on any of your devices, be it desktop or phone
- you can open it anywhere you can open a text file
- tasks are as easy to create as any other Org outline, so it integrates with your todo list and agenda (see more in the second part).
it doesn't require Emacs

If you're not willing to go full-on Emacs, you can still benefit from this setup by using the plaintext viewer and search tool of your choice.
written in Python. I don't claim that Python is the best programming language, but it's the one I'm most productive in, as are many other people.

Also, because it's a real programming language rather than some YAML config, you can do anything and aren't restricted by a stupid DSL.
it's extremely easy to add new views: a matter of 10-20 lines of code.
agnostic to what you feed into it: it could be offline data from your regular backups, or fresh API data. Again, it's a real programming language, so you can do literally anything.

7 Using Orger views

Apart from the obvious (opening the Org-mode files in your favorite text editor), one major strength of this system is being able to search over them.

I'll write about my setup separately at some point, but for now here's a quick summary and some pointers.

On my desktop I just use Spacemacs, or cloudmacs from a web browser.

I mostly use helm-ag with ripgrep (you can find how to marry them here).
sometimes helm-swoop is very convenient, especially the helm-multi-swoop function.

These two are incremental (instant feedback) and effectively instantaneous. For more structured search you could use:

good old org-tags-view
org-ql with helm-org-ql is a nicer, incremental alternative

On my Android phone I use Orgzly for structured search and viewing Org-mode files. Docsearch+ is sometimes useful too: it indexes plaintext files and lets you search in them. While it's not tailored for Org-mode files, it's usually good enough for me.

You can also set up a proper indexing daemon like Recoll.
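If you don't want to install anything, a rough stand-in for grep-style search is a few lines of Python that also report which Org heading each match sits under. This is a hypothetical sketch (search_org is not part of Orger) doing a plain case-insensitive substring match over every .org file under a directory.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

HEADING = re.compile(r'^\*+\s+(.*)$')


def search_org(root: Path, query: str) -> Iterator[Tuple[Path, int, str, str]]:
    """Yield (file, line number, enclosing heading, matching line) for each case-insensitive match."""
    needle = query.lower()
    for path in sorted(root.rglob('*.org')):
        heading = ''  # the most recent heading seen in this file
        with path.open(encoding='utf-8', errors='replace') as f:
            for lineno, line in enumerate(f, start=1):
                line = line.rstrip('\n')
                m = HEADING.match(line)
                if m:
                    heading = m.group(1)
                if needle in line.lower():
                    yield path, lineno, heading, line
```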

Typical use patterns

I'll just give some of my use cases:

While running tests for orgparse, I started randomly getting AssertionError: Cannot find component 'A@3' for 'orgparse.A@3'.

I recall having the same issue a few months ago, but don't quite remember what the fix was. I press F1, which invokes helm-ag for me, and type 'cannot find component'. I instantly find a GitHub issue I'd opened, in github.org, and figure out how to work around the problem.
While discussing special relativity with a friend, I remember watching some intuitive rationale for Maxwell's equations, but can't quite recall which video it was.

I press F1, type 'Special relativity' and instantly get a few results, in particular this awesome Veritasium video in youtube.org, which is exactly what I was looking for.
Recommending books

I often struggle to recall the details of why I liked a particular book, especially fiction. Having all my annotations in kobo.org lets me quickly look up and skim through the highlighted bits to refresh my memory.

8 Potential improvements

TODO more frequent, ideally realtime updates to views

If the API doesn't provide a push-based interface (and most don't), it ultimately comes down to polling carefully to avoid rate-limiting penalties.
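A minimal sketch of what "polling carefully" might look like: poll at a base interval and back off exponentially, with some jitter, whenever the API signals rate limiting. The fetch callback, the 60-second base and the one-hour cap are assumptions for illustration; many APIs also send a Retry-After header, which should take precedence when present.

```python
import random
import time
from typing import Callable


def next_delay(current: float, rate_limited: bool,
               base: float = 60.0, cap: float = 3600.0) -> float:
    # Normal response: go back to the base polling interval.
    # Rate-limited response: double the delay (exponential backoff), up to a cap.
    if rate_limited:
        return min(cap, max(current, base) * 2)
    return base


def poll(fetch: Callable[[], bool], polls: int,
         base: float = 60.0, cap: float = 3600.0,
         sleep: Callable[[float], None] = time.sleep) -> None:
    """Call fetch() `polls` times; fetch is a hypothetical callback that updates
    the view and returns True if the API answered with a rate-limit error (e.g. HTTP 429)."""
    delay = base
    for _ in range(polls):
        delay = next_delay(delay, fetch(), base, cap)
        # Add +/-10% jitter so several jobs don't hit the API in lockstep.
        sleep(delay * random.uniform(0.9, 1.1))
```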

TODO alternative export formats

There is nothing about this system that's really specific to Org-mode. For instance, there are Markdown-based organizers out there, and their users could benefit from Orger too.
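As a rough illustration of why the output format is mostly incidental, here is a hypothetical format-agnostic node tree rendered both as Org and as Markdown. This is not Orger's code; it only handles headings and body text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical format-agnostic node, for illustration only (not Orger's OrgNode).
    heading: str
    body: str = ''
    children: List['Node'] = field(default_factory=list)


def render(node: Node, marker: str, level: int = 1) -> List[str]:
    # The only difference between the two outline formats here is the heading
    # marker: Org repeats '*' per level, Markdown repeats '#'.
    lines = [marker * level + ' ' + node.heading]
    if node.body:
        lines.append(node.body)
    for child in node.children:
        lines.extend(render(child, marker, level + 1))
    return lines


def to_org(nodes: List[Node]) -> str:
    return '\n'.join(line for n in nodes for line in render(n, '*'))


def to_markdown(nodes: List[Node]) -> str:
    return '\n'.join(line for n in nodes for line in render(n, '#'))
```

Links would need per-format rendering too ([[url][title]] in Org versus [title](url) in Markdown), and Markdown headings stop at six levels.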

TODO two-way data flow

It would be cool to implement feedback from Emacs, e.g. editing a GitHub comment when you edit the corresponding Orger item. But that requires considerably more effort and would only work within Emacs.

TODO potential for race condition

Unfortunately there is a small window for a race condition if Orger appends something while you're editing the file. Orger tries to detect Emacs and Vim swap/lock files, but if you're very unlucky or use a different setup, it's still possible. Hopefully your text editor warns you when the file has changed on disk while you were editing it (as Emacs does).

Also, I run Orger jobs at night (via cron), so they're unlikely to overlap with me editing anything.
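A minimal sketch of that kind of lock/swap file check, assuming Emacs's default lock files (a '.#<name>' symlink next to the file, created once the buffer has unsaved changes) and Vim's default swap files ('.<name>.swp' in the same directory). The editor_locks name is hypothetical and this is not Orger's actual check; a cron job could skip the run, or retry later, whenever it returns anything.

```python
import os
from pathlib import Path
from typing import List


def editor_locks(org_file: Path) -> List[Path]:
    """Return lock/swap files suggesting org_file is currently open in an editor.

    A heuristic covering two common conventions, not an exhaustive check:
      - Emacs: '.#<name>' next to the file, normally a (dangling) symlink
      - Vim: '.<name>.swp', plus '.swo'/'.swn' for additional sessions
    """
    folder, name = org_file.parent, org_file.name
    candidates = [folder / f'.#{name}']
    candidates += [folder / f'.{name}.{ext}' for ext in ('swp', 'swo', 'swn')]
    # os.path.lexists is True for dangling symlinks too, which Path.exists() would miss.
    return [c for c in candidates if os.path.lexists(c)]
```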

9 ----

I'd be interested in hearing your thoughts or feature requests.

This post ended up longer than I expected, so in the next part I'll describe more use cases, in particular how I use Orger to process Reddit.
