
Dankenstein: Markov Chain Twitter Bot Generator

source link: https://www.tuicool.com/articles/hit/2Uziiqz

Dankenstein

Dankenstein is a Markov Chain Twitter Bot generator, based on making a mashup of different corpora.

It is described in this blogpost.
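To make the "mashup" idea concrete, here is a minimal, dependency-free sketch of merging two corpora into one weighted Markov chain. It is not the project's code: the project lists markovify as a dependency, whereas this sketch uses only the standard library, and the function names, sentinel tokens and toy sentences are all assumptions made for illustration.

```python
import random
from collections import defaultdict

BEGIN, END = "__BEGIN__", "__END__"  # sentinel tokens (hypothetical names)


def build_chain(sentences, state_size=2, weight=1.0):
    """Count weighted transitions from each state (a tuple of words) to the next word."""
    chain = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        words = [BEGIN] * state_size + sentence.split() + [END]
        for i in range(len(words) - state_size):
            state = tuple(words[i:i + state_size])
            chain[state][words[i + state_size]] += weight
    return chain


def combine(chains):
    """Merge several chains by summing their weighted transition counts."""
    merged = defaultdict(lambda: defaultdict(float))
    for chain in chains:
        for state, followers in chain.items():
            for word, count in followers.items():
                merged[state][word] += count
    return merged


def make_sentence(chain, state_size=2, rng=random, max_words=50):
    """Walk the chain from the start state until END is drawn or max_words is reached."""
    state = (BEGIN,) * state_size
    words = []
    for _ in range(max_words):
        followers = chain[state]
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        if word == END:
            break
        words.append(word)
        state = state[1:] + (word,)
    return " ".join(words)


# Toy data, invented for this example only.
picard = ["make it so number one", "tea earl grey hot"]
jobs = ["make it simple", "one more thing"]

# The first corpus counts twice as much as the second.
model = combine([build_chain(picard, weight=2), build_chain(jobs, weight=1)])
print(make_sentence(model, rng=random.Random(0)))
```

Both corpora contain the state ("make", "it"), so a generated sentence can start in one corpus and finish in the other. The weights only shift how likely each continuation is.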

Dependencies

The software itself is implemented using Bash, Make, and Python.

You'll need a Twitter account and a Twitter application. From the latter, you'll need a consumer key, a consumer secret, an access token and an access token secret.

Enter these into twitterCredentials.sh.

To generate the supported datasets, you'll need:

  • Poppler
    brew install poppler                     # macOS (Homebrew)
    sudo apt-get install -y poppler-utils    # Debian/Ubuntu
    

After that, you can run the importEnv.sh script in the dev folder. This will create a Python virtual environment (conda if you have it, virtualenv otherwise) called dankenstein and install all Python dependencies.

If you'd rather install them separately, they are: Tweepy, darklyrics, wikiquote and markovify (nltk is optional).

Usage

Usage is based around Make. The following options are available:

make corpora:
    Builds all corpora

make list:
    Lists all available corpora

make combinations:
    Prints all possible combinations of two candidates. The output may include duplicates with the positions switched.

make model ARGS="corpus1 corpus2 [(scale1 scale2) stateSize overlapTotal overlapRatio tries sentences modelComplexity]":
    Generates a model based on two corpora (the only required args).

    If you define scale1, you must also define scale2.

    All options inside the square brackets are optional.

    ARGS:
        corpus1
            string - the name of the first corpus
                no default

        corpus2
            string - the name of the second corpus
                no default

        scale1
            float/int - scale of corpus1 relative to corpus2
                defaults to 1

        scale2
            float/int - scale of corpus2 relative to corpus1
                defaults to 1

        stateSize
            int - state size of the Markov chain
                defaults to 2

        overlapTotal
            int - maximum sequential words that overlap with a sentence from the corpora
                defaults to 15

        overlapRatio
            int - maximum percentage of overlap with a sentence from the corpora
                defaults to 70

        tries
            int - number of attempts to make an original sentence
                defaults to 10

        sentences
            int - number of sentences
                defaults to 5

        modelComplexity
            string - naive | expert
                defaults to "naive"

    EXAMPLE:
        make model ARGS="tcm jobs 2 1 2 15 30 10 25 expert" && make sentences

make sentence:
    Outputs one sentence based on an available (previously trained) model.

make sentences:
    Outputs the number of sentences given as the sentences argument during model generation.

make clean:
    Deletes all corpora.

make tweet:
    Posts a tweet.
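As a way of reading the ARGS string for make model, the sketch below parses it into named values and fills in the documented defaults. It assumes the values are strictly positional, in the order listed above. How the project actually parses ARGS is not shown in this README, so ModelArgs and parse_model_args are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class ModelArgs:
    corpus1: str
    corpus2: str
    scale1: float = 1.0
    scale2: float = 1.0
    stateSize: int = 2
    overlapTotal: int = 15
    overlapRatio: int = 70
    tries: int = 10
    sentences: int = 5
    modelComplexity: str = "naive"


# One converter per field, in the same order as the fields above.
CONVERTERS = (str, str, float, float, int, int, int, int, int, str)


def parse_model_args(args: str) -> ModelArgs:
    """Parse a whitespace-separated ARGS string (assumed to be positional)."""
    tokens = args.split()
    if len(tokens) < 2:
        raise ValueError("corpus1 and corpus2 are required")
    if len(tokens) == 3:
        raise ValueError("scale1 was given without scale2")
    if len(tokens) > len(CONVERTERS):
        raise ValueError("too many arguments")
    names = [f.name for f in fields(ModelArgs)]
    values = {name: conv(tok) for name, conv, tok in zip(names, CONVERTERS, tokens)}
    parsed = ModelArgs(**values)
    if parsed.modelComplexity not in ("naive", "expert"):
        raise ValueError("modelComplexity must be 'naive' or 'expert'")
    return parsed


# The EXAMPLE invocation from the usage text.
print(parse_model_args("tcm jobs 2 1 2 15 30 10 25 expert"))
# Only the two required corpora; everything else falls back to its default.
print(parse_model_args("tcm jobs"))
```

Read this way, the EXAMPLE line weights tcm twice as heavily as jobs, keeps stateSize at 2 and overlapTotal at 15, lowers overlapRatio to 30, and asks for 25 sentences from the expert model.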

It is possible to see all combinations for a given candidate, e.g. make combinations | grep "picard +".
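The note about duplicates with switched positions can be illustrated as follows. This sketch assumes that make combinations prints ordered pairs in the form "a + b", which is inferred only from the grep pattern above. The corpus names are taken from examples in this README and the list is not the project's full set.

```python
from itertools import combinations, permutations

# Hypothetical list of available corpora.
corpora = ["picard", "jobs", "tcm"]

# Ordered pairs: "picard + jobs" and "jobs + picard" both appear.
ordered = [f"{a} + {b}" for a, b in permutations(corpora, 2)]

# Unordered pairs: each mashup appears once.
unordered = [f"{a} + {b}" for a, b in combinations(corpora, 2)]

# Equivalent of piping the ordered output through grep "picard +".
for_picard = [pair for pair in ordered if "picard +" in pair]

print(len(ordered), len(unordered))
print(for_picard)
```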

Deployment

Twitter secrets are set via environment variables, e.g.

export CONSUMER_KEY="consumer_key"
export CONSUMER_SECRET="consumer_secret"
export ACCESS_KEY="access_token"
export ACCESS_SECRET="access_token_secret"
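On the Python side, these variables could be read and checked before any tweet is attempted. The following is a minimal sketch, not the project's code: load_twitter_secrets is a hypothetical helper, and how the values are then handed to Tweepy is left out.

```python
import os

# Variable names as exported above.
REQUIRED = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_KEY", "ACCESS_SECRET")


def load_twitter_secrets(env=os.environ):
    """Return the four secrets as a dict, or fail early and name the missing ones."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Twitter secrets: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```

Failing early with the names of the missing variables tends to be easier to debug on a deployment host than an authentication error from the Twitter API.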

An example deployment is described at the bottom of this blogpost.

