extratools: 145+ extra higher-level functional tools

Featured on GitHub's Trending Python repos on May 25, 2018. Thank you so much for your support!

145+ extra higher-level functional tools that go beyond the standard library's itertools, functools, etc. and popular third-party libraries like toolz, funcy, and more-itertools.

Like toolz and others, most of the tools are designed to be efficient, pure, and lazy. Several useful yet non-functional tools are also included.
While toolz and others target basic scenarios, this library targets more advanced and higher-level scenarios.
A few useful CLI tools for the respective functions are also installed. They are available as extratools-[func].

Full documentation is available here.

Why this library?

Typical pseudocode has fewer than 20 lines, where each line is a higher-level description. However, when implementing it, many lower-level details have to be filled in.

This library reduces the burden of writing and refining those lower-level details again and again, by including an extensive set of carefully designed, general-purpose, higher-level tools.

Current status and future plans?

There are currently 140+ functions among 17 categories, 3 data structures, and 3 CLI tools.

Currently adopted by TopSim and PrefixSpan-py.

This library is under active development, and new tools are added on a weekly basis.

Any idea or contribution is highly welcome.

Besides many other interesting ideas, I am planning to make the following updates in the coming days/weeks/months.

Add dicttools.unflatten and jsontools.unflatten.
Add trie and suffixtree (according to generalized suffix tree).
Update seqtools.align to support more than two sequences.

There is no plan to implement tools that are already well covered by other popular libraries.

Which tools are available?

Function Categories: debugtools, dicttools, gittools, graphtools, htmltools, jsontools, mathtools, misctools, printtools, rangetools, recttools, seqtools, settools, sortedtools, stattools, strtools, tabletools
Data Structures: defaultlist, disjointsets, segmenttree
CLI Tools: dicttools.remap, jsontools.flatten, stattools.teststats

Any example?

Here are ten examples out of our 145+ tools.

jsontools.flatten(data, force=False) flattens a JSON object by returning all the pairs of a path and the respective value (shown below as a dictionary keyed by path).

import json
from extratools.jsontools import flatten

flatten(json.loads("""{
  "name": "John",
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York"
  },
  "phoneNumbers": [
    {
      "type": "home",
      "number": "212 555-1234"
    },
    {
      "type": "office",
      "number": "646 555-4567"
    }
  ],
  "children": [],
  "spouse": null
}"""))
# {'name': 'John',
#  'address.streetAddress': '21 2nd Street',
#  'address.city': 'New York',
#  'phoneNumbers[0].type': 'home',
#  'phoneNumbers[0].number': '212 555-1234',
#  'phoneNumbers[1].type': 'office',
#  'phoneNumbers[1].number': '646 555-4567',
#  'children': [],
#  'spouse': None}
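
To make the path format above concrete, here is a minimal sketch of how such flattening could work in plain Python. It is not the library's implementation: the name flatten_sketch is hypothetical, the force parameter is ignored, and keeping empty containers and None as values is an assumption read off the output above.

```python
def flatten_sketch(data, prefix=""):
    """Flatten nested dicts/lists into a {path: value} dict (illustrative only)."""
    if isinstance(data, dict) and data:
        # Dict keys are joined to the parent path with a dot.
        items = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in data.items()]
    elif isinstance(data, list) and data:
        # List positions are appended to the parent path as [index].
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(data)]
    else:
        # Scalars, None, and empty containers are kept as values (assumption).
        return {prefix: data}

    flat = {}
    for path, value in items:
        flat.update(flatten_sketch(value, path))
    return flat
```

Calling flatten_sketch on the parsed JSON document above should reproduce the same keys, such as 'phoneNumbers[0].type'.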

rangetools.gaps(covered, whole=(-inf, inf)) computes the uncovered ranges of the whole range whole, given the covered ranges covered.

from math import inf
from extratools.rangetools import gaps

list(gaps(
    [(-inf, 0), (0.1, 0.2), (0.5, 0.7), (0.6, 0.9)],
    (0, 1)
))
# [(0, 0.1), (0.2, 0.5), (0.9, 1)]
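
One way to compute such gaps is a single sweep over the covered ranges in sorted order. The sketch below is an illustration of that idea, not the library's code; gaps_sketch is a hypothetical name, and it assumes each range is a (start, end) tuple with start <= end.

```python
from math import inf


def gaps_sketch(covered, whole=(-inf, inf)):
    """Yield the parts of `whole` that no range in `covered` overlaps."""
    start, end = whole
    cursor = start  # everything in `whole` left of the cursor is accounted for
    for lo, hi in sorted(covered):
        if lo >= end:
            break  # remaining ranges start past the end of `whole`
        if lo > cursor:
            yield (cursor, lo)  # uncovered stretch before this range
        cursor = max(cursor, hi)
    if cursor < end:
        yield (cursor, end)  # uncovered tail
```

Sorting first means overlapping ranges such as (0.5, 0.7) and (0.6, 0.9) are absorbed by simply advancing the cursor.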

recttools.heatmap(rect, rows, cols, points, usepos=False) computes the heatmap of points within the rectangle rect, using a grid of rows rows and cols columns.

from extratools.recttools import heatmap

heatmap(
    ((1, 1), (3, 4)),
    3, 4,
    [(1.5, 1.25), (1.5, 1.75), (2.75, 2.75), (2.75, 3.5), (3.5, 2.5)]
)
# {1: 2, 7: 1, 11: 1, None: 1}

heatmap(
    ((1, 1), (3, 4)),
    3, 4,
    [(1.5, 1.25), (1.5, 1.75), (2.75, 2.75), (2.75, 3.5), (3.5, 2.5)],
    usepos=True
)
# {(0, 1): 2, (1, 3): 1, (2, 3): 1, None: 1}
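
The grid arithmetic behind these results can be sketched as follows. This is an illustration only: heatmap_sketch is a hypothetical name, and the conventions (rect given as ((x0, y0), (x1, y1)), rows splitting the y axis, columns splitting the x axis, cells numbered row by row, points outside the rectangle counted under None, half-open cell boundaries) are assumptions inferred from the outputs above.

```python
from collections import Counter


def heatmap_sketch(rect, rows, cols, points, usepos=False):
    """Count points per grid cell; points outside `rect` are counted under None."""
    (x0, y0), (x1, y1) = rect
    cell_w = (x1 - x0) / cols  # columns split the x axis (assumption)
    cell_h = (y1 - y0) / rows  # rows split the y axis (assumption)

    counts = Counter()
    for x, y in points:
        if not (x0 <= x < x1 and y0 <= y < y1):
            counts[None] += 1
            continue
        row = int((y - y0) / cell_h)
        col = int((x - x0) / cell_w)
        # Either a (row, col) position or a row-major cell index.
        counts[(row, col) if usepos else row * cols + col] += 1
    return dict(counts)
```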

settools.setcover(whole, covered, key=len) solves the set cover problem by covering the universe set whole as best as possible, using a subset of the covering sets covered.

from extratools.settools import setcover

list(setcover(
    { 1, 2, 3,         4,         5},
    [{1, 2, 3}, {2, 3, 4}, {2, 4, 5}]
))
# [{1, 2, 3}, {2, 4, 5}]
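
Set cover is commonly approximated with a greedy strategy: repeatedly pick the set that covers the most still-uncovered elements. The sketch below shows that standard heuristic; whether the library uses exactly this strategy is an assumption, setcover_sketch is a hypothetical name, and the key parameter is left out.

```python
def setcover_sketch(whole, covered):
    """Greedily yield sets from `covered` until `whole` is covered as far as possible."""
    remaining = set(whole)
    candidates = list(covered)
    while remaining and candidates:
        # Pick the candidate covering the most uncovered elements (first one wins ties).
        best = max(candidates, key=lambda s: len(s & remaining))
        if not (best & remaining):
            break  # no candidate covers anything new
        yield best
        remaining -= best
        candidates.remove(best)
```

The greedy choice is not guaranteed to find the smallest cover, but it is simple and gives a well-known logarithmic approximation.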

seqtools.compress(data, key=None) compresses the sequence data by encoding consecutive identical items as a tuple of item and count, according to run-length encoding.

from extratools.seqtools import compress

list(compress([1, 2, 2, 3, 3, 3, 4, 4, 4, 4]))
# [(1, 1), (2, 2), (3, 3), (4, 4)]
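
For comparison, run-length encoding can be written with the standard library's itertools.groupby. This is a sketch of the technique rather than the library's implementation; compress_sketch is a hypothetical name, and passing key straight through to groupby is an assumption about what key means.

```python
from itertools import groupby


def compress_sketch(data, key=None):
    """Yield (item, count) for each run of consecutive equal items."""
    for _, group in groupby(data, key=key):
        run = list(group)
        # Report the first item of the run together with the run length.
        yield (run[0], len(run))
```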

seqtools.mergeseqs(seqs, default=None, key=None) merges the sequences of equal length in seqs into a single sequence. Returns None if there is a conflict in any position.

from extratools.seqtools import mergeseqs

seqs = [
    (0   , 0   , None, 0   ),
    (None, 1   , 1   , None),
    (2   , None, None, None),
    (None, None, None, None)
]

list(mergeseqs(seqs[1:]))
# [2,
#  1,
#  1,
#  None]

mergeseqs(seqs)
# None
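
The merging rule can be sketched position by position: take the single non-default value in each column, and give up if two different values compete. This is an illustration under stated assumptions, not the library's code; mergeseqs_sketch is a hypothetical name and the key parameter is omitted.

```python
def mergeseqs_sketch(seqs, default=None):
    """Merge equal-length sequences; return None if any position conflicts."""
    merged = []
    for column in zip(*seqs):
        value = default
        for item in column:
            if item == default:
                continue  # this sequence has nothing to say at this position
            if value != default and value != item:
                return None  # two different non-default values conflict
            value = item
        merged.append(value)
    return merged
```

With the seqs defined above, the first column of the full input holds both 0 and 2, which is why the merge fails there.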

strtools.smartsplit(s) finds the best delimiter to automatically split string s. Returns a tuple of the delimiter and the split substrings.

from extratools.strtools import smartsplit

smartsplit("abcde")
# (None,
#  ['abcde'])

smartsplit("a b c d e")
# (' ',
#  ['a', 'b', 'c', 'd', 'e'])

smartsplit("/usr/local/lib/")
# ('/',
#  ['', 'usr', 'local', 'lib', ''])

smartsplit("a ::b:: c :: d")
# ('::',
#  ['a ', 'b', ' c ', ' d'])

smartsplit("{1, 2, 3, 4, 5}")
# (', ',
#  ['{1', '2', '3', '4', '5}'])

strtools.learnrewrite(src, dst, minlen=3) learns the respective regular expression and template to rewrite src to dst.

from extratools.strtools import learnrewrite

learnrewrite(
    "Elisa likes Apple.",
    "Apple is Elisa's favorite."
)
# ('(.*) likes (.*).',
#  "{1} is {0}'s favorite.")

tabletools.parsebymarkdown(text) parses multi-line text into a table, according to the Markdown table format.

from extratools.tabletools import parsebymarkdown

list(parsebymarkdown("""
| foo | bar |
| --- | --- |
| baz | bim |
"""))
# [['foo', 'bar'],
#  ['baz', 'bim']]
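
A rough idea of what such parsing involves: split each line on pipes, trim the cells, and drop the separator row. The sketch below is a simplification with a hypothetical name, parse_markdown_table_sketch; it assumes every row has leading and trailing pipes and does not handle escaped pipes or cell alignment.

```python
def parse_markdown_table_sketch(text):
    """Yield each table row as a list of cell strings, skipping the separator row."""
    for line in text.strip().splitlines():
        # Drop the outer pipes, then split the rest into trimmed cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # A separator row consists only of dashes and optional colons.
        if all(cell and set(cell) <= set("-:") for cell in cells):
            continue
        yield cells
```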

tabletools.hasheader(data) returns the confidence (between 0 and 1) that the first row of the table data is a header.

from extratools.tabletools import hasheader

t = [
    ['Los Angeles'  , '34°03′'   , '118°15′'  ],
    ['New York City', '40°42′46″', '74°00′21″'],
    ['Paris'        , '48°51′24″', '2°21′03″' ]
]

hasheader(t)
# 0.0

hasheader([
    ['City', 'Latitude', 'Longitude']
] + t)
# 0.6666666666666666

hasheader([
    ['C1', 'C2', 'C3']
] + t)
# 1.0

How to install?

This package is available on PyPI. Just use pip3 install -U extratools to install it.

How to cite?

When using this library for research purposes, please cite it as follows.

@misc{extratools,
  author = {Chuancong Gao},
  title = {{extratools}},
  howpublished = "\url{https://github.com/chuanconggao/extratools}",
  year = {2018}
}

Any recommended library?

Several great libraries are recommended for use together with extratools: regex, sortedcontainers, toolz, and sh.
