Let's make a reference implementation of N-dimensional pixel binning/bucketing for Python's numpy

I frequently want to pixel bin/pixel bucket a numpy array, meaning, replace groups of N consecutive pixels with a single pixel which is the sum of the N replaced pixels. For example, start with the values:

x = np.array([1, 3, 7, 3, 2, 9])

with a bucket size of 2, this transforms into:

bucket(x, bucket_size=2)
= [1+3, 7+3, 2+9]
= [4, 10, 11]

As far as I know, there's no numpy function that specifically does this (please correct me if I'm wrong!), so I frequently roll my own. For 1d numpy arrays, this isn't bad:

import numpy as np

def bucket(x, bucket_size):
    return x.reshape(x.size // bucket_size, bucket_size).sum(axis=1)

bucket_me = np.array([3, 4, 5, 5, 1, 3, 2, 3])
print(bucket(bucket_me, bucket_size=2)) #[ 7 10  4  5]

...however, I get confused easily for the multidimensional case, and I end up rolling my own buggy, half-assed solution to this "easy" problem over and over again. I'd love it if we could establish a nice N-dimensional reference implementation.

Preferably the function call would allow different bin sizes along different axes (perhaps something like bucket(x, bucket_size=(2, 2, 3))).
Preferably the solution would be reasonably efficient (reshape and sum are fairly quick in numpy).
Bonus points for handling edge effects when the array doesn't divide nicely into an integer number of buckets.
Bonus points for allowing the user to choose the initial bin edge offset.

As suggested by Divakar, here's my desired behavior in a sample 2-D case:

x = np.array([[1, 2, 3, 4],
              [2, 3, 7, 9],
              [8, 9, 1, 0],
              [0, 0, 3, 4]])

bucket(x, bucket_size=(2, 2))
= [[1 + 2 + 2 + 3, 3 + 4 + 7 + 9],
   [8 + 9 + 0 + 0, 1 + 0 + 3 + 4]]
= [[8, 23],
   [17, 8]]

...hopefully I did my arithmetic correctly ;)
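The arithmetic can be checked directly with a reshape-and-sum, the 2-D analogue of the 1-D function above. This is only a sketch for this specific 4x4 array with 2x2 buckets, not a general solution:

```python
import numpy as np

x = np.array([[1, 2, 3, 4],
              [2, 3, 7, 9],
              [8, 9, 1, 0],
              [0, 0, 3, 4]])

# Split each axis into (number of buckets, bucket size): shape (2, 2, 2, 2).
# Axes 1 and 3 index the position inside a bucket, so summing over them
# collapses every 2x2 block into a single value.
binned = x.reshape(2, 2, 2, 2).sum(axis=(1, 3))
print(binned)
```

The printed values should agree with the hand-computed [[8, 23], [17, 8]].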

Natively, using as_strided:

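import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.array([[1, 2, 3, 4],
              [2, 3, 7, 9],
              [8, 9, 1, 0],
              [0, 0, 3, 4]])

def bucket(x, bucket_size):
    # as_strided needs a predictable memory layout
    x = np.ascontiguousarray(x)
    oldshape = np.array(x.shape)
    # outer axes count the buckets, inner axes index within a bucket
    newshape = np.concatenate((oldshape // bucket_size, bucket_size))
    oldstrides = np.array(x.strides)
    # outer axes step a whole bucket at a time, inner axes step one element
    newstrides = np.concatenate((oldstrides * bucket_size, oldstrides))
    # sum over the inner (within-bucket) axes
    axis = tuple(range(x.ndim, 2 * x.ndim))
    return as_strided(x, newshape, newstrides).sum(axis)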

If a bucket size does not divide evenly into the corresponding dimension of x, the remaining elements are lost.
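If the leftover elements should be kept rather than dropped, one option is to zero-pad each axis up to the next multiple of the bucket size before summing. Here is a minimal sketch under that assumption; it uses reshape rather than as_strided, and the name bucket_padded and its offset parameter are hypothetical (they are not part of numpy or of the function above):

```python
import numpy as np

def bucket_padded(x, bucket_size, offset=0):
    """Sum x over blocks of shape bucket_size, zero-padding partial edge blocks.

    offset (hypothetical parameter) is the number of leading elements skipped
    along each axis before the first bucket starts: one int, or one per axis.
    """
    x = np.asarray(x)
    # Accept either a single int or one value per axis.
    sizes = [int(b) for b in np.broadcast_to(bucket_size, (x.ndim,))]
    offsets = [int(o) for o in np.broadcast_to(offset, (x.ndim,))]
    # Drop the elements that lie before the first bin edge on every axis.
    x = x[tuple(slice(o, None) for o in offsets)]
    # Pad the end of each axis with zeros up to a multiple of the bucket size.
    pad = [(0, (-n) % b) for n, b in zip(x.shape, sizes)]
    x = np.pad(x, pad, mode="constant")
    # Split every axis into (number of buckets, bucket size), then sum the
    # within-bucket axes, which sit at the odd positions 1, 3, 5, ...
    split_shape = [d for n, b in zip(x.shape, sizes) for d in (n // b, b)]
    return x.reshape(split_shape).sum(axis=tuple(range(1, 2 * x.ndim, 2)))

print(bucket_padded(np.array([1, 3, 7, 3, 2]), 2))            # [ 4 10  2]
print(bucket_padded(np.array([1, 3, 7, 3, 2]), 2, offset=1))  # [10  5]
```

Zero-padding is harmless for sums, but it would bias a mean, so a different edge treatment would be needed if the buckets were averaged instead.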

Verification:

In [9]: bucket(x,(2,2))
Out[9]:
array([[ 8, 23],
       [17,  8]])
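To see why the strided view gives this result, it can help to look at the intermediate array before it is summed. The sketch below rebuilds the same view for the 4x4 example with plain tuples; the variable names (blocks, shape, strides) are mine, chosen for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.ascontiguousarray([[1, 2, 3, 4],
                          [2, 3, 7, 9],
                          [8, 9, 1, 0],
                          [0, 0, 3, 4]])
bucket_size = (2, 2)

# Outer axes step one whole bucket at a time; inner axes step one element.
shape = tuple(n // b for n, b in zip(x.shape, bucket_size)) + bucket_size
strides = tuple(s * b for s, b in zip(x.strides, bucket_size)) + x.strides
blocks = as_strided(x, shape=shape, strides=strides)

print(blocks.shape)             # (2, 2, 2, 2)
print(blocks[0, 1])             # the top-right 2x2 bucket: rows 0-1, columns 2-3
print(blocks.sum(axis=(2, 3)))  # sum within each bucket
```

blocks[i, j] is the 2x2 bucket at bucket-row i and bucket-column j, so summing over the last two axes yields one value per bucket. The view shares memory with x, so it is best treated as read-only.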
