NumPy | digitize method
NumPy's digitize(~) method returns a NumPy array containing, for each value in the input array, the index of the bin to which that value belongs. This is hard to explain in plain words, so please look at the examples for clarification.
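Parameters
x: The array of values.
bins: The array of bins, which must be one-dimensional and monotonic. The examples here use bins sorted in ascending order.
right (optional): Controls which bin a value is placed in when it is exactly equal to a bin edge. If True, the value is placed in the previous bin (the lower index). If False, the value is placed in the next bin (the higher index). By default, right=False.
Return value
A NumPy array of integer indices, with the same shape as the input array of values.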
Consider the following code snippet:
import numpy as np
# Our array of values
a = [3, 6.5, 9]
# Our bins
bins = [5, 6, 7, 8]
np.digitize(a, bins)
array([0, 2, 4])
Let's understand the output here.
The first value 3 is smaller than 5 (the first bin), so the returned integer index is 0.
The second value 6.5 is between 6 and 7 (2nd and 3rd bin), so the returned integer index is 2.
The third value 9 is larger than 8 (the 4th bin), so the returned integer index is 4.
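The three cases above amount to counting how many bins are less than or equal to each value. Here is a minimal pure-Python sketch of that idea, assuming ascending bins and the default right=False. The helper name digitize_manual is hypothetical and is not part of NumPy; it is only meant to mirror the reasoning, not to describe how NumPy implements digitize.

```python
import numpy as np


def digitize_manual(values, bins):
    """Hypothetical helper: for each value, count the bins that are <= value.

    Assumes bins is sorted in ascending order and mimics right=False.
    """
    return [sum(1 for edge in bins if edge <= v) for v in values]


a = [3, 6.5, 9]
bins = [5, 6, 7, 8]

manual = digitize_manual(a, bins)
reference = np.digitize(a, bins).tolist()

print(manual)     # [0, 2, 4]
print(reference)  # [0, 2, 4]
```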
A nice way of wrapping your head around this is to think of the index of the value if it were to be inserted into the bins array. For instance, the value 3 will be inserted into index 0, so 0 is returned. 6.5 will be inserted into index 2, so 2 is returned, and so on.
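The insertion analogy can be checked with np.searchsorted, which returns the index at which a value would have to be inserted to keep an array sorted. The sketch below assumes ascending bins and the default right=False, in which case side="right" is the matching option.

```python
import numpy as np

a = np.array([3, 6.5, 9])
bins = np.array([5, 6, 7, 8])

# Index at which each value would be inserted into bins to keep it sorted
insert_idx = np.searchsorted(bins, a, side="right")

# Bin index of each value, using the default right=False
digitized = np.digitize(a, bins)

print(insert_idx)  # [0 2 4]
print(digitized)   # [0 2 4]
```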
Handling the endpoints
By default, when deciding which bin to place a value in, NumPy checks whether the value is strictly smaller than each bin, that is, it uses the < comparison. For instance,
a = [5]
bins = [5, 6]
np.digitize(a, bins) # or right=False
array([1])
The reason we get an integer index of 1 is that the first comparison we perform is 5 < 5, which evaluates to False. The next comparison, 5 < 6, evaluates to True, so the index 1 is returned.
Instead of a < comparison, we can perform a <= comparison, as follows:
a = [5]
bins = [5, 6]
np.digitize(a, bins, right=True)
array([0])
Here, we get an integer index of 0 because the first comparison 5 <= 5 evaluates to True.
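To see both settings side by side, the following sketch runs the same values through digitize with and without right=True. The extra values 5.5 and 6 are added here purely for illustration; only values that sit exactly on a bin change their index.

```python
import numpy as np

bins = [5, 6]
a = [5, 5.5, 6]  # 5 and 6 sit exactly on a bin, 5.5 does not

# right=False (default): a value equal to a bin gets the higher index
default_idx = np.digitize(a, bins)

# right=True: a value equal to a bin gets the lower index
right_idx = np.digitize(a, bins, right=True)

print(default_idx)  # [1 1 2]
print(right_idx)    # [0 1 1]
```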