# Numpy modify array in place?

I have the following code which is attempting to normalize the values of an `m x n` array (it will be used as input to a neural network, where `m` is the number of training examples and `n` is the number of features).

However, when I inspect the array in the interpreter after the script runs, I see that the values are not normalized; that is, they still have the original values. I guess this is because the assignment to the `array` variable inside the function is only seen within the function.

How can I do this normalization in place? Or do I have to return a new array from the normalize function?

```
import numpy

def normalize(array, imin=-1, imax=1):
    """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""

    dmin = array.min()
    dmax = array.max()

    array = imin + (imax - imin) * (array - dmin) / (dmax - dmin)
    print(array)

def main():
    # `array` is assumed to be loaded elsewhere in the original script
    for column in array.T:
        normalize(column)

    return array

if __name__ == "__main__":
    a = main()
```

If you want to apply mathematical operations to a numpy array in place, you can simply use the standard in-place operators `+=`, `-=`, `/=`, etc. So for example:

```
>>> def foo(a):
...     a += 10
...
>>> a = numpy.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> foo(a)
>>> a
array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])
```
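To see why the original assignment had no effect, contrast rebinding with a true in-place operation (an illustrative sketch; the function names are made up for the example):

```python
import numpy as np

def add_ten_rebind(a):
    # "a = ..." rebinds the local name to a brand-new array;
    # the caller's array is untouched.
    a = a + 10

def add_ten_inplace(a):
    # "+=" calls ndarray.__iadd__, which writes into the same buffer.
    a += 10

x = np.arange(3)
add_ten_rebind(x)
print(x)           # still [0 1 2]
add_ten_inplace(x)
print(x)           # now [10 11 12]
```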

The in-place version of these operations is a tad faster to boot, especially for larger arrays:

```
>>> def normalize_inplace(array, imin=-1, imax=1):
...         dmin = array.min()
...         dmax = array.max()
...         array -= dmin
...         array *= imax - imin
...         array /= dmax - dmin
...         array += imin
...
>>> def normalize_copy(array, imin=-1, imax=1):
...         dmin = array.min()
...         dmax = array.max()
...         return imin + (imax - imin) * (array - dmin) / (dmax - dmin)
...
>>> a = numpy.arange(10000, dtype='f')
>>> %timeit normalize_inplace(a)
10000 loops, best of 3: 144 us per loop
>>> %timeit normalize_copy(a)
10000 loops, best of 3: 146 us per loop
>>> a = numpy.arange(1000000, dtype='f')
>>> %timeit normalize_inplace(a)
100 loops, best of 3: 12.8 ms per loop
>>> %timeit normalize_copy(a)
100 loops, best of 3: 16.4 ms per loop
```
• what is `%timeit`? That looks interesting, is it built-in?
• The version I use here is only built into ipython. But it's based on the `timeit` function in the `timeit` module.
• Ah, finally looked at ipython. Funny, I had always associated it with ironpython, mistakenly I now see.
• @User, yeah, it's quite useful at times. I usually just use the regular python shell, but for timings the `%timeit` "magic command" is incredibly handy, because it takes care of all the awkward setup for you.
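For reference, a rough equivalent of `%timeit` using only the standard-library `timeit` module might look like this (a sketch; the timed statement is just an example):

```python
import timeit

# setup runs once per repeat; the statement runs `number` times
setup = "import numpy as np; a = np.arange(10000, dtype='f')"
stmt = "a += 10"  # an in-place operation to time

# repeat=3 mimics %timeit's old "best of 3" reporting;
# each entry is the total seconds for `number` executions
times = timeit.repeat(stmt, setup=setup, number=1000, repeat=3)
print(min(times))
```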

Here is a trick that is slightly more general than the other useful answers here:

```
def normalize(array, imin=-1, imax=1):
    """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""

    dmin = array.min()
    dmax = array.max()

    array[...] = imin + (imax - imin) * (array - dmin) / (dmax - dmin)
```

Here we are assigning values to the view `array[...]` rather than assigning these values to some new local variable within the scope of the function.

```
import numpy as np

x = np.arange(5, dtype='float')
print(x)
normalize(x)
print(x)

>>> [0. 1. 2. 3. 4.]
>>> [-1.  -0.5  0.   0.5  1. ]
```
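For arrays with at least one dimension, assigning to the full slice `array[:]` works the same way as `array[...]` (the ellipsis form additionally covers 0-d arrays). A minimal sketch, with an illustrative function name:

```python
import numpy as np

def normalize_view(array, imin=-1, imax=1):
    dmin, dmax = array.min(), array.max()
    # array[:] is a view of the whole array, so this assignment
    # writes the new values into the caller's buffer in place
    array[:] = imin + (imax - imin) * (array - dmin) / (dmax - dmin)

x = np.arange(5, dtype='float')
normalize_view(x)
print(x)  # [-1.  -0.5  0.   0.5  1. ]
```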

EDIT:

The `array[...]` version is slower than the pure in-place operators, because the right-hand side still allocates a temporary array. But it may be valuable if you are doing something more complicated where the built-in in-place operations are cumbersome or don't suffice.

```
def normalize2(array, imin=-1, imax=1):
    dmin = array.min()
    dmax = array.max()

    array -= dmin
    array *= imax - imin
    array /= dmax - dmin
    array += imin

A = np.random.randn(200**3).reshape([200] * 3)
%timeit -n5 -r5 normalize(A)
%timeit -n5 -r5 normalize2(A)

>> 47.6 ms ± 678 µs per loop (mean ± std. dev. of 5 runs, 5 loops each)
>> 26.1 ms ± 866 µs per loop (mean ± std. dev. of 5 runs, 5 loops each)
```
• and what would be the timings of it?
```
def normalize(array, imin=-1, imax=1):
    """I = Imin + (Imax-Imin)*(D-Dmin)/(Dmax-Dmin)"""

    dmin = array.min()
    dmax = array.max()

    array -= dmin
    array *= imax - imin
    array /= dmax - dmin
    array += imin

    print(array)
```
• Performance-wise is there any issue doing it this way? How does it compare to creating a new array?
• I mean, for that you'd have to benchmark. It depends on the size of the array. For small-ish problems, I would certainly just create the new array.
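Such a benchmark could be sketched with the standard-library `timeit` module like this (the two functions mirror the in-place and copying versions shown earlier; absolute timings will vary by machine):

```python
import timeit

import numpy as np

def normalize_inplace(array, imin=-1, imax=1):
    dmin, dmax = array.min(), array.max()
    array -= dmin
    array *= imax - imin
    array /= dmax - dmin
    array += imin

def normalize_copy(array, imin=-1, imax=1):
    dmin, dmax = array.min(), array.max()
    return imin + (imax - imin) * (array - dmin) / (dmax - dmin)

for n in (1_000, 1_000_000):
    a = np.random.rand(n).astype('f')
    # repeated in-place calls keep mutating `a`, which is fine here:
    # once mapped to [-1, 1], further calls leave it in [-1, 1]
    t_in = timeit.timeit(lambda: normalize_inplace(a), number=20)
    t_cp = timeit.timeit(lambda: normalize_copy(a), number=20)
    print(n, t_in, t_cp)
```

The gap between the two tends to matter only for larger arrays, where the temporary allocations in the copying version dominate.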

There is also a nice way to write the normalization using `np.vectorize`, which is very useful when combined with a `lambda` function applied to an array (note that it returns a new array rather than modifying the input in place). See the example below:

```
import numpy as np

def normalizeMe(value, vmin, vmax):
    vnorm = float(value - vmin) / float(vmax - vmin)
    return vnorm

imin = 0
imax = 10
feature = np.random.randint(10, size=10)

# Vectorize your function (only need to do it once)
temp = np.vectorize(lambda val: normalizeMe(val, imin, imax))
normfeature = temp(np.asarray(feature))

print(feature)
print(normfeature)
```

One can compare the performance with a generator expression, however there are likely many other ways to do this.

```
%%timeit
temp = np.vectorize(lambda val: normalizeMe(val, imin, imax))
normfeature1 = temp(np.asarray(feature))
10000 loops, best of 3: 25.1 µs per loop

%%timeit
normfeature2 = [i for i in (normalizeMe(val, imin, imax) for val in feature)]
100000 loops, best of 3: 9.69 µs per loop

%%timeit
normalize(np.asarray(feature))
100000 loops, best of 3: 12.7 µs per loop
```

So vectorize is definitely not the fastest, but it can be convenient in cases where performance is not as important.
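For numeric work like this, plain broadcast arithmetic is usually simpler and faster still, since `np.vectorize` calls the Python function once per element while a single array expression does the whole loop in C. A minimal sketch:

```python
import numpy as np

imin, imax = 0, 10
feature = np.random.randint(10, size=10)

# one vectorized expression over the whole array;
# no per-element Python function calls
normfeature = (feature - imin) / float(imax - imin)
print(normfeature)
```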