# Expanding (adding a row or column to) a scipy.sparse matrix


Suppose I have an NxN matrix M (lil_matrix or csr_matrix) from scipy.sparse, and I want to make it (N+1)xN, where M_modified[i,j] = M[i,j] for 0 <= i < N (and all j) and M_modified[N,j] = 0 for all j. Basically, I want to add a row of zeros to the bottom of M and preserve the remainder of the matrix. Is there a way to do this without copying the data?
• SciPy does provide routines for this. Look at Sidhant's answer below and give it a thumbs up. The hstack and vstack documentation pages give you simple sample code too.

SciPy doesn't have a way to do this without copying the data, but you can do it yourself by changing the attributes that define the sparse matrix.

There are 4 attributes that make up the csr_matrix:

data: An array containing the actual values in the matrix

indices: An array containing the column index corresponding to each value in data

indptr: An array of length (number of rows + 1). indptr[i] is the position in data (and indices) of the first stored value of row i, and indptr[i+1] is one past its last stored value. If a row is empty, its entry is the same as the next row's entry.

shape: A tuple containing the shape of the matrix
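To make the four attributes concrete, here is a small sketch on a made-up 3x3 matrix (the values are chosen only for illustration, and the middle row is deliberately empty). It prints each attribute and then uses indptr to walk the rows.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Illustrative matrix; row 1 holds no nonzero values.
dense = np.array([[1, 0, 2],
                  [0, 0, 0],
                  [0, 3, 0]])
m = csr_matrix(dense)

print(m.data)     # stored values, row by row
print(m.indices)  # column index of each stored value
print(m.indptr)   # row i occupies data[indptr[i]:indptr[i + 1]]
print(m.shape)

# Recover (row, column, value) triples by slicing with indptr.
triples = []
for i in range(m.shape[0]):
    start, end = m.indptr[i], m.indptr[i + 1]
    for col, val in zip(m.indices[start:end], m.data[start:end]):
        triples.append((i, int(col), int(val)))
print(triples)
```

Because row 1 is empty, indptr[1] and indptr[2] hold the same value, so the slice for that row has length zero.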

If you are simply adding a row of zeros to the bottom, all you have to do is change the shape and indptr of your matrix.

```python
import numpy as np
from scipy.sparse import csr_matrix

x = np.ones((3, 5))
x = csr_matrix(x)
x.toarray()
# array([[1., 1., 1., 1., 1.],
#        [1., 1., 1., 1., 1.],
#        [1., 1., 1., 1., 1.]])

# reshape cannot add a row to a csr_matrix, but you can cheat and set the
# private _shape attribute yourself.
x._shape = (4, 5)
# Update indptr to let it know we added a row with nothing in it. So just append the last
# value in indptr to the end.
# Note that you are still copying the indptr array.
x.indptr = np.hstack((x.indptr, x.indptr[-1]))
x.toarray()
# array([[1., 1., 1., 1., 1.],
#        [1., 1., 1., 1., 1.],
#        [1., 1., 1., 1., 1.],
#        [0., 0., 0., 0., 0.]])
```
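As an alternative to setting the private `_shape` attribute, newer SciPy releases (1.1 and later, to my knowledge) expose an in-place `resize` method on sparse matrices. The following is a minimal sketch assuming such a version is installed; it makes no claim that `resize` avoids reallocating `indptr` internally.

```python
import numpy as np
from scipy.sparse import csr_matrix

x = csr_matrix(np.ones((3, 5)))
nnz_before = x.nnz

# resize modifies x in place; the added row has no stored values.
x.resize((4, 5))

dense = x.toarray()
print(x.shape)
print(dense[-1])
```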

Here is a function to handle the more general case of vstacking any two csr_matrices. You still end up copying the underlying NumPy arrays, but it is still significantly faster than the scipy vstack method.

```python
import numpy as np

def csr_vappend(a, b):
    """Take two csr_matrices and append the second one to the bottom of the first one.

    Much faster than scipy.sparse.vstack, but assumes the type to be csr and overwrites
    the first matrix instead of copying it. The data, indices, and indptr still get copied.
    """
    a.data = np.hstack((a.data, b.data))
    a.indices = np.hstack((a.indices, b.indices))
    # Shift b's row pointers by the number of stored values in a and drop b's leading 0.
    a.indptr = np.hstack((a.indptr, (b.indptr + a.nnz)[1:]))
    # Rows add up; the column count must match between a and b.
    a._shape = (a.shape[0] + b.shape[0], b.shape[1])
    return a
```
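The same idea carries over to appending columns to a CSC matrix, since CSC stores columns the way CSR stores rows. The function name `csc_happend` and the test matrices below are hypothetical, and the sketch assumes both inputs are csc_matrix objects with the same number of rows. It checks the result against `scipy.sparse.hstack`.

```python
import numpy as np
from scipy.sparse import csc_matrix, hstack

def csc_happend(a, b):
    """Hypothetical column-wise analogue of csr_vappend: append b's columns to a in place."""
    if a.shape[0] != b.shape[0]:
        raise ValueError("a and b must have the same number of rows")
    offset = a.indptr[-1]  # stored values in a, read before a is modified
    a.data = np.hstack((a.data, b.data))
    a.indices = np.hstack((a.indices, b.indices))
    # Shift b's column pointers past a's values and drop b's leading 0.
    a.indptr = np.hstack((a.indptr, b.indptr[1:] + offset))
    a._shape = (a.shape[0], a.shape[1] + b.shape[1])
    return a

left = csc_matrix(np.array([[1, 0], [0, 2], [0, 0]]))
right = csc_matrix(np.array([[0, 3], [4, 0], [0, 5]]))

expected = hstack([left, right]).toarray()  # computed before left is overwritten
result = csc_happend(left, right).toarray()
print(np.array_equal(result, expected))
```

Like `csr_vappend`, this overwrites the first argument and relies on the private `_shape` attribute, so it may break across SciPy versions.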
• I think you can get away with not even returning `a`: Python passes a reference to the object, so `a` is modified in place inside the function. Also, can there be a csc_happend(a,b) analog?
• Good idea, to simply reset the shape.

Not sure if you're still looking for a solution, but maybe others can look into `hstack` and `vstack` - http://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.hstack.html. I think we can define a csr_matrix for the single additional row and then `vstack` it with the previous matrix.
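A minimal sketch of that suggestion, using a made-up 3x3 matrix: build an empty 1xN CSR matrix and stack it under the original. Note that `vstack` returns a new matrix, so this does copy the data.

```python
import numpy as np
from scipy.sparse import csr_matrix, vstack

M = csr_matrix(np.array([[1, 0, 2],
                         [0, 3, 0],
                         [4, 0, 5]]))

# An empty 1 x N CSR matrix stands in for the extra row of zeros.
zero_row = csr_matrix((1, M.shape[1]), dtype=M.dtype)

# vstack builds a new matrix; M itself is left unchanged.
M_expanded = vstack([M, zero_row], format="csr")
print(M_expanded.shape)
print(M_expanded.toarray())
```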

• As the source code for vstack implies, it returns a new copy of the input matrices, so it is not efficient enough if we want to expand a matrix in place.

I don't think that there is any way to really escape from doing the copying. Both of those types of sparse matrices store their data internally as NumPy arrays (in the data and indices attributes for csr and in the data and rows attributes for lil), and NumPy arrays can't be extended in place.

LIL stands for list of lists (not linked list), but the current implementation is not purely lists. The NumPy arrays used for `data` and `rows` are both of type object. Each of the objects in these arrays is actually a Python list (an empty list when all values in a row are zero). Python lists aren't linked lists, but they are quite frankly a better choice due to O(1) look-up. Personally, I don't immediately see the point of using a NumPy array of objects for the outer container rather than just a Python list. You could fairly easily change the current lil implementation to use Python lists instead, which would allow you to add a row without copying the whole matrix.
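The internal layout described above can be inspected directly. This is a small sketch on a made-up 3x4 matrix; it only reads the `rows` and `data` attributes and does not attempt to modify the implementation.

```python
from scipy.sparse import lil_matrix

m = lil_matrix((3, 4))
m[0, 1] = 5.0
m[2, 3] = 7.0

# Both outer containers are NumPy arrays of dtype object ...
print(m.rows.dtype, m.data.dtype)
# ... whose elements are ordinary Python lists, one per row.
print(type(m.rows[0]), type(m.data[0]))

rows_as_lists = m.rows.tolist()  # column indices of stored values, per row
data_as_lists = m.data.tolist()  # the matching values, per row
print(rows_as_lists)
print(data_as_lists)
```

The empty middle row shows up as an empty list in both `rows` and `data`.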