# Filtering a list based on a list of booleans

I have a list of values which I need to filter given the values in a list of booleans:

``````list_a = [1, 2, 4, 6]
filter = [True, False, True, False]
``````

I generate a new filtered list with the following line:

``````filtered_list = [i for indx,i in enumerate(list_a) if filter[indx] == True]
``````

which results in:

``````print(filtered_list)
[1, 4]
``````

The line works but looks (to me) a bit overkill and I was wondering if there was a simpler way to achieve the same.

1- Don't name a list `filter` like I did because it is a built-in function.

2- Don't compare things to `True` like I did with `if filter[indx] == True`, since it's unnecessary. Just using `if filter[indx]` is enough.
• Some style notes on `if filter[indx] == True`: do not use `==` if you want to check for identity with `True`; use `is`. In this case the whole comparison is unnecessary anyway, and you could simply write `if filter[indx]`. Lastly, never use the name of a built-in as a variable or module name (I'm referring to the name `filter`). Use something like `included`, so that the `if` reads nicely (`if included[indx]`).
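The three ways of testing a flag mentioned in these notes only differ when the flags are not strictly `True`/`False`. A minimal sketch, assuming a hypothetical `flags` list that mixes integers and booleans (the names `values` and `flags` are illustrative, not from the question):

```python
values = ["a", "b", "c", "d"]
flags = [1, 0, 2, True]  # hypothetical mixed flags, not the question's data

# Truthiness: any non-zero / non-empty flag selects the item.
by_truthiness = [v for v, f in zip(values, flags) if f]

# Equality: 1 == True holds, but 2 == True does not.
by_equality = [v for v, f in zip(values, flags) if f == True]

# Identity: only the actual True object selects the item.
by_identity = [v for v, f in zip(values, flags) if f is True]

print(by_truthiness, by_equality, by_identity)
```

With a list of real booleans, as in the question, all three give the same result, which is why the plain `if f` form is enough there.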

You're looking for `itertools.compress`:

``````>>> from itertools import compress
>>> list_a = [1, 2, 4, 6]
>>> fil = [True, False, True, False]
>>> list(compress(list_a, fil))
[1, 4]
``````

## Timing comparisons (py3.x):

``````>>> list_a = [1, 2, 4, 6]
>>> fil = [True, False, True, False]
>>> %timeit list(compress(list_a, fil))
100000 loops, best of 3: 2.58 us per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v]  #winner
100000 loops, best of 3: 1.98 us per loop

>>> list_a = [1, 2, 4, 6]*100
>>> fil = [True, False, True, False]*100
>>> %timeit list(compress(list_a, fil))              #winner
10000 loops, best of 3: 24.3 us per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v]
10000 loops, best of 3: 82 us per loop

>>> list_a = [1, 2, 4, 6]*10000
>>> fil = [True, False, True, False]*10000
>>> %timeit list(compress(list_a, fil))              #winner
1000 loops, best of 3: 1.66 ms per loop
>>> %timeit [i for (i, v) in zip(list_a, fil) if v]
100 loops, best of 3: 7.65 ms per loop
``````

Don't use `filter` as a variable name; it is a built-in function.

• @Mehdi I find the Matlab way highly unintuitive, but I suppose it depends on what you are used to.
• How can I select `[2, 6]` ?
• I get it: `list(compress(list_a, [not i for i in fil]))` should return `[2, 6]`
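The inverted selection from the last comment can also be written without building an intermediate list. A small sketch, assuming the same `list_a` and `fil` as in the answer; using `operator.not_` with `map` is one possible choice here, not something the answer prescribes:

```python
from itertools import compress
from operator import not_

list_a = [1, 2, 4, 6]
fil = [True, False, True, False]

# Items whose flag is true.
kept = list(compress(list_a, fil))

# Items whose flag is false: negate each flag lazily with map().
dropped = list(compress(list_a, map(not_, fil)))

print(kept, dropped)
```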

Like so:

``````filtered_list = [i for (i, v) in zip(list_a, filter) if v]
``````

Using `zip` is the pythonic way to iterate over multiple sequences in parallel, without needing any indexing. This assumes both sequences have the same length (zip stops after the shortest runs out). Using `itertools` for such a simple case is a bit overkill ...

One thing in your example that you should really stop doing is comparing things to `True`; this is usually not necessary. Instead of `if filter[idx]==True: ...`, you can simply write `if filter[idx]: ...`.
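Because `zip` stops at the shorter sequence, a mask of the wrong length is silently accepted. If that matters, one option is to check the lengths first. The sketch below uses a hypothetical helper name, `filter_by_mask`, and assumes both arguments are sequences that support `len()`:

```python
def filter_by_mask(values, mask):
    """Return the items of `values` whose matching entry in `mask` is truthy.

    Hypothetical helper for illustration; raises instead of truncating
    when the two sequences differ in length.
    """
    if len(values) != len(mask):
        raise ValueError(
            f"length mismatch: {len(values)} values, {len(mask)} mask entries"
        )
    return [v for v, keep in zip(values, mask) if keep]


list_a = [1, 2, 4, 6]

# Same lengths: behaves like the plain comprehension.
result = filter_by_mask(list_a, [True, False, True, False])

# Plain zip with a short mask quietly ignores the last two values.
truncated = [v for v, keep in zip(list_a, [True, False]) if keep]

print(result, truncated)
```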

With numpy:

``````In : import numpy as np
In : list_a = np.array([1, 2, 4, 6])
In : filter = np.array([True, False, True, False])
In : list_a[filter]
Out: array([1, 4])
``````

Or see Alex Szatmary's answer if `list_a` can be a NumPy array but `filter` cannot.

NumPy usually gives you a big speed boost as well:

``````In : import itertools
In : import numpy as np
In : list_a = [1, 2, 4, 6]*10000
In : fil = [True, False, True, False]*10000
In : list_a_np = np.array(list_a)
In : fil_np = np.array(fil)

In : %timeit list(itertools.compress(list_a, fil))
1000 loops, best of 3: 625 us per loop

In : %timeit list_a_np[fil_np]
10000 loops, best of 3: 173 us per loop
``````
• Good point; I prefer NumPy over `list` where possible. But if you need a `list` anyway, then with the NumPy solution you have to create an `np.array` from both lists, use boolean indexing, and finally convert the array back to a list with the `tolist()` method. To be precise, you should include the creation of those objects in the timing comparison. Then `itertools.compress` will still be the fastest solution.
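To check the point raised in this comment on your own machine, the list-to-array and array-to-list conversions can be timed together with the indexing. This is only a sketch using the standard `timeit` module outside IPython; the function names `with_compress` and `with_numpy_round_trip` are made up for illustration, and the results will vary by machine and version:

```python
import timeit
from itertools import compress

import numpy as np

list_a = [1, 2, 4, 6] * 10000
fil = [True, False, True, False] * 10000


def with_compress():
    # Pure-Python path: list in, list out.
    return list(compress(list_a, fil))


def with_numpy_round_trip():
    # NumPy path including both conversions: list -> array -> list.
    return np.array(list_a)[np.array(fil)].tolist()


# Both approaches should select exactly the same items.
assert with_compress() == with_numpy_round_trip()

for fn in (with_compress, with_numpy_round_trip):
    best = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    print(f"{fn.__name__}: {best * 1e6:.1f} us per call")
```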

To do this using NumPy, i.e., if you have an array `a` instead of `list_a`:

``````a = np.array([1, 2, 4, 6])
my_filter = np.array([True, False, True, False], dtype=bool)
a[my_filter]
> array([1, 4])
``````
• If you turn my_filter into a boolean array, you can use direct boolean indexing, without the need for `where`.
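The difference between index-based selection with `where` and direct boolean indexing can be sketched as follows. The array `int_mask` is a hypothetical addition that shows why the boolean dtype matters: an integer array of 0s and 1s is treated as a list of positions, not as a mask.

```python
import numpy as np

a = np.array([1, 2, 4, 6])
my_filter = np.array([True, False, True, False], dtype=bool)

# Direct boolean indexing: keeps items where the mask is True.
selected = a[my_filter]

# Index-based equivalent: np.where returns the positions of the True entries.
via_where = a[np.where(my_filter)]

# Inverting the boolean mask selects the remaining items.
rejected = a[~my_filter]

# Hypothetical integer "mask": NumPy reads it as positions 1, 0, 1, 0.
int_mask = np.array([1, 0, 1, 0])
fancy = a[int_mask]

print(selected, via_where, rejected, fancy)
```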
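``````filtered_list = [list_a[i] for i in range(len(list_a)) if filter[i]]
``````

If `list_a` and `filter` are NumPy arrays (plain Python lists do not support this kind of indexing), you can use `list_a[filter]` to get the values marked `True`. To get the values marked `False`, use `list_a[~filter]`.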