Reversing str.contains in Python (pandas)

I have something like this in my code:

df2 = df[df['A'].str.contains("Hello|World")]

However, I want all the rows that contain neither Hello nor World. What is the most efficient way to reverse this?

You can use the tilde operator ~ to invert the boolean mask:

>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df.A.str.contains("Hello|World")
0     True
1    False
2     True
3    False
Name: A, dtype: bool
>>> ~df.A.str.contains("Hello|World")
0    False
1     True
2    False
3     True
Name: A, dtype: bool
>>> df[~df.A.str.contains("Hello|World")]
       A
1   this
3  apple

[2 rows x 1 columns]

Whether this is the most efficient approach, I don't know; you'd have to time it against your other options. A single regular expression is sometimes slower than something like df[~(df.A.str.contains("Hello") | df.A.str.contains("World"))], but I'm bad at guessing where the crossover points are.
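A minimal timing sketch with timeit, assuming a synthetic column of random words (the data and the function names single_regex and two_literal_passes are hypothetical). It compares one regex alternation against two literal substring searches (regex=False) combined with |. Which one wins will depend on your data and pandas version, so treat any numbers it prints as specific to your machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 50,000 random words, some of them "Hello" or "World".
rng = np.random.default_rng(0)
words = np.array(["Hello", "World", "this", "apple", "banana"])
df = pd.DataFrame({"A": rng.choice(words, size=50_000)})


def single_regex(frame):
    # One pass with a regex alternation, then invert the mask.
    return frame[~frame["A"].str.contains("Hello|World")]


def two_literal_passes(frame):
    # Two literal (non-regex) substring searches combined with |, then inverted.
    mask = frame["A"].str.contains("Hello", regex=False) | frame["A"].str.contains(
        "World", regex=False
    )
    return frame[~mask]


# Both approaches should select exactly the same rows.
assert single_regex(df).equals(two_literal_passes(df))

for fn in (single_regex, two_literal_passes):
    seconds = timeit.timeit(lambda: fn(df), number=10)
    print(f"{fn.__name__}: {seconds / 10 * 1000:.1f} ms per call")
```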

    • Much better than a convoluted negative lookaround test. No experience with Pandas myself, however, so I have no idea what would be the faster approach.
    • The regex lookaround test took significantly longer (about 30 s vs. 20 s), and the two methods apparently give slightly different results (3663K vs. 3504K results from a ~3G original; I haven't looked into the specifics).
    • @DSM I have seen this ~ symbol many times, especially in JavaScript, but I haven't seen it in Python. What does it mean, exactly?
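In short, on a boolean pandas Series (or NumPy boolean array), ~ is element-wise logical NOT, which is why it inverts the mask above. On plain Python integers it is bitwise NOT instead. A small sketch contrasting the two (the variable names are just for illustration):

```python
import numpy as np
import pandas as pd

mask = pd.Series([True, False, True])

# On a boolean Series, ~ is element-wise logical NOT.
print((~mask).tolist())   # [False, True, False]

# NumPy boolean arrays behave the same way.
arr = np.array([True, False, True])
print((~arr).tolist())    # [False, True, False]

# On plain Python integers, ~ is bitwise NOT: ~x == -x - 1.
# bool subclasses int, so ~True is -2, not False (and is deprecated since Python 3.12).
print(~1, ~0)             # -2 -1

# For a plain list of bools, apply `not` element by element instead.
print([not b for b in [True, False, True]])  # [False, True, False]
```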

The .str.contains() method interprets its pattern as a regular expression by default, so you can use a negative lookahead to require that neither word appears:

df['A'].str.contains(r'^(?:(?!Hello|World).)*$')

This pattern matches only strings in which neither Hello nor World appears anywhere.

Demo:

>>> df = pd.DataFrame({"A": ["Hello", "this", "World", "apple"]})
>>> df['A'].str.contains(r'^(?:(?!Hello|World).)*$')
0    False
1     True
2    False
3     True
Name: A, dtype: bool
>>> df[df['A'].str.contains(r'^(?:(?!Hello|World).)*$')]
       A
1   this
3  apple
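One possible reason the two approaches can return different row counts (a guess, not something the comments confirm): by default . does not match a newline, so the lookahead pattern rejects any multi-line value even when it contains neither word, while ~str.contains keeps it. A minimal sketch with hypothetical data, using the flags parameter of str.contains to pass re.DOTALL:

```python
import re

import pandas as pd

# Hypothetical data: the third value spans two lines but contains neither word.
df = pd.DataFrame({"A": ["Hello", "this", "two\nlines", "World", "apple"]})

pattern = r"^(?:(?!Hello|World).)*$"
inverted = ~df["A"].str.contains("Hello|World")
lookahead = df["A"].str.contains(pattern)

# Without DOTALL, "." stops at "\n", so only the multi-line row disagrees.
print(df[inverted != lookahead])

# With re.DOTALL, "." also matches newlines and the two masks agree here.
lookahead_dotall = df["A"].str.contains(pattern, flags=re.DOTALL)
print((inverted == lookahead_dotall).all())  # True
```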