
How to remove bad path characters in Python?

What is the most cross platform way of removing bad path characters (e.g. "\" or ":" on Windows) in Python?

Solution

Because there seems to be no ideal solution, I decided to be relatively restrictive and used the following code:

def remove(value, deletechars):
    # Drop every character listed in deletechars from value.
    for c in deletechars:
        value = value.replace(c, '')
    return value

print(remove(filename, '\\/:*?"<>|'))

I think the safest approach here is to replace any suspicious characters: replace (or strip) anything that isn't alphanumeric, -, _, a space, or a period. Here's how you do that:

import re
re.sub('[^\w\-_\. ]', '_', filename)

The above replaces every character that's not a letter, digit, '_', '-', '.' or space with an '_'. If you're sanitizing an entire path rather than a single file name, add os.sep to the set of allowed characters as well (escaped with re.escape, since it is a backslash on Windows).

Here's some sample output:

In [27]: re.sub('[^\w\-_\. ]', '_', 'some\\*-file._n\\\\ame')
Out[27]: 'some__-file._n__ame'
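For reference, the same whitelist idea can be wrapped in a small Python 3 helper. This is only a sketch: the name sanitize_filename and the replacement parameter are made up for illustration, and the re.ASCII flag is an extra assumption that restricts \w to ASCII letters, digits and underscore (drop it if Unicode letters should be kept).

```python
import re

# Anything that is not a word character, '-', '.', or a space gets replaced.
# re.ASCII limits \w to [a-zA-Z0-9_]; this is an assumption, not part of the
# original answer. A raw string avoids invalid-escape warnings in Python 3.
_UNSAFE = re.compile(r'[^\w\-. ]', re.ASCII)

def sanitize_filename(name, replacement='_'):
    """Replace every character outside the allowed set with `replacement`."""
    return _UNSAFE.sub(replacement, name)

print(sanitize_filename('some\\*-file._n\\\\ame'))  # some__-file._n__ame
```

Compiling the pattern once keeps the allowed set in a single place, so widening or narrowing it later is a one-line change.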
    • Looks like I got carried away with the last edit. It was right exactly as it was. Keep in mind that it's only allowing specific characters (not excluding a set of characters). Raw string was unnecessary. See my clarification and the sample output in the updated answer.
    • Yeah ... I think if you don't use r'...', you'll still need a backslash in front of each of those backslashes. Thus a total of 10 backslashes.

Unfortunately, the set of acceptable characters varies by OS and by filesystem.

  • Windows:

    • Use almost any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:
      • The following reserved characters are not allowed:
        < > : " / \ | ? *
      • Characters whose integer representations are in the range from zero through 31 are not allowed.
      • Any other character that the target file system does not allow.

    The list of accepted characters can vary depending on the OS and locale of the machine that first formatted the filesystem.

    .NET has GetInvalidFileNameChars and GetInvalidPathChars, but I don't know how to call those from Python.

  • Mac OS: NUL is always excluded, "/" is excluded from POSIX layer, ":" excluded from Apple APIs
    • HFS+: any sequence of non-excluded characters that is representable by UTF-16 in the Unicode 2.0 spec
    • HFS: any sequence of non-excluded characters representable in MacRoman (default) or other encodings, depending on the machine that created the filesystem
    • UFS: same as HFS+
  • Linux:
    • native (UNIX-like) filesystems: any byte sequence excluding NUL and "/"
    • FAT, NTFS, other non-native filesystems: varies

Your best bet is probably either to be overly conservative on all platforms, or to try creating the file and handle any errors.
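The second option could look roughly like the sketch below, which probes a candidate name by actually creating the file in a target directory. The function name can_create is hypothetical, and the probe only tells you about the filesystem that directory lives on.

```python
import os
import tempfile

def can_create(directory, name):
    """Return True if `name` can be created as a new file inside `directory`."""
    # Reject anything containing a path separator, so the probe never
    # escapes `directory` (e.g. '../x' or 'a/b').
    if os.path.basename(name) != name:
        return False
    path = os.path.join(directory, name)
    try:
        # 'x' mode refuses to overwrite; an existing file also yields False.
        with open(path, 'x'):
            pass
    except (OSError, ValueError):
        # OSError: rejected by the OS or filesystem; ValueError: embedded NUL.
        return False
    os.remove(path)  # clean up the probe file
    return True

with tempfile.TemporaryDirectory() as scratch:
    print(can_create(scratch, 'report.txt'))
    print(can_create(scratch, 'a/b'))
```

A successful probe is not a guarantee on Windows: as the comments on this answer point out, some legacy device names can open without raising an error, so this check is best combined with a conservative character filter.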

    • Note that on Windows, you'll also have issues if you try to use filenames like CON.*. And spaces at the end of a filename tend to cause problems too.
    • @Antimony Yes, the legacy DOS device names cannot be used as filenames in Win32. But the filesystem supports them just fine, and using the NT APIs to get around Win32 works fine. (At least, as far as I recall; I haven't got a Windows machine to test on anymore.)
    • You may be able to do it using NT APIs, but Python can't. Python on Windows is unfortunately restricted in filename handling. The worst part is that bad filenames will often fail silently or give you a different file than the one you asked for (try opening CON in a script run from the console).
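
Since names such as CON can fail silently, it may be safer to screen for them before touching the filesystem. The sketch below assumes the usual list of legacy DOS device names (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9); the function name is_windows_risky is hypothetical, and the check is deliberately conservative.

```python
# Legacy DOS device names that Win32 treats specially (assumed list).
_RESERVED = {
    'CON', 'PRN', 'AUX', 'NUL',
    *(f'COM{i}' for i in range(1, 10)),
    *(f'LPT{i}' for i in range(1, 10)),
}

def is_windows_risky(name):
    """Return True for reserved device names or names ending in ' ' or '.'."""
    # 'CON.txt' is as problematic as 'CON', so compare only the part
    # before the first dot, ignoring case and trailing spaces.
    stem = name.split('.', 1)[0].rstrip(' ').upper()
    return stem in _RESERVED or name.endswith((' ', '.'))

print(is_windows_risky('CON.txt'))      # True
print(is_windows_risky('console.txt'))  # False
```

This covers the two problems raised in the comments (device names and trailing spaces); it does not replace the character filtering discussed in the answers.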
