
Unpickling a Python 2 object with Python 3

I'm wondering if there is a way to load an object that was pickled in Python 2.4, with Python 3.4.

I've been running 2to3 on a large amount of company legacy code to get it up to date.

Having done this, when running the file I get the following error:

  File "H:\fixers - 3.4\addressfixer - 3.4\trunk\lib\address\address_generic.py", line 382, in read_ref_files
    d = pickle.load(open(mshelffile, 'rb'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

The pickled object in question is a dict in a dict, containing keys and values of type str.

So my question is: is there a way to load an object, originally pickled in Python 2.4, with Python 3.4?

    • Does Python 2.4 have the json module? Perhaps you could write a 2.4 script that unpickles the object and saves it as a json object, and then write a 3.4 script that reads the json object and saves it as a 3.4-compatible pickle object. This would be a one-time operation that you run on all your pickle files.
    • I was thinking along similar lines. Since these are dicts, I reckon I could just redirect sys.stdout to a file and print them out, but I want to see if I can load them first.

You'll have to tell pickle.load() how to convert Python 2 bytestring data to Python 3 strings, or you can tell pickle to leave them as bytes.

The default is to try to decode all string data as ASCII, and that decoding fails here. See the pickle.load() documentation:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2. If fix_imports is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.

Setting the encoding to latin1 allows you to import the data directly:

import pickle

with open(mshelffile, 'rb') as f:
    d = pickle.load(f, encoding='latin1')  # decode Python 2 str data as Latin-1

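but you'll need to verify that none of your strings were decoded using the wrong codec. Latin-1 never fails on any input because it maps the byte values 0-255 directly to the first 256 Unicode code points, so text that was really stored in another encoding (UTF-8, for example) loads without error but comes out garbled.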
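A minimal sketch of that verification step follows. The byte stream is hand-built for illustration (a protocol 2 pickle holding a single Python 2 str whose payload is UTF-8 text), and the helper names `load_py2` and `redecode` are hypothetical, not part of the pickle module. It shows the default ASCII decode failing, the Latin-1 decode succeeding with garbled output, and one way to repair a string once you know its real codec:

```python
import pickle

# Hand-built protocol 2 stream holding one Python 2 str
# (opcode U = SHORT_BINSTRING, followed by a 1-byte length).
# The payload is the UTF-8 encoding of "it’s", standing in for legacy data.
PY2_PICKLE = b"\x80\x02U\x06" + b"it\xe2\x80\x99s" + b"q\x00."


def load_py2(data, encoding="ASCII"):
    """Unpickle a Python 2 stream, decoding 8-bit strings with `encoding`."""
    return pickle.loads(data, encoding=encoding)


def redecode(text, real_codec="utf-8"):
    """Undo a Latin-1 decode and decode again with the codec the data used."""
    return text.encode("latin1").decode(real_codec)


try:
    load_py2(PY2_PICKLE)  # default: strict ASCII decoding
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

as_latin1 = load_py2(PY2_PICKLE, "latin1")  # never fails, but may be mojibake
repaired = redecode(as_latin1)              # assumes the original was UTF-8
```

Because Latin-1 is a one-to-one mapping of bytes to code points, encoding back to Latin-1 recovers the original bytes exactly, which is what makes the repair possible.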

The alternative would be to load the data with encoding='bytes', and decode all bytes keys and values afterwards.
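As a rough sketch of that alternative, assuming the strings were UTF-8 in Python 2 (check this against your own data) and using a hand-built stream in place of a real file: the helper `decode_bytes` is a hypothetical name, and it only walks plain dicts, lists, tuples and sets.

```python
import pickle

# Hand-built protocol 2 stream for a dict in a dict with Python 2 str
# keys and values; a stand-in for a real legacy pickle file.
PY2_PICKLE = (
    b"\x80\x02}"              # PROTO 2, outer EMPTY_DICT
    b"U\x04addr"              # key 'addr'
    b"}"                      # inner EMPTY_DICT
    b"U\x06street"            # key 'street'
    b"U\x0a" b"Caf\xc3\xa9 Lane"  # 10-byte UTF-8 value
    b"ss."                    # SETITEM (inner), SETITEM (outer), STOP
)


def decode_bytes(obj, codec="utf-8"):
    """Recursively turn bytes into str inside dicts, lists, tuples and sets."""
    if isinstance(obj, bytes):
        return obj.decode(codec)
    if isinstance(obj, dict):
        return {decode_bytes(k, codec): decode_bytes(v, codec)
                for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set)):
        return type(obj)(decode_bytes(item, codec) for item in obj)
    return obj  # ints, floats, None, etc. pass through unchanged


raw = pickle.loads(PY2_PICKLE, encoding="bytes")  # keys and values are bytes
clean = decode_bytes(raw)                         # keys and values are str
```

Unlike the Latin-1 route, a wrong codec here raises UnicodeDecodeError instead of silently producing garbled text, which makes bad data easier to spot.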

Note that in Python versions before 3.6.8, 3.7.2 and 3.8.0, unpickling Python 2 datetime object data is broken unless you use encoding='bytes'.

    • How could this be made backward compatible with Python 2? Apparently, encoding argument isn't present for Python 2.
    • @EpicAdv: you don't need to make this code compatible with Python 2; this question is about how to load Python 2 pickles into Python 3. Drop the encoding keyword altogether for Python 2.
    • @EpicAdv: You can create a pickle_options dictionary that is either empty for python 2 or has 'encoding': 'latin1' and send **pickle_options to pickle. This way it should run in both versions.
    • @pipefish - Clever, but somewhere you have to detect which version you're using, so you could also more straightforwardly just do the call differently (one with and one without the extra argument) depending on the version. But at least you got the gist of EpicAdv's comment, which Martijn's comment doesn't address at all.
    • I realize the datetime comment was not the main thrust of this answer, but for future readers, I'd like to point out that even the "fixed" versions of Python 3 still require encoding='latin-1' to unpickle Python 2 datetimes. If your pickled Python 2 data happens to include both datetimes and bytestrings encoded in something other than Latin-1, then you might still be better off using encoding='bytes' after all.
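The pickle_options idea from the comments above could look something like the sketch below. The names `PICKLE_OPTIONS` and `load_legacy` are hypothetical, and the in-memory stream is a hand-built stand-in for a legacy file opened with 'rb':

```python
import io
import pickle
import sys

# Python 2's pickle.load() has no `encoding` argument, so pass it only
# when running on Python 3.
PICKLE_OPTIONS = {"encoding": "latin1"} if sys.version_info[0] >= 3 else {}


def load_legacy(fileobj):
    """Load a Python 2 pickle from a binary file object on either version."""
    return pickle.load(fileobj, **PICKLE_OPTIONS)


# Hypothetical stand-in for open(path, 'rb'): one Python 2 str, 'caf\xe9'.
legacy = io.BytesIO(b"\x80\x02U\x04" + b"caf\xe9" + b"q\x00.")
value = load_legacy(legacy)
```

The version check happens once at import time, so call sites stay identical on both interpreters.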

Using encoding='latin1' causes some issues when your object contains numpy arrays.

Using encoding='bytes' will be better.

Please see this answer for a complete explanation of using encoding='bytes'.
