• 4

A PHP Error was encountered

Severity: Notice

Message: Undefined index: userid

Filename: views/question.php

Line Number: 191


File: /home/prodcxja/public_html/questions/application/views/question.php
Line: 191
Function: _error_handler

File: /home/prodcxja/public_html/questions/application/controllers/Questions.php
Line: 433
Function: view

File: /home/prodcxja/public_html/questions/index.php
Line: 315
Function: require_once

name Punditsdkoslkdosdkoskdo

scikit-learn: how to scale back the 'y' predicted result

I'm trying to learn scikit-learn and Machine Learning by using the Boston Housing Data Set.

# I splitted the initial dataset ('housing_X' and 'housing_y')
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(housing_X, housing_y, test_size=0.25, random_state=33)

# I scaled those two datasets
from sklearn.preprocessing import StandardScaler
scalerX = StandardScaler().fit(X_train)
scalery = StandardScaler().fit(y_train)
X_train = scalerX.transform(X_train)
y_train = scalery.transform(y_train)
X_test = scalerX.transform(X_test)
y_test = scalery.transform(y_test)

# I created the model
from sklearn import linear_model
clf_sgd = linear_model.SGDRegressor(loss='squared_loss', penalty=None, random_state=42) 

Based on this new model clf_sgd, I am trying to predict the y based on the first instance of X_train.

X_new_scaled = X_train[0]
print (X_new_scaled)
y_new = clf_sgd.predict(X_new_scaled)
print (y_new)

However, the result is quite odd for me (1.34032174, instead of 20-30, the range of the price of the houses)

[-0.32076092  0.35553428 -1.00966618 -0.28784917  0.87716097  1.28834383
  0.4759489  -0.83034371 -0.47659648 -0.81061061 -2.49222645  0.35062335
[ 1.34032174]

I guess that this 1.34032174 value should be scaled back, but I am trying to figure out how to do it with no success. Any tip is welcome. Thank you very much.

      • 2
    • I don't think you need to apply scaling on your target variable. Scaling and other feature engineering techniques are applied only on the feature vectors.

Bit late to the game: Just don't scale your y. With scaling y you actually loose your units. The regression or loss optimization is actually determined by the relative differences between the features. BTW for house prices (or any other monetary value) it is common practice to take the logarithm. Then you obviously need to do an numpy.exp() to get back to the actual dollars/euros/yens...

  • 5
Reply Report

Trending Tags