EDIT: The cause turned out to be an out-of-control application process, not GCE. The original question follows, and the answer is below:

I just had some kind of outage with my Compute Engine VM on a trial account, but I don't see any outage reported on the Google Compute Outage list.

I'm not sure how long it lasted, since I'm not sure when it started. The behavior matches something that seemed to happen a few weeks ago: losing the ability to log in with SSH from the Compute Engine dashboard until the VM was rebooted.

My test VM dropped my SSH connection sometime in the last day or so, and when I noticed today I was unable to reconnect. I then tried the "SSH" connect option on the Compute Engine VM list, and that failed too. The only thing I could do was get a prompt on the serial console, but I didn't have a password-enabled account at all because I was relying on SSH (now fixed). I had to stop the VM and restart it. After that I could connect using the "SSH" connect option on the VM list, although I could NOT connect from outside. I connected to the serial console and saw some network error messages from attempts to connect to various snaps. I tried to SSH to a remote server from my SSH window into the VM, and initially could not. After a minute or so that worked, and suddenly remote connections worked again.

EDIT: I got a response to my support request from Google. They're saying I experienced a Live Migration event. That doesn't sound right. This was at least 10 minutes of disrupted networking. I could connect to the serial console, and it seemed responsive. It was only after rebooting, and after the Google management snaps failed to initialize, that it suddenly appeared to start working. Maybe a communication failure during boot triggered the migration event? I don't know.

EDIT: I removed my worries about GCE's stability, since the infrastructure had nothing to do with the problem.

There may be a number of reasons for this to happen. I would recommend checking the SSH troubleshooting document for more information about how to troubleshoot this issue.

This issue could also occur if the Linux guest environment did not initialize properly after the live migration. The guest environment includes a set of scripts and processes that read content from the metadata server and create the proper environment for a virtual machine to run. It is possible that the SSH keys were not set properly during the guest environment setup.

You may also set the 'automaticRestart' field to 'true' as mentioned in this document. This will automatically restart your instance if it crashes due to a hardware issue or after a live migration, which will help ensure that the SSH keys are set up correctly. Feel free to read the live migration documentation if you need further information about live migration in Google Cloud Platform.

    • The VM was online and responding on the serial console... and the 'automaticRestart' field is 'On'. Many items in that document are assuming something is misconfigured. The firewall rules were correct... I added the firewall rule names to the 'Network tags' list, which didn't do anything... that's the only change I made, apart from later rebooting it. It still did not initially work... I could see network errors on the serial console... and then suddenly it started working. It really acted like a loss of network connectivity. I'm wondering if there's something wrong with the Ubuntu image.
    • Could you please post the network errors you got on serial console? This would give more insight into the issue you had. Also did you find any abnormalities on your logs at the time of this incident? If yes, can you post that too?
    • Ah. Okay... I figured it out. Our application process went crazy and started eating lots of memory... reaching a point where the system was constantly killing the process (which would just respawn) for OOM. The system must have been unable to service SSH connections in that state. It'd be good if GCE would monitor memory usage as well as CPU usage. :) That plot would have led right to it. Silly of me not to check all the logs first though. Sorry. I thought it was a networking thing, since the serial console was responsive and it was still alive.
    • You know, I asked this question here because I was skeptical of getting serious support through Google since it was a free trial account. Not only did that channel communicate very reasonably, you also spent more effort than I expected trying to help. I'm impressed with GCE support now, really. Thank you.

The instance appeared functional on the serial console, but it was in fact in severe distress: an out-of-control process, running with root privileges as a temporary testing measure, was eating up all available memory. The system OOM killer was constantly killing the process, which would just respawn.
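For anyone who lands here with similar symptoms, a quick way to confirm this kind of OOM loop is to count OOM-killer entries in the kernel log. Below is a minimal Python sketch, not something from the original incident. It assumes the kernel writes lines of the usual form "Out of memory: Killed process <pid> (<name>)" (older kernels write "Kill process" instead); the function name `count_oom_kills` and the idea of passing a log path on the command line are my own choices for illustration, and the location of the kernel log varies by distribution.

```python
import re
import sys
from collections import Counter

# Assumed kernel log format: "Out of memory: Kill process 123 (name) ..."
# or "Out of memory: Killed process 123 (name) ...". Other kernels may differ.
OOM_LINE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def count_oom_kills(log_text):
    """Return a Counter mapping process name to the number of OOM kills found."""
    kills = Counter()
    for line in log_text.splitlines():
        match = OOM_LINE.search(line)
        if match:
            kills[match.group(2)] += 1
    return kills


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the path of a saved kernel log as the argument.
    with open(sys.argv[1], errors="replace") as handle:
        for name, count in count_oom_kills(handle.read()).most_common():
            print(f"{name}: killed {count} time(s)")
```

A process name that shows up many times in a short window points to the kill-and-respawn pattern described above.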

Google Compute Engine should monitor system memory usage by default. It's kind of weird that it doesn't.
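In the meantime, memory can be watched from inside the VM. Here is a rough Python sketch that reads `/proc/meminfo` and computes the percentage of memory in use from the `MemTotal` and `MemAvailable` fields. It assumes a Linux kernel recent enough to report `MemAvailable`; the function names and the 90% warning threshold are arbitrary choices for illustration, not anything GCE provides.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            # The first field is the number; a trailing "kB" unit is ignored.
            values[key.strip()] = int(fields[0])
    return values


def used_percent(meminfo):
    """Percentage of memory in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as handle:
        percent = used_percent(parse_meminfo(handle.read()))
    print(f"memory in use: {percent:.1f}%")
    if percent > 90.0:  # hypothetical threshold, tune to taste
        print("warning: memory is nearly exhausted")
```

Run from cron or a systemd timer and logged somewhere durable, this would have produced the plot that points straight at the runaway process.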

So, uh... given the situation the usefulness of this question to anyone seems low. Should it be deleted?
