
Mailing List Archive: Netapp: toasters

Truncated SSH output on 8.x?

 

 



ddunham at taos

Jun 7, 2011, 2:39 PM

Post #1 of 11 (2450 views)
Truncated SSH output on 8.x?

Anyone seen differences with SSH command output on 8.x filers?

I've encountered this a few times lately, and as best I can tell it's
always been on 8.x machines. It appears to be more likely to happen on
the first command to a filer in a while.

I'll run something like "ssh <filer> exportfs" and either get a line or
two or just a normal prompt returned with no output. Never an error.
Then I'll just repeat the last command and get full output. In fact I
can repeat it multiple times and never see the error again.

There is a KB article that seems similar
(https://kb.netapp.com/support/index?page=content&id=2013198), but it's
not really the same, affects 7.x, and the variable that is mentioned to
resolve doesn't seem appropriate.

I don't have a lot of hard data on this, so I'm not prepared to open a
ticket just yet. But I'm wondering if others have seen it.

Thanks,
--
Darren


ddunham at taos

Jun 9, 2011, 5:10 PM

Post #2 of 11 (2372 views)
Re: Truncated SSH output on 8.x? [In reply to]

On Wed, Jun 08, 2011 at 08:24:54AM +0200, Sander Klein wrote:
> Maybe it's related to the ssh client you're using?

Pretty sure it's not a network or SSH issue.

I just got a nice reproduction, and the data I got really surprised me.
(BTW, the filer I'm hitting below is running 8.0.1P2).

I have a script that checks "qtree stats" periodically. And to save on
SSH setup/teardown costs, it runs a sysstat in the same command. But
the script then works through the data and hands back a summary. I
don't have a copy of the raw stuff coming back. But I saw some oddities
that made it appear that it was not getting all the data.

So I ran the following command and was going to put it in a crontab to
run multiple times so I could see if it went off.
"qtree stats; sysstat -c 1 5"

Well, while testing the script, on the 4th time I ran it I got "short"
output. On this filer, that command should return 179 lines. On the
last try it only returned 10, but not simply the first 10. Here's a
(slightly obfuscated) version of what I got back:



-----BEGIN
No qtrees are in use in Volume vol0
Volume           Tree            NFS ops         CIFS ops
--------         --------        -------         --------
perf             [user]perf         1773               0
Volume           Tree            NFS ops         CIFS ops
--------         --------        -------         --------
test_vm_volume        vm_vol                0               0
 CPU     NFS    CIFS    HTTP     Net   kB/s    Disk   kB/s    Tape   kB/s  Cache
                                 in    out    read  write    read  write    age
 77%   28350       0       0  290677  82179  194167 415548       0      0     6s
-----END

In reality, the "perf" volume has over 100 qtrees. The one shown above
is the first line in the list when the command is complete. So it's not
simply truncating the total output; it has managed to drop a portion of
the output of one command.

If it were a network or SSH problem, I wouldn't expect a gap in the
middle of a command, nor would I expect it exactly on a nice line
boundary.
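
For scripts that know roughly how much output to expect, one possible stopgap is to re-run the command when the result comes back short. The following is only a sketch under stated assumptions: the names `run_ssh` and `fetch_complete` and the `min_lines` threshold are made up for illustration, and it hides the symptom rather than fixing the filer. It is only safe for read-only commands such as "qtree stats".

```python
import subprocess
from typing import Callable, List


def run_ssh(filer: str, command: str) -> List[str]:
    """Run one command on the filer over SSH and return its output lines."""
    result = subprocess.run(
        ["ssh", filer, command],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def fetch_complete(
    filer: str,
    command: str,
    min_lines: int,
    attempts: int = 3,
    runner: Callable[[str, str], List[str]] = run_ssh,
) -> List[str]:
    """Re-run a read-only command until at least min_lines lines come back.

    min_lines is a hypothetical per-command threshold chosen by the caller
    (for example, the line count seen on a known-good run).
    """
    last: List[str] = []
    for _ in range(attempts):
        last = runner(filer, command)
        if len(last) >= min_lines:
            return last
    raise RuntimeError(
        f"{command!r} on {filer} returned only {len(last)} lines "
        f"after {attempts} attempts"
    )
```

With the numbers from this post, a call might look like `fetch_complete("<filer>", "qtree stats; sysstat -c 1 5", min_lines=179)`. Raising an error after the last attempt keeps a short result from being silently treated as real data.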

Looks like I can open a ticket now.

--
Darren


tmacmd at gmail

Jun 9, 2011, 5:56 PM

Post #3 of 11 (2375 views)
Re: Truncated SSH output on 8.x? [In reply to]

Of course, if you know the right people...you can get the Ontap API software.
I am able to query 8 different filers, collect info on about 200 volumes
(volume info, aggr info, qtree info--> which there are 1000's of qtrees)
in about 3-4 seconds and no SSH involved at all.

perl library, jar file, .NET, c++...lots of different ways to plug into ONTAP.

I believe it comes in on the http admin route.

--tmac
         Tim McCarthy
     Principal Consultant

  RedHat Certified Engineer
   804006984323821 (RHEL4)
   805007643429572 (RHEL5)





ddunham at taos

Jun 10, 2011, 9:05 AM

Post #4 of 11 (2364 views)
Re: Truncated SSH output on 8.x? [In reply to]

On Thu, Jun 09, 2011 at 08:56:27PM -0400, tmac wrote:
> Of course, if you know the right people...you can get the Ontap API software.
> I am able to query 8 different filers, collect info on about 200 volumes
> (volume info, aggr info, qtree info--> which there are 1000's of qtrees)
> in about 3-4 seconds and no SSH involved at all.
>
> perl library, jar file, .NET, c++...lots of different ways to plug
> into ONTAP

No question. But I still have many items running that are hideously
complex and more than 10 years old. I can't convert them in any
reasonable time frame.

And all our "by hand" administration is still via SSH, which is where we
first noticed this. I was just able to get confirmation from this
script that is using SSH as well.

It's very possible that by going around the shell/interpreter, the API
transport would not be subject to this problem. I'll have to see if I
can get that tested. Even if it's fine, I really need the SSH stuff to
work because of the legacy scripts I have.

--
Darren


Peter.Learmonth at netapp

Jun 10, 2011, 10:57 AM

Post #5 of 11 (2380 views)
RE: Truncated SSH output on 8.x? [In reply to]

Hi Guys
I ran into this one last year doing some early testing on pre-release
8.0.1. Never occurred to me this is a bug that might be experienced in
the field - my bad. It has since been reported as bug 485715
http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=485715.
Although the bug report lists no workaround, I found that appending a
command that takes a few seconds, like ping <something that isn't there>
or "sysstat -c 1 3", gave the command you wanted time to complete its
output before the SSH connection was closed.

From one of my test scripts:

sshcmd $filer -qfl root 'priv set -q diag ; stats show vstorage ; ping 172.16.26.254'

Down side is that ping can take 15 seconds to time out, and there's no
option in 7-mode to specify a timeout. Sysstat gives you control down
to the second, but you have to scrub the output. Sorry, no sleep (ONTAP
or me these days! ;-)

Please open a case with NetApp support and have them add you/your case
as call_rec to burt 485715.
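
The "append a delay, then scrub the output" idea above could be sketched roughly as follows. This is an illustration, not a tested workaround: the helper names `with_delay` and `scrub_sysstat` are made up, and the scrubbing assumes the sysstat block starts at a header line whose first field is "CPU", as in the sample output posted earlier in this thread, and that the wanted command prints no such line itself.

```python
from typing import List

# Delay command quoted in the post above; its output has to be scrubbed.
DELAY_COMMAND = "sysstat -c 1 3"


def with_delay(command: str) -> str:
    """Append the delay command so the session stays open a little longer."""
    return f"{command} ; {DELAY_COMMAND}"


def scrub_sysstat(lines: List[str]) -> List[str]:
    """Drop the trailing sysstat block from combined output.

    Assumption: the sysstat block begins at the last line whose first
    whitespace-separated field is "CPU". If no such line is found, the
    output is returned unchanged.
    """
    for index in range(len(lines) - 1, -1, -1):
        if lines[index].split()[:1] == ["CPU"]:
            return lines[:index]
    return lines
```

A script would send `with_delay("stats show vstorage")` to the filer and pass the returned lines through `scrub_sysstat` before parsing them.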

Thanks

Peter



ddunham at taos

Jun 10, 2011, 11:47 AM

Post #6 of 11 (2372 views)
Re: Truncated SSH output on 8.x? [In reply to]

On Fri, Jun 10, 2011 at 10:57:00AM -0700, Learmonth, Peter wrote:
> Hi Guys
> I ran into this one last year doing some early testing on pre-release
> 8.0.1. Never occurred to me this is a bug that might be experienced in
> the field - my bad. It has since been reported as bug 485715
> http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=485715.
> Although the bug report lists no workaround, I found that appending a
> command that takes a few seconds, like ping <something that isn't there>
> or "sysstat -c 1 3" gave the command you wanted time to complete output
> before the SSH connection is closed.

That's not a workaround for what I'm seeing. I actually get the output
of the second command, but the output from the first command is still
truncated. It doesn't seem to be SSH related to me.

Here's another one I just ran:

# ssh <filer> options
acp.domain                   16820416
acp.enabled                  on
ndmpd.enable                 on
#

Note that I didn't get the first three lines, I got the first two lines
and one random line from the middle.

> Please open a case with NetApp support and have them add you/your case
> as call_rec to burt 485715.

Done. Thanks for the pointer.

--
Darren


netbacker at gmail

Jun 14, 2011, 8:55 AM

Post #7 of 11 (2351 views)
Re: Truncated SSH output on 8.x? [In reply to]

I had the same problem with a set of new filers all running 8.0 code.
Searching NOW I found this KB:
https://kb.netapp.com/support/index?page=content&id=2013198
Indeed, on the ONTAP 8.0x systems the default value for
ssh.idle.timeout is set to 0, whereas on all 7.x filers it was set to
600. After changing it on the 8.0x filers, things seem to be working OK.
Hope this helps

-net




ddunham at taos

Jun 14, 2011, 1:16 PM

Post #8 of 11 (2331 views)
Re: Truncated SSH output on 8.x? [In reply to]

On Tue, Jun 14, 2011 at 08:55:40AM -0700, Sto Rage© wrote:
> I had the same problem with a set of new filers all running 8.0 code.
> Searching NOW I found this KB
> https://kb.netapp.com/support/index?page=content&id=2013198
> Indeed on the ONTAP 8.0x systems the default value for
> ssh.idle.timeout is set to 0
> whereas on all 7.x filers it was set to 600.
> After changing it on the 8.0x filers, things seem to be working OK.
> Hope this helps

I will agree that a lot of the filers I'm seeing it on had the idle
timeout set to "0", but changing it to "600" didn't seem to change
anything. I can still reproduce it with that change.

--
Darren


netbacker at gmail

Jun 14, 2011, 3:00 PM

Post #9 of 11 (2331 views)
Re: Truncated SSH output on 8.x? [In reply to]

Yeah, I spoke too soon ;( Initially they seemed to work fine but not anymore.
This is a serious bug, I mean many of the scripts have begun to fail.
I have one that checks the volume status, and now with the way it
returns results from a grep statement, the scripts report the volumes
don't exist anymore. Scary :)

This is specific to the 8.0x release. I first thought it was platform
specific (we just installed a bunch of 3270s), but this week we
upgraded some older 3070s to 8.0x and they too began behaving this way.
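
Until a fix is available, a check like the volume-status one described above can at least avoid reporting a volume as missing on the strength of one possibly truncated listing. The sketch below is a guess at how that might look, not anyone's actual script: it assumes the listing is fetched twice, that the volume name is the first field on its line, and that two identical non-empty runs are unlikely to both be truncated in the same way (the failures reported in this thread look intermittent). All names are hypothetical.

```python
from enum import Enum
from typing import List


class VolumeState(Enum):
    PRESENT = "present"
    ABSENT = "absent"
    UNKNOWN = "unknown"  # output looked unreliable; do not alert on it


def volume_state(
    first_run: List[str], second_run: List[str], volume: str
) -> VolumeState:
    """Classify a volume from two runs of the same listing command.

    If the two runs differ, or both are empty, the output is treated as
    suspect and UNKNOWN is returned instead of ABSENT.
    """
    if not first_run or first_run != second_run:
        return VolumeState.UNKNOWN
    names = {fields[0] for fields in (line.split() for line in first_run) if fields}
    return VolumeState.PRESENT if volume in names else VolumeState.ABSENT
```

The point of the third state is that "the filer gave me nothing useful" and "the volume is gone" become different outcomes, so a truncated response produces a retry or a warning rather than a false alarm.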

-net



Peter.Learmonth at netapp

Jun 14, 2011, 4:19 PM

Post #10 of 11 (2331 views)
RE: Truncated SSH output on 8.x? [In reply to]

Like I said earlier, for anybody encountering this, please open a case and have them add your case #, company name, etc to the call_rec in the BURT.

Please also spell out (and have them note in the BURT) any specifics, such as whether the missing output is in the middle, at the end, or in random/multiple chunks.

Peter



ddunham at taos

Aug 31, 2011, 9:37 AM

Post #11 of 11 (1906 views)
Re: Truncated SSH output on 8.x? [In reply to]

On Tue, Jun 14, 2011 at 04:19:13PM -0700, Learmonth, Peter wrote:
> Like I said earlier, for anybody encountering this, please open a case and have them add your case #, company name, etc to the call_rec in the BURT.
>
> Please also spell out (and have them note in the BURT) any specifics, such as whether the missing output is in the middle, at the end, or in random/multiple chunks.
>
> Peter

Just saw that 8.0.2P1 is listed as the first release with a fix for
485715. I haven't tested it yet, but I will be pulling it down and
trying it pretty soon.

--
Darren
_______________________________________________
Toasters mailing list
Toasters [at] teaparty
http://www.teaparty.net/mailman/listinfo/toasters
