
Mailing List Archive: Netapp: toasters

SMVI / VMWare Experiences...

 

 



kwillia at smud

Aug 26, 2009, 2:32 PM

Post #1 of 18 (2110 views)
SMVI / VMWare Experiences...

I'm looking for experiences people out there may have had with SMVI
on NetApp. We're currently experiencing major issues with SMVI
snapshots failing. I've had open tickets with NetApp/VMware/Microsoft
for 3 months and still have no solution.

My environment looks like this:
* 6 x HP DL380 G5 (32 GB RAM) in an ESX cluster
* Dual Emulex 10000 cards in each host
* Cisco MDS SAN
* NetApp FAS3070 cluster, ~9 TB aggregate for VMware
* VMFS datastores, ~10-15 VMs per datastore, ~50 GB per VM
* ASIS turned on
* Volume and LUN space reservation turned off
* Data ONTAP 7.2.5.1
* Windows 2003 guest OS

I can't see us reaching any limitation on the filers or the SAN. Yet we
have random VMs failing snapshots every night. Are other people seeing
these issues? (I've gone through the gamut of troubleshooting, version
management of ESX/VMware Tools/etc.) Snapshots time out and fail at the
VMware/guest level, not at the NetApp snapshot level.

We want to have SMVI function with VSS enabled.

Has anyone who had failing snapshots been able to resolve a similar
issue? Or does anyone have SMVI working properly that we could use as a
reference to compare configurations?

__________________________________________________________
Ken Williams
Storage Administrator, Business Technology Operations
Sacramento Municipal Utility District
E-Mail: kwillia[at]smud.org
Phone: (916) 732-6744
Cell: (916) 240-4213


kwillia at smud

Aug 26, 2009, 5:18 PM

Post #2 of 18 (2016 views)
RE: SMVI / VMWare Experiences... [In reply to]

Thank you for the input.

I disagree with the "If you can do a VM snapshot, then it's an issue with
SMVI" statement. VM snapshots do not perform the same functions as an SMVI
snapshot call to the ESX API (as per VMware Technical Support). This is
definitely a communication issue between VSS, the guest OS, and the ESX
host. Or some greater misconfiguration...

-----Original Message-----
From: Klise, Steve [mailto:klises[at]pamf.org]
Sent: Wednesday, August 26, 2009 3:09 PM
To: Ken Williams; toasters[at]mathworks.com
Subject: RE: SMVI / VMWare Experiences...

A couple of things you have already hit on, but I will regurgitate:

* Make sure you have the latest tools installed WITH THE VSS
  OPTION. A reboot is required.
* Check for any SMVI snapshots. We run a morning monitoring
  report that has this. It's great and anyone running ESX should use it.
* I have had issues with timeouts. If you can do a VM snapshot,
  then it's an issue with SMVI. If you can't, you need to start there.
* I have seen issues with older 2.5.x and 3.x that needed the
  hardware upgraded on the VM.
* Check disk timeouts.

Here are a couple of other things I ran across:



Solution

SnapManager for VI utilizes an internal database to keep track of these
locks and provides persistence across reboots. Simply rebooting the
SnapManager for VI host will not clear these locks.

If you want to remove all currently running tasks in SMVI, perform the
following:

1. Stop the SnapManager for VI service.
2. Remove the <SMVI dir>/server/crashdb directory.
3. Start the SnapManager for VI service.

Performing these steps will not affect the scheduled jobs or remove
them from the interface. It will kill and remove any outstanding or
in-process tasks.
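
The three steps above could be scripted roughly as follows. This is only a minimal sketch, not a NetApp-supplied procedure: it assumes a Windows host where the service can be controlled with "net stop" / "net start", and both the installation directory and the service name are placeholders that the caller has to supply (neither value is given in this thread). The "run" parameter is a hypothetical hook so the function can be exercised without touching a real service.

```python
import shutil
import subprocess
from pathlib import Path


def reset_smvi_tasks(smvi_dir, service_name, run=subprocess.run):
    """Stop the service, remove <smvi_dir>/server/crashdb, start the service.

    smvi_dir and service_name are placeholders supplied by the caller;
    neither value comes from the thread. Returns True if a crashdb
    directory was found and removed.
    """
    crashdb = Path(smvi_dir) / "server" / "crashdb"

    # 1. Stop the SnapManager for VI service (Windows "net stop").
    run(["net", "stop", service_name], check=True)
    try:
        # 2. Remove the crashdb directory if it exists.
        removed = crashdb.is_dir()
        if removed:
            shutil.rmtree(crashdb)
    finally:
        # 3. Start the service again, even if the removal failed.
        run(["net", "start", service_name], check=True)
    return removed
```

As the text above notes, this discards any outstanding or in-process tasks, so it is best run when no backup is in flight.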





kwillia at smud

Aug 27, 2009, 9:34 AM

Post #3 of 18 (2010 views)
RE: SMVI / VMWare Experiences... [In reply to]

It's much appreciated. I'm at my wits' end. I've been working with
VMware/NetApp/Microsoft for quite some time on this issue.

ESX 3.5 Update 4.
Not always the same VMs, and low-load VMs at that.

Yes, it is the 15-minute timeout window.

-----Original Message-----
From: Klise, Steve [mailto:klises[at]pamf.org]
Sent: Wednesday, August 26, 2009 5:50 PM
To: Ken Williams; toasters[at]mathworks.com
Subject: Re: SMVI / VMWare Experiences...

The comment was more of a troubleshooting step. My bad.

I have had problems with "busy" servers. I had to stop the "busy
making" service. For example, I am running DFM 3.8 and I have to stop
the DB service before the snapshot. That seemed to band-aid the problem.

What version and release of ESX are you on?

Is it always the same VMs that fail?

Is it the 15-minute timeout during the snapshot?
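
One way to answer the "same VMs" question from a few nights of job reports is to tally failures per VM. The sketch below is illustrative only: it assumes the failures have already been collected by hand as (night, VM name) pairs, and the function names are hypothetical. It does not parse any real SMVI or vCenter log format.

```python
from collections import Counter


def failure_counts(failures):
    """Count how many nights each VM failed.

    failures is an iterable of (night, vm_name) pairs.
    """
    # A set removes duplicate reports of the same VM on the same night.
    return Counter(vm for _night, vm in set(failures))


def always_failing(failures):
    """Return the VMs that failed on every night present in the data."""
    failures = set(failures)
    nights = {night for night, _vm in failures}
    counts = failure_counts(failures)
    return sorted(vm for vm, n in counts.items() if n == len(nights))
```

If always_failing() comes back empty across a week of data, the failures are moving between VMs, which points away from a problem with any single guest.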



kwillia at smud

Aug 27, 2009, 9:42 AM

Post #4 of 18 (2000 views)
RE: SMVI / VMWare Experiences... [In reply to]

Thanks for the response!

No iSCSI here. I've been over the best practices, and we're pretty
close to what's laid out there.

-----Original Message-----
From: Sels Roger [mailto:roger.sels[at]uptimegroup.be]
Sent: Wednesday, August 26, 2009 11:30 PM
To: Ken Williams
Cc: Klise, Steve; toasters[at]mathworks.com
Subject: Re: SMVI / VMWare Experiences...

Hi,

You might be hitting the "VMware bug" as described in
https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb48102

Also take a look at chapter 8 of
http://media.netapp.com/documents/tr-3737.pdf

Cheers,
Roger




kheal at hotmail

Aug 27, 2009, 10:10 AM

Post #5 of 18 (1995 views)
RE: SMVI / VMWare Experiences... [In reply to]

Hi

This sounds a lot like Bug 324112: SMVI does not backup VMs if snapshot creation takes longer than default timeout period of 15 minutes
http://now.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=324112

Do you have VMs managed by different ESX hosts but stored in the same VMFS datastore?
Could you let us know the exact error messages you see in the VC server logs and the ESX server logs?

Also, when this fails, do you see anything in the Windows system event logs? If it is this bug, then the second question is why it intermittently takes so long to create the snapshot.

Sorry to answer the question with just another bunch of questions.

cheers
Kenneth



fredgrieco at yahoo

Aug 27, 2009, 12:58 PM

Post #6 of 18 (1991 views)
Re: SMVI / VMWare Experiences... [In reply to]

Kenneth,

What sorts of problems would having VMs managed by different hosts on the same VMFS datastore cause? I have many of the same issues with VMware backups, but using VCB with NetBackup. I was hoping to replace this setup with SMVI, and I have a "shotgun" layout in VMware currently.

TIA, and sorry for fragmenting the thread.

Fred






silkey at ece

Sep 14, 2009, 7:16 AM

Post #7 of 18 (1708 views)
Re: SMVI / VMWare Experiences... [In reply to]

Ken --

We too are experiencing issues with SMVI 1.2 bombing out when
attempting to perform a VMware quiesce snap on _some_ RHEL5.3 32-bit
VMs. A couple of notables:

- These problematic VMs have a 100% success rate at taking VMware
  quiesce snaps within vCenter, independent of SMVI.
- The problem is 100% reproducible during night, day, etc.
- We will deploy several VMs at a crack, all the same build. When the
  next SMVI schedule hits, some fail while others succeed. Bizarre.
- Over time (we've been experiencing this issue for several weeks now),
  the 'problem' VMs change. Example: VMs abc and xyz will fail for
  days; without intervention, VM abc will stop failing while VM xyz
  continues to fail ... even if they're part of the same deploy base
  template/kickstart.
- We are nowhere near our snap limit on the volumes.
- These problematic VMs only bomb when attempting a quiesce.
  Non-quiesce SMVI snaps work like a champ.

Been working with NetApp and VMware for some time now. We're at ESX
3.5u4+ to a 3160-R5 @ 7.2.6.1P3 via NFS + vCenter 4.0 + synch
SnapMirror to another 3160-R5 @ 7.2.6.1P3. The only revealing thing
is the SMVI + vCenter log message "cannot create a quiesced snapshot
because the (user-supplied) custom prefreeze script in the virtual
machine exited with a nonzero return code".

--
Nick


kwillia at smud

Sep 14, 2009, 9:51 AM

Post #8 of 18 (1710 views)
RE: SMVI / VMWare Experiences... [In reply to]

Sounds like whatever user-defined script you have is failing sometimes?
Or perhaps it's a VMware Tools issue.

We've been able to track our issue down to the guest OS level (Win2k3
specifically). Looks like it's an issue with VSS or LUN alignment.

I would recommend ensuring your LUNs are aligned (use the VMware host
utilities kit, mbrscan / mbralign). There is detailed documentation on
the NOW.netapp.com site.
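
For anyone unsure what "aligned" means here, the arithmetic can be sketched in a few lines. This is not what mbrscan does internally; it is a minimal illustration that assumes 512-byte sectors and a 4 KB block size on the storage side, and the function name is hypothetical. A partition counts as aligned when its starting byte offset is an exact multiple of the block size.

```python
def partition_is_aligned(start_sector, sector_size=512, block_size=4096):
    """Return True if the partition's starting byte offset is a
    multiple of block_size.

    The 512-byte sector and 4096-byte block defaults are assumptions
    for this sketch; pass other values if your setup differs.
    """
    return (start_sector * sector_size) % block_size == 0


# A start sector of 63 gives an offset of 63 * 512 = 32256 bytes,
# which is not a multiple of 4096, so the partition is misaligned.
print(partition_is_aligned(63))  # False
# A start sector of 64 gives 32768 bytes = 8 * 4096, which is aligned.
print(partition_is_aligned(64))  # True
```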



steffen.kammerer at brainlab

Nov 3, 2009, 6:41 AM

Post #9 of 18 (792 views)
Permalink
RE: SMVI / VMWare Experiences... [In reply to]

Hi there,

We have the same issue with SMVI 2.0 on NFS datastores. On some VMs we get the following error:

Cannot create a quiesced snapshot because the create snapshot
operation exceeded the time limit for holding off I/O in the
frozen virtual machine.

Has anybody else run into the same error?

Thanks and best regards,


Steffen



-----Original Message-----
From: owner-toasters[at]mathworks.com [mailto:owner-toasters[at]mathworks.com] On Behalf Of Ken Williams
Sent: Monday, September 14, 2009 6:51 PM
To: Nick Silkey
Cc: toasters[at]mathworks.com
Subject: RE: SMVI / VMWare Experiences...

Sounds like whatever user-defined script you have is failing sometimes?
Or perhaps it's a VMWare tools issue.

We've been able to track our issue down to the Guest OS level (win2k3
specifically). Looks like it's an issue with VSS or LUN alignment.

I would recommend ensuring your LUNs are aligned (use the VMWare host
util kit, mbrscan / mbralign). There is detailed documentation on the
NOW.netapp.com site.

-----Original Message-----
From: Nick Silkey [mailto:nick[at]silkey.org]
Sent: Friday, September 11, 2009 7:15 PM
To: Ken Williams
Cc: toasters[at]mathworks.com
Subject: Re: SMVI / VMWare Experiences...

Ken --

We too are experiencing issues with SMVI 1.2 bombing out when attempting
to perform a VMware quiesce snap on _some_ RHEL5.3 32-bit VMs. A couple
of notables:

- These problematic VMs have a 100% success rate at taking VMware
quiesce snaps within vCenter, independent of SMVI.
- The problem is 100% reproducible during night, day, etc.
- We will deploy several VMs at a crack, all the same build. When the
next SMVI schedule hits, some fail while others succeed. Bizarre.
- Over time (we've been experiencing this issue for several weeks now),
the 'problem' VMs change. Example: VMs abc and xyz will fail for days;
without intervention, VM abc will stop failing while VM xyz continues to
fail ... even if they're part of the same deploy base template/kickstart.
- We are nowhere near our snap limit on the volumes.
- These problematic VMs only bomb when attempting a quiesce.
Non-quiesce SMVI snaps work like a champ.

Been working with NetApp and VMware for some time now. We're at ESX
3.5u4+ to a 3160-R5 @ 7.2.6.1P3 via NFS + vCenter 4.0 + synch
SnapMirror to another 3160-R5 @ 7.2.6.1P3. The only thing revealing is
SMVI + vCenter logs of "cannot create a quiesced snapshot because the
(user-supplied) custom prefreeze script in the virtual machine exited
with a nonzero return code".

--
Nick



Stetson.Webster at netapp

Nov 3, 2009, 7:26 AM

Post #10 of 18 (791 views)
Permalink
RE: SMVI / VMWare Experiences... [In reply to]

This is very commonly an alignment issue. Snapshots exacerbate the
already intense I/O caused by misalignment. Essentially, the existence
of VMware snapshots (although they are 100% successful) causes the I/O
on the system to intensify while ONTAP is actually trying to quiesce for
a snapshot. So a bad situation intensifies at exactly the wrong time.

Have you tried to take the NetApp snapshot manually (sans SMVI) after
the VMware snapshots have all completed? My guess is that you will find
that the snapshot takes a very long time to complete (if it does).

Here is a document that discusses block alignment and the breadth of
its impact:

Best Practices for File System Alignment in Virtual Environments:
http://www.netapp.com/us/library/technical-reports/tr-3747.html

We have a tool called 'mbrscan' for identifying misalignment which is a
part of our ESX Host Utilities available here:

FC Host Utilities for ESX(r):
http://now.netapp.com/NOW/download/software/sanhost_esx/ESX

We also have this KB article for identifying misalignment:

How to diagnose misaligned I/O on Windows hosts:
https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb36108

Finally, while mbralign is now part of the ESX Host Utilities, there is
good, detailed documentation here as you navigate to the download page:

mbralign:
http://now.netapp.com/NOW/download/tools/mbralign


Stetson M. Webster
Professional Services Consultant
NCIE-SAN, NCIE-B&R, SNIA-SCSN-E
NetApp Professional Services - East
919.250.0052 Mobile
Stetson.Webster[at]netapp.com
Learn how: netapp.com/guarantee






jeremy.page at gilbarco

Nov 3, 2009, 7:38 AM

Post #11 of 18 (791 views)
Permalink
RE: SMVI / VMWare Experiences... [In reply to]

If your VMs are Windows, it's relatively simple to do a WMI scan against
them for the partition offset and then divide by 4096 (?) and make sure
it's an even #. If not, you're not set up ideally. Something like below
(please excuse my Scriptomatic-generated code).

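' Report each partition's starting offset (in bytes) via WMI.
Const wbemFlagReturnImmediately = &h10
Const wbemFlagForwardOnly = &h20
strComputer = "."   ' "." is the local machine; use a host name for a remote VM

Set objWMIService = GetObject("winmgmts:\\" & strComputer & "\root\CIMV2")
Set colItems = objWMIService.ExecQuery( _
    "SELECT * FROM Win32_DiskPartition", "WQL", _
    wbemFlagReturnImmediately + wbemFlagForwardOnly)

For Each objItem In colItems
    WScript.Echo "StartingOffset: " & objItem.StartingOffset
Next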
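
Once the offsets are collected, the check itself is just a remainder test. Here is a minimal Python sketch of it, assuming the 4096-byte block size mentioned in this thread is the right divisor (confirm against the TR). The function name and the sample offsets are illustrative only and are not taken from anyone's environment; 32256 is simply 63 sectors x 512 bytes, a common starting offset on older MBR layouts.

```python
BLOCK_SIZE = 4096  # assumed storage block size in bytes; verify against TR-3747


def is_aligned(starting_offset: int, block_size: int = BLOCK_SIZE) -> bool:
    """Return True when the byte offset is a whole multiple of block_size."""
    return starting_offset % block_size == 0


if __name__ == "__main__":
    # Illustrative StartingOffset values in bytes (not real measurements).
    for offset in (32256, 32768, 1048576):
        state = "aligned" if is_aligned(offset) else "misaligned"
        print(f"StartingOffset {offset}: {state}")
```

Feeding it the StartingOffset values echoed by the WMI script gives a quick first pass; mbrscan remains the tool to trust for the actual diagnosis.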


Please be advised that this email may contain confidential information.
If you are not the intended recipient, please do not read, copy or
re-transmit this email. If you have received this email in error,
please notify us by email by replying to the sender and by telephone
(call us collect at +1 202-828-0850) and delete this message and any
attachments. Thank you in advance for your cooperation and assistance.

In addition, Danaher and its subsidiaries disclaim that the content of
this email constitutes an offer to enter into, or the acceptance of,
any contract or agreement or any amendment thereto; provided that the
foregoing disclaimer does not invalidate the binding effect of any
digital or other electronic reproduction of a manual signature that is
included in any attachment to this email.


jeremy.page at gilbarco

Nov 3, 2009, 7:42 AM

Post #12 of 18 (791 views)
Permalink
RE: SMVI / VMWare Experiences... [In reply to]

Crud. Make sure the partition offset/4096 is an integer, not even.
Sorry.

And I'm not too certain about the 4096 #, read the TR :)



jeremy.page at gilbarco

Nov 3, 2009, 7:42 AM

Post #13 of 18 (481 views)
Permalink
RE: SMVI / VMWare Experiences... [In reply to]

Crud. Make sure the partition offset/4096 is an integer, not even.
Sorry.

And I'm not too certain about the 4096 #, read the TR :)

-----Original Message-----
From: owner-toasters[at]mathworks.com [mailto:owner-toasters[at]mathworks.com]
On Behalf Of Page, Jeremy
Sent: Tuesday, November 03, 2009 10:38 AM
To: Webster, Stetson; Steffen Kammerer; Ken Williams; Nick Silkey
Cc: toasters[at]mathworks.com
Subject: RE: SMVI / VMWare Experiences...

If your VMs are Windows it's relatively simple to do a WMI scan against
them for the partition offset and then divide by 4096 (?) and make sure
it's an even #. If not your not set up ideally. Something like below
(please excuse my scriptomatic generated code).

Set objWMIService = GetObject("winmgmts:\\" & strComputer &
"\root\CIMV2")
Set colItems = objWMIService.ExecQuery("SELECT * FROM
Win32_DiskPartition", "WQL", _
wbemFlagReturnImmediately +
wbemFlagForwardOnly)

For Each objItem In colItems
WScript.Echo "StartingOffset: " & objItem.StartingOffset
Next

-----Original Message-----
From: owner-toasters[at]mathworks.com [mailto:owner-toasters[at]mathworks.com]
On Behalf Of Webster, Stetson
Sent: Tuesday, November 03, 2009 10:26 AM
To: Steffen Kammerer; Ken Williams; Nick Silkey
Cc: toasters[at]mathworks.com
Subject: RE: SMVI / VMWare Experiences...

This is very commonly an alignment issue. Snapshots exacerbate the
already intense I/O caused by misalignment. Essentially, the existence
of VMware snapshots (although they are 100% successful), cause the I/O
on the system to intensify while ONTAP is actually trying to quiesce for
a snapshot. So we are having a bad situation that intensifies at the
wrong time.

Have you tried to take the NetApp snapshot manually (sans SMVI) after
the VMware snapshots have all completed? My guess is that you will find
that the snapshot takes a very long time to complete (if it does).

Here is a document that discusses block alignment and the breadth of
it's impact:

Best Practices for File System Alignment in Virtual Environments:
http://www.netapp.com/us/library/technical-reports/tr-3747.html

We have a tool called 'mbrscan' for identifying misalignment which is a
part of our ESX Host Utilities available here:

FC Host Utilities for ESX(r):
http://now.netapp.com/NOW/download/software/sanhost_esx/ESX

We also have this KB article for identifying misalignment:

How to diagnose misaligned I/O on Windows hosts:
https://now.netapp.com/Knowledgebase/solutionarea.asp?id=kb36108

Finally, while mbralign is now part of the ESX Host Utilities, there is
good, detailed documentation here as you navigate to the download page:

mbralign:
http://now.netapp.com/NOW/download/tools/mbralign


Stetson M. Webster
Professional Services Consultant
NCIE-SAN, NCIE-B&R, SNIA-SCSN-E
NetApp Professional Services - East
919.250.0052 Mobile
Stetson.Webster[at]netapp.com
Learn how: netapp.com/guarantee




-----Original Message-----
From: Steffen Kammerer [mailto:steffen.kammerer[at]brainlab.com]
Sent: Tuesday, November 03, 2009 9:41 AM
To: Ken Williams; Nick Silkey
Cc: toasters[at]mathworks.com
Subject: RE: SMVI / VMWare Experiences...

Hi there,

We have the same issue with SMVI 2.0 on nfs datastores... on some VMs we
get the following error:

Cannot create a quiesced snapshot because the create snapshot
operation exceeded the time limit for holding off I/O in the
frozen virtual machine.


Does anybody approach the same error?!

Thanks and best regards,


Steffen



-----Original Message-----
From: owner-toasters[at]mathworks.com [mailto:owner-toasters[at]mathworks.com]
On Behalf Of Ken Williams
Sent: Monday, September 14, 2009 6:51 PM
To: Nick Silkey
Cc: toasters[at]mathworks.com
Subject: RE: SMVI / VMWare Experiences...

Sounds like whatever user-defined script you have is failing sometimes?
Or perhaps it's a VMWare tools issue.

We've been able to track our issue down to the Guest OS level (win2k3
specifically). Looks like its an issue with VSS or LUN alignment.

I would recommend ensuring your LUNs are aligned (use the VMWare host
util kit, mbrscan / mbralign). There is detailed documentation on the
NOW.netapp.com site.

-----Original Message-----
From: Nick Silkey [mailto:nick[at]silkey.org]
Sent: Friday, September 11, 2009 7:15 PM
To: Ken Williams
Cc: toasters[at]mathworks.com
Subject: Re: SMVI / VMWare Experiences...

Ken --

We too are experiencing issues with SMVI 1.2 bombing out when attempting
to perform a VMware quiesce snap on _some_ RHEL 5.3 32-bit VMs. A couple
of notables:

- These problematic VMs have a 100% success rate at taking VMware
quiesce snaps within vCenter, independent of SMVI.
- The problem is 100% reproducible during night, day, etc.
- We will deploy several VMs at a crack, all the same build. When the
next SMVI schedule hits, some fail while others succeed. Bizarre.
- Over time (we've been experiencing this issue for several weeks now),
the 'problem' VMs change. Example: VMs abc and xyz will fail for days;
without intervention, VM abc will stop failing while VM xyz continues to
fail ... even if they're part of the same deploy base template/kickstart.
- We are nowhere near our snap limit on the volumes.
- These problematic VMs only bomb when attempting a quiesce.
Non-quiesce SMVI snaps work like a champ.

Been working with NetApp and VMware for some time now. We're at ESX
3.5u4+ to a 3160-R5 @ 7.2.6.1P3 via NFS + vCenter 4.0 + sync
SnapMirror to another 3160-R5 @ 7.2.6.1P3. The only revealing thing is
the SMVI + vCenter log entry "cannot create a quiesced snapshot because
the (user-supplied) custom prefreeze script in the virtual machine exited
with a nonzero return code".

--
Nick

On Wed, Aug 26, 2009 at 5:32 PM, Ken Williams <kwillia[at]smud.org> wrote:
> I'm looking for some experiences people out there may have with SMVI
> with NetApp. We're currently experiencing major issues with SMVI
> snapshots failing. I've had open tickets with NetApp/VMWare/Microsoft
> for 3 months and still have yet to have a solution.
>
> My environment looks like such:
>
> * 6 x HP DL380 G5 (32gb Ram) in a ESX Cluster
> * Dual Emulex 10000 Cards in each host.
> * Cisco MDS SAN
> * Netapp FAS3070 Cluster ~9tb aggregate for VMWare.
> * VMFS Datastores ~10-15 VMs per datastore. ~50gb per VM.
> * ASIS Turned on
> * Volume and LUNspace reservation turned off
> * OnTap 7.2.5.1
> * Windows 2003 Guest OS.
>
> I cant see us reaching any limitation on the Filers or the SAN. Yet we
> have random VMs failing snapshots every night. Are other people seeing
> these issues? (I've gone through the gamut of troubleshooting, version
> management of ESX/VMWareTools/etc). Snapshots timeout and fail at the
> VMWare/Guest level, not at the Netapp snapshot level.
>
> We want to have SMVI function with VSS enabled.
>
> Has anyone had failing snapshots been able to resolve a similar issue?
> Or does anyone have SMVI working properly that we could use as a
> reference to compare configuration?
>
> __________________________________________________________
> Ken Williams
> Storage Administrator, Business Technology Operations
> Sacramento Municipal Utility District
> E-Mail: kwillia[at]smud.org
> Phone: (916) 732-6744
> Cell: (916) 240-4213







Please be advised that this email may contain confidential information.
If you are not the intended recipient, please do not read, copy or
re-transmit this email. If you have received this email in error,
please notify us by email by replying to the sender and by telephone
(call us collect at +1 202-828-0850) and delete this message and any
attachments. Thank you in advance for your cooperation and assistance.

In addition, Danaher and its subsidiaries disclaim that the content of
this email constitutes an offer to enter into, or the acceptance of, any
contract or agreement or any amendment thereto; provided that the
foregoing disclaimer does not invalidate the binding effect of any
digital or other electronic reproduction of a manual signature that is
included in any attachment to this email.





kwillia at smud

Nov 3, 2009, 9:21 AM

Post #14 of 18 (787 views)
Permalink
RE: SMVI / VMWare Experiences... [In reply to]

Yes, we have the same issue.

It came down to a few things:

1. Update to 7.3.2p7. There are algorithm changes to WAFL that help with
VMFS/VMDK reading/writing. I saw a HUGE performance change with this.

2. Check your CPU (sysstat); ours was pegged due to NDMP backups.
Changing the backup times helped out a bunch.

3. Disable File Sync in VMware Tools on each guest. The File Sync driver
is problematic and not recommended. You can remove it on each guest via
Add/Remove Programs for VMware Tools.

4. VMware admitted this is a problem; most users accept the workaround
of not doing quiesced backups. There is a checkbox in SMVI that lets
you skip the VMware-level snapshots.

Otherwise, try snapshotting smaller groups of VMs (10 max). We're at
about 80% success right now; not great, but moving in the right direction.
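
For what it's worth, the "smaller groups" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the VM names and the batches helper are made up, and nothing here talks to SMVI or vCenter. It just shows how an inventory could be split into groups of at most 10 before you create one backup job per group.

```python
from typing import Iterator, List, Sequence


def batches(vm_names: Sequence[str], max_size: int = 10) -> Iterator[List[str]]:
    """Yield consecutive groups of at most max_size VM names.

    Hypothetical helper for planning backup jobs; it only slices a list.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    for start in range(0, len(vm_names), max_size):
        yield list(vm_names[start:start + max_size])


if __name__ == "__main__":
    # 25 made-up VM names standing in for a real inventory.
    vms = [f"vm{n:02d}" for n in range(1, 26)]
    for number, group in enumerate(batches(vms), start=1):
        print(f"job {number}: {len(group)} VMs ({group[0]} .. {group[-1]})")
```

With 25 VMs and the default limit of 10, this plans three jobs of 10, 10, and 5 VMs.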




kwillia at smud

Nov 3, 2009, 9:52 AM

Post #15 of 18 (787 views)
Permalink
RE: SMVI / VMWare Experiences... [In reply to]

Correction: the Data ONTAP version in my previous message should read 7.3.1.1p7.



steffen.kammerer at brainlab

Nov 4, 2009, 12:59 AM

Post #16 of 18 (761 views)
Permalink
RE: SMVI / VMWare Experiences... [In reply to]

Thanks for all your answers.

We ran some tests yesterday with ESX 4, cloning and snapshotting.

It seems the problem is not caused by SMVI. If we try to clone the machines that failed to create a snapshot (with the quiesced-snapshot timeout error from my earlier message), we get the same failure.

The error appears after 5 seconds.

But if we do not quiesce, we may get inconsistent snapshots. Do you have any experience restoring non-quiesced snapshots?

Thanks and best regards,

Steffen





kwillia at smud

Nov 4, 2009, 8:47 AM

Post #17 of 18 (749 views)
Permalink
RE: SMVI / VMWare Experiences... [In reply to]

If you skip the VMware snapshot in your SMVI process, your backups will
be crash-consistent rather than quiesced. Conceptually, the restore is
akin to restoring a traditional physical server backup: the state of the
VM is unknown, so a fsck/chkdsk run may be needed afterwards.
I find it perfectly acceptable for systems to be backed up
"inconsistent"; this is the old methodology for backups. The only
thing you lose is application awareness (i.e. VMware pre/post
snapshot scripts to quiesce applications or databases).

We sent one of our VMs that was consistently erroring with snapshot
backups to NetApp; they were able to recreate the problem in their lab
with our VM.

On a side note:
VM disk alignment is HUGE. Make sure you're aligned (I bet you're
hearing a lot of this; it can really make a performance difference). I
recommend the tools from NetApp: mbrscan/mbralign.
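
To make the alignment point concrete, here is a minimal Python sketch of the arithmetic. It assumes 512-byte sectors and a 4 KB WAFL block, and the helper names are hypothetical; mbrscan/mbralign remain the tools to run against real VMDKs. A guest partition that starts at sector 63 (a common default on older Windows installs) begins at byte 32256, which is not a multiple of 4096, so guest blocks straddle two storage blocks.

```python
SECTOR_BYTES = 512       # assumed MBR sector size
WAFL_BLOCK_BYTES = 4096  # assumed storage block size (4 KB)


def partition_offset_bytes(start_sector: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Byte offset at which a partition starts, given its first sector."""
    return start_sector * sector_bytes


def is_aligned(start_sector: int,
               block_bytes: int = WAFL_BLOCK_BYTES,
               sector_bytes: int = SECTOR_BYTES) -> bool:
    """True if the partition start falls exactly on a storage block boundary."""
    return partition_offset_bytes(start_sector, sector_bytes) % block_bytes == 0


if __name__ == "__main__":
    # Illustrative starting sectors, not read from any real disk.
    for sector in (63, 64, 2048):
        offset = partition_offset_bytes(sector)
        state = "aligned" if is_aligned(sector) else "MISALIGNED"
        print(f"start sector {sector}: offset {offset} bytes -> {state}")
```

Sector 63 comes out misaligned, while sectors 64 and 2048 land on a 4 KB boundary.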



kwillia at smud

Nov 4, 2009, 8:47 AM

Post #18 of 18 (476 views)
Permalink
RE: SMVI / VMWare Experiences... [In reply to]

If you don't get VMWare snapshots in your SMVI process then your backups
would be inconsistent. So conceptually the restore would be akin to a
traditional physical server backup. The state of the VM will be unknown
and thus there could be a fsck/chkdsk process that would need to occur.
I find it perfectly acceptable for systems to be backed up
"inconsistent"; this is the old methodology for backups. The only
problem you run into is application awareness (i.e. VMWare pre/post
snapshot scripts to quiesce applications or databases).

We sent one of our VMs that was consistently erroring with snapshot
backups to NetApp; they were able to recreate the problem in their lab
with our VM.

On a side note:
VM disk alignment is HUGE, make sure you're aligned (I bet you're
hearing a lot of this; it can really make a performance difference). I
recommend the tools from NetApp: mbrscan/mbralign.

-----Original Message-----
From: Steffen Kammerer [mailto:steffen.kammerer[at]brainlab.com]
Sent: Wednesday, November 04, 2009 1:00 AM
To: Ken Williams; Nick Silkey
Cc: toasters[at]mathworks.com
Subject: RE: SMVI / VMWare Experiences...

Thanks for all your answers...

We made some tests yesterday with esx4 and cloning and snapshotting...

It seems that the challenge is not because of SMVI. If we try to clone
these machines which failed to create a snapshot (with the error below)
we get the same failure.

The error appears after 5 seconds...

But if do not quiesce we maybe get inconsistent snapshots... do you have
any experience restoring not quiesced snapshots??

Thanks and best regards,

Steffen



-----Original Message-----
From: Ken Williams [mailto:kwillia[at]smud.org]
Sent: Tuesday, November 03, 2009 6:52 PM
To: Ken Williams; Steffen Kammerer; Nick Silkey
Cc: toasters[at]mathworks.com
Subject: RE: SMVI / VMWare Experiences...

Data ONTap version should read 7.3.1.1p7.

-----Original Message-----
From: Ken Williams
Sent: Tuesday, November 03, 2009 9:21 AM
To: 'Steffen Kammerer'; Nick Silkey
Cc: toasters[at]mathworks.com
Subject: RE: SMVI / VMWare Experiences...

Yes, we have the same issue.

Came down to a few things:

1. Update to 7.3.2p7. There are algorithm changes to WAFL that help with
VMFS/VMDK reading/writing. I saw a HUGE performance change with this.

2. Check your CPU (systat 0), ours was pegged due to NDMP backups;
changing the times helped out a bunch.

3. Disable File Sync in VMWare tools on each guest. The File Sync driver
is problematic and not recommended. This is on each guest in add/remove
programs for VMWare tools.

4. VMWare admitted this is a problem; most users accept the work around
to not do quiesced backups. There is a checkbox in SMVI that will allow
you to not do VMWare level snaps.

Otherwise try snaping smaller groups (10 max) of VMs. We're at about 80%
success right now; not great but moving in the right direction.


-----Original Message-----
From: Steffen Kammerer [mailto:steffen.kammerer[at]brainlab.com]
Sent: Tuesday, November 03, 2009 6:41 AM
To: Ken Williams; Nick Silkey
Cc: toasters[at]mathworks.com
Subject: RE: SMVI / VMWare Experiences...

Hi there,

We have the same issue with SMVI 2.0 on nfs datastores... on some VMs we
get the following error:

Cannot create a quiesced snapshot because the create snapshot
operation exceeded the time limit for holding off I/O in the
frozen virtual machine.


Does anybody approach the same error?!

Thanks and best regards,


Steffen



-----Original Message-----
From: owner-toasters[at]mathworks.com [mailto:owner-toasters[at]mathworks.com]
On Behalf Of Ken Williams
Sent: Monday, September 14, 2009 6:51 PM
To: Nick Silkey
Cc: toasters[at]mathworks.com
Subject: RE: SMVI / VMWare Experiences...

Sounds like whatever user-defined script you have is failing sometimes?
Or perhaps it's a VMWare tools issue.

We've been able to track our issue down to the Guest OS level (win2k3
specifically). Looks like its an issue with VSS or LUN alignment.

I would recommend ensuring your LUNs are aligned (use the VMWare host
util kit, mbrscan / mbralign). There is detailed documentation on the
NOW.netapp.com site.
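
As a rough illustration of what an alignment check looks at (this is
plain arithmetic in Python, not a replacement for mbrscan / mbralign,
and is_aligned() is a made-up name): a guest partition is aligned when
its starting byte offset is a multiple of the 4 KB WAFL block size.
Windows 2003 by default starts the first partition at sector 63, which
is why those guests tend to show up as misaligned.

```python
SECTOR_BYTES = 512        # classic disk sector size
WAFL_BLOCK_BYTES = 4096   # WAFL block size on the filer


def is_aligned(start_sector: int) -> bool:
    """True if the partition's starting byte offset falls on a 4 KB boundary."""
    return (start_sector * SECTOR_BYTES) % WAFL_BLOCK_BYTES == 0


# Sector 63 is the Windows 2003 default; the others are common aligned starts.
for sector in (63, 64, 128, 2048):
    offset = sector * SECTOR_BYTES
    state = "aligned" if is_aligned(sector) else "MISALIGNED"
    print(f"start sector {sector:>4} (byte offset {offset:>7}): {state}")
```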

-----Original Message-----
From: Nick Silkey [mailto:nick[at]silkey.org]
Sent: Friday, September 11, 2009 7:15 PM
To: Ken Williams
Cc: toasters[at]mathworks.com
Subject: Re: SMVI / VMWare Experiences...

Ken --

We too are experiencing issues with SMVI 1.2 bombing out when attempting
to perform a VMware quiesce snap on _some_ RHEL5.3 32-bit VMs. A couple
of notables:

- These problematic VMs have a 100% success rate at taking VMware
quiesce snaps within vCenter, independent of SMVI.
- The problem is 100% reproducible during night, day, etc.
- We will deploy several VMs at a crack, all the same build. When the
next SMVI schedule hits, some fail while others succeed. Bizarre.
- Over time (we've been experiencing this issue for several weeks now),
the 'problem' VMs change. Example: VMs abc and xyz will fail for days;
without intervention, VM abc will stop failing while VM xyz continues to
fail ... even if they're part of the same deploy base template/kickstart.
- We are nowhere near our snap limit on the volumes.
- These problematic VMs only bomb when attempting a quiesce.
Non-quiesce SMVI snaps work like a champ.

We've been working with NetApp and VMware for some time now. We're at ESX
3.5u4+ to a 3160-R5 @ 7.2.6.1P3 via NFS + vCenter 4.0 + sync
SnapMirror to another 3160-R5 @ 7.2.6.1P3. The only revealing thing is
the SMVI + vCenter log entry "cannot create a quiesced snapshot because
the (user-supplied) custom prefreeze script in the virtual machine exited
with a nonzero return code".
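
One way to narrow that error down is to run the prefreeze script by
hand inside a failing guest and look at its exit code. The sketch below
is only an assumption about how one might do that in Python;
run_and_report() is a made-up helper, and the demo command is a
stand-in that deliberately exits with code 3 so the example runs
anywhere.

```python
import subprocess
import sys
from typing import Sequence


def run_and_report(command: Sequence[str]) -> int:
    """Run command, print its exit code and any stderr text, return the code."""
    result = subprocess.run(command, capture_output=True, text=True)
    print(f"{' '.join(command)} exited with code {result.returncode}")
    if result.stderr.strip():
        print(f"stderr: {result.stderr.strip()}")
    return result.returncode


# Stand-in for a failing script: a Python one-liner that exits with code 3.
demo_command = [sys.executable, "-c", "import sys; sys.exit(3)"]
exit_code = run_and_report(demo_command)
```

On a real guest you would pass the path of your own prefreeze script
instead of demo_command (the location depends on your VMware Tools
install); any nonzero code is what makes the quiesce fail.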

--
Nick

On Wed, Aug 26, 2009 at 5:32 PM, Ken Williams <kwillia[at]smud.org> wrote:
> I'm looking for some experiences people out there may have with SMVI
> with NetApp. We're currently experiencing major issues with SMVI
> snapshots failing. I've had open tickets with NetApp/VMWare/Microsoft
> for 3 months and still have yet to have a solution.
>
> My environment looks like such:
>
> * 6 x HP DL380 G5 (32gb Ram) in a ESX Cluster
> * Dual Emulex 10000 Cards in each host.
> * Cisco MDS SAN
> * Netapp FAS3070 Cluster ~9tb aggregate for VMWare.
> * VMFS Datastores ~10-15 VMs per datastore. ~50gb per VM.
> * ASIS Turned on
> * Volume and LUNspace reservation turned off
> * OnTap 7.2.5.1
> * Windows 2003 Guest OS.
>
> I cant see us reaching any limitation on the Filers or the SAN. Yet we
> have random VMs failing snapshots every night. Are other people seeing
> these issues? (I've gone through the gamut of troubleshooting, version
> management of ESX/VMWareTools/etc). Snapshots timeout and fail at the
> VMWare/Guest level, not at the Netapp snapshot level.
>
> We want to have SMVI function with VSS enabled.
>
> Has anyone had failing snapshots been able to resolve a similar issue?
> Or does anyone have SMVI working properly that we could use as a
> reference to compare configuration?
>
> __________________________________________________________
> Ken Williams
> Storage Administrator, Business Technology Operations
> Sacramento Municipal Utility District
> E-Mail: kwillia[at]smud.org
> Phone: (916) 732-6744
> Cell: (916) 240-4213
