
Mailing List Archive: Netapp: toasters

SQL 2005 reacts badly to a cluster giveback ?

 

 



phigmov at gmail

Aug 30, 2009, 5:05 PM

Post #1 of 8 (2405 views)
SQL 2005 reacts badly to a cluster giveback ?

Hi.

We've had a couple of cluster-failover events on our FAS270c (watchdog
errors every time) running Data ONTAP 7.2.5.1.

The failover is fine (AFAIK) when one of the nodes reboots. During the
giveback, however, the SQL server logs a couple of initiator error events.
Although the drives are visible (and working in terms of I/O) and the SQL
services are still running, any SQL-dependent applications just don't work
after the giveback. As soon as I stop/start the SQL services (or reboot the
box), it's all back to normal.

The server is Windows 2003 SP2; it's a VM on ESX 3.5. The iSCSI interface
goes through a dedicated iSCSI NIC (a virtual switch which also carries the
ESX iSCSI LUNs). SnapDrive is 6.01, the iSCSI initiator is 2.03, and it's a
32-bit VM.

Oddly, Exchange didn't miss a beat (those are physical Windows 2008 64-bit
servers), but SQL was definitely unhappy (even though the SQL service itself
carried on, i.e. it didn't stop).

Any ideas? I note there's a newer iSCSI initiator available (2.08) from
Microsoft. I'm pretty sure we didn't have this giveback issue with our old
SnapDrive 4.2.1 setup on the same server.

Thanks in advance,
Raj.


filip.sneppe at gmail

Aug 31, 2009, 9:25 AM

Post #2 of 8 (2295 views)
Re: SQL 2005 reacts badly to a cluster giveback ? [In reply to]

Hi,

I can add a "me too" to this post. I've had more or less the same
experience at two customer sites (albeit on physical machines), where I
ran into issues with the MSSQL servers and their iSCSI disks.

I can't say that I've experienced the same sort of problems with, e.g.,
Exchange setups. Generally, when things are set up correctly w.r.t.
disk timeouts, everything works fine.

The SQL setups I had issues with run more recent versions of the MS
iSCSI initiator (around 2.05/2.06, IIRC), and I've also thought about
upgrading to a newer version. One thing I came across when investigating
is that Windows can have a very long ARP cache timeout: during one test,
the Windows SQL box did not learn the new MAC address from the network
until long after the filer had booted. I think Windows 2000 and 2003 can
cache an ARP entry for up to 10 minutes, so I really don't see how a disk
timeout of 190 seconds is theoretically sufficient for NetApp cluster
failovers.
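
To put numbers on that timing concern, here is a minimal Python sketch
comparing the two figures quoted in this post: an ARP entry cached for up
to 10 minutes versus a 190-second disk timeout. Both values are taken from
the post and are not verified defaults, and the function name is
hypothetical.

```python
# Figures quoted in the post; treat them as assumptions, not verified defaults.
ARP_CACHE_MAX_SECONDS = 10 * 60   # "up to 10 minutes"
DISK_TIMEOUT_SECONDS = 190        # disk timeout mentioned in the post


def stale_window_seconds(arp_cache_seconds: int, disk_timeout_seconds: int) -> int:
    """Worst-case seconds a stale ARP entry can outlive the disk timeout.

    Returns 0 when the disk timeout already covers the ARP cache lifetime.
    """
    return max(0, arp_cache_seconds - disk_timeout_seconds)


if __name__ == "__main__":
    gap = stale_window_seconds(ARP_CACHE_MAX_SECONDS, DISK_TIMEOUT_SECONDS)
    print(f"A stale ARP entry could outlive the disk timeout by up to {gap} s")
```

This is only the worst case: it assumes nothing refreshes the ARP entry
earlier (for instance a gratuitous ARP from the filer), which the sketch
does not model.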

So I would like to know if anyone has experienced the same sort of
things, in particular with MS SQL servers and iSCSI.

Regards,
Filip



Darren.Sykes at csr

Sep 1, 2009, 3:26 AM

Post #3 of 8 (2275 views)
RE: SQL 2005 reacts badly to a cluster giveback ? [In reply to]

The ARP cache issue wouldn't really explain why Exchange reacts better.

However, I suppose you could verify that theory by attempting a failover
on a cluster that is not on the same subnet as the iSCSI client, or by
decreasing the ARP timeout (an entry under
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
IIRC).
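
A rough sketch of what such a registry change might look like, written as
a small Python helper that only builds the text of a .reg fragment (it
does not touch the registry). The value names ArpCacheLife and
ArpCacheMinReferencedLife are not named in this thread; they are what I
recall from Microsoft's TCP/IP parameter documentation for Windows
2000/2003, so verify them, and their units (seconds), before importing
anything. The helper name and the example values are hypothetical.

```python
# Registry key quoted in the post above.
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def build_reg_fragment(values: dict) -> str:
    """Return .reg file text setting each name to a DWORD number of seconds."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{KEY}]"]
    for name, seconds in sorted(values.items()):
        if not 0 <= seconds <= 0xFFFFFFFF:
            raise ValueError(f"{name}: {seconds} does not fit in a DWORD")
        # .reg files store DWORDs as 8 hexadecimal digits.
        lines.append(f'"{name}"=dword:{seconds:08x}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed value names; 60 s is an arbitrary example, not a recommendation.
    print(build_reg_fragment({"ArpCacheLife": 60, "ArpCacheMinReferencedLife": 60}))
```

Generating the fragment as text keeps the change reviewable before it is
applied to a production SQL host.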

Darren




jack1729 at gmail

Sep 1, 2009, 3:50 AM

Post #4 of 8 (2269 views)
Re: SQL 2005 reacts badly to a cluster giveback ? [In reply to]

We are about to implement a new SQL 2005 (x64) on NetApp, so I will
follow this thread pretty closely. We have built a few virtual-virtual
active-active clusters and virtual-physical active-active clusters for
in-house developed software with no issues.

I assume that the iSCSI NIC is on the same segment as the storage?
I assume you are not using jumbo frames?
I assume your NetApp cluster is configured for single image mode?




filip.sneppe at gmail

Sep 1, 2009, 3:57 AM

Post #5 of 8 (2275 views)
Re: SQL 2005 reacts badly to a cluster giveback ? [In reply to]

Hi,

Yes, all hosts are on the same subnet, no jumbo frames are involved, and
for iSCSI, single_image mode isn't really relevant...

Best regards,
Filip



jack1729 at gmail

Sep 1, 2009, 4:14 AM

Post #6 of 8 (2278 views)
Re: SQL 2005 reacts badly to a cluster giveback ? [In reply to]

Did you set the preferred filer IP address in the SnapDrive config? Did
you set up your iSCSI target using hostnames or IPs? Could there be name
resolution issues?



filip.sneppe at gmail

Sep 1, 2009, 4:30 AM

Post #7 of 8 (2269 views)
Re: SQL 2005 reacts badly to a cluster giveback ? [In reply to]

Hi,

Yes, the preferred IP address was set (using hostname/IP address pairs).
I have no indication of name resolution issues, i.e. the LAN interfaces of
the filers are static DNS entries, and the iSCSI IP addresses are set using
the filer preferred IP addresses in SnapDrive.

Best regards,
Filip



olaf at netapp

Sep 1, 2009, 5:50 AM

Post #8 of 8 (2281 views)
RE: SQL 2005 reacts badly to a cluster giveback ? [In reply to]

Hi,

The hostname/IP address pairs in PreferredIPAddresses in SnapDrive
are NOT for the iSCSI traffic; they are meant for the management
(RPC) traffic. They need to be reverse-resolvable and must be
normal network ports on the filer, reachable from the guest OS
network.
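
One way to sanity-check the "reverse-resolvable" requirement is a small
forward/reverse lookup test. The Python sketch below is only an
illustration: the function name and the example hostname/IP are made up,
and it compares short host names only, which may be stricter or looser
than what SnapDrive itself requires.

```python
import socket


def is_reverse_resolvable(hostname, ip,
                          forward=socket.gethostbyname,
                          reverse=socket.gethostbyaddr):
    """True if hostname resolves to ip and ip resolves back to hostname.

    The resolver functions are injectable so the check can be exercised
    without touching real DNS.
    """
    try:
        if forward(hostname) != ip:
            return False
        primary, aliases, _addresses = reverse(ip)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return False
    wanted = hostname.lower().split(".")[0]
    # Compare short names so "filer1" matches "filer1.example.com".
    return any(name.lower().split(".")[0] == wanted
               for name in [primary, *aliases])


if __name__ == "__main__":
    # Hypothetical filer name and management address.
    print(is_reverse_resolvable("filer1", "192.168.10.5"))
```

Run it from the guest OS against the filer's management interface name and
address, not against the iSCSI target address.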

iSCSI traffic is only determined by the setting in the initiator.

For the failover: make sure you use Host Utilities for ESX 5.0R2 or 5.1;
normally the config_hba script should be run during the install. A reboot
is then required!
--
Olaf Leimann

