Friday, March 30, 2012

Intermittent SQL Server shutdown

I am running a scheduled job through SQL Server Agent every morning, Tuesday
to Saturday, at 3:00 am.
It runs perfectly for ws, sometimes months. Then, one morning, something
happens to cause SQL Server
and SQL Server Agent to stop running. *Nothing* has changed in the scheduled
job.
When this situation occurs, it always occurs while running the scheduled job
(i.e., the machine is not idle).
The scheduled job executes my stored procedure si_ImportMaster.
Furthermore, si_ImportMaster simply does some error checking and then execut
es a series of 10 stored procedures,
and when the error happens, it is always when si_ImportMaster is executing t
he same stored procedure (of those 10).
When I come in to work and realize what has happened, I re-start SQL Server
and SQL Server Agent
and execute the rest of the stored procedures in the process (including the
one it stopped on).
That "re-run" process completes without a hitch.
I would be happy to post the contents of the stored procedure in question, b
ut my thinking was that
if it runs correctly for ws or months, it can't be the fault of the SQL c
ode. Can it . . . ?
I have been given the job of troubleshooting this. I am much more familiar w
ith SQL Server programming
than the DBA side of things, so any and all assistance is appreciated.
We're running SQL 7.0 SP2.
If you need me to post anything else, let me know.
Thanks in advance -- Carl
Following is some error log output. I was able to find 4 relevant log files:
...\LOG\ERRORLOG.1 -- excerpted below
...\LOG\SQLAGENT.1 -- excerpted below
...\LOG\SQL00031.dmp -- 48K of mostly technical stuff (core dump info, etc
.)
...\LOG\SQL00032.dmp -- same as above.
Anyway,
Here's some output from ...\LOG\ERRORLOG.1
2005-03-13 19:05:06.75 kernel Microsoft SQL Server 7.00 - 7.00.842 (Intel
X86)
Mar 2 2000 06:49:37
Copyright (c) 1988-1998 Microsoft Corporation
Standard Edition on Windows NT 5.0 (Build 2195: Service Pack 2)
----
--
2005-03-15 04:18:04.09 spid9 Process 9 unlocking unowned resource: KEY: 7
:1451737566:5 (57028e42a833)
2005-03-15 04:18:04.12 spid9 Process 9 unlocking unowned resource: KEY: 7
:1451737566:5 (57028e42a833)
2005-03-15 04:18:04.28 spid9 Using 'sqlimage.dll' version '4.0.5'
Dump thread - spid = 9, PSS = 0x354df154, EC = 0x23a40084
Stack Dump being sent to E:\log\SQL00031.dmp
****************************************
************************************
***
*
* BEGIN STACK DUMP:
* 03/15/05 04:18:17 spid 9
*
* Input Buffer 40 bytes -
* E X E C s i _ I m p o r t M a s t e r
*
****************************************
************************************
***
----
--
Short Stack Dump
0x77f94956 Module(ntdll+14956) (ZwGetContextThread+b)
0x00784f83 Module(sqlservr+384f83) (utassert_fail+19f)
0x005b572c Module(sqlservr+1b572c) (ExecutionContext::Cleanup+9d)
0x004eb421 Module(sqlservr+eb421) (ExecutionContext::Purge+45)
0x004eb182 Module(sqlservr+eb182) (stopsubprocess+e5)
0x004e9d4e Module(sqlservr+e9d4e) (subproc_main+174)
0x41092a47 Module(ums+2a47) (ProcessWorkRequests+ec)
0x4109326b Module(ums+326b) (ThreadStartRoutine+138)
0x7800c9eb Module(MSVCRT+c9eb) (beginthread+ce)
0x7c57438b Module(KERNEL32+438b) (TlsSetValue+f0)
2005-03-15 04:18:17.81 kernel SQL Server Assertion: File: <proc.c>, line=1
927
Failed Assertion = 'm_activeSdesList.Head () == NULL'.
2005-03-15 04:18:17.87 spid9 Using 'sqlimage.dll' version '4.0.5'
Dump thread - spid = 9, PSS = 0x354df154, EC = 0x23a40084
Stack Dump being sent to E:\log\SQL00032.dmp
****************************************
************************************
***
*
* BEGIN STACK DUMP:
* 03/15/05 04:18:22 spid 9
*
* Input Buffer 40 bytes -
* E X E C s i _ I m p o r t M a s t e r
*
****************************************
************************************
***
----
--
Short Stack Dump
0x77f94956 Module(ntdll+14956) (ZwGetContextThread+b)
0x0077b371 Module(sqlservr+37b371) (ex_raise2+35b)
0x0077b3dd Module(sqlservr+37b3dd) (ex_raise+5f)
0x00784f16 Module(sqlservr+384f16) (utassert_fail+37d)
0x005b572c Module(sqlservr+1b572c) (ExecutionContext::Cleanup+9d)
0x004eb421 Module(sqlservr+eb421) (ExecutionContext::Purge+45)
0x004eb182 Module(sqlservr+eb182) (stopsubprocess+e5)
0x004e9d4e Module(sqlservr+e9d4e) (subproc_main+174)
0x41092a47 Module(ums+2a47) (ProcessWorkRequests+ec)
0x4109326b Module(ums+326b) (ThreadStartRoutine+138)
0x7800c9eb Module(MSVCRT+c9eb) (beginthread+ce)
0x7c57438b Module(KERNEL32+438b) (TlsSetValue+f0)
2005-03-15 04:18:23.32 spid9 Error: 1203, Severity: 20, State: 1
2005-03-15 04:18:23.32 spid9 Process ID 9 attempting to unlock unowned re
source KEY: 7:1451737566:5 (57028e42a833)..
2005-03-15 04:18:23.32 spid9 Error: 1203, Severity: 20, State: 1
2005-03-15 04:18:23.32 spid9 Process ID 9 attempting to unlock unowned re
source KEY: 7:1451737566:5 (57028e42a833)..
Here's the relevant output from ...\LOG\SQLAGENT.1
3/15/2005 4:18:23 AM - ! [298] SQLServer Error: 4, Connection broken. [SQLST
ATE 08S01]
3/15/2005 4:18:23 AM - ! [298] SQLServer Error: 109, ConnectionRead (GetOver
LappedResult()). [SQLSTATE 01000]
3/15/2005 4:18:23 AM - ! [382] Logon to server '(local)' failed (ConnAttempt
CachableOp)
3/15/2005 4:18:28 AM - + [360] SQLServerAgent initiating shutdown following
MSSQLServer shutdown
3/15/2005 4:18:44 AM - + [188] Scheduler engine timed out (after 15 seconds)
waiting for 1 jobs(s) to stop
3/15/2005 4:18:44 AM - ! [359] The local host server is not running
3/15/2005 4:18:45 AM - + [098] SQLServerAgent terminated (forcefully)We had this problem too when I used to work as DBA.
The problem was with the Code that was written in the SP that was being
called in a Job. It used to crash
the SQLServer Engine (and along with it SQLAgent) and this is used to happen
randomly.
Finally we traced it to that part of the code. Fortunately, Microsoft has a
fix for this.
You dont seem to have the latest SQLServer 7 Service Pack. May be your
problem has been addressed
in one of them.
Did you the SQLServer Log file for any errors, Assertions ... etc.
Gopi
"Carl Imthurn" <nospam@.all.com> wrote in message
news:ua9EVkYKFHA.1096@.tk2msftngp13.phx.gbl...
>I am running a scheduled job through SQL Server Agent every morning,
>Tuesday to Saturday, at 3:00 am.
> It runs perfectly for ws, sometimes months. Then, one morning,
> something happens to cause SQL Server
> and SQL Server Agent to stop running. *Nothing* has changed in the
> scheduled job.
> When this situation occurs, it always occurs while running the scheduled
> job (i.e., the machine is not idle).
> The scheduled job executes my stored procedure si_ImportMaster.
> Furthermore, si_ImportMaster simply does some error checking and then
> executes a series of 10 stored procedures,
> and when the error happens, it is always when si_ImportMaster is executing
> the same stored procedure (of those 10).
> When I come in to work and realize what has happened, I re-start SQL
> Server and SQL Server Agent
> and execute the rest of the stored procedures in the process (including
> the one it stopped on).
> That "re-run" process completes without a hitch.
> I would be happy to post the contents of the stored procedure in question,
> but my thinking was that
> if it runs correctly for ws or months, it can't be the fault of the SQL
> code. Can it . . . ?
> I have been given the job of troubleshooting this. I am much more familiar
> with SQL Server programming
> than the DBA side of things, so any and all assistance is appreciated.
> We're running SQL 7.0 SP2.
> If you need me to post anything else, let me know.
> Thanks in advance -- Carl
> Following is some error log output. I was able to find 4 relevant log
> files:
> ...\LOG\ERRORLOG.1 -- excerpted below
> ...\LOG\SQLAGENT.1 -- excerpted below
> ...\LOG\SQL00031.dmp -- 48K of mostly technical stuff (core dump info,
> etc.)
> ...\LOG\SQL00032.dmp -- same as above.
> Anyway,
> Here's some output from ...\LOG\ERRORLOG.1
> 2005-03-13 19:05:06.75 kernel Microsoft SQL Server 7.00 - 7.00.842
> (Intel X86)
> Mar 2 2000 06:49:37
> Copyright (c) 1988-1998 Microsoft Corporation
> Standard Edition on Windows NT 5.0 (Build 2195: Service Pack 2)
> ----
--
> 2005-03-15 04:18:04.09 spid9 Process 9 unlocking unowned resource: KEY:
> 7:1451737566:5 (57028e42a833)
> 2005-03-15 04:18:04.12 spid9 Process 9 unlocking unowned resource: KEY:
> 7:1451737566:5 (57028e42a833)
> 2005-03-15 04:18:04.28 spid9 Using 'sqlimage.dll' version '4.0.5'
> Dump thread - spid = 9, PSS = 0x354df154, EC = 0x23a40084
> Stack Dump being sent to E:\log\SQL00031.dmp
> ****************************************
**********************************
*****
> *
> * BEGIN STACK DUMP:
> * 03/15/05 04:18:17 spid 9
> *
> * Input Buffer 40 bytes -
> * E X E C s i _ I m p o r t M a s t e r
> *
> ****************************************
**********************************
*****
> ----
--
> Short Stack Dump
> 0x77f94956 Module(ntdll+14956) (ZwGetContextThread+b)
> 0x00784f83 Module(sqlservr+384f83) (utassert_fail+19f)
> 0x005b572c Module(sqlservr+1b572c) (ExecutionContext::Cleanup+9d)
> 0x004eb421 Module(sqlservr+eb421) (ExecutionContext::Purge+45)
> 0x004eb182 Module(sqlservr+eb182) (stopsubprocess+e5)
> 0x004e9d4e Module(sqlservr+e9d4e) (subproc_main+174)
> 0x41092a47 Module(ums+2a47) (ProcessWorkRequests+ec)
> 0x4109326b Module(ums+326b) (ThreadStartRoutine+138)
> 0x7800c9eb Module(MSVCRT+c9eb) (beginthread+ce)
> 0x7c57438b Module(KERNEL32+438b) (TlsSetValue+f0)
> 2005-03-15 04:18:17.81 kernel SQL Server Assertion: File: <proc.c>,
> line=1927
> Failed Assertion = 'm_activeSdesList.Head () == NULL'.
> 2005-03-15 04:18:17.87 spid9 Using 'sqlimage.dll' version '4.0.5'
> Dump thread - spid = 9, PSS = 0x354df154, EC = 0x23a40084
> Stack Dump being sent to E:\log\SQL00032.dmp
> ****************************************
**********************************
*****
> *
> * BEGIN STACK DUMP:
> * 03/15/05 04:18:22 spid 9
> *
> * Input Buffer 40 bytes -
> * E X E C s i _ I m p o r t M a s t e r
> *
> ****************************************
**********************************
*****
> ----
--
> Short Stack Dump
> 0x77f94956 Module(ntdll+14956) (ZwGetContextThread+b)
> 0x0077b371 Module(sqlservr+37b371) (ex_raise2+35b)
> 0x0077b3dd Module(sqlservr+37b3dd) (ex_raise+5f)
> 0x00784f16 Module(sqlservr+384f16) (utassert_fail+37d)
> 0x005b572c Module(sqlservr+1b572c) (ExecutionContext::Cleanup+9d)
> 0x004eb421 Module(sqlservr+eb421) (ExecutionContext::Purge+45)
> 0x004eb182 Module(sqlservr+eb182) (stopsubprocess+e5)
> 0x004e9d4e Module(sqlservr+e9d4e) (subproc_main+174)
> 0x41092a47 Module(ums+2a47) (ProcessWorkRequests+ec)
> 0x4109326b Module(ums+326b) (ThreadStartRoutine+138)
> 0x7800c9eb Module(MSVCRT+c9eb) (beginthread+ce)
> 0x7c57438b Module(KERNEL32+438b) (TlsSetValue+f0)
> 2005-03-15 04:18:23.32 spid9 Error: 1203, Severity: 20, State: 1
> 2005-03-15 04:18:23.32 spid9 Process ID 9 attempting to unlock unowned
> resource KEY: 7:1451737566:5 (57028e42a833)..
> 2005-03-15 04:18:23.32 spid9 Error: 1203, Severity: 20, State: 1
> 2005-03-15 04:18:23.32 spid9 Process ID 9 attempting to unlock unowned
> resource KEY: 7:1451737566:5 (57028e42a833)..
>
> Here's the relevant output from ...\LOG\SQLAGENT.1
> 3/15/2005 4:18:23 AM - ! [298] SQLServer Error: 4, Connection broken.
> [SQLSTATE 08S01]
> 3/15/2005 4:18:23 AM - ! [298] SQLServer Error: 109, ConnectionRead
> (GetOverLappedResult()). [SQLSTATE 01000]
> 3/15/2005 4:18:23 AM - ! [382] Logon to server '(local)' failed
> (ConnAttemptCachableOp)
> 3/15/2005 4:18:28 AM - + [360] SQLServerAgent initiating shutdown
> following MSSQLServer shutdown
> 3/15/2005 4:18:44 AM - + [188] Scheduler engine timed out (after 15
> seconds) waiting for 1 jobs(s) to stop
> 3/15/2005 4:18:44 AM - ! [359] The local host server is not running
> 3/15/2005 4:18:45 AM - + [098] SQLServerAgent terminated (forcefully)
>|||Thanks Gopi -- I had wondered about that myself.
I will install the latest SP and see if that cures the problem.
Thanks for your time.
rgn wrote:
> We had this problem too when I used to work as DBA.
> The problem was with the Code that was written in the SP that was being
> called in a Job. It used to crash
> the SQLServer Engine (and along with it SQLAgent) and this is used to happ
en
> randomly.
> Finally we traced it to that part of the code. Fortunately, Microsoft has
a
> fix for this.
> You dont seem to have the latest SQLServer 7 Service Pack. May be your
> problem has been addressed
> in one of them.
> Did you the SQLServer Log file for any errors, Assertions ... etc.
> Gopi
> "Carl Imthurn" <nospam@.all.com> wrote in message
> news:ua9EVkYKFHA.1096@.tk2msftngp13.phx.gbl...
>
>
>sql

No comments:

Post a Comment