Showing posts with label agent. Show all posts
Showing posts with label agent. Show all posts

Friday, March 30, 2012

Intermittent SQL Server shutdown

I am running a scheduled job through SQL Server Agent every morning, Tuesday
to Saturday, at 3:00 am.
It runs perfectly for ws, sometimes months. Then, one morning, something
happens to cause SQL Server
and SQL Server Agent to stop running. *Nothing* has changed in the scheduled
job.
When this situation occurs, it always occurs while running the scheduled job
(i.e., the machine is not idle).
The scheduled job executes my stored procedure si_ImportMaster.
Furthermore, si_ImportMaster simply does some error checking and then execut
es a series of 10 stored procedures,
and when the error happens, it is always when si_ImportMaster is executing t
he same stored procedure (of those 10).
When I come in to work and realize what has happened, I re-start SQL Server
and SQL Server Agent
and execute the rest of the stored procedures in the process (including the
one it stopped on).
That "re-run" process completes without a hitch.
I would be happy to post the contents of the stored procedure in question, b
ut my thinking was that
if it runs correctly for ws or months, it can't be the fault of the SQL c
ode. Can it . . . ?
I have been given the job of troubleshooting this. I am much more familiar w
ith SQL Server programming
than the DBA side of things, so any and all assistance is appreciated.
We're running SQL 7.0 SP2.
If you need me to post anything else, let me know.
Thanks in advance -- Carl
Following is some error log output. I was able to find 4 relevant log files:
...\LOG\ERRORLOG.1 -- excerpted below
...\LOG\SQLAGENT.1 -- excerpted below
...\LOG\SQL00031.dmp -- 48K of mostly technical stuff (core dump info, etc
.)
...\LOG\SQL00032.dmp -- same as above.
Anyway,
Here's some output from ...\LOG\ERRORLOG.1
2005-03-13 19:05:06.75 kernel Microsoft SQL Server 7.00 - 7.00.842 (Intel
X86)
Mar 2 2000 06:49:37
Copyright (c) 1988-1998 Microsoft Corporation
Standard Edition on Windows NT 5.0 (Build 2195: Service Pack 2)
----
--
2005-03-15 04:18:04.09 spid9 Process 9 unlocking unowned resource: KEY: 7
:1451737566:5 (57028e42a833)
2005-03-15 04:18:04.12 spid9 Process 9 unlocking unowned resource: KEY: 7
:1451737566:5 (57028e42a833)
2005-03-15 04:18:04.28 spid9 Using 'sqlimage.dll' version '4.0.5'
Dump thread - spid = 9, PSS = 0x354df154, EC = 0x23a40084
Stack Dump being sent to E:\log\SQL00031.dmp
****************************************
************************************
***
*
* BEGIN STACK DUMP:
* 03/15/05 04:18:17 spid 9
*
* Input Buffer 40 bytes -
* E X E C s i _ I m p o r t M a s t e r
*
****************************************
************************************
***
----
--
Short Stack Dump
0x77f94956 Module(ntdll+14956) (ZwGetContextThread+b)
0x00784f83 Module(sqlservr+384f83) (utassert_fail+19f)
0x005b572c Module(sqlservr+1b572c) (ExecutionContext::Cleanup+9d)
0x004eb421 Module(sqlservr+eb421) (ExecutionContext::Purge+45)
0x004eb182 Module(sqlservr+eb182) (stopsubprocess+e5)
0x004e9d4e Module(sqlservr+e9d4e) (subproc_main+174)
0x41092a47 Module(ums+2a47) (ProcessWorkRequests+ec)
0x4109326b Module(ums+326b) (ThreadStartRoutine+138)
0x7800c9eb Module(MSVCRT+c9eb) (beginthread+ce)
0x7c57438b Module(KERNEL32+438b) (TlsSetValue+f0)
2005-03-15 04:18:17.81 kernel SQL Server Assertion: File: <proc.c>, line=1
927
Failed Assertion = 'm_activeSdesList.Head () == NULL'.
2005-03-15 04:18:17.87 spid9 Using 'sqlimage.dll' version '4.0.5'
Dump thread - spid = 9, PSS = 0x354df154, EC = 0x23a40084
Stack Dump being sent to E:\log\SQL00032.dmp
****************************************
************************************
***
*
* BEGIN STACK DUMP:
* 03/15/05 04:18:22 spid 9
*
* Input Buffer 40 bytes -
* E X E C s i _ I m p o r t M a s t e r
*
****************************************
************************************
***
----
--
Short Stack Dump
0x77f94956 Module(ntdll+14956) (ZwGetContextThread+b)
0x0077b371 Module(sqlservr+37b371) (ex_raise2+35b)
0x0077b3dd Module(sqlservr+37b3dd) (ex_raise+5f)
0x00784f16 Module(sqlservr+384f16) (utassert_fail+37d)
0x005b572c Module(sqlservr+1b572c) (ExecutionContext::Cleanup+9d)
0x004eb421 Module(sqlservr+eb421) (ExecutionContext::Purge+45)
0x004eb182 Module(sqlservr+eb182) (stopsubprocess+e5)
0x004e9d4e Module(sqlservr+e9d4e) (subproc_main+174)
0x41092a47 Module(ums+2a47) (ProcessWorkRequests+ec)
0x4109326b Module(ums+326b) (ThreadStartRoutine+138)
0x7800c9eb Module(MSVCRT+c9eb) (beginthread+ce)
0x7c57438b Module(KERNEL32+438b) (TlsSetValue+f0)
2005-03-15 04:18:23.32 spid9 Error: 1203, Severity: 20, State: 1
2005-03-15 04:18:23.32 spid9 Process ID 9 attempting to unlock unowned re
source KEY: 7:1451737566:5 (57028e42a833)..
2005-03-15 04:18:23.32 spid9 Error: 1203, Severity: 20, State: 1
2005-03-15 04:18:23.32 spid9 Process ID 9 attempting to unlock unowned re
source KEY: 7:1451737566:5 (57028e42a833)..
Here's the relevant output from ...\LOG\SQLAGENT.1
3/15/2005 4:18:23 AM - ! [298] SQLServer Error: 4, Connection broken. [SQLST
ATE 08S01]
3/15/2005 4:18:23 AM - ! [298] SQLServer Error: 109, ConnectionRead (GetOver
LappedResult()). [SQLSTATE 01000]
3/15/2005 4:18:23 AM - ! [382] Logon to server '(local)' failed (ConnAttempt
CachableOp)
3/15/2005 4:18:28 AM - + [360] SQLServerAgent initiating shutdown following
MSSQLServer shutdown
3/15/2005 4:18:44 AM - + [188] Scheduler engine timed out (after 15 seconds)
waiting for 1 jobs(s) to stop
3/15/2005 4:18:44 AM - ! [359] The local host server is not running
3/15/2005 4:18:45 AM - + [098] SQLServerAgent terminated (forcefully)We had this problem too when I used to work as DBA.
The problem was with the Code that was written in the SP that was being
called in a Job. It used to crash
the SQLServer Engine (and along with it SQLAgent) and this is used to happen
randomly.
Finally we traced it to that part of the code. Fortunately, Microsoft has a
fix for this.
You dont seem to have the latest SQLServer 7 Service Pack. May be your
problem has been addressed
in one of them.
Did you the SQLServer Log file for any errors, Assertions ... etc.
Gopi
"Carl Imthurn" <nospam@.all.com> wrote in message
news:ua9EVkYKFHA.1096@.tk2msftngp13.phx.gbl...
>I am running a scheduled job through SQL Server Agent every morning,
>Tuesday to Saturday, at 3:00 am.
> It runs perfectly for ws, sometimes months. Then, one morning,
> something happens to cause SQL Server
> and SQL Server Agent to stop running. *Nothing* has changed in the
> scheduled job.
> When this situation occurs, it always occurs while running the scheduled
> job (i.e., the machine is not idle).
> The scheduled job executes my stored procedure si_ImportMaster.
> Furthermore, si_ImportMaster simply does some error checking and then
> executes a series of 10 stored procedures,
> and when the error happens, it is always when si_ImportMaster is executing
> the same stored procedure (of those 10).
> When I come in to work and realize what has happened, I re-start SQL
> Server and SQL Server Agent
> and execute the rest of the stored procedures in the process (including
> the one it stopped on).
> That "re-run" process completes without a hitch.
> I would be happy to post the contents of the stored procedure in question,
> but my thinking was that
> if it runs correctly for ws or months, it can't be the fault of the SQL
> code. Can it . . . ?
> I have been given the job of troubleshooting this. I am much more familiar
> with SQL Server programming
> than the DBA side of things, so any and all assistance is appreciated.
> We're running SQL 7.0 SP2.
> If you need me to post anything else, let me know.
> Thanks in advance -- Carl
> Following is some error log output. I was able to find 4 relevant log
> files:
> ...\LOG\ERRORLOG.1 -- excerpted below
> ...\LOG\SQLAGENT.1 -- excerpted below
> ...\LOG\SQL00031.dmp -- 48K of mostly technical stuff (core dump info,
> etc.)
> ...\LOG\SQL00032.dmp -- same as above.
> Anyway,
> Here's some output from ...\LOG\ERRORLOG.1
> 2005-03-13 19:05:06.75 kernel Microsoft SQL Server 7.00 - 7.00.842
> (Intel X86)
> Mar 2 2000 06:49:37
> Copyright (c) 1988-1998 Microsoft Corporation
> Standard Edition on Windows NT 5.0 (Build 2195: Service Pack 2)
> ----
--
> 2005-03-15 04:18:04.09 spid9 Process 9 unlocking unowned resource: KEY:
> 7:1451737566:5 (57028e42a833)
> 2005-03-15 04:18:04.12 spid9 Process 9 unlocking unowned resource: KEY:
> 7:1451737566:5 (57028e42a833)
> 2005-03-15 04:18:04.28 spid9 Using 'sqlimage.dll' version '4.0.5'
> Dump thread - spid = 9, PSS = 0x354df154, EC = 0x23a40084
> Stack Dump being sent to E:\log\SQL00031.dmp
> ****************************************
**********************************
*****
> *
> * BEGIN STACK DUMP:
> * 03/15/05 04:18:17 spid 9
> *
> * Input Buffer 40 bytes -
> * E X E C s i _ I m p o r t M a s t e r
> *
> ****************************************
**********************************
*****
> ----
--
> Short Stack Dump
> 0x77f94956 Module(ntdll+14956) (ZwGetContextThread+b)
> 0x00784f83 Module(sqlservr+384f83) (utassert_fail+19f)
> 0x005b572c Module(sqlservr+1b572c) (ExecutionContext::Cleanup+9d)
> 0x004eb421 Module(sqlservr+eb421) (ExecutionContext::Purge+45)
> 0x004eb182 Module(sqlservr+eb182) (stopsubprocess+e5)
> 0x004e9d4e Module(sqlservr+e9d4e) (subproc_main+174)
> 0x41092a47 Module(ums+2a47) (ProcessWorkRequests+ec)
> 0x4109326b Module(ums+326b) (ThreadStartRoutine+138)
> 0x7800c9eb Module(MSVCRT+c9eb) (beginthread+ce)
> 0x7c57438b Module(KERNEL32+438b) (TlsSetValue+f0)
> 2005-03-15 04:18:17.81 kernel SQL Server Assertion: File: <proc.c>,
> line=1927
> Failed Assertion = 'm_activeSdesList.Head () == NULL'.
> 2005-03-15 04:18:17.87 spid9 Using 'sqlimage.dll' version '4.0.5'
> Dump thread - spid = 9, PSS = 0x354df154, EC = 0x23a40084
> Stack Dump being sent to E:\log\SQL00032.dmp
> ****************************************
**********************************
*****
> *
> * BEGIN STACK DUMP:
> * 03/15/05 04:18:22 spid 9
> *
> * Input Buffer 40 bytes -
> * E X E C s i _ I m p o r t M a s t e r
> *
> ****************************************
**********************************
*****
> ----
--
> Short Stack Dump
> 0x77f94956 Module(ntdll+14956) (ZwGetContextThread+b)
> 0x0077b371 Module(sqlservr+37b371) (ex_raise2+35b)
> 0x0077b3dd Module(sqlservr+37b3dd) (ex_raise+5f)
> 0x00784f16 Module(sqlservr+384f16) (utassert_fail+37d)
> 0x005b572c Module(sqlservr+1b572c) (ExecutionContext::Cleanup+9d)
> 0x004eb421 Module(sqlservr+eb421) (ExecutionContext::Purge+45)
> 0x004eb182 Module(sqlservr+eb182) (stopsubprocess+e5)
> 0x004e9d4e Module(sqlservr+e9d4e) (subproc_main+174)
> 0x41092a47 Module(ums+2a47) (ProcessWorkRequests+ec)
> 0x4109326b Module(ums+326b) (ThreadStartRoutine+138)
> 0x7800c9eb Module(MSVCRT+c9eb) (beginthread+ce)
> 0x7c57438b Module(KERNEL32+438b) (TlsSetValue+f0)
> 2005-03-15 04:18:23.32 spid9 Error: 1203, Severity: 20, State: 1
> 2005-03-15 04:18:23.32 spid9 Process ID 9 attempting to unlock unowned
> resource KEY: 7:1451737566:5 (57028e42a833)..
> 2005-03-15 04:18:23.32 spid9 Error: 1203, Severity: 20, State: 1
> 2005-03-15 04:18:23.32 spid9 Process ID 9 attempting to unlock unowned
> resource KEY: 7:1451737566:5 (57028e42a833)..
>
> Here's the relevant output from ...\LOG\SQLAGENT.1
> 3/15/2005 4:18:23 AM - ! [298] SQLServer Error: 4, Connection broken.
> [SQLSTATE 08S01]
> 3/15/2005 4:18:23 AM - ! [298] SQLServer Error: 109, ConnectionRead
> (GetOverLappedResult()). [SQLSTATE 01000]
> 3/15/2005 4:18:23 AM - ! [382] Logon to server '(local)' failed
> (ConnAttemptCachableOp)
> 3/15/2005 4:18:28 AM - + [360] SQLServerAgent initiating shutdown
> following MSSQLServer shutdown
> 3/15/2005 4:18:44 AM - + [188] Scheduler engine timed out (after 15
> seconds) waiting for 1 jobs(s) to stop
> 3/15/2005 4:18:44 AM - ! [359] The local host server is not running
> 3/15/2005 4:18:45 AM - + [098] SQLServerAgent terminated (forcefully)
>|||Thanks Gopi -- I had wondered about that myself.
I will install the latest SP and see if that cures the problem.
Thanks for your time.
rgn wrote:
> We had this problem too when I used to work as DBA.
> The problem was with the Code that was written in the SP that was being
> called in a Job. It used to crash
> the SQLServer Engine (and along with it SQLAgent) and this is used to happ
en
> randomly.
> Finally we traced it to that part of the code. Fortunately, Microsoft has
a
> fix for this.
> You dont seem to have the latest SQLServer 7 Service Pack. May be your
> problem has been addressed
> in one of them.
> Did you the SQLServer Log file for any errors, Assertions ... etc.
> Gopi
> "Carl Imthurn" <nospam@.all.com> wrote in message
> news:ua9EVkYKFHA.1096@.tk2msftngp13.phx.gbl...
>
>
>sql

Wednesday, March 21, 2012

Interesting issue

I tried to create a new SQL Agent job today and couldn't. Got this error msg
.
"Error 0: This server has been disconnected. You must reconnect to perform
this operation. The job was not saved." I got this message on every server I
connected to via EM, dozens - EXCEPT the ones that weren't default instances
.
My colleague had the same problem but no-one else in the team did. My
colleague and I have different PC builds in that we have XP SP2 and have SQL
2005 tools installed as well as 2000. Sensing the problem was something to
do with port 1433 as it only affected default instances, I changed our clien
t
connectivity protocols so that Named Pipes was above TCP/IP, and now we don'
t
have a problem. Strange. Anyone shed any light on why connecting to default
instance SQL Servers via EM using TCP/IP connections stops us saving new job
s
in SQL Agent? Many thanks.Solved it. New security software had a 'MS SQL Job Scheduling' intrusion
Prevention rule. When that was omitted, we could create jobs using TCP/IP
connections.
"Shirley" wrote:

> I tried to create a new SQL Agent job today and couldn't. Got this error m
sg.
> "Error 0: This server has been disconnected. You must reconnect to perform
> this operation. The job was not saved." I got this message on every server
I
> connected to via EM, dozens - EXCEPT the ones that weren't default instanc
es.
> My colleague had the same problem but no-one else in the team did. My
> colleague and I have different PC builds in that we have XP SP2 and have S
QL
> 2005 tools installed as well as 2000. Sensing the problem was something t
o
> do with port 1433 as it only affected default instances, I changed our cli
ent
> connectivity protocols so that Named Pipes was above TCP/IP, and now we do
n't
> have a problem. Strange. Anyone shed any light on why connecting to defaul
t
> instance SQL Servers via EM using TCP/IP connections stops us saving new j
obs
> in SQL Agent? Many thanks.|||Just out of curiosity, what kind of security software?|||Symantec.
"Curt" wrote:

> Just out of curiosity, what kind of security software?
>sql

Interesting issue

I tried to create a new SQL Agent job today and couldn't. Got this error msg.
"Error 0: This server has been disconnected. You must reconnect to perform
this operation. The job was not saved." I got this message on every server I
connected to via EM, dozens - EXCEPT the ones that weren't default instances.
My colleague had the same problem but no-one else in the team did. My
colleague and I have different PC builds in that we have XP SP2 and have SQL
2005 tools installed as well as 2000. Sensing the problem was something to
do with port 1433 as it only affected default instances, I changed our client
connectivity protocols so that Named Pipes was above TCP/IP, and now we don't
have a problem. Strange. Anyone shed any light on why connecting to default
instance SQL Servers via EM using TCP/IP connections stops us saving new jobs
in SQL Agent? Many thanks.Solved it. New security software had a 'MS SQL Job Scheduling' intrusion
Prevention rule. When that was omitted, we could create jobs using TCP/IP
connections.
"Shirley" wrote:
> I tried to create a new SQL Agent job today and couldn't. Got this error msg.
> "Error 0: This server has been disconnected. You must reconnect to perform
> this operation. The job was not saved." I got this message on every server I
> connected to via EM, dozens - EXCEPT the ones that weren't default instances.
> My colleague had the same problem but no-one else in the team did. My
> colleague and I have different PC builds in that we have XP SP2 and have SQL
> 2005 tools installed as well as 2000. Sensing the problem was something to
> do with port 1433 as it only affected default instances, I changed our client
> connectivity protocols so that Named Pipes was above TCP/IP, and now we don't
> have a problem. Strange. Anyone shed any light on why connecting to default
> instance SQL Servers via EM using TCP/IP connections stops us saving new jobs
> in SQL Agent? Many thanks.|||Just out of curiosity, what kind of security software?|||Symantec.
"Curt" wrote:
> Just out of curiosity, what kind of security software?
>

interesting error message

Hi,
I keep geeting this error message in one of my servers.
Replication: agent failed.: Unable to expand message 17055 [-1073724769]
18265 Log backed up: Database: mine, creation date(time):
2004/08/11(14:41:52), first LSN: 10:14151:1, last LSN: 10:14151:1, number of
dump devices: 1, device informa... ..
There is no replication configured on the server. The backup job is also
successful. What is this message about?
I don't know, but if it's really from August 11 of last year, are you sure
it's still relevant?
On 3/2/05 8:45 PM, in article
545CADDF-C5EF-48A2-967F-A4246D13D306@.microsoft.com, "Bharath"
<Bharath@.discussions.microsoft.com> wrote:

> Hi,
> I keep geeting this error message in one of my servers.
> Replication: agent failed.: Unable to expand message 17055 [-1073724769]
> 18265 Log backed up: Database: mine, creation date(time):
> 2004/08/11(14:41:52), first LSN: 10:14151:1, last LSN: 10:14151:1, number of
> dump devices: 1, device informa... ..
>
> There is no replication configured on the server. The backup job is also
> successful. What is this message about?
|||Aaron,
It is relevant - I think the message is of Aug 11 - because the db was
created on Aug 11th 2004 - but it is a current message only.
"Aaron [SQL Server MVP]" wrote:

> I don't know, but if it's really from August 11 of last year, are you sure
> it's still relevant?
>
> On 3/2/05 8:45 PM, in article
> 545CADDF-C5EF-48A2-967F-A4246D13D306@.microsoft.com, "Bharath"
> <Bharath@.discussions.microsoft.com> wrote:
>
>

interesting error message

Hi,
I keep geeting this error message in one of my servers.
Replication: agent failed.: Unable to expand message 17055 [-1073724769]
18265 Log backed up: Database: mine, creation date(time):
2004/08/11(14:41:52), first LSN: 10:14151:1, last LSN: 10:14151:1, number of
dump devices: 1, device informa... ..
There is no replication configured on the server. The backup job is also
successful. What is this message about?I don't know, but if it's really from August 11 of last year, are you sure
it's still relevant?
On 3/2/05 8:45 PM, in article
545CADDF-C5EF-48A2-967F-A4246D13D306@.microsoft.com, "Bharath"
<Bharath@.discussions.microsoft.com> wrote:
> Hi,
> I keep geeting this error message in one of my servers.
> Replication: agent failed.: Unable to expand message 17055 [-1073724769]
> 18265 Log backed up: Database: mine, creation date(time):
> 2004/08/11(14:41:52), first LSN: 10:14151:1, last LSN: 10:14151:1, number of
> dump devices: 1, device informa... ..
>
> There is no replication configured on the server. The backup job is also
> successful. What is this message about?|||Aaron,
It is relevant - I think the message is of Aug 11 - because the db was
created on Aug 11th 2004 - but it is a current message only.
"Aaron [SQL Server MVP]" wrote:
> I don't know, but if it's really from August 11 of last year, are you sure
> it's still relevant?
>
> On 3/2/05 8:45 PM, in article
> 545CADDF-C5EF-48A2-967F-A4246D13D306@.microsoft.com, "Bharath"
> <Bharath@.discussions.microsoft.com> wrote:
> > Hi,
> >
> > I keep geeting this error message in one of my servers.
> >
> > Replication: agent failed.: Unable to expand message 17055 [-1073724769]
> > 18265 Log backed up: Database: mine, creation date(time):
> > 2004/08/11(14:41:52), first LSN: 10:14151:1, last LSN: 10:14151:1, number of
> > dump devices: 1, device informa... ..
> >
> >
> > There is no replication configured on the server. The backup job is also
> > successful. What is this message about?
>

interesting error message

Hi,
I keep geeting this error message in one of my servers.
Replication: agent failed.: Unable to expand message 17055 [-1073724769]
18265 Log backed up: Database: mine, creation date(time):
2004/08/11(14:41:52), first LSN: 10:14151:1, last LSN: 10:14151:1, number of
dump devices: 1, device informa... ..
There is no replication configured on the server. The backup job is also
successful. What is this message about?I don't know, but if it's really from August 11 of last year, are you sure
it's still relevant?
On 3/2/05 8:45 PM, in article
545CADDF-C5EF-48A2-967F-A4246D13D306@.microsoft.com, "Bharath"
<Bharath@.discussions.microsoft.com> wrote:

> Hi,
> I keep geeting this error message in one of my servers.
> Replication: agent failed.: Unable to expand message 17055 [-107372476
9]
> 18265 Log backed up: Database: mine, creation date(time):
> 2004/08/11(14:41:52), first LSN: 10:14151:1, last LSN: 10:14151:1, number
of
> dump devices: 1, device informa... ..
>
> There is no replication configured on the server. The backup job is also
> successful. What is this message about?|||Aaron,
It is relevant - I think the message is of Aug 11 - because the db was
created on Aug 11th 2004 - but it is a current message only.
"Aaron [SQL Server MVP]" wrote:

> I don't know, but if it's really from August 11 of last year, are you sure
> it's still relevant?
>
> On 3/2/05 8:45 PM, in article
> 545CADDF-C5EF-48A2-967F-A4246D13D306@.microsoft.com, "Bharath"
> <Bharath@.discussions.microsoft.com> wrote:
>
>