Example of Custom SQL Job to log to Application Event Viewer for ConfigMgr

This issue:  https://mnscug.org/blogs/sherry-kissinger/477-configmgr-current-branch-topic-id-611-swd-state-messages-flood, where Console admins accidentally checking the box for "Use Server Groups", where all members of that collection then start sending state messages every 1 minute, failing to patch, and potentially affecting State Message processing ... that occurred again in our environment.  We got lax in interactively checking for whether or not a collection accidentally had that done.  Since we have System Center Operations Manager for monitoring, and SCOM can trigger on EventIDs, we made a custom SQL Job to run multiple times a day, and if any collections (not already exempted/expected to have that setting) were to get that setting, that job will drop an Error into the Application Event Log on the Server.  A custom SCOM rule, and we'll now get alerted if this happens again.

In case someone else might find this useful, here is what we did. 

1) TEST first.  Create a collection with 1 member in it, and set that checkbox (so that you can test).  In SSRS itself, as a new query run this...modifying both of the "c3.name not in " entries for any known exceptions your environment already has.  Use Server Groups might be a valid and expected setting for servers in a Server Cluster, for example.  This is only to help assist when that setting is accidentally checked.

if (select case when count(c1.UseCluster) > 0 then 0 else 1 end as [result]
from CEP_CollectionExtendedProperties c1
join collections_g c2 on c2.collectionid=c1.collectionid
join v_collection c3 on c3.collectionid=c2.siteid
where c1.UseCluster=1
and c3.name not in ('Known Collection Name that Should Have that option checked','Another Collection that should be OK'))=0
BEGIN
 DECLARE @CollectionIDAndName nvarchar(200) = (
  select top 1 c2.siteid + ', ' + c3.name as [ValueToLog] from CEP_CollectionExtendedProperties c1
   join collections_g c2 on c2.collectionid=c1.collectionid
   join v_collection c3 on c3.collectionid=c2.siteid
   where c1.UseCluster=1
   and c3.name not in ('Known Collection Name that Should Have that option checked','Another Collection that should be OK')
)
 DECLARE @VALUE nvarchar(MAX) = ('At least 1 collection has Use Server Group Enabled: ' + @CollectionIDAndName + ' To resolve, either edit the SQL Job on the server to include this collection as a known exception, or edit that collection and uncheck Use Server Group. A possible consequence of that setting is machines in that collection may be unable to patch, and may also cause a state message backlog due to submitting state messages every minute')
 RAISERROR (@Value, 16, 1) with LOG
END

When you run that against your cm_xxx database in SSRS, because you created a test collection with that option checked, you should get an Error message in the Application Event Log on the server holding the cm_xxx database.  To test again, edit the properties of the collection and uncheck the box.  Then re-run the query to confirm it does NOT create an Application Event Log error message.  Once you have confirmed that an Event Log Error entry is created and works as you expect it to work, you can continue.

2) In SQL Server Management Studio, Create a new job, give it any name; but make it one you might understand when you or a coworker looks at sql jobs and is trying to make sense of it a year from now.  The Owner is your known standard owner, if you have a standard.  If you do not have a specific standard, try using sa or NT Authority\System.  (You just don't want to use your own individual personal ID--that is poor practice) You may also want to input as much of a description as possible.  For Steps, there will be only 1 step.
Type:  Transact-SQL script (T-SQL)
Run as:  <blank>
Database:  Your CM_xxx database
into the Command: area, input your entire tested sql script you tested successfully.

Under Advanced--use whatever Actions are your standards.  If you don't have a specific standard, we have "on success, quit the job reporting success", on Failure "Quit the job reporting failure".  0 retries, no output file, no log to table, run as user is blank.

Save the job (with no schedules for now).

3) test your job by editing the properties on a test collection and setting the "use Server Group" enabled.  Then in SQL SSRS, Jobs, right-click your job and  Run the Job from Step... and confirm you do get a new EventLog entry.  Remove the setting, and test again--ensuring you do NOT get a new EventLogEntry

4) Once you are satisfied it works as expected, using your favorite monitoring tool, whatever that might be (for us it was SCOM), set it up to monitor for Application Event Log, SOURCE: MSSQLSERVER  EventID: 17063  (that's what we got... confirm that is what you get in Application Event Log--I assume that is universal... but I've been wrong before, many many times)

5) Confirm your monitoring tool correctly reports on any new events, by again enabling Use Server Group on a test collection, run the job, removing, run the job.

6) Once confirmed; edit the job and add a Schedule or Schedules.  How frequently you would like this to run, and potentially alert on the issue, is up to your own standards and discretion.

Done!  With this in place, we hope we will get monitoring alerts about this situation... instead of alerts about a state message backlog.

  • Created on .

Copyright © 2018 - The Minnesota System Center User Group