Promoting MS to RMS : USE WITH CAUTION!!
If you haven’t had the pleasure of testing the promotion of a management server (MS) to root management server (RMS), and then back again after an RMS failure, then this is a must-read.
After implementing Microsoft’s System Center Operations Manager 2007 (SCOM) in our environment, I wanted to test the disaster recovery process. Our design uses one RMS and one MS. The test was meant to be simple, and of course I didn’t think to create a separate test environment for it. *what an idiot*
To prove that the steps provided by Microsoft worked, I unplugged the RMS and promoted the MS. Hey, this worked like a charm. It was the next step that really got ugly: bringing the RMS back on-line. Because that is the goal in the end, right? To restore the original server. Somewhere I missed a step, and before I knew it I was rebuilding the entire environment. I was fairly sure that I hadn’t really missed anything, but I screw up so often, who really knows for sure? I then created the test environment that I had skipped the first time and tried it again. Hey, what do you know, I had the same results. For the third test I opened a Microsoft support call and had them help me do it right, and again got the same results. The Microsoft support tech confessed to me that in their training they only test the promotion. They never actually try to go back to the original state.
After a couple of days he forwarded me an internal email from Microsoft that talks about the process and how to avoid this issue.
Read on to see the risk assessment and the detailed steps for demoting and promoting.
Risk assessment of promotion
The Root management server (RMS) is the primary SCOM server that is responsible for sending alerts out to a pager. It is also the server that needs to be running in order to access the alerts window within SCOM.
If the RMS fails, the Management Server (MS) will continue to accept alerts from the agents (the client servers that are being monitored). However, the MS will only collect alerts. It will not provide access to the Console to view them, nor will it send them to a pager or email. The result is that all alerting stops until there is an RMS in the environment again.
Options
In our environment, we have three options to handle the RMS to MS fail-over.
- Don’t promote the MS server. Fix the RMS and bring it back on-line.
During the outage of the RMS server there will be no indication of other service failures in the environment. We would be relying on the business to notify us through the Service Desk.
- Promote the MS to RMS
Benefit: This will provide Alerting and console access.
I have documented and tested the promotion and demotion of RMS servers in our environment. If the steps are followed exactly then there should not be an issue.
However, if the procedure is not followed exactly it would mean a complete loss of the SCOM environment. The database may need to be rebuilt and the agents would need to be redeployed to all servers. The SCOM servers would need to be rebuilt from scratch.
The following note is from an internal Microsoft memo that was provided to us during a Microsoft support call. It is not included in the official Microsoft documentation. There is a risk that an analyst who responds to this type of incident might consult the official documentation and not complete the fail-over properly.
NOTE: You do NOT want the previous RMS to still be an RMS when it comes back up or becomes reachable again (network back up), as both RMS’s will compete for the DB’s attention, which is not a healthy state. If you do get in this state (old RMS not demoted, back in contact with DB), the correct fix is to demote that old/previous RMS first, as indicated above.
Do not try to demote the new RMS, as you will end up with no RMS indicated in the DB, and would likely have no way to recover the Management Group. If you want to set the previous RMS back to the current RMS, demote it first, then re-promote it, which will then automatically demote the existing RMS.
The promotion should be used only as the last resort and only in the case where there is no other way to restore the original RMS.
- Upgrade the SCOM environment to a Cluster. (The best option but the most costly)
A clustered SCOM environment would ensure the stability of the SCOM environment and eliminate its single point of failure. It would also eliminate the need for the shaky promotion procedure described in option 2.
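The rule in the Microsoft note under option 2 can be sketched as a small decision function. This is only an illustration in Python, not anything shipped with SCOM: the RmsState fields and the next_action name are made up for the example, and the only real commands referenced are the UpdateDemotedRMS and PromoteRMS options of ManagementServerConfigTool.exe used later in this article.

```python
from dataclasses import dataclass


@dataclass
class RmsState:
    """Hypothetical snapshot of the environment after an MS has been promoted."""
    old_rms_reachable: bool  # original RMS is back in contact with the database
    old_rms_demoted: bool    # UpdateDemotedRMS has already been run on the original RMS
    want_failback: bool      # we want the original server to be the RMS again


def next_action(state: RmsState) -> str:
    """Return the next safe step; no branch ever demotes the newly promoted RMS."""
    if not state.old_rms_reachable:
        return "wait: keep the promoted server as RMS until the original is reachable"
    if not state.old_rms_demoted:
        # Two servers that both think they are the RMS compete for the database.
        return "on the ORIGINAL RMS: ManagementServerConfigTool.exe UpdateDemotedRMS"
    if state.want_failback:
        # Re-promoting the original automatically demotes the temporary RMS.
        return "on the ORIGINAL RMS: ManagementServerConfigTool.exe PromoteRMS"
    return "done: leave the promoted server as RMS"


print(next_action(RmsState(old_rms_reachable=True, old_rms_demoted=False, want_failback=True)))
```

The point of the sketch is the ordering: the original RMS is always demoted first and only then re-promoted, which is the sequence the memo describes for getting back to the original server.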
How to demote the Root Management Server so that you can promote it again later.
The following 3 steps need to happen to promote the Management server.
1. Back up the key on the current RMS to a file share using the SecureStorageBackup tool
2. Import the key from the file share to the MS that is to be promoted to RMS
3. Run the management server config tool on the new RMS to promote it
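As a rough sketch, the three steps map to the following commands. The Python below only prints the plan and does not run anything; the function name and the \\fileserver share are hypothetical placeholders, while the tool names and their Backup, Restore and PromoteRMS arguments are the ones used in the detailed steps in this article.

```python
from typing import List, Tuple


def promotion_plan(key_path: str) -> List[Tuple[str, str]]:
    """Return (server, command) pairs in the order the three steps are run."""
    return [
        ("current RMS", f'SecureStorageBackup.exe Backup "{key_path}"'),
        ("MS to promote", f'SecureStorageBackup.exe Restore "{key_path}"'),
        ("MS to promote", "ManagementServerConfigTool.exe PromoteRMS"),
    ]


# Hypothetical share name; substitute the network path that holds your key backup.
for server, command in promotion_plan(r"\\fileserver\SCOM_Key_backup\SCOMKEY.bin"):
    print(f"[{server}] {command}")
```

Keeping the server name next to each command is deliberate: the key is backed up on one machine and restored on another, and mixing the two up is an easy mistake during an outage.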
1. Backup the key on the current RMS to a file share
To run the SecureStorageBackup.exe Tool
On the current Root Management Server
a. Copy SecureStorageBackup.exe and ManagementServerConfigTool.exe from
\Ops_SP1Rc1\SupportTools.
To
C:\Program Files\System Center Operations Manager 2007
b. Open a command line and navigate to
C:\Program Files\System Center Operations Manager 2007
c. Run the following command:
SecureStorageBackup Backup "Network Path\SCOM_Key_backup\SCOMKEY.bin"
This key must be available to the machine you want to promote.
d. Provide a password to secure the key and retype the password. The password must be at least eight characters long. Once the key is exported, you will see a success message.
2. Import Key to the current MS
On the Management Server you want to promote to become the Root Management Server
a. Copy SecureStorageBackup.exe and ManagementServerConfigTool.exe from
\Ops_SP1Rc1\SupportTools.
To
C:\Program Files\System Center Operations Manager 2007
b. Open a command line and navigate to
C:\Program Files\System Center Operations Manager 2007
c. Run the following command:
SecureStorageBackup Restore "Network Path\SCOM_Key_backup\SCOMKEY.bin"
d. Provide the password used to secure the key and retype it. Once the key is imported, you will see a success message.
3. On the Management Server you want to promote to become the Root Management Server
a. Copy SecureStorageBackup.exe and ManagementServerConfigTool.exe from
\Ops_SP1Rc1\SupportTools.
To
C:\Program Files\System Center Operations Manager 2007
b. Open a command line and navigate to
C:\Program Files\System Center Operations Manager 2007
c. Run the following command:
ManagementServerConfigTool.exe PromoteRMS
If the current RMS is on-line and reachable when you run this command, you are done: the command automatically updates it and demotes it to an MS. If it is not reachable, it is critical to complete ALL of the following steps.
d. Go to the promoted Root Management Server, open the services and stop the Ops HealthService.
e. Go to the installation directory (default: %ProgramFiles%\System Center Operations Manager 2007) and delete the Health Service State folder.
f. Start the Health Service again (the same service you stopped in step d).
g. When you open the Operations Console the first time, specify the name of the new Root Management Server to connect to.
4. Demote the Original RMS server when it is back on-line. This needs to be done before it can be promoted back to RMS.
a. Copy SecureStorageBackup.exe and ManagementServerConfigTool.exe from
\Ops_SP1Rc1\SupportTools.
To
C:\Program Files\System Center Operations Manager 2007
b. Open a command line and navigate to
C:\Program Files\System Center Operations Manager 2007
c. Run the following command:
ManagementServerConfigTool.exe UpdateDemotedRMS
Once the original RMS is back on-line and has been demoted, you will need to promote it again to restore it as the RMS.
5. On the Management Server you want to promote to become the Root Management Server
a. Copy SecureStorageBackup.exe and ManagementServerConfigTool.exe from
\Ops_SP1Rc1\SupportTools.
To
C:\Program Files\System Center Operations Manager 2007
b. Open a command line and navigate to
C:\Program Files\System Center Operations Manager 2007
c. Run the following command:
ManagementServerConfigTool.exe PromoteRMS
Final Steps
- Go to the promoted Root Management Server, open the services and stop the Ops HealthService.
- Go to the installation directory (default: %ProgramFiles%\System Center Operations Manager 2007) and delete the Health Service State folder.
- Start the Health Service again.
- When you open the Operations Console the first time, specify the name of the new Root Management Server to connect to.
If the current RMS is on-line and reachable when you run PromoteRMS, you are done: the command automatically updates it and demotes it to an MS.
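The Health Service reset in the final steps could be scripted along these lines. This is a hedged sketch in Python that only builds and prints the command strings: it assumes the Windows service short name is HealthService (check the Services console on your server before relying on it) and uses the default installation directory quoted above. The function name is made up for the example.

```python
import ntpath  # Windows path rules, regardless of where this script runs

DEFAULT_INSTALL_DIR = r"C:\Program Files\System Center Operations Manager 2007"


def health_service_reset_commands(install_dir=DEFAULT_INSTALL_DIR,
                                  service="HealthService"):  # assumed service name
    """Build the stop, delete-state-folder and start commands, in that order."""
    state_dir = ntpath.join(install_dir, "Health Service State")
    return [
        f"net stop {service}",
        f'rmdir /s /q "{state_dir}"',  # removes the cached Health Service State folder
        f"net start {service}",
    ]


for command in health_service_reset_commands():
    print(command)
```

Printing the commands rather than executing them is intentional, so they can be reviewed before anything is deleted on the newly promoted RMS.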
Comments
6 Responses to “Promoting MS to RMS : USE WITH CAUTION!!”
I love it when the MS tech guys haven’t actually tried this stuff themselves…
Do you know if they have sorted this out in SP2, or have you tried it as yet?
Cheers
If the RMS is clustered, but both nodes go down (network failure, rack destruction etc.) and you promote an MS to be the temporary RMS, when the cluster is brought back online, do BOTH nodes need to be demoted/promoted to RMS, or just the virtual RMS? The same question applies if the cluster needs to be rebuilt. In that case there would still be a clustered MS to demote/promote.
Thanks.
Hi Chris, Promoting is a simple task, but make sure you follow the steps. If done wrong it can create more issues for you when you are ready to bring the original RMS online. If you need to promote your MS due to a full cluster loss and you plan to return to the cluster after it has been restored, then you will need to promote the virtual RMS only. Do not promote the nodes. If you were to do this you could introduce multiple RMSs into AD and would then be stuck with the ugly job of cleaning up SPNs.
If you need to rebuild the cluster, you would promote the MS first. Make sure you perform a full promotion and removal of the old RMS when you do this. Rebuild your cluster and then promote it again. Send me a note if you have any questions through this.
Brad
Hi Brad,
I made the unenviable mistake of uninstalling Scom on the passive node of our RMS cluster. I tried reinstalling but the rms key must have been stale. Then I couldn’t connect to the Scom console when either clustered node was the RMS cluster.
So, I ran the msiexec.exe /i CREATE_NEWKEY=1
I could then connect to the Scom console and see alerts in the console but emailing and paging seems to be broken.
One of the clustered management servers is grey under “management server” and listed as an RMS. Clearly this is a bad sign.
So, with that said, do we need to promote a management server to the RMS, then uninstall Scom on both RMS nodes, reinstall and create the cluster resources?
And promote the cluster virtual name. Your last note says: “and removal of the old RMS when you do this”. What does that mean?
If you have exact instructions, that would be great, since I’m not feeling lucky lately.
Thanks,
Paul
Hey Paul,
Sorry about the delay, just getting back from vacation. Have you found a solution to this yet?
Is the reporting server and DW in the same cluster?
“and removal of the old RMS when you do this”: a bit of a typo with poor grammar. When you promote the MS to RMS, you also need to confirm that the old RMS is completely removed from SCOM before you can promote it again.
Hopefully you are already healthy, let me know if you want to discuss this more.
Brad