Welcome back. Two weeks ago I started blogging about StartReady, an interesting new Microsoft partner that specializes in building Microsoft-based appliances, including an OCS appliance. Over a couple of episodes I am sharing some of the technical things these guys are doing. In the first episode I gave a general overview of their architecture and the choices they made. The second episode was all about the unattended installation and configuration of OCS.
In this episode I have an interview with two specialists from StartReady, Erik and Arjan. Erik is responsible for the remote management architecture and Arjan focuses on the virtualization technology that is used on the appliances.
First of all, I have to say that your remote management architecture looks very impressive. It was probably a complex part of the overall design. Erik, can you walk me through the architecture and tell us something about the issues you encountered?
You are right: remote management is a key part of our appliances. Without it, the product would not meet the definition of an appliance. So, let's start at the customer's site. In the upper right corner of the picture, you see the installed appliance. This machine should be managed remotely by the Value Added Reseller (VAR). To make this possible, we automatically deploy and configure System Center Essentials (SCE) during an appliance deployment. One part of the SCE configuration consists of configuring a Management Group. All the other virtual machines installed on the appliance, e.g. the Edge role and the Mediation role, are added to this Management Group. Only the machines in this Management Group are managed. Now, the SCE server has two important functions. First, it functions as an update server (through WSUS) for all the machines in the Management Group. Second, SCE functions as a gateway to the StartReady System Center Operations Manager (OpsMgr) environment. This makes it possible to receive information (events) and to manage the appliance remotely.
So with the locally installed WSUS server you have full control of the updates that are pushed to the appliances?
Yes, that's correct. The WSUS server on the appliance points to the StartReady upstream WSUS server. So, when Microsoft releases an update or patch, we test it in our own datacenter before releasing it to the customer's appliance. SCE receives the updates and replicates them to the Management Group. This way we can guarantee the highest service level. By default we release our updates at night. The VAR can then determine when to restart the appliances, because the VAR knows when the service windows are and can schedule restarts accordingly. With this staged deployment, every customer can have its own specific SLA.
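As an illustration of the staged release Erik describes, here is a minimal Python sketch of the idea: an update only reaches the appliance once it has been tested and approved upstream. This is a toy model, not the WSUS API; the class, the function names and the update identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Update:
    kb: str                 # hypothetical identifier, for illustration only
    tested: bool = False    # has been through the provider's test run
    approved: bool = False  # released on the upstream server


def approve_after_test(update: Update, test_passed: bool) -> Update:
    """Record the test run and approve the update only if it passed."""
    update.tested = True
    update.approved = test_passed
    return update


def downstream_sync(upstream: list[Update]) -> list[Update]:
    """The appliance-side server only receives approved updates."""
    return [u for u in upstream if u.approved]


upstream = [
    approve_after_test(Update("KB-EXAMPLE-1"), test_passed=True),
    approve_after_test(Update("KB-EXAMPLE-2"), test_passed=False),
    Update("KB-EXAMPLE-3"),  # newly released, not tested yet
]
appliance_updates = downstream_sync(upstream)
print([u.kb for u in appliance_updates])
```

Only the update that passed testing shows up on the appliance side; when it is installed and when the machine restarts remain separate decisions, which is where the VAR's service windows come in.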
Let's focus on some more details regarding remote management. In the picture you can see several arrows. Port 5723 has to be opened for outbound communication on the customer's firewall. This connects the SCE server to the OpsMgr Gateway at StartReady. Normally, in an Active Directory based environment, SCE and the OpsMgr server would authenticate each other using Kerberos, but that is not possible in this scenario. The servers are not in the same domain, and therefore we need the OpsMgr Gateway. Authentication is done using certificates. These certificates are issued by our Certificate Authority (CA) server, hosted at StartReady.
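When troubleshooting a setup like this, a first step is usually to confirm that the outbound port is actually open. The following is a minimal Python sketch of such a check, run from the customer side. It only tests whether a TCP connection can be opened; it says nothing about the certificate-based authentication that happens afterwards. The function name is hypothetical and not part of any StartReady or Microsoft tooling.

```python
import socket

OPSMGR_PORT = 5723  # the outbound port named in the interview


def can_reach(host: str, port: int = OPSMGR_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

A call would look like `can_reach("gateway.example.com")`, where the host name is a placeholder for the real gateway address.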
So the SCE server communicates with OpsMgr through the OpsMgr Gateway server. The actual processing of the transferred data is done on the OpsMgr server. On this server we have all the management packs installed, such as those for SQL Server, OCS, etc. SCE on the appliance has no management packs installed, except for the system management packs. The health state of the appliance is determined on the OpsMgr server; the management packs are responsible for this.
In order to provide the VAR with remote access to our OpsMgr environment, we also host a terminal server and a terminal server gateway. Via HTTPS over port 443, the VAR opens an RDP session through our terminal server gateway to a terminal server. On this terminal server all the tools needed to manage the appliances that the VAR maintains are available. One of the tools is the Ops Manager Console. In the console the VAR can see only his own customers. The Ops Manager Console is the starting point for all management. For example, there is a general health status overview for all the appliances, so a VAR can see in a split second how his customers' environments are doing.
Can you tell us something about the Remote Web Workplace shown in your remote management architecture?
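To give an idea of what such a scoped health overview boils down to, here is a small Python sketch. It assumes a simple worst-state roll-up per customer and a hypothetical inventory; the real Ops Manager Console derives health from management packs and handles scoping through its own user roles.

```python
from enum import IntEnum


class Health(IntEnum):
    HEALTHY = 0
    WARNING = 1
    CRITICAL = 2


# Hypothetical inventory: (VAR, customer, virtual machine, state)
STATES = [
    ("var-a", "customer-1", "OCS", Health.HEALTHY),
    ("var-a", "customer-1", "Edge", Health.WARNING),
    ("var-a", "customer-2", "OCS", Health.HEALTHY),
    ("var-b", "customer-3", "Mediation", Health.CRITICAL),
]


def overview(var: str, states=STATES) -> dict[str, Health]:
    """Worst state per customer, limited to the customers of one VAR."""
    result: dict[str, Health] = {}
    for owner, customer, _vm, state in states:
        if owner != var:
            continue  # a VAR never sees another VAR's customers
        result[customer] = max(result.get(customer, Health.HEALTHY), state)
    return result


print({c: h.name for c, h in overview("var-a").items()})
# prints {'customer-1': 'WARNING', 'customer-2': 'HEALTHY'}
```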
The Remote Web Workplace is something Microsoft introduced in Small Business Server 2003. It is now more broadly used.
First the VAR logs on to a webpage through port 443. This is all done from a terminal server session in the StartReady environment. After logging on, the VAR can choose a server on the appliance and start an RDP session over port 4125. For example, he can now access the OCS server. Something to realize is that only the machines on the appliance are available to connect to. All other servers in the customer's environment are not. And that is a good thing.
I agree with you! So for the management of the appliance, the customer needs to open just three ports on his firewall?
Yes, and to reduce possible risk even more, the customer only grants access to the machine from one specific location (StartReady), and all communication is encrypted using certificates. Further, the OpsMgr server and the SCE server authenticate each other using certificates. It is not possible to open a remote desktop session to an appliance directly, as we do not publish the RDP protocol to the internet from the customer firewall. Finally, before the VAR can manage a server, he has to authenticate himself several times using different accounts and passwords.
We see that some customers are reluctant to open up their environment, but with this story we have, so far, convinced them all.
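The three ports and the single allowed location can be summarized as a tiny rule table. The Python sketch below is only a model of that policy, not a firewall configuration. The address range is a documentation placeholder standing in for StartReady's network, and the directions given for ports 443 and 4125 are an assumption, since the interview only states the direction for port 5723.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# Placeholder range reserved for documentation; the real addresses
# are not given in the interview.
PROVIDER_NET = ip_network("203.0.113.0/24")


@dataclass(frozen=True)
class Rule:
    direction: str  # "in" or "out", as seen from the customer firewall
    port: int
    purpose: str


RULES = (
    Rule("out", 5723, "SCE to OpsMgr Gateway"),
    Rule("in", 443, "Remote Web Workplace logon over HTTPS (assumed inbound)"),
    Rule("in", 4125, "RDP via Remote Web Workplace (assumed inbound)"),
)


def is_allowed(direction: str, port: int, remote: str) -> bool:
    """Allow traffic only on a listed port and only to/from the provider."""
    if ip_address(remote) not in PROVIDER_NET:
        return False
    return any(r.direction == direction and r.port == port for r in RULES)


print(is_allowed("out", 5723, "203.0.113.10"))  # True
print(is_allowed("in", 3389, "203.0.113.10"))   # False: RDP is not published
print(is_allowed("in", 443, "198.51.100.7"))    # False: not the provider
```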
Interesting. I too feel that quality of service and high SLAs require professional management. You are the ones who know the appliance best, so a customer has many advantages in letting you do it. More generally, I see this transformation from in-house management to outsourced management taking place. And furthermore, the way you do it is based on Microsoft's best practices.
Back to technology: what were the issues you encountered?
First of all, out of the box SCE is not suited to unattended installation, so we had difficulties configuring SCE automatically on the appliance. However, during development Service Pack 1 of SCE arrived. It solved a few issues we had with the installation and especially the configuration. Our latest issue was the connection of SCE to our OpsMgr production environment. In our test environment everything was working fine, but we couldn't get it to work in production. We saw packets being dropped on our ISA server for no apparent reason. To make a long, and I mean LOOONG, story short: it was a bug in Hyper-V RC0, and an upgrade to RC1 fixed it.
OK. Arjan, your turn. Can you tell me why you have chosen to use Hyper-V, and with it something that is still in development? Isn't that a big risk?
Well, first of all, StartReady was founded by two Microsoft guys. They knew Microsoft's virtualization roadmap and were keen on jumping in at a moment when others had not yet done so. They were convinced that the technology would give them what was necessary to deliver a good product. They believed it would bring them a competitive advantage, and it looks like they were right. We started out using Virtual Server 2005, mainly because Hyper-V hadn't been officially released by Microsoft yet. After about two months StartReady got the opportunity to join the Hyper-V Rapid Deployment Program (RDP). This program supports StartReady in different ways. First, it allows us to use Hyper-V in production with our customers because there is support from Microsoft; other partners cannot give that guarantee. Secondly, a Microsoft consultant supports us in the development process. This really gave us a head start and made the decision to migrate to Hyper-V much easier.
Did you encounter any specific issues with Hyper-V?
Of course: it is still in beta, and making software is hard. But apart from that, we use it differently than most partners do. This resulted in behavior that was sometimes hard to reproduce. One of the latest issues was already mentioned by Erik, but I can add a few more. In the beginning we ran into performance issues. After deploying an appliance, the performance of the virtual machines would drop very rapidly and unexpectedly. We found that changing the TCP/IP offload setting in the registry fixed this. By default this setting is disabled, but after enabling it we got great network performance in the virtual machines.
Another issue we had is that making a sysprep image of a Windows Server 2008 server with Hyper-V installed is not supported. At first, we worked around it by using a bcdedit command. With this command (bcdedit /set hypervisorlaunchtype auto) you force the hypervisor to launch automatically after the mini-setup of a sysprepped image. This method works, but it is not documented by Microsoft. So although it worked, we now deploy the host OS of the appliance with an unattended installation that includes Hyper-V. We made this design change for more flexibility: it reduces complexity as we produce more appliance versions. This change in architecture is the result of us working together with Microsoft. We now have what we call an 'imaging factory' for our appliances: a fully automatic, unattended installation of the host image of an appliance. It is an important part of our competitive edge.
Erik and Arjan, thanks for your time and your clear answers.
In the next episode I will have an interview with Menno, who is responsible for the web interface and the web services built on Windows Workflow Foundation.
For more information, check out their website at http://www.StartReady.com