Difference between revisions of "Historical Hera Config"

From CDOT Wiki
(Details: added high latency task)
m (Mozilla@Seneca Cluster Administration moved to Historical Hera Config: This Hera Administration wiki is being archived.)
 
(47 intermediate revisions by 2 users not shown)
Line 1: Line 1:
== Mozilla@Seneca Cluster Administration ==
 
 
 
== Description ==
 
== Description ==
  
Line 7: Line 5:
 
== Leader(s) ==
 
== Leader(s) ==
  
[[User:bhearsum|Ben Hearsum]] (bhearsum)
+
* [[User:bhearsum|Ben Hearsum]] (bhearsum)
 +
 
 +
== News ==
 +
 
 +
[[User:Bhearsum|bhearsum]] 13:16, 15 November 2006 (EST)
 +
:* I redid the Ubuntu server generic image as an edgy install. Along with this I made notes about how to make a generic install. I will be doing an Ubuntu Desktop install shortly, but I'm waiting on Justin's stuff re: VNC.
 +
 
 +
[[User:Bhearsum|bhearsum]] 12:40, 24 November 2006 (EST)
 +
:* The cluster was down for maintenance recently. Some network upgrades were done at that time. Nobody has felt any lag since this happened. The cluster lag is *tentatively* solved.
 +
 
 +
[[User:Bhearsum|bhearsum]] 12:16, 13 December 2006 (EST)
 +
:* I'm playing with Xen today. I just rebooted linux15 with a Xen kernel. I hope the machine comes back up!
 +
 
 +
[[User:Bhearsum|bhearsum]] 12:16, 14 December 2006 (EST)
 +
:* I broke linux15!
 +
:* Also, I looked into using Buildbot Locks for the Places installation, but the Builders are not smart enough to build on the idle machine, so that's a dead end.
 +
:* I'm still trying to figure out why warnOnFailure and warnOnWarnings do not work. This is important so that we can report "testfailed" to the Tinderbox instead of "busted".
 +
:* I added the [[Mozilla@Seneca Cluster Administration/Steps for Creating a Generic VM|Steps for Creating a Generic VM]] section
 +
 
 +
[[User:David.humphrey|dave]] 4:24, 21 January 2008 (EST)
 +
:* Planning for a reconfig of cluster to buildbot try.  Setup doc is [[Hera Try Server Setup|here]].
 +
 
 +
== Roadmap ==
 +
 
 +
=== <u>Short Term</u> ===
 +
 
 +
==== Real World IPs ====
 +
This is a high priority item that we do not have direct control over. The people in charge are working to make this happen and hopefully it will happen soon.
 +
 
 +
When we do get them a few things must happen:
 +
* Firewalls must be put into place on '''all''' physical machines. It will be extremely important to lock them down tightly.
 +
* Port forwards from physical machines to VMs must be done.
 +
* The [[Mozilla@Seneca Cluster Administration/Forwarded Ports|Forwarded Ports]] must be updated with the new port numbers.
 +
* We need to find out if ACS needs to be informed when new port forwards are done.
 +
 
 +
==== Windows and Linux Generic VMs ====
 +
As cluster usage grows, it is important to have quick and easy ways to bring up new machines. Generic Ubuntu images have been made already, but their quality is currently unknown. Some testing should be done to make sure they are stable and complete. I would like to have this done before the start of the Winter 2007 semester, as there will probably be an influx of requests for VMs from new DPS909 students.
 +
 
 +
When the images are created the steps for creating a VM and the steps for bringing up a new VM should be documented. This is listed in the table in the [[#Details|project details]] section.
 +
 
 +
==== VNC access to Linux machines ====
 +
VNC normally runs on a single port running a single X session. This makes it very tough to have multiple users connected at the same time -- or even multiple users using different accounts. This is an issue that we want solved. When someone connects on VNC it should be the same as sitting down at the desktop. GDM should allow a login to that user's desktop. When a second user connects they should see the same thing. This may have to be done through XDMCP, but it '''must''' be compatible with Windows and Linux VNC clients.
  
== Contributor(s) ==
+
==== Look into Networking Management Systems ====
 +
As the cluster is used more and more it will become more difficult to manage by hand. I'd like a network management system such as Nagios to be used to help speed up maintenance.
  
None as of yet.
+
 
 +
=== <u>Medium Term</u> ===
 +
 
 +
==== Implement Network Management System ====
 +
Implement one of the systems described above.
 +
 
 +
 
 +
=== <u>Ongoing</u> ===
 +
 
 +
==== Documentation ====
 +
Documentation is extremely important in this project. Because many people have administrative access to the cluster machines soft copy documentation is necessary. Anyone who needs to know something about the cluster '''should''' be able to find it within the documentation.
 +
 
 +
So far, there are three pages that must be kept up to date: [[Mozilla@Seneca Cluster Administration/Forwarded Ports|Forwarded Ports]], [[Mozilla@Seneca Cluster Administration/List of Cluster Users|List of Cluster Users]], and [[Mozilla@Seneca Cluster Administration/Cluster Machine Tasks|Cluster Machine Tasks]].
 +
 
 +
* Any time that a new port is forwarded the [[Mozilla@Seneca Cluster Administration/Forwarded Ports|Forwarded Ports]] must be updated. This page is mainly used as a reference for users in case they forget what port to connect on for ssh, http, etc.
 +
* Any time a new user is given access to a cluster machine or a VM running on a cluster machine the [[Mozilla@Seneca Cluster Administration/List of Cluster Users|List of Cluster Users]] must be updated. This is just a quick reference page.
 +
* [[Mozilla@Seneca Cluster Administration/Cluster Machine Tasks|Cluster Machine Tasks]] is the most important of all three. It tracks exactly who has access to what machine or VM, what servers are running on them, and in the case of VMs their IP address.
 +
 
 +
This is something that '''everyone''' needs to keep in mind when working on the cluster, not just the administrators.
 +
 
 +
==== High Latency on the Cluster ====
 +
The high latency problem on the cluster seems to be solved, but we need to keep an eye on it. If these issues crop up again, they should be recognized and reported to ACS very quickly.
  
 
== Details ==
 
== Details ==
Line 19: Line 80:
 
<th style="width: 10%">Priority</th>
 
<th style="width: 10%">Priority</th>
 
<th style="width: 30%">Status</th>
 
<th style="width: 30%">Status</th>
<tr>
 
<td>Test a [[Extending the Buildbot|Buildbot]] TryScheduler with MailNotifier.</td>
 
<td style="background-color: yellow; text-align: center; font-weight: bold; font-size: bigger">Low</td>
 
<td>Getting errors from CVS. Need to e-mail the mailing list again.</td>
 
</tr>
 
 
<tr>
 
<tr>
 
<td>Document all [[List of Cluster Machines|cluster machine tasks]].</td>
 
<td>Document all [[List of Cluster Machines|cluster machine tasks]].</td>
 
<td style="background-color: red; text-align: center; font-weight: bold; font-size: bigger">High</td>
 
<td style="background-color: red; text-align: center; font-weight: bold; font-size: bigger">High</td>
 
<td>Ongoing</td>
 
<td>Ongoing</td>
</tr>
 
<tr>
 
<td><strike>Setup Win2k3 VM for Liz</strike></td>
 
<td style="background-color: green; text-align: center; font-weight: bold; font-size: bigger">Done</td>
 
<td>'''Complete'''</td>
 
 
</tr>
 
</tr>
 
<tr>
 
<tr>
Line 44: Line 95:
 
</tr>
 
</tr>
 
<tr>
 
<tr>
<td><strike>Setup Linux VM for [[Airbag development and server operation|Airbag]] project.</strike></td>
+
<td>Real world IPs for all machines</td>
<td style="background-color: green; text-align: center; font-weight: bold; font-size: bigger">Done</td>
+
<td style="background-color: red; text-align: center; font-weight: bold; font-size: bigger">High</td>
<td>'''Complete'''</td>
+
<td>Waiting on ITT</td>
 
</tr>
 
</tr>
 
<tr>
 
<tr>
<td><strike>Setup NFS server on linux9</strike></td>
+
<td>Investigate high latency on the cluster</td>
<td style="background-color: green; text-align: center; font-weight: bold; font-size: bigger">Done</td>
+
<td style="background-color: yellow; text-align: center; font-weight: bold; font-size: bigger">Low</td>
<td>'''Complete'''</td>
+
<td>ACS did network upgrades on the cluster. This is tentatively solved.</td>
 
</tr>
 
</tr>
 
<tr>
 
<tr>
<td><strike>Setup SMB server on linux9</strike></td>
+
<td>Create clean images of commonly used machines. Ubuntu, Windows 2003, others?</td>
<td style="background-color: green; text-align: center; font-weight: bold; font-size: bigger">Done</td>
+
<td style="background-color: #ff671e; text-align: center; font-weight: bold; font-size: bigger">Medium</td>
<td>'''Complete'''</td>
+
<td>Ubuntu images done. Not sure how to do Windows images yet because of product key issues.</td>
 
</tr>
 
</tr>
 
<tr>
 
<tr>
<td>Real world IPs for all machines</td>
+
<td>Document the steps involved in making a generic VM. Pre-installed software should be included in this.</td>
<td style="background-color: red; text-align: center; font-weight: bold; font-size: bigger">High</td>
+
<td style="background-color: green; text-align: center; font-weight: bold; font-size: bigger">Done</td>
<td>Waiting on ITT</td>
+
<td></td>
 
</tr>
 
</tr>
 
<tr>
 
<tr>
<td><strike>Setup VM for CVS/bonsai mirror</strike></td>
+
<td>Document the steps involved in bringing up a new VM. This should cover configuration changes as well as who needs to be notified. Is there a list of things that ACS needs yet?</td>
<td style="background-color: green; text-align: center; font-weight: bold; font-size: bigger">High</td>
+
<td style="background-color: #ff671e; text-align: center; font-weight: bold; font-size: bigger">Medium</td>
<td>'''Complete'''</td>
+
<td></td>
</tr>
 
<tr>
 
<td>Investigate high latency on the cluster</td>
 
<td style="background-color: red; text-align: center; font-weight: bold; font-size: bigger">High</td>
 
<td>Still not sure why this is happening.</td>
 
 
</tr>
 
</tr>
 
<tr>
 
<tr>
<td>Create clean images of commonly used machines. Ubuntu, Windows 2003, others?</td>
+
<td>Find a better way to give VNC access to VMs. Ideally I want GDM running on a vnc port.</td>
 
<td style="background-color: #ff671e; text-align: center; font-weight: bold; font-size: bigger">Medium</td>
 
<td style="background-color: #ff671e; text-align: center; font-weight: bold; font-size: bigger">Medium</td>
<td>Ubuntu images done. Not sure how to do Windows images yet because of product key issues.</td>
+
<td>Justin is currently looking into this.</td>
 
</tr>
 
</tr>
 
</table>
 
</table>
  
== News ==
+
== Documentation ==
 
 
Sept. 25, 2006
 
:* TryScheduler is setup, need to test it.
 
 
 
Sept. 26, 2006
 
:* Big Errors when attempting to use 'buildbot try'. Posted to the mailing list about it.
 
 
 
Sept. 28, 2006
 
* Win2 was imaged and win4 and win5 brought online.
 
:* linux12 - linux15 were brought online.
 
 
 
Sept. 29, 2006
 
:* Buildbot maintainer replied to my mailing list post. What he says looks promising, I will be testing it out this weekend.
 
 
 
Sept. 30, 2006
 
:* Fixed the first error with 'buildbot try'. Getting a new problem related to 'cvs diff', need to post to the mailing list again.
 
 
 
22:43, 1 October 2006 (EDT)
 
:* New problem can be fixed by adding '-N' flag. Need to post to mailing list and see if this can be changed in trunk.
 
 
 
13:23, 5 October 2006 (EDT)
 
:* Port forwards to the VMs were enabled. I spent a few hours figuring out why I couldn't connect to the Windows 2003 server with RDP, turns out the encryption level was too high.
 
:* [[User:shaver|shaver]] would like to get a list of things we need to do before bringing new VMs online. I think this is a good idea. It would be unfortunate if projects were held up because of politics.
 
 
 
15:00, 6 October 2006 (EDT)
 
:* Ubuntu Desktop and Ubuntu Server generic images are now available in /usr2/Generic VMs. Windows 2003 to come.
 
  
== Related Links ==
+
* [[Mozilla@Seneca Cluster Administration/Cluster Machine Tasks|Cluster Machine Tasks]]
 +
* [[Mozilla@Seneca Cluster Administration/Forwarded Ports|Forwarded Ports]]
 +
* [[Mozilla@Seneca Cluster Administration/List of Cluster Users|List of Cluster Users]]
 +
* [[Mozilla@Seneca Cluster Administration/Steps for Creating a Generic VM|Steps for Creating a Generic VM]]
  
[[List of Cluster Machines]]
+
== Misc. ==
 +
* [[Mozilla@Seneca Cluster Administration/Archive|Archive]]

Latest revision as of 15:58, 1 February 2008

Description

This project page is for all system and network administration tasks on the Mozilla cluster at Seneca.

Leader(s)

  • Ben Hearsum (bhearsum)

News

bhearsum 13:16, 15 November 2006 (EST)

  • I redid the Ubuntu server generic image as an edgy install. Along with this I made notes about how to make a generic install. I will be doing an Ubuntu Desktop install shortly, but I'm waiting on Justin's stuff re: VNC.

bhearsum 12:40, 24 November 2006 (EST)

  • The cluster was down for maintenance recently. Some network upgrades were done at that time. Nobody has felt any lag since this happened. The cluster lag is *tentatively* solved.

bhearsum 12:16, 13 December 2006 (EST)

  • I'm playing with Xen today. I just rebooted linux15 with a Xen kernel. I hope the machine comes back up!

bhearsum 12:16, 14 December 2006 (EST)

  • I broke linux15!
  • Also, I looked into using Buildbot Locks for the Places installation, but the Builders are not smart enough to build on the idle machine, so that's a dead end.
  • I'm still trying to figure out why warnOnFailure and warnOnWarnings do not work. This is important so that we can report "testfailed" to the Tinderbox instead of "busted".
  • I added the Steps for Creating a Generic VM section.

dave 4:24, 21 January 2008 (EST)

  • Planning to reconfigure the cluster for buildbot try. The setup doc is on the Hera Try Server Setup page.

Roadmap

Short Term

Real World IPs

This is a high priority item that we do not have direct control over. The people in charge are working to make this happen and hopefully it will happen soon.

When we do get real world IPs, a few things must happen:

  • Firewalls must be put into place on all physical machines. It will be extremely important to lock them down tightly.
  • Port forwards from physical machines to VMs must be done.
  • The Forwarded Ports page must be updated with the new port numbers.
  • We need to find out if ACS needs to be informed when new port forwards are done.
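
The port-forwarding step above could be sketched as follows. This is only an illustration under stated assumptions: the page does not say which firewall tool the cluster uses, so iptables DNAT rules are assumed here, and the PortForward class, the helper names, and every address and port number are hypothetical placeholders rather than real cluster values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortForward:
    """One forward from a port on a physical host to a service on a VM."""
    host_port: int          # port opened on the physical machine
    vm_ip: str              # private address of the VM (placeholder values below)
    vm_port: int            # port the service listens on inside the VM
    protocol: str = "tcp"


def dnat_rule(fwd: PortForward) -> str:
    """Build an iptables DNAT command line for a single forward (assumed tool)."""
    return (
        f"iptables -t nat -A PREROUTING -p {fwd.protocol} "
        f"--dport {fwd.host_port} -j DNAT "
        f"--to-destination {fwd.vm_ip}:{fwd.vm_port}"
    )


def find_port_conflicts(forwards):
    """Return (protocol, host_port) pairs claimed by more than one forward."""
    seen, conflicts = set(), set()
    for fwd in forwards:
        key = (fwd.protocol, fwd.host_port)
        if key in seen:
            conflicts.add(key)
        seen.add(key)
    return sorted(conflicts)


# Hypothetical example data: not taken from the Forwarded Ports page.
FORWARDS = [
    PortForward(host_port=2222, vm_ip="192.168.0.10", vm_port=22),
    PortForward(host_port=8080, vm_ip="192.168.0.10", vm_port=80),
]

if __name__ == "__main__":
    for fwd in FORWARDS:
        print(dnat_rule(fwd))
```

Keeping the forwards in one list like this would also make it easy to regenerate the Forwarded Ports page from the same data and to catch two VMs being given the same host port.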

Windows and Linux Generic VMs

As cluster usage grows, it is important to have quick and easy ways to bring up new machines. Generic Ubuntu images have been made already, but their quality is currently unknown. Some testing should be done to make sure they are stable and complete. I would like to have this done before the start of the Winter 2007 semester, as there will probably be an influx of requests for VMs from new DPS909 students.

Once the images are created, the steps for creating a VM and the steps for bringing up a new VM should be documented. Both tasks are listed in the table in the Details section.

VNC access to Linux machines

VNC normally runs on a single port running a single X session. This makes it very tough to have multiple users connected at the same time -- or even multiple users using different accounts. This is an issue that we want solved. When someone connects on VNC it should be the same as sitting down at the desktop. GDM should allow a login to that user's desktop. When a second user connects they should see the same thing. This may have to be done through XDMCP, but it must be compatible with Windows and Linux VNC clients.

Look into Networking Management Systems

As the cluster is used more and more it will become more difficult to manage by hand. I'd like a network management system such as Nagios to be used to help speed up maintenance.


Medium Term

Implement Network Management System

Implement one of the systems described above.


Ongoing

Documentation

Documentation is extremely important in this project. Because many people have administrative access to the cluster machines, soft copy documentation is necessary. Anyone who needs to know something about the cluster should be able to find it within the documentation.

So far, there are three pages that must be kept up to date: Forwarded Ports, List of Cluster Users, and Cluster Machine Tasks.

  • Any time a new port is forwarded, the Forwarded Ports page must be updated. This page is mainly used as a reference for users in case they forget what port to connect on for ssh, http, etc.
  • Any time a new user is given access to a cluster machine or a VM running on a cluster machine, the List of Cluster Users page must be updated. This is just a quick reference page.
  • Cluster Machine Tasks is the most important of the three. It tracks exactly who has access to what machine or VM, what servers are running on them, and in the case of VMs their IP addresses.

This is something that everyone needs to keep in mind when working on the cluster, not just the administrators.
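
As a rough illustration of what a Cluster Machine Tasks entry has to hold, the fields described above can be modelled as a small record. This is a sketch, not the format the wiki page actually uses: MachineRecord, missing_details, and the example names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MachineRecord:
    """One entry: a physical machine or a VM (hypothetical layout)."""
    name: str
    users: List[str] = field(default_factory=list)      # who has access
    services: List[str] = field(default_factory=list)   # servers running on it
    is_vm: bool = False
    ip_address: Optional[str] = None                    # tracked for VMs only


def missing_details(record: MachineRecord) -> List[str]:
    """List the details that still need to be documented for a record."""
    problems = []
    if not record.users:
        problems.append("no users listed")
    if not record.services:
        problems.append("no services listed")
    if record.is_vm and not record.ip_address:
        problems.append("VM has no IP address")
    return problems


if __name__ == "__main__":
    # Placeholder entry, not a real cluster machine.
    vm = MachineRecord(name="example-vm", is_vm=True)
    print(missing_details(vm))
```

A check like missing_details could be run whenever a machine or VM is brought up, so that gaps in the documentation are noticed right away.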

High Latency on the Cluster

The high latency problem on the cluster seems to be solved, but we need to keep an eye on it. If these issues crop up again, they should be recognized and reported to ACS very quickly.
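
One way to keep an eye on latency is to collect round-trip times periodically and flag the cluster when they stay high. The sketch below assumes the samples are already gathered (for example from ping) and are given in milliseconds. The 100 ms threshold, the minimum sample count, and the function name are assumptions chosen for illustration, not values from this page.

```python
from statistics import median


def latency_needs_report(samples_ms, threshold_ms=100.0, min_samples=5):
    """Return True when the median round-trip time exceeds the threshold.

    The median is used so that a single slow reply does not trigger a
    report. Too few samples never trigger one.
    """
    if len(samples_ms) < min_samples:
        return False
    return median(samples_ms) > threshold_ms


if __name__ == "__main__":
    # Placeholder measurements in milliseconds.
    print(latency_needs_report([10, 12, 11, 9, 13]))
    print(latency_needs_report([250, 300, 10, 400, 500]))
```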

Details

Task | Priority | Status
Document all cluster machine tasks. | High | Ongoing
Create new windows images. | Low | Waiting for enough things to add to make it worthwhile. So far the following are needed: autoconf, ssh, wget
Real world IPs for all machines | High | Waiting on ITT
Investigate high latency on the cluster | Low | ACS did network upgrades on the cluster. This is tentatively solved.
Create clean images of commonly used machines. Ubuntu, Windows 2003, others? | Medium | Ubuntu images done. Not sure how to do Windows images yet because of product key issues.
Document the steps involved in making a generic VM. Pre-installed software should be included in this. | Done |
Document the steps involved in bringing up a new VM. This should cover configuration changes as well as who needs to be notified. Is there a list of things that ACS needs yet? | Medium |
Find a better way to give VNC access to VMs. Ideally I want GDM running on a vnc port. | Medium | Justin is currently looking into this.

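Documentation

  • Cluster Machine Tasks
  • Forwarded Ports
  • List of Cluster Users
  • Steps for Creating a Generic VM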

Misc.

  • Archive