System Administrators can be the worst kind of users on your system
The ideal computer system is a system that doesn't have any users on it, or isn't related to any user action. Why? Well, as long as users can't access the computer system or users can't create a load on a system, the system will run smoothly. But this isn't reality. Without users, there wouldn't be any computer systems. And without users, there wouldn't be system administrators. Although users can be idiots and amaze you of all the stupid things they do, it's the collegue system administrators you'll have to really watch out for, because they have root authority and can mess up this quite badly.
The people that install a system, should be responsible for the system
If you install a system, and know you'll have to manage it yourself once it's in production, you'll make sure the system in configured correctly. I've seen it many times: the people responsible for installing the system, aren't responsible for the maintenance of the system. This creates a "throw-it-over-the-fence" effect: people installing a system really don't have a clue what kind of administration nightmares they've created and all the problems administrators run into during the most horrible hours of the day (usually Murphy preferres Sunday night, or when you're about to go to sleep). Make absolutely sure that once a system is installed, the same people have to manage it, at least during the first two months of production.
Poorly designed systems are difficult to administer
Take your time designing a system and learn about a specific application before implementing it. Rapid designs under high time-pressure usually end up causing lots of problems during production (and they give your administrator(s) a long-lasting headache). Make sure the documentation of a system is made, before going into production. And, as an administrator, perform a health check for accepting to manage the newly installed system. Also, keep the previous best-practice tip in mind: if you haven't installed the system, you have no clue what you're getting into. And finally: during the implementation phase of a project, as a system administrator, you should be involved in the project, to be able to help design, install, configure and document the system.
Don't combine different information systems on one server
Sharing one operating system image by different information systems usually leads to problems, as these different information systems conflict in tuning parameters, backup windows, downtime slots, user information, peak usage, etcetera. Systems that do different tasks should be separated from each other to avoid dependencies. A grouping of information systems can be made by use, for example put databases together on one server. Or you can group them by product, for example put a single product and its database on one box. But NEVER, EVER put software from different vendors on 1 system!
Naming conventions should be easy
When choosing names for your hosts, printers, users etcetera, keep a few simple rules in mind:
- Choose names that are easy to remember and are not too long (8 characters max).
- Choose names NOT related to any department or other part of your organisation. Departments keep changing over time; by not naming your systems after departments, it will save you lots of time.
- Choose names NOT related to any location. Locations also change frequently, when assets are moved around.
- Never EVER reuse a name. Choose new names for new assets or users. When migrating from one host to another, don't use the same hostname, choose a new one. Rather use DNS aliases when you wish to keep a hostname. Don't configure hostname aliases on a network interface to keep using the old hostname. And for users and groups, always use a new UID and GID.
- When a system has more than 1 network interface, choose hostnames related to each other: Service address: rembrandt; Boot-address: rembrandt_boot; Standby-address: rembrandt_standby; etcetera.
- Choose a standard naming convention; don't change your naming convention.
- Never use a hostname twice, even though they are in seperate networks.
Never enough disk space
Never give your users too much disk space at once. Users will always find a way to fill up all disk space. Give them small amounts at a time. Encourage your users to clean up their directories, before requesting new disk space. This will save you time, disk space, backup throughput and money.
Temporary space (like /tmp) is TEMPORARY. Make sure your users know that. Clean out temporary space every night and be ruthless about it! Applications can NEVER use temporary space in /tmp; applications should use separate file systems for storing temporary files.
Put your static application data in a file system, SEPERATED from the changing data of an application. Usually every application should use 2 file systems at least: 1 for the application binaries and 1 for the data (and also log files). This will make sure your application file system will never run full.
Versioning
Keep the least amount of versions of applications or operating system levels on your systems. The lesser the number of versions, the less you have to manage and the easier it becomes. Try to standardize on a small amount of versions.
Only use supported versions of applications and operating systems. Check regularly which versions are supported. Upgrade in time, but not too fast! Usually a N-1 best practice should be used (always stay one version behind the released levels from vendors). Don't try to use the newest versions, as these versions usually suffer from all kinds of defects, yet to be discovered. This applies to application software, OS levels, service packs, firmware levels, etcetera.
Know what you're doing!
If you don't know what your doing EXACTLY, just don't do it.
- Get educated. Take time away from your boss for training. Being away from your work will make sure, you won't be disturbed all the time during your studies. Switch your mobile off and tell everybody you're not available.
- Take at least 2 courses a year with at least 40 hours of study each. An employer not wishing to pay for studies is not a good employer. IT is a fast-changing arena, so you have to keep up.
- Take courses related to your work. Don't bother taking courses, where you learn someting, you'll never use.
- Before going on a course, read about the subject. Learning is a lot more easier if you know already something about it.
- Make sure you have all the prerequisites when taking a course.
- After a course, actually use your new-gained knowledge.
- Also, get certified. Good for your career, but also good for the understanding of a subject. Certification requires you to actually use a certain product for an extended period of time thoroughly and also requires you to read books or get training on the subject.
- Don't do good-luck-certifications. Do your learning. Doing a certification 3 times on a single day, just to get certified, won't give you the needed knowledge.
- At least 2 certifications a year!
- Get a test system and try out your knowledge. Don't use production systems for testing. Also, make sure the test system isn't on the same network as your production system.
- Write down what your doing. It is always easier to look something up again instead of guessing what you did.
- Create procedures and stick to procedures. Keep procedures short!
Keep it Simple
We have difficulty understanding systems as they become more complex. Complexity leads to more errors and greater maintenance effort. We want systems to be more understandable, more maintainable, more flexible, and less error-prone. System design is not a haphazard process. There are many factors to consider in any design effort. All design should be as simple as possible, but no simpler. This facilitates having a more easily understood, and easily maintained system. This is not to say that features, even internal features, should be discarded in the name of simplicity. Indeed, the more elegant designs are usually the more simple ones. Simple also does not mean "quick and dirty." In fact, it often takes a lot of thought and work over multiple iterations to simplify. Result: Systems that are more maintainable, understandable, and less error-prone.
Manage the information system as a whole
Don't just administer the operating system and its supporting hardware. The OS is always used to provide some kind of basis for an information system. You need to know the complete picture of the information system itself; its parts, the interfaces, etcetera, to understand the role of the OS in it. System Administration is a lot easier, knowing what the information system is used for.
Therefore, manage a system from the users point of view. How will it affect the users if you change anything on the OS or on the underlying hardware level?
Backup, backup and backup!
Before doing anything on your system, make ABSOLUTELY sure you have a full, working backup of your system! Check this over and over again. A system should be backed up once every day. Determine what you should do when a backup fails. Determine how you should restore your system. Document your backup and restore procedures. And ofcourse, test it at regular intervals, by restoring a backup on a separate system.
Last but not least
Did you know that companies are spending roughly 70 to 90 percent of their complete IT budgets on maintaining their systems? Knowing this, it's a huge responsibility to maintain the systems in the best possible manner!
If you found this useful, here's more on the same topic(s) in our blog:
- Cloning a system using mksysb
- Alternate disk install
- Creating a CSV file from NMON data
- Method error when running cfgmgr
- VGDA out of sync
Interested in learning more?