Discussion:
[lopsa-tech] Swap sizing in Linux HPC cluster nodes.
Matthias Birkner
2009-09-04 17:50:20 UTC
Permalink
At $work we've been having a discussion about what the right amount of swap is for a given amount of RAM in our standard linux image and I'm looking for additional input.

The "old school" conventional wisdom says "swap = 2x RAM". The more modern conventional wisdom seems to vary from "swap = 1x RAM + 4G" to "swap = 4G regardless of RAM".


So if you're running/managing a Linux HPC cluster, or you have strong opinions on the subject, or you just want to comment :), I'd love to hear your thoughts.

Some info about our environment... We have several HPC clusters scattered around the globe with anywhere
from 100 to somewhat over 1000 systems in each cluster. Workload in
the clusters is managed using LSF and typically they are configured to
have one job-slot per cpu. The memory configs in each system range
from 4G RAM up to 512G. Not sure if the OS version matters but in case
it does, we're primarily running RHEL4u5 and starting a migration to
RHEL5u3.

Thanks much,
Matt

===========================================================
"If they are the pillars of our community,
We better keep a sharp eye on the roof."
===========================================================
Doug Hughes
2009-09-04 18:01:11 UTC
Permalink
Post by Matthias Birkner
At $work we've been having a discussion about what the right amount of swap is for a given amount of RAM in our standard linux image and I'm looking for additional input.
The "old school" conventional wisdom says "swap = 2x RAM". The more modern conventional wisdom seems to vary from "swap = 1x RAM + 4G" to "swap = 4G regardless of RAM".
So if you're running/managing a Linux HPC cluster, or you have strong opinions on the subject, or you just want to comment :), I'd love to hear your thoughts.
Some info about our environment... We have several HPC clusters scattered around the globe with anywhere
from 100 to somewhat over 1000 systems in each cluster. Workload in
the clusters is managed using LSF and typically they are configured to
have one job-slot per cpu. The memory configs in each system range
from 4G RAM up to 512G. Not sure if the OS version matters but in case
it does, we're primarily running RHEL4u5 and starting a migration to
RHEL5u3.
If HPC speed is important, and it sounds like it is, you never want to
swap because it will kill you. So, to some extent, it doesn't matter,
it's kind of like asking what color the deck chairs should be on the
Titanic; you know it's going to sink, so whether they are blue or green
seems picayune.

That said, we use swap = mem. (we have way more disk space on the
cluster nodes than will reasonably be used, so making it larger doesn't
matter either)
Yves Dorfsman
2009-09-04 18:40:24 UTC
Permalink
Post by Doug Hughes
Post by Matthias Birkner
At $work we've been having a discussion about what the right amount of swap is for a given amount of RAM in our standard linux image and I'm looking for additional input.
The "old school" conventional wisdom says "swap = 2x RAM". The more modern conventional wisdom seems to vary from "swap = 1x RAM + 4G" to "swap = 4G regardless of RAM".
So if you're running/managing a Linux HPC cluster, or you have strong opinions on the subject, or you just want to comment :), I'd love to hear your thoughts.
Some info about our environment... We have several HPC clusters scattered around the globe with anywhere
from 100 to somewhat over 1000 systems in each cluster. Workload in
the clusters is managed using LSF and typically they are configured to
have one job-slot per cpu. The memory configs in each system range
from 4G RAM up to 512G. Not sure if the OS version matters but in case
it does, we're primarily running RHEL4u5 and starting a migration to
RHEL5u3.
If HPC speed is important, and it sounds like it is, you never want to
swap because it will kill you. So, to some extent, it doesn't matter,
it's kind of like asking what color the deck chairs should be on the
Titanic; you know it's going to sink, so whether they are blue or green
seems picayune.
Yes ! I used to spend hours justifying to other sysadmins why I did not
believe in the 2 x amount of memory rule, and why I'd run my servers with
very little swap. When disk, and memory, was expensive, I would work out how
much memory was needed by the server (which typically would be as much as we
could afford !), and then add some swap in case a process went wrong, in
order to be able to log in at the console (although if something goes really
wrong, it will use up all available swap anyway).

This is true for a server that runs a specific workload. For a "generic"
server, the only valid approach I have found is "monitor and adapt". The
only place I have found where a rule makes sense so far is the case of a
laptop that can hibernate (swap has to be >= RAM).
--
Yves.
http://www.sollers.ca/
Jonathan Billings
2009-09-04 18:09:20 UTC
Permalink
Post by Matthias Birkner
The "old school" conventional wisdom says "swap = 2x RAM". The more
modern conventional wisdom seems to vary from "swap = 1x RAM + 4G"
to "swap = 4G regardless of RAM".
So if you're running/managing a Linux HPC cluster, or you have
strong opinions on the subject, or you just want to comment :), I'd
love to hear your thoughts.
For most HPC needs, you almost never want the HPC jobs to use swap for
active processes. Typically, if swap is being used, that means
Something is Seriously Wrong.

You probably want enough for OS and non-job processes to be swapped if
needed, but not enough that large-memory jobs start using it. You
probably don't even need swap for the systems you describe.

Also, for linux memory management, you might want to experiment with
the settings for the sysctl vm.overcommit_memory and
vm.overcommit_ratio, since that controls how the memory manager
defines how much memory is available, related to the amount of swap.
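
For reference, those knobs are just files under /proc/sys; a minimal
sketch of reading them (standard sysctl paths, nothing distribution
specific, and the comments reflect the documented meanings):

    #!/usr/bin/env python
    # Minimal sketch: read the Linux overcommit knobs straight from /proc.
    def read_proc(path):
        with open(path) as f:
            return f.read().strip()

    # 0 = heuristic overcommit (the default), 1 = always overcommit,
    # 2 = strict accounting (commit limit = swap + ram * overcommit_ratio%)
    print("vm.overcommit_memory = " + read_proc("/proc/sys/vm/overcommit_memory"))
    print("vm.overcommit_ratio  = " + read_proc("/proc/sys/vm/overcommit_ratio"))
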
--
Jonathan Billings <***@negate.org>
Yves Dorfsman
2009-09-04 18:42:41 UTC
Permalink
Post by Jonathan Billings
Also, for linux memory management, you might want to experiment with
the settings for the sysctl vm.overcommit_memory and
vm.overcommit_ratio, since that controls how the memory manager
defines how much memory is available, related to the amount of swap.
And the dreaded swappiness (swappiMess ?) !
--
Yves.
http://www.sollers.ca/
d***@lang.hm
2009-09-04 19:27:23 UTC
Permalink
Post by Matthias Birkner
At $work we've been having a discussion about what the right amount of
swap is for a given amount of RAM in our standard linux image and I'm
looking for additional input.
The "old school" conventional wisdom says "swap = 2x RAM". The more
modern conventional wisdom seems to vary from "swap = 1x RAM + 4G" to
"swap = 4G regardless of RAM".
in part this depends heavily on the virtual memory design of the *nix
system that you are using.

some systems allocate a page in swap for every virtual address you can
ever use (including the ones that you have real memory for), and for those
the total memory address space available to your kernel is equal to your
swap size (even if it's less than your ram size), so for those, swap >= RAM
sizing is required (if you had 1G of ram and 512M of swap you would only
ever be able to use 512M of ram)

other systems use pages of swap in addition to pages of memory, and so
your total address space is swap + ram

the current linux VM system is in the second category, so you only need as
much swap as you want to allow the system to use.

since swap is _extremely_ expensive to use, you don't actually want to
use much, if any in a HPC cluster.

HOWEVER, there is the issue of memory overcommit and how you choose to
deal with it.

Linux frequently uses a feature called 'Copy On Write' (COW) where, instead
of copying a page of memory, it marks the page read-only and COW, and
allows multiple processes to still access it. if any of the processes
tries to make a change, it triggers a write fault that then copies the
page and life continues.

this is a HUGE win for almost all systems. for example, if you are running
firefox and it is using 1.5G of ram, you click on a pdf file. firefox
downloads the file then starts your pdf reader, to do this it first forks
a copy of itself, and then executes the pdf reader. between the time that
it does the fork and makes the exec call to start the pdf reader, you
technically have two identical copies of firefox in ram, each needing 1.5G
of ram. with COW you end up only using a few K of ram for this process
instead of having to really allocate and copy the 1.5G of ram.
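
if you want to see the pattern in code, here is a tiny sketch of the
fork-then-exec dance described above (the viewer command and filename are
just placeholders, not anything firefox actually runs):

    import os

    # sketch of the fork/exec pattern described above. between fork() and
    # exec*() the child shares the parent's pages copy-on-write, so even a
    # huge parent only costs a few pages of bookkeeping here.
    pid = os.fork()
    if pid == 0:
        # child: replace ourselves with the viewer (placeholder command/file)
        try:
            os.execvp("evince", ["evince", "/tmp/example.pdf"])
        finally:
            os._exit(127)       # only reached if the exec itself failed
    else:
        os.waitpid(pid, 0)      # parent: wait for the viewer to exit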

because of this feature, you can have a lot more address space in use than
you actually have memory for (with the firefox example, with COW you can
do the example above with 2G of ram, without COW you would need 3.5G of
ram + swap). the bad thing is that this can trigger additional memory
getting used long after the malloc call has completed successfully. that
additional memory use could push you into swap or run you out of memory
entirely.

by default linux allows unlimited overcommit, and if you actually run out
of memory it triggers the Out Of Memory process (OOM), which tries to
figure out what to kill to try and keep the system running (and as with
any heuristics, sometimes it works, sometimes it doesn't)

you can change this default to disable overcommit, in which case the kernel
requires enough address space to fully support all possible COW splits. if
you don't have enough swap allocated to support the possible COW splits, the
system will reject the malloc, EVEN IF YOU HAVE UNUSED RAM.

so you need to either allow unlimited overcommit (which can kill your
system at unexpected times when you run out of ram), or you need to
disable overcommit and have 'enough' swap (which can run the risk of
running you into swap and bringing your system to a crawl)
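
for what it's worth, the accounting the kernel would enforce in strict mode
is visible in /proc/meminfo (CommitLimit and Committed_AS on reasonably
recent kernels; very old ones may not show CommitLimit), so you can eyeball
how close a node is before deciding. a minimal sketch:

    # sketch: compare committed address space against the limit that strict
    # overcommit (vm.overcommit_memory=2) would enforce.
    def meminfo():
        info = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":", 1)
                info[key] = int(rest.split()[0])   # values are in kB
        return info

    m = meminfo()
    print("CommitLimit:  %d kB" % m["CommitLimit"])
    print("Committed_AS: %d kB" % m["Committed_AS"])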

personally, I choose to leave overcommit on, and have a small amount of
swap, no matter how much ram I have.

for historical reasons (since it used to be the limit on swap partition
size), I have fallen in the habit of creating a 2G swap partition on all
my systems. If I was going to change it I would probably shrink it down
(by the time a system is using 512M to 1G of swap, it's probably slowed to
unusable levels anyway and so I would just as soon have the system crash
so that my clustering HA solution can kick in instead)

David Lang
Post by Matthias Birkner
So if you're running/managing a Linux HPC cluster, or you have strong opinions on the subject, or you just want to comment :), I'd love to hear your thoughts.
Some info about our environment... We have several HPC clusters scattered around the globe with anywhere
from 100 to somewhat over 1000 systems in each cluster. Workload in
the clusters is managed using LSF and typically they are configured to
have one job-slot per cpu. The memory configs in each system range
from 4G RAM up to 512G. Not sure if the OS version matters but in case
it does, we're primarily running RHEL4u5 and starting a migration to
RHEL5u3.
Thanks much,
Matt
===========================================================
"If they are the pillars of our community,
We better keep a sharp eye on the roof."
===========================================================
Edward Ned Harvey
2009-09-05 00:49:19 UTC
Permalink
Post by d***@lang.hm
since swap is _extremely_ expensive to use, you don't actually want to
use much, if any in a HPC cluster.
I know this seems counterintuitive - but - I have experience to the
contrary. In traditional thinking, of course, swap is slower so you don't
want to use it, but in modern thinking, having swap available boosts your
system performance because the system can trade swap for cache.

Here's the reasoning:

At all times, the kernel will grow to the maximum available ram
(buffering/caching disk reads/writes). So obviously the more memory
available, the better, and the less required in user space the better...
but ... This means at all times the kernel is choosing which disk blocks to
keep in cache, as user processes grow, whatever is deemed to be the least
valuable cached disk block is dropped out of ram.

If you have plenty of swap available, it gives the kernel another degree of
freedom to work with. The kernel now has the option available to page out
some idle process that it deems to be less valuable than the cached disk
blocks.

If you run "free" or "top" on your system (assuming linux)... Soon after
booting, you'll see lots of free memory. But if your system has been up for
a week, you'll see zero free memory, and all the rest is consumed by
buffers.

During the time when there is still "free" memory available, you will get no
performance boost by having swap available (and obviously there would be no
reason to consume any swap). But after the system is up for a long time,
and the kernel has filled all the ram with buffers... Then you get a
performance boost by using swap.
Edward Ned Harvey
2009-09-05 01:37:01 UTC
Permalink
Post by Edward Ned Harvey
If you run "free" or "top" on your system (assuming linux)... Soon after
booting, you'll see lots of free memory. But if your system has been up for
a week, you'll see zero free memory, and all the rest is consumed by
buffers.
If you see the number of buffers has gone to zero - then you are pushing
your system beyond its ram capabilities, and you should either upgrade your
ram, or find a way to decrease your userspace ram usage.

If you see the number of buffers has gone to zero - and the consumed swap is
growing - then you are thrashing, and should seriously fix it. THAT will
kill you.

If you see the number of buffers is not near zero - then you know you are
not thrashing, and therefore not in the "danger" zone. Here, it is ok to
consume swap.

If you see the number of buffers is not near zero - and the swap usage is
higher than zero - then you know at some point you got a performance boost
by swapping out idle processes instead of dropping important disk blocks out
of cache.

The best way to decide how much swap you should have available is:

Take a bunch of samples on live systems. Continually monitor over time,
during normal usage. You'll see a clear pattern, similar to this: In my
systems during periods of extended idle time (vacations & holidays etc) my
swap ranges anywhere up to 1G. During normal usage, it ranges from 512M to
2G. Occasionally in normal usage, some machines will consume as high as 6G,
and the only time it has ever reached 8G or higher was when some user
accidentally had a runaway process.

And then draw your own conclusions...
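
If you want numbers rather than eyeballing free/top, a crude sampler like
the following (standard /proc/meminfo fields; the one-minute interval and
the output format are arbitrary) is all it takes to build up that picture:

    import time

    # crude sampler: log free memory, buffers/cache, and swap in use once a
    # minute so the long-term pattern becomes visible.
    FIELDS = ("MemFree", "Buffers", "Cached", "SwapTotal", "SwapFree")

    def sample():
        vals = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":", 1)
                if key in FIELDS:
                    vals[key] = int(rest.split()[0])   # kB
        vals["SwapUsed"] = vals["SwapTotal"] - vals["SwapFree"]
        return vals

    while True:
        v = sample()
        print("%s free=%dkB buffers=%dkB cached=%dkB swap_used=%dkB" %
              (time.strftime("%Y-%m-%d %H:%M:%S"),
               v["MemFree"], v["Buffers"], v["Cached"], v["SwapUsed"]))
        time.sleep(60)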

Personally, I don't have real data such as the above, because disk space is
so cheap, who cares if I give it more swap than it needs. If it can't
benefit from using it, it won't use it. And as long as the buffers stays
significantly nonzero, I know I'm not thrashing, so I might as well let the
system consume all the swap it wants. It's a nice way to allow zombie
processes to sit harmlessly on a shelf.

I therefore conclude, I give all systems at least 16G of swap. If a user
has a runaway process thrashing, the system will all but halt, and I'll
simply reboot it.

The only ground you have to gain by shrinking the size of swap is: Whenever
a user process runs away, it's kind of nice for it to die on its own instead
of rebooting the system. Since all my machines where memory intensive jobs
run are intended for queued jobs only, I don't care. No users interactively
logged in = Just reboot it. It's easy.
Doug Hughes
2009-09-05 15:06:43 UTC
Permalink
Post by Edward Ned Harvey
Post by d***@lang.hm
since swap is _extremely_ expensive to use, you don't actually want to
use much, if any in a HPC cluster.
I know this seems counterintuitive - but - I have experience to the
contrary. In traditional thinking, of course, swap is slower so you don't
want to use it, but in modern thinking, having swap available boosts your
system performance because the system can trade swap for cache.
At all times, the kernel will grow to the maximum available ram
(buffering/caching disk reads/writes). So obviously the more memory
available, the better, and the less required in user space the better...
but ... This means at all times the kernel is choosing which disk blocks to
keep in cache, as user processes grow, whatever is deemed to be the least
valuable cached disk block is dropped out of ram.
If you have plenty of swap available, it gives the kernel another degree of
freedom to work with. The kernel now has the option available to page out
some idle process that it deems to be less valuable than the cached disk
blocks.
If you run "free" or "top" on your system (assuming linux)... Soon after
booting, you'll see lots of free memory. But if your system has been up for
a week, you'll see zero free memory, and all the rest is consumed by
buffers.
During the time when there is still "free" memory available, you will get no
performance boost by having swap available (and obviously there would be no
reason to consume any swap). But after the system is up for a long time,
and the kernel has filled all the ram with buffers... Then you get a
performance boost by using swap.
This is very tricky and presumes that your local disk is faster than
your back end storage, which is not necessarily the case. A local disk
cache can be your friend or your enemy depending on your job load and
your architecture. If you have a big honking storage farm to serve your
HPC cluster with lots of memory, you can serve things at nearly wire speed.

Again, this depends upon many factors. In our HPC workload, swap is
never used except for a very rare series of jobs computing force fields
between molecules, and it's extremely painful there, so they tune their
workload very carefully.
Jack
2009-09-05 15:53:32 UTC
Permalink
I was wondering if most OSes take the opportunity, when they have
'idle i/o time', to move copies of some memory blocks, like disk caches,
to swap, so that the memory can be reclaimed faster in case it's needed
for a 'more active' use (user programs, 'hotter i/o blocks', etc).
Thus doing a 'pre-emptive strike' before being forced to swap or drop cache.
<> ... Jack
Post by Edward Ned Harvey
Post by d***@lang.hm
since swap is _extremely_ expensive to use, you don't actually want to
use much, if any in a HPC cluster.
I know this seems counterintuitive - but - I have experience to the
contrary.  In traditional thinking, of course, swap is slower so you don't
want to use it, but in modern thinking, having swap available boosts your
system performance because the system can trade swap for cache.
At all times, the kernel will grow to the maximum available ram
(buffering/caching disk reads/writes).  So obviously the more memory
available, the better, and the less required in user space the better...
but ... This means at all times the kernel is choosing which disk blocks to
keep in cache, as user processes grow, whatever is deemed to be the least
valuable cached disk block is dropped out of ram.
If you have plenty of swap available, it gives the kernel another degree of
freedom to work with.  The kernel now has the option available to page out
some idle process that it deems to be less valuable than the cached disk
blocks.
If you run "free" or "top" on your system (assuming linux)...  Soon after
booting, you'll see lots of free memory.  But if your system has been up for
a week, you'll see zero free memory, and all the rest is consumed by
buffers.
During the time when there is still "free" memory available, you will get no
performance boost by having swap available (and obviously there would be no
reason to consume any swap).  But after the system is up for a long time,
and the kernel has filled all the ram with buffers...  Then you get a
performance boost by using swap.
This is very tricky and presumes that your local disk is faster than
your back end storage, which is not necessarily the case. A local disk
cache can be your friend or your enemy depending on your job load and
your architecture. If you have a big honking storage farm to serve your
HPC cluster with lots of memory, you can serve things at nearly wire speed.
Again, this depends upon many factors. In our HPC workload, swap is
never used except for a very rare series of jobs computing force fields
between molecules, and it's extremely painful there, so they tune their
workload very carefully.
d***@lang.hm
2009-09-05 18:40:17 UTC
Permalink
Post by Jack
I was wondering if most OSes take the opportunity when they have
'idle i/o time' to move copies of some memory blocks, like disk caches,
to swap to make the 'getting it back faster' in case it needs to use
the memory for a 'more active' use (user programs, 'hotter i/o blocks', etc).
Thus doing a 'pre-emptive strike' before being forced to swap or drop cache.
linux will write dirty pages of cache to disk after a little bit (the
default is something like 5 seconds) so that it can throw those pages away
without having to write to disk if it needs to later (and to get the data
onto permanent media so that it will survive a crash)
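
the knobs that control this live under /proc/sys/vm; a quick sketch that
prints them (defaults vary by kernel version, which is why I say 'something
like' 5 seconds):

    # sketch: print the sysctls that control how quickly dirty page-cache
    # pages are flushed to disk; read them rather than assuming defaults.
    KNOBS = (
        "dirty_writeback_centisecs",  # how often the flusher threads wake up
        "dirty_expire_centisecs",     # how old a dirty page must be to be written
        "dirty_background_ratio",     # % of memory dirty before background flushing
        "dirty_ratio",                # % of memory dirty before writers are throttled
    )

    for knob in KNOBS:
        with open("/proc/sys/vm/" + knob) as f:
            print("vm." + knob + " = " + f.read().strip())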

David Lang
Post by Jack
<> ... Jack
Post by Edward Ned Harvey
Post by d***@lang.hm
since swap is _extremely_ expensive to use, you don't actually want to
use much, if any in a HPC cluster.
I know this seems counterintuitive - but - I have experience to the
contrary.  In traditional thinking, of course, swap is slower so you don't
want to use it, but in modern thinking, having swap available boosts your
system performance because the system can trade swap for cache.
At all times, the kernel will grow to the maximum available ram
(buffering/caching disk reads/writes).  So obviously the more memory
available, the better, and the less required in user space the better...
but ... This means at all times the kernel is choosing which disk blocks to
keep in cache, as user processes grow, whatever is deemed to be the least
valuable cached disk block is dropped out of ram.
If you have plenty of swap available, it gives the kernel another degree of
freedom to work with.  The kernel now has the option available to page out
some idle process that it deems to be less valuable than the cached disk
blocks.
If you run "free" or "top" on your system (assuming linux)...  Soon after
booting, you'll see lots of free memory.  But if your system has been up for
a week, you'll see zero free memory, and all the rest is consumed by
buffers.
During the time when there is still "free" memory available, you will get no
performance boost by having swap available (and obviously there would be no
reason to consume any swap).  But after the system is up for a long time,
and the kernel has filled all the ram with buffers...  Then you get a
performance boost by using swap.
This is very tricky and presumes that your local disk is faster than
your back end storage, which is not necessarily the case. A local disk
cache can be your friend or your enemy depending on your job load and
your architecture. If you have a big honking storage farm to serve your
HPC cluster with lots of memory, you can serve things at nearly wire speed.
Again, this depends upon many factors. In our HPC workload, swap is
never used except for a very rare series of jobs computing force fields
between molecules, and it's extremely painful there, so they tune their
workload very carefully.
Edward Ned Harvey
2009-09-06 14:15:45 UTC
Permalink
Post by d***@lang.hm
linux will write dirty pages of cache to disk after a little bit (the
default is something like 5 seconds) so that it can throw those pages
away without having to write to disk if it needs to later (and to get
the data onto permanent media so that it will survive a crash)
I don't understand - In my understanding, a dirty page is when you're using
write-back caching, and some process told the OS "write this to the file,"
and the OS said "Ok it's done" even though the disk write hasn't occurred
yet... So why would the OS ever choose to write that to disk in the swap
space instead of just writing it to disk where it belongs? If the disk is
available to write to the swap, doesn't that imply it's available to write
to the final destination?

Perhaps not, if the destination is on separate storage. Is that the whole
point?
Edward Ned Harvey
2009-09-05 15:56:07 UTC
Permalink
Post by Doug Hughes
This is very tricky and presumes that your local disk is faster than
your back end storage, which is not necessarily the case. A local disk
cache can be your friend or your enemy depending on your job load and
your architecture. If you have a big honking storage farm to serve your
HPC cluster with lots of memory, you can serve things at nearly wire speed.
When you say "local disk" and "backend storage," I assume you're talking
about the local sas/sata disk, and the SAN storage, right? Both of these
are a couple orders of magnitude slower than the physical ram, so, if your
SAN is faster than your sas/sata disk, why not put your swap on the SAN?
Doug Hughes
2009-09-05 16:52:15 UTC
Permalink
Post by Edward Ned Harvey
Post by Doug Hughes
This is very tricky and presumes that your local disk is faster than
your back end storage, which is not necessarily the case. A local disk
cache can be your friend or your enemy depending on your job load and
your architecture. If you have a big honking storage farm to serve your
HPC cluster with lots of memory, you can serve things at nearly wire speed.
When you say "local disk" and "backend storage," I assume you're talking
about the local sas/sata disk, and the SAN storage, right? Both of these
are a couple orders of magnitude slower than the physical ram, so, if your
SAN is faster than your sas/sata disk, why not put your swap on the SAN?
Building an HPC farm of any reasonable size makes direct-attach SAN to
every node a very expensive option (prohibitively so in many cases).
Currently we use NFS, though we're considering other alternatives. But,
even so, I wouldn't put swap there. For our work load, anything taking
ram out to disk is a huge loss in performance.

An example: memory DIMMs that are just ever-so-slightly out of spec
will cause some jobs to run at half performance because of the other
nodes in the cluster having to wait for this node continuously to do its
calculations and answer. In fact, the node looks identical to others for
stream and qcdstream and qcdstreamV and qcdstreamV --sse --mem (though
some fail this test), but if you run a certain MPI ping pong test on all
8 cores on the machine, the maximum response times of those tests could
be 20-80% higher than on a normal node. It's crazy stuff. hitting disk (network or
flash or memory SSD) at all in any of these sorts of compute and memory
bound applications for any reason is right out.
Skylar Thompson
2009-09-05 17:33:28 UTC
Permalink
Post by Edward Ned Harvey
Post by d***@lang.hm
since swap is _extremely_ expensive to use, you don't actually want to
use much, if any in a HPC cluster.
I know this seems counterintuitive - but - I have experience to the
contrary. In traditional thinking, of course, swap is slower so you don't
want to use it, but in modern thinking, having swap available boosts your
system performance because the system can trade swap for cache.
At all times, the kernel will grow to the maximum available ram
(buffering/caching disk reads/writes). So obviously the more memory
available, the better, and the less required in user space the better...
but ... This means at all times the kernel is choosing which disk blocks to
keep in cache, as user processes grow, whatever is deemed to be the least
valuable cached disk block is dropped out of ram.
If you have plenty of swap available, it gives the kernel another degree of
freedom to work with. The kernel now has the option available to page out
some idle process that it deems to be less valuable than the cached disk
blocks.
If you run "free" or "top" on your system (assuming linux)... Soon after
booting, you'll see lots of free memory. But if your system has been up for
a week, you'll see zero free memory, and all the rest is consumed by
buffers.
During the time when there is still "free" memory available, you will get no
performance boost by having swap available (and obviously there would be no
reason to consume any swap). But after the system is up for a long time,
and the kernel has filled all the ram with buffers... Then you get a
performance boost by using swap.
The kernel actually can do this to a certain degree without using swap.
The text segment of a process is actually not eligible to be paged out,
since it's already on disk. The kernel will just instruct the VMM to
free up the physical addresses and point the virtual addresses at the
disk blocks in the file system itself. The kernel will only page out
addresses in the data segment of a process.
--
-- Skylar Thompson (***@cs.earlham.edu)
-- http://www.cs.earlham.edu/~skylar/
Yves Dorfsman
2009-09-06 04:45:24 UTC
Permalink
Post by Edward Ned Harvey
I know this seems counterintuitive - but - I have experience to the
contrary. In traditional thinking, of course, swap is slower so you don't
want to use it, but in modern thinking, having swap available boosts your
system performance because the system can trade swap for cache.
At all times, the kernel will grow to the maximum available ram
(buffering/caching disk reads/writes). So obviously the more memory
available, the better, and the less required in user space the better...
but ... This means at all times the kernel is choosing which disk blocks to
keep in cache, as user processes grow, whatever is deemed to be the least
valuable cached disk block is dropped out of ram.
If you have plenty of swap available, it gives the kernel another degree of
freedom to work with. The kernel now has the option available to page out
some idle process that it deems to be less valuable than the cached disk
blocks.
Right, and that's why I mentioned the swappiness mess on Linux (and I think
"maxperm" on AIX, it's been a long time).

As far as I am concerned, what you are talking about is fine on a server
that is primarily used as a file server, assuming those still exist, but if
I run any kind of application, then:

I am more than happy for the kernel to use any memory NOT used by an app to
cache the file system. I do not want, under any circumstances, the kernel to
push the application out of memory for the benefit of the cache.


I ended up learning about these memory tweaks when I witnessed these behaviours:

-application servers being slow every week, or every few weeks, for what
seemed a few hours in the morning. After investigation, it seemed to
correlate with the full backup: the morning after the full backup, the apps
would be slow. Killing the backup client in the morning would not help.
Think about it: the backup reads the disk, a lot, while the applications are
not used, so the kernel frees up both apps and data from memory so it can
cache the file system really well (an LRU cache makes this problem worse).

-Linux workstations with default value for swappiness: you minimize your
email client, do a bunch of work, try to bring the email client up, and, it
takes 4 or 5 seconds to come up, while you can hear the disk chugging.
Worse, you take a phone call for ten minutes, your screen saver/lock kicks
in, once you are done the machine seems frozen for a few minutes (all the
apps were idle, even without any i/o activity, the memory for the apps got
freed up).

The only case I can think of where swappiness > 0 makes any sense is if you
start a lot of applications, but only use a few, and do not change from apps
to apps very often.
--
Yves.
http://www.sollers.ca/
d***@lang.hm
2009-09-06 04:59:17 UTC
Permalink
Post by Yves Dorfsman
Post by Edward Ned Harvey
I know this seems counterintuitive - but - I have experience to the
contrary. In traditional thinking, of course, swap is slower so you don't
want to use it, but in modern thinking, having swap available boosts your
system performance because the system can trade swap for cache.
At all times, the kernel will grow to the maximum available ram
(buffering/caching disk reads/writes). So obviously the more memory
available, the better, and the less required in user space the better...
but ... This means at all times the kernel is choosing which disk blocks to
keep in cache, as user processes grow, whatever is deemed to be the least
valuable cached disk block is dropped out of ram.
If you have plenty of swap available, it gives the kernel another degree of
freedom to work with. The kernel now has the option available to page out
some idle process that it deems to be less valuable than the cached disk
blocks.
Right, and that's why I mentioned the swappiness mess on Linux (and I think
"maxperm" on AIX, it's been a long time).
As far as I am concerned, what you are talking about is fine on a server
that is primarily used as a file server, assuming those still exist, but if
I run any kind of application, then: I am more than happy for the kernel to
use any memory NOT used by an app to cache the file system. I do not want,
under any circumstances, the kernel to push the application out of memory
for the benefit of the cache.
-application servers being slow every week, or every few weeks, for what
seemed a few hours in the morning. After investigation, it seemed to
correlate with the full backup: the morning after the full backup, the apps
would be slow. Killing the backup client in the morning would not help.
Think about it: the backup reads the disk, a lot, while the applications are
not used, so the kernel frees up both apps and data from memory so it can
cache the file system really well (an LRU cache makes this problem worse).
note that there is a flag that the backup software should be using to tell
the system that it's not going to be accessing this data again.
Post by Yves Dorfsman
-Linux workstations with default value for swappiness: you minimize your
email client, do a bunch of work, try to bring the email client up, and, it
takes 4 or 5 seconds to come up, while you can hear the disk chugging.
Worse, you take a phone call for ten minutes, your screen saver/lock kicks
in, once you are done the machine seems frozen for a few minutes (all the
apps were idle, even without any i/o activity, the memory for the apps got
freed up).
what else is running on the system that is asking for memory? the kernel
won't throw away memory unless something else is asking for it.
Post by Yves Dorfsman
The only case I can think of where swappiness > 0 makes any sense is if you
start a lot of applications, but only use a few, and do not change from apps
to apps very often.
I don't think 0 is the right value, but for a long time the kernel did
default to a much too high value; within the last year or so the default was
greatly reduced.

David Lang
Yves Dorfsman
2009-09-06 16:05:48 UTC
Permalink
Post by d***@lang.hm
note that there is a flag that the backup software should be using to tell
the system that it's not going to be accessing this data again.
Which flag, on which function ?
At the end of the day, aren't all functions reading from disk mapping to a
read(3) ?
Post by d***@lang.hm
Post by Yves Dorfsman
in, once you are done the machine seems frozen for a few minutes (all the
apps were idle, even without any i/o activity, the memory for the apps got
freed up).
what else is running on the system that is asking for memory? the kernel
won't throw away memory unless something else is asking for it.
I agree with you, but my understanding is that with a high value for
swappiness the kernel will swap out processes in order to make space for the
file system cache. Look at this test:
http://lwn.net/Articles/100978/

Just doing dd's they get the vm to swap memory out, which confirms my
understanding of it.

If I am right, then to obtain the result you are talking about ("the kernel
won't throw away memory unless something else is asking for it"), you need
to set swappiness to zero.
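
(For reference, setting it is just a write to /proc/sys, or vm.swappiness in
sysctl.conf; a minimal sketch, with the obvious caveat that whether 0, 10 or
60 is 'right' is exactly what we are debating:)

    # sketch: read vm.swappiness and, as root, set it.
    PATH = "/proc/sys/vm/swappiness"

    def get_swappiness():
        with open(PATH) as f:
            return int(f.read())

    def set_swappiness(value):
        with open(PATH, "w") as f:
            f.write(str(value))

    print("current vm.swappiness = %d" % get_swappiness())
    # set_swappiness(0)    # what I do by default; uncomment to experiment
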
Post by d***@lang.hm
Post by Yves Dorfsman
The only case I can think of swappiness > 0 making any sense is if you start
start a lot of applications, but only use a few, and do not change from apps
to apps very often.
I don't think 0 is the right value, but for a long time the kernel did
default to a much too high value; within the last year or so the default
was greatly reduced.
This is the latest (2.6.31) kernel from L. Torvalds, and it still has
swappiness=60

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=mm/vmscan.c;h=94e86dd6954c295830478011fd8e71465f1a9f2d;hb=e07cccf4046978df10f2e13fe2b99b2f9b3a65db#l127

So which distribution do you use (I use Fedora 10, Ubuntu 9.10 and CentOS
5.3, they leave the default of 60) ? What value do they put by default for
swappiness ?


Anyway, I have been setting swappiness to zero by default on all the systems
I take care of for a few years, so if there is a reason why I should not,
I'd love to hear it.

Why do you set it at a value different than zero (what is the expected
outcome, how is it different than if it were set at zero) ?

If neither 0 nor 60 are the right values, what is the right value ? If it
depends on the load, how do you make an objective decision ?


Thanks.
--
Yves.
http://www.sollers.ca/
Skylar Thompson
2009-09-06 16:22:23 UTC
Permalink
Post by Yves Dorfsman
Post by d***@lang.hm
note that there is a flag that the backup software should be using to tell
the system that it's not going to be accessing this data again.
Which flag, on which function ?
At the end of the day, aren't all functions reading from disk mapping to a
read(3) ?
The only flag I can think of that would do this is O_DIRECT. I'm not
sure you'd actually want this for a backup client though, since AFAIK it
disables read-ahead as well. Without a cache there's nothing to prefetch
into.
--
-- Skylar Thompson (***@cs.earlham.edu)
-- http://www.cs.earlham.edu/~skylar/
d***@lang.hm
2009-09-06 17:32:14 UTC
Permalink
Post by Yves Dorfsman
Post by d***@lang.hm
note that there is a flag that the backup software should be using to tell
the system that it's not going to be accessing this data again.
Which flag, on which function ?
At the end of the day, aren't all functions reading from disk mapping to a
read(3) ?
sorry I can't pinpoint it more (I don't do much C programming nowadays), but
I have seen it mentioned on many kernel threads where people have
complained about this behavior, it's an O_ something flag. I'll do a
little digging and see if I can find it. I believe that what actually
happens is that the pages still go into the cache, but are inserted at the
other end of the (normally) LRU queue so that they are the first to be
discarded when memory is needed (including by the same process reading the
next batch of pages from disk)
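
it might also be the posix_fadvise() hint rather than an open() flag; if so,
a backup-style reader would use it roughly like this (a sketch only, using
the os.posix_fadvise wrapper where available, otherwise you'd call it via
ctypes, and not something I've checked against any particular backup
product):

    import os

    # sketch: read a file sequentially, backup-style, and tell the kernel
    # we won't need the cached pages again.
    def backup_read(path, chunk=1 << 20):
        fd = os.open(path, os.O_RDONLY)
        try:
            offset = 0
            while True:
                data = os.read(fd, chunk)
                if not data:
                    break
                offset += len(data)
                # drop everything read so far from the page cache
                os.posix_fadvise(fd, 0, offset, os.POSIX_FADV_DONTNEED)
        finally:
            os.close(fd)
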
Post by Yves Dorfsman
Post by d***@lang.hm
Post by Yves Dorfsman
in, once you are done the machine seems frozen for a few minutes (all the
apps were idle, even without any i/o activity, the memory for the apps got
freed up).
what else is running on the system that is asking for memory? the kernel
won't throw away memory unless something else is asking for it.
I agree with you, but my understanding is that with a high value for
swappiness the kernel will swap out processes in order to make space for the
http://lwn.net/Articles/100978/
Just doing dd's they get the vm to swap memory out, which confirms my
understanding of it.
If I am right, then to obtain the result you are talking about ("the kernel
won't throw away memory unless something else is asking for it"), you need
to set swappiness to zero.
by doing the dd you are asking for memory implicitly.

the kernel is trying to balance the need for memory to hold several
different categories of things

1. pages of code that it can re-read from the binary on disk

2. pages of application generated data that it can swap out

3. pages of data read from disk that may be used again (disk read cache)

4. pages of data being written to disk (disk write cache, doesn't go to
swap when written, but there are knobs to adjust how quickly and how hard
the kernel works to write these pages out, after which they become clean
cache pages like the read cache)

5. pages of kernel generated data that it can swap out

_many_ programs nowadays are huge, but when people are using them
they seldom use more than a tiny fraction of the capabilities included
(and therefore seldom touch a large portion of the code). keeping all that
unused code in memory can be significant in terms of the amount of data
that can be cached that you actually use.

yes, if too much gets swapped out (and especially if your disk is
extremely slow like a laptop), you can suffer when it gets swapped back
in, but usually you don't have to pull much in at any time.
Post by Yves Dorfsman
Post by d***@lang.hm
Post by Yves Dorfsman
The only case I can think of where swappiness > 0 makes any sense is if you
start a lot of applications, but only use a few, and do not change from apps
to apps very often.
I don't think 0 is the right value, but for a long time the kernel did
default to a much too high value; within the last year or so the default
was greatly reduced.
This is the latest (2.6.31) kernel from L. Torvalds, and it still has
swappiness=60
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=mm/vmscan.c;h=94e86dd6954c295830478011fd8e71465f1a9f2d;hb=e07cccf4046978df10f2e13fe2b99b2f9b3a65db#l127
So which distribution do you use (I use Fedora 10, Ubuntu 9.10 and CentOS
5.3, they leave the default of 60) ? What value do they put by default for
swappiness ?
I could be mixing up the swappiness value and the writeback aggressiveness
values. I know that one of them changed recently.
Post by Yves Dorfsman
Anyway, I have been setting swappiness to zero by default on all the systems
I take care of for a few years, so if there is a reason why I should not,
I'd love to hear it.
Why do you set it at a value different than zero (what is the expected
outcome, how is it different than if it were set at zero) ?
If neither 0 nor 60 are the right values, what is the right value ? If it
depends on the load, how do you make an objective decision ?
since part of what is involved here is your particular workload and
preferences (do you prefer to be faster most of the time at the cost of
occasional slowdowns, or are you willing to be a little slower all the
time, but not have the hiccups) I think it's like every other tuning
parameter, there is no one right answer for everyone.

some of the issues that you have run into (the backup pushing things into
swap) can be addressed in a way that will do what you want, but for other
things it's not nearly as clear.

with the default at 60 and you setting it to 0, I would suggest trying it
set at 10 or 20 and see if you notice any difference. if you like the
change, keep tinkering, if you hate the change switch it back (I suspect
that going to a low value will make little noticeable difference, but I
could easily be wrong)

there are some applications (like firefox) that appear to have memory
leaks in them. saying that application data should _never_ be swapped out
means that that leaked memory directly fights with disk caches. if
instead it gets swapped out you probably never need to swap it back in
again (at least before it's time to shut down), so that would be a case
where swappiness of 0 would hurt you.

David Lang
Phil Pennock
2009-09-05 03:11:18 UTC
Permalink
Post by d***@lang.hm
for historical reasons (since it used to be the limit on swap partition
size), I have fallen in the habit of creating a 2G swap partition on all
my systems. If I was going to change it I would probably shrink it down
(by the time a system is using 512M to 1G of swap, it's probably slowed to
unusuable levels anyway and so I would just as soon have the system crash
so that my clustering HA solution can kick in instead)
While I mostly agree about the limited utility of swap, on FreeBSD I
still went (go, on my personal box) for swap >= RAM for one simple
reason: kernel crash dumps.

If you want to be able to figure out *why* a kernel has fubar'd, it's
good to be able to get a crash dump and since the swap partition is used
for writing that out, you need enough swap to hold the contents of RAM.

I've debugged a few issues this way. Given the tendency of the awkward
problems to only show up in production systems, no matter *how* good
your staging and load test environments, I'd be very loath to give it
up.

I tend to peruse Linux Weekly News to keep vaguely up-to-date on what's
going on in Linux kernel work and I understand that there's a project
working on Linux kernel crash dumps too. A search engine yielded:
http://lkcd.sourceforge.net/

So, given that you're unlikely to be using all the disk on the systems,
it might be worth creating the swap partition, even if you don't enable
it now, so that two years from now you don't need to sort out
re-partitioning across your cluster so that you can get the dumps to
debug the strange problem that keeps killing nodes.

Regards,
-Phil
d***@lang.hm
2009-09-05 03:18:50 UTC
Permalink
Post by Phil Pennock
Post by d***@lang.hm
for historical reasons (since it used to be the limit on swap partition
size), I have fallen in the habit of creating a 2G swap partition on all
my systems. If I was going to change it I would probably shrink it down
(by the time a system is using 512M to 1G of swap, it's probably slowed to
unusuable levels anyway and so I would just as soon have the system crash
so that my clustering HA solution can kick in instead)
While I mostly agree about the limited utility of swap, on FreeBSD I
still went (go, on my personal box) for swap >= RAM for one simple
reason: kernel crash dumps.
good point. another use is suspend-to-disk (on linux at least that writes
to swap unless you work hard to send it elsewhere)

under linux there are ways to send a crash dump to an unused partition
(and given that setting up the crash dump is work in the first place, it's
not much more work to send them elsewhere)
Post by Phil Pennock
If you want to be able to figure out *why* a kernel has fubar'd, it's
good to be able to get a crash dump and since the swap partition is used
for writing that out, you need enough swap to hold the contents of RAM.
I've debugged a few issues this way. Given the tendency of the awkward
problems to only show up in production systems, no matter *how* good
your staging and load test environments, I'd be very loath to give it
up.
I tend to peruse Linux Weekly News to keep vaguely up-to-date on what's
going on in Linux kernel work and I understand that there's a project
http://lkcd.sourceforge.net/
So, given that you're unlikely to be using all the disk on the systems,
it might be worth creating the swap partition, even if you don't enable
it now, so that two years from now you don't need to sort out
re-partitioning across your cluster so that you can get the dumps to
debug the strange problem that keeps killing nodes.
it all depends on what you are using for disks.

I have some systems with 144G of disk and 128G of ram. I definitely won't
be doing it there (and I won't spend the extra money on more disks just
for swap), this system uses 2.5" SAS drives, so adding more or larger
capacity drives is not that cheap.

David Lang
Luke S Crawford
2009-09-05 04:16:35 UTC
Permalink
Post by Phil Pennock
While I mostly agree about the limited utility of swap, on FreeBSD I
still went (go, on my personal box) for swap >= RAM for one simple
reason: kernel crash dumps.
The linux guys are very anti-dump. Personally, I find that 95% of the
time console output (and I think a logging serial console is essential)
gives you all the info you need anyhow, and the other 5% of the time,
I don't seem to be able to get anyone else to help me.

The current crashdump tool, kdump, lets you copy the crash dump to a partition,
or even over the network to somewhere else.

But personally, I don't save dumps under Linux, just 'cause they change
crashdump utilities every two years, and because nobody seems to know how to
do anything with them.

FreeBSD is superior in this regard, getting a backtrace out of a FreeBSD
dump is trivial.
--
Luke S. Crawford
http://prgmr.com/xen/ - Hosting for the technically adept
http://nostarch.com/xen.htm - We don't assume you are stupid.
Robert Brockway
2009-09-12 04:57:10 UTC
Permalink
Post by d***@lang.hm
you can change this default to disable overcommit, in which case the kernel
requires enough address space to fully support all possible COW splits. if
you don't have enough swap allocated to support the possible
Hi David. Catching up on mail from last week :)

The overcommit accounting would be very inefficient if it worked as you
describe above (assuming I'm not misunderstanding what you've written).

The overcommit accounting rules can be found here:

http://www.mjmwired.net/kernel/Documentation/vm/overcommit-accounting

In particular non-private shared memory pages (that could be subject to
COW) count only once for purposes of overcommit.

Unlimited overcommit comes into play for applications that ask for large
memory allocations but do not use them.

Certain DB apps are quite notable here. They ask for large memory
allocations which they will probably never use. Without unlimited
overcommit the application startup will fail even though it would run fine
were it allowed to start.

Cheers,

Rob
--
I tried to change the world but they had a no-return policy
http://www.practicalsysadmin.com
John Stoffel
2009-09-04 20:10:59 UTC
Permalink
Matthias> At $work we've been having a discussion about what the right
Matthias> amount of swap is for a given amount of RAM in our standard
Matthias> linux image and I'm looking for additional input.

Oh goody, fun topics on Friday!

Matthias> The "old school" conventional wisdom says "swap = 2x RAM".
Matthias> The more modern conventional wisdom seems to vary from "swap
Matthias> = 1x RAM + 4G" to "swap = 4G regardless of RAM".

Matthias> So if you're running/managing a Linux HPC cluster, or you
Matthias> have strong opinions on the subject, or you just want to
Matthias> comment :), I'd love to hear your thoughts.

Matthias> Some info about our environment... We have several HPC
Matthias> clusters scattered around the globe with anywhere from 100
Matthias> to somewhat over 1000 systems in each cluster. Workload in
Matthias> the clusters is managed using LSF and typically they are
Matthias> configured to have one job-slot per cpu. The memory configs
Matthias> in each system range from 4G RAM up to 512G. Not sure if
Matthias> the OS version matters but in case it does, we're primarily
Matthias> running RHEL4u5 and starting a migration to RHEL5u3.

We're running something similar, though smaller: 20-30 systems, up to
120 or so in our biggest center. We migrated from LSF to NC
(http://www.rtda.com) and it's been fairly painless. Some issues, but
nothing we haven't worked around.

On our systems, we tend to make a big swap partition and then mount
/tmp on top of it. So we use swap (and VM) to cache /tmp usage as
needed.

We've got mostly dual CPU Opterons with 16GB RAM, plus more dual/quad
core, dual and quad CPU systems with 32 to 256GB of RAM. Mostly
Opterons, but the newer stuff is all Xeons.

I agree with all the comments which state if you swap, you're dead. No
argument there. But what about when you have a small or medium
memory job which writes a bunch of /tmp files? Do people have /tmp
local, or do they write over the network via NFS?

John
Doug Hughes
2009-09-04 20:33:09 UTC
Permalink
Post by John Stoffel
I agree with all the comments which state if you swap, you're dead. No
argument there. But what about when you have a small or medium
memory job which writes a bunch of /tmp files? Do people have /tmp
local, or do they write over the network via NFS?
each of our HPC nodes has 200+ GB of disk and 80MB or less of OS, so, we
keep /tmp local (sometimes striping it, depending upon need)

But, our HPC jobs typically don't make a lot of /tmp files. Most are
compute and memory latency bound.
Michael Tiernan
2009-09-05 20:14:28 UTC
Permalink
First off, to everyone in this discussion, thank you for actually
explaining things, it's very helpful to learn *why* certain things are
done.
Date: Fri, 4 Sep 2009 21:37:01 -0400
Subject: Re: [lopsa-tech] Swap sizing in Linux HPC cluster nodes.
If you see the number of buffers has gone to zero
[...]
Take a bunch of samples on live systems.
If someone was to do samplings like this, what tools would you
suggest? I am sure the usual tools, top, free, sar, etc. are useful
but I'm just wondering if there's anything else that you prefer when
doing this sort of data collection.
--
<< MCT >> Michael C Tiernan.
http://www.linkedin.com/in/mtiernan
Skylar Thompson
2009-09-05 20:18:13 UTC
Permalink
Post by Michael Tiernan
First off, to everyone in this discussion, thank you for actually
explaining things, it's very helpful to learn *why* certain things are
done.
Date: Fri, 4 Sep 2009 21:37:01 -0400
Subject: Re: [lopsa-tech] Swap sizing in Linux HPC cluster nodes.
If you see the number of buffers has gone to zero
[...]
Take a bunch of samples on live systems.
If someone was to do samplings like this, what tools would you
suggest? I am sure the usual tools, top, free, sar, etc. are useful
but I'm just wondering if there's anything else that you prefer when
doing this sort of data collection.
"sar -r" would be my vote. sar has the advantage of giving historical
information. The other tools only give point information, so you'd have
to pipe the information to a file and in the end reinvent the sar wheel.
--
-- Skylar Thompson (***@cs.earlham.edu)
-- http://www.cs.earlham.edu/~skylar/
Michael D. Parker
2009-09-05 23:40:18 UTC
Permalink
I have just started a gig and have inherited a large collection of
heterogeneous UNIX systems of the following flavors all running: hpux
(11.11, 11.31), aix, mpras, sun solaris (sun 8 9 10), redhat (as3, as4, as5)
, and suse (9 10 11). What would be the ideal from management's point of
view is to have all of these systems configuration controlled and managed
from hopefully one program. It is understood that each of these operating
systems will have different base configurations. The items to be managed
include patches, packages, and configuration files.

I have just started looking at cfengine, and looking at some type of
do-it-yourself hybrid using subversion.

Management would prefer to use a commercial package if possible and was
wondering if you all had any ideas of this type of application or vendors?

Thanks for your ideas, pointers, experiences, etc.

Michael Parker
Daniel Pittman
2009-09-06 00:19:16 UTC
Permalink
Post by Michael D. Parker
I have just started a gig and have inherited a large collection of
heterogeneous UNIX systems of the following flavors all running: hpux
(11.11, 11.31), aix, mpras, sun solaris (sun 8 9 10), redhat (as3, as4, as5)
, and suse (9 10 11). What would be the ideal from management's point of
view is to have all of these systems configuration controlled and managed
from hopefully one program. It is understood that each of these operating
systems will have different base configurations. The items to be managed
include patches, packages, and configuration files.
I have just started looking at cfengine, and looking at some type of
do-it-yourself hybrid using subversion.
Management would prefer to use a commercial package if possible and was
wondering if you all had any ideas of this type of application or vendors?
I would strongly suggest puppet; if you are looking at the open source
options, it is a vastly better choice than either of the two you listed.
(...and, yes, I /do/ still have the scars from using both in production.)

You can also purchase commercial support for puppet, although it doesn't have
the standard "giant company" attached that your management probably want.

Regards,
Daniel
--
✣ Daniel Pittman ✉ ***@rimspace.net ☎ +61 401 155 707
♽ made with 100 percent post-consumer electrons
Looking for work? Love Perl? In Melbourne, Australia? We are hiring.
Esther Filderman
2009-09-06 01:03:22 UTC
Permalink
With all due respect to my esteemed colleagues, as someone who has also
played and/or worked with a small variety of config mgmt systems:
They're all, basically, the same.

Oh, don't get me wrong, they all have different syntaxes, layouts and
systems. But in the end, here are your collections, here's how you
get it on your systems, here's your config files, film at 11.

I'm sure I'll get lambasted by various converts, if not the people who
write these systems, if not tossed on the BBQ and roasted for these
words. (please use a spicy sauce, people.)

What you need from people here isn't "I like XYZ" it's "Can XYZ do
this?" However you will probably need to get some of the more major
ones and at the very very least look at the documentation, if not do a
test setup and see what works best.

Do you want a push or pull configuration? Do you want the client to
be able to have editable config files locally, or should everything be
on the master server? Do you care if the backend is written in C or a
scripting language? Do you want fries with that?

With all seriousness you really are going to have to try things out
yourself. I believe the most common software packages are cfengine,
bcfg2, puppet and, maybe, radmind. I'm sure proponents of others will
speak up. Remember the lesser known ones may have less support, but
there's an argument for 5 dedicated coders over 50 squabbling shmoes.


Good luck, and remember, if all else fails, nuke everything from
orbit. It's the only way to be sure.

Moose
Michael D. Parker
2009-09-06 16:26:35 UTC
Permalink
Hi...I thank you for your views. Can you provide the names of some of the
other options that I can examine?

Thanks


Michael Parker
***@unixwizard.com
+1 760 598 4793

Esther Filderman
2009-09-07 23:58:37 UTC
Permalink
On Sun, Sep 6, 2009 at 12:26 PM, Michael D. Parker wrote:
Hi...I thank you for your views.  Can you provide the names of some of the
other options that I can examine?
Other than these? The only other one I'm personally familiar with is one I
wouldn't recommend - CMU's depot, which hasn't been updated since
Hector was a pup.

woof!
Josh Smift
2009-09-06 17:00:30 UTC
Permalink
MDP == Michael D Parker <***@unixwizard.com>

MDP> I have just started looking at cfengine, and looking at some type of
MDP> do-it-yourself hybrid using subversion.

These aren't exclusive: You can store your config files in a Subversion
repository, so you can keep track of what you've changed, roll back to
previous versions, etc; and then use Cfengine (or Puppet, or whatever) to
actually distribute the files (and perform whatever other actions you need
to perform) on your systems.

I highly recommend that sort of approach.
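
A rough sketch of what that might look like (the repository URL and all
paths here are made up, purely to show the division of labor):

  # keep the config files under version control
  svn checkout https://svn.example.com/repos/configs ~/configs
  cd ~/configs
  vi etc/resolv.conf
  svn commit -m "point the cluster nodes at the new DNS servers"

  # on the config-management master, export a clean copy into whatever
  # directory cfengine (or Puppet) distributes files from
  svn export --force https://svn.example.com/repos/configs /var/cfengine/masterfiles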

-Josh (***@infersys.com)
Aleksey Tsalolikhin
2009-09-07 23:29:14 UTC
Permalink
Hi, Michael. Just wanted to make sure you were aware commercial
support for cfengine is available from cfengine.com.

I have been running cfengine 2 for a couple of years, managing HP-UX
and Red Hat and Red Hat -like servers.

I haven't gotten onto cfengine 3 yet because it's fairly new and I want
it to get shaken out first.

Because of the wide variation in cfengine 2 syntax, I've ended up
doing most of my configuration policies in shell scripts (the
universal language of the Unix sys admin). cfengine provides the
infrastructure to make sure they don't run too often and don't
mailbomb me, but otherwise about 80% of my config is in shell scripts.
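
(For what it's worth, that guard can be approximated in plain shell if
you aren't using cfengine -- a rough sketch, with made-up script and
stamp-file paths; note that -mmin needs GNU find:)

  #!/bin/sh
  # run a policy script at most once every 4 hours, and only send mail
  # when it actually produced output
  STAMP=/var/tmp/fix_resolv_conf.stamp
  if [ -z "$(find "$STAMP" -mmin -240 2>/dev/null)" ]; then
      OUT=$(/usr/local/sbin/fix_resolv_conf.sh 2>&1)
      [ -n "$OUT" ] && echo "$OUT" | mail -s "config change on $(hostname)" root
      touch "$STAMP"
  fi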

Well done for turning to an automated configuration management
solution, whatever you choose.

Best,
Aleksey
Yves Dorfsman
2009-09-06 04:48:01 UTC
Permalink
Post by Michael Tiernan
First off, to everyone in this discussion, thank you for actually
explaining things, it's very helpful to learn *why* certain things are
done.
Date: Fri, 4 Sep 2009 21:37:01 -0400
Subject: Re: [lopsa-tech] Swap sizing in Linux HPC cluster nodes.
If you see the number of buffers has gone to zero
[...]
Take a bunch of samples on live systems.
If someone were to do sampling like this, what tools would you
suggest? I am sure the usual tools, top, free, sar, etc. are useful
but I'm just wondering if there's anything else that you prefer when
doing this sort of data collection.
I quite like orcallator / procallator + orca; I typically run them on each
machine and push the data, including the HTML files, to one web server.
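
(Roughly like this -- a cron entry on each node; the paths and the
"webhost" name are made up:)

  # push this node's collected data to the central web server hourly
  15 * * * * rsync -a /usr/local/var/orca/ webhost:/var/www/html/orca/`hostname`/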
--
Yves.
http://www.sollers.ca/
Edward Ned Harvey
2009-09-06 14:18:11 UTC
Permalink
Post by Michael Tiernan
Post by Edward Ned Harvey
If you see the number of buffers has gone to zero
[...]
Post by Edward Ned Harvey
Take a bunch of samples on live systems.
If someone were to do sampling like this, what tools would you
suggest? I am sure the usual tools, top, free, sar, etc. are useful
but I'm just wondering if there's anything else that you prefer when
doing this sort of data collection.
cron, (date ; free) >> somefile.txt
;-)

Optionally grep if you want to limit just the info you want.
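
Spelled out as a crontab entry (made-up path and interval):

  # sample memory and swap usage every five minutes
  */5 * * * * (date; free -m) >> /var/tmp/memstats.txt

  # later, pull out just the swap lines
  grep '^Swap:' /var/tmp/memstats.txt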