Discussion:
[lopsa-tech] backing up your VMs
Adam Levin
2015-10-27 13:14:55 UTC
Hey all, I've got a question about how you back up your VM environment.

We're using vSphere 5.5 and NetApp NAS for datastores. We have about 75
8TB datastores, and about 2500 VMs. The VMs are not distributed evenly
because of service levels associated with the datastores.

We're being told by various backup vendors that the main issue is the
number of VMs per datastore, because quiescing lots of VMs and then taking
a datastore snapshot can produce long wait times when rolling the quiesced
images back into the running VMs.

Our VM team is telling us that there is no current tool to manage the
number of VMs per datastore, just the size of the datastores.

So I'm curious what some common methods are for managing backups in a large
VM environment. Do you just use agents and back up from within the VM? Do
you bother doing app-consistent backups of the VMs, or just snapshot the
datastore and not worry about consistency? Have you found a product that
manages quiescing and snapshots in a reasonable way?

We've looked at NetBackup, Commvault and Veeam so far.

Thanks,
-Adam
Matt Simmons
2015-10-27 13:20:03 UTC
When on a NetApp, I've seen most people use the NetApp VMware connector for
snapvaulting. I don't know how it operates at that scale, but I imagine it
could scale out. You have 75 datastores... I just don't know what would be
required to make it performant at that extreme.

The NetApp dude at our company is on vacation right now, but when he gets
back, I can ask.

--Matt


Adam Levin
2015-10-27 13:26:43 UTC
Thanks, Matt.

The challenge right now isn't so much on the NetApp side but on the VMware
side.

Typical sequence of events:
1) get list of VMs on datastore X
2) quiesce all VMs on datastore X
3) snapshot datastore X via NetApp mechanism
4) un-quiesce all VMs on datastore X

What happens is that step 2 takes about 30 seconds per VM. While the VMs
are quiesced, they are effectively using VMware's snapshot mechanism to
store changed blocks until the NetApp snapshot is done. Step 3 takes a
couple of seconds -- not an issue. Step 4 then has to roll through each
VM and remove the VMware snapshot. The problem here is that the longer
they are quiesced, the longer they take to come back. By the time we're
near the last few VMs, they are taking a long time to roll forward and
commit those VM snapshot changes.
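The sequence above can be sketched as a toy model (hypothetical
constants, not measured data: 30 s to quiesce each VM, and commit time
assumed proportional to how long a VM sat quiesced accumulating changed
blocks):

```python
# Toy model of the serial quiesce/commit workflow described above.
# QUIESCE_SECONDS and COMMIT_FACTOR are made-up illustrative constants.
QUIESCE_SECONDS = 30
COMMIT_FACTOR = 0.1  # assumed: seconds of commit work per second spent quiesced

def commit_times(n_vms):
    """Return the VMware-snapshot commit time for each VM, in order."""
    # VM i finishes quiescing at (i + 1) * 30 s; the NetApp snapshot
    # happens (near-instantly) once all n are quiesced.
    quiesce_done = [(i + 1) * QUIESCE_SECONDS for i in range(n_vms)]
    now = n_vms * QUIESCE_SECONDS
    commits = []
    for i in range(n_vms):
        quiesced_for = now - quiesce_done[i]   # total time VM i sat quiesced
        commits.append(COMMIT_FACTOR * quiesced_for)
        now += commits[-1]                     # later VMs wait even longer
    return commits

times = commit_times(50)
print(round(times[0]), round(times[-1]))  # the last VM's commit dwarfs the first's
```

Even with these rough assumptions, the commit times grow monotonically
through the list, which matches the "last few VMs take forever" symptom.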

We have the option of not quiescing (man, that word gets harder to type
every time) the VMs and just taking a NetApp snapshot, which may or may not
be fully restorable. I'm curious if anyone else is doing that.

-Adam

John Stoffel
2015-10-27 13:58:12 UTC
Adam> The challenge right now isn't so much on the NetApp side but on the VMWare side.

Adam> Typical sequence of events:
Adam> 1) get list of VMs on datastore X
Adam> 2) quiesce all VMs on datastore X
Adam> 3) snapshot datastore X via NetApp mechanism
Adam> 4) un-quiesce all VMs on datastore X

Adam> What happens is that step 2 takes about 30 seconds per VM. 
Adam> While the VMs are quiesced, they are effectively using VMWare's
Adam> snapshot mechanism to store changed blocks until the NetApp
Adam> snapshot is done.  Step 3 takes a couple of seconds -- not an
Adam> issue.  Step 4, then, has to roll through each VM and remove the
Adam> VMWare snapshot.  The problem here is that the longer they are
Adam> quiesced, the longer they take to come back.  By the time we're
Adam> near the last few VMs, they are taking a long time to roll
Adam> forward and commit those VM snapshot changes.

Adam> We have the option of not quiescing (man, that word gets harder
Adam> to type every time) the VMs and just taking a netapp snapshot,
Adam> which may or may not be fully restorable.  I'm curious if anyone
Adam> else is doing that.

We're going to have the same type of problem down the line too, and
I've used CommVault (on FC SAN volumes), a little bit of Veeam, and
we're moving to NetBackup with SnapManager on NFS datastores.

To me, the best thing you could do is to create more, smaller
datastores so that you have fewer VMs per datastore. It's not ideal
in a lot of ways, but if you have that many VMs per datastore, it may be
the best option.

I also think that our VMware guys are gambling that on most VMs
they can do a restore from a backing-store snapshot on the
NetApp (cDOT 8.2.x) in a lot of cases. I do the NetApp side of the
house, not as much on the VMware side day to day.

Is there any way to parallelize the ESX side, so that it finds the VMs
and then does three or four of them at a time? Especially if they are on
separate ESX hosts, that should be doable.
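A minimal sketch of that idea (hypothetical code, not from any of the
tools named in this thread; `quiesce_vm` is a stand-in for the real
vSphere snapshot-with-quiesce call):

```python
# Quiesce a few VMs at a time instead of serially. A production version
# would also group VMs so no single ESX host runs more than one quiesce
# at once; this sketch just bounds overall concurrency.
from concurrent.futures import ThreadPoolExecutor

def quiesce_vm(vm_name):
    # placeholder: the real work would be a vSphere API call
    return f"quiesced {vm_name}"

def quiesce_in_parallel(vm_names, workers=4):
    # ThreadPoolExecutor.map preserves input order in its results
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(quiesce_vm, vm_names))

print(quiesce_in_parallel(["vm1", "vm2", "vm3"]))
```

The point is only that the 30 s per-VM quiesce cost stops being strictly
additive once a few run concurrently.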

But it's an interesting problem and I don't have a solution either.

John
_______________________________________________
Tech mailing list
***@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
Ray Van Dolson
2015-10-27 14:06:02 UTC
On Tue, Oct 27, 2015 at 09:58:12AM -0400, John Stoffel wrote:

<snip>

> We're going to have the same type of problem down the line too, and
> I've used CommVault (on FC SAN volumes), a little bit of Veeam, and
> we're moving to Netbackup with Snapmanager on NFS datastores.

<snip>

Out of curiosity, what made you move away from Veeam?

Ray
John Stoffel
2015-10-27 15:42:31 UTC
>>>>> "Ray" == Ray Van Dolson <***@esri.com> writes:

Ray> On Tue, Oct 27, 2015 at 09:58:12AM -0400, John Stoffel wrote:
Ray> <snip>

>> We're going to have the same type of problem down the line too, and
>> I've used CommVault (on FC SAN volumes), a little bit of Veeam, and
>> we're moving to Netbackup with Snapmanager on NFS datastores.

Ray> Out of curiosity, what made you move away from Veeam?

Politics, licensing, trying to consolidate down to one tool (if
possible). The usual.


Adam Levin
2015-10-27 15:47:19 UTC
The more we look into this, the more I think that trying to use just one
tool is going to mean that some part of the environment isn't going to work
well. Different tools have different strengths. Our management is pushing
for this one-tool solution as well, but it's causing some difficulties
because of the limitations.

-Adam

John Stoffel
2015-10-27 15:55:45 UTC
Yeah, it's a hard balance to strike. Having one tool to do it all
makes training and support simpler and easier. But... if that tool
can't do it all as well as a specific tool, then maybe it's not a good
tradeoff to make.

I don't have a good answer, but in some cases pure $$$ cost
argues against going with more than one tool. Dunno...

But getting back to the root cause, I think going with smaller
datastores is the best track here.

Adam Levin
2015-10-27 15:57:38 UTC
Yeah, that seems to be the easiest answer, even if it's not ideal. That'll
naturally limit the number of VMs per datastore. If we can manage to
change policies to go with crash-consistent instead of app-consistent on
most of our service levels, that'll help a lot too.

-Adam

John Stoffel
2015-10-27 19:52:50 UTC
With clustered Data ONTAP, it actually makes sense to have lots of volumes
and lots of IPs, one per datastore, so you can move them around the
cluster and also move the interface to follow the volume. You
burn through IPs, but that's what the 192.168.x.y space is for, right?
Just dedicate a class C or two to datastore IPs and you should be all
set.
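As a quick sanity check on the IP math (standard library only; the
specific subnet picked for datastores is just illustrative):

```python
# 192.168.0.0/16 carves into 256 "class C" (/24) subnets, so dedicating
# one or two to datastore interfaces costs almost nothing.
import ipaddress

rfc1918 = ipaddress.ip_network("192.168.0.0/16")
subnets = list(rfc1918.subnets(new_prefix=24))
print(len(subnets))                       # 256 available /24s
datastore_net = subnets[0]                # e.g. 192.168.0.0/24 for datastore LIFs
print(datastore_net.num_addresses - 2)    # 254 usable host addresses
```

So even one IP per datastore for 75 datastores fits comfortably inside a
single /24.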

This isn't as big an issue with 7-Mode, though it's probably still good
to have an IP address per head so that the load splits nicely.

John

Adam Levin
2015-10-27 20:12:07 UTC
One of the problems with using lots of datastores is the IP issue with
cDOT, which as you point out isn't a problem with RFC 1918 address spaces...

...unless your network team long ago got really tired of mergers and
acquisitions causing all sorts of problems with overlapping address spaces,
and decided to just use public IP space, including already-allocated
ranges, because nobody ever connects to DoD addresses anyway, right?

But, ah, I think that's a story for another time...

-Adam

John Stoffel
2015-10-27 22:10:25 UTC
Adam> One of the problems with using lots of datastores is the IP
Adam> issue with cDOT, which as you point out isn't a problem with RFC
Adam> 1918 address spaces...

Adam> ...unless your network team long ago got really tired of mergers
Adam> and acquisitions causing all sorts of problems with overlapping
Adam> address spaces, and decided to just use public IP spaces,
Adam> including already allocated spaces, because nobody ever connects
Adam> to DOD addresses anyway, right?

Adam> But, ah, I think that's a story for another time...

LOL! Been there! We're using lots of 10.x stuff internally, and even
as we merge related OpCos in, we have to deal with overlaps and other
painful stuff.

And using public IP spaces... really dumb outside a lab environment.
I mean, how hard is it to use 10.x.x.x for everything these days? And
especially for ESX-to-NetApp traffic, a private VLAN and private IP space
is the best way forward. I haven't touched it yet, but 8.3 looks to
allow you to mix the same IP spaces across Vservers, which is really
neat and might address a lot of these issues.

David Lang
2015-10-28 20:41:32 UTC
On Tue, 27 Oct 2015, John Stoffel wrote:

> And using public IP spaces... really dumb outside a lab environment.
> I mean how hard is it to use 10.x.x.x for everything these days?

That depends: how hard is it to change your IPs when you merge with someone else
who is already using 10.x.x.x and you now need to deconflict things?

Or how hard is it to set up a VPN to a business partner who's using 10.x.x.x
internally and has conflicting addresses? Especially if the two servers that
need to connect are both 10.1.1.1.

Now, think about a company that's connecting to hundreds or thousands of
businesses that all have "just use 10.x.x.x" as their policy.

That's where things get ugly, and using publicly allocated addresses for internal
stuff is attractive because it already de-dups the ranges.
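David's collision scenario, in a couple of lines of standard-library
Python (the example prefixes are made up):

```python
# Two merged networks that both allocated from 10.0.0.0/8 can collide;
# allocations drawn from distinct, globally unique public ranges cannot.
import ipaddress

ours = ipaddress.ip_network("10.1.0.0/16")      # hypothetical: our internal range
theirs = ipaddress.ip_network("10.1.128.0/17")  # hypothetical: the partner's range
print(ours.overlaps(theirs))  # True -- both sides "just used 10.x"
```

This kind of overlap check is exactly what has to be run, pair by pair,
during every merger or partner VPN setup.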

David Lang
Adam Levin
2015-10-28 20:52:05 UTC
Yeah, I'm not sure there's a great answer to this. Even just choosing
random blocks of public IPs can get you into trouble if the other company
has guys who think just like you. :)

-Adam

Jason Barbier
2015-10-27 13:20:18 UTC
Right now at $work we use a combination of Veeam weeklies, because
Veeam takes a week to do all of our backups, and ZFS snapshots for
nightlies. Getting onto Nexenta's SAN product with ZFS helped fix a
lot of small issues we had around backups and SAN migration.

--
Jason Barbier | E: ***@serversave.us
GPG Key-ID: B5F75B47(http://kusuriya.devio.us/pubkey.asc)


Adam Levin
2015-10-27 13:41:36 UTC
Interesting. How is that different from just taking regular snapshots?
Don't you still have to quiesce the VMs before taking the FlexClone? How
many VMs per volume do you have?

-Adam

On Tue, Oct 27, 2015 at 9:32 AM, Ray Van Dolson <***@esri.com> wrote:

> We use CommVault + NetApp + FlexClone (Intellisnap). As long as you
> schedule things so you only end up taking one FlexClone per volume per
> cycle, it works quite well. The time to roll back the VMware level
> snapshots becomes very small.
>
> Ray
>
Ray Van Dolson
2015-10-27 13:54:35 UTC
Yes, we still do quiesce the VMs -- but perhaps avoid some of the
issues you've seen by having more numerous, smaller FlexVol datastores
(usually around 5TB max). We had to work with our Compute team to
re-work things to accommodate backup workflows and minimize
disruption.

I'd guess there are on average around 50 VMs per volume. Not all are
necessarily backed up, so the backup footprint is fairly distributed
across volumes.

Note: we do complement this with a couple of days' worth of "dirty"
(i.e. non-quiesced) storage snaps of each volume. In practice this has
worked well for spot recovery. The system typically boots up as if it
had crashed, which we've found is fine for all but the most sensitive
workloads. Then we can just FlexClone the volume to get things back up
and running quickly and Storage vMotion the "recovered" VM out.

Adam Levin
2015-10-27 13:57:51 UTC
Thanks, this is what we were thinking of doing, so it's good to hear it's
working for others. I'm hearing from vendors that 50 VMs per datastore is
a good number. We were hoping for larger datastores (most of our VMs are
Windows <50GB or Linux <100GB), but that would probably end up with too
many VMs to handle effectively.
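Rough arithmetic behind that sizing (the 75 GB average is an assumption,
the midpoint of the Windows and Linux sizes mentioned above):

```python
# Back-of-the-envelope datastore sizing from the numbers in this thread.
vms_per_datastore = 50
avg_vm_gb = 75  # assumed midpoint of <50 GB Windows and <100 GB Linux VMs
datastore_tb = vms_per_datastore * avg_vm_gb / 1000
print(datastore_tb)  # 3.75 -- comfortably inside a ~5 TB FlexVol
```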

I'm thinking weekly quiesced backups and daily basic snapshots would
probably work well. We have a bunch of MS SQL VMs -- they need special
handling, naturally, but they have their own datastores anyway.

-Adam

Ray Van Dolson
2015-10-27 14:02:09 UTC
Curious what your approach with the SQL VMs is. We've struggled
to get this right with CommVault. Backups are consistently fast, but
restore speeds can vary *wildly* (whereas other datasets don't seem to
have this variability). Our DBAs prefer to stick with the
native SQL backup/restore tools, as they seem to get better performance
there.

Adam Levin
2015-10-27 14:04:38 UTC
Permalink
At the moment we are using the native tools too. We are looking into doing
the *really important* MS SQL systems with our Actifio appliance, but so
far we haven't done enough testing for me to form an opinion, and it's
certainly not a cheap solution. :)

-Adam

On Tue, Oct 27, 2015 at 10:02 AM, Ray Van Dolson <***@esri.com>
wrote:

> Curious what your approach with the SQL VM's are. We've struggled
> getting this right w/ CommVault. Backups are consistently fast, but
> restore speeds can vary *wildly* (whereas other datasets don't seem to
> have this variability). Our DBAs prefer to stick with the
> native SQL backup/restore tools as they seem to get better performance
> there.
>
> On Tue, Oct 27, 2015 at 09:57:51AM -0400, Adam Levin wrote:
> > Thanks, this is what we were thinking of doing, so it's good to hear
> > it's working for others. I'm hearing from vendors that 50 VM's per
> > datastore is a good number. We were hoping for larger datastores
> > (most of our VMs are Windows <50GB or Linux <100GB), but that would
> > probably end up with too many VM's to handle effectively.
> >
> > I'm thinking weekly quiesced backups and daily basic snapshots would
> > probably work well. We have a bunch of MS SQL VM's -- they need
> > special handling, naturally, but they have their own datastores
> > anyway.
> >
> > -Adam
> >
> > On Tue, Oct 27, 2015 at 9:54 AM, Ray Van Dolson <***@esri.com>
> wrote:
> >
> > Yes, we still do quiesce the VM's -- but perhaps avoid some of the
> > issues you've seen by having more numerous, smaller FlexVol
> datastores
> > (usually around 5TB max). We had to work with our Compute team to
> kind
> > of re-work things to accommodate backup workflows to minimize
> > disruption.
> >
> > I'd guess there are on average around 50 VM's per volume. All are
> not
> > necessarily backed up, so the backup footprint is fairly distributed
> > across volumes.
> >
> > Note: We do complement this by having a couple days worth of "dirty"
> > (e.g. non-quiesced) storage snaps of each volume. In practice this
> has
> > worked well for spot recovery. System typically boots up as if it
> had
> > crashed which we've found is typically fine for all but the most
> > sensitive workloads. Then we can just FlexClone the volume to get
> > things back up and running quickly and Storage vMotion the
> "recovered"
> > VM out.
>
Page, Jeremy
2015-10-27 14:23:20 UTC
Permalink
We use largish NFS data stores (mostly on NetApp) for our ESX servers & have had good results just taking raw dumps of the most recent snapshots. For stuff where we're more concerned about consistency (SQL, AD, etc.) we dump using their native tools to an extra disk attached to said VM.

It has worked quite well for us (been like this for 8 years now). We've never had trouble restoring something, even though we're not doing any voodoo on the client OS side to ensure consistency. Maybe we're just lucky, although with the LARC/NVRAM I think this is a reasonably safe thing to do. No FlexClone'ing involved.

Now if my guys would just get comfortable mounting VMDK snapshots (RO) on the loopback so they could do single-file restores without restoring the entire VMDK file...

________________________________
From: tech-***@lists.lopsa.org [tech-***@lists.lopsa.org] on behalf of Adam Levin [***@gmail.com]
Sent: Tuesday, October 27, 2015 10:04 AM
To: Ray Van Dolson
Cc: ***@lists.lopsa.org
Subject: Re: [lopsa-tech] backing up your VMs

At the moment we are using the native tools too. We are looking into doing the *really important* MS SQL systems with our Actifio appliance, but so far we haven't done enough testing for me to form an opinion, and it's certainly not a cheap solution. :)

-Adam

On Tue, Oct 27, 2015 at 10:02 AM, Ray Van Dolson <***@esri.com<mailto:***@esri.com>> wrote:
Curious what your approach with the SQL VM's are. We've struggled
getting this right w/ CommVault. Backups are consistently fast, but
restore speeds can vary *wildly* (whereas other datasets don't seem to
have this variability). Our DBAs prefer to stick with the
native SQL backup/restore tools as they seem to get better performance
there.

On Tue, Oct 27, 2015 at 09:57:51AM -0400, Adam Levin wrote:
> Thanks, this is what we were thinking of doing, so it's good to hear
> it's working for others. I'm hearing from vendors that 50 VM's per
> datastore is a good number. We were hoping for larger datastores
> (most of our VMs are Windows <50GB or Linux <100GB), but that would
> probably end up with too many VM's to handle effectively.
>
> I'm thinking weekly quiesced backups and daily basic snapshots would
> probably work well. We have a bunch of MS SQL VM's -- they need
> special handling, naturally, but they have their own datastores
> anyway.
>
> -Adam
>
> On Tue, Oct 27, 2015 at 9:54 AM, Ray Van Dolson <***@esri.com<mailto:***@esri.com>> wrote:
>
> Yes, we still do quiesce the VM's -- but perhaps avoid some of the
> issues you've seen by having more numerous, smaller FlexVol datastores
> (usually around 5TB max). We had to work with our Compute team to kind
> of re-work things to accommodate backup workflows to minimize
> disruption.
>
> I'd guess there are on average around 50 VM's per volume. All are not
> necessarily backed up, so the backup footprint is fairly distributed
> across volumes.
>
> Note: We do complement this by having a couple days worth of "dirty"
> (e.g. non-quiesced) storage snaps of each volume. In practice this has
> worked well for spot recovery. System typically boots up as if it had
> crashed which we've found is typically fine for all but the most
> sensitive workloads. Then we can just FlexClone the volume to get
> things back up and running quickly and Storage vMotion the "recovered"
> VM out.

Ski Kacoroski
2015-10-27 14:40:07 UTC
Permalink
Adam,

I am a much smaller shop, but we really like Unitrends Virtual Backup.
It is like Veeam, was less expensive for us, and just plain works. I
have no idea if it could handle your scale.

cheers,

ski

On 10/27/2015 06:14 AM, Adam Levin wrote:
> Hey all, I've got a question about how you backup your VM environment?
>
> We're using vSphere 5.5 and NetApp NAS for datastores. We have about 75
> 8TB datastores, and about 2500 VMs. The VMs are not distributed evenly
> because of service levels associated with the datastores.
>
> We're being told by various backup vendors that the main issue is the
> number of VMs per datastore, because quiescing lots of VMs and then
> taking a datastore snapshot can produce long wait times when rolling the
> quiesced images back into the running VM.
>
> Our VM team is telling us that there is no current tool to manage the
> number of VMs per datastore, just the size of the datastores.
>
> So I'm curious what some common methods are for managing backups in a
> large VM environment. Do you just use agents and backup from within the
> VM? Do you bother doing app consistent backups of the VMs or just
> snapshot the datastore and not worry about consistency? Have you found
> a product that manages quiescing and snapshots in a reasonable way?
>
> We've looked at NetBackup, Commvault and Veeam so far.
>
> Thanks,
> -Adam
>
>
>
>
>

--
"When we try to pick out anything by itself, we find it
connected to the entire universe" John Muir

Chris "Ski" Kacoroski, ***@lopsa.org, 206-501-9803
or ski98033 on most IM services
Adam Levin
2015-10-27 15:27:50 UTC
Permalink
Thanks Ski. We also recently spoke to a new, young vendor named Rubrik.
It's an interesting product, but our company tends to avoid technology
that's less than 5 years or so in the market (pah! boring! :) ).

-Adam

On Tue, Oct 27, 2015 at 10:40 AM, Ski Kacoroski <***@gmail.com> wrote:

> Adam,
>
> I am a much smaller shop, but we really like Unitrends Virtual Backup. It
> is like Veeam, was less expensive for us, and just plain works. I have no
> idea if it could handle your scale.
>
> cheers,
>
> ski
>
>
> On 10/27/2015 06:14 AM, Adam Levin wrote:
>
>> Hey all, I've got a question about how you backup your VM environment?
>>
>> We're using vSphere 5.5 and NetApp NAS for datastores. We have about 75
>> 8TB datastores, and about 2500 VMs. The VMs are not distributed evenly
>> because of service levels associated with the datastores.
>>
>> We're being told by various backup vendors that the main issue is the
>> number of VMs per datastore, because quiescing lots of VMs and then
>> taking a datastore snapshot can produce long wait times when rolling the
>> quiesced images back into the running VM.
>>
>> Our VM team is telling us that there is no current tool to manage the
>> number of VMs per datastore, just the size of the datastores.
>>
>> So I'm curious what some common methods are for managing backups in a
>> large VM environment. Do you just use agents and backup from within the
>> VM? Do you bother doing app consistent backups of the VMs or just
>> snapshot the datastore and not worry about consistency? Have you found
>> a product that manages quiescing and snapshots in a reasonable way?
>>
>> We've looked at NetBackup, Commvault and Veeam so far.
>>
>> Thanks,
>> -Adam
>>
>>
>>
>>
>>
>>
> --
> "When we try to pick out anything by itself, we find it
> connected to the entire universe" John Muir
>
> Chris "Ski" Kacoroski, ***@lopsa.org, 206-501-9803
> or ski98033 on most IM services
>
Dave Caplinger
2015-10-27 15:49:22 UTC
Permalink
We use Unitrends Virtual Backup as well; it has been around for quite a while, since it was previously known as PHD Virtual Backup before Unitrends acquired them. It snapshots and backs up individual VMs' virtual disks one at a time (with de-duplication and compression), so we don't have the issue of trying to quiesce the entire datastore at once. If I recall correctly, there are agents to handle cases like SQL or Exchange as well. It also supports replicating completed backups to a secondary (perhaps offsite) destination. We've been very happy with it overall.

- Dave

> On Oct 27, 2015, at 10:27 AM, Adam Levin <***@gmail.com> wrote:
>
> Thanks Ski. We also recently spoke to a new, young vendor named Rubrik. It's an interesting product, but our company tends to avoid technology that's less than 5 years or so in the market (pah! boring! :) ).
>
> -Adam
>
> On Tue, Oct 27, 2015 at 10:40 AM, Ski Kacoroski <***@gmail.com> wrote:
> Adam,
>
> I am a much smaller shop, but we really like Unitrends Virtual Backup. It is like Veeam, was less expensive for us, and just plain works. I have no idea if it could handle your scale.
>
> cheers,
>
> ski
>
>
> On 10/27/2015 06:14 AM, Adam Levin wrote:
> Hey all, I've got a question about how you backup your VM environment?
>
> We're using vSphere 5.5 and NetApp NAS for datastores. We have about 75
> 8TB datastores, and about 2500 VMs. The VMs are not distributed evenly
> because of service levels associated with the datastores.
>
> We're being told by various backup vendors that the main issue is the
> number of VMs per datastore, because quiescing lots of VMs and then
> taking a datastore snapshot can produce long wait times when rolling the
> quiesced images back into the running VM.
>
> Our VM team is telling us that there is no current tool to manage the
> number of VMs per datastore, just the size of the datastores.
>
> So I'm curious what some common methods are for managing backups in a
> large VM environment. Do you just use agents and backup from within the
> VM? Do you bother doing app consistent backups of the VMs or just
> snapshot the datastore and not worry about consistency? Have you found
> a product that manages quiescing and snapshots in a reasonable way?
>
> We've looked at NetBackup, Commvault and Veeam so far.
>
> Thanks,
> -Adam
>
>
>
>
>
>
> --
> "When we try to pick out anything by itself, we find it
> connected to the entire universe" John Muir
>
> Chris "Ski" Kacoroski, ***@lopsa.org, 206-501-9803
> or ski98033 on most IM services
>

Adam Levin
2015-10-27 15:55:40 UTC
Permalink
Cool, thanks for the pointer. Can I ask what the size of your environment
is? Is the product scaling well?

Thanks,
-Adam

On Tue, Oct 27, 2015 at 11:49 AM, Dave Caplinger <
***@solutionary.com> wrote:

> We use Unitrends Virtual Backup as well; it has been around for quite a
> while since it was previously known as PHD Virtual Backup before Unitrends
> acquired them. It snapshots and backs up individual VMs' virtual disks one at
> a time (with de-duplication and compression) so we don't have the issue of
> trying to quiesce the entire datastore at once. If I recall correctly
> there are agents to handle cases like SQL or Exchange as well. It also
> supports replicating completed backups to a secondary (perhaps offsite)
> destination. We've been very happy with it overall.
>
> - Dave
>
> > On Oct 27, 2015, at 10:27 AM, Adam Levin <***@gmail.com> wrote:
> >
> > Thanks Ski. We also recently spoke to a new, young vendor named
> Rubrik. It's an interesting product, but our company tends to avoid
> technology that's less than 5 years or so in the market (pah! boring! :) ).
> >
> > -Adam
> >
> > On Tue, Oct 27, 2015 at 10:40 AM, Ski Kacoroski <***@gmail.com>
> wrote:
> > Adam,
> >
> > I am a much smaller shop, but we really like Unitrends Virtual Backup.
> It is like Veeam, was less expensive for us, and just plain works. I have
> no idea if it could handle your scale.
> >
> > cheers,
> >
> > ski
> >
> >
> > On 10/27/2015 06:14 AM, Adam Levin wrote:
> > Hey all, I've got a question about how you backup your VM environment?
> >
> > We're using vSphere 5.5 and NetApp NAS for datastores. We have about 75
> > 8TB datastores, and about 2500 VMs. The VMs are not distributed evenly
> > because of service levels associated with the datastores.
> >
> > We're being told by various backup vendors that the main issue is the
> > number of VMs per datastore, because quiescing lots of VMs and then
> > taking a datastore snapshot can produce long wait times when rolling the
> > quiesced images back into the running VM.
> >
> > Our VM team is telling us that there is no current tool to manage the
> > number of VMs per datastore, just the size of the datastores.
> >
> > So I'm curious what some common methods are for managing backups in a
> > large VM environment. Do you just use agents and backup from within the
> > VM? Do you bother doing app consistent backups of the VMs or just
> > snapshot the datastore and not worry about consistency? Have you found
> > a product that manages quiescing and snapshots in a reasonable way?
> >
> > We've looked at NetBackup, Commvault and Veeam so far.
> >
> > Thanks,
> > -Adam
> >
> >
> >
> >
> >
> >
> > --
> > "When we try to pick out anything by itself, we find it
> > connected to the entire universe" John Muir
> >
> > Chris "Ski" Kacoroski, ***@lopsa.org, 206-501-9803
> > or ski98033 on most IM services
> >
>
>
Edward Ned Harvey (lopser)
2015-10-28 10:52:33 UTC
Permalink
I'm hearing a lot of people here saying "quiesce" the VM, and how many VM's do you have per volume... I am surprised by both of these.

What I've always done was to make individual zvol's in ZFS, and export them over iscsi. Then vmware simply uses that "disk" as the disk for the VM. Let ZFS do snapshotting, and don't worry about vmware. Every guest OS (at least every one I've had to deal with) is designed to be able to survive a power failure (or kernel halt or whatever) so if you ever need to rollback or restore a ZFS snapshot and reboot the guest, you're effectively booting that guest as if the power had been interrupted at the time of the snapshot.
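The per-VM zvol workflow described above can be sketched as a small helper that composes the commands involved. This is an illustrative sketch only: the pool/dataset names, the 40G size, and the COMSTAR-style `stmfadm` export step are assumptions (adapt to your own storage stack), and in real use these command lists would be executed via `subprocess.run` on the storage host.

```python
# Toy sketch of a per-VM zvol workflow: one zvol per VM, exported over
# iSCSI, snapshotted and rolled back independently of every other VM.
# Pool/VM names and the stmfadm export step are illustrative.

def zvol_commands(pool: str, vm: str, size: str = "40G") -> list[list[str]]:
    """Commands to provision a dedicated zvol for one VM."""
    zvol = f"{pool}/vm/{vm}"
    return [
        ["zfs", "create", "-V", size, zvol],                 # one block device per VM
        ["stmfadm", "create-lu", f"/dev/zvol/rdsk/{zvol}"],  # expose it as an iSCSI LU
    ]

def snapshot_commands(pool: str, vm: str, label: str) -> list[list[str]]:
    """Snapshot (and, when needed, rollback) touches only this VM's disk."""
    zvol = f"{pool}/vm/{vm}"
    return [
        ["zfs", "snapshot", f"{zvol}@{label}"],
        # to restore later: ["zfs", "rollback", f"{zvol}@{label}"]
    ]

cmds = zvol_commands("tank", "web01") + snapshot_commands("tank", "web01", "nightly")
for c in cmds:
    print(" ".join(c))
```

Because each VM lives on its own zvol, a rollback affects exactly one guest, which is the independence being described here.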

_______________________________________________
Tech mailing list
***@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
Adam Levin
2015-10-28 12:32:32 UTC
Permalink
I'm not sure I understand exactly what you're doing. Are you using RDMs
and giving each VM a direct LUN to the storage system, or are you
presenting datastores via iSCSI? Are you saying you're presenting one
datastore per VM?

Managing RDMs for 2500 VMs is simply impractical, and there's a limit to
the number of datastores VMWare supports anyway.

As for the filesystems, it's true that most filesystems today can survive a
hard reboot, but the applications may or may not. It's not 100%
guaranteed. Crash consistency provides for the OS to come back, do some
filesystem repairs, and hopefully most of your data is intact, but it
doesn't guarantee it. For some non-trivial number of user applications,
they require a higher level SLA. In our case, it might make sense to just
perform standard client-side agent-based backups for those to guarantee
restorability, but of course that has its own issues with time and amount
of data.

In Windows, there are VSS writer processes that control the I/O, and you
need to communicate with the VSS writers to tell them to pause while a
snapshot is taken to guarantee filesystem integrity (to make sure all I/Os
are written and verified). So not only does VMWare need to be aware of the
snapshot, but Windows does as well. VMWare Tools allows VMWare to tell the
VM, through VSS, to quiesce, and then VMWare can take its snapshot -- it
knows to quiesce when it takes its own snapshot. Once that snapshot
exists, it's 100% safe for the NetApp (or ZFS) to snap. The added
complexity of VMWare and virtualized datastores doesn't help this process.

If you constantly hard reboot a busy MS SQL server, chances are it'll
eventually not come back the way you want it.

Now, with vSphere 6 and vvols, this should be less of an issue because each
VM gets a virtual datastore and can be individually controlled,
snapshotted, and backed up.

-Adam

On Wed, Oct 28, 2015 at 6:52 AM, Edward Ned Harvey (lopser) <
***@nedharvey.com> wrote:

> I'm hearing a lot of people here saying "quiesce" the VM, and how many
> VM's do you have per volume... I am surprised by both of these.
>
> What I've always done was to make individual zvol's in ZFS, and export
> them over iscsi. Then vmware simply uses that "disk" as the disk for the
> VM. Let ZFS do snapshotting, and don't worry about vmware. Every guest OS
> (at least every one I've had to deal with) is designed to be able to
> survive a power failure (or kernel halt or whatever) so if you ever need to
> rollback or restore a ZFS snapshot and reboot the guest, you're effectively
> booting that guest as if the power had been interrupted at the time of the
> snapshot.
>
>
Edward Ned Harvey (lopser)
2015-10-28 13:41:54 UTC
Permalink
> From: Adam Levin [mailto:***@gmail.com]
>
> I'm not sure I understand exactly what you're doing.  Are you using RDMs
> and giving each VM a direct LUN to the storage system, or are you presenting
> datastores via iSCSI?  Are you saying you're presenting one datastore per
> VM?

Yeah, iscsi, one datastore per VM. There's no requirement to have separate datastores per VM - it's just that it's nice to have each VM independent of the other. So you can snapshot/rollback/destroy VM's without any relation to the others.


> Managing RDMs for 2500 VMs is simply impractical, and there's a limit to the
> number of datastores VMWare supports anyway.

When it's automated, there's no work impact on me that would make it impractical. I don't know how many datastores VMware supports - thanks for mentioning it. Looks like:

Virtual disks per datastore cluster: 9,000
Datastores per datastore cluster: 64
Datastore clusters per vCenter: 256
http://www.vmware.com/pdf/vsphere5/r55/vsphere-55-configuration-maximums.pdf


> As for the filesystems, it's true that most filesystems today can survive a hard
> reboot, but the applications may or may not.

Dunno what filesystems or applications you support, but these aren't concerns for the *filesystems*: ext3/4, btrfs, NTFS, XFS, ZFS, HFS+... which covers all the filesystems in current use anywhere I've ever worked.

As for applications - Most applications other than databases have no problems. (Depends on what applications you're supporting - transactional credit card processing, for example, and probably some other applications, should be handled with care. Haven't been a concern to me.) For databases, you make sure it's ACID compliant, and then it's not a problem. Plus, you're backing up databases by other means in addition to OS snapshots, right? So there's a multiple safety net.

SQLite is ACID compliant.
MySQL is ACID compliant when using InnoDB. Prior to 5.5, InnoDB was not the default storage engine (though you could choose it); in MySQL >= 5.5 it is the default.
Postgres is ACID compliant.

I haven't checked into MS SQL or Oracle. I would be very shocked if they weren't.
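The atomicity half of that ACID claim can be illustrated with SQLite from the Python standard library: a transaction that was in flight when the connection dies simply disappears, while committed data survives. A toy model of a crash, not a substitute for real crash testing; the table and row names are made up.

```python
import os
import sqlite3
import tempfile

# Toy illustration of ACID atomicity: writes inside an uncommitted
# transaction vanish when the connection dies (our stand-in for a
# crash), while committed writes are durable.
path = os.path.join(tempfile.mkdtemp(), "acid-demo.db")

con = sqlite3.connect(path)
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
con.execute("INSERT INTO orders (item) VALUES ('committed-before-crash')")
con.commit()

# Start a transaction but never commit it, then drop the connection
# abruptly -- roughly what a power cut mid-write looks like to the DB.
con.execute("INSERT INTO orders (item) VALUES ('in-flight-at-crash')")
con.close()  # closing without commit discards the open transaction

# "Reboot": reopen the database and see what survived.
con2 = sqlite3.connect(path)
rows = [r[0] for r in con2.execute("SELECT item FROM orders")]
print(rows)  # ['committed-before-crash'] -- the in-flight row is gone
```

The database comes back consistent either way; what you lose is only the work that was never committed.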


> Crash consistency provides for the OS to come back, do some
> filesystem repairs, and hopefully most of your data is intact,

I would describe it differently: after a hard crash, a journaling or intent-logging filesystem can instantly detect which write operations, if any, were interrupted, and then either back each one out as if it never existed or complete it as if it were never interrupted. That guarantees the filesystem is always in a consistent state - meaning a state the filesystem had actually passed through during normal operation before being interrupted.

After reading the rest of your post, it's clear that the difference between your ideas and mine boils down to this: I think the filesystems and applications can survive a crash. You don't believe that - but you also believe that quiescing addresses the problem - and I disagree on both points. Quiescing is just flushing buffers. But the filesystem itself, and any ACID-compliant database, already knows which writes must be flushed to ensure consistency - and it is already flushing those particular writes, in case of power loss or kernel crash, while allowing the other writes to sit in write buffers. It isn't counting on *you* to trigger a quiesce before a crash. It assumes a crash could occur at any time, without any warning.

If you snapshot the storage without flushing the buffers, yes there will be data in memory that wasn't included in the snapshot, but no there won't be an inconsistent filesystem or unusable database after recovery. Yes, it's true that the 15 seconds of in-memory buffered writes will be excluded from the snapshot, but it's also true that the 3 hours and 59 minutes of writes that will occur before your next snapshot will also be excluded from the current snapshot. Suddenly the 15 seconds of buffered writes you might save by flushing buffers prior to snapshot becomes less relevant.
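The journaling argument above can be modeled in a few lines: record the intent, write the data, record completion; on recovery, any intent lacking a completion record is redone from the journal. This is a deliberately tiny redo-only model of the idea (real journals may also roll interrupted writes back), not how any actual filesystem is implemented.

```python
# Minimal intent-log model of what a journaling filesystem guarantees:
# after a crash at any point, recovery leaves the "disk" in a state the
# system actually passed through.  Toy model only.

def crashy_write(disk, journal, key, value, crash_point=None):
    journal.append(("intent", key, value))
    if crash_point == "before_data":
        return                      # power lost before the data landed
    disk[key] = value
    if crash_point == "before_done":
        return                      # power lost before the completion record
    journal.append(("done", key))

def recover(disk, journal):
    done = {e[1] for e in journal if e[0] == "done"}
    for e in journal:
        if e[0] == "intent" and e[1] not in done:
            disk[e[1]] = e[2]       # redo the interrupted write
    journal.clear()

disk, journal = {}, []
crashy_write(disk, journal, "a.txt", "v1")                 # clean write
crashy_write(disk, journal, "b.txt", "v1", "before_data")  # interrupted write
recover(disk, journal)
print(disk)  # {'a.txt': 'v1', 'b.txt': 'v1'} -- consistent after recovery
```

Note what the model does *not* give you: any write that never reached the journal at all is simply absent after recovery, which is exactly the "15 seconds of buffered writes" point above.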
Brandon Allbery
2015-10-28 13:47:40 UTC
Permalink
On Wed, Oct 28, 2015 at 9:41 AM, Edward Ned Harvey (lopser) <
***@nedharvey.com> wrote:

> Dunno what filesystems or applications you support, but these aren't
> concerns for the *filesystems* ext3/4, btrfs, ntfs, xfs, zfs, hfs+... Which
> is all the filesystems I can think of, in current usage anywhere I've ever
> worked.


Sadly HFS+ *is* known to sometimes corrupt itself in unfixable ways on hard
powerdown.

--
brandon s allbery kf8nh sine nomine associates
***@gmail.com ***@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
Edward Ned Harvey (lopser)
2015-10-28 13:51:03 UTC
Permalink
> From: Brandon Allbery [mailto:***@gmail.com]
>
> Sadly HFS+ *is* known to sometimes corrupt itself in unfixable ways on hard
> powerdown.

Link?

I've never experienced that, and I haven't been able to find any supporting information from the hive mind.
Brandon Allbery
2015-10-28 13:57:48 UTC
Permalink
On Wed, Oct 28, 2015 at 9:51 AM, Edward Ned Harvey (lopser) <
***@nedharvey.com> wrote:

> Link?
>
> I've never experienced that, and I haven't been able to find any
> supporting information from the hive mind.
>

Mostly discussion/"help plz!" in #macports IRC. It's not especially common
but there've been enough (3-4) instances to make me wary of relying on it.

xfs has been known to eat itself under some circumstances as well; that one
has been discussed in #lopsa IRC.

And in general, relying on being able to walk away from a bad landing just
seems like an open invitation for things to go wrong. *Especially* for
backups.

--
brandon s allbery kf8nh sine nomine associates
***@gmail.com ***@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
Edward Ned Harvey (lopser)
2015-10-28 14:22:58 UTC
Permalink
> From: Brandon Allbery [mailto:***@gmail.com]
>
> And in general, relying on being able to walk away from a bad landing just
> seems like an open invitation for things to go wrong. *Especially* for
> backups.

I think the right approach is to snapshot and replicate the machines in their running state, but *also* run backups inside the guests. For example, mysqldump, pg_dump, rsnapshot/rsync, etc, that send backups to additional offsite locations, and regularly rotate with airgap.
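The in-guest half of that approach is typically a dump step followed by rotation. A minimal sketch of the rotation half, with stdlib tools only: the paths, the `backup-` prefix, and the 14-day retention are illustrative, and in real use `src_dir` would hold e.g. mysqldump or pg_dump output and `dest_dir` would be a mount that gets shipped offsite.

```python
import os
import tarfile
import time

# Sketch of an in-guest backup rotation: archive a data directory to a
# timestamped tarball, then prune archives older than `keep_days`.

def backup_and_rotate(src_dir: str, dest_dir: str, keep_days: int = 14) -> str:
    os.makedirs(dest_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(dest_dir, f"backup-{stamp}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src_dir, arcname=".")        # dump output lives in src_dir
    cutoff = time.time() - keep_days * 86400
    for name in os.listdir(dest_dir):
        path = os.path.join(dest_dir, name)
        if name.startswith("backup-") and os.path.getmtime(path) < cutoff:
            os.remove(path)                  # prune expired archives
    return archive
```

Run from cron inside each guest, this gives the second, storage-independent safety net alongside the hypervisor-level snapshots.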
Edward Ned Harvey (lopser)
2015-10-28 20:17:50 UTC
Permalink
> From: Brandon Allbery [mailto:***@gmail.com]
>
> Mostly discussion/"help plz!" in #macports IRC. It's not especially common
> but there've been enough (3-4) instances to make me wary of relying on it.
>
> xfs has been known to eat itself under some circumstances as well; that one
> has been discussed in #lopsa IRC.

Unless I miss my guess, the discussions you're remembering are *not* filesystem-eats-itself-because-of-power-failure. Every filesystem can become corrupt via hardware failure (CPU or memory errors, etc), or software failures (malware gobbles up critical disk sectors), or human failures. But that's not a reason to believe that snapshotting a running system, or hard-cutting the power leads to filesystem corruption of any kind.
Adam Levin
2015-10-28 20:30:37 UTC
Permalink
This is a very interesting discussion for me, and probably warrants some
more research and testing. I readily admit that I've always worked under
the operating assumption that pulling the plug *could* lead to corruption,
even after "upgrading" from ufs to xfs those many years ago. It certainly
deserves a second look as to whether this quiescing stuff is necessary.
Many in the industry, including the backup vendors, seem to think it's
required.

Thanks for the thought food.

-Adam

On Wed, Oct 28, 2015 at 4:17 PM, Edward Ned Harvey (lopser) <
***@nedharvey.com> wrote:

> > From: Brandon Allbery [mailto:***@gmail.com]
> >
> > Mostly discussion/"help plz!" in #macports IRC. It's not especially
> common
> > but there've been enough (3-4) instances to make me wary of relying on
> it.
> >
> > xfs has been known to eat itself under some circumstances as well; that
> one
> > has been discussed in #lopsa IRC.
>
> Unless I miss my guess, the discussions you're remembering are *not*
> filesystem-eats-itself-because-of-power-failure. Every filesystem can
> become corrupt via hardware failure (CPU or memory errors, etc), or
> software failures (malware gobbles up critical disk sectors), or human
> failures. But that's not a reason to believe that snapshotting a running
> system, or hard-cutting the power leads to filesystem corruption of any
> kind.
>
Steve VanDevender
2015-10-28 20:42:41 UTC
Permalink
Adam Levin writes:
> This is a very interesting discussion for me, and probably warrants
> some more research and testing.  I readily admit that I've always
> worked under the operating assumption that pulling the plug *could*
> lead to corruption, even after "upgrading" from ufs to xfs those many
> years ago.  It certainly deserves a second look as to whether this
> quiescing stuff is necessary.  Many in the industry, including the
> backup vendors, seem to think it's required.

I think this is just a holdover from the days when the early UNIX
filesystem implementations were much more fragile and unclean shutdowns
frequently led to filesystem damage. Since then filesystem
implementations have improved a great deal and techniques like BSD soft
updates or journaling make major corruption after an unclean shutdown
much less likely (although some data loss is always a possibility, and
the possibility of corruption can't be entirely eliminated).

Database systems still often seem to have this problem, though, and
doing filesystem-level backups of systems with running databases will
often get inconsistent database state.
Edward Ned Harvey (lopser)
2015-10-28 22:35:05 UTC
Permalink
> From: tech-***@lists.lopsa.org [mailto:tech-***@lists.lopsa.org]
> On Behalf Of Steve VanDevender
>
> Database systems still often seem to have this problem, though, and
> doing filesystem-level backups of systems with running databases will
> often get inconsistent database state.

You definitely *can't* do filesystem-level backups that include databases and expect the databases to be OK. The db daemons keep the files open, and you never know precisely when they're being updated.

Block-level snapshots of the underlying device, provided the db is ACID-compliant, don't result in db corruption - but of course, changes made after the snapshot aren't included in it.

Even if you have snapshots, it's smart to do filesystem-level or other more granular backups (database backups) in addition to the snapshots - because sometimes you don't want to roll back the whole machine. You just want to restore the database, or whatever.
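To make the "granular database backup" point concrete: a minimal sketch using SQLite as a stand-in for any database (paths and names here are illustrative, not from the thread). Copying the open file can catch it mid-update; asking the database engine for a copy through its own backup API cannot.

```python
import sqlite3

def consistent_db_backup(src_path: str, dest_path: str) -> None:
    """Copy a live SQLite database through its online-backup API.

    Unlike cp'ing the file while a daemon holds it open, the backup API
    reads a single consistent view of the database, so the copy is never
    caught mid-update.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        with dest:
            src.backup(dest)  # Python 3.7+; wraps SQLite's sqlite3_backup
    finally:
        src.close()
        dest.close()
```

Restoring that copy gives you just the database, as of the backup moment - exactly the finer-than-whole-machine granularity described above.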
Edward Ned Harvey (lopser)
2015-10-28 22:47:54 UTC
Permalink
> From: Adam Levin [mailto:***@gmail.com]
>
> It certainly deserves
> a second look as to whether this quiescing stuff is necessary.

FWIW, I don't advise *not* quiescing. At worst, it does no harm, and at best, it might be important. But I don't do snapshots in vmware - and don't do quiescing myself, basically because I'm confident in all the other stuff, and quiescing adds only the tiniest, most minuscule little bit of extra confidence.

For example: Everyone always talks about quiescing as it relates to VSS. This is kind of questionable, because how many applications register as VSS writers but don't provide consistency as a general design characteristic? What kind of application would provide a mechanism to ensure consistency for *graceful* crashes or snapshots, but fail to provide consistency for *ungraceful* crashes? The number is probably nonzero, but I doubt it's very significant. And what about non-Windows VMs? Technically, anywhere vmware tools is running, it might have hooks into the OS to flush write buffers and so on. But does it really apply to non-Windows guests? To what extent? I know you can make the Linux kernel flush buffers by echoing some magic into /proc, and I would assume vmware tools on Linux does that upon a quiesce request. But I would also assert that doing so is usually *counter* productive, causing a minuscule performance hit without any gain - because the data that exists in buffers was not harmful to exist in buffers. The stuff that needed to be flushed to disk in order to maintain filesystem and database consistency had already been flushed to disk.
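The claim that consistency-critical data "had already been flushed" rests on applications forcing their own durability with fsync(). A hedged sketch of that pattern (the file name and record format are made up for illustration): anything still sitting in the OS page cache at snapshot time is, by definition, data the application had not yet asked to make durable.

```python
import os

def durable_append(path: str, payload: bytes) -> None:
    """Append a record and force it to stable storage, the way a careful
    application (or an ACID database's log writer) does for data it
    cannot afford to lose on a crash or snapshot rollback."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # flush this file's dirty pages through to disk
    finally:
        os.close(fd)
```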
Brandon Allbery
2015-10-28 20:35:45 UTC
Permalink
On Wed, Oct 28, 2015 at 4:17 PM, Edward Ned Harvey (lopser) <
***@nedharvey.com> wrote:

> Unless I miss my guess, the discussions you're remembering are *not*
> filesystem-eats-itself-because-of-power-failure. Every filesystem can
> become corrupt via hardware failure (CPU or memory errors, etc), or
> software failures (malware gobbles up critical disk sectors), or human
> failures. But that's not a reason to believe that snapshotting a running
> system, or hard-cutting the power leads to filesystem corruption of any
> kind.
>

Glad you're in a position to make such expert guesses. So spurious
nonrepeatable hardware failures are just happening to occur at loss of
power, you say?
Evidently your great expertise defeats logic.

--
brandon s allbery kf8nh sine nomine associates
***@gmail.com ***@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
Edward Ned Harvey (lopser)
2015-10-28 22:50:34 UTC
Permalink
> From: Brandon Allbery [mailto:***@gmail.com]
>
>> On Wed, Oct 28, 2015 at 4:17 PM, Edward Ned Harvey (lopser)
>> <***@nedharvey.com> wrote:
>> Unless I miss my guess, the discussions you're remembering are *not*
>> filesystem-eats-itself-because-of-power-failure. Every filesystem can
>> become corrupt via hardware failure (CPU or memory errors, etc), or
>> software failures (malware gobbles up critical disk sectors), or human
>> failures. But that's not a reason to believe that snapshotting a running
>> system, or hard-cutting the power leads to filesystem corruption of any kind.
>
> Glad you're in a position to make such expert guesses. So spurious
> nonrepeatable hardware failures are just happening to occur at loss of
> power, you say?
> Evidently your great expertise defeats logic.

Look, if you can't provide a reference to support your belief that HFS+ and XFS get corrupted by power outages, don't expect anyone to offer more than a guess as to why that information is incorrect.

As to insulting my expertise - Brandon doesn't like Ned. We get it. Thanks, moving on.
Edward Ned Harvey (lopser)
2015-10-28 14:18:13 UTC
Permalink
> From: Adam Levin [mailto:***@gmail.com]
>
> VMWare Tools allows VMWare to tell
> the VM, through VSS, to quiesce, and then VMWare can take its snapshot --
> it knows to quiesce when it takes its own snapshot.  Once that snapshot
> exists, it's 100% safe

Actually, this is incorrect.

In order for quiescing to do anything at all, an application must register with VSS as a writer. There may be applications out there that register as VSS writers but don't provide crash consistency without VSS. You might gain some consistency in those cases by quiescing, but it's only a marginal gain over snapshotting without quiescing, and it's definitely not correct to assume a 100% guarantee with quiescing.

With or without quiescing, the filesystem itself will be 100% consistent, and the ACID databases will be 100% consistent, but let's imagine some application was recording audio from a microphone and that application will stream 5 KB/sec to a file indefinitely. Then with or without quiescing, if the system rolls back, the file will be truncated, and may not be a valid mp3 file (or whatever).

The only way you can have a 100% guarantee about all your files is to gracefully shut down the VM and then snapshot, but nobody wants to do that. In reality, the risks of online snapshotting, with or without quiescing, are manageable.
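The streaming-audio example generalizes: a snapshot of a file being appended to is just a prefix of that file, and if the cut lands mid-record, the tail is garbage no matter how consistent the filesystem is. A toy model (the fixed-size record format is an assumption for illustration, not anything from the thread):

```python
def snapshot_of_stream(stream: bytes, cut: int, record_size: int):
    """Model rolling back to a snapshot taken mid-stream.

    The snapshot is simply the first `cut` bytes of the file. If the cut
    lands inside a record, the final record is truncated and invalid -
    the half-written audio file described above.

    Returns (number of complete records, whether a partial record
    trails at the end).
    """
    snap = stream[:cut]
    whole, remainder = divmod(len(snap), record_size)
    return whole, remainder != 0
```

Quiescing doesn't change this: only the application itself knows where its record boundaries are.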
Brandon Allbery
2015-10-28 12:45:00 UTC
Permalink
On Wed, Oct 28, 2015 at 6:52 AM, Edward Ned Harvey (lopser) <
***@nedharvey.com> wrote:

> What I've always done was to make individual zvol's in ZFS, and export
> them over iscsi. Then vmware simply uses that "disk" as the disk for the
> VM. Let ZFS do snapshotting, and don't worry about vmware. Every guest OS
> (at least every one I've had to deal with) is designed to be able to
> survive a power failure (or kernel halt or whatever) so if you ever need to
> rollback or restore a ZFS snapshot and reboot the guest, you're effectively
> booting that guest as if the power had been interrupted at the time of the
> snapshot.


OSes, maybe ("designed to" and "it works" are often not on speaking terms
with each other). Applications, far too often not so much.

--
brandon s allbery kf8nh sine nomine associates
***@gmail.com ***@sinenomine.net
unix, openafs, kerberos, infrastructure, xmonad http://sinenomine.net
Edward Ned Harvey (lopser)
2015-10-28 13:47:47 UTC
Permalink
> From: Brandon Allbery [mailto:***@gmail.com]
>
> OSes, maybe ("designed to" and "it works" are often not on speaking terms
> with each other). Applications, far too often not so much.

Perhaps "Designed and tested" would be a more compelling way to phrase that? I know crash consistency testing is included in SQLite, in order to provide assurance that their ACID compliance is done correctly. I have a hard time believing other databases claim ACID compliance and *don't* test for it. But I haven't specifically checked them.

I also haven't specifically checked for the existence of crash tests in extfs, zfs, ntfs, hfs+. But I know they all have journaling or intent logging, and I find it implausible to believe they *don't* do tests. And I know I haven't seen an inconsistent filesystem since FAT, ext2, and OS9. It's been a *very* long time since I thought there was anything to be concerned about, WRT system crashes resulting in a broken filesystem.
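SQLite's crash behavior is easy to observe without actually pulling a plug: abandoning a connection mid-transaction (a crude stand-in for a crash - not SQLite's own crash-test harness) loses only the uncommitted work, because the journal rolls the file back to the last consistent state.

```python
import sqlite3

def committed_rows(path: str) -> int:
    """Commit one row, leave a second row uncommitted, then drop the
    connection as a stand-in for a crash. Reopening shows only the
    committed row survives."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    con.execute("INSERT INTO t VALUES (1)")
    con.commit()                              # durable: survives a "crash" here
    con.execute("INSERT INTO t VALUES (2)")   # in-flight: journal will undo it
    con.close()                               # close without commit = rollback

    con2 = sqlite3.connect(path)
    n = con2.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    con2.close()
    return n
```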
Brad Beyenhof
2015-10-28 15:33:33 UTC
Permalink
At a previous employer we used Avamar for this and I recall it working well. I didn't operate it myself, but restores and clones I requested from backups always came out just as expected.

I believe the product is now owned by EMC.

--
Brad Beyenhof . . . . . . . . . . . . . . . . http://augmentedfourth.com
Every man takes the limits of his own field of vision for the limits
of the world.
~ Arthur Schopenhauer, German philosopher (1788-1860)


> On Oct 27, 2015, at 6:14 AM, Adam Levin <***@gmail.com> wrote:
>
> Hey all, I've got a question about how you backup your VM environment?
>
> We're using vSphere 5.5 and NetApp NAS for datastores. We have about 75 8TB datastores, and about 2500 VMs. The VMs are not distributed evenly because of service levels associated with the datastores.
>
> We're being told by various backup vendors that the main issue is the number of VMs per datastore, because quiescing lots of VMs and then taking a datastore snapshot can produce long wait times when rolling the quiesced images back into the running VM.
>
> Our VM team is telling us that there is no current tool to manage the number of VMs per datastore, just the size of the datastores.
>
> So I'm curious what some common methods are for managing backups in a large VM environment. Do you just use agents and back up from within the VM? Do you bother doing app-consistent backups of the VMs or just snapshot the datastore and not worry about consistency? Have you found a product that manages quiescing and snapshots in a reasonable way?
>
> We've looked at NetBackup, Commvault and Veeam so far.
>
> Thanks,
> -Adam
>
>

Michael Ryder
2015-10-29 16:11:15 UTC
Permalink
Adam

Have you looked at Tivoli (now Spectrum Protect)? My environment isn't as
large as yours, but from what I've heard it scales well. We use it to back
up both Linux and Windows VMs and physical hosts.

The add-on Tivoli Data Protector for Virtual Environments is the magic
smoke that adds features like block-level incremental backups, instant
restores and other goodies.

https://www-01.ibm.com/support/knowledgecenter/mobile/#!/SS8TDQ_7.1.3/ve.user/c_ve_overview_tsmnode.html

Scaling out with multiple data movers per vCenter is possible, and
there are many ways to tune and filter how they operate, such as the max
number of simultaneous jobs per datastore, per ESXi host, etc.

Let me know if you have any questions- I'll try to answer them or can put
you in touch with a great Tivoli community where you can find folks with
larger installations, more answers etc.

Mike

On Tuesday, October 27, 2015, Adam Levin <***@gmail.com> wrote:

> Hey all, I've got a question about how you backup your VM environment?
>
> We're using vSphere 5.5 and NetApp NAS for datastores. We have about 75
> 8TB datastores, and about 2500 VMs. The VMs are not distributed evenly
> because of service levels associated with the datastores.
>
> We're being told by various backup vendors that the main issue is the
> number of VMs per datastore, because quiescing lots of VMs and then taking
> a datastore snapshot can produce long wait times when rolling the quiesced
> images back into the running VM.
>
> Our VM team is telling us that there is no current tool to manage the
> number of VMs per datastore, just the size of the datastores.
>
> So I'm curious what some common methods are for managing backups in a
> large VM environment. Do you just use agents and backup from within the
> VM? Do you bother doing app consistent backups of the VMs or just snapshot
> the datastore and not worry about consistency? Have you found a product
> that manages quiescing and snapshots in a reasonable way?
>
> We've looked at NetBackup, Commvault and Veeam so far.
>
> Thanks,
> -Adam
>
>
>
Adam Levin
2015-10-29 16:26:26 UTC
Permalink
Thanks, Mike. We were a TSM shop until we switched to NBU 6 years ago. I
don't think they're looking to go back, but there's no question it scales.
It's probably the biggest, baddest backup system there is. :)

-Adam

On Thu, Oct 29, 2015 at 12:11 PM, Michael Ryder <***@gmail.com>
wrote:

> Adam
>
> Have you looked at Tivoli (Now Spectrum Protect)? My environment isn't as
> large as yours, but from what I've heard it scales well. We use it to
> backup both Linux and Windows VMs and physical hosts.
>
> The add-on Tivoli Data Protector for Virtual Environments is the magic
> smoke that adds features like block-level incremental backups, instant
> restores and other goodies.
>
>
> https://www-01.ibm.com/support/knowledgecenter/mobile/#!/SS8TDQ_7.1.3/ve.user/c_ve_overview_tsmnode.html
>
> Scaling out involving multiple data movers per vCenter is possible, and
> there are many ways to tune and filter how they operate, such as max number
> of simultaneous jobs per datastore, per esxi host, etc.
>
> Let me know if you have any questions- I'll try to answer them or can put
> you in touch with a great Tivoli community where you can find folks with
> larger installations, more answers etc.
>
> Mike
>
>
> On Tuesday, October 27, 2015, Adam Levin <***@gmail.com> wrote:
>
>> Hey all, I've got a question about how you backup your VM environment?
>>
>> We're using vSphere 5.5 and NetApp NAS for datastores. We have about 75
>> 8TB datastores, and about 2500 VMs. The VMs are not distributed evenly
>> because of service levels associated with the datastores.
>>
>> We're being told by various backup vendors that the main issue is the
>> number of VMs per datastore, because quiescing lots of VMs and then taking
>> a datastore snapshot can produce long wait times when rolling the quiesced
>> images back into the running VM.
>>
>> Our VM team is telling us that there is no current tool to manage the
>> number of VMs per datastore, just the size of the datastores.
>>
>> So I'm curious what some common methods are for managing backups in a
>> large VM environment. Do you just use agents and backup from within the
>> VM? Do you bother doing app consistent backups of the VMs or just snapshot
>> the datastore and not worry about consistency? Have you found a product
>> that manages quiescing and snapshots in a reasonable way?
>>
>> We've looked at NetBackup, Commvault and Veeam so far.
>>
>> Thanks,
>> -Adam
>>
>>
>>
Derek J. Balling
2015-10-29 16:28:00 UTC
Permalink
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

One more vote of confidence for NBU. It's been a while since I've used
it, but it was most definitely The Awesome.

D


On 10/29/2015 12:26 PM, Adam Levin wrote:
> Thanks, Mike. We were a TSM shop until we switched to NBU 6 years
> ago. I don't think they're looking to go back, but there's no
> question it scales. It's probably the biggest, baddest backup
> system there is. :)
>
> -Adam
>
> On Thu, Oct 29, 2015 at 12:11 PM, Michael Ryder
> <***@gmail.com <mailto:***@gmail.com>> wrote:
>
> Adam
>
> Have you looked at Tivoli (Now Spectrum Protect)? My environment
> isn't as large as yours, but from what I've heard it scales well.
> We use it to backup both Linux and Windows VMs and physical hosts.
>
> The add-on Tivoli Data Protector for Virtual Environments is the
> magic smoke that adds features like block-level incremental
> backups, instant restores and other goodies.
>
> https://www-01.ibm.com/support/knowledgecenter/mobile/#!/SS8TDQ_7.1.3/
ve.user/c_ve_overview_tsmnode.html
>
> Scaling out involving multiple data movers per vCenter is
> possible, and there are many ways to tune and filter how they
> operate, such as max number of simultaneous jobs per datastore, per
> esxi host, etc.
>
> Let me know if you have any questions- I'll try to answer them or
> can put you in touch with a great Tivoli community where you can
> find folks with larger installations, more answers etc.
>
> Mike
>
>
> On Tuesday, October 27, 2015, Adam Levin <***@gmail.com
> <mailto:***@gmail.com>> wrote:
>
> Hey all, I've got a question about how you backup your VM
> environment?
>
> We're using vSphere 5.5 and NetApp NAS for datastores. We have
> about 75 8TB datastores, and about 2500 VMs. The VMs are not
> distributed evenly because of service levels associated with the
> datastores.
>
> We're being told by various backup vendors that the main issue is
> the number of VMs per datastore, because quiescing lots of VMs and
> then taking a datastore snapshot can produce long wait times when
> rolling the quiesced images back into the running VM.
>
> Our VM team is telling us that there is no current tool to manage
> the number of VMs per datastore, just the size of the datastores.
>
> So I'm curious what some common methods are for managing backups in
> a large VM environment. Do you just use agents and backup from
> within the VM? Do you bother doing app consistent backups of the
> VMs or just snapshot the datastore and not worry about consistency?
> Have you found a product that manages quiescing and snapshots in a
> reasonable way?
>
> We've looked at NetBackup, Commvault and Veeam so far.
>
> Thanks, -Adam
>
>
>
>
>
>

- --
I prefer to use encrypted mail. My public key fingerprint is FD6A 6990
F035 DE9E 3713 B4F1 661B 3AD6 D82A BBD0. You can download it at
http://www.megacity.org/gpg_dballing.txt

Learn how to encrypt your email with the E-Mail Self Defense Guide:
https://emailselfdefense.fsf.org/en/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0
Comment: GPGTools - https://gpgtools.org

iQIcBAEBCgAGBQJWMkkQAAoJEGYbOtbYKrvQi5wQAMa2r5RonQKCeUxcxeN/uGvg
LgHoLOARXSdm3a6/Leg/WwExTs9e4LG/3ssM8MaPxHWPEm1MCTHkqv+YIHSqp1iy
4DVqhL5BcFIBXGoPTm/CxflbHnLwpjqizoh11p75l+VNyQai/q7OCNEMqWycIzHG
NsX3NIlp+Z8ze+S+3RrXdRT5/JEP3sWa4cnvSuQwNc8KcyPrHHdScBUFOzD1ffdE
gBZHm1lCT4JVscmeG+8Mx6J9K0LXLpZ5Zq4EgmV+leQR5JrU3c5YoOrvw306k2D0
xbYOwc18K7YgqiM/F3K9ZJyHK4YFknGmIrDiKQG46TcaH/bYeqdiWvAqy8Gk6NJX
NMR9psZfMb3Mji/qSj8kfq8hednqKLy41w6gxuo09dVt1EnEm5vEjsG/Kr5/9guR
6QB3ENevXj2HQfxR+vu6vaDZPrCNdynTWj4e2ofhh9MgKQrDsWhW3B1LIyxphsUk
wPMYPEWJv+w9dROzOUzap6ZP/oCXvv1M9RC2ji8/Y+10368rJLZT7cjx8VePVzuv
WSqE9qcNMJ8AU+9+EYHJCUvR8ai6gbiO5kpU6/Beo2oiJXJRBCzo2vDA57qNE8HW
g25cnE4+tONec/4q6YMU6xLo8xNlepgrH3Zp/vZ1xPF1rOIyZHpQTK7Gq/80dt5e
H64PRVU3Pdb9fRIDlVx0
=l8Ae
-----END PGP SIGNATURE-----
Adam Levin
2015-10-29 16:33:07 UTC
Permalink
The big problem with NBU has been that they are always playing catchup to
the VMWare feature set, and still haven't fully caught up to doing good
snapshot-based backups integrated with NetApp in 7.6. 7.7 supposedly fixes
some of that, but that's what we've been hearing since 7.0.

-Adam

On Thu, Oct 29, 2015 at 12:28 PM, Derek J. Balling <***@megacity.org>
wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> One more vote of confidence for NBU. It's been a while since I've used
> it, but it was most definitely The Awesome.
>
> D
>
>
> On 10/29/2015 12:26 PM, Adam Levin wrote:
> > Thanks, Mike. We were a TSM shop until we switched to NBU 6 years
> > ago. I don't think they're looking to go back, but there's no
> > question it scales. It's probably the biggest, baddest backup
> > system there is. :)
> >
> > -Adam
> >
> > On Thu, Oct 29, 2015 at 12:11 PM, Michael Ryder
> > <***@gmail.com <mailto:***@gmail.com>> wrote:
> >
> > Adam
> >
> > Have you looked at Tivoli (Now Spectrum Protect)? My environment
> > isn't as large as yours, but from what I've heard it scales well.
> > We use it to backup both Linux and Windows VMs and physical hosts.
> >
> > The add-on Tivoli Data Protector for Virtual Environments is the
> > magic smoke that adds features like block-level incremental
> > backups, instant restores and other goodies.
> >
> > https://www-01.ibm.com/support/knowledgecenter/mobile/#!/SS8TDQ_7.1.3/
> ve.user/c_ve_overview_tsmnode.html
> >
> > Scaling out involving multiple data movers per vCenter is
> > possible, and there are many ways to tune and filter how they
> > operate, such as max number of simultaneous jobs per datastore, per
> > esxi host, etc.
> >
> > Let me know if you have any questions- I'll try to answer them or
> > can put you in touch with a great Tivoli community where you can
> > find folks with larger installations, more answers etc.
> >
> > Mike
> >
> >
> > On Tuesday, October 27, 2015, Adam Levin <***@gmail.com
> > <mailto:***@gmail.com>> wrote:
> >
> > Hey all, I've got a question about how you backup your VM
> > environment?
> >
> > We're using vSphere 5.5 and NetApp NAS for datastores. We have
> > about 75 8TB datastores, and about 2500 VMs. The VMs are not
> > distributed evenly because of service levels associated with the
> > datastores.
> >
> > We're being told by various backup vendors that the main issue is
> > the number of VMs per datastore, because quiescing lots of VMs and
> > then taking a datastore snapshot can produce long wait times when
> > rolling the quiesced images back into the running VM.
> >
> > Our VM team is telling us that there is no current tool to manage
> > the number of VMs per datastore, just the size of the datastores.
> >
> > So I'm curious what some common methods are for managing backups in
> > a large VM environment. Do you just use agents and backup from
> > within the VM? Do you bother doing app consistent backups of the
> > VMs or just snapshot the datastore and not worry about consistency?
> > Have you found a product that manages quiescing and snapshots in a
> > reasonable way?
> >
> > We've looked at NetBackup, Commvault and Veeam so far.
> >
> > Thanks, -Adam
> >
> >
> >
> >
> >
> >
>
> - --
> I prefer to use encrypted mail. My public key fingerprint is FD6A 6990
> F035 DE9E 3713 B4F1 661B 3AD6 D82A BBD0. You can download it at
> http://www.megacity.org/gpg_dballing.txt
>
> Learn how to encrypt your email with the E-Mail Self Defense Guide:
> https://emailselfdefense.fsf.org/en/
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0
> Comment: GPGTools - https://gpgtools.org
>
> iQIcBAEBCgAGBQJWMkkQAAoJEGYbOtbYKrvQi5wQAMa2r5RonQKCeUxcxeN/uGvg
> LgHoLOARXSdm3a6/Leg/WwExTs9e4LG/3ssM8MaPxHWPEm1MCTHkqv+YIHSqp1iy
> 4DVqhL5BcFIBXGoPTm/CxflbHnLwpjqizoh11p75l+VNyQai/q7OCNEMqWycIzHG
> NsX3NIlp+Z8ze+S+3RrXdRT5/JEP3sWa4cnvSuQwNc8KcyPrHHdScBUFOzD1ffdE
> gBZHm1lCT4JVscmeG+8Mx6J9K0LXLpZ5Zq4EgmV+leQR5JrU3c5YoOrvw306k2D0
> xbYOwc18K7YgqiM/F3K9ZJyHK4YFknGmIrDiKQG46TcaH/bYeqdiWvAqy8Gk6NJX
> NMR9psZfMb3Mji/qSj8kfq8hednqKLy41w6gxuo09dVt1EnEm5vEjsG/Kr5/9guR
> 6QB3ENevXj2HQfxR+vu6vaDZPrCNdynTWj4e2ofhh9MgKQrDsWhW3B1LIyxphsUk
> wPMYPEWJv+w9dROzOUzap6ZP/oCXvv1M9RC2ji8/Y+10368rJLZT7cjx8VePVzuv
> WSqE9qcNMJ8AU+9+EYHJCUvR8ai6gbiO5kpU6/Beo2oiJXJRBCzo2vDA57qNE8HW
> g25cnE4+tONec/4q6YMU6xLo8xNlepgrH3Zp/vZ1xPF1rOIyZHpQTK7Gq/80dt5e
> H64PRVU3Pdb9fRIDlVx0
> =l8Ae
> -----END PGP SIGNATURE-----
>
Michael Ryder
2015-10-29 21:05:36 UTC
Permalink
I haven't heard *anything* nice about NBU, sorry.

Are you able to say why they dropped TSM? The DB2 backend implementation
is much tighter now, so database update speed has been vastly improved
along with stability. Also implemented in the past 6 years was incremental
block-level backup and stable dedupe, which together cut backup times down
by 75% and at least halved the space usage.

I easily peg the Ethernet and FC infrastructure during block-level
incremental image backups.

Mike

On Thursday, October 29, 2015, Adam Levin <***@gmail.com> wrote:

> Thanks, Mike. We were a TSM shop until we switched to NBU 6 years ago. I
> don't think they're looking to go back, but there's no question it scales.
> It's probably the biggest, baddest backup system there is. :)
>
> -Adam
>
> On Thu, Oct 29, 2015 at 12:11 PM, Michael Ryder <***@gmail.com
> <javascript:_e(%7B%7D,'cvml','***@gmail.com');>> wrote:
>
>> Adam
>>
>> Have you looked at Tivoli (Now Spectrum Protect)? My environment isn't
>> as large as yours, but from what I've heard it scales well. We use it to
>> backup both Linux and Windows VMs and physical hosts.
>>
>> The add-on Tivoli Data Protector for Virtual Environments is the magic
>> smoke that adds features like block-level incremental backups, instant
>> restores and other goodies.
>>
>>
>> https://www-01.ibm.com/support/knowledgecenter/mobile/#!/SS8TDQ_7.1.3/ve.user/c_ve_overview_tsmnode.html
>>
>> Scaling out involving multiple data movers per vCenter is possible, and
>> there are many ways to tune and filter how they operate, such as max number
>> of simultaneous jobs per datastore, per esxi host, etc.
>>
>> Let me know if you have any questions- I'll try to answer them or can put
>> you in touch with a great Tivoli community where you can find folks with
>> larger installations, more answers etc.
>>
>> Mike
>>
>>
>> On Tuesday, October 27, 2015, Adam Levin <***@gmail.com
>> <javascript:_e(%7B%7D,'cvml','***@gmail.com');>> wrote:
>>
>>> Hey all, I've got a question about how you backup your VM environment?
>>>
>>> We're using vSphere 5.5 and NetApp NAS for datastores. We have about 75
>>> 8TB datastores, and about 2500 VMs. The VMs are not distributed evenly
>>> because of service levels associated with the datastores.
>>>
>>> We're being told by various backup vendors that the main issue is the
>>> number of VMs per datastore, because quiescing lots of VMs and then taking
>>> a datastore snapshot can produce long wait times when rolling the quiesced
>>> images back into the running VM.
>>>
>>> Our VM team is telling us that there is no current tool to manage the
>>> number of VMs per datastore, just the size of the datastores.
>>>
>>> So I'm curious what some common methods are for managing backups in a
>>> large VM environment. Do you just use agents and backup from within the
>>> VM? Do you bother doing app consistent backups of the VMs or just snapshot
>>> the datastore and not worry about consistency? Have you found a product
>>> that manages quiescing and snapshots in a reasonable way?
>>>
>>> We've looked at NetBackup, Commvault and Veeam so far.
>>>
>>> Thanks,
>>> -Adam
>>>
>>>
>>>
>
Adam Levin
2015-10-29 21:28:07 UTC
Permalink
Honestly, NBU isn't a bad product. It's been around a long time, and if
you use it in the traditional backup and recovery sense, it works quite
well. Its database doesn't scale the way DB2 does, naturally.

When we were rebuilding our datacenters 6 years ago, TSM was in the
running. We got the new version in to test, and a specialist from IBM came
in to configure it in our lab. By the end of the five days, it still
wasn't working. NBU was set up and running in about 2 hours. It's not
nearly as powerful a data management platform, and it can't scale the way
TSM does, but it's ridiculously simple by comparison.

The other issue was that at the time, we were looking at dedupe tech and
moving away from tape. TSM just wasn't there yet. NBU was pretty much
leading at the time, especially with integration with dedup.

And, to be clear, NBU is actually working great in our environment when we
use it for traditional client-based backups. It's only the NetApp snapshot
integration with VMWare that's lacking, but that's where we want to go.

-Adam

On Thu, Oct 29, 2015 at 5:05 PM, Michael Ryder <***@gmail.com> wrote:

> I haven't heard *anything* nice about NBU, sorry.
>
> Are you able to say why they dropped TSM? The DB2 backend implementation
> is much tighter now, so database update speed has been vastly improved
> along with stability. Also implemented in the past 6 years was incremental
> block-level backup and stable dedupe, which together cut backup times down
> by 75% and at least halved the space usage.
>
> I easily peg the Ethernet and FC infrastructure during block-level
> incremental image backups.
>
> Mike
>
>
> On Thursday, October 29, 2015, Adam Levin <***@gmail.com> wrote:
>
>> Thanks, Mike. We were a TSM shop until we switched to NBU 6 years ago.
>> I don't think they're looking to go back, but there's no question it
>> scales. It's probably the biggest, baddest backup system there is. :)
>>
>> -Adam
>>
>> On Thu, Oct 29, 2015 at 12:11 PM, Michael Ryder <***@gmail.com>
>> wrote:
>>
>>> Adam
>>>
>>> Have you looked at Tivoli (Now Spectrum Protect)? My environment isn't
>>> as large as yours, but from what I've heard it scales well. We use it to
>>> backup both Linux and Windows VMs and physical hosts.
>>>
>>> The add-on Tivoli Data Protector for Virtual Environments is the magic
>>> smoke that adds features like block-level incremental backups, instant
>>> restores and other goodies.
>>>
>>>
>>> https://www-01.ibm.com/support/knowledgecenter/mobile/#!/SS8TDQ_7.1.3/ve.user/c_ve_overview_tsmnode.html
>>>
>>> Scaling out involving multiple data movers per vCenter is possible, and
>>> there are many ways to tune and filter how they operate, such as max number
>>> of simultaneous jobs per datastore, per esxi host, etc.
>>>
>>> Let me know if you have any questions- I'll try to answer them or can
>>> put you in touch with a great Tivoli community where you can find folks
>>> with larger installations, more answers etc.
>>>
>>> Mike
>>>
>>>
Adam Levin
2015-10-30 12:33:35 UTC
Permalink
One of the issues we've been seeing with using NBU with the NetApp is that
while NBU can orchestrate the snapshot on the NetApp, it doesn't catalog
which VMs are on which datastores, so if a vMotion occurs and moves a VM,
and then a restore is required, someone has to go hunt for it. I have not
experienced this myself, because I'm not operationally in charge of the
restore process, but I'm told it's a pain. :)

-Adam

On Fri, Oct 30, 2015 at 8:29 AM, Jim Ennis <***@ucf.edu> wrote:

> We use Netbackup to do API backups of 95% of our virtual machines (VMWARE)
> down here, in part since our storage backend is Netapp based and has some
> support for snapshotting/backups.
>
>
>
> We add an attribute to the VMWARE host description that allows it to use
> the API backup and it all happens in the background and works well.
>
>
>
> Jim Ennis
>
> Director Systems and Operations
>
> University of Central Florida
>
> 12716 Pegasus Drive
>
> CSB 308
>
> Orlando, FL 32816
>
>
>
> E-mail: ***@ucf.edu
>
> Voice: 407-823-1701
>
> Fax: 407-882-9017
>
>
>
Michael Ryder
2015-10-30 17:12:48 UTC
Permalink
Wait a minute... vMotion is allowed to move VMs around, but then your
backup software is expected to put them right back where they were found?
Why would it matter? That... almost doesn't sound fair!

By default, if vMotion is enabled, vCenter will determine where to place
the VM when it is restored, based on performance of the VMware cluster.

On Fri, Oct 30, 2015 at 8:33 AM, Adam Levin <***@gmail.com> wrote:

> One of the issues we've been seeing with using NBU with the NetApp is that
> while NBU can orchestrate the snapshot on the NetApp, it doesn't catalog
> which VMs are on which datastores, so if a vMotion occurs and moves a VM,
> and then a restore is required, someone has to go hunt for it. I have not
> experienced this myself, because I'm not operationally in charge of the
> restore process, but I'm told it's a pain. :)
>
> -Adam
Adam Levin
2015-10-30 17:34:47 UTC
Permalink
No no, I'm saying that when someone needs a restore, it can be difficult to
find the VM if it moved.

In other words, if I have backups for the past month and the VM was on
datastore 1, but then last week it moved to datastore 2, then when the
restore is needed they'll grab it from datastore 2, but if they don't get
back what they need, and have to go farther back, there's no record that it
was on datastore 1 so they have to go looking for it, which takes extra
time (admittedly not a lot, but apparently it's a common enough problem
that they want a different solution -- and it has to be a vendor, not an
in-house script (don't get me started)).

-Adam

On Fri, Oct 30, 2015 at 1:12 PM, Michael Ryder <***@gmail.com> wrote:

> Wait a minute... vMotion is allowed to move VMs around, but then your
> backup software is expected to put them right back where they were found?
> Why would it matter? That... almost doesn't sound fair!
>
> By default, if vMotion is enabled, vCenter will determine where to place
> the VM when it is restored, based on performance of the VMware cluster.
>
John Stoffel
2015-10-30 18:31:54 UTC
Permalink
>>>>> "Adam" == Adam Levin <***@gmail.com> writes:

Adam> In other words, if I have backups for the past month and the VM
Adam> was on datastore 1, but then last week it moved to datastore 2,
Adam> then when the restore is needed they'll grab it from datastore
Adam> 2, but if they don't get back what they need, and have to go
Adam> farther back, there's no record that it was on datastore 1 so
Adam> they have to go looking for it, which takes extra time
Adam> (admittedly not a lot, but apparently it's a common enough
Adam> problem that they want a different solution -- and it has to be
Adam> a vendor, not an in-house script (don't get me started)). 

Commvault v9 (and older as I recall...) also has this problem.  But even
worse, it expires indexes quickly, which makes it such a pain to browse
and find what to restore that you just end up crying sometimes.

I miss how easy Networker was to browse things...

but now we're getting off the topic a bit. In general, backups are
hard, restores suck and people hate paying money to do it well. At
least until they get burned, and then it's more pitchforks and tar and
feathering of us than any meaningful change.

John
_______________________________________________
Tech mailing list
***@lists.lopsa.org
https://lists.lopsa.org/cgi-bin/mailman/listinfo/tech
This list provided by the League of Professional System Administrators
http://lopsa.org/
Skylar Thompson
2015-10-29 16:36:57 UTC
Permalink
I'll second the recommendation for TSM. We don't use TDP VE but I can
definitely vouch for the scalability. We use it to track ~1 billion
onsite file versions and 5PB on tape, with an equivalent number offsite.
Our daily backup volumes vary from 10TB all the way to 60TB. For a storage
system of its size, the problems have been relatively minor and the support
excellent.

Skylar

On Thu, Oct 29, 2015 at 9:11 AM, Michael Ryder <***@gmail.com> wrote:

> Adam
>
> Have you looked at Tivoli (Now Spectrum Protect)? My environment isn't as
> large as yours, but from what I've heard it scales well. We use it to
> backup both Linux and Windows VMs and physical hosts.
Steven Miano
2015-10-29 16:55:07 UTC
Permalink
Be aware that the cost of TSM may be structured much differently from
Veeam. Paying for consumption with TSM is fairly costly (anecdotal).

The offset for paying by sockets on the hypervisors is that you will need
to also consume resources in the backup infrastructure (per vCenter in our
topology).

Using Veeam you can choose how you create your backup jobs and don't need
to snapshot an entire datastore, from the jobs we run - the virtual
machines get snapshotted -> CBT read/backed up -> snapshot removal.
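That per-VM flow can be sketched with stand-in functions (an illustration of the sequence, not Veeam's actual API):

```python
# Per-VM job flow: VM snapshot -> CBT read of changed blocks -> snapshot
# removal.  Note the snapshot is per-VM, never a whole-datastore snapshot.

def backup_vm(vm, changed_blocks, last_backup):
    snapshot = f"{vm}-snap"                 # 1. take a VM-level snapshot
    try:
        # 2. Changed Block Tracking: copy only the blocks dirtied since
        #    the previous run instead of rereading the full disk
        delta = [b for b in changed_blocks if b not in last_backup]
        backup = sorted(set(last_backup) | set(delta))
    finally:
        snapshot = None                     # 3. remove snapshot, even on failure
    return backup

prev = [0, 1, 2, 3]
print(backup_vm("app01", changed_blocks=[2, 7], last_backup=prev))  # [0, 1, 2, 3, 7]
```

Because the snapshot lives only as long as the CBT read of one VM, the long quiesce-and-wait problem of datastore-wide snapshots doesn't arise.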

Our current sizing is:
Processing speed: 42 MB/s
Source VMs size: 78.4 TB
Full backups: 155.8 TB
Restore points: 49.7 TB

This includes: 1 backup server, and ~9 Veeam Proxies across 3 vCenters,
using datamovers by Veeam/Exagrid.
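Rough arithmetic on those figures (treating the 42 MB/s as an aggregate rate, which is an assumption, and using decimal units):

```python
# How long one full pass over the 78.4 TB source set would take at the
# reported 42 MB/s processing speed (1 TB = 1e6 MB, decimal units).
source_tb = 78.4
rate_mb_s = 42.0

seconds = source_tb * 1e6 / rate_mb_s
hours = seconds / 3600
days = hours / 24
print(f"~{hours:.0f} hours (~{days:.0f} days) for a single full pass")
# ~519 hours, ~22 days
```

Which is exactly why CBT incrementals and ~9 parallel proxies matter: full rereads of the source set at that rate are off the table.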



On Thu, Oct 29, 2015 at 12:36 PM, Skylar Thompson <***@gmail.com
> wrote:

> I'll second the recommendation for TSM. We don't use TDP VE but I can
> definitely vouch for the scalability. We use it to track ~1 billion
> onsite file versions and 5PB on tape, with an equivalent number offsite.
> Our daily backup volumes vary from 10TB all the way to 60TB. For a storage
> system of its size, the problems have been relatively minor and the support
> excellent.
>
> Skylar
>


--
Miano, Steven M.
http://stevenmiano.com