[SLL] Rant: ATA-over-Ethernet 0x88a2
Jesse Keating
jkeating at j2solutions.net
Thu May 5 15:02:03 PDT 2005
On Thu, 2005-05-05 at 15:12 -0700, Andrew Sweger wrote:
> > Second) Without some high level enclosure services, any type of array
> > you make out of these disks on the network will be extremely fragile.
> > If one disk goes down, your array will stop responding at all until you
> > completely power off and power back on minus the dead disk. Or if you
> > even have network congestion or a network failure, it will bring the
> > whole thing crumbling down.
>
> I don't it. The problem you describe is not a feature of the array or
> network but of the application built on top of it. When my uplink router
> goes nuts, I don't have to reboot all the computers in the house, nor do I
> lose the use of my local network. The same can be done for this type of
> storage design.
But it isn't the application on top of it. The problem is the
kernel-level MD and SCSI code. With these network-type devices, there is
no signal that says "Hey, I'm gone, stop looking for me. Fail the array
and move on please." The machine that manages all these network disks
and groups them into a logical unit is the system that will have to be
rebooted. If that system shares out via SMB or NFS, the clients will
just time out until the SMB/NFS server comes back online. No big deal
there. But is it acceptable to reboot your master box every time you
have a disk failure or a network failure of some kind?
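To make the missing signal concrete, here is a minimal toy sketch of a
timeout-based check that drops a silent member and lets the array carry
on degraded. It is only an illustration, not how the MD, SCSI, or AoE
code actually behaves. The function names, the 5-second timeout, and
the member names (e0.0 and so on) are all assumptions made up for the
example.

```python
def find_dead_members(last_reply, now, timeout_s):
    """Return the names of members whose last reply is older than timeout_s.

    last_reply maps a (hypothetical) member name to the time, in seconds,
    at which it last answered a request.
    """
    return sorted(
        name for name, seen in last_reply.items() if now - seen > timeout_s
    )


def degrade(active, dead):
    """Return the active member list with the dead members removed.

    This stands in for "fail the disk and move on": I/O continues against
    the remaining members instead of blocking forever on the silent one.
    """
    dead = set(dead)
    return [name for name in active if name not in dead]


# Illustrative values only: three members, one silent for 10 seconds.
last_reply = {"e0.0": 100.0, "e0.1": 100.0, "e0.2": 91.0}
dead = find_dead_members(last_reply, now=101.0, timeout_s=5.0)
remaining = degrade(list(last_reply), dead)
```

With the values above, e0.2 has been silent for longer than the timeout,
so it is the only member dropped and the other two stay active.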
> Also, most people would most likely deploy the AoE array on a separate
> network segment ("See that wire there? That's Ethernet. But we don't call
> it that. It's the storage system connection."). As in, who cares what the
> link to the storage is (IDE, USB, SCSI, I2O, FC, ATM, etc.) as long as
> it provides the storage abstraction required by the application.
Yes and no. The signals and calls that the storage link provides are
what matter.
> > Third) Network overhead. When you've got one or two of these disks
> > attached to even a gigabit network you're OK. When you've got 3TB worth
> > of 40~80 gig drives sitting on your network, your network overhead is
> > going to be huge. That's just a ton of data to try and break up and send
> > out to all those disks. Even w/out the TCP/IP overhead, you've GOT to
> > have some sort of delivery assurance mechanism to ensure your data gets
> > where it is going. That's going to require a bit of cross talk for each
> > write/read operation. If you're doing some sort of RAID level (I can't
> > imagine NOT) then you're going to have metadata being shifted around a
> > bunch as well.
>
> A bad RAID strategy could melt the network with a lot of spindles. But the
> same can be said for other storage topologies (I would not stripe across
> 15 SCSI disks if that results in swamping the channel). A GFS AoE array
> with multiple initiators could be a serious problem. But that's a corner
> case far from the problems I see AoE helping to solve.
>
> > So now you're looking at maybe a dedicated gigabit network JUST for the
> > storage, and even that won't scale all that well with a LOT of small
> > devices. Just too many devices to talk to.
>
> The head end doesn't talk to all the devices at once. Only those that it
> needs data from. LVM knows which device it needs to retrieve a block from
> and won't bother talking to any of the other devices (same goes for other
> storage networks).
Right, but unless your stripe block size is rather large, most likely
the data you're after is stored on more than one physical unit, and
therefore it will need to be gathered from each of those physical units
at the same time. Most often you stripe fairly small so that your files
are split evenly across your physical units for maximum speed. If
instead you set up LVM so that each drive is simply appended to the
total storage, and data moves on to the next disk only as one disk
fills, then you could avoid network saturation. But at that point you
don't have much redundancy, and any disk failure takes out the file
system. Not very cool.
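
The difference between the two layouts can be sketched with a bit of
arithmetic. This is a minimal sketch assuming a plain RAID-0 style
stripe (no parity) and a simple linear concatenation of equal-sized
disks. The function names, the 64 KiB chunk size, the 8 disks, and the
40 GiB disk size are illustrative assumptions, not figures from any
real setup.

```python
def striped_disks(offset, length, chunk_size, num_disks):
    """Return the set of disk indexes a read touches on a striped volume.

    Chunks are laid out round-robin: chunk 0 on disk 0, chunk 1 on disk 1,
    and so on, wrapping back to disk 0 after the last disk.
    """
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    # After num_disks consecutive chunks every disk has been hit, so
    # there is no need to walk further than that.
    last_chunk = min(last_chunk, first_chunk + num_disks - 1)
    return {chunk % num_disks for chunk in range(first_chunk, last_chunk + 1)}


def linear_disks(offset, length, disk_size):
    """Return the set of disk indexes a read touches on a linear volume.

    Disks are concatenated end to end, so disk 0 holds the first
    disk_size bytes, disk 1 the next disk_size bytes, and so on.
    """
    first_disk = offset // disk_size
    last_disk = (offset + length - 1) // disk_size
    return set(range(first_disk, last_disk + 1))


KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# A 1 MiB read from the start of the volume under each layout.
striped = striped_disks(0, 1 * MIB, chunk_size=64 * KIB, num_disks=8)
linear = linear_disks(0, 1 * MIB, disk_size=40 * GIB)
```

Under these assumptions the 1 MiB read touches all 8 disks on the
striped volume but only disk 0 on the linear one, which is the
trade-off described above: striping spreads every sizeable read across
the network, while linear allocation keeps it on one device and gives
up the speed and any redundancy.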
--
Jesse Keating RHCE (geek.j2solutions.net)
Fedora Legacy Team (www.fedoralegacy.org)
GPG Public Key (geek.j2solutions.net/jkeating.j2solutions.pub)
Was I helpful? Let others know:
http://svcs.affero.net/rm.php?r=jkeating