Hacker News
New High I/O EC2 Instance Type - hi1.4xlarge - 2 TB of SSD-Backed Storage (aws.typepad.com)
192 points by jeffbarr 53 days ago | comments


bravura 53 days ago | link

What are good use-cases for on-demand high I/O servers?

At $3.10/hr, these instances work out to $2k/mo. There are probably many more cost-effective options if you want a 2TB SSD server.

Since the benefit of using EC2 is that you can provision instances elastically, what are the sorts of scenarios in which one needs to provision high I/O servers elastically?

[edit: A few minutes of Googling, and I can't find any dedicated servers with 2 TB of SSD.]
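The $2k/mo figure above can be checked with a small sketch. The $3.10 hourly rate comes from the comment; the 730-hour average month and the function name are assumptions made for illustration.

```python
# Rough monthly cost of an on-demand instance that runs around the clock.
HOURS_PER_MONTH = 24 * 365 / 12  # assumption: average month of 730 hours


def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Return the cost of running one instance for `hours` at `hourly_rate`."""
    return hourly_rate * hours


if __name__ == "__main__":
    print(f"${monthly_cost(3.10):,.0f} per month")
```

Passing a smaller `hours` value gives the cost of a short, occasional run instead of a permanently running server.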

-----

mbreese 53 days ago | link

I do genome mapping where our indexes won't entirely fit in memory. It would be very handy to be able to spin up a few of these instances, load the indexes from an EBS volume onto the local SSDs, then run for a couple of hours or so. This is a very I/O intensive job that we need to run about once a week, but then the rest of the time could be idle.

SSDs would make our jobs run significantly faster. So much so that we've toyed with the idea of adding SSDs to our in-house cluster, but couldn't quite justify the costs. This might actually shift the cost savings to get our lab to migrate to EC2 as opposed to our in-house or university cluster.

-----

Gmo 53 days ago | link

We are also facing the same kind of problems in my company, regarding genome assembly and mapping.

That's definitely something we will look into :)

-----

revorad 53 days ago | link

I'm working on a data visualisation app, which is getting a lot of interest from biologists and bioinformaticians. I'd like to learn a bit more about your work. Can I email you somewhere? Or please drop me an email at hrishi@prettygraph.com. Thanks!

-----

zurn 52 days ago | link

How long would it take to read the 2 TB from EBS?

-----

ybother 52 days ago | link

Based on this, it shouldn't take more than half a day under the worst circumstances (a single EBS drive with crappy performance), and if you RAID together enough drives, you can do it in about an hour. Correct me if I'm wrong, but you pay for EBS by size, not by physical disk, so the more you can split up your data into blocks, the more performance you're going to get.

stu.mp/2009/12/disk-io-and-throughput-benchmarks-on-a...

-----

cperciva 52 days ago | link

2 TB in one hour is about 4.9 Gbps. Cluster Computes have 10 GbE internally, but I'd be surprised if they have that all the way out to EBS.
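The 4.9 Gbps figure can be reproduced with the sketch below. It assumes "2 TB" means 2 binary terabytes (2 x 2^40 bytes); with decimal terabytes (2 x 10^12 bytes) the same calculation gives roughly 4.4 Gbps. The function name is illustrative.

```python
# Sustained throughput needed to move a given amount of data in a given time.
TIB = 2**40  # bytes in a binary terabyte (assumption: "2 TB" read as 2 TiB)


def required_gbps(num_bytes: float, seconds: float) -> float:
    """Return the sustained rate in gigabits per second (10**9 bits/s)."""
    return num_bytes * 8 / seconds / 1e9


if __name__ == "__main__":
    one_hour = 3600
    half_day = 12 * 3600
    print(f"2 TiB in 1 hour:   {required_gbps(2 * TIB, one_hour):.2f} Gbps")
    print(f"2 TiB in 12 hours: {required_gbps(2 * TIB, half_day):.2f} Gbps")
```

The half-day case is the single-volume worst case mentioned in the parent comment, and it needs only about a twelfth of the one-hour rate.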

-----

lonnyk 53 days ago | link

Any chance we could get an example of the data set and the calculation that needs to be done?

-----

mbreese 52 days ago | link

You can get some of the data from the 1000 Genomes Project directly from Amazon, so you don't need to pay to download it. There's about 200TB of data there (so far).

aws.amazon.com/1000genomes/

What I'm working on is mapping those short sequences (50-75 bases) to the genome and then either looking for mutations or expression levels (how many of those reads map to a particular location). There are a couple of ways to do the mapping, but most tools these days use either a big hash table or a Burrows-Wheeler transform.

en.wikipedia.org/wiki/Burrows%E2%80%93Wheeler_transfo...

And that's all just to get the data that you can then do something else with (gene expression, variation modeling, etc...).
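For readers unfamiliar with the Burrows-Wheeler transform mentioned above, here is a minimal, naive sketch built from sorted rotations. It only illustrates the transform itself. Real aligners build a compressed index on top of it and never materialise the rotations, and the `$` sentinel and the function names are assumptions made for this example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Invert the transform by repeatedly prepending the column and sorting."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(ch + row for ch, row in zip(transformed, table))
    # The row that ends with the sentinel is the original text plus sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    encoded = bwt("GATTACA")
    print(encoded, inverse_bwt(encoded))
```

The transform is reversible and tends to group identical characters, which is what makes it useful as the basis for compact, searchable genome indexes.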

-----

Gmo 53 days ago | link

Well, the raw output of a typical so-called "next gen sequencing" (which are actually very current gen) machine is around 1TB (at least, the ones we used here).

This is raw file though, so once processed (but not yet analyzed) I believe we have sizes around 50 to 100GB (but that's not really what I work on so don't quote me fully on this).

The next steps vary depending on what you want to do exactly, but they usually involve alignment of base pairs (basically, trying to tie DNA sequences together by their ends by seeing if they "fit").
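As a toy illustration of "fitting" sequences together by their ends, the sketch below finds the longest suffix of one read that matches a prefix of another and merges them. It assumes exact matches only, and the function names and the minimum overlap of 3 bases are choices made for this example; real assemblers tolerate sequencing errors and work on far larger inputs.

```python
from typing import Optional


def overlap(left: str, right: str, min_length: int = 3) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`.

    Returns 0 if no overlap of at least `min_length` bases exists.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    longest = min(len(left), len(right))
    for length in range(longest, min_length - 1, -1):
        if left[-length:] == right[:length]:
            return length
    return 0


def merge(left: str, right: str, min_length: int = 3) -> Optional[str]:
    """Join two reads on their overlap, or return None if they do not fit."""
    n = overlap(left, right, min_length)
    return left + right[n:] if n else None


if __name__ == "__main__":
    print(merge("ACGTTG", "TTGCA"))
```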

-----

Gmo 53 days ago | link

I said by their end but it can also be the full sequence, depending on the job

-----

micro_cam 52 days ago | link

Essentially you sequence tons of short bits of DNA and then either fit them together (assemble) or fit them to a reference (align). You can find example data sets in the Short Read Archive: www.ncbi.nlm.nih.gov/sra/

Cloudburst (a Hadoop-based aligner) has a good description of an algorithm: sourceforge.net/apps/mediawiki/cloudburst-bio/index.p... Aligners can get much more sophisticated, though, and there are a number of open and closed source implementations. I only link this one because of the quality of the figure.

The data sets we work with in my group can be up to 400 GB of compressed text for the reads from a single individual.

Another example from biology with a similar computational profile would be searching through a huge number of mass spectrometer outputs to identify the components in a new sample.
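The "fit them to a reference" step can be sketched with a simple hash-table index, under the assumption of a seed-and-verify scheme: index every k-mer of the reference, look up the first k bases of each read, then count mismatches over the full read. This is not Cloudburst's actual implementation, and all names and parameters here are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read: str, reference: str, index: Dict[str, List[int]],
             k: int, max_mismatches: int = 1) -> List[int]:
    """Return reference positions where `read` fits with few mismatches."""
    hits = []
    for pos in index.get(read[:k], []):  # seed: exact match on first k bases
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append(pos)
    return hits


if __name__ == "__main__":
    ref = "ACGTACGTTAGC"
    idx = build_index(ref, k=4)
    print(map_read("ACGTTA", ref, idx, k=4))
```

At genome scale the index no longer fits in memory, which is where the random-read performance of local SSDs matters.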

-----

saurik 53 days ago | link

That is only one of the benefits of EC2. If you are not using elasticity, then you have to factor in the reserved instance pricing, which drops the prices down by 71% (as in, to 29% of the list price; and I mean even if you include the up-front cost: that's overall savings). You, like most people who comment on the price of EC2, do not seem to be taking this into consideration. :(

-----

joe_bloggs 53 days ago | link

What are some of the other benefits, other than elasticity or on-demand ness, and not having to spend a huge amount upfront?

-----

saurik 53 days ago | link

One example, which stems from "on-demand ness" (which you added: I was only responding to "elasticity"), is that you can do "test runs" of migrations and deployments without even thinking about it: you can rent, for just an hour, a setup identical to your existing one, often based on a consistent and atomic snapshot of your production machine, so you can try something "likely correct but possibly horribly wrong"; then, if it works, rather than replicating the change on your "real" machine, you can just cut over to the new one and shut down the old.

Way too many people seem to believe that the only benefit of "on-demand" is "elasticity", and then make bogus arguments here that "if you can plan your traffic you shouldn't be using EC2": EC2 is cheaper than people like to claim (and is in fact quite price competitive) and your ability to turn on/off machines on a whim changes the way you look at hardware so drastically that, in all honesty, it makes traditional ways of dealing with hardware seem draconian and only worth putting up with if you are dealing with some weird corner case or have horribly special requirements.

-----

moe 52 days ago | link

> EC2 is cheaper than people like to claim

How often do we have to repeat this argument here on HN?

When running 24/7, EC2 instances are 2-3x more expensive than the cheapest equivalent rented-server options and orders of magnitude more expensive than the physical hardware if you buy it.

The numbers have been recited countless times, I'm not digging them up yet again.

So, no, EC2 is not cost effective for steady loads at the mid-range. EC2 shines at the very low and the very high end and in specific workloads, i.e. it shines when the benefits can be quantified to an amount greater than the price difference.

-----

saurik 52 days ago | link

I would be very interested in knowing what factors you are trading off for "equivalent" other than "on-demand". Many of my friends use co-location services for their businesses, and most of them purchase only on price, and their servers honestly suck: they have high latency, they are unstable, they don't have remote serial console access... they are living in a ghetto that burns tons of their time into "becoming server people".

If you find a company that has reasonable support, reliable servers, good datacenters, and the minimal features required to debug issues remotely, then you are looking at prices fairly equivalent to those offered by EC2 heavy-utilization reserved instances (and are going to end up with a similar contract length anyway). If this somehow doesn't work out: call Amazon AWS's sales division and see if you are compelling enough for them to negotiate with (they totally have a sales division, and they really do "want your business").

Regardless, your choice of quote is really bothersome: "people like to claim" that EC2 is as expensive as their on-demand list prices, and that's a fact clearly demonstrable by the person I'm responding to (who is quite clearly and obviously claiming EC2 is more expensive than it really is) and one that is not defensible as the price you should be looking at is the heavy-utilization reserved pricing; if you'd like to respond to my comment "and is in fact quite price competitive" then you should quote that and adjust your argument appropriately.

Honestly, the history of HN is not much better (as I scour around trying to find the "numbers" you claim "have been recited countless times"). It is actually difficult to find people who don't claim that Amazon EC2 is more expensive than it is; I'm almost wondering if you and I are living on different versions of the site...

"EC2 is about 10-20 times more expensive than dedicated hosting. Even if reserved instances save us 22% over 3 years, it still doesn't even come close." -- cmer

^ No, EC2 reserved instances save you 71% over 3 years.

"It costs $576/mo to run an extra large EC2 instance fulltime" -- stephenjudkins

^ No, even two years ago (before "heavy utilization reserved instances") you could drop this price by 66% to $195.84/mo.

"With EC2 prices at about $0.10 per hour, I can't imagine ever using a service with such a high premium." -- apinstein

^ Obviously: no, but the fact that this person is angry about the price of a small instance at $72/mo is quite telling; he isn't willing to go lower than $20/mo.

I found a price comparison by vladd from earlier this year, comparing a high-end VPS to EC2's largest offering, coming up with a nearly 10x difference, but the server is entirely useless: it is a consumer-level product running non-ECC RAM. Later comments claim the same hosting company has "competitively priced servers with ECC ram".

A couple months ago I found a thread that linked to a fairly detailed argument[1] stating that EC2 instances are 2-3x more expensive than a VPS. However, this person again is performing a comparison with non-ECC hardware. What damns this comparison, however, is that he is not taking advantage of 3-year reserved instances for a long-term high-end use case: his numbers seem to be based on 49% off, when he can easily get 71% off, nearly a 2x difference. <- Again, EC2 is cheaper than people like to claim.

[1] codemonkeyism.com/dark-side-virtualized-servers-cloud...
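The discount arithmetic quoted in this comment can be checked with a short sketch. The percentages and the $576/mo list price are taken from the text above; the function name is illustrative.

```python
def discounted(list_price: float, discount: float) -> float:
    """Price after taking `discount` (a fraction, e.g. 0.71) off list price."""
    return list_price * (1.0 - discount)


if __name__ == "__main__":
    # $576/mo extra large instance at the older 66% reserved discount.
    print(f"${discounted(576, 0.66):.2f}/mo")
    # How much more a 49%-off price costs than a 71%-off price.
    ratio = discounted(1.0, 0.49) / discounted(1.0, 0.71)
    print(f"{ratio:.2f}x")
```

The ratio is 0.51 / 0.29, or about 1.76, which is the "nearly a 2x difference" referred to above.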

Seriously: I can't find anyone who is actually doing legitimate comparisons of Amazon's offerings. People either compare EC2 to "I spent a week of time negotiating a deal to take over a bunch of hardware from a failing company down the street" (which, for the record, will also give you a great deal on chairs and office furniture: comparing the cloud to a fire sale is inane), assume "a server is a server is a server" and find "the cheapest" option (which seems to always have unreliable RAM), or (frankly: "and") fail to take into account Amazon's reserved instance discounts.

That said, Hacker News has a really horrible search system, and I'm trying to find something kind of esoteric (as I want to search for a dollar sign, and thereby have to use proxies such as "expensive" and "cost"). I would thereby love to see an honest comparison, and am happily willing to believe that I missed it: do you have a link to such?

-----

moe 52 days ago | link

> it is a consumer-level product running non-ECC RAM

Sorry to break that for you but EC2 instances are in all likelihood not running on ECC-Ram either[1]. If they had ECC-Ram then Amazon would probably prominently advertise that or at least respond when they are directly asked. If you can find a link to prove the opposite then I'll take that back.

> I would thereby love to see an honest comparison, and am happily willing to believe that I missed it: do you have a link to such?

You have probably already seen any of the blog-posts I could cite here, so I'll instead just try to wrap your two claims up:

1. You claim that dedicated servers are more labor intensive (setup, hardware failures) and require more staff. This is not my experience at all. In fact the complexity and idiosyncrasies of the AWS platform are much harder to abstract in the beginning, and no less labor intensive in the long term. You're just trading one set of problems (hardware issues) for a different one (cloud issues). What you may save on the hardware management front you have to spend on adapting your application for a cloud-environment.

2. You claim that equivalent hardware to an EC2 instance (with comparable performance, good support, network, etc.) would be roughly the same price as an EC2 instance. Sorry, but that is laughable; when did you last benchmark an EC2 instance? Even a cheap rented dedicated server (Hetzner, LeaseWeb, OVH) will normally give you twice the bang for the buck on every key metric (I/O, Ram, CPU). And this quickly rises to beyond an order of magnitude when you start comparing EBS to a local array or a 256G Ram box to 256G Ram in EC2 instances. Where redundancy is a concern you can usually quite literally buy two of each and still be cheaper than EC2.

I'll say what I always say: EC2 does have its place. However for deployments in the range of 10-~50 servers you will in pretty much all cases save a lot of money by sticking with dedicated servers for the base-load. That is unless your app needs the cloud-flexibility, of course (most apps don't).

What makes you believe this flexibility would come for free anyway? Like all things it comes with a price tag, and actually quite a hefty one in this case.

[1] https://forums.aws.amazon.com/message.jspa?messageID=203167

-----

wmf 52 days ago | link

They don't advertise it because it goes without saying that servers have ECC. EC2 uses Xeons and Opterons which only support ECC. It should only be a few percent more expensive, which is nothing when you consider the premium Amazon charges (which is something I definitely agree with you about).

-----

moe 52 days ago | link

> because it goes without saying

I've been dealing for long enough with hosters and hardware to know that nothing goes without saying.

> Xeons and Opterons which only support ECC

Have you actually checked the CPU models they use? All I know is that Amazon uses a range of different CPUs, and some Xeon/Opteron models do accept non-ECC Ram.

> only be a few percent more expensive

In the past ECC DIMMs used to be significantly more expensive.

Either way, as said, I don't know whether they're using ECC Ram. I agree it should go without saying, but I don't share your optimism that it actually does. I also wonder why they explicitly mention it for their GPU-instances when it goes without saying otherwise.

-----

spartango 52 days ago | link

FYI, EC2's machines do have ECC ram. They don't advertise it, though.

-----

moe 52 days ago | link

Can you cite a source please?

A little more than an anonymous one-liner in a forum would really help my confidence...

-----

dpe82 50 days ago | link

Phoronix's benchmarking test suite has been able to detect the underlying hardware: www.phoronix.com/scan.php?page=article&item=amazo...

-----

spartango 52 days ago | link

No source aside from personal experience working with them, sorry. They avoid publicizing anything about the hardware/infrastructure if possible, partly so that they can change it without customer awareness and partly because they have secret sauce in places (no, ECC isn't secret sauce).

-----

moe 52 days ago | link

Okay, I guess I'll take that as another datapoint, although honestly (no offense) I won't be basing decisions on it. ;)

-----

bluesnowmonkey 52 days ago | link

> orders of magnitude more expensive than the physical hardware

Orders of magnitude? So, like 100x or 1000x more expensive? Really?

-----

adrianpike 52 days ago | link

For us it was more like a 10x, but a few things went into that:
- We found screamin' deals on hardware by snatching it up when it was available, not when we needed it.
- We were at a fairly cheap colo, and haggled hard to get the cheapest rack possible. We went on a tour, noticed they had _tons_ of empty racks, and used that as some leverage.
- We didn't add any additional ops overhead by having everyone responsible for ops.

We were in the $5k/month ballpark for EC2, and cut it to under $600 with a few grand outlay for hardware spread over the course of a quarter.

That said, all of my current projects are on EC2 for the provisioning flexibility, and because I hate having to drive down to a datacenter at 4AM to swap a drive.
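A rough break-even sketch for the numbers in this comment. The $5,000 and $600 monthly figures come from the text; the $3,000 hardware outlay is a placeholder for "a few grand", and `ops_monthly` is a hypothetical parameter for pricing in staff time, which the comment does not quantify.

```python
from typing import Optional


def breakeven_months(hardware_outlay: float, cloud_monthly: float,
                     colo_monthly: float,
                     ops_monthly: float = 0.0) -> Optional[float]:
    """Months until owned hardware pays for itself, or None if it never does.

    `ops_monthly` is a hypothetical allowance for the staff time spent on
    hardware (haggling, drive swaps); the default of 0 ignores it.
    """
    savings = cloud_monthly - (colo_monthly + ops_monthly)
    if savings <= 0:
        return None
    return hardware_outlay / savings


if __name__ == "__main__":
    # Placeholder outlay of $3,000; the comment only says "a few grand".
    print(breakeven_months(3000, cloud_monthly=5000, colo_monthly=600))
```

Raising `ops_monthly` shows how quickly the advantage shrinks once maintenance time is counted as a cost.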

-----

saurik 52 days ago | link

Please tell me that you were including the cost of you driving to the datacenter at 4AM to swap a drive in the cost of the hardware in your price comparisons, as if you are just talking about the cost of the raw hardware and are not including the opportunity cost of all this time and energy spent haggling and performing maintenance, then this is simply a dishonest comparison: you could easily have been spending that time doing just about anything else, from working on new features for your product to improving your sales/investment pitch to simply sleeping (which will improve the quality of all of your work the next day). I'm also curious what your replacement plan is: are you intending to do this again next year, or are you intending to wait until all of your hardware starts failing and the operational overhead starts becoming painful? Finally, "having everyone responsible for ops" might mean that you didn't have to add a new explicit hire, but you can't claim that that isn't overhead: that is now state that everyone has to keep in their head and is a liability that could cause anyone to randomly get interrupted; it might even be cheaper in the long term to hire a new, more dedicated person than to reuse existing people.

-----

moe 52 days ago | link

Yes, 10x is common. 100x is a little contrived but possible when you spec out enough Ram in EC2 instances (>2T), then compare to a physical box over 3yrs.

-----

maayank 53 days ago | link

"if it works, rather than replicating the change on your "real" machine, you can just cut over to the new one and shut down the old."

Interesting! Any other hacks enabled by EC2 and the like that make life much easier than real dedicated hardware?

-----

cwp 52 days ago | link

I do this too, not for risky migrations, but for daily updates. The app relies on a data service that's normally read-only, but gets fresh data daily. When everything was running on the bare metal, we had to schedule the updates for the middle of the night and carefully migrate the data in-place to avoid interrupting service. It would take 8 to 12 hours.

Since we moved to EC2, updating is simpler. The service runs on a micro instance. We launch a large instance to do all the CPU- and IO-intensive processing that prepares the new dataset, then launch a new micro instance, upload the dataset, run a few smoke tests, and if all is well, cut over to it. Because we're doing it off line, we were able to optimize the data processing for speed rather than low resource usage, and cut the runtime down to 45 minutes.

One thing that's often missed in discussions of IaaS versus bare metal is that the elasticity of a particular application can be affected by its design. When we were running on dedicated machines, we smoothed out the load to avoid idle hardware, but after moving to EC2 we concentrated it into spikes to get maximum productivity from running instances. In our case, spiky load is better from a business point of view, because serving data that's 1-25 hours old is better than data that's 8-32 hours old.
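The freshness ranges quoted above follow from a simple model, sketched here under two assumptions: the data is refreshed once every 24 hours, and the 45-minute run plus upload and smoke tests is rounded to about one hour. The function name is illustrative.

```python
from typing import Tuple


def data_age_range(processing_hours: float,
                   refresh_interval_hours: float = 24.0) -> Tuple[float, float]:
    """Return the (freshest, stalest) age of served data, in hours.

    Data is `processing_hours` old the moment it goes live and keeps aging
    until the next refresh replaces it one interval later.
    """
    return processing_hours, processing_hours + refresh_interval_hours


if __name__ == "__main__":
    print("in-place update:", data_age_range(8))
    print("offline on EC2: ", data_age_range(1))
```

Shortening the processing step shifts the whole range earlier, which is why the faster, spikier job gives fresher data.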

-----

jaylevitt 53 days ago | link

There are also the network benefits (pun intended). If the rest of your app does benefit from elasticity, you've had to choose between:

1. Keeping your app on EC2 and working around the lack of high I/O options
2. Keeping your app somewhere else and working around the lack of EC2-style elasticity

In TFA, Netflix had chosen #1, and they used to run an extra memcached layer + I/O on 48 instances. They were able to bring this down to 15 I/O instances with no intervening cache, and lower overall latency.

That said, I'd guess the on-demand hi1.4xlarge won't get a lot of usage; I imagine they offer it just for orthogonality's sake (all other instances are available both on-demand and reserved), plus the ability to try before you buy.

What's really exciting is that Amazon clearly recognizes their lack of good I/O solutions. Maybe we'll see a whole range of options stem out of this... one can hope.

-----

XERQ 53 days ago | link

> [edit: A few minutes of Googling, and I can't find any dedicated servers with 2 TB of SSD.]

I am the founder of SSD Nodes, Inc. (www.ssdnodes.com) and we offer lower-sized SSD-backed storage in addition to custom cloud and dedicated plans that range from 1-12TB of local SSD storage, at comparable performance. [/plug]

-----

corin_ 53 days ago | link

Given your "1-12tb" range doesn't list prices online, can you tell us whether your prices are comparable? At $1249/month for 2x 200GB SSD, it seems unlikely, but maybe I'm wrong?

-----

XERQ 53 days ago | link

Typically clients requiring that much SSD-backed storage have performance targets and a very specific workload, so this affects options along with pricing. With that said, ballpark for what Amazon is advertising is comparable to our pricing, except with us you are _guaranteed_ the resources whereas they are using a multi-tenant environment (you're not using the only instance on their host and your I/O is influenced by everyone else using that host).

EDIT: Downvoted? Please offer your point of view.

EDIT2: Sample pricing offered in comment below.

-----

corin_ 53 days ago | link

I haven't downvoted (and in fact couldn't if I wanted to since you replied to me), but personally my issue with your comment is that you're being as vague as your website's prices. Would be more interesting and relevant if you actually gave a price for a box comparable to the AWS specs.

-----

XERQ 53 days ago | link

I apologize, here's a sample pricing configuration:

8 x 3.4GHz E3-1270

11
